The video provides a sobering reality check by explaining why the discrete logic of language models fails against the astronomically large space of possible video frames. It clearly illustrates that video generation isn't just a scaling problem, but a fundamental mathematical hurdle.
Deep Dive
Why ChatGPT logic fails for video
In the most straightforward implementation, we configure our neural network to take in the RGB pixel values from a sequence of video frames and then predict the pixel values in the next frame, just as the GPT models are trained to predict the next token in language.
However, when we use these models to predict the next frame, the results are blurry.
And this blurriness compounds dramatically in longer-horizon predictions.
Large language models are autoregressive. When ChatGPT answers a question, it generates one token at a time. At each step, it's feeding its latest generated token back into its input to create the next output. If we try this autoregressive approach with a next-frame video prediction model, the results quickly devolve into blurry nothingness.
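The feedback loop described above can be sketched with a toy example. This is only an illustration under stated assumptions, not the video's actual model: `blurry_predictor` is a hypothetical stand-in for a learned next-frame network that softens its input slightly, a "frame" is a one-dimensional row of pixel intensities, and `edge_contrast` is an ad hoc sharpness measure introduced here.

```python
def blurry_predictor(frame):
    """Hypothetical stand-in for a learned next-frame model.

    Assumption: the model's prediction is a slightly smoothed copy of its
    input (a 3-tap weighted average with wraparound at the borders).
    """
    n = len(frame)
    return [
        0.25 * frame[(i - 1) % n] + 0.5 * frame[i] + 0.25 * frame[(i + 1) % n]
        for i in range(n)
    ]


def rollout(first_frame, steps):
    """Autoregressive rollout: each prediction is fed back in as the next input."""
    frames = [first_frame]
    for _ in range(steps):
        frames.append(blurry_predictor(frames[-1]))
    return frames


def edge_contrast(frame):
    """Largest jump between neighboring pixels; lower means blurrier."""
    n = len(frame)
    return max(abs(frame[i] - frame[(i + 1) % n]) for i in range(n))


# A toy frame with two hard edges: 8 dark pixels followed by 8 bright ones.
first = [0.0] * 8 + [255.0] * 8
frames = rollout(first, steps=10)

for t in (0, 1, 5, 10):
    print(f"step {t:2d}: edge contrast = {edge_contrast(frames[t]):.1f}")
```

Because every step consumes the previous step's output, the small softening introduced by one prediction is applied again and again, so the edge contrast keeps falling as the rollout gets longer.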
Now, the blurry frames produced by our generative video prediction approach are not some huge mystery.
Language is complex and unpredictable, but it's nothing compared to video.
Language models use fixed-size vocabularies.
GPT-2 has 50,257 discrete outputs, one for each token that the model could say next.
This complete enumeration approach is hopeless in video.
For full HD video in the most general case, each color channel of each pixel can take on 256 discrete values, and a frame has 1920 * 1080 * 3 such values, meaning there are something like 10 to the power of 15 million possible next video frames, dwarfing the number of atoms in the observable universe.
So, there's no way our video prediction model can have a discrete output for each possible next video frame.
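The arithmetic behind that figure can be checked in a few lines. This is a rough sketch: the count itself is far too large to print, so the code works with its base-10 logarithm. The variable names are illustrative, and the value used for atoms in the observable universe (about 10 to the power of 80) is a commonly cited rough estimate, not a number from the video.

```python
import math

WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3   # full HD, RGB
LEVELS = 256                              # 8 bits per color channel
GPT2_VOCAB = 50_257                       # discrete outputs of GPT-2
LOG10_ATOMS_IN_UNIVERSE = 80              # assumption: rough, commonly cited estimate

# Number of independent 8-bit values in one frame.
values_per_frame = WIDTH * HEIGHT * CHANNELS

# Possible frames = LEVELS ** values_per_frame; take log10 to keep it manageable.
log10_possible_frames = values_per_frame * math.log10(LEVELS)

print(f"values per frame:           {values_per_frame:,}")
print(f"log10(possible frames):     {log10_possible_frames:,.0f}")
print(f"log10(GPT-2 vocabulary):    {math.log10(GPT2_VOCAB):.1f}")
print(f"log10(atoms, rough figure): {LOG10_ATOMS_IN_UNIVERSE}")
```

The exponent comes out at roughly 15 million, compared with an exponent of under 5 for the GPT-2 vocabulary, which is why a single output per possible frame is out of reach.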