A compelling synthesis of scaling laws that illustrates how massive computation transcends mere pattern matching to achieve genuine semantic depth. It serves as a sobering reminder that in AI development, raw scale often triumphs over human-designed ingenuity.
Deep Dive
More than just autocomplete?
In the spring of 1958, in a basement at Johns Hopkins, there's a cat strapped into a chair and its eyelids are propped open and there's a wire going directly into the back of its skull, into the part of the brain that handles vision. And that wire runs to a speaker. When a brain cell fires, the speaker goes off.
And for weeks, they're flashing dots of light at this cat, little black spots on glass slides, and nothing is happening.
And then by chance it happens. One of them goes to swap the glass slide and as he slides it into the projector, the edge of the slide swipes across the screen and the cat's brain explodes. Not literally, but you know, the speaker goes off like a machine gun. The neuron didn't care about some dot. It cared about the edge. And not just any edge.
It had to be at a very specific angle. If you tilt it away, it goes quiet. And if you tilt it back, it lights up again. They found a single brain cell whose entire reason for existing is basically detecting that there is a line and it leans this specific way. What they basically found is that vision isn't like a camera. Your brain doesn't take a picture. Instead, your brain has a layer of cells that each give a damn about one tiny dumb thing, like a line at this angle or an edge moving that way. And they call these things simple cells. And those simple cells feed into complex cells that combine the dumb things into slightly less dumb things. So edges become corners, corners then become shapes, and shapes become a face. They won the Nobel Prize in 1981 for their discoveries concerning information processing in the visual system.
This was the basis of convolutional neural networks, which Yann LeCun pioneered in 1989. It's basically a bunch of small filters, each one responsible for spotting something very simple in a little patch of an image, like an edge. Then when you layer and stack these together, they start being able to detect bigger things like cats and dogs in a photo.
Convolutional neural networks turned out to be incredibly good at vision tasks. It's the same tech behind the cameras in self-driving cars. But they were less suited to language. The word convolutional basically refers to sliding across an image and checking one small segment at a time, which means proximity matters. Things that are close together are what get compared. But in language, the things that matter to each other can be very far apart. You can say the word "it" in a sentence and it could point back to something all the way back at the beginning of the paragraph. And these networks struggled with these kinds of long-distance connections.
In 2017, a big leap came with the paper "Attention Is All You Need," which introduced transformers, which is what the T in GPT stands for.
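As a side note on the sliding-filter idea described above, here is a minimal sketch of a single edge-detecting filter. The tiny image, the filter values, and the function name `slide_filter` are all made up for illustration. A real convolutional network learns its filter values from data and stacks many layers of them.

```python
# Toy illustration only: hand-written 3x3 filters sliding over a tiny image.
# The image, the filter values, and slide_filter are assumptions for this sketch.

def slide_filter(image, kernel):
    """Slide a small square kernel over a 2D image; return the response at each position."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            total = 0
            # Multiply the patch under the kernel by the kernel, element by element.
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# 5x6 image: dark (0) on the left, bright (1) on the right, so one vertical edge.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Responds when brightness changes from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# Responds when brightness changes from top to bottom.
horizontal_edge = [[-1, -1, -1],
                   [0, 0, 0],
                   [1, 1, 1]]

vertical_response = slide_filter(image, vertical_edge)
horizontal_response = slide_filter(image, horizontal_edge)

print(vertical_response)    # nonzero only where the patch straddles the edge
print(horizontal_response)  # all zeros: this filter ignores a vertical edge
```

Like the cat's neuron, the vertical filter fires only where its preferred edge is present, and the horizontal filter stays silent on the same image.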
The GP, I believe, stands for guessing profusely. And the idea is that each word, or really each token, because words can get chopped into smaller pieces, can pay attention to all the other words around it. Think of every word as sending out a little signal that says "this is what I'm about" while also listening to the signals coming from everyone else, leaning harder towards the ones that are most relevant to it and barely paying any attention to the ones that aren't. And this all-at-once way of relating words is essentially what made today's large language models possible.
Simple things scaled up.
Like the visual cortex, which builds vision out of a stack of simple edge detectors, AI is a story of simple things scaled up. And it's as much a hardware story as it is a software one. Many of the core ideas that power our AI today go back decades. The first artificial neuron models appeared in the 1940s and '50s. And the way we train these networks was worked out in the 1980s. And while the architectures behind today's large language models are somewhat newer, what was missing from most of that history wasn't the ideas, it was the raw computing power to bring them to life.
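Going back to the attention idea described above, the "signals" picture can be sketched roughly as follows. This is a simplified single-head version with made-up two-number vectors for three tokens. The names `attend`, `query`, `keys`, and `values` are chosen for this sketch, and a real transformer learns these vectors and runs many such heads in parallel.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Score one token's query against every key, then blend the values by those weights."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    mixed = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, mixed

# Made-up vectors for three tokens (illustrative values, not from any real model).
tokens = ["cat", "sat", "it"]
keys = [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]    # "this is what I'm about"
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # what each token passes along

# The query for "it": what this token is listening for.
query_it = [1.0, 0.0]

weights, mixed = attend(query_it, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

With these invented numbers, "it" puts most of its weight on "cat" and the least on "sat", however far apart the tokens sit in the sentence.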
And now we have that. The current AI surge is in large part a story about hardware: chips that run thousands of calculations in parallel, reaching trillions of operations every second. In essence, LLMs are simple and naive. So naive you would think they'd never work.
Take a dumb algorithm and just blast it with enough computing power to light up a small city. And out come these seemingly magical results. And at first I thought this was a hack. I thought the fact that we're taking something so simple, so naive, and just scaling it up couldn't be real science. But this is what Richard Sutton called the bitter lesson. The Bitter Lesson is a short essay that basically said that over the long run, trying to be clever about AI tends to lose out to methods that just scale up with raw computing power. You don't need to handcraft human-style understanding into a machine. You just have it do ungodly amounts of statistical computation. It's somewhat similar to animation. Think Mickey Mouse. People have known how to draw for thousands of years. But a stack of drawings just sits there. The magic only appears when you have a machine that can show those drawings in rapid succession, flashing about 24 pictures every second, and your brain blends those separate frames together. And suddenly, still drawings seem to move and come to life.
AI is a bit like that. And usually I would think to myself that because this is a projection, because it's all a hack, a trick, it's nothing to get your panties in a bunch about. And I'm starting to wonder if maybe I was wrong about that. Because in fact, a lot of what the brain does might be simple things scaled up, too. It's the bitter lesson again: the idea that simple methods plus massive computation tend to win. It's doing small things millions and billions of times over, shaped across hundreds of millions of years and a lifetime of learning and reinforcement. Are they exactly the same? Are LLMs and human brains exactly the same? Obviously not. But in a sense, the human brain fits a kind of curve to the real world, approximating whatever helped our ancestors and us survive. And it turns out machines can fit a curve, too.
But isn't it just autocomplete, though? I'm honestly not so sure anymore. The naive assumption is that during training, the model simply predicts the next word based on how often words appear in the training data.
But that's not quite right. The model isn't scored on raw frequency, but on whether it predicts the right next word for this particular context. So, let's say you show the model a paragraph and ask it to fill in the word after the word "the". In the data, maybe the single most common word after "the" is something like "man". But if the actual answer in this paragraph is something else, then guessing "man" counts as a mistake. And when the model guesses wrong, it gets penalized. And that penalty is used to nudge its internal settings just a little so that next time, when a similar context comes up, it's more likely to get it right. Now, repeat that across millions and millions of examples. With each wrong guess, the model adjusts, and slowly it settles into a kind of fit, an incredibly complex one. And the only way to actually win this game, to keep guessing the right next word, is to stop leaning on raw frequency and start picking up on what the previous sentences and paragraphs actually mean. So, it ends up capturing a sort of essence. This points to what's called a world model.
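The penalize-and-nudge loop described above can be sketched with a three-word vocabulary. This is a toy under stated assumptions: the vocabulary, the probabilities, and the learning rate are invented, and the sketch nudges the word scores directly, whereas real training adjusts the network's internal weights through backpropagation. The penalty used here is cross-entropy, the standard choice for next-token prediction.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalty(probs, correct_index):
    """Cross-entropy: small when the correct word got high probability, large otherwise."""
    return -math.log(probs[correct_index])

# Invented numbers: how a frequency-only guesser might fill in the word after "the".
vocab = ["man", "cat", "slide"]
freq_probs = [0.6, 0.3, 0.1]

# In this particular paragraph, the actual next word is "slide".
correct = vocab.index("slide")
loss_before = penalty(freq_probs, correct)

# One nudge. For softmax plus cross-entropy, the gradient with respect to the
# scores is (predicted probabilities - one-hot answer).
logits = [math.log(p) for p in freq_probs]
learning_rate = 1.0  # assumed value, chosen only to make the change visible
probs = softmax(logits)
grad = [p - (1.0 if i == correct else 0.0) for i, p in enumerate(probs)]
new_logits = [l - learning_rate * g for l, g in zip(logits, grad)]

new_probs = softmax(new_logits)
loss_after = penalty(new_probs, correct)

print(f"penalty before: {loss_before:.3f}, after one nudge: {loss_after:.3f}")
print(f"P(slide) before: {freq_probs[correct]:.3f}, after: {new_probs[correct]:.3f}")
```

The most frequent word still earns a penalty whenever the context calls for something else, and each nudge shifts probability toward the word that actually appeared.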
There's a question of whether LLMs have a world model: whether they have fitted a curve not to the training data itself but to the process that produced the data. So essentially one level prior, which is big if true, and there are arguments on both sides here. It's remarkable how well a simple thing scaled up works in the world of computers. And it's also remarkable how we find the same thing in nature. And I wonder. I sort of let my science fiction mind run away with a problem sometimes, and I neglect all present reality and get mesmerized by the future. But what if there truly is no limit to how well AIs can approximate reality, or form a line of best fit that could be even more functionally useful than ours? And I know these are naughty thoughts that have no present basis in reality. Trust me, I'm just as annoyed about this as you are. But sometimes, for me, I have to wander the underworld a little to get a clear idea of what I actually believe.
Usually, I hope I come out stronger for it. And at this point in the video, you know, I usually push back and say that, you know, AI can't do love, feeling, qualia, sensation, experience. But recently, I've just been haunted by that simple phrase: simple things scaled up.
Is that what everything is? If that's how the eye works, if that's how vision works, why stop there? And to me, this conversation isn't really just idle curiosity or aimless philosophizing.
It's really a question of what it means to be human. See, because first AI came for our jobs and now it's sort of coming for our spirituality, for what remaining centrality humans ascribe to themselves to make reality and suffering more tolerable, to give meaning to the chaos.
And I'm exploring this hard now, and it's kind of bumming me out a little. I'm not going to lie. Like, we hold out on intelligence and maybe consciousness, right? We say intelligence is special.
It's not a simple thing scaled up. It's a different category altogether. And maybe that's right. That's what I like to believe. But there's a reason it's called the bitter lesson.
Thanks for watching.