The principle of least action in physics, which describes how systems naturally follow optimal paths, can be mathematically transformed into the Hamilton-Jacobi-Bellman framework that underlies reinforcement learning in AI. This framework creates a 'value landscape' where the slope at any point indicates the optimal action to take, enabling intelligent agents to make decisions by following gradients toward better outcomes rather than through brute-force search. The same mathematical structure that describes planetary motion also describes how AI agents should evaluate futures and choose actions, demonstrating that intelligence fundamentally involves defining costs, constraints, and following the resulting gradients to achieve optimal behavior.
The Incredible Physics of Artificial General Intelligence
Picture a ball rolling down a hill. You let it go, and without hesitation, it finds its way downward, curving around bumps, speeding up, slowing down, ending up lower than where it began.
Watching it, it's hard not to speak as if it's making a choice. We say things like, "It takes the easiest path," or "It naturally finds the minimum."
That sounds almost intelligent.
Of course, the ball is not thinking, but the surprising part is this: Some of the same mathematics we use to describe that rolling ball also helps us describe how an intelligent agent should make decisions. That's the bridge we're going to cross.
And it begins with a very old idea in physics, the principle of least action.
People often hear that phrase and imagine nature as if it were trying to be efficient, as if a planet wants the best orbit, or a beam of light chooses the fastest route. That's the wrong picture. Objects do not have goals. The deeper truth is that the equations of motion can often be rewritten as an optimization problem.
In plain English, instead of describing motion only as "This force causes this acceleration right now," you can also describe it as "Out of all possible overall paths, the real one is the path that best balances certain trade-offs."
To make things feel less abstract, imagine planning a road trip. You could decide every second, "Turn the wheel a little. Press the gas a little. Brake a little." That's the local view.
Or you could think globally. "What route gets me there best, taking into account distance, time, traffic, and fuel?"
Physics often has this second description hiding underneath the first.
The action is like a score for the entire trip.
Every possible path gets a score. The path that actually occurs is the one whose score is stationary, meaning that small changes to the path leave the score unchanged to first order. In many everyday cases, that is simply the lowest score. Now, action is one of those words that can sound intimidating, but for intuition, you can think of it as a cost for a whole trajectory, a cost for the entire movie of the motion from beginning to end.
Depending on the system, that score reflects trade-offs involving motion, energy, time, and constraints.
It's not simply shortest distance or least energy in the everyday sense.
A thrown ball, for example, doesn't travel in the shortest possible line or the fastest possible route.
Its arc is the one that fits the laws of motion, and those laws can be captured through this whole path score. This is where the first big surprise lands.
Nature can be described as if it solves a global optimization problem. Again, that does not mean the universe is literally sitting there comparing every possible path like a computer brute-forcing options. It means two different mathematical descriptions turn out to be equivalent. You can describe motion step-by-step with forces or all at once with an optimal path viewpoint.
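As a rough numerical illustration of the "score for a whole path" idea, the sketch below computes a discretized action (kinetic minus potential energy, summed over time) for a ball thrown straight up and caught at the same height. It then compares the real parabolic flight with two deformed paths that share the same start and end points. The mass, flight duration, and size of the deformation are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def action(path, dt, mass=1.0):
    """Discretized action: sum of (kinetic - potential) energy times dt."""
    total = 0.0
    for y0, y1 in zip(path, path[1:]):
        velocity = (y1 - y0) / dt          # average velocity over the step
        height = 0.5 * (y0 + y1)           # average height over the step
        total += (0.5 * mass * velocity ** 2 - mass * G * height) * dt
    return total

def vertical_throw(n_steps, duration, bump=0.0):
    """Heights of a ball thrown up at t=0 and caught at t=duration.

    bump=0 gives the real parabola; a nonzero bump adds a sine-shaped
    deformation that leaves both endpoints essentially unchanged.
    """
    dt = duration / n_steps
    v0 = 0.5 * G * duration                # launch speed that lands at t=duration
    path = []
    for k in range(n_steps + 1):
        t = k * dt
        y = v0 * t - 0.5 * G * t * t
        path.append(y + bump * math.sin(math.pi * t / duration))
    return path, dt

real_path, dt = vertical_throw(n_steps=200, duration=2.0)
higher_path, _ = vertical_throw(n_steps=200, duration=2.0, bump=0.5)
lower_path, _ = vertical_throw(n_steps=200, duration=2.0, bump=-0.5)

s_real = action(real_path, dt)
s_higher = action(higher_path, dt)
s_lower = action(lower_path, dt)
print(s_real, s_higher, s_lower)
```

Both deformed paths should come out with a larger action than the real one. For this particular system the real path is a genuine minimum among paths with the same endpoints; in general the principle only requires the action to be stationary.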
Same physics, different lens. But there's a limitation here if we care about intelligence.
If I tell you the best path between a known start and a known finish, that's useful, but only for that one trip.
An intelligent agent needs something more flexible. It needs to know what to do from wherever it happens to be. Not just one perfect path, but a general guide. Think of the difference between getting one set of driving directions and having a full navigation app.
One itinerary says, "From your house to the airport, go this way."
A navigation map says, "From any point in the city, here's the best direction toward your destination."
Intelligence needs the map. This is where the Hamilton-Jacobi idea enters, and it's one of the most beautiful upgrades in mathematical physics.
Instead of solving directly for one path, you solve for a kind of landscape spread across all possible situations.
At each point in that landscape, the height tells you something like, "How costly is it to reach the goal from here if you act in the best possible way?"
That landscape is often called a value function, but you don't need the jargon to get the idea.
Imagine a topographic map where altitude does not mean physical height. Instead, it means remaining difficulty or remaining cost. If you are standing somewhere on that map, the best move is the one that goes most steeply downhill in future trouble.
And this is a very important concept.
Optimal behavior can be encoded in the slope of a single global function. Once you have that map, decision-making becomes much simpler to picture. A robot in a maze doesn't need to imagine every future route from scratch each moment.
If it has learned or computed the right cost-to-go map, then each location already carries a hint. From here, this is how hard the rest of the journey will be. The robot can just move in the direction that reduces the remaining cost fastest while still obeying the rules of how it can move. The Hamilton-Jacobi equation is the rule that makes this map internally consistent. You can think of it as the bookkeeping law for optimal futures. It says your estimate of what remains has to match the system's actual dynamics and the cost you pay along the way.
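The maze picture above can be sketched in a few lines. The example below builds a cost-to-go map for a tiny, made-up maze and then lets a "robot" walk downhill on it. The maze layout is hypothetical, every move is assumed to cost one step, and breadth-first search from the goal is used here only as a simple stand-in for whatever method actually computes or learns the map.

```python
from collections import deque

# Hypothetical maze: S = start, G = goal, # = wall, . = free cell
MAZE = [
    "S.#.",
    ".##.",
    "...G",
]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def find(symbol):
    """Return the (row, col) of the first cell containing symbol."""
    for r, row in enumerate(MAZE):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(symbol)

def neighbors(cell):
    """Cells reachable in one legal move (inside the grid, not a wall)."""
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
            yield (nr, nc)

def cost_to_go(goal):
    """Search outward from the goal: steps remaining from each reachable cell."""
    cost = {goal: 0}
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):
            if nxt not in cost:
                cost[nxt] = cost[cell] + 1
                queue.append(nxt)
    return cost

def follow_map(start, goal, cost):
    """Greedy descent: always step to the neighbor with the lowest remaining cost."""
    path = [start]
    while path[-1] != goal:
        path.append(min(neighbors(path[-1]), key=cost.__getitem__))
    return path

start, goal = find("S"), find("G")
cost = cost_to_go(goal)
route = follow_map(start, goal, cost)
print(cost[start], route)
```

Once `cost` exists, `follow_map` never plans ahead. It only compares the numbers stored on neighboring cells, which is the sense in which each location "already carries a hint," and the wall check in `neighbors` plays the role of the rules for how the robot can move.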
In less technical words, your map of best choices can't be a fantasy. It must agree with how the world actually responds when you act. And this point is important because intelligent behavior is never just about wanting something.
It's always about wanting something in a world with constraints. A car can't teleport, and a drone can't turn infinitely sharply.
The map of optimal decisions has to be shaped by those realities. Now listen to how close this starts to sound to modern AI.
In reinforcement learning, one of the main frameworks for training decision-making agents, we also talk about value.
We also ask, "From this state, how good or bad is the future if I behave optimally?" We also try to turn a giant planning problem into something manageable by learning a value landscape. This is why the connection between physics and AI is not just poetic. The language changes, but the structure remains. In physics, action summarizes the cost of an entire path.
In AI, we talk about cumulative reward or cumulative cost over time.
In physics, we look for the best trajectory. In AI, we look for the best policy, meaning a rule for what to do in every situation. In both cases, the heart of the problem is the same.
Evaluate futures, then act accordingly.
A common misconception is that reinforcement learning is basically random trial and error until something works.
That's only a small and often crude version of the story.
A more powerful view is that the system is trying to build a map of consequences.
It wants to know, "If I'm here, what future lies ahead if I make good choices?"
Once it has that map, behavior starts looking purposeful. The key idea underneath this is Bellman's principle, which can be said in ordinary language.
If your overall plan is optimal, then whatever remains of it after the first step must also be optimal from wherever that step leaves you.
That sounds almost obvious, but it is incredibly powerful.
It means a huge long-term decision problem can be broken into smaller pieces.
You don't have to solve life all at once. You just need a local rule that stays consistent with the best possible future. This is the step-by-step cousin of the Hamilton-Jacobi view.
Bellman gives the recursive logic. The value of being here depends on the immediate cost, plus the value of where your next action takes you.
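To make that recursion concrete, here is a minimal sketch on a made-up road network. The states, the edge costs, and the function names are all hypothetical. Each sweep replaces the value of a state with "immediate cost plus value of the next state," minimized over the available moves, and repeats until nothing changes.

```python
# Hypothetical road network: EDGES[state][next_state] = immediate cost of that move.
EDGES = {
    "home": {"a": 1.0, "b": 4.0},
    "a": {"b": 1.0, "goal": 6.0},
    "b": {"goal": 2.0},
    "goal": {},
}

def bellman_backup(value):
    """One sweep of V(s) = min over moves of [immediate cost + V(next state)]."""
    updated = {}
    for state, moves in EDGES.items():
        if not moves:                       # the goal: nothing left to pay
            updated[state] = 0.0
        else:
            updated[state] = min(cost + value[nxt] for nxt, cost in moves.items())
    return updated

def solve(max_sweeps=100):
    """Repeat the backup until the values stop changing."""
    value = {state: 0.0 for state in EDGES}   # initial guess
    for _ in range(max_sweeps):
        new_value = bellman_backup(value)
        if new_value == value:                # fixed point: the map is self-consistent
            break
        value = new_value
    return value

def best_move(state, value):
    """Pick the move whose immediate cost plus remaining value is smallest."""
    moves = EDGES[state]
    return min(moves, key=lambda nxt: moves[nxt] + value[nxt])

value = solve()
print(value, best_move("home", value))
```

In this toy network the direct road from `a` to `goal` looks tempting but costs 6, while the detour through `b` costs 1 + 2. The recursion picks that up without ever enumerating whole routes.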
Hamilton-Jacobi takes that same idea into smooth continuous settings like motion through space and time.
Put them together and you get the Hamilton-Jacobi-Bellman framework. This idea is simple enough to say out loud. If a system moves continuously and if actions shape that motion and if there is some long-term objective, then there is an equation that links the local slope of the value map to the globally optimal strategy.
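For readers who want to see it in symbols, here is one common way to write it. The notation is an assumption introduced for this sketch rather than the speaker's own: a deterministic system with state $x$, action $u$, dynamics $\dot{x} = f(x, u)$ (or $x_{k+1} = F(x_k, u_k)$ in discrete time), running cost $\ell(x, u)$, terminal cost $\phi(x)$ at final time $T$ (step $N$), and cost-to-go $V$.

```latex
% Discrete-time Bellman recursion (cost-to-go form)
V_k(x_k) = \min_{u_k} \bigl[ \ell(x_k, u_k) + V_{k+1}(x_{k+1}) \bigr],
\qquad x_{k+1} = F(x_k, u_k), \qquad V_N(x) = \phi(x)

% Continuous-time Hamilton-Jacobi-Bellman equation
-\frac{\partial V}{\partial t}(x, t)
  = \min_{u} \Bigl[ \ell(x, u) + \nabla_x V(x, t) \cdot f(x, u) \Bigr],
\qquad V(x, T) = \phi(x)

% Optimal action read off from the local slope of the value map
u^*(x, t) = \arg\min_{u} \Bigl[ \ell(x, u) + \nabla_x V(x, t) \cdot f(x, u) \Bigr]
```

The term $\nabla_x V \cdot f$ is the "local slope of the value map" along the direction the system actually moves, so the last line is the formal version of stepping downhill in remaining cost while respecting the dynamics.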
Here's the breathtaking part.
The same kind of mathematical structure that helps describe a planet's motion can also describe what a decision-making AI agent should do.
Swap physical law for objective and trajectory through space for strategy through possibilities and the skeleton of the problem is still recognizable.
You can see this pattern in many worlds.
In physics, light bends through materials in a way that can be described as minimizing travel time.
In robotics, a drone may seek a path that balances speed, energy use, and obstacle avoidance. In machine learning, a model is adjusted by nudging its parameters downhill on a loss landscape trying to reduce error.
Different systems, same recurring idea.
Define a cost then move in a direction that lowers it. That doesn't mean all problems are easy. In real life, the landscape can be rough, high dimensional, uncertain, or partly hidden.
But the dream is the same.
Don't search blindly if you can learn the shape of the terrain. This is one reason modern AI increasingly cares about continuous optimization rather than pure brute force search.
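The "nudge downhill on a landscape" idea can be shown with plain gradient descent. The sketch below uses a made-up two-parameter loss shaped like a bowl. The loss function, starting point, learning rate, and step count are all illustrative assumptions, and real loss landscapes are far rougher than this.

```python
def loss(w):
    """Hypothetical bowl-shaped loss with its minimum at w = (3, -1)."""
    x, y = w
    return (x - 3.0) ** 2 + 2.0 * (y + 1.0) ** 2

def gradient(w):
    """Slope of the loss in each coordinate (derived by hand from loss)."""
    x, y = w
    return (2.0 * (x - 3.0), 4.0 * (y + 1.0))

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, recording the loss along the way."""
    w = start
    history = [loss(w)]
    for _ in range(steps):
        gx, gy = gradient(w)
        w = (w[0] - learning_rate * gx, w[1] - learning_rate * gy)
        history.append(loss(w))
    return w, history

w_final, history = gradient_descent(start=(0.0, 0.0))
print(w_final, history[0], history[-1])
```

No branch of possibilities is ever enumerated here. Each step uses only the local slope, which is what makes this style of optimization cheap compared with exhaustive search when the landscape is smooth enough.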
Old-fashioned search can be like exploring a giant tree of possibilities one branch at a time.
Sometimes that works, but it can be painfully expensive.
If instead you can learn a value landscape, you can move more fluidly.
You don't need to evaluate every possible future from scratch at every moment. You have a compressed guide to what matters, and that matters especially in real-time systems. A self-driving car cannot pause at every intersection and simulate millions of futures, and a robot balancing on one foot cannot search endlessly before correcting itself.
In these settings, a good value map is the difference between graceful action and computational paralysis. At this point, it is tempting to make a philosophical leap and say, "So, the universe is intelligent, too."
But that leap is too fast. Optimization is not the same as intention.
A rock falling under gravity is not trying to do anything.
But there is a deeper and more interesting lesson.
Behavior that looks smart can emerge whenever a system is governed by an objective, shaped by constraints, and steered consistently by the resulting gradients.
That last word, gradients, just means slopes.
Intelligence often looks like following the slope of what gets better, provided you have the right definition of better.
And this is exactly where AI becomes both powerful and dangerous.
In physics, the objective and constraints are given by nature.
In AI, we choose them. We decide what counts as success and what the penalties are. We decide what constraints are non-negotiable.
If we choose badly, an agent may optimize the wrong thing brilliantly.
That is why building reliable agents is not just about making them smarter, but also about giving them realistic objectives and hard boundaries.
A robot should not merely reach the target quickly. It must do so without falling over, exceeding torque limits, or injuring someone nearby.
An energy controller should reduce electricity use, but not by letting equipment overheat.
Constraints make optimization honest.
They reduce the chance that a system finds weird loopholes. There is another complication, too. Uncertainty.
Sometimes the world is smooth enough that the best move is almost deterministic.
But often, the agent does not know enough.
The environment may be noisy, partly hidden, or changing.
In those cases, good decision-making includes exploration, caution, and hedging.
Sometimes randomness is a tool for gathering information. A value function helps here, too. It turns uncertainty into something actionable. Instead of asking vaguely, "What should I try?" the AI agent can ask, "Which action best improves my expected future, given what I know and what I don't?"
That is a more disciplined kind of intelligence. There's also a very practical reason all of this matters.
Energy.
Intelligence is not free.
Training AI models consumes electricity.
Planning, searching, and repeatedly recomputing futures all have physical costs.
If better representations of value let systems make decisions with less wasted computation, then the benefit shows up in speed, cost, and energy efficiency. So, what we mean by physics-inspired AI is that some of the same principles used to describe efficient motion may help us build more efficient AI agents.
This hints at a future way of seeing AI altogether.
Instead of treating recognition, planning, control, and resource management as separate tasks stitched together awkwardly, we can imagine a more unified picture: an agent as something that continuously chooses actions to minimize long-term cost under constraints.
A home robot, for example, does not just identify a cup. It must reach it safely, avoid bumping into furniture, preserve battery life, account for uncertainty, and adapt if the cup moves. And that brings us to the deepest conceptual landing. The same mathematics can describe wildly different substrates. A planet is not a robot, nor is a neural network a marble rolling down a track.
But all can sometimes be understood through the same pattern. Define the possible states, what changes them, and the costs and constraints, and then describe behavior as movement through that landscape. And AI decision-making may be less mysterious when seen through the lens of physics. So, here is the final click. Least action turns motion into optimization.
Hamilton-Jacobi turns optimization into a map.
And that map is a landscape of future cost with slopes that point toward better outcomes.
It is one of the clearest blueprints we have for understanding intelligent AI agents.
And that wraps it up for today. Please don't forget to cast your vote on what Physics NL premium service I can build that you will find most valuable. The direct link is in the pinned comment and the description. Thanks a lot, and I'll see you in the next video.