Install our extension to search inside any video instantly.

Solving Large MDPs Exactly: A Tensor-Based MDP Solver
Added: 2026-05-07

151 views127:04sudhirpratapsingh8349Original Release: 2026-05-04

This solver elegantly bypasses the curse of dimensionality by using tensor algebra to achieve massive efficiency gains without sacrificing precision. It is a sophisticated yet highly practical breakthrough for managing complex, large-scale infrastructure systems.

[00:00:01]Large language models like Gemini, Chad, GPT, self-driving cars, Netflix, Mars, rovers, all of these at their core rely on a framework called mark of decision process or MDP. It is one of the most fundamental tools in sequential decision making under uncertainty.

[00:00:21]An agent observes the state of the system, takes an action, pays a cost or earns a reward, and the system transitions to a new state. And the goal is to find the policy or the rule that tells you what to do in every possible situation and gives you the best outcome over time. Hi, I'm Sudhir, PhD candidate at Pennsylvania State University. My research focuses on life cycle optimization, stochastic optimal control of infrastructure systems like buildings, bridges, transportation networks and MDPs are the natural framework for the control problem. But as we scale these model to real world scenario, we hit a massive mathematical wall which is the curse of dimensionality.

[00:01:11]Now to understand this curse of dimensionality let's take a simple example where we have one component like a building or a bridge and we represent its condition state using four discrete states from perfect to severely deteriorated and we have four possible maintenance actions from do nothing to replace and for this one component it's easy the transition matrix is a 4x4 uh total 16 entries and this is totally manageable.

[00:01:42]So for a single component the state space is 4 to1 that is four states for two component it grows to 4² that is 16 states and similarly for three component it is 4 cube that is 64 and so on four raised to power 4 256 states. So for a 10 component system the system state grows up to 1 million states. So the transition matrix in this particular case is 1 million by 1 million and this is over a trillion entries in the matrix and we can't even store this matrix in memory let alone multiply and solve the problem and this is just 10 components. a real infrastructure system is even bigger. That's what we call the curse of dimensionality where the state space grows exponentially and the standard MDP solvers they break down.

[00:02:46]So here's the question is there a way to solve these large MDPs exactly not approximately?

[00:02:54]It turns out yes we have addressed this problem in our latest research where we introduce a novel tensorbased MTP solver. To understand this well let's see mathematically what the bottleneck is and how our solver avoids it.

[00:03:10]So in a typical Bellman optimality update, this expectation step here is computationally challenging and this is because the state space as well as the action space grows exponentially with the number of components in matrix form. This can be represented by the product of PA * V. PA is the transition probability matrix and V is the value function vector.

[00:03:40]The key insight here is that in many infrastructure systems, the physical evolution of one of the components is conditionally independent of the other components. And this assumption is not overly restrictive as this can be maintained on common hyperparameters within a dynamic basian network.

[00:04:02]Then we can utilize the chronical product representation for the transition matrix. This representation avoids the explicit construction and storage of this enormous system transition matrix. Instead only the smaller component level matrices are needed. So with this instead of storing one big matrix with trillion entries, we now store 10 smaller matrices with 16 entries each. That's it. So instead of 10 to 12 entries, now you have 10 * 16 160 entries in total. And this is only half of the work. We still need to compute this massive matrix vector product.

[00:04:41]And here's a trick. We use the mathematical tensor algebra, specifically the mixed product property.

[00:04:49]So in the first step, we reshape the value function vector into a multi-dimensional tensor.

[00:04:56]In the second step, we perform these sequential modewise multiplication along each dimension of the tensor with each component.

[00:05:07]And each of these multiplication only involves a 4x4 matrix. So 10 smaller operations instead of a one impossible one.

[00:05:17]And vectorzing this final tensor gives you the exact same result as the original product. no approximation whatsoever.

[00:05:26]So how much does this actually help?

[00:05:30]Let's look at the numbers. For a system with 10 components, the memory required by the standard solver is 8.8 tab. Our solver does it in 8.4 mgabytes. That's a million times reduction. And the computational cost, our method is already 10,000 times faster at 10 components. Scale it up to 15 components and the gap becomes truly staggering. A billion times reduction in memory and 10 million times reduction in operations.

[00:06:00]And this gap keeps widening with each component you add. Now to be fair, our method does not eliminate the curse of dimensionality. No exact method can. And there are other powerful techniques like approximate dynamic programming, deep reinforcement learning which tackle this problem from a different angle by trading accuracy for scalability.

[00:06:22]Our framework supports those as well.

[00:06:24]But when your problem demands the exact optimal policy, our solver pushes that boundary significantly further than what was previously possible.

[00:06:34]The preprint of the paper is available on archive and the solver code both in MATLAB and Python is open source on GitHub. Links are given in the post below.

[00:06:47]So if you're working on infrastructure management, stochastic control, reliability or reinforcement learning, I would love to hear your thoughts. And if you've been stuck on a problem where the state space is too large for standard solvers, give this a try. Thank you.

Related Videos

Artificial Intelligence

OpenHuman VS Hermes AI: Who Wins?

JulianGoldieSEO

285 views•2026-05-29

Artificial Intelligence

Long-Running Agents — Build an Agent That Never Forgets with Google ADK

suryakunju

142 views•2026-05-30

Artificial Intelligence

This computer is made from real human brain cells. And you can buy it.

Talktmsmedia

3K views•2026-05-28

Artificial Intelligence

BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2

aimmediahouse

122 views•2026-06-03

Artificial Intelligence

I Made the Same Anime Fight Scene in Every AI Video Generator

NobleGooseAnime

295 views•2026-05-30

Artificial Intelligence

Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S

cnnnews18

3K views•2026-06-01

Artificial Intelligence

I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)

AICodingDaily

298 views•2026-05-29

Artificial Intelligence

3D Platformer Update - NO CAPES

SolarLune

294 views•2026-05-30

Trending

Computer Science

The Meta AI Hack Is a DISASTER

LowLevelTV

141K views•2026-06-03

Paris is in SHAMBLES right now 😭

H1T1

4053K views•2026-05-31

The Casino Had Us Guessing All Day

VegasMatt

157K views•2026-06-03

The Dancing Plague...

HoodieGuyStories

1730K views•2026-05-30