The Evidence Lower Bound (ELBO) is a computable lower bound on the log evidence, log p(x). The log evidence splits exactly into two components: the ELBO itself and the KL divergence between the surrogate distribution q(z) and the true posterior. Since KL divergence is always non-negative, the ELBO can never exceed log p(x). Maximizing the ELBO over q drives q(z) toward the true posterior and tightens the bound, while maximizing it over the model's parameters raises the bound on log p(x), so a single optimization objective handles both model fitting and approximate inference.
Deep Dive
ELBO - Why Maximizing One Bound Solves Two Problems at Once
There is one quantity that nearly every probabilistic model wants, the log of P of X, the probability the model assigns to the data we actually observed. It is also called the log evidence. The trouble is, in any interesting model, the data X lives next to a latent variable Z that we do not get to see. To get the probability of X alone, we have to integrate Z out, and that integral runs over a space that can have hundreds or thousands of dimensions. There is no chance we can just sit down and compute it. So, instead of fighting the integral head on, we take a detour.
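Written out, the integral described above is the marginalization of the joint distribution over the latent variable. The split of the joint into a prior p(z) and a likelihood p(x | z) is the usual latent-variable setup and is assumed here for illustration:

```latex
\log p(x) \;=\; \log \int p(x, z)\, dz \;=\; \log \int p(x \mid z)\, p(z)\, dz
```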
Buried inside that integral is the true posterior, P of Z given X. It tells us which latents could plausibly have produced our data point. And it is just as hopeless as the evidence itself, often shaped in a strange bumpy way that nothing in our toolbox can match. So, here is the trick.
We pick a much friendlier distribution from a family we do understand, like a Gaussian, and call it Q of Z.
We give it some parameters, phi, that we are free to tune. The plan is simple.
Let Q play the role of the true posterior and ask how good a stand-in we can make it.
Now, watch what happens when we write the evidence in terms of Q. After a small bit of algebra, the log evidence splits exactly into two pieces. The first piece is called the ELBO, the evidence lower bound. The second piece is the KL divergence between our chosen Q and the true posterior. So, log P of X is the ELBO plus the KL gap. The total height of the bar is fixed by the data and the model.
The KL sits on top, the ELBO sits underneath, and together they make up the whole.
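The "small bit of algebra" can be sketched as follows. The notation is an assumption chosen for illustration: q_phi(z) is the tunable distribution Q with parameters phi, and the only facts used are that log p(x) does not depend on z and that p(z | x) = p(x, z) / p(x).

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q_\phi(z)}\big[\log p(x)\big] \\
  &= \mathbb{E}_{q_\phi(z)}\left[\log \frac{p(x, z)}{p(z \mid x)}\right] \\
  &= \mathbb{E}_{q_\phi(z)}\left[\log \frac{p(x, z)}{q_\phi(z)}\right]
   + \mathbb{E}_{q_\phi(z)}\left[\log \frac{q_\phi(z)}{p(z \mid x)}\right] \\
  &= \underbrace{\mathrm{ELBO}(\phi)}_{\text{computable}}
   + \underbrace{\mathrm{KL}\big(q_\phi(z)\,\|\,p(z \mid x)\big)}_{\geq\, 0}
\end{aligned}
```

The third line simply multiplies and divides by q_phi(z) inside the logarithm, which is what separates the computable piece from the piece that still contains the intractable posterior.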
And here is the part that makes everything click. KL divergence is never negative. It is zero only when Q is exactly the true posterior and positive in every other case.
That single fact has a powerful consequence. The ELBO is always less than or equal to log P of X. We have a real computable lower bound on the thing we could not compute. Now, look at what happens when we push Q closer to the true posterior. The KL gap shrinks, the ELBO rises, and the bound becomes tighter. Because log P of X does not depend on Q, maximizing the ELBO over our choice of Q is exactly the same as minimizing the KL gap, and the gap reaches zero only if the true posterior lies inside the family we picked.
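The decomposition and the bound can be checked numerically on a model small enough that nothing is intractable. The sketch below is a made-up toy, not anything from the talk: a latent with three possible values, one fixed observation, and hypothetical probabilities picked only for illustration.

```python
import numpy as np

# Hypothetical toy model: z takes 3 values, and x is one fixed observation.
prior = np.array([0.5, 0.3, 0.2])     # p(z)
lik = np.array([0.10, 0.60, 0.30])    # p(x | z) evaluated at the observed x

joint = prior * lik                   # p(x, z)
log_px = np.log(joint.sum())          # log evidence, tractable only because z is tiny
posterior = joint / joint.sum()       # true posterior p(z | x)


def elbo(q):
    """ELBO = E_q[log p(x, z) - log q(z)] for a discrete q."""
    return float(np.sum(q * (np.log(joint) - np.log(q))))


def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


candidates = {
    "uniform": np.array([1 / 3, 1 / 3, 1 / 3]),
    "prior": prior,
    "posterior": posterior,
}

for name, q in candidates.items():
    gap = kl(q, posterior)
    print(f"{name:10s} ELBO={elbo(q):.4f}  KL gap={gap:.4f}  sum={elbo(q) + gap:.4f}")
print(f"log p(x) = {log_px:.4f}")
```

Every row should show the ELBO and the KL gap adding up to the same log p(x), with the gap vanishing only for the row where Q is the true posterior.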
The ELBO has a second face that is even more useful. After one more rearrangement, it becomes the expectation under Q of log P of X given Z minus the KL divergence between Q of Z and the prior P of Z.
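That rearrangement can be sketched in one line of algebra, assuming the joint factorizes as p(x, z) = p(x | z) p(z) and writing q_phi(z) for Q (notation assumed for illustration):

```latex
\begin{aligned}
\mathrm{ELBO}(\phi)
  &= \mathbb{E}_{q_\phi(z)}\big[\log p(x, z) - \log q_\phi(z)\big] \\
  &= \mathbb{E}_{q_\phi(z)}\big[\log p(x \mid z)\big]
   + \mathbb{E}_{q_\phi(z)}\left[\log \frac{p(z)}{q_\phi(z)}\right] \\
  &= \mathbb{E}_{q_\phi(z)}\big[\log p(x \mid z)\big]
   - \mathrm{KL}\big(q_\phi(z)\,\|\,p(z)\big)
\end{aligned}
```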
You can read the two terms as a contract. The first term wants Z to be a good explanation of X.
Sample a latent from Q, run it through the decoder, and the data should come out likely.
The second term keeps Q honest by pulling it toward the prior, so it cannot just memorize the data. With the sign flipped, this is exactly the loss that trains a variational autoencoder. It is also the bound that the EM algorithm climbs every iteration. Same equation, very different machines.
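The two-term form can be tried out on a model where everything has a closed form. The sketch below assumes a one-dimensional toy that is not from the talk: prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), and Q = N(m, s2). Under those assumptions the evidence is N(x; 0, 2) and the true posterior is N(x/2, 1/2), so the bound can be compared against the exact answer. The function names are hypothetical.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def kl_to_prior(m, s2):
    """KL( N(m, s2) || N(0, 1) ) in closed form."""
    return 0.5 * (m ** 2 + s2 - 1.0 - np.log(s2))


def elbo_closed_form(x, m, s2):
    """E_q[log p(x | z)] - KL(q || prior), using E_q[(x - z)^2] = (x - m)^2 + s2."""
    expected_loglik = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + s2)
    return expected_loglik - kl_to_prior(m, s2)


def elbo_monte_carlo(x, m, s2, n_samples=200_000, seed=0):
    """Same quantity, but the first term is estimated by sampling z from Q."""
    rng = np.random.default_rng(seed)
    z = m + np.sqrt(s2) * rng.standard_normal(n_samples)  # z ~ N(m, s2)
    loglik = -0.5 * LOG_2PI - 0.5 * (x - z) ** 2           # log p(x | z)
    return float(loglik.mean() - kl_to_prior(m, s2))


def log_evidence(x):
    """Exact log p(x) for this toy model, where x ~ N(0, 2)."""
    return -0.5 * np.log(4.0 * np.pi) - x ** 2 / 4.0


x = 1.5
print("Q = prior:          ", elbo_closed_form(x, 0.0, 1.0))
print("Q = true posterior: ", elbo_closed_form(x, x / 2, 0.5))
print("Monte Carlo, prior: ", elbo_monte_carlo(x, 0.0, 1.0))
print("log p(x):           ", log_evidence(x))
```

Setting Q to the prior makes the KL term zero but leaves the bound loose, while setting Q to the true posterior pays a KL cost and still comes out ahead, matching log p(x). The Monte Carlo version mirrors the "sample a latent from Q, score the data" reading of the first term.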
So, when we maximize the ELBO, we are not solving one problem, we are solving two at once. Raising the bound through the model's parameters pushes log P of X up, fitting the model to the data. Raising the bound through Q closes the KL gap and drags Q closer to the true posterior, learning a usable approximation to inference.
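The two roles can be seen side by side in EM, which alternates them explicitly. The sketch below is an illustrative setup, not something from the talk: a two-component Gaussian mixture with unit variances, synthetic data, and hypothetical starting values. The E-step sets Q to the exact posterior over component labels, which closes the KL gap so the ELBO equals the log-likelihood, and the M-step raises the bound by updating means and weights.

```python
import numpy as np


def e_step(x, means, weights, sigma=1.0):
    """Return the total log-likelihood and the exact posterior over components."""
    # log p(x_n, z_n = k) = log pi_k + log N(x_n | mu_k, sigma^2)
    log_joint = (
        np.log(weights)[None, :]
        - 0.5 * np.log(2.0 * np.pi * sigma ** 2)
        - 0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2
    )
    log_px = np.logaddexp.reduce(log_joint, axis=1)  # log p(x_n), z summed out
    resp = np.exp(log_joint - log_px[:, None])       # q(z_n) = p(z_n | x_n)
    return float(log_px.sum()), resp


def m_step(x, resp):
    """Maximize the bound over means and mixing weights with Q held fixed."""
    nk = resp.sum(axis=0)
    means = (resp * x[:, None]).sum(axis=0) / nk
    weights = nk / len(x)
    return means, weights


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(3.0, 1.0, 100)])

means = np.array([-0.5, 0.5])    # hypothetical starting guess
weights = np.array([0.5, 0.5])
history = []
for _ in range(20):
    log_lik, resp = e_step(x, means, weights)  # inference: gap closed, bound is tight
    history.append(log_lik)
    means, weights = m_step(x, resp)           # fitting: bound pushed up

print("log-likelihood per iteration:", np.round(history, 3))
print("fitted means:", np.round(means, 3))
```

Because the bound is tight after each E-step, the recorded log-likelihood should never decrease from one iteration to the next.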
Two birds, one stone, and the stone is just an integral we never had to compute. And that is basically it.