ELBO - Why Maximizing One Bound Solves Two Problems at Once
Added: 2026-05-27

146 views · 244:14 · datamlistic · Original Release: 2026-05-27

The Evidence Lower Bound (ELBO) is a computable lower bound on the log evidence, log p(x). The log evidence splits exactly into two components: the ELBO and the KL divergence between the surrogate distribution q(z) and the true posterior p(z | x). The ELBO itself can be rewritten as an expected log-likelihood term minus the KL divergence between q(z) and the prior p(z). Since KL divergence is always non-negative, maximizing the ELBO simultaneously tightens the lower bound on log p(x) and drives q(z) toward the true posterior, solving both the model fitting and the inference approximation problems with a single optimization objective.

[00:00:00]There is one quantity that nearly every probabilistic model wants, the log of P of X, the probability the model assigns to the data we actually observed. It is also called the log evidence. The trouble is, in any interesting model, the data X lives next to a latent variable Z that we do not get to see. To get the probability of X alone, we have to integrate Z out, and that integral runs over a space that can have hundreds or thousands of dimensions. There is no chance we can just sit down and compute it. So, instead of fighting the integral head on, we take a detour.
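Written out, the quantity described above looks as follows. The notation is an assumption, since the video only names the symbols aloud: p(x, z) is the joint density of data and latent, p(z) the prior, and p(x | z) the likelihood.

```latex
\log p(x) \;=\; \log \int p(x, z)\, dz \;=\; \log \int p(x \mid z)\, p(z)\, dz
```

The integral runs over every possible value of z, which is what makes it intractable when z is high-dimensional.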

[00:00:41]Buried inside that integral is the true posterior, P of Z given X. It tells us which latents could plausibly have produced our data point. And it is just as hopeless as the evidence itself, often shaped in a strange bumpy way that nothing in our toolbox can match. So, here is the trick.

[00:01:02]We pick a much friendlier distribution from a family we do understand, like a Gaussian, and call it Q of Z.

[00:01:10]We give it some parameters, phi, that we are free to tune. The plan is simple.

[00:01:15]Let Q play the role of the true posterior and ask how good a stand-in we can make it.

[00:01:23]Now, watch what happens when we write the evidence in terms of Q. After a small bit of algebra, the log evidence splits exactly into two pieces. The first piece is called the ELBO, the evidence lower bound. The second piece is the KL divergence between our chosen Q and the true posterior. So, log P of X is the ELBO plus the KL gap. The total height of the bar is fixed by the data.

[00:01:50]The KL sits on top, the ELBO sits underneath, and together they make up the whole.
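The "small bit of algebra" is not spelled out in the video. One standard way to get the split, sketched here under the usual notation (q(z) for the surrogate, p(z | x) for the true posterior), uses the fact that log p(x) does not depend on z and that p(x) = p(x, z) / p(z | x):

```latex
\begin{aligned}
\log p(x)
  &= \mathbb{E}_{q(z)}\big[\log p(x)\big] \\
  &= \mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{p(z \mid x)}\right] \\
  &= \mathbb{E}_{q(z)}\!\left[\log \frac{p(x, z)}{q(z)}\right]
   + \mathbb{E}_{q(z)}\!\left[\log \frac{q(z)}{p(z \mid x)}\right] \\
  &= \mathrm{ELBO}(q) + \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big)
\end{aligned}
```

The third line multiplies and divides by q(z) inside the logarithm. The first expectation is the ELBO (the evidence lower bound) and the second is the KL gap.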

[00:01:58]And here is the part that makes everything click. KL divergence is never negative. It is zero only when Q is exactly the true posterior and positive in every other case.

[00:02:11]That single fact has a powerful consequence. The ELBO is always less than or equal to log P of X. We have a real computable lower bound on the thing we could not compute. Now, look at what happens when we push Q closer to the true posterior. The KL gap shrinks, the ELBO rises, and the bound becomes tighter. Maximizing the ELBO over our choice of Q is exactly the same as squeezing the KL gap to zero.
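The bound and the shrinking gap can be checked numerically on a model small enough that the evidence is computable by hand. The sketch below is an illustration only: the three-state latent, the probability tables, and the names `q_rough` and `q_closer` are made up for this example and do not come from the video.

```python
import numpy as np

# Hypothetical toy model: z takes 3 values, x is one fixed observation.
prior = np.array([0.5, 0.3, 0.2])          # p(z)
likelihood = np.array([0.10, 0.40, 0.70])  # p(x | z) for the observed x

joint = prior * likelihood                 # p(x, z)
evidence = joint.sum()                     # p(x), a 3-term sum here
posterior = joint / evidence               # p(z | x)
log_evidence = float(np.log(evidence))


def elbo(q):
    """ELBO(q) = E_q[log p(x, z) - log q(z)] for a discrete q."""
    return float(np.sum(q * (np.log(joint) - np.log(q))))


def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return float(np.sum(q * (np.log(q) - np.log(p))))


# An arbitrary surrogate, and one moved halfway toward the true posterior.
q_rough = np.array([0.6, 0.3, 0.1])
q_closer = 0.5 * q_rough + 0.5 * posterior

for name, q in [("rough", q_rough), ("closer", q_closer), ("exact", posterior)]:
    print(f"{name:>6}: ELBO={elbo(q):.4f}  KL gap={kl(q, posterior):.4f}  "
          f"sum={elbo(q) + kl(q, posterior):.4f}  log p(x)={log_evidence:.4f}")
```

In every row the ELBO and the KL gap add up to the same log p(x), and the ELBO rises as the surrogate moves toward the posterior.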

[00:02:43]The ELBO has a second face that is even more useful. After one more rearrangement, it becomes the expectation under Q of log P of X given Z minus the KL divergence between Q of Z and the prior P of Z.
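The rearrangement mentioned here can be sketched by factoring the joint as p(x, z) = p(x | z) p(z) inside the ELBO (the evidence lower bound). The notation is the usual one and is assumed rather than taken from the video:

```latex
\begin{aligned}
\mathrm{ELBO}(q)
  &= \mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big] \\
  &= \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]
   + \mathbb{E}_{q(z)}\big[\log p(z) - \log q(z)\big] \\
  &= \mathbb{E}_{q(z)}\big[\log p(x \mid z)\big]
   - \mathrm{KL}\big(q(z) \,\|\, p(z)\big)
\end{aligned}
```

Note that this KL term is measured against the prior p(z), not against the true posterior, so every piece of it can be evaluated.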

[00:03:00]You can read the two terms as a contract. The first term wants Z to be a good explanation of X.

[00:03:06]Sample a latent from Q, run it through the decoder, and the data should come out likely.

[00:03:14]The second term keeps Q honest by pulling it toward the prior, so it cannot just memorize the data. This is exactly the loss that trains a variational autoencoder. It is also the bound that the EM algorithm climbs every iteration. Same equation, very different machines.
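To make the two-term reading concrete, here is a minimal sketch on a one-dimensional Gaussian model where the exact answer is known. Everything in it is an assumption chosen for illustration, not the video's setup: the prior is N(0, 1), the "decoder" is the identity so that x given z is N(z, 1), and the function names are hypothetical. The first term is estimated by sampling from Q, as described above, and the second uses the closed-form KL between two Gaussians.

```python
import numpy as np

# Hypothetical toy model (not from the video):
#   prior       z ~ N(0, 1)
#   likelihood  x | z ~ N(z, 1)   (an identity "decoder")
#   surrogate   q(z) = N(m, s2)
LOG_2PI = np.log(2.0 * np.pi)


def expected_log_likelihood(x, m, s2, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E_q[log p(x | z)]."""
    rng = np.random.default_rng(seed)
    # Sample z from q by shifting and scaling standard normal noise.
    z = m + np.sqrt(s2) * rng.standard_normal(n_samples)
    log_lik = -0.5 * LOG_2PI - 0.5 * (x - z) ** 2
    return float(log_lik.mean())


def kl_to_prior(m, s2):
    """Closed-form KL( N(m, s2) || N(0, 1) )."""
    return float(0.5 * (s2 + m ** 2 - 1.0 - np.log(s2)))


def elbo(x, m, s2):
    """Reconstruction term minus the pull toward the prior."""
    return expected_log_likelihood(x, m, s2) - kl_to_prior(m, s2)


def log_evidence(x):
    """Exact log p(x): in this model the marginal is x ~ N(0, 2)."""
    return float(-0.5 * np.log(4.0 * np.pi) - x ** 2 / 4.0)


x_obs = 1.2
# q equal to the prior: zero KL penalty, but a poor explanation of x.
elbo_at_prior = elbo(x_obs, m=0.0, s2=1.0)
# q equal to the exact posterior of this model, N(x / 2, 1 / 2).
elbo_at_posterior = elbo(x_obs, m=x_obs / 2.0, s2=0.5)

print(f"ELBO with q = prior:     {elbo_at_prior:.3f}")
print(f"ELBO with q = posterior: {elbo_at_posterior:.3f}")
print(f"exact log p(x):          {log_evidence(x_obs):.3f}")
```

Setting Q to the prior pays no KL penalty but explains the observation poorly. Setting Q to the exact posterior brings the estimate up to log p(x), within Monte Carlo error. A variational autoencoder replaces the identity decoder with a neural network and has an encoder network output the parameters of Q, but the two terms keep the same form.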

[00:03:35]So, when we maximize the ELBO, we are not solving one problem, we are solving two at once. Tightening the bound pushes log P of X up, fitting the model to the data. Tightening the bound also drags Q closer to the true posterior, learning a usable approximation to inference.
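The "two problems at once" claim can be watched in action with an EM-style loop that alternates between improving Q and improving the model, each step raising the same bound. This is a sketch under made-up assumptions, not the video's example: the model is z ~ N(0, 1) and x given z ~ N(z + b, 1) with a single unknown offset `b`, the data are synthetic, and all names are hypothetical.

```python
import numpy as np

# Synthetic data for a hypothetical model:
#   z ~ N(0, 1),  x | z ~ N(z + b, 1),  so marginally x ~ N(b, 2).
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=np.sqrt(2.0), size=500)


def total_elbo(b, m, s2):
    """Sum over data points of E_q[log p(x_i | z)] - KL(q_i || prior),
    with q_i(z) = N(m[i], s2), both terms in closed form."""
    recon = -0.5 * np.log(2.0 * np.pi) - 0.5 * ((x - b - m) ** 2 + s2)
    kl_to_prior = 0.5 * (s2 + m ** 2 - 1.0 - np.log(s2))
    return float(np.sum(recon - kl_to_prior))


def total_log_evidence(b):
    """Exact sum of log p(x_i) under the marginal N(b, 2)."""
    return float(np.sum(-0.5 * np.log(4.0 * np.pi) - (x - b) ** 2 / 4.0))


b = 0.0          # model parameter, deliberately started far from the data
history = []     # ELBO after each full iteration
for _ in range(60):
    # E-step: for fixed b the best q_i is the exact posterior N((x_i - b)/2, 1/2).
    m = (x - b) / 2.0
    s2 = 0.5
    # M-step: for fixed q the ELBO is quadratic in b, maximized at this mean.
    b = float(np.mean(x - m))
    history.append(total_elbo(b, m, s2))

print(f"fitted b = {b:.4f}, sample mean = {x.mean():.4f}")
print(f"final ELBO = {history[-1]:.4f}, log evidence = {total_log_evidence(b):.4f}")
```

The offset `b` converges to the sample mean, which is the maximum-likelihood fit for this model, while each `q_i` ends up at the true posterior and the bound becomes tight. One objective, two results.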

[00:03:55]Two birds, one stone, and the stone is just an integral we never had to compute. And that is basically it.

[00:04:03]If you found this helpful, hit that like button, subscribe for more, and drop a comment if there is a topic you want to see next. See you in the next one.

[00:04:12]Bye-bye.

#bayesian inference #deep learning #elbo #em algorithm #evidence lower bound
