The Law of Total Expectation states that the unconditional expectation of a random variable equals the sum of its conditional expectations weighted by the probabilities of the conditioning events, expressed as E[X] = Σ_i E[X|A_i] × P(A_i), where the events A_i partition the sample space. This principle allows decomposition of complex expectations into simpler conditional components, which is particularly valuable in quantitative trading for analyzing strategy performance by partitioning trades into regimes (e.g., winning/losing trades, volatility regimes) to identify sources of edge and assess stability over time.
Deep Dive
Modeling with the Law of Total Expectation
My life is forever changed by the kindness, support, and strength that you guys at Quant Guild have given me over the last 48-72 hours. And while I can't see Java this second, and he remains in stable condition, I am going to continue to be there for you guys and help you master your quantitative skills, which is my mission.
And I'm going to do so today by talking about the law of total expectation.
I hope to have more information later today about Java, how he's doing, and when he's going to be able to come back home. And I will keep you guys updated as soon as I know more.
So, let's start with this idea of the law of total expectation.
This is one of my favorite ideas. I actually first really understood it in a class on stochastic processes.
And it really changed the way that I think about statistics and how we take outcomes and scale them by probabilities in different regions of a sample space.
So, let me just draw for you the diagram that we typically work with.
And that's this idea of just a sample space here.
And we usually call this omega.
Big omega.
And this is just pretty much everything that can happen. If we sum the probability over this entire space, we're going to get one, or 100%.
And within this sample space, right, whether we're looking at a dice roll or coin flips or heights or whatever, if it's discrete or continuous, it doesn't really matter, we can always introduce the idea of a partition.
The idea of a partition is that we can perfectly separate this sample space into a set of categories, so that every outcome falls into exactly one of them.
So, for example, if I'm using gender, right, we can separate this into male and female.
All right, and please don't get mad at me right now for this.
But the two genders here can perfectly partition the sample space, and we're then able to use this law of total expectation to our advantage if we know something perfectly partitions the sample space.
How might we do that? Well, we have this idea of an expectation and by definition the expectation is the expected value of, let's say, random variable X and we're going to call random variable X maybe the height.
Okay, of people in a particular population. This could be the world.
This could be a school. This could be a workplace. It doesn't really matter.
This is going to be equal to, well, we have this idea of the discrete and the continuous expected value by definition. So, in the discrete sense, it's going to be the sum over all outcomes in a particular set.
We'll call each outcome a, and we take a times the probability that a occurs. So, we're effectively normalizing all of the quantitative values. Well, not really normalizing, I guess you could think of it that way, but we're scaling all of the values by their corresponding probabilities. So, if we have, let's say, a height of 5'1" with a probability of 0.16, then we're going to multiply these two together, and we're going to do that for every single height, and then add them all up. And that's what's going to give us the expected value.
Now, of course, that doesn't work too well here, because height is really more of a continuous random variable. For something discrete, we typically think of a coin flip or a dice roll.
But, if we're dealing with a continuous random variable, we're really dealing with a probability density function. So, this is going to be equal to, instead, the integral, which is just a continuous summation anyway, of x times the probability density function of random variable X, parameterized by x, dx.
Okay? And we're just going to integrate across the entire support, chi. There are a lot of X's there, and they're not all equivalent. Lowercase x is the variable of integration that parameterizes the density in the integrand. Capital X is the random variable, in this case the height. And chi is the support of the probability density function.
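Written out, the two definitions just described look like the following. The notation is an editorial choice rather than something taken from the screen: lowercase a is a discrete outcome, f_X is the probability density function of X, and \mathcal{X} stands for the support that the talk calls chi.

```latex
% Discrete case: sum each outcome times its probability
\mathbb{E}[X] = \sum_{a} a \, P(X = a)

% Continuous case: integrate x against the density over the support
\mathbb{E}[X] = \int_{\mathcal{X}} x \, f_X(x) \, dx
```

For the height example, the outcome 5'1" (61 inches) with probability 0.16 contributes 61 × 0.16 = 9.76 inches to the discrete sum.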
So, this is by definition the expectation. But, it turns out that a lot of the times it's easier to compute the expectation conditionally than it is to just compute it unconditionally. So, this is an unconditional expectation.
If I have just the expected value of height, so X, remember, is going to be height, and I say that it's, call it, 5'5", we don't know if that's the average height of males or females. We have no idea, right? It's just a compression of everything in this sample space. You can think of each point as a different individual in this population, and they're going to live in their corresponding partition, and we're effectively just compressing all that information down into this one value here.
Okay.
So, how can we use the law of total expectation to our advantage here? What exactly is it anyway? Well, we start from the expected value by definition here.
If we use a couple of our axioms from probability and statistics, it turns out that this unconditional expected value can be built up from the expected value of height given each piece of a particular partition.
So, in this case we have a partition of male and female.
So, I'm going to put male here.
And then what I'm going to do is I'm going to take this conditional expectation. This isn't all that difficult to interpret. It's just the average height given that you're looking at males, right? So, instead of looking at this entire population, we're just going to look at males.
Okay.
So, that's one component of this unconditional expectation. It's effectively a decomposition. So, what are we going to do then? Well, we're going to scale it by the probability of observing a male.
And I've always looked at it like, here's the thing that you want in an unconditional way.
And then here's the condition.
We got to find the expected value, then we have to find the corresponding probability for the condition, and this is how we're just gluing together pieces of the overall unconditional expectation.
So now, what do we need to add to get the rest of the space? Well, male and female partition the entire space, right? Together they make up 100% of the sample space.
So we're just going to need to glue the female piece on now. So it's going to be the expected value of X given female.
And then times the probability of being a female.
And this sum is actually going to equal the unconditional expectation, if we find each of these components individually. Remember how we found the original 5'5"? Well, if we glue all these pieces together, we're going to get that same 5'5".
All right?
That's this idea of the law of total expectation.
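The gluing described above can be checked numerically. The following is a minimal sketch on a made-up population: the group sizes, means, and spreads are hypothetical and only serve to show that the weighted conditional averages reproduce the overall average.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of (group, height in inches). All numbers are made up.
population = (
    [("male", random.gauss(69.0, 3.0)) for _ in range(480)]
    + [("female", random.gauss(64.0, 2.5)) for _ in range(520)]
)


def mean(values):
    """Plain arithmetic average."""
    return sum(values) / len(values)


# Unconditional expectation: average over the whole sample space.
overall = mean([h for _, h in population])

# Conditional expectation of each cell, scaled by that cell's probability.
total = 0.0
for group in ("male", "female"):
    heights = [h for g, h in population if g == group]
    p_group = len(heights) / len(population)  # P(group)
    cond_mean = mean(heights)                 # E[X | group]
    total += cond_mean * p_group              # glue the pieces together
    print(f"E[X|{group}] = {cond_mean:.2f}, P({group}) = {p_group:.2f}")

print(f"E[X] computed directly:      {overall:.4f}")
print(f"E[X] computed via partition: {total:.4f}")
```

Within a single sample the two numbers agree up to floating-point rounding, because the decomposition is an algebraic identity and not an approximation.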
Now, in a frequentist sense, this is a very important idea here.
If we continue to sample from the same population distribution, the same data generating distribution, these are all going to converge.
Right? Because this is a compression of probability and outcome, and in a discrete or continuous sense it's the same idea. We're just conditioning on a subset of the sample space here, which is effectively giving us a new omega. Right? Same thing with female.
It's just effectively giving us a new omega. We're just normalizing then by the quantity of males and the quantity of females in those partitions, right?
They're subsets of the sample space. But in the frequentist sense, these are all going to converge to their true values.
So what do I mean by that? Well, if you have a small sample, right? We can think of it like this.
If I have a small sample, then I can track my estimate as I continue to take more samples. Let's take a look at the expected value of height given male. I'm going to plot it against the quantity of samples that I have. So, on the Y axis here is the estimated expected value of height given male, and on the X axis here, we have the number of males that I've sampled in this particular population omega.
You're going to find that as you continue to take samples, the cumulative average is going to eventually converge to the true theoretical expectation, and that is by the law of large numbers. Asymptotically, in a frequentist sense, the law of large numbers says that sample averages converge to their expectations under the data generating distribution, and the same idea carries over to estimated probabilities and distributions, right?
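The plot described above can be imitated with a short simulation. This is a sketch under a classroom assumption: every draw comes from one fixed normal distribution, and the mean of 69 inches and standard deviation of 3 inches are made-up values.

```python
import random

random.seed(1)

TRUE_MEAN = 69.0  # hypothetical E[X | male] in inches


def running_means(samples):
    """Cumulative average after each new observation."""
    out, running_sum = [], 0.0
    for n, x in enumerate(samples, start=1):
        running_sum += x
        out.append(running_sum / n)
    return out


# Fixed data generating distribution: the setting where convergence holds.
samples = [random.gauss(TRUE_MEAN, 3.0) for _ in range(20_000)]
means = running_means(samples)

for n in (10, 100, 1_000, 20_000):
    print(f"n = {n:>6}: running mean = {means[n - 1]:.3f}")
```

Early values of the running mean wander, and later values settle near `TRUE_MEAN`, which is the behavior the talk attributes to the law of large numbers.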
Now, what's the problem here?
In practice, right? This is kind of true, and it's useful for the most part for the statistics and probabilities that we need in practice.
However, the problem is over time, this theoretical expectation changes.
And that's because the data generating distribution changes. In kind of a broader sense, there really is no data generating distribution. We're imposing that structure on the space, and this is kind of the problem with using probability and statistics in finance super aggressively right out of the gate: you're not going to have this nice frequentist convergence, right?
Asymptotically in finance, if you look at stock returns, if you look at anything like that, you're not going to have probabilities, distributions, and statistics converge because the data generating distribution is changing over time.
The way that you can think about this is imagine I have a school.
See if I can draw.
Yeah, crushed it. I have a school in, call it, the 1900s, maybe 1902.
And I have a school today, right?
Completely different schools.
Maybe a school in 2026.
If I take a look at the expected value of height given male in 1902, is this going to be equal to the expected value of height given male in 2026?
I mean, absolutely not, right? There are so many different factors that go into this. Are these in the same location? Time plays a significant role in the population's average height. There are so many things that are going to change the distribution of males and their corresponding heights over time. And that's why in real life we don't see these statistics, probabilities, and so on converge.
There is no fixed data generating distribution. Yes, we have the central limit theorem, but the central limit theorem only holds under specific conditions, and in real life, when you have this time variation, this non-stationarity, that's when you get misled and you're not actually looking at correct probabilities, statistics, and distributions.
So, these are most certainly not equal, right? It's going to be a function of a variety of different things.
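To see what a moving data generating distribution does to a sample average, here is a minimal sketch. The linear drift, its size, and the window lengths are all assumptions chosen for illustration, not a model of real height or return data.

```python
import random

random.seed(2)


def drifting_sample(n, start_mean=66.0, drift_per_step=0.0002, sd=3.0):
    """Draw n values whose true mean drifts linearly (made-up parameters)."""
    return [random.gauss(start_mean + drift_per_step * t, sd) for t in range(n)]


def window_mean(values, start, size):
    """Average of one contiguous slice of the sample."""
    window = values[start:start + size]
    return sum(window) / len(window)


samples = drifting_sample(20_000)

early = window_mean(samples, 0, 2_000)      # first 2,000 draws
late = window_mean(samples, 18_000, 2_000)  # last 2,000 draws
pooled = sum(samples) / len(samples)        # everything lumped together

print(f"early window mean: {early:.2f}")
print(f"late window mean:  {late:.2f}")
print(f"pooled mean:       {pooled:.2f}")
```

The pooled mean sits between the two windows and describes neither period well, which is one way of reading the 1902 versus 2026 comparison.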
But, nevertheless, look at the overall expectation, assuming we have all that information.
Let's say we have the expected value of height given male and the probability of male in both time periods. We then have the same for females: the expected value of height given female and the probability of female in both time periods.
Hopefully, you see now what we're building to in the stock return sense. Clearly here, the two unconditional expected values differ: the expected value of height in 2026 is not going to be equal to the expected value of height in 1902, which is why your stock returns in the early 2000s are, on average, not going to be equal to your stock returns now, right? They're companies. It's amorphous. We're not dealing with frequency-based probabilities, statistics, and distributions in real life. Instead, we're imposing that structure to try to make sound decisions under uncertainty.
What you're seeing here is that if I can decompose this expectation into these different legs, I can see what changed most dramatically over this time period. Was it the average height of males? Is that what changed dramatically?
Was it the probability of being a male?
Is that what changed dramatically?
Was it the average height of females? Is that what changed dramatically? Or the probability of being a female?
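One way to answer those four questions is to split the change in the unconditional expectation into a mean effect and a probability effect for each group. The attribution rule below is one of several possible conventions and is an editorial assumption, as are all of the numbers.

```python
# Hypothetical summary statistics for two periods; every number is made up.
period_1902 = {"male": {"mean": 66.0, "prob": 0.55},
               "female": {"mean": 61.5, "prob": 0.45}}
period_2026 = {"male": {"mean": 69.5, "prob": 0.50},
               "female": {"mean": 64.0, "prob": 0.50}}


def unconditional(period):
    """E[X] = sum over groups of E[X | group] * P(group)."""
    return sum(g["mean"] * g["prob"] for g in period.values())


def attribute_change(old, new):
    """Split the change in E[X] into per-group mean and probability effects.

    Uses the exact identity
    new_m * new_p - old_m * old_p = (new_m - old_m) * old_p + new_m * (new_p - old_p).
    """
    effects = {}
    for group in old:
        mean_effect = (new[group]["mean"] - old[group]["mean"]) * old[group]["prob"]
        prob_effect = new[group]["mean"] * (new[group]["prob"] - old[group]["prob"])
        effects[group] = {"mean_effect": mean_effect, "prob_effect": prob_effect}
    return effects


effects = attribute_change(period_1902, period_2026)
total_change = unconditional(period_2026) - unconditional(period_1902)

for group, e in effects.items():
    print(f"{group}: mean effect = {e['mean_effect']:+.3f}, "
          f"prob effect = {e['prob_effect']:+.3f}")
print(f"total change in E[X] = {total_change:+.3f}")
```

The four effects add up exactly to the total change, so the largest term in absolute value points at the leg that moved the most under this convention.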
All of these go into the unconditional expectations. So, when there's significant deviation in the unconditional expectation, you can decompose it into the individual parts, and over time, you can see what is drifting, what is changing. You can do distribution distance measures, the list goes on, but keep in mind, right? We don't have the frequency interpretation that we do in the classroom. It's more amorphous.
So, what do I mean by that? Well, what's really nice about the law of total expectation is, so long as you can partition the sample space, you're going to be able to decompose anything, even if it is more amorphous, right? Even if it's extremely chaotic, even if the distributions change violently over time, it doesn't change the fact that this is true. It does change the fact that it may not be stable over time, which is exactly why you perform this sort of analysis in a temporal sense.
But let's apply this idea now to trading, right? So, if I have, let's call it, a win and a loss in terms of a trade, and this is a sample space of all of my trades, then I'm going to be able to partition the sample space perfectly by whether or not a trade was a win or a loss. This is one of the first things I ask anyone who comes to me for professional consulting to show me: the distribution stability of their strategy over time. If they are trading some sort of inefficiency, whatever it is, their trades are going to live in one of these two partitions, each a subset of the overall sample space.
You're going to have your winning trades, you're going to have your losing trades, right? Now, the particularly keen among you will notice that this is likely to be more granular. It's probably not going to just be your wins and your losses. You're also going to have sub-regimes. So, you're going to break this up into a low, a mid, and a high volatility regime. And then you'll even segment again by other conditional factors, macro factors, political factors, whatever. And then you'll look at your wins and losses. You can apply the law of total expectation in the exact same way, as long as you're dealing with a partition of the sample space.
So, more formally here, a partition just means that if I take the union of all of these different events, then I'm going to get the original sample space, and if I take the intersection of any two different events, it's empty. They're completely disjoint, and there's not going to be any overlap. And that's what you see in all of these partitions that I'm drawing, right? I'm drawing them so they have no overlap.
They're just slicing and dicing this original sample space here.
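In symbols, the partition conditions and the resulting decomposition can be written as below. The event names A_i and B_j are editorial notation: A_i could be win/loss and B_j a hypothetical second partition such as low/mid/high volatility. Cells with zero probability are simply left out of the sums.

```latex
% Partition of the sample space: the events cover Omega and do not overlap
\bigcup_{i} A_i = \Omega, \qquad A_i \cap A_j = \emptyset \quad (i \neq j)

% Law of total expectation over that partition
\mathbb{E}[X] = \sum_{i} \mathbb{E}[X \mid A_i] \, P(A_i)

% Finer partition from intersecting two partitions (e.g. outcome and regime)
\mathbb{E}[X] = \sum_{i} \sum_{j} \mathbb{E}[X \mid A_i \cap B_j] \, P(A_i \cap B_j)
```

The third line works because the intersections A_i ∩ B_j are themselves a partition of the sample space, which is why further segmentation never breaks the identity.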
So, let's take a look now at the idea of the expectation of trading P&L.
Right? So, each one of these is going to be a trade. That means that we're going to have by the law of total expectation, the expected value of your P&L given a winning trade times the probability of a winning trade plus the expected value of your P&L given a losing trade times the probability of a losing trade.
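Here is a minimal sketch of that win/loss decomposition on a tiny made-up trade log. The P&L figures are hypothetical, and counting a zero-P&L trade as a loss is an arbitrary assumption made so that the two groups form a partition.

```python
# Hypothetical per-trade P&L in dollars; the numbers are made up.
pnl = [120.0, -80.0, 45.0, -30.0, 200.0, -150.0, 60.0, -20.0, 90.0, -110.0]

# Partition the trades: every trade lands in exactly one of the two groups.
wins = [x for x in pnl if x > 0]
losses = [x for x in pnl if x <= 0]  # assumption: breakeven counts as a loss

p_win = len(wins) / len(pnl)         # P(win)
p_loss = len(losses) / len(pnl)      # P(loss)
avg_win = sum(wins) / len(wins)      # E[PnL | win]
avg_loss = sum(losses) / len(losses) # E[PnL | loss]

# Law of total expectation: glue the two legs back together.
expected_pnl = avg_win * p_win + avg_loss * p_loss
direct_mean = sum(pnl) / len(pnl)

print(f"E[PnL|win]  = {avg_win:.2f}, P(win)  = {p_win:.2f}")
print(f"E[PnL|loss] = {avg_loss:.2f}, P(loss) = {p_loss:.2f}")
print(f"E[PnL] via decomposition = {expected_pnl:.2f}")
print(f"E[PnL] computed directly = {direct_mean:.2f}")
```

The four printed components are the legs whose behavior over time the rest of the talk is concerned with.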
And the problem here is that these are typically extremely unstable, right?
So, what you're going to have to do is continue to condition on different elements, volatility regimes, strategies, to try to create reasonable stability. That is what your trading edge is, effectively.
If you have an edge, then this is going to be reasonably stable over time. We often talk about discretionary trading being about performance, about swings at bat.
Even still, if you have an edge and you decompose your trades over time, you should see reasonable stability here, right? The mean should be on average positive with reasonably constrained variance such that you're not blowing out or such that you actually have like structurally orthogonal risk to just pure market exposure that you're taking advantage of.
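A simple way to look at that stability is to recompute the decomposition in consecutive windows of trades. The sketch below uses a made-up trade generator whose win rate decays halfway through; the generator, its parameters, and the window size are assumptions for illustration only.

```python
import random

random.seed(3)


def simulate_trades(n, p_win, avg_win, avg_loss):
    """Hypothetical trade generator with exponentially distributed sizes."""
    trades = []
    for _ in range(n):
        if random.random() < p_win:
            trades.append(random.expovariate(1.0 / avg_win))    # winning trade
        else:
            trades.append(-random.expovariate(1.0 / avg_loss))  # losing trade
    return trades


# Two made-up regimes: the win rate drops in the second half of the log.
pnl = (simulate_trades(500, 0.55, 100.0, 90.0)
       + simulate_trades(500, 0.42, 100.0, 90.0))


def decompose(window):
    """Return (P(win), E[PnL|win], E[PnL|loss], E[PnL]) for one window."""
    wins = [x for x in window if x > 0]
    losses = [x for x in window if x <= 0]
    p_win = len(wins) / len(window)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return p_win, avg_win, avg_loss, avg_win * p_win + avg_loss * (1.0 - p_win)


WINDOW = 250  # trades per window, an arbitrary choice
rows = [decompose(pnl[i:i + WINDOW]) for i in range(0, len(pnl), WINDOW)]

for k, (p_win, avg_win, avg_loss, total) in enumerate(rows):
    print(f"window {k}: P(win) = {p_win:.2f}, E[PnL|win] = {avg_win:7.2f}, "
          f"E[PnL|loss] = {avg_loss:8.2f}, E[PnL] = {total:7.2f}")
```

Reading the rows side by side shows which leg is responsible when the windowed expectation moves, which is the kind of temporal check the talk describes.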
But, that goes into other types of analysis that you would do.
So, the thing here to keep in mind is we don't have that nice frequentist convergence, right? If I have n trades here and I'm looking at the expected P&L of my winning trades, then as I make more trades, this doesn't have to converge to anything, right? Now, we said with height that it does.
The only reason why I'm saying it does with height is because height is likely more stationary, at least ephemerally, than something like your trades.
If you have a really good strategy, then your trades are going to exhibit that kind of stability too, right? But, if I take the average height of people in a workplace or people in a school, that distribution isn't really going to change all that violently by tomorrow, right?
But, over 10, 20, 50, 100 years, you're going to observe those significant differences, right? Now, in finance, we actually observe those significant differences quite quickly, and distributions are going to change very aggressively over the short, mid, and even long term. It just depends on what it is you're trying to trade and what it is you're doing.
All right? So, just keep that in mind in the context of the law of total expectation. You can always decompose an unconditional expectation into a series of conditional expectations, and each conditional expectation is scaled by its corresponding probability. In a frequentist sense, if we're in the classroom, so to speak, in the lab, and we're just drawing from fixed distributions, then if I keep increasing the number of people that I'm taking the average height of, or I keep increasing the total number of trades, eventually I'm going to converge to the true values of all of these: the probabilities, statistics, and even distributions. But if we're not in the classroom, then convergence is not guaranteed, and that's the entire game.
The entire game is trying to figure out stability, and trying to make the best decisions that you possibly can in a forward-looking sense without that information, knowing that this information is somewhat flawed, right?
And you can use backtesting and walk-forward testing to try to figure out how flawed it is. Obviously, there is bias in doing that. It's historical data, right? All empirical research in that sense is data mining. I heard that from somebody recently, whom I won't name, just to keep their privacy.
And I loved that idea.
And if you've actually uncovered something structural, then you can continue to take advantage of it.
So, that's the idea of the law of total expectation. I hope you enjoyed this video. I hope you learned something. I would love to continue to do more discussions on probability and statistics if there's interest, continuing to bridge this gap between the classroom and practice.
Thank you guys for all of the support.
It means so much to me. I really thank you.
Thank you.
As soon as I know more, I'll let you know more.
And thank you guys for being there for me.
Thank you for watching this video, and I'll see you in the next video.











