In probability theory, a point estimator is unbiased if its expected value equals the true parameter value for all possible parameter values. The sample mean (X̄) is an unbiased estimator for the population mean because its expectation equals the true mean. However, the uncorrected sample variance (1/n)∑(Xj - X̄)² is biased, with its expectation equal to (n-1)/n times the true variance. To obtain an unbiased estimator for variance, the denominator must be n-1 instead of n, which corrects the bias factor.
Probability Theory 37 | Bias for Mean and Variance
Hello and welcome back to probability theory, the video series where we talk a lot about the mathematical description of randomness.
And indeed in today's part 37, we continue our discussion about inferential statistics. Namely, we will talk about point estimators. And as you might remember, point estimators are one way to do statistics and we have defined the term bias for them. And today we will go into more details for these notions and we look at examples for the mean and the variance.
However, as always, before we do that, I first want to thank all the nice people who support this channel on Steady, here on YouTube, or via other means. And please don't forget that with the link in the description, you can download the additional material for all the videos.
For example, it could be quite helpful to have the whole PDF version of the course about probability theory. And now without further ado, let's immediately go into the topic of statistics again.
And there we know we have a statistical model, which is just a collection of probability spaces. So more precisely, the sample space X^n is a subset of R^n, and there we have our standard Borel sigma algebra.
Moreover, we also have probability measures P_theta on R, and then we just form the standard product measure to get a probability measure on R^n. However, the important thing here is that we have a parameter theta that goes through a whole parameter space capital Theta. And usually it's sufficient to consider a tuple of real parameters here, which means the parameter space is a subset of R^k. And indeed this means that in the case that k is not equal to 1, we have to redefine what we mean by a point estimator. And this is quite clear if we look at a short sketch of what we want to do in statistics.
So here we have our X^n as a subset of R^n, and a point in it is what we call a sample. So not complicated at all. That just means that we find n coordinates for such a point in the space.
Therefore, we immediately get n projection maps which we can see as random variables.
So the projection onto the j-th coordinate is what we can call capital X_j. And by definition, it maps the sample space X^n to the real number line. This means when you take a sample here, you forget about everything except the j-th coordinate.
Now this definition is quite helpful because we get a whole sequence of random variables which are actually IID which simply means they are independent and identically distributed.
And now as you know the goal of statistics is to find this hidden distribution by just looking at a sample. And this works because we already have a collection of possible distributions we call P theta.
Therefore, you could say we just want to find the best fitting parameter theta.
So having P theta as the hidden distribution fits best to the given sample. And now in order to describe that in a mathematical way, we can look at so-called point estimators.
And a common name we have for that is just a capital T with index n. However, these point estimators should actually always map into the real number line. So T_n takes a whole sample consisting of n coordinates and maps it to a single real number.
And now the connection to the parameter space Theta is given by a real valued function we can call s. So this is the actual quantity the point estimator is related to. For example, it could be the expectation or the variance of our probability distribution. And that's already it. This is how we extend the definition of a point estimator for the case that the parameter space Theta is not just the real number line. In that case, we can just consider different maps s that map the parameter space Theta to the real number line R. Indeed, this part here is the only difference to the picture of the last videos. But it means that we also have to redefine the meaning of an unbiased point estimator.
This is not complicated at all because we just have to bring the function s in.
So we always have a statistical model as before, but now we also consider a point estimator T_n. And in the simplest way you would just say it's a random variable from the sample space X^n to the real number line R. But now we would add that it is a point estimator with respect to a given map s. And here I write quantity to make clear that usually we have something like the mean or the variance of the distribution.
And now we can define the term unbiased for the point estimator T_n with respect to our quantity s. Namely, what we want is that the expectation of T_n calculated with our product measure is always the same as the value under s. So it's s of theta, no matter which theta from the parameter space we choose. So you see, if we want to estimate the value of s with our point estimator, on average we don't have any error. Therefore being unbiased is a nice property of a point estimator, but of course it does not tell us the whole story for the quality of our estimate. Indeed, I can already tell you that sometimes other criteria are much more important. Therefore, we can look at a weaker definition where we have the unbiased property only asymptotically.
And there you might already guess that we just look at the limit n to infinity, and we just want the expectation to converge to the value s of theta, which means that if n is large enough, we are already very close to the equality. So there we have it with this new definition. We now know what unbiased means for point estimators for a quantity s. However, in order to understand the whole thing, I really think we should look at examples.
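Written out in symbols, the two definitions just described could look as follows. This is only a sketch of the spoken definitions: the notation E with index theta and n for the expectation under the n-fold product measure is an assumption made here, and s is the quantity map introduced earlier.

```latex
% T_n : point estimator for the quantity s : \Theta \to \mathbb{R}
% (the symbol \mathbb{E}_{\theta,n} is assumed notation for the
%  expectation with respect to the product measure P_\theta^{\otimes n})

% unbiased:
\mathbb{E}_{\theta,n}[T_n] = s(\theta)
  \quad \text{for all } \theta \in \Theta

% asymptotically unbiased:
\lim_{n \to \infty} \mathbb{E}_{\theta,n}[T_n] = s(\theta)
  \quad \text{for all } \theta \in \Theta
```

Every unbiased estimator is in particular asymptotically unbiased, which is why the second definition is the weaker one.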
And in the first example, let's take the mean, the expectation, for our function s.
And for that reason, I don't want to use the letter s, but just the letter m. So we write m of theta, which is given by the standard expectation, which means it's an abstract integral with respect to our probability measure P_theta. So more precisely, we integrate the function x over the real number line with respect to the probability measure P_theta. And now, as you might already know, we have a good point estimator for this expectation, which we call the sample mean. And in order to keep it consistent here, let's use M_n instead of T_n. Hence, M_n of such a sample is just given by the standard average of the numbers.
So, we have 1/n times the sum of the lowercase x's. And I don't tell you something new when I say that this is commonly shortened to x bar.
However, for us here, it's even more helpful to write this point estimator with the projection operators from before. This just means that we can write M_n as 1/n times the sum of the capital X_j's.
So, you see there's not much to it, but it definitely helps us in the calculation. So first I want to show you that this point estimator is unbiased with respect to our mean m of theta. This simply means that we have to calculate the expectation of M_n with respect to our product measure. Therefore, now we have an n-dimensional integral. And exactly there we can use the linearity of the integral and the description of M_n as a sum. Hence, we actually just have to integrate the random variable X_j.
So you could say what we have here is just the expectation of X_j. And since all the X_j's are identically distributed, we know that the expectation is the same no matter which j we choose. Therefore, by summing them up and dividing by n, only one expectation remains. And now, without restriction, we can just say that we take X_1. And now, since this is the projection onto the first coordinate, we know that the whole product measure will collapse to a single one-dimensional integral. So maybe here you would call the variable you integrate over x_1, but it does not matter obviously, and what we get out is our m of theta. And that's it. This is what we wanted to show: the point estimator is unbiased.
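The calculation just described can be summarized in one chain of equalities. This is a sketch of the spoken argument; the symbol E with index theta and n for the expectation under the product measure is assumed notation.

```latex
\begin{align*}
\mathbb{E}_{\theta,n}[M_n]
  &= \mathbb{E}_{\theta,n}\Big[\frac{1}{n}\sum_{j=1}^{n} X_j\Big]
   = \frac{1}{n}\sum_{j=1}^{n} \mathbb{E}_{\theta,n}[X_j]
     && \text{(linearity)} \\
  &= \mathbb{E}_{\theta,n}[X_1]
     && \text{(identically distributed)} \\
  &= \int_{\mathbb{R}} x \, dP_\theta(x)
   = m(\theta)
     && \text{(projection onto one coordinate)}
\end{align*}
```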
So this is very nice, and then I would say in the next step we can go to the variance and the sample variance. So now our quantity function s should be given by the variance, which we call lowercase v. And the definition is as always: we integrate x minus the mean, where the mean we have already introduced as m of theta, and moreover we square the whole difference. So there we have it.
This is how we calculate the variance of the distribution P_theta. And now as a point estimator we can take the sample variance, which we can call capital V_n.
And from the videos before we already know that different definitions exist.
And what we do here we take the sample variance which is uncorrected.
This means it's just the natural definition you would choose to calculate the variance of a sample. So you take 1/n times the sum of the differences x_j minus the sample mean. And obviously, in the variance every difference here is squared. So now this is called the uncorrected sample variance because, as we will see now, it's a biased point estimator.
However, this is not a big problem, because on the one hand we can simply correct this bias, and on the other hand it's still asymptotically unbiased.
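To make the two estimators concrete, here is a minimal Python sketch of the sample mean and the uncorrected sample variance. The function names and the sample values are made up for illustration; the comparison with `statistics.pvariance` only serves as a cross-check, since that standard-library function also divides by n.

```python
from statistics import pvariance


def sample_mean(xs):
    """M_n: the average of the sample values."""
    return sum(xs) / len(xs)


def uncorrected_sample_variance(xs):
    """V_n: average squared deviation from the sample mean (denominator n)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative sample

print(sample_mean(sample))                  # 5.0
print(uncorrected_sample_variance(sample))  # 4.0
print(pvariance(sample))                    # standard-library counterpart
```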
Okay. And now, as before, in the calculation it will also be quite helpful to use our projection operators capital X_j. This means our V_n here can be written as 1/n times the sum of the differences X_j minus M_n.
And as before, please never forget that we have a square in all the differences.
And now in order to calculate the bias of V_n, we just have to look at the expectation again. And there we can use the linearity again and push the expectation into the sum. So you see, here we have the expectation of the difference X_j minus M_n, squared. And now it turns out that this is already the variance of the random variable X_j minus M_n, because the other term for the variance is equal to zero. And in order to keep the reference to the probability measure, we write variance with index theta and n. And to see that this is actually the variance, we have to calculate the second term for the variance, which is the expectation of the random variable, squared. This means the only thing we need to do is to look at the expectation of X_j minus M_n, where we can use the linearity to write it as two expectations. And then we are already done, because we know by the calculation from before that these two expectations give the same result, namely m of theta. So the second term in the variance just vanishes and the equality here on the left hand side is correct.
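In symbols, this step could be written as follows. It is a sketch of the argument above, with E and Var carrying the index theta and n as assumed notation for the product measure.

```latex
\begin{align*}
\mathbb{E}_{\theta,n}[V_n]
  &= \frac{1}{n}\sum_{j=1}^{n}
     \mathbb{E}_{\theta,n}\big[(X_j - M_n)^2\big], \\
\mathbb{E}_{\theta,n}[X_j - M_n]
  &= \mathbb{E}_{\theta,n}[X_j] - \mathbb{E}_{\theta,n}[M_n]
   = m(\theta) - m(\theta) = 0, \\
\operatorname{Var}_{\theta,n}(X_j - M_n)
  &= \mathbb{E}_{\theta,n}\big[(X_j - M_n)^2\big]
     - \big(\mathbb{E}_{\theta,n}[X_j - M_n]\big)^2
   = \mathbb{E}_{\theta,n}\big[(X_j - M_n)^2\big].
\end{align*}
```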
And then we can use the same thing as before, namely that the X_j's are identically distributed, so the variance of X_j minus M_n is actually the same for every j. Hence we can just say that we only look at X_1.
So we don't have the sum anymore. We just have the variance of X_1 minus M_n.
However, M_n can also be written with the projections X_j. So we have X_1 minus 1/n times the sum X_1 + X_2 and so on up to X_n. And now the idea is that we can use the properties of the variance to split this whole sum up.
Therefore, in the first step here, let's put the correct coefficients in front of each of the X_j's. So X_1 has the coefficient 1 - 1/n and all the other ones just have -1/n.
And this is quite helpful because we already know that these random variables here are independent, which means the variance of the sum is the sum of the variances. We can even pull out the coefficients. But there you should note that in the variance these coefficients get squared. This means here for the first variable X_1 we have the coefficient (1 - 1/n)^2, and for the other ones the minus sign vanishes. So we have 1/n^2. And at this point you might already expect what comes next, namely we will use the i.i.d. property of the random variables again. So this means, because of the same distribution, all these variances coincide.
So again we can just say that we only consider the first variable X_1, and then the coefficients here just sum up. And you see, the term 1/n^2 appears n - 1 times. And I left a little bit of space here because we can expand the square as well: (1 - 1/n)^2 is just 1 - 2/n + 1/n^2. And now you should see that we can put both things together: 1 - 2/n + 1/n^2 plus (n - 1)/n^2 is just 1 - 1/n. So indeed the 1/n^2 just cancels. And if you want to simplify it, we can write it as (n - 1)/n.
And moreover, we can also look at the variance of X_1, which is given as an integral with respect to the product measure. But as before, we already know this one easily reduces to a one-dimensional integral. Hence there we just get our v of theta, and in fact we have this factor (n - 1)/n in front. So it's not an unbiased point estimator. However, this factor is close to one if n is large, which means it's asymptotically unbiased.
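The coefficient bookkeeping from the last steps can be collected in one derivation. This is a sketch of the spoken calculation; the notation Var with index theta and n is assumed for the variance under the product measure.

```latex
\begin{align*}
X_1 - M_n
  &= \Big(1 - \frac{1}{n}\Big) X_1 - \frac{1}{n}\sum_{j=2}^{n} X_j, \\
\operatorname{Var}_{\theta,n}(X_1 - M_n)
  &= \Big(1 - \frac{1}{n}\Big)^{2} \operatorname{Var}_{\theta,n}(X_1)
     + \frac{1}{n^{2}} \sum_{j=2}^{n} \operatorname{Var}_{\theta,n}(X_j)
     && \text{(independence)} \\
  &= \Big(1 - \frac{2}{n} + \frac{1}{n^{2}} + \frac{n-1}{n^{2}}\Big)
     \operatorname{Var}_{\theta,n}(X_1)
     && \text{(identically distributed)} \\
  &= \frac{n-1}{n}\, v(\theta), \\
\mathbb{E}_{\theta,n}[V_n]
  &= \frac{n-1}{n}\, v(\theta)
     \;\xrightarrow{\,n \to \infty\,}\; v(\theta).
\end{align*}
```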
And in addition, this also explains why we get an unbiased point estimator for the variance when we correct this factor in front. This means we just have to multiply with the inverse of this factor to get an unbiased point estimator. So this immediately leads us to the second definition of the sample variance which is unbiased.
So indeed, instead of the factor n here in the denominator, we need the factor n minus one, and that's all. This makes the sample variance an unbiased point estimator. So this finally explains what we have already discussed in former videos. So now you know the meaning of the bias of the point estimators for the mean and the variance, and I can already tell you that we will discuss more properties of these two important point estimators in future videos. So I really hope I meet you there again and have a nice day. Bye-bye.
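The results of this video can be checked exactly on a small example. The following Python sketch enumerates all samples of size n from a finite distribution and computes the expectations of the sample mean, the uncorrected sample variance, and the corrected sample variance with exact fractions. The distribution and the function name are hypothetical choices for illustration, not something taken from the video.

```python
from fractions import Fraction
from itertools import product


def exact_expectations(values, probs, n):
    """Exact E[M_n], E[V_n] and E[n/(n-1) * V_n] over all samples of size n."""
    e_mean = e_uncorrected = e_corrected = Fraction(0)
    for indices in product(range(len(values)), repeat=n):
        sample = [Fraction(values[i]) for i in indices]
        weight = Fraction(1)
        for i in indices:
            weight *= probs[i]  # product measure: the coordinates are independent
        m = sum(sample) / n
        squares = sum((x - m) ** 2 for x in sample)
        e_mean += weight * m
        e_uncorrected += weight * squares / n      # denominator n
        e_corrected += weight * squares / (n - 1)  # denominator n - 1
    return e_mean, e_uncorrected, e_corrected


# Hypothetical distribution: P(0) = 1/2, P(1) = 1/4, P(3) = 1/4,
# so the true mean is 1 and the true variance is 3/2.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

for n in (2, 3, 4):
    print(n, exact_expectations(values, probs, n))
```

For each n, the first and third entries equal the true mean 1 and the true variance 3/2, while the second entry equals (n - 1)/n times 3/2, which is the bias factor derived above.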