Itô's Lemma is the chain rule of stochastic calculus, which states that for a function f(t,X) and a stochastic process dX = A dt + B dW, the differential is df = (∂f/∂t + A∂f/∂x + ½B²∂²f/∂x²)dt + B∂f/∂x dW. The term ½B²∂²f/∂x² is the Itô drift correction, which explains volatility drag in geometric Brownian motion (GBM). GBM has the solution S(t) = S₀ exp((μ - ½σ²)t + σW(t)), where the -½σ² term represents the systematic deterministic change caused by random fluctuations passing through the curvature of the exponential transformation. While the expected value of GBM equals deterministic growth (S₀e^(μt)), the typical trajectory shows lower growth due to this volatility drag, demonstrating that expected value does not equal typical experience in multiplicative stochastic systems.
How Mean Returns Lie: Itô's Lemma (2nd form), GBM, & Volatility Drag | Stochastic Calculus ep.4
Let's play a game. You start with $100. At each round, you either win or lose 50% of your current wealth with equal probability. So, after round one, you end up with either $150 or $50.
And round two, similarly, has four outcomes.
These are all the possible trajectories for the first two rounds.
The average of all the outcomes for round one is $100.
And the same is true for round two. The expected wealth is constant over time.
Yet, if we look at the payoff, that is, the difference between the final and initial wealth, three out of four times we end up with a loss. And that's exactly the phenomenon we will study today, known as volatility drag, as well as Ito's lemma, the stochastic calculus underpinning of this phenomenon. Volatility drag explains why expected value does not equal typical experience.
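The game is small enough to check by brute force. Here is a minimal sketch that enumerates every equally likely win/lose sequence, assuming a starting wealth of $100 and the 50% win/lose factors described above; the function name `enumerate_game` is just an illustrative choice.

```python
from itertools import product


def enumerate_game(rounds, start=100.0, up=1.5, down=0.5):
    """Return the final wealth of every equally likely win/lose sequence."""
    outcomes = []
    for seq in product((up, down), repeat=rounds):
        wealth = start
        for factor in seq:
            wealth *= factor  # win: +50% of current wealth, lose: -50%
        outcomes.append(wealth)
    return outcomes


for rounds in (1, 2, 10):
    finals = enumerate_game(rounds)
    mean = sum(finals) / len(finals)
    losing = sum(w < 100.0 for w in finals) / len(finals)
    print(f"rounds={rounds:2d}  mean={mean:.2f}  share of losing paths={losing:.3f}")
```

The mean stays at the starting wealth for every number of rounds, while the share of paths that end below it can be read off the last column.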
In fact, it can be shown that with probability one, wealth tends to zero as the game goes on.
Hello, and welcome to the fourth episode of the series Stochastic Calculus for Absolute Beginners, in which we intend to cover the basics of stochastic calculus using only basic calculus and probability.
This episode is the first milestone of the series, as we will see some non-trivial applications, the fruits of mathematics. We will derive the celebrated Ito's lemma, the chain rule of stochastic calculus, in its most common form.
We will then use Ito's lemma to study an important model called the geometric Brownian motion, which in turn explains and quantifies the volatility drag we've seen at the start.
Let's go.
First things first, motivations. Let's consider a familiar model from calculus, continuous-time compound interest, as a deterministic model, aka an ODE: dS_t/dt = r S_t, which we can rewrite in differential form as dS_t = r S_t dt for some positive constant r.
Note that in the differential form we can interpret the left-hand side dS_t as the amount of interest accumulated in the interval from t to t + dt.
And on the right-hand side we have S_t, which is the principal, times r dt.
Hence r dt may be interpreted as the interest rate over that time period.
A deterministic model is used for modeling zero risk investments.
For modeling assets with volatility we need to introduce random noise into the interest rate r dt.
The interest rate now has a deterministic component μ dt and a random component σ dW_t, where dW_t is the increment of Brownian motion, which we covered in episode one. Here μ and σ are both positive constants.
And we may expand the right-hand side. And that's the geometric Brownian motion: dS_t = μ S_t dt + σ S_t dW_t.
And that's an example of a stochastic differential equation (SDE).
The stochastic differential equation is interpreted in the Ito integral sense as defined in episode one.
Integrating both sides, we obtain S_t − S_0 = ∫₀ᵗ μ S_τ dτ + ∫₀ᵗ σ S_τ dW_τ.
The first integral on the right-hand side is a Riemann integral whereas the second is an Ito integral.
In particular, we recall that the definition of Ito integrals implies non-anticipation.
The process S_τ at any time has no information about the future of the Brownian motion W_τ.
And just like in ordinary calculus, we can simulate trajectories of the SDE using Euler's method.
For i = 1, 2, ..., n, we have the following iteration scheme.
ΔW_i = √Δt · Z_i, where the Z_i are i.i.d. standard normal, N(0, 1).
This is from the definition of Brownian motion. This way ΔW_i has a variance of Δt, a requirement in the definition of Brownian motion.
Next, we calculate the finite increment in the process S by using a discretized version of the SDE.
ΔS_i = μ S_{i−1} Δt + σ S_{i−1} ΔW_i.
Notice that in the last term the index i − 1 for S and the index i for ΔW satisfy non-anticipation, which is what we want.
And then we simply update S by adding its increment: S_i = S_{i−1} + ΔS_i.
Or equivalently, we can summarize the last two equations into one step: S_i = S_{i−1}(1 + μ Δt + σ √Δt Z_i). The expression μ Δt + σ √Δt Z_i in the bracket is the random interest rate for the i-th subinterval.
For example, here are some sample paths of the geometric Brownian motion with parameters μ = 5%, σ = 20%, and S_0 = 100. So, this might be the time evolution of the price of a specific stock.
And these are possible trajectories over a 5-year horizon. The dotted curve is the expected mean.
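The iteration scheme can be sketched in a few lines of Python. This is a minimal sketch rather than the code behind the plot in the video: the step count (1250 steps over 5 years), the seed, and the function name `simulate_gbm_euler` are assumptions made for illustration, and only the standard library is used.

```python
import math
import random


def simulate_gbm_euler(s0, mu, sigma, horizon, n_steps, rng):
    """Simulate one GBM path with the Euler scheme for dS = mu*S dt + sigma*S dW."""
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)   # Z_i ~ N(0, 1)
        dw = math.sqrt(dt) * z    # Brownian increment with variance dt
        prev = path[-1]           # S_{i-1}: known before dW_i, so non-anticipating
        path.append(prev + mu * prev * dt + sigma * prev * dw)
    return path


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
for _ in range(5):
    path = simulate_gbm_euler(100.0, 0.05, 0.20, 5.0, 1250, rng)
    print(f"S_T = {path[-1]:.2f}")
print(f"expected mean at T = 5: {100.0 * math.exp(0.05 * 5.0):.2f}")
```

Setting `sigma` to zero reduces the scheme to ordinary Euler compounding, which is a quick sanity check on the update rule.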
So, now we can define the Ito stochastic differential equation, also known as a diffusion, in its general form. It is a generalization of the geometric Brownian motion.
We say that X_t is a diffusion process if it satisfies the Ito stochastic differential equation dX_t = μ(t, X_t) dt + σ(t, X_t) dW_t, where the drift coefficient μ(t, x) and the diffusion coefficient σ(t, x) are given functions.
The integral form of the Ito SDE, or diffusion, reads X_t − X_0 = ∫₀ᵗ μ(s, X_s) ds + ∫₀ᵗ σ(s, X_s) dW_s.
If the coefficients μ and σ are time independent, i.e. they do not depend on t explicitly, the SDE is said to be time homogeneous.
For example, the geometric Brownian motion is a time-homogeneous diffusion process because the drift coefficient is μx, which does not depend on t explicitly.
And the same is true for the diffusion coefficient, which equals σx.
Which brings us to the main results of this video.
First, we have Ito's lemma, in the version most commonly encountered in the literature.
Let f(t, x) be a deterministic function that is C¹ in t and C² in x.
Recall from calculus, C¹ in t means the partial derivative of f with respect to t exists and is continuous. And C² in x means the second-order partial derivative of f with respect to x exists and is continuous.
And the process X_t satisfies the SDE dX_t = A_t dt + B_t dW_t.
And we remark that whenever we write something like B_t dW_t in the future, it is implied that B_t is non-anticipating relative to the Brownian motion W_t, unless stated otherwise.
And then we have this rather complicated-looking formula.
But the idea is quite straightforward.
We have a process X_t.
We define a new process by applying a deterministic transformation f to the process X_t. We wish to express its differential.
And it turns out to have this form.
df(t, X_t) = [∂f/∂t(t, X_t) + A_t ∂f/∂x(t, X_t) + ½ B_t² ∂²f/∂x²(t, X_t)] dt + B_t ∂f/∂x(t, X_t) dW_t.
So, the expression in the square bracket is the drift coefficient for the new process f(t, X_t).
And the new process has a diffusion coefficient of ∂f/∂x(t, X_t) · B_t.
Even though it might look complicated, the derivation is no more difficult than that of Ito's formula in its basic form, which was covered in episode three.
The next main result is the solution to the geometric Brownian motion, obtained by applying Ito's lemma.
It has the following closed-form solution, a rarity among SDEs.
The formula reads S_t = S_0 exp((μ − ½σ²)t + σW_t).
And in particular, the term −½σ² is called the volatility drag.
So, here's the derivation of Ito's lemma.
If you haven't watched the third episode, I strongly recommend doing so before coming back to this derivation, because the ideas are the same and the algebra there is simpler.
I will not read out every letter in this derivation.
Instead, I will just walk through the main ideas. The link for the video slides is in the description.
So, you can go through them at your own pace.
We first do a Taylor expansion of the function f, keeping second-order terms.
The original process X has its own SDE, dX_t = A_t dt + B_t dW_t.
Let's calculate two products which will be useful.
First, dt · dX_t. It is A_t (dt)² + B_t dt dW_t.
We recall a crucial relation, (dW_t)² = dt.
This is what episode two is about.
So, dW_t can be formally written as the square root of dt.
Just like in regular calculus, we ignore higher-order terms in dt.
So, the first term is of order (dt)², which we can neglect.
And the second term is of order (dt)^(3/2), which is also a higher-order term in dt.
So, both of them can be neglected and we write dt · dX_t = 0.
Next, we calculate (dX_t)². First, expand the square as usual: (dX_t)² = A_t² (dt)² + 2 A_t B_t dt dW_t + B_t² (dW_t)². Similar to the previous calculation, the first two terms are of higher order, and the final term B_t² (dW_t)² is B_t² dt.
This is the only term we keep. It is not a higher-order term in dt. It's of the same order.
And we notice that (dX_t)² is not affected by the drift coefficient A_t.
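For reference, the formal second-order Taylor expansion and the product rules just described can be written out as follows. This is a sketch of the standard heuristic bookkeeping; the slides may arrange the terms differently.

```latex
\begin{align*}
df &= \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial x}\,dX_t
      + \frac{1}{2}\frac{\partial^2 f}{\partial t^2}\,(dt)^2
      + \frac{\partial^2 f}{\partial t\,\partial x}\,dt\,dX_t
      + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\,(dX_t)^2 + \cdots \\[4pt]
(dt)^2 &= 0, \\
dt\,dX_t &= A_t\,(dt)^2 + B_t\,dt\,dW_t = 0, \\
(dX_t)^2 &= A_t^2\,(dt)^2 + 2A_tB_t\,dt\,dW_t + B_t^2\,(dW_t)^2 = B_t^2\,dt .
\end{align*}
```

Only the last product contributes at order dt, which is why exactly one quadratic term survives.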
And now we're just going to substitute these into the Taylor expansion and get the following.
The increment df(t, X_t) is defined as f(t + dt, X_t + dX_t) − f(t, X_t),
which is just the sum of the linear terms and the quadratic terms in our Taylor expansion.
For the quadratic terms, only one term survives.
And for the linear term with dX_t, we simply substitute the SDE for X_t. And then collecting the terms gives Ito's lemma. Pause the video or go to the link for the slides to check the details in this derivation.
Let's look at a special case where X_t = W_t.
This is equivalent to setting A_t to zero and B_t to one, and the formula simplifies to df(t, W_t) = [∂f/∂t + ½ ∂²f/∂x²] dt + ∂f/∂x dW_t, with all partial derivatives evaluated at (t, W_t).
We can compare this with the first form of Ito's lemma in episode three.
In episode three, f is a single-variable function of the Brownian motion.
Compared with the above, only the partial derivative with respect to t is missing,
which is totally expected.
Note that the regular chain rule fails.
In calculus three, or multivariable calculus, if we see a function f(t, W_t), we can draw this dependency diagram.
f depends on t explicitly, and it also depends on t through W_t, which occupies the position of the second variable.
And by the chain rule from calc three, the total differential would be df = ∂f/∂t dt + ∂f/∂x dW_t.
Compare this with the correct formula: the second-order derivative term is missing. Do not use the ordinary chain rule in Ito calculus. And as we shall see shortly, it is precisely this second-order partial derivative term that explains the volatility drag. This extra term is called the Ito drift correction.
And it has the following interpretation.
It is the systematic deterministic change produced when random fluctuations in X_t are filtered through the curvature of a non-linear transformation f.
And by non-linear, we mean non-linear in x. If f is linear in x, then ∂²f/∂x² would vanish.
So, the Ito drift correction is purely an artifact of the non-linearity, or curvature, of the function f.
Let's recall a simple relation. This was derived all the way back in episode two using definitions.
And we also verified it in episode three.
d(W_t²) = dt + 2 W_t dW_t.
This dt term here is purely due to the curvature of the function f(x) = x².
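The relation for the square of Brownian motion can also be checked numerically on a discretized path. The sketch below assumes a horizon of T = 1, 100,000 steps, and an arbitrary seed. It accumulates the non-anticipating sum of 2 W_{i−1} ΔW_i and the sum of (ΔW_i)² separately, so the role of the dt term is visible.

```python
import math
import random


def check_square_rule(horizon=1.0, n_steps=100_000, seed=0):
    """Return (W_T^2, sum of 2*W_{i-1}*dW_i, sum of dW_i^2) along one path."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    w = 0.0
    ito_sum = 0.0    # discretized Ito integral of 2 W dW
    quad_var = 0.0   # discretized quadratic variation, expected to be near T
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        ito_sum += 2.0 * w * dw   # uses W before the increment: non-anticipating
        quad_var += dw * dw
        w += dw
    return w * w, ito_sum, quad_var


w_sq, ito_sum, quad_var = check_square_rule()
print(f"W_T^2                     = {w_sq:.4f}")
print(f"T + sum of 2 W dW         = {1.0 + ito_sum:.4f}")
print(f"sum of (dW)^2 (near T = 1) = {quad_var:.4f}")
```

On the discrete path, W_T² equals the Ito sum plus the sum of squared increments exactly, and the latter is what concentrates around T as the grid is refined.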
With Ito's lemma at our disposal, we can finally solve the SDE of geometric Brownian motion. To do so, we need to select a transformation.
For that, let's recall the deterministic case, the ODE and its solution.
The ODE can be solved using separation of variables, namely, moving S and dS to one side and dt to the other side. So, the ODE reads dS_t/S_t = μ dt, and dS_t/S_t is of course d(ln S_t).
Integrating both sides, we get ln S_t − ln S_0 = μt.
And hence we consider the natural log transformation and apply Ito's lemma to ln S_t. Let f(t, x) = ln x.
So, the time derivative is zero,
∂f/∂x = 1/x, and ∂²f/∂x² = −1/x².
Substitute everything into Ito's lemma, together with A_t = μ S_t and B_t = σ S_t. After cancellation, the drift is (1/S_t)(μ S_t) + ½(−1/S_t²)(σ S_t)² = μ − ½σ², and we get d(ln S_t) = (μ − ½σ²) dt + σ dW_t. So, this is the stochastic version of separation of variables, and ln S_t is known as a Brownian motion with drift. Integration is trivial.
So, we obtain ln S_t − ln S_0 = (μ − ½σ²)t + σW_t, or equivalently S_t = S_0 exp((μ − ½σ²)t + σW_t).
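One way to gain confidence in the closed-form solution is to drive an Euler discretization and the formula with the same Brownian increments and compare the endpoints. This is a numerical sanity check under assumed parameters (μ = 5%, σ = 20%, T = 1, 10,000 steps), not a proof, and the function names are illustrative.

```python
import math
import random


def gbm_exact(s0, mu, sigma, t, w_t):
    """Closed-form GBM value at time t given the Brownian value W_t."""
    return s0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w_t)


def euler_and_exact(s0, mu, sigma, horizon, n_steps, seed=0):
    """Run the Euler scheme and evaluate the exact solution on the same noise."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    s, w = s0, 0.0
    for _ in range(n_steps):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s += mu * s * dt + sigma * s * dw   # Euler update of the SDE
        w += dw                             # accumulate W_T along the same path
    return s, gbm_exact(s0, mu, sigma, horizon, w)


euler_value, exact_value = euler_and_exact(100.0, 0.05, 0.20, 1.0, 10_000)
print(f"Euler: {euler_value:.4f}  exact: {exact_value:.4f}")
```

If the −½σ² term is dropped from `gbm_exact`, the two numbers no longer agree as the step size shrinks, which is the drift correction showing up numerically.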
And here's an exercise.
Verify the solution using the special case of Ito's lemma.
A step-by-step solution is given in appendix three of the slides.
And now let's look at a question of practical importance. In the exponent of the GBM solution, the coefficient of t is μ − ½σ²,
whereas in the deterministic growth the coefficient of t is just μ. Can we say that GBM suffers a lower growth due to this difference?
And the answer is super interesting. It depends on what metric we're looking at.
If we're looking at typical trajectories GBM indeed suffers from lower growth.
But if we're just talking about the ensemble average the two are identical.
Our little game at the beginning is a discrete analog of this phenomenon and it's very counterintuitive. Basically, in a multiplicative system the average is almost meaningless.
Let's analyze the typical trajectories first.
From the solution of GBM, we know that the logarithmic growth ln(S_t/S_0) has a linear trend in time plus σW_t.
Dividing by t on both sides, we get (1/t) ln(S_t/S_0) = μ − ½σ² + σW_t/t.
What's the behavior of W_t/t?
Well, it converges to zero in the mean-square sense.
As t tends to infinity, the second moment of W_t/t, which equals t/t² = 1/t, tends to zero. In fact, we also have almost sure convergence.
Namely, the event that W_t/t tends to zero as t tends to infinity has a probability of one. The proof is omitted as it requires measure-theoretic probability. Therefore, (1/t) ln(S_t/S_0) converges to the constant μ − ½σ² almost surely.
Now, let's analyze the expected mean, or the ensemble average. At each time instance, we average the values of all possible trajectories, and what we end up with is exactly the same as the deterministic growth, S_0 e^(μt).
The reason this can happen, despite the fact that the vast majority of the trajectories suffer from the volatility drag, is because a few exceptional trajectories have such high returns, they are the outliers, so that the average can be pulled back to the deterministic case.
Something that is already visible in our little game at the beginning. The only winning trajectory has such a huge payoff that it offsets all the losses in the other trajectories.
Showing that requires some knowledge of the log-normal distribution.
And here's the definition.
A random variable X is said to follow a log-normal distribution with parameters μ and σ if ln X follows the normal distribution with mean μ and variance σ². It's important to note that μ and σ are not the mean and standard deviation of the log-normal distribution itself.
Some of the numerical characteristics of the log-normal distribution are listed in appendix two, in which we also derive these formulas. The details are in the slides. And from those formulas in appendix two, we can show that the expected value of S_t is the same as the deterministic growth.
And in fact, the median is the typical path. So, the moral of the story is, when analyzing the behavior of a stock, look at the median performance, not the average.
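The gap between the ensemble average and the typical path is easy to see in a Monte Carlo experiment. The sketch below samples S_T directly from the closed-form solution, assuming the example parameters used earlier (S_0 = 100, μ = 5%, σ = 20%, T = 5) and 100,000 paths; the sample size and seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_terminal_values(s0, mu, sigma, t, n_paths, seed=0):
    """Draw S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t) with W_t ~ N(0, t)."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n_paths)]


s0, mu, sigma, t = 100.0, 0.05, 0.20, 5.0
values = sample_terminal_values(s0, mu, sigma, t, 100_000)
sample_mean = statistics.fmean(values)
sample_median = statistics.median(values)
below_mean = sum(v < sample_mean for v in values) / len(values)

print(f"sample mean   = {sample_mean:.2f}  vs S0*exp(mu*t)             = {s0 * math.exp(mu * t):.2f}")
print(f"sample median = {sample_median:.2f}  vs S0*exp((mu-sigma^2/2)*t) = {s0 * math.exp((mu - 0.5 * sigma**2) * t):.2f}")
print(f"share of paths below the mean = {below_mean:.3f}")
```

The sample mean tracks the deterministic growth curve, the sample median tracks the drag-adjusted curve, and more than half of the paths finish below the mean.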
We can also derive the mode behavior and the probability of exceeding the mean.
Here Φ is the standard normal CDF, and as we can see, when t gets larger, it becomes harder and harder to beat the mean return.
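The slide formulas are not reproduced in the transcript, so the sketch below uses the standard log-normal results applied to ln S_t ~ N(ln S_0 + (μ − ½σ²)t, σ²t). Under that assumption the mean is S_0 e^(μt), the median is S_0 e^((μ − ½σ²)t), the mode is S_0 e^((μ − (3/2)σ²)t), and the probability of exceeding the mean is Φ(−½σ√t). The function names are illustrative.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gbm_mean(s0, mu, sigma, t):
    return s0 * math.exp(mu * t)


def gbm_median(s0, mu, sigma, t):
    return s0 * math.exp((mu - 0.5 * sigma**2) * t)


def gbm_mode(s0, mu, sigma, t):
    return s0 * math.exp((mu - 1.5 * sigma**2) * t)


def prob_exceed_mean(sigma, t):
    """P(S_t > E[S_t]); it does not depend on mu or S_0."""
    return std_normal_cdf(-0.5 * sigma * math.sqrt(t))


for years in (1, 5, 25, 100):
    print(f"t={years:3d}  P(S_t > mean) = {prob_exceed_mean(0.20, years):.3f}")
```

The probability starts at one half for t = 0 and decreases with both σ and t, which matches the remark that the mean return becomes harder to beat over longer horizons.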
That's it for today's video. If you enjoyed the content, please consider liking and subscribing. I've been having a lot of fun making this series. I hope you've been enjoying it as well. Thank you very much for watching, and I'll see you next time. Bye.