Shannon entropy, defined as H = -Σ Pᵢ log₂(Pᵢ), quantifies the average information content of a probability distribution and sets the fundamental limit for lossless data compression. For a sequence of n independent events drawn from a distribution with probabilities Pᵢ, about n × H bits are both needed and sufficient to store the data reliably when n is large. This result, Shannon's source coding (data compression) theorem, rests on the fact that the typical sequences (those that together occur with high probability) number only about 2^(nH), so they can be indexed with approximately nH bits while still allowing reliable decoding. This makes entropy the fundamental measure of information content in classical systems.
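As a small worked illustration of this bound (the three-symbol source, the codewords and n = 1000 are a hypothetical example, not from the lecture), the sketch below compares a fixed-length code with nH. For probabilities 1/2, 1/4, 1/4 the entropy is 1.5 bits per symbol. Because these probabilities are powers of 1/2, the prefix code a→0, b→10, c→11 happens to reach the bound exactly; for general probabilities, practical codes only approach it.

```python
import math

# Hypothetical source: three symbols with dyadic probabilities.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}

# Shannon entropy in bits: H = -sum p log2 p.
H = -sum(p * math.log2(p) for p in probs.values())

# A fixed-length code needs ceil(log2(number of symbols)) bits per symbol.
fixed_bits = math.ceil(math.log2(len(probs)))

# Illustrative prefix code: no codeword is a prefix of another.
code = {"a": "0", "b": "10", "c": "11"}
avg_len = sum(probs[s] * len(code[s]) for s in probs)

n = 1000  # number of independent symbols to store
print(f"H = {H} bits/symbol")
print(f"fixed-length code: {n * fixed_bits} bits; entropy bound nH: {n * H} bits")
print(f"prefix code, expected length: {n * avg_len} bits")
```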
Physics Foundation Series 2026:
regarded for his foundational research in quantum information theory, open quantum systems, quantum thermodynamics and quantum optics. In addition, the professor will deliver a special colloquium titled "Quantum battery: science fiction or reality?", exploring an exciting and rapidly developing frontier of quantum thermodynamics. I will unfortunately miss it, but I will try to join through YouTube. I would also like to express my sincere gratitude to our director, who has extended generous support to organizing this program, and to all the speakers for their willingness to contribute to it. I would also like to thank Vini and Money, who are sitting outside (you have actually seen them), all the other administrative and technical staff, and everyone who has been or will be involved in organizing this event. Most importantly, I thank all of you for your enthusiasm and interest in joining us for this week-long program.
Theoretical physics has always advanced through a careful balance between mastering established ideas and questioning their limitations. It is our hope that this series will not only strengthen your understanding of the fundamentals but also give you a glimpse of how these ideas continue to shape modern research. For some of you, I hope, this might even mark the beginning of a future career in theoretical physics. I hope this coming week will be intellectually stimulating, enjoyable and inspiring for all of you. With that, let's begin the first lecture series. It is now my pleasure to invite Professor Shibbashi go to deliver his first lecture. Thank you.
Deep plug.
I don't need that to see this.
>> Then we switch up the board.
Okay. Am I audible?
>> Yes sir.
>> Okay. Good morning. What I'm going to talk about is quantum information theory.
Of course, that requires some knowledge of quantum mechanics, but the basics of quantum mechanics will be introduced later. Today I'm going to talk about the basics of classical information theory, and for that you don't need to know quantum mechanics. Okay.
So, information: I think all of you know what it is, because you read books, you watch documentaries, you look at the internet. Everywhere there is some information.
Now the question is: how will you quantify information?
That is the first question. Secondly, what would you like to call information, and what would you like to call data?
Okay.
For example, there are plenty of organizations that collect data, personal data or something else. Some organizations do market surveys and take data from the market, and so on. Of course this data has some information, but is all of that data useful?
So what we need to figure out is which information is useful and which is not, and we'll have to quantify that.
Let me give you two examples and try to tell you which one is useful and which one is not. These examples will also help us quantify the amount of information.
As I said, I'll talk about classical information today, and from the next class onwards I'll talk about quantum information.
So let me consider the following statements.
Please tell me from the back. Can you see this writing?
The sun will rise in the east.
Okay, this is statement number one.
Statement number two: it will rain tonight in Chennai.
Okay, I hope all of you can read these statements, right? So if I look at these two statements, the first one has some information, and the second one also has information. But let's see: the sun will rise in the east.
Okay. So does this statement contain any useful information?
What is your idea?
>> No, no information.
>> No useful information.
>> No useful information. Why?
>> You already know. It's a fact.
>> It's a fact. This happens every day.
So it does not add anything to our knowledge. Okay.
So this is a certain phenomenon, right?
The first one: the sun will always rise in the east. This is a certain phenomenon, so it has no useful information, right?
Look at this.
The second one: does it have any useful information?
>> Why?
>> Sorry, what?
>> I couldn't hear. Can you...
>> You can plan.
>> You can plan, of course. Of course this is useful, because in Chennai you don't get to see rain every day, every night, right? So this contains useful information. Whether it will rain tonight or not, I don't know, but the statement contains some useful information. So the first statement has no useful information, and the second one has useful information.
Okay.
So now the question is how to quantify useful information.
For example, if I look at rainfall at night in the city of Chennai, I will generally have some statistics on the probability that it will rain tonight in Chennai, right? This is how the weather department works. So there will be some probability of rain tonight in Chennai. If I look at that data, the statistics, I can figure out approximately what the probability of rain tonight in Chennai is. Okay.
Right. Let me take this.
So maybe, if I look at the statistics, maybe you can say... Yes?
>> Sorry, what?
>> Yes.
>> Yeah, because it happens every day.
Okay.
>> If the question itself is where the sun rises, then the first statement also...
>> Yes. But because it happens every day, it does not provide any useful information. What is useful information?
Useful information means there should be some amount of uncertainty.
Okay.
So maybe, if I look at the statistics, the probability of rain tonight is 1%, I don't know, or even less than that. Okay.
So let us take another statement: 2/3 of Earth's surface is water.
Let's see. Okay.
So, does it have any useful information?
Yes or no?
Sorry?
>> Again, it is a fact.
>> Okay, good. This fact generally does not change. It may change a little bit, but not much. Okay.
It is generally a fact, but I consider it for a different reason.
If I look at the fraction of the area of Earth's surface, 2/3 of it is water and 1/3 of it should be land, right? So here the probability of a water surface is $P(\text{water}) = 2/3$ and the probability of a land surface is $P(\text{land}) = 1/3$, approximately; it can vary a little bit. I hope you know what probability is; I'll come back to it in a bit more detail.
So this sentence basically provides me these two probabilities.
Okay.
Now let us consider another statement. Let me erase this.
Let me consider this statement: 2/3 of dogs in Chennai are street dogs.
Okay. Yes?
>> Sir, the statement that 2/3 of Earth's surface is covered by water: isn't that a certain statement? Why is it a probability?
>> Yes. Well, this is a fraction, and probability is basically a kind of fraction. You can say that if I arbitrarily choose some area of Earth's surface, it may be a water area or a land area.
What is the probability that it will be water?
Okay.
>> So that will be 2/3, right? In that sense this is a probability.
>> Okay.
Okay. So let us take this: 2/3 of the dogs in Chennai are street dogs. It may not be a correct statement, but let's say so.
Then the remaining 1/3 are domestic dogs, household dogs you can call them, right? Here again we have the same kind of statement: if I select a dog in Chennai, the probability that it is a street dog is 2/3 and the probability that it is a domestic dog is 1/3. So in terms of probabilities, if I look at statement number four and statement number three, is there any difference? Don't look at the statements now; just look at the structure of the probabilities. Is there a difference? No, they are the same. Okay. So when we talk about information theory, about quantifying information, it is these probabilities that matter and nothing else.
Is that clear? So from now on, which probability corresponds to which event will not matter. What matters in quantifying information is the structure, that is, the values of the probabilities. Okay.
So it doesn't matter whether we talk about dogs in Chennai, about Earth's surface, or about rainfall tonight in Chennai.
What matters when we quantify information is this probability structure.
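The claim that only the probability structure matters can be checked with the entropy formula H = -Σ Pᵢ log₂(Pᵢ) from the summary at the top. In this minimal Python sketch (the dictionaries and the helper name `entropy` are illustrative, not from the lecture), the Earth-surface and Chennai-dog statements carry different labels but identical probabilities, so they get the same value.

```python
import math

def entropy(probs):
    """H = -sum p log2 p in bits; outcomes with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two different statements with the same probability structure.
earth = {"water": 2 / 3, "land": 1 / 3}
dogs = {"street": 2 / 3, "domestic": 1 / 3}

# Only the probability values enter the calculation; the labels never do.
print(entropy(earth.values()), entropy(dogs.values()))
```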
Okay.
So what is information? If I want to quantify information, what is it?
Basically, I can quantify information from two directions.
One is: how much ignorance do I have about some phenomenon, some data, some information? If I can quantify that ignorance, that will be our quantification of information. From the other end, I can ask how much knowledge I have about some phenomenon, some data, some information; that is also a quantification of information.
Okay. So I can approach it from either side.
What will we do here? We will take the first method: quantifying information through your ignorance. Both are equivalent; it doesn't matter. So I'll take the quantification of information to be the quantification of my ignorance about that event, about that information.
Right?
Okay, good. Historically, this is what people have taken. But it doesn't matter; you can start from either of these sides, right? Yes?
>> Exactly. Exactly. So from the perspective of knowledge, useful information means you don't have full knowledge.
Take the first statement I wrote: the sun will rise in the east.
Do you have any ignorance about that event?
No. Therefore the information content should be zero, because your ignorance there is zero. Okay? Right?
Now let me make another statement: the sun will rise in the west.
Okay. If I make such a statement, do we have any ignorance about it?
What ignorance do I have? 100% ignorance?
No, no. This is a false statement. Okay.
This will never happen. The probability of the sun rising in the west is zero, right?
So you can say that the ignorance about the event that the sun will rise in the west is zero, because I know it will never happen. So what I want to say is that the information content of a certain event is zero, and the information content of an impossible event is also zero.
Right? Because in both cases you have no uncertainty.
You have no ignorance.
Right? Is it clear?
Okay. Now let us look at this statement, for example: 2/3 of Earth's surface is water.
Of course, if I choose a portion of Earth's surface anywhere you want, it will not be land with certainty, nor will it be water with certainty.
Right? What we have is only these probabilities: that portion of Earth's surface will be land with probability 1/3 and water with probability 2/3. Okay. So do we have some ignorance about this statement?
Ignorance in this sense: I give you a portion of Earth's surface. Without doing any experiment, without actually looking at it, can you tell me whether it is land or water? Can you tell it with certainty?
No. So you basically have ignorance about this fact, about this event. The question is how to quantify that ignorance. I said that for a certain event the amount of ignorance is zero, and for an impossible event the amount of ignorance is zero. Therefore their information contents are also zero, because you don't have any ignorance.
But for this kind of event, whose probabilities are neither zero nor one, you have some nonzero ignorance.
So the point is this. Take some event, like whether it will rain tonight in Chennai or not.
For such an event, my ignorance will be higher if the probability of occurrence of that event is lower. Do you agree with me?
The information content of an event, or the amount of ignorance about that event, will be higher if the probability of occurrence of that event is lower.
Do you agree with this?
Can you tell me why?
Because if an event is certain, then I have no ignorance.
Okay. So as soon as the event has a probability less than one, and as I decrease this probability, my ignorance increases. So let me write I(P) for the amount of ignorance, or the amount of information, whatever you want to call it, of an event whose probability of occurrence is P; here, for example, the probability of the surface being water is 2/3. Then you see that I(P) grows as 1/P grows: it is a decreasing function of P.
Is that clear?
Okay. So I(P) should increase with 1/P, because if I increase the probability P, the amount of ignorance decreases. Okay.
Sorry.
>> Yeah because it depends upon the probability. Yes it depends upon the probability. This is the probability.
Okay. Okay. So let me erase this part.
Right.
So now, for this probability P, you know that $0 \le P \le 1$, because it is a probability.
So the question is: what is the actual functional form of this information content? There are plenty of functions that behave like this.
Right?
Here come certain properties that we have to impose on this function.
First, the information content I(P) is a continuous function of P.
This is expected: if I continuously decrease the probability, the information content should increase continuously.
Okay. So this is the generic idea: it should be a continuous function of P.
Second, it is an additive function. If I have two independent events, that is, two events that do not influence each other, then the total information content of the two events together should be the sum of their individual information contents, because for independent events the ignorance simply adds up. So this is another property the function I(P) must have. In other words, let the probability of some event A be P(A) and the probability of another event B be P(B),
where A and B are independent.
They don't influence each other, so the probability that both A and B occur is P(A)P(B). Then the information content of A and B together should be $I(P(A)P(B)) = I(P(A)) + I(P(B))$.
Okay. This is the additive property.
Right.
So if I impose these two assumptions on I(P), the amount of ignorance, then one can prove a result, and this is what Claude Shannon did. By the way, the classical information theory that I'm talking about was developed by Claude Shannon.
He published it in 1948 in a paper called "A Mathematical Theory of Communication", and a year later it appeared as the book "The Mathematical Theory of Communication". What I'm talking about comes from that work. Okay.
Right. And Claude Shannon was an electrical engineer.
Okay.
>> Yes.
Regarding.
>> Yes.
>> Suppose I have a lattice, and I have the probability of the particle being at any one of the positions along x and the probability of the particle being at any one of the positions along y. Won't the probability of the combined thing be multiplicative?
>> Yeah, that is multiplicative. But the question is whether these are independent events. When you talk about the particle, it can move along the x direction or along the y direction; it cannot move along both x and y simultaneously, right? It will depend on the constraints on the particle.
>> No, no. A classical particle: how can it move in both directions at the same time? Okay.
>> Suppose I look at it after some time.
>> Yes.
>> Even then, classically you can have only one trajectory.
You cannot have more than one trajectory at a time.
>> Yeah, I agree. But then the probability of the particle being at any one of the lattice sites...
>> Yes, it will be a product, but these events are not independent in this case.
>> But it can move...
>> It can move independently, that is true, but if you look at the entire picture, it will not be independent. Here is what I'm talking about. Let's say I toss a coin; I'll have some probability, right? And then I take another event, not about tossing a coin. When I talk about independent events, I mean that the occurrence of event A in no way influences the probability of occurrence of event B, and vice versa.
In the case you have given, one does influence the other: if the particle goes along the x-axis for a particular time, it cannot go along the y-axis.
>> Yes.
>> Yes, I know; that is the probability product.
>> That is the probability product, but in your case the two are not independent: you are doing both things in the same setting. If I toss a coin and throw a die, the joint probability is of course the product, and those two events are independent. That is exactly the situation here. Maybe the notation "I of A and B" is confusing; what I mean is the information content of both A and B occurring. For independent events the probability of that is P(A)P(B), and it is this product whose information content must equal $I(P(A)) + I(P(B))$.
>> Sorry.
>> No, it cannot go to zero. How can it go to zero?
The point is, when I talk about events being independent, it is not linear independence. Independent means that, whichever way you look at them, neither event influences the other.
Okay. So by using these two properties, continuity in P and additivity, one can prove, and in fact this is what Shannon showed, that I(P) has the functional form $I(P) = \log_2 \frac{1}{P}$, the logarithm of 1/P to base 2. Now, this form is not unique: it is fixed only up to a multiplicative constant, which amounts to choosing the base of the logarithm. (An additive constant is not allowed: applying additivity to two certain events, $P(A) = P(B) = 1$, forces $I(1) = 0$.)
Okay. But here I'm not considering that constant.
So I'm just taking this form.
So from now on, $I(P) = \log_2 \frac{1}{P}$. Why base 2? Because whatever classical information you can think of, I can always write it as a string like 0110...: whatever information you tell me, I can digitize it and write it as a string of zeros and ones. Since there are only two values, 0 and 1, that is, a bit, the base of the logarithm comes out as 2. Okay.
So we say that $\log_2(1/P)$ is the information, measured in bits, corresponding to the probability P.
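A minimal Python sketch of the self-information $I(P) = \log_2(1/P)$ (the function name `self_information` is our own). It rejects P = 0, where $\log_2(1/P)$ diverges, and checks the additivity property on a fair coin and a fair die, which are assumed independent.

```python
import math

def self_information(p):
    """I(p) = log2(1/p): information content, in bits, of an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return math.log2(1 / p)

print(self_information(1.0))    # certain event
print(self_information(0.5))    # heads on a fair coin
print(self_information(1 / 8))  # three heads in a row on a fair coin

# Additivity: a fair coin and a fair die are independent, so the joint
# probability is the product and the information contents add up.
p_coin, p_die = 1 / 2, 1 / 6
joint = self_information(p_coin * p_die)
separate = self_information(p_coin) + self_information(p_die)
print(joint, separate, math.isclose(joint, separate))
```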
Okay.
Right. So suppose I have events: $A_1$, which occurs with probability $p_1$; $A_2$, which occurs with probability $p_2$; and so on up to $A_n$, which occurs with probability $p_n$. Think of coin tossing: I toss a coin, and the probability of heads is, let's say, p and the probability of tails is 1 - p, because the probabilities must add up to one, $\sum_{i=1}^{n} p_i = 1$, as they do here. In this case $I(p_1) = \log_2(1/p_1)$, $I(p_2) = \log_2(1/p_2)$, and so on. Now let us look at the average information content.
The average information content of the set of all events will be $\sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$. Do you agree with this?
Right? Can you see this one from the back?
For each event this is the information content, and each event occurs with probability $p_i$. Therefore the average information content is this.
Okay?
Right? Is that clear?
And this is what is known as the Shannon entropy,
the Shannon entropy of this probability distribution.
So this is the probability distribution:
$p_1, p_2, \dots, p_n$. As I said, from now on we will not care what the particular event is or what its name is; we only look at the probabilities. So if somebody tells me "this is your probability distribution", then the information content of that probability distribution is this quantity. Okay.
This is the notation we'll use: $H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log_2 p_i$. Do you agree with that? If I take $\log_2(1/p_i)$, it is $\log_2 1 - \log_2 p_i$, and $\log_2 1$ is zero, so you get this minus sign. So this is what is known as the Shannon entropy, and it is the information content of this probability distribution, right?
Okay good.
So now, what about the information content of a certain event, like "the sun will rise in the east"? What is the information content? Zero. Because the probability that the sun will rise in the east is one, and the remaining probability, that the sun will not rise in the east, is zero. If I put these into the formula, the first term is $1 \cdot \log_2 1 = 0$, and the second term is zero by the convention $0 \log_2 0 = 0$ (the limit of $p \log_2 p$ as $p \to 0$). So the entropy is zero.
Okay. So for a certain event the Shannon entropy is zero. What about the Shannon entropy of the statement that the sun will rise in the west? Again zero, right? Because the probability that the sun will rise in the west is zero.
Okay?
Right? So for a certain event or an impossible event, the Shannon entropy is zero.
Okay. Right.
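The entropy formula, together with the $0 \log_2 0 = 0$ convention, can be sketched in Python as below. The helper name is hypothetical, and it assumes the probabilities are given as a finite list. Each term is written as $p \log_2(1/p)$, which is the average of I(p) over the distribution.

```python
import math

def shannon_entropy(probs):
    """H(p_1, ..., p_n) = sum_i p_i log2(1/p_i) = -sum_i p_i log2 p_i, in bits.

    Outcomes with p_i = 0 are skipped, i.e. the convention 0 * log2(0) = 0.
    """
    probs = list(probs)
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# "The sun will rise in the east": certain, [P(rises in east), P(does not)] = [1, 0].
print(shannon_entropy([1.0, 0.0]))
# "The sun will rise in the west": impossible, [P(rises in west), P(does not)] = [0, 1].
print(shannon_entropy([0.0, 1.0]))
# A fair coin, for comparison.
print(shannon_entropy([0.5, 0.5]))
```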
What is the maximum value this can have?
Whatever the probability distribution over n outcomes, what is the maximum value this can have? That is, what is the maximum ignorance you can have?
When each event is equally probable, right? That happens when $p_1 = p_2 = \dots = p_n = 1/n$. So $H(p_1, \dots, p_n) \le \log_2 n$. How do you get that? Just put $p_i = 1/n$ here: $-\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n$. So this is the maximum you can have. Now take, for example, Earth's surface: is a given spot water or land? Okay.
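A quick numerical check of the bound $H \le \log_2 n$, as a sketch with an arbitrarily chosen n = 6 and random distributions from a fixed seed (both assumptions for illustration): the uniform distribution reaches $\log_2 n$, and the random ones stay below it.

```python
import math
import random

def shannon_entropy(probs):
    """Shannon entropy in bits, using the convention 0 * log2(0) = 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

n = 6  # illustrative number of outcomes
uniform = [1 / n] * n
print("uniform:", shannon_entropy(uniform), "log2(n):", math.log2(n))

# Randomly generated distributions over the same n outcomes stay at or below log2(n).
rng = random.Random(0)
for _ in range(5):
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    probs = [w / total for w in weights]
    print(round(shannon_entropy(probs), 4), "<=", round(math.log2(n), 4))
```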
For that example, the probability of land is 1/3 and the probability of water is 2/3.
So what is the information content of this? Can you tell me? It will be $-\frac{1}{3} \log_2 \frac{1}{3} - \frac{2}{3} \log_2 \frac{2}{3}$. You can figure this out. Is it a negative quantity or a positive quantity?
Positive. How much will it be?
Will it be greater than one or less than one?
>> Less than one.
>> So whenever you have only two possibilities, with probabilities p and 1 - p, it's called binary. This is the binary Shannon entropy: $h(p) = -p \log_2 p - (1-p) \log_2 (1-p)$.
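For the Earth-surface example the two terms of the sum can be evaluated directly. This short Python sketch (variable names are ours) shows that each term is positive, because $\log_2 p < 0$ for $0 < p < 1$, and that the total is about 0.918 bits, below the 1 bit reached by two equally likely outcomes.

```python
import math

# Earth's surface example: land with probability 1/3, water with probability 2/3.
probs = [1 / 3, 2 / 3]

# Each term -p log2 p is positive because log2 p < 0 when 0 < p < 1.
terms = [-p * math.log2(p) for p in probs]
H = sum(terms)

print([round(t, 4) for t in terms])
print(round(H, 4), "bits, which is less than log2(2) = 1 bit")
```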
If I plot it, p is on the horizontal axis and $h(p)$ is on the vertical axis.
Okay. So here is zero, and here is one.
This is also one, on the vertical axis.
And this is p = 1/2.
I know that at p = 1/2 this is the maximum, which is $\log_2 2 = 1$. So it is zero at p = 0, increases to its maximum of one at p = 1/2, and at p = 1 it is again zero. This is the plot of the binary Shannon entropy. Okay.
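The shape of the curve described above can be reproduced with a crude text plot. This is a sketch; `binary_entropy` is a hypothetical helper, and the 0.1 grid and 40-character bar width are arbitrary choices.

```python
import math

def binary_entropy(p):
    """h(p) = -p log2 p - (1 - p) log2(1 - p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# One row per value of p; the bar length is proportional to h(p).
for k in range(11):
    p = k / 10
    h = binary_entropy(p)
    print(f"p = {p:.1f}  h = {h:.3f}  " + "#" * round(40 * h))
```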
Okay.
Okay. Good.
Yes.
>> There will be a minus.
>> Yeah, there is no minus.
Where is the minus?
No, no. In what I have drawn, there is no minus.
>> Ah, sorry. Yes, this should be log 2.
Yes, sorry. The base is 2.
Right. There's no minus, because this is a positive quantity. Yes. Okay.
Good.
So now let us try to interpret this Shannon entropy. I said that this is the information content. Let us try to interpret it.
Let us say I have a coin, a biased coin. The probability of heads is p and the probability of tails is 1 - p; p = 1/2 means it is an unbiased coin. Now let us toss this coin again and again, or you can take n such coins, each with this probability distribution, and toss them simultaneously. It's up to you.
So I toss this coin n times independently. For example, the first toss may be heads; call it $H_1$, meaning heads in the first run. The next one may again be heads, $H_2$. The third one may be tails, $T_3$.
The fourth one may be heads, $H_4$, and so on. So I am tossing the coin again and again independently, and the outcomes may look like this. They may also look like $T_1 T_2 H_3 T_4 \dots$, and so on. So how many such possibilities will there be if I do it n times?
Two to the power n, right? So there will be $2^n$ possibilities.
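For small n the $2^n$ strings can simply be listed. A minimal Python sketch with n = 3, an arbitrary choice so that the output stays short:

```python
from itertools import product

n = 3  # small n so the full list can be printed
# Every outcome of n independent tosses, written as a string of H (heads) and T (tails).
strings = ["".join(outcome) for outcome in product("HT", repeat=n)]
print(strings)
print(len(strings), "strings; 2**n =", 2 ** n)
```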
You can think of each outcome as a string: this is a string of heads and tails, and this is another string of heads and tails.
Okay. So what is the probability that this particular string will occur? I said I toss the same coin independently. What is the probability of such a string?
Sorry? 1/2^n?
Is that the case?
>> This is not an unbiased coin. This is a biased coin.
So it depends on how many heads and how many tails there are, right?
Okay. So if in this string there are m heads and, of course, n - m tails, what will the probability be? Since the tosses are independent and the probability of heads is p, it will be $p^m (1-p)^{n-m}$. Do you agree with me?
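The probability of a given string depends only on its number of heads m. This minimal sketch, with an illustrative bias p = 0.4 and strings of length 4 (both assumptions), also checks that the probabilities of all $2^n$ strings sum to one even though the strings are not equally likely.

```python
from itertools import product

def string_probability(s, p):
    """Probability of one specific string of independent tosses with P(heads) = p.

    With m heads among n = len(s) tosses this is p**m * (1 - p)**(n - m).
    """
    m = s.count("H")
    return p ** m * (1 - p) ** (len(s) - m)

p, n = 0.4, 4  # illustrative biased coin and string length
print(string_probability("HHHH", p), string_probability("TTTT", p))

# Strings with the same number of heads are equally likely...
print(string_probability("HTTT", p) == string_probability("TTHT", p))
# ...and the probabilities of all 2**n strings add up to one.
total = sum(string_probability("".join(s), p) for s in product("HT", repeat=n))
print(total)
```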
Okay. So this is the probability. The strings are not all equally probable.
They occur with this kind of probability structure,
right? Because it is not an unbiased coin.
Okay, good. So this is fine.
Now let us assume that n, the number of tosses, is very large: $n \gg 1$.
Then how many times would you expect heads to appear?
If n is very large, how many heads would you expect?
Roughly...
>> Sorry?
>> p times...
>> A fraction p of the total number of tosses, which is np, right? So np is the number of heads in a typical sequence, a typical string, and similarly n(1 - p) is the number of tails in a typical string. Is that clear? And this is actually how we figured out that the probabilities are p and 1 - p: I keep tossing the coin many, many times, then look at how many times heads has occurred and how many times tails has occurred, and from this frequentist approach you can figure out the probabilities. So if n is large, this is what you will have. Now, do you think this happens in every such string? Out of these $2^n$ strings, do you think this will happen in every one?
Yes or no?
Is the question clear? Typically, of course, if you look at a string, this is its structure. But is it the structure of every string?
No. Why not?
Huh? Sorry?
>> Yes. If there are n heads, then the probability will be $p^n$; if there are n tails, it will be $(1-p)^n$. So not every sequence will have this typical structure.
Is that clear?
Okay.
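To put rough numbers on how rare such atypical strings are, here is a sketch with an illustrative coin of P(heads) = 0.6 and n = 100 tosses; these values, and the window of ±10 heads around np, are arbitrary choices for illustration. The single all-heads string is astronomically unlikely, while head counts near np carry almost all of the probability.

```python
from math import comb

p, n = 0.6, 100  # illustrative: P(heads) = 0.6, 100 tosses

# The single all-heads string: every toss must come up heads.
p_all_heads = p ** n

# Total probability that the number of heads m lies within +-10 of np = 60,
# summing the binomial terms C(n, m) p^m (1 - p)^(n - m) for m = 50, ..., 70.
p_near_np = sum(comb(n, m) * p ** m * (1 - p) ** (n - m) for m in range(50, 71))

print(f"P(all heads)         = {p_all_heads:.2e}")
print(f"P(50 <= heads <= 70) = {p_near_np:.4f}")
```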
Okay.
But how many of these strings will have this structure?
I understand that not all of these $2^n$ strings will have this structure.
But the next question is: how many of them will have this structure?
Sorry?
>> Yes, that is true. But my question is whether such a structure will occur in most cases or not.
So there are $2^n$ strings.
Okay.
If I choose a particular string, will it have such a structure or not?
If a majority of these strings violated such a structure for large n, would you expect to have this probability distribution?
When I say this structure: because n is large, there can be small deviations, and that is fine. It may not be exactly np; it may be $np \pm \epsilon$, and that is okay. So the question is, if I randomly choose a sequence from these $2^n$, will it have such a structure or not? Or let me pose it in a different way.
Suppose that for a randomly chosen sequence from these $2^n$ sequences this structure gets violated, and violated in a big way, not just $np \pm \epsilon$ (I'm not calling that a violation). If it violates this structure, will you get this kind of probability distribution? Okay, let me ask you another question.
How do you know that, for a biased coin, the probability of heads is, let's say, 40% and the probability of tails is 60%? How do you know that?
Yes: you repeat the experiment under independent conditions many, many times and take the frequencies, and in the limit of repeating the experiment many times you get these probabilities, right?
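The frequentist estimate described here can be simulated. This sketch assumes the 40% heads example, a fixed random seed and a helper name of our own; as n grows, the observed fraction of heads settles near p.

```python
import random

def head_fraction(n, p, seed=0):
    """Toss a coin with P(heads) = p independently n times; return the fraction of heads."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

p = 0.4  # the 40% heads example
for n in (10, 100, 10_000, 1_000_000):
    print(n, head_fraction(n, p))
```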
So if such a structure were grossly violated, you would never get this kind of probability distribution.
Okay. This is why we say that if I choose a sequence, typically it will have this structure.
When I say np, I mean np ± ε is fine. So typically a sequence will have this structure, and those sequences are called typical sequences.
So among these 2ⁿ sequences, or strings, those strings that have this property are called typical sequences.
Typical sequence, typical string, whatever you want to call it.
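The lecture's notion of a typical string (about np heads, up to ±ε) can be written as a simple test on the head count. This is a sketch: the tolerance `eps` and the "H"/"T" string encoding are assumptions for illustration.

```python
def is_typical(sequence: str, p_head: float, eps: float) -> bool:
    """Return True if the fraction of 'H' in the sequence lies within eps of p_head.

    This is the 'np plus or minus epsilon' condition: about n*p heads and
    n*(1 - p) tails.
    """
    if not sequence:
        raise ValueError("sequence must be non-empty")
    heads = sequence.count("H")
    return abs(heads / len(sequence) - p_head) <= eps


print(is_typical("HHHHTTTTTT", p_head=0.4, eps=0.05))  # True: 4 heads out of 10
print(is_typical("HHHHHHHHHH", p_head=0.4, eps=0.05))  # False: all heads
```

Textbooks often define typicality instead by requiring -(1/n) log₂ P(x) to lie within ε of the entropy; for a two-outcome coin with p ≠ 1/2 that condition is equivalent to this head-count test up to a rescaling of ε.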
Okay good.
Now, the next question. I choose a string from these 2ⁿ strings.
What is the probability that it will be a typical string? We agreed that not all of them are typical, right? So what is the probability that it will be a typical string?
Any answer?
It amounts to asking how many of these sequences have this structure, meaning np of the tosses are heads and n(1 - p) are tails. So the question is: how many of them are typical sequences? How do I figure that out?
>> All the sequences?
>> Yes, because there are a very large number of them, and there is a possibility that all of them are heads, which might violate this. But when all of them happen to be heads, you can just change the value of p and substitute it there. Then p is 1 and 1 - p is 0, and np gives n heads and zero tails. So it seems that every sequence obeys this kind of structure.
>> No. It is my given knowledge that heads occurs with probability p and tails with probability 1 - p. I cannot change that; it is a given fact. Based on that I toss the coin, and during the tossing this probability structure does not change. Given that, if I toss it many times, with n very large, then of course there are sequences where all of them are heads. But almost all of them, say 95% or whatever, are not like that, because that would be a violation of the structure.
>> But in that case p and 1 - p change instantly for every sequence.
>> How can they change? This is what I said: it is a given fact that for this coin the probability of heads is p and of tails is 1 - p. That fact will not change.
>> Okay, but then I run into a problem. Suppose I have 100 tosses, and let us assume 100 is big enough, since we cannot take infinities. Say I get heads 60% of the time and tails 40% of the time, so it is a biased coin. Now if I toss it 100 times again, you say there is a chance that all of them might be heads. But then the initial probability changes: initially I had 60:40, and now I have a case where all 100 of them are heads.
>> The thing is that the number of tosses has to be very, very large.
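So the point is: how many of them have this structure? What is the number? It is the number of ways of choosing which np of the n tosses are heads, the binomial coefficient n! / [(np)! (n(1 - p))!].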
For large n you have to use Stirling's approximation here. Stirling's approximation tells you that for very large n, ln n! ≈ n ln n - n; in base 2 this reads log₂ n! ≈ n log₂ n - n log₂ e. Right?
Okay. So this is what is known as Stirling's approximation.
So you use it here. Then log₂ of the number of typical sequences is approximately [n log₂ n - n log₂ e] - [np log₂(np) - np log₂ e] - [n(1 - p) log₂(n(1 - p)) - n(1 - p) log₂ e]. Can you see this here? The linear terms cancel, because np + n(1 - p) = n.
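A quick numerical check of Stirling's approximation in base 2, as a Python sketch; `math.lgamma(n + 1)` gives ln(n!) without forming the huge integer n!. The relative error shrinks as n grows (the neglected term is about ½ log₂(2πn)).

```python
import math


def log2_factorial(n: int) -> float:
    """Exact log2(n!) via the log-gamma function, since n! = Gamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(2)


def log2_stirling(n: int) -> float:
    """Leading-order Stirling estimate in base 2: log2(n!) ~ n log2 n - n log2 e."""
    return n * math.log2(n) - n * math.log2(math.e)


for n in (10, 100, 1000, 10**6):
    exact, approx = log2_factorial(n), log2_stirling(n)
    print(n, exact, approx, (exact - approx) / exact)  # last column: relative error
```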
Okay.
So if you do that, what you get is n·H(p, 1 - p), where H(p, 1 - p) = -p log₂ p - (1 - p) log₂(1 - p) is the binary Shannon entropy I wrote down earlier. So, for large n, log₂ of this number is approximately n times the binary (two-outcome) Shannon entropy.
Okay.
So what, then, is this number?
You see that n! / [(np)! (n(1 - p))!] ≈ 2^(n·H(p, 1 - p)).
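The approximation can be checked numerically. A Python sketch, assuming p = 0.4 and rounding np to the nearest integer (the helper names are illustrative): it compares (1/n) log₂ of the binomial coefficient with the binary entropy H(p, 1 - p).

```python
import math


def binary_entropy(p: float) -> float:
    """H(p, 1-p) = -p log2 p - (1-p) log2(1-p), with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def log2_binomial(n: int, k: int) -> float:
    """log2 of n! / (k! (n-k)!) via log-gamma, so large n does not overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)


p = 0.4
for n in (10, 100, 1000, 100_000):
    k = round(n * p)  # number of heads in a typical string
    print(n, log2_binomial(n, k) / n, binary_entropy(p))
```

The first column approaches H(0.4) ≈ 0.971 from below; the gap is a correction of order (log₂ n)/n.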
Is that clear?
Okay. So this is the number of typical sequences.
Okay.
So now the question is: I have 2ⁿ such sequences, and I choose one arbitrarily. What is the probability that it will be a typical sequence?
Is this question clear? I have 2ⁿ sequences, out of which this many are typical. I choose a sequence at random from the set of 2ⁿ sequences. What is the probability that it will be a typical sequence?
Sorry.
>> This divided by 2ⁿ, right? That is the probability of being typical.
>> Okay, good. Let me erase this part.
So the probability that a sequence chosen this way is typical is
2^(n·H(p, 1 - p)) / 2ⁿ.
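As a worked equation, assuming as in the question that the string is picked uniformly from all 2ⁿ strings (every string equally likely, regardless of the coin):

```latex
P(\text{typical}\mid\text{uniform pick}) \approx \frac{2^{nH(p,1-p)}}{2^{n}} = 2^{-n\,[1-H(p,1-p)]}
```

For p ≠ 1/2 we have H(p, 1 - p) < 1, so this goes to zero exponentially in n. For example, with p = 0.4, H ≈ 0.971, and at n = 1000 the fraction is about 2^(-29), roughly 2 × 10⁻⁹. The typical strings are a tiny fraction of all strings, even though they carry almost all of the probability under the biased coin itself.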
Right? This is the probability.
Okay.
Right. Okay, good. So what is this good for, the fact that there are this many typical sequences?
Suppose I now ask you the following question: what is the total probability of the typical sequences? There are 2ⁿ sequences, and this many of them are typical. So what is the total probability of the typical sequences?
This means you take the collection of typical sequences and ask for the probability of that collection, computed with the coin's own probabilities p and 1 - p. What happens is that this total probability is approximately one for large n. So the total probability of the typical sequences is close to one, and it gets closer and closer to one as you increase n.
So what does this mean? It means that for a very large number of tosses, the sequence the coin actually produces will almost certainly be a typical sequence. Right?
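This concentration can be checked numerically. A Python sketch, assuming p = 0.4 and a tolerance ε = 0.02 on the head fraction (both arbitrary choices; `typical_set_probability` is a hypothetical helper): it adds up the coin's probability for every string whose head fraction is within ε of p.

```python
import math


def typical_set_probability(n: int, p: float, eps: float) -> float:
    """Total probability, under a coin with 0 < p < 1, of all length-n strings
    whose head fraction k/n lies within eps of p.

    There are comb(n, k) strings with k heads, each with probability
    p**k * (1 - p)**(n - k), so we sum over the allowed k.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) <= eps:
            # work in logarithms so that large n does not underflow
            log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                        + k * math.log(p) + (n - k) * math.log(1 - p))
            total += math.exp(log_term)
    return total


for n in (10, 100, 1000, 10_000):
    print(n, typical_set_probability(n, 0.4, 0.02))
```

The printed probability climbs toward 1 as n grows (for n = 10 000 it exceeds 0.999).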
Okay. Is that clear?
>> So how do you reach that?
>> You have to take the total number of typical sequences and multiply it by the probability of occurrence of each such sequence.
Okay, I am not going through the mathematics, but this is what you will see. The point is that if I generate a sequence for very large n, it will in general be a typical sequence.
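The skipped step can be written out at the level of exponents (the standard leading-order argument; "≈" hides factors that grow more slowly than exponentially in n). A typical string has about np heads and n(1 - p) tails, so each one has the same probability, and multiplying by the number of typical strings gives:

```latex
\begin{aligned}
P(x) &= p^{np}(1-p)^{n(1-p)} = 2^{\,n\,[\,p\log_2 p + (1-p)\log_2(1-p)\,]} = 2^{-nH(p,1-p)},\\
P(\text{typical set}) &\approx 2^{nH(p,1-p)} \times 2^{-nH(p,1-p)} = 1 .
\end{aligned}
```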
Sequences like HHH…H or TTT…T are called atypical sequences; they will essentially not happen. And why is this important? Let me tell you why this fact is important. Suppose I have to store data. What is the data? The results of tossing this coin, which I want to store on a computer or whatever you want to call it. Let's say I store H as the bit 0 and T as the bit 1. So I want to store the data about the strings.
How many strings' worth of data will I have to store? Do I have to store all 2ⁿ strings, or can I do it with less storage space? That is the question. If you take n to be very large, 2ⁿ is much, much larger, and on our computers, however super a computer you take, we don't have that much storage. So the question is: among these 2ⁿ sequences, which ones should I store and which ones can I neglect?
Basically, I should store only the typical sequences and neglect the remaining ones.
What does that mean? The atypical sequences are the ones that do not have the property that np of the tosses are heads and n(1 - p) are tails; the typical property is not there.
Those are the atypical sequences, and I will not store them. Why? Because for large n the probability that a sequence is atypical is very, very small, so I don't have to consider them. If I store only the information about the typical sequences, I can retrieve it later. So I store this many sequences, the typical ones, and forget the remaining ones; later I can retrieve the data exactly, provided nothing happens to the storage in between.
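A toy version of this storage scheme in Python, as a sketch under stated assumptions: n = 16 tosses, p = 0.25 and a tolerance eps = 0.07 on the head fraction, all chosen only so the example runs instantly. The helper names `build_codebook`, `encode` and `decode` are hypothetical. Every typical string gets an index in a fixed list and only the index is stored; atypical strings are not in the list, so `encode` raises `ValueError` for them (that is the "neglect" step).

```python
import math
from itertools import product


def build_codebook(n: int, p_head: float, eps: float) -> list[str]:
    """List every length-n string over {'H', 'T'} whose head fraction is within eps of p_head.

    The list order is fixed, so a string's position in it can serve as its code.
    """
    book = []
    for letters in product("HT", repeat=n):
        s = "".join(letters)
        if abs(s.count("H") / n - p_head) <= eps:
            book.append(s)
    return book


def encode(s: str, book: list[str]) -> str:
    """Store a typical string as the fixed-width binary index of its position in the codebook."""
    width = max(1, math.ceil(math.log2(len(book))))
    return format(book.index(s), f"0{width}b")


def decode(bits: str, book: list[str]) -> str:
    """Recover the original string from its stored index."""
    return book[int(bits, 2)]


n, p = 16, 0.25
book = build_codebook(n, p, eps=0.07)
original = "HTTTTHTTTTTHTTTH"  # 4 heads out of 16: typical
code = encode(original, book)
print(len(book), "typical strings;", len(code), "bits instead of", n)
assert decode(code, book) == original
```

Here the codebook holds 6748 strings, so 13 bits suffice instead of 16, in line with n·H(0.25) ≈ 16 × 0.811 ≈ 13. At such small n the saving is modest; it grows with n, and listing the codebook explicitly is only feasible for toy sizes.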
Okay.
Now the question is: does it help? Compared with 2ⁿ, how big is this number? If this number were very close to 2ⁿ, we would gain no advantage. So the question is how big this number is when n is very large, because I am talking about the large-n asymptotic limit.
As you can figure out from the plot I showed, H(p, 1 - p) is zero at p = 0 and at p = 1, and it equals one at p = 1/2. So this number equals 2ⁿ only when p = 1/2; for all other values of p, H(p, 1 - p) is less than one. Say H(p, 1 - p) is 0.8. Then for large n the number of typical sequences is 2^(0.8n), which is only a fraction 2^(-0.2n) of all 2ⁿ strings.
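To see the shape of that curve numerically, a short Python sketch tabulating the binary entropy (the list of p values is arbitrary):

```python
import math


def binary_entropy(p: float) -> float:
    """Binary Shannon entropy H(p, 1-p) in bits; the endpoints give 0 by the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


for p in (0.0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.9, 1.0):
    print(f"p = {p:4.2f}   H = {binary_entropy(p):.3f}")
```

H is symmetric about p = 1/2 and below 1 everywhere else; for p = 0.1 it is about 0.469, so roughly 0.47n bits would suffice instead of n.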
>> No. For the total probability of the typical sequences, you take the number of typical sequences multiplied by the probability of an individual typical sequence, because they are (approximately) equally probable. That does not give you zero; it gives you one as n goes to infinity.
Okay. So, as I said, H(p, 1 - p) is the binary (two-outcome) Shannon entropy.
So we learned that this is the amount of information.
Okay.
Now here is another interpretation.
The interpretation is this. Suppose I want to store the data of a long sequence of zeros and ones (heads or tails, whatever you call them) on a computer or wherever. Here there are many possible sequences, and each sequence is also long.
The question is: what length of bit string will be required? If I know that the probability of a 0 is p and the probability of a 1 is 1 - p, then the length of the bit string required to store a sequence of length n is n·H(p, 1 - p), because, as I said, you only have to store the typical sequences. So n·H(p, 1 - p) is the number of bits you need to store the typical sequences. And per symbol, how many bits? Just H(p, 1 - p), because the total length was n.
So per symbol you need H(p, 1 - p) bits, and most of the time this is a fraction. So in terms of data storage you have this advantage: rather than writing down every sequence in full, n·H(p, 1 - p) bits are enough. In this sense you can interpret the Shannon entropy as the number of bits per symbol that you need.
That is, for large n, storing the data of an n-bit string requires a bit string of length n·H(p, 1 - p). Is this clear? To store the data of an n-bit string of zeros and ones, you need n·H(p, 1 - p) bits, and you can later decode them reliably.
So in this sense the Shannon entropy also describes data compression: you are compressing the data, because H is generally less than one. You can think of the Shannon entropy as the limit of data compression for storing information while still being able to decode it reliably.
Okay, good. So this is what is known as Shannon's data compression theorem.
So whatever we talked about is basically Shannon's data compression theorem.
Shannon's data compression theorem tells you that if you want to store information about an n-bit string, where n is large and the symbols occur with these probabilities, then n·H(p, 1 - p) is the length of bit string you require to store the data reliably, so that you can later decode and retrieve it. I told you this for the two-outcome (binary) case, but it is true in general.
So suppose you have an event with several possible outcomes: x₁ occurs with probability p₁, x₂ with probability p₂, and so on. Instead of a binary event you can have, say, d outcomes (I use d here so as not to confuse it with the number of repetitions n), with x_d occurring with probability p_d, so that Σᵢ pᵢ = 1 (summing i from 1 to d). If this is the case, you may want to store information about the values obtained. Maybe I should introduce this first: are you familiar with random variables?
Heads and tails also define a random variable X, which takes the value "head" with probability p and "tail" with probability 1 - p. This is a generalization of that: X = x₁ with probability p₁, X = x₂ with probability p₂, and so on. Now I do the experiment, just like tossing a coin: the first time, the second time, the third time, and so on, n times in total, and each run is independent.
This is important.
Then the outcomes in the sequence are independent, and the probability of a whole sequence is the product of the individual probabilities.
Then you can again ask: what length of bit string do we require to store the data?
The same argument goes through; I am not going to repeat the counting of the typical sequences. The result is that if I want to store data about n independent random variables of this kind, the total number of bits required is about n·H(X), with H(X) = -Σᵢ pᵢ log₂ pᵢ, and you cannot do better.
You achieve this by ignoring the sequences that are not useful for your problem: atypical sequences are redundant.
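For the general case, a Python sketch of H(X) = -Σ pᵢ log₂ pᵢ and the resulting bit count of about n·H(X). The four-outcome distribution is an arbitrary example chosen so that the logarithms are exact, and the helper names are illustrative; `compressed_length_bits` returns the asymptotic estimate from the typical-sequence argument, not the output of any particular compressor.

```python
import math


def shannon_entropy(probs: list[float]) -> float:
    """H = -sum_i p_i log2 p_i in bits; zero-probability outcomes contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(q * math.log2(q) for q in probs if q > 0)


def compressed_length_bits(n: int, probs: list[float]) -> int:
    """Approximate number of bits needed for n independent draws: ceil(n * H)."""
    return math.ceil(n * shannon_entropy(probs))


four_outcomes = [0.5, 0.25, 0.125, 0.125]
print(shannon_entropy(four_outcomes))               # 1.75 bits per draw
print(compressed_length_bits(1000, four_outcomes))  # 1750, versus 2000 for a fixed 2-bit code
```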
The next thing I am going to talk about is channels. If you have a noisy channel, as we will see, you have to use extra bits to encode and decode the data.
For channels, the question is how to encode the data so that you minimize... Okay.
So let's take a break now and reconvene at 11:40.
This clock is five to seven minutes slow.
So we reconvene at 11:40 real time. Okay.
Thanks.
Oh, by the way, I did not mention this: I hope you all know Vini and Money, right? If you have any questions about reimbursement, certificates and so on, please contact them; they are outside. There was a lady and a man outside who were distributing these books and taking your signatures.
So just contact them for reimbursement, certificates or any other queries.
Okay. Thanks.
Okay. So the idea is to talk about transformations.
I'm going to talk about transformations.
So what are these transformations? What do they do? What do they transform?
In quantum mechanics, the basic thing is the state of a system.
There is a physical system. How do you denote the state of the system?
All of you have done quantum mechanics.
A wave function, but a wave function is actually a representation in a particular space, and we don't want to specify any particular representation.
We want to put it in abstract notation. What is that notation?
>> Dirac notation.
>> Right. You can denote the state of a system by what is called a ket, |ψ⟩.
If you want a representation of this ket in position space, you can get it.
You would have done that in quantum mechanics: ψ is then a function of the position x, ψ(x), which is your position-space representation, obtained from the abstract ket. You can also project the ket onto momentum space, which gives ψ(p). We don't want to confine ourselves to any one representation, so we just denote the state of the system by |ψ⟩. It has all the information about the system. The kets live in a vector space, which here we call the Hilbert space.
Okay.
In particular, if you multiply the ket |ψ⟩ by a complex number c, c|ψ⟩ also denotes the state of the same system.
You can multiply it by any nonzero complex number and it equally well denotes the state of the system; hence the space of physical states is sometimes called a ray space. So the magnitude is not important, and therefore, as you have seen in quantum mechanics, we often just use normalized states.
The dual to the ket is denoted as a bra, ⟨ψ|.
It resides in the dual space, and you can define what is called an inner product between two vectors.
That is a number. Say there is a bra ⟨ψ| and a ket |φ⟩.
We call ⟨ψ|φ⟩ the inner product between a vector and the dual of another vector. In general it is a complex number; only the inner product of a vector with itself, ⟨ψ|ψ⟩, is guaranteed to be real and greater than or equal to zero.
When it is equal to zero, what do you call it?
>> They are orthogonal.
>> Good.
So we have defined the inner product between two states.
The way to imagine it: suppose you have two vectors in ordinary space. The scalar product between those two vectors is a number.
That is like the inner product between two vectors in three-dimensional space. Here ⟨ψ|φ⟩ is the inner product between the bra ⟨ψ| and the ket |φ⟩, and for any vector with itself, ⟨φ|φ⟩ ≥ 0.
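The inner product can be tried out on two-component kets. A sketch in plain Python with vectors as lists of complex numbers (the states ψ, φ, χ are arbitrary examples): the bra is obtained by complex-conjugating the components, so ⟨ψ|φ⟩ is complex in general, ⟨ψ|ψ⟩ is real and non-negative, and swapping the order conjugates the result.

```python
import math


def inner(bra_of: list[complex], ket: list[complex]) -> complex:
    """<bra_of|ket> = sum_k conj(bra_of[k]) * ket[k].

    The first argument is the ket whose bra we take, so its components are conjugated.
    """
    return sum(a.conjugate() * b for a, b in zip(bra_of, ket))


s = 1 / math.sqrt(2)
psi = [s + 0j, 1j * s]   # normalized
phi = [s + 0j, -1j * s]  # normalized, orthogonal to psi
chi = [0j, 1 + 0j]

print(inner(chi, psi))                                 # complex in general: here i/sqrt(2)
print(inner(psi, psi))                                 # real and >= 0; here 1 (normalized)
print(inner(phi, psi))                                 # 0: orthogonal states
print(inner(psi, chi), inner(chi, psi).conjugate())    # <psi|chi> equals <chi|psi>*
```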
Okay. So this is what we need. Much of this will be done in detail by Professor Ravindran, so I will not spend too much time on it; this is purely to set the notation for what we are going to do next. The length of a vector is given by its inner product with itself: ‖ψ‖ = √⟨ψ|ψ⟩.
So this is the length of the vector.
Okay. And these are the states.
In any experiment, when you want to observe the state, an operator corresponding to your measurement acts on the state.
Mathematically, an operator O acts on the state, O|ψ⟩, and when something acts on a state it in general changes the state. So every observable has a corresponding operator, and that operator tells you about the property of the system being measured.
Okay.
What we are interested in, in quantum mechanics, is not just any operator.
It is a particular type. What is that?
Ones with real eigenvalues.
We are interested in operators with O† = O.
So what does this mean? Take O, interchange its rows and columns (the transpose), and complex-conjugate every entry: that is the Hermitian conjugate O†. For those who are not familiar, since there are many undergraduates here: if you don't understand something, just put your hand up and ask me. All of this will be done again in the quantum mechanics course by Professor Ravindran. So in particular we are interested in Hermitian operators.
There are special cases where acting with the operator gives you back the same state,
not exactly the same state, but multiplied by some number. So suppose O|ψ⟩ = a|ψ⟩, where a is a number.
Okay.
So then what do you call this a?
>> It's called the eigenvalue.
>> Then |ψ⟩ is an eigenstate of the operator O. In particular, if O is Hermitian, the eigenvalue a is a real number, because Hermitian operators have real eigenvalues. (The converse does not hold: real eigenvalues alone do not make an operator Hermitian.)
Okay. Now you can also take any arbitrary state. Suppose you have a Hermitian operator and the set of its eigenstates |n⟩.
Then you can express any arbitrary state of the system in terms of these eigenstates: |ψ⟩ = Σₙ aₙ|n⟩, with aₙ = ⟨n|ψ⟩.
This expresses the completeness relation, Σₙ |n⟩⟨n| = 1, and the aₙ are the expansion coefficients. Any arbitrary state can be expanded in terms of the complete set of eigenstates of such an operator.
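The expansion can be carried out explicitly in two dimensions. A Python sketch, assuming as an example basis the orthonormal eigenstates of the Hermitian matrix [[0, 1], [1, 0]] and an arbitrary normalized state ψ: it computes aₙ = ⟨n|ψ⟩ and rebuilds ψ from Σₙ aₙ|n⟩.

```python
import math


def inner(phi: list[complex], psi: list[complex]) -> complex:
    """<phi|psi> = sum_k conj(phi_k) * psi_k."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


# Orthonormal eigenbasis of [[0, 1], [1, 0]], used here only as a convenient example.
s = 1 / math.sqrt(2)
basis = [[s + 0j, s + 0j], [s + 0j, -s + 0j]]

psi = [0.6 + 0j, 0.8j]  # an arbitrary normalized state

# Expansion coefficients a_n = <n|psi>
coeffs = [inner(n_vec, psi) for n_vec in basis]

# Rebuild |psi> = sum_n a_n |n>, component by component
rebuilt = [sum(a * n_vec[k] for a, n_vec in zip(coeffs, basis)) for k in range(2)]

print(coeffs)
print(rebuilt)                             # matches psi up to rounding
print(sum(abs(a) ** 2 for a in coeffs))    # sum |a_n|^2 = <psi|psi> = 1
```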
>> Superposition? Yes, as I said, I am skipping many things. Superposition is given to you at the beginning itself: a superposition of two kets, a|ψ⟩ + b|φ⟩, is also a state of the system. I should have said it at the beginning, but that's okay; it comes in right when you define the Hilbert space. Take any two states and combine them, and you also get a state of the system. It's like the superposition of vectors: add two vectors and you get a third vector, which is also part of the same set. So this is as far as the state of the system is concerned, and an operator is a transformation.
Okay, it transforms the state.
Okay, but there are certain types of transformations that we are interested in quantum mechanics.
If you want to describe the dynamics, how the system changes in time, that is what we are interested in. In classical mechanics, what is the fundamental thing? Equations of motion.
What do equations of motion do? You know the state of the system, specified at some time; you call that the initial condition. Take Newton's equation.
Given that, the equation of motion (one equation in one dimension, or many equations for many dimensions and many particles, so it can be as complicated as you like) determines the state of the system at all times, as long as the initial state is completely specified. Classically, it's a deterministic system.
In quantum mechanics the thing is slightly different. One can let the observables themselves change in time, and we know how to evolve them dynamically.
Here, instead, we look at how the state of the system evolves.
Okay. So suppose you have the state |ψ(0)⟩ at time zero.
It could be some t = t₀; I just take it as zero and start the clock at that time. I want to know how the system evolves.
What is the dynamical evolution of the system? This is a fundamental question that we try to answer whether it's quantum dynamics or classical dynamics.
That is the question.
So let us say there is a transformation U that takes |ψ(0)⟩ to |ψ(t)⟩ = U(t)|ψ(0)⟩.
It's an operator acting on the system.
What kind of operator?
To find out, use the inner product we have already defined: the information content is contained in the inner products.
If nothing else happens to the system, the inner product cannot change.
For example, the inner product of the state with itself, ⟨ψ(0)|ψ(0)⟩, is the total probability; that comes with the interpretation of ψ in quantum mechanics. More generally, ⟨φ|ψ⟩ cannot change over time if the two states are evolving in isolation, just sitting there. So the fundamental principle is that the information content does not change over time when nothing else is acting on the system; it is just time evolution. Now, the ket transforms as |ψ(t)⟩ = U|ψ(0)⟩ and the bra as ⟨φ(t)| = ⟨φ(0)|U†, where U† is the Hermitian conjugate of U. So ⟨φ(t)|ψ(t)⟩ = ⟨φ(0)|U†U|ψ(0)⟩, and this must equal ⟨φ(0)|ψ(0)⟩.
Right. I should have mentioned this: if O acts on the ket as O|ψ⟩, the corresponding bra is ⟨ψ|O†.
The bra vector transforms with O†.
Yes, I'm using that formulation. Okay.
So what do you get from this?
What do you conclude from this?
>> U†U is equal to...
>> The identity. Correct. These U are of course time dependent; that is how they transform the system. So U†U = 1.
So what do you call such a U?
Such operators are called unitary operators.
So U is a unitary operator.
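A concrete check in Python, using as an illustrative example the 2×2 unitary U = cos θ·1 - i sin θ·X (X being the matrix [[0, 1], [1, 0]]; θ and the states are arbitrary choices): U†U comes out as the identity, and the inner product of two states is unchanged when both are transformed by U.

```python
import math


def dagger(m):
    """Hermitian conjugate: transpose and complex-conjugate every entry."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def matmul(a, b):
    """Ordinary matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m, v):
    """Act with the operator m on the ket v."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def inner(phi, psi):
    """<phi|psi> = sum_k conj(phi_k) * psi_k."""
    return sum(a.conjugate() * b for a, b in zip(phi, psi))


theta = 0.9  # arbitrary angle
U = [[complex(math.cos(theta)), -1j * math.sin(theta)],
     [-1j * math.sin(theta), complex(math.cos(theta))]]

phi = [1 + 0j, 0j]
psi = [0.6 + 0j, 0.8j]

print(matmul(dagger(U), U))                                  # identity, up to rounding
print(inner(phi, psi), inner(apply(U, phi), apply(U, psi)))  # equal inner products
```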
But remember, time translation is a continuous transformation, not a discrete one. Time flows continuously.
So any finite time translation can be obtained by translating in time incrementally, by very, very small amounts, all the way.
So I can generate the finite transformation continuously, starting from the identity and composing many infinitesimal transformations.
That is true of any continuous transformation, and in particular of this unitary transformation.
Sorry?
>> You mean this argument?
>> This one. What I'm saying is that it is a fundamental principle that under time evolution the information content does not change,
because nothing is acting on the system; it is simply evolving in time. Whatever change happens to φ also happens to ψ, so the inner product stays the same. If you take the same state, the inner product of the state with itself, then yes, it leads to the conservation of probability.
>> Both isolated?
>> Yes, both are isolated systems, so that information cannot change just because they evolve in time.
>> No, here we are not talking about properties of time at all. I start at t = 0 and evolve, but I can start the clock at any time: if instead of zero you put t₀, that's also fine. The point is that this is invariant under time translation, and that's all we are using. I'm not bringing in relativistic properties of time; that's a separate question.
Okay. All right. So this is a fundamental principle we have to use here: there is no loss of information as far as the inner product is concerned. You'll see what follows from that. Now, U is a unitary operator, and any unitary operator can be represented as U = e^(iεH), where H is another operator. This is just a representation, but this form will keep reappearing in many places. Since U†U = 1, and to first order in ε we have U†U ≈ 1 + iε(H - H†), we need H = H†. So this H, which provides a representation of the unitary operator, must be Hermitian.
So you can generate a unitary transformation from a Hermitian operator, in this exponential form.
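This can be checked numerically. A Python sketch, assuming ħ = 1 and the sign convention exp(-itH) that the lecture adopts a little later (the sign does not affect unitarity); the matrices are arbitrary examples, and the exponential is computed from a truncated Taylor series, which is fine for a small matrix but not a production method. Exponentiating a Hermitian H gives a unitary; exponentiating the non-Hermitian N does not.

```python
Matrix = list[list[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(m: Matrix) -> Matrix:
    """Hermitian conjugate: transpose and complex-conjugate every entry."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]


def expm(m: Matrix, terms: int = 60) -> Matrix:
    """exp(m) from the Taylor series sum_k m^k / k! (small matrices of modest norm only)."""
    size = len(m)
    result = [[complex(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]  # now m^k / k!
        result = [[r + s for r, s in zip(r_row, s_row)] for r_row, s_row in zip(result, term)]
    return result


def is_identity(m: Matrix, tol: float = 1e-10) -> bool:
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m)))


def evolution(gen: Matrix, t: float) -> Matrix:
    """exp(-i t gen), with hbar = 1."""
    return expm([[-1j * t * x for x in row] for row in gen])


H = [[1.0 + 0j, 2 - 1j], [2 + 1j, -3.0 + 0j]]  # Hermitian: equals its dagger
N = [[0j, 1 + 0j], [0j, 0j]]                    # not Hermitian

U_H = evolution(H, 0.7)
U_N = evolution(N, 0.7)
print(is_identity(matmul(dagger(U_H), U_H)))  # True: unitary
print(is_identity(matmul(dagger(U_N), U_N)))  # False: not unitary
```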
Okay.
So this H happens to be Hermitian.
Now I use this representation to write the unitary operator for a small time step. Let me change the notation: since this is time evolution, I write U = e^(itH). Suppose I make a very small time evolution δt. Then I can expand the exponential up to first order and write U(δt) ≈ 1 + iδt·H; I can make δt as small as I like.
Okay. But I would have liked it with a negative sign.
>> Why?
>> With the other sign, U†U gives -iδt(H - H†) instead.
>> Correct; whether it is H - H† or H† - H, the condition U†U = 1 still gives H = H†, so the sign is a convention. What I want is U(δt) ≈ 1 - iδt·H, that is, U = e^(-itH); you'll see why I want it like this.
Nothing else changes: H is still Hermitian.
So I write U(δt) ≈ 1 - iδt·H.
So this H is a Hermitian operator, the generator of this unitary transformation. Now I want to consider this infinitesimal evolution acting on the state. Can you see it here? Let me consider the state at time t, |ψ(t)⟩.
I evolve it for an infinitesimal amount of time δt, taking it to |ψ(t + δt)⟩:
|ψ(t + δt)⟩ = (1 - iδt·H)|ψ(t)⟩.
This is the infinitesimal transformation acting on the state |ψ(t)⟩.
Now I take |ψ(t)⟩ to the left-hand side, |ψ(t + δt)⟩ - |ψ(t)⟩ = -iδt·H|ψ(t)⟩, and divide by δt:
i [|ψ(t + δt)⟩ - |ψ(t)⟩] / δt = H|ψ(t)⟩. All I am doing is an infinitesimal transformation, to see how the states transform under it.
δt is in my hands; I can make it as small as I like, so that the expansion becomes more and more accurate. I take the limit δt → 0.
So what do you get? The left-hand side becomes the partial time derivative:
i ∂ₜ|ψ(t)⟩ = H|ψ(t)⟩.
So the generator of the unitary transformation determines the equation whose solution gives you the time evolution of the ket.
What is that equation called?
It's the Schrödinger equation. Remember, all that we have used is that there is no information loss. That gives a unitary transformation, which we express in terms of its generator H. H is called the generator in the sense that you can make the transformation as small as you like in terms of H and use it to generate any finite unitary transformation; the name "generator" just indicates that.
Then from this we want the infinitesimal evolution of the state, so we use this form,
because any finite transformation can be generated by repeated application of it. So we can use it for any small amount of time, in the limit δt → 0.
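Repeated application of the first-order step can be compared with the exact finite evolution. A Python sketch with ħ = 1 and an arbitrary Hermitian 2×2 matrix H (both assumptions for illustration): it builds (1 - i(t/N)H)^N and measures its largest entry-wise difference from exp(-itH), with the exponential computed from a truncated Taylor series (fine for a small matrix, not a production method).

```python
Matrix = list[list[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Ordinary matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expm(m: Matrix, terms: int = 60) -> Matrix:
    """exp(m) from the Taylor series sum_k m^k / k! (small matrices of modest norm only)."""
    size = len(m)
    result = [[complex(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, m)]
        result = [[r + s for r, s in zip(r_row, s_row)] for r_row, s_row in zip(result, term)]
    return result


def stepped_evolution(H: Matrix, t: float, n_steps: int) -> Matrix:
    """(1 - i (t/N) H)^N: N first-order infinitesimal steps, hbar = 1."""
    dt = t / n_steps
    size = len(H)
    step = [[complex(i == j) - 1j * dt * H[i][j] for j in range(size)] for i in range(size)]
    result = [[complex(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n_steps):
        result = matmul(step, result)
    return result


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


H = [[1.0 + 0j, 2 - 1j], [2 + 1j, -3.0 + 0j]]
t = 0.7
exact = expm([[-1j * t * x for x in row] for row in H])  # exp(-i t H)

for n_steps in (10, 100, 1000, 10_000):
    print(n_steps, max_abs_diff(stepped_evolution(H, t, n_steps), exact))
```

The difference shrinks roughly like 1/N. Each individual first-order step is not exactly unitary (|1 - iλδt| > 1 for every nonzero eigenvalue λ), which is why the approximation only becomes exact in the limit δt → 0.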
So this is the equation. Is there some problem with it?
Note that H has dimensions of 1/time.
So I can't give it a name yet, right?
Somewhere I have missed out on dimensions.
If I want to call it the Hamiltonian, then, classically, we know that the Hamiltonian has dimensions of energy.
That does not match here.
The dimensions don't agree if I want to call H the Hamiltonian. But I would like to relate this H to a physical quantity, the classical Hamiltonian, which has the dimensions of energy. To do that, all I have to do is introduce a constant in this transformation to make the dimensions agree.
I could simply say that this operator has dimensions of inverse time, but I don't want to do that; I want to relate H to something observable, and that observable is energy. So I introduce a constant that matches the dimensions. What is available to me is Planck's constant, just one single constant, and it makes everything dimensionally consistent. So I write U = e^(-iHt/ħ), with i/ħ in the exponent, and likewise U(δt) ≈ 1 - iδt·H/ħ.
Once I interpret H as energy, I can make the dimensions agree with a constant; scaling by a constant is always possible. So I put ħ here: iħ ∂ₜ|ψ(t)⟩ = H|ψ(t)⟩. Now ħ has the dimensions of action, M L² T⁻¹, so ħ divided by time has dimensions M L² T⁻², and energy also has dimensions M L² T⁻². So now they all match, and you can call this the Schrödinger equation.
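The dimensional bookkeeping, written out (standard dimensions, with energy = mass × length² / time²):

```latex
[\hbar] = \mathrm{M\,L^{2}\,T^{-1}}, \qquad
\left[\frac{\hbar}{t}\right] = \mathrm{M\,L^{2}\,T^{-2}} = [E], \qquad
i\hbar\,\frac{\partial}{\partial t}\lvert\psi(t)\rangle = H\,\lvert\psi(t)\rangle .
```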
And call H the familiar Hamiltonian; we'll use the name Hamiltonian.
It has dimensions of energy.
Okay.
The solutions of this equation determine how |ψ⟩ evolves in time for a given Hamiltonian of the system. The Hamiltonian specifies the system, and the time evolution comes as the solution of this equation.
But we are also interested in how physical quantities change over time.
Okay. The physical quantities are defined in detail in your basic quantum mechanics course, which Professor Ravindran is going to teach, so I will not go into the details here. But the important quantity there is the expectation value.
So let's define what is an expectation value.
Consider any operator A.
Okay. Its expectation value is defined by the following quantity: the operator A sandwiched between the state ψ on both sides, ⟨ψ|A|ψ⟩.
Okay, we call this the expectation value of the operator A. In general, in quantum mechanics, this A might be the operator for momentum or angular momentum, for example.
The expectation value is then what is measured;
it is what we relate to the measured quantities.
So we want to know how this expectation value changes. Usually it is denoted ⟨A⟩.
The meaning of that bracket is simply this.
How does this change?
Let's do that. What is the total time derivative of this operator A?
We are interested in the expectation value, so put the brackets: d⟨A⟩/dt.
Three quantities here can depend on time: the bra, the operator, and the ket. So the derivative acts on each in turn: d⟨A⟩/dt = ⟨∂ψ/∂t|A|ψ⟩ + ⟨ψ|A|∂ψ/∂t⟩ + ⟨ψ|∂A/∂t|ψ⟩.
If the operator has an explicit dependence on time, then you have to keep this last term. If not, you can leave it out.
Okay.
Now, the ket term we know from the Schrödinger equation, ∂|ψ⟩/∂t = −(i/ħ)H|ψ⟩, but for the bra you have to conjugate the whole equation.
Then you have −iħ on this side; taking it to the other side, the bra evolves with H†. And H, the Hamiltonian, is Hermitian, so H† = H. Therefore what you get is ∂⟨ψ|/∂t = (i/ħ)⟨ψ|H. >> Check it anyway. I'll change it. But please, all of you, check the signs and other details. Okay.
And the operator in the middle, of course, remains the same. So I can write the equation in its final form: iħ d⟨A⟩/dt = ⟨[A, H]⟩,
plus the explicit time dependence, iħ⟨∂A/∂t⟩.
Please check this.
So this is the equation.
Have I done it correctly?
Yes: there should also be a factor iħ here, right?
Okay. Now, if there is no explicit time dependence, which is the case I am interested in, I remove this term, and this is the equation for the evolution of the expectation value.
Normally we write it in operator form: iħ dA/dt = [A, H], or equivalently dA/dt = (i/ħ)[H, A].
This is the operator equation (the Heisenberg equation of motion); taking its expectation value gives the equation above. Please check the ordering of the commutator yourself: [A, H] goes with iħ on the left, [H, A] with i/ħ on the right. I'll leave the derivation as an exercise.
You can work out all these steps;
they're straightforward. I would like you to check every equation that I'm writing. Okay.
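The ordering question left as an exercise can be checked numerically. This Python sketch assumes ħ = 1 and uses an arbitrary Hermitian 2×2 matrix H, the observable A = σₓ, and a normalized state ψ, all chosen only for illustration. It computes d⟨A⟩/dt by the product rule with dψ/dt = −(i/ħ)Hψ, then compares it with (i/ħ)⟨[H, A]⟩ and with (i/ħ)⟨[A, H]⟩. Only the first agrees, which is the same statement as iħ d⟨A⟩/dt = ⟨[A, H]⟩.

```python
HBAR = 1.0  # assumption: units with hbar = 1

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

def braket(u, v):
    """<u|v> = sum_i conj(u_i) v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

H = [[1.0, 0.5 - 0.25j], [0.5 + 0.25j, -0.3]]   # Hermitian (illustrative)
A = [[0.0, 1.0], [1.0, 0.0]]                     # sigma_x, no explicit time dependence
psi = [0.6, 0.8j]                                # normalized: 0.36 + 0.64 = 1

dpsi = [-1j / HBAR * c for c in matvec(H, psi)]  # Schrodinger: d|psi>/dt = -(i/hbar) H|psi>
lhs = braket(dpsi, matvec(A, psi)) + braket(psi, matvec(A, dpsi))   # product rule
rhs_HA = 1j / HBAR * braket(psi, matvec(commutator(H, A), psi))     # (i/hbar) <[H, A]>
rhs_AH = 1j / HBAR * braket(psi, matvec(commutator(A, H), psi))     # (i/hbar) <[A, H]>
print("product rule:     ", lhs)
print("(i/hbar)<[H, A]>: ", rhs_HA)
print("(i/hbar)<[A, H]>: ", rhs_AH)
```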
Now this equation is very important because suppose this operator commutes with the Hamiltonian.
Suppose this commutator is zero. Sorry, what is the definition of this? It's simply [A, H] = AH − HA. That's the definition of the commutator;
the bracket is shorthand notation for this, and it's called a commutator.
Suppose this is equal to zero. That means this operator commutes with the Hamiltonian.
Okay. Then what happens?
Yes: d⟨A⟩/dt = 0, so ⟨A⟩ becomes time independent.
So if the expectation value of some measurable quantity does not change with time as the system evolves, what do you call that?
There is a physical quantity which does not change, which remains constant as the system evolves.
>> It's a conserved quantity. So this operator is conserved if this commutator is equal to zero.
Okay. So we say A is conserved, or that it's a constant of motion.
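A two-level illustration of the conservation statement. This Python sketch assumes ħ = 1 and the illustrative choice H = σ_z, for which U(t) = diag(e^(−it), e^(it)) exactly. σ_z commutes with H, so ⟨σ_z⟩ stays at its initial value; σₓ does not commute with H, so ⟨σₓ⟩ oscillates, as 0.96 cos 2t for this initial state.

```python
import cmath

# Assumptions: hbar = 1 and H = sigma_z, so U(t) = diag(exp(-it), exp(+it)).
def evolve(psi, t):
    return [cmath.exp(-1j * t) * psi[0], cmath.exp(1j * t) * psi[1]]

def expect(A, psi):
    """Real part of <psi|A|psi> for a 2x2 Hermitian A."""
    Apsi = [A[0][0] * psi[0] + A[0][1] * psi[1], A[1][0] * psi[0] + A[1][1] * psi[1]]
    return (psi[0].conjugate() * Apsi[0] + psi[1].conjugate() * Apsi[1]).real

SZ = [[1, 0], [0, -1]]   # commutes with H = sigma_z: conserved
SX = [[0, 1], [1, 0]]    # does not commute with H: not conserved

psi0 = [complex(0.6), complex(0.8)]
for t in (0.0, 0.5, 1.0):
    psi = evolve(psi0, t)
    print(f"t = {t}: <sz> = {expect(SZ, psi):+.4f}, <sx> = {expect(SX, psi):+.4f}")
```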
So where have you seen this before?
>> In classical mechanics. In classical mechanics, what is it?
>> It's a Poisson bracket. In fact, this was the route Dirac used for quantization. He noticed the structure of the Poisson bracket, and his route to quantization was to replace the Poisson bracket {A, B} by the commutator of the operators divided by iħ, [A, B]/(iħ).
Okay. So that's the procedure Dirac used. In fact, if you look at these equations and go back to the Poisson-bracket formulation of the equations of motion in classical mechanics, you will find that wherever you have Poisson brackets, you can replace them by commutator brackets in quantum mechanics.
It's a restatement of the correspondence principle.
Okay. So, if the commutator is zero, A is a constant of motion.
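For reference, the classical Poisson-bracket form of the equation of motion and Dirac's replacement rule can be written side by side. This is a sketch in the standard textbook conventions (sign of the bracket, factor 1/(iħ)); it is not reproduced from the board.

```latex
% Classical: a phase-space function A(q, p, t) evolves as
\frac{dA}{dt} = \{A, H\} + \frac{\partial A}{\partial t},
\qquad
\{A, B\} = \sum_{k} \left( \frac{\partial A}{\partial q_k}\,\frac{\partial B}{\partial p_k}
         - \frac{\partial A}{\partial p_k}\,\frac{\partial B}{\partial q_k} \right)

% Dirac's rule: Poisson bracket -> commutator divided by i\hbar
\{A, B\} \;\longrightarrow\; \frac{1}{i\hbar}\,[\hat A, \hat B]
\quad\Longrightarrow\quad
\frac{d\hat A}{dt} = \frac{1}{i\hbar}\,[\hat A, \hat H] + \frac{\partial \hat A}{\partial t},
\qquad
\{x, p\} = 1 \;\longrightarrow\; [\hat x, \hat p] = i\hbar
```

In both forms, a quantity with vanishing bracket with H and no explicit time dependence is a constant of motion.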
So now that we are talking about constant of motion, how do they arise?
Is there anything that we can say which leads to the constant of motion?
Okay, so now let's talk about symmetries.
So what have we done until now? I have defined what a system is and how the system changes, in particular under time evolution, with the requirement that the information content remain the same. We showed that this principle leads automatically to the Schrödinger equation, or equivalently to the operator equation; they are equivalent.
We have used that here.
So these are different uh descriptions of the same dynamics.
But that also led us to define a constant of motion, and how it arises: when an operator commutes with the Hamiltonian.
So what is the general principle behind the existence of constants of motion?
So in fact they are related to certain symmetry principles.
Okay. So I will just digress a little bit to classical mechanics.
What we have in classical mechanics, we can take over to quantum mechanics; the same ideas apply.
So we are now discussing symmetries.
So this is done. Now we are here.
As you know, classical mechanics can be formulated in different ways.
Of course there is the Newtonian formulation, the Lagrangian, the Hamiltonian; you can do classical mechanics using any one of these formalisms. Right? Anything else? You can also use the Hamilton–Jacobi formulation, but I don't think they will do that in an undergraduate class. I'm not sure.
These are all different descriptions of the same phenomena. The fundamental thing is that all these descriptions lead to the same equation of motion for a given system: if it obeys Newton's equation, Lagrangian mechanics also gives you the same Newton's equation, and Hamiltonian mechanics also gives you the same equation of motion. So the fundamental thing is the equation of motion; you arrive at it from different points of view, different standpoints.
But why do you need these different standpoints if you can do with the equation of motion, or equations of motion, depending on the number of dimensions? Why do you need the Lagrangian formulation, the Hamiltonian formulation?
If you can do with the Newtonian formulation, what is the advantage? You can guess; I don't mind.
>> Yeah.
>> Yeah. Okay. So, more or less, you are there. Let me put it the following way: it depends on what you are looking for in the system. The oldest, of course, is the Newtonian equations of motion. But is everyone familiar with the Lagrangian?
What is the Lagrangian?
No? Has anybody not done it? Okay. So here is a capsule, one-minute description for those who don't know. How do you arrive at the Lagrangian formulation?
It starts with the most basic principle that if you go from point A to point B, you can define what is called as an action.
Yes.
It is given as an integral, along some arbitrary path from A to B,
of a function L called the Lagrangian, which is a function of the position variable along the path, call it x, the velocity ẋ, and time: S = ∫_A^B L(x, ẋ, t) dt.
Okay, you can define it for any path.
Okay, so when you integrate this function from A to B, that gives you what is called the action. You can choose any arbitrary curve between A and B and define this.
But then what is the physical path? Given a system, how does it actually go from A to B? There's one single path. What is that physical path?
That physical path is the one which obeys what is called the action principle:
δS = 0, the action is an extremum.
Okay. We generally say it should be a minimum, but the statement is actually δS = 0. This is called the action principle, and this extremization procedure leads to an equation of motion for the function L along the physical path. That is the Euler–Lagrange equation, d/dt(∂L/∂ẋ) − ∂L/∂x = 0.
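The action principle can be checked numerically. This Python sketch assumes a harmonic oscillator with m = k = 1, so L = ẋ²/2 − x²/2, over the time window [0, 1] with fixed endpoints; the classical path x(t) = sin t satisfies the Euler–Lagrange equation ẍ = −x. Adding a perturbation ε·sin(πt), which vanishes at both endpoints, raises the discretized action by roughly ε²(π² − 1)/4. Because the window is shorter than half a period (π), the stationary path here is a genuine minimum; for longer windows δS = 0 still holds but the extremum need not be a minimum.

```python
import math

# Assumptions: m = k = 1, so L = x_dot**2/2 - x**2/2; time window [0, T] with T = 1 < pi,
# endpoints fixed at x(0) = 0 and x(T) = sin(T).
T, N = 1.0, 2000
dt = T / N
ts = [i * dt for i in range(N + 1)]
classical = [math.sin(t) for t in ts]      # solves the Euler-Lagrange equation x'' = -x

def action(path):
    """Discretized S = sum of L dt, with a difference quotient for x_dot and a midpoint for x."""
    S = 0.0
    for i in range(N):
        v = (path[i + 1] - path[i]) / dt
        x_mid = 0.5 * (path[i] + path[i + 1])
        S += (0.5 * v * v - 0.5 * x_mid * x_mid) * dt
    return S

def perturbed(eps):
    """Classical path plus eps*sin(pi t / T), which vanishes at both endpoints."""
    return [x + eps * math.sin(math.pi * t / T) for x, t in zip(classical, ts)]

S0 = action(classical)
for eps in (0.05, -0.05, 0.2):
    print(f"eps = {eps:+.2f}: S - S_classical = {action(perturbed(eps)) - S0:.6f}")
```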
Okay.
Now here comes the advantage of Lagrangian mechanics. For a given system, once you can write down what this Lagrangian function is (for all mechanical systems it's kinetic energy minus potential energy, L = T − V, though it need not take that form), then no matter what system it is, the equations of motion always have this form.
Somebody mentioned here that with Newton's equations, depending on which frame you are in, the equations of motion take different forms; you don't know which is the best, so you have to keep on transforming. The advantage of Lagrangian mechanics is that no matter what the system is, the equations of motion are always derived from the Euler–Lagrange equation; this form does not change from system to system. It's the same.
Okay.
And another point: when you take a particular Lagrangian here and evaluate this, what you get is Newton's equation. That is what I meant by saying you always get the same equation; it is independent of the formulation. Similarly, in the Hamiltonian formulation, instead of one second-order differential equation like this, you get a set of two first-order differential equations for each degree of freedom, ẋ = ∂H/∂p and ṗ = −∂H/∂x.
The advantage of first-order differential equations is that you can use them to define phase space and the evolution in phase space. The language of phase space is very natural in Hamiltonian mechanics, and you can trace the motion of the system along a phase curve, which is unique for each initial condition. So there are some beautiful things you can do in phase space that require Hamiltonian mechanics, and there are some nice things which you can do with the Lagrangian.
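Hamilton's two first-order equations can be stepped forward directly in phase space. This Python sketch assumes a harmonic oscillator with m = k = 1, H = p²/2 + x²/2, and uses the leapfrog (kick-drift-kick) integrator, a standard symplectic scheme picked here for illustration. Starting from (x, p) = (1, 0), the phase point traces the closed curve H ≈ ½ and returns close to where it started after one period 2π, with only a small bounded wobble in H.

```python
import math

# Assumptions: m = k = 1, so H(x, p) = p**2/2 + x**2/2 and the period is 2*pi.
def hamiltonian(x, p):
    return 0.5 * p * p + 0.5 * x * x

def leapfrog(x, p, dt, steps):
    """Integrate dx/dt = dH/dp = p, dp/dt = -dH/dx = -x with kick-drift-kick steps.

    Returns the list of phase-space points (x, p), including the starting point.
    """
    traj = [(x, p)]
    for _ in range(steps):
        p -= 0.5 * dt * x   # half kick: dp = -(dH/dx) dt/2
        x += dt * p         # drift:     dx = (dH/dp) dt
        p -= 0.5 * dt * x   # half kick at the updated position
        traj.append((x, p))
    return traj

steps = 1000
dt = 2 * math.pi / steps          # total time = one period
traj = leapfrog(1.0, 0.0, dt, steps)
energies = [hamiltonian(x, p) for x, p in traj]
print("spread of H along the phase curve:", max(energies) - min(energies))
print("start:", traj[0], "after one period:", traj[-1])
```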
In fact, it's a very beautiful thing, because this action principle, even though it originated in dynamics, is the most general principle.
You can go to electrodynamics:
you can define an action, apply the action principle, and get Maxwell's equations.
You can go to the Standard Model of particle physics, write the Lagrangian, use the action principle, and get all the equations of motion. So although the action principle was first used in mechanics, it is the most general principle in physics: any physical system follows the extremum of the action. In mechanics it is called the least action principle, but it holds in all of physics, in any area, including quantum field theory: you write the action and use the action principle to get the equations of motion.
Okay. So that is the advantage of Lagrangian mechanics. But there's something else that's very beautiful about the Lagrangian.
It is stated as a theorem related to symmetries, and the theorem is Noether's theorem.
How many of you have heard of Noether's theorem? Good. Very good. I'll state it.
How many of you have heard about Emmy Noether?
Yeah. She was one of the greatest mathematicians of the early 20th century.
Sorry? >> You can write the action principle in terms of the Hamiltonian also. Yes, you can do that.
What we do in the Lagrange formulation is get to the Hamiltonian from the Lagrangian. >> Yeah, yeah, that's right. So... >> Correct, because the Hamiltonian is useful in describing phase-space dynamics. For that, first you have to define the momentum; without defining momentum you cannot get to the Hamiltonian, and that takes you to phase space. But you can also start independently with Hamiltonian mechanics as a set of first-order equations: you define a phase space of momenta and positions, a 2n-dimensional phase space for n degrees of freedom, and then start with first-order equations on it. That has a beauty of its own; in fact, when I teach classical mechanics I usually start with that phase-space approach. But in Lagrangian mechanics you can go from the Lagrangian to the Hamiltonian by a simple Legendre transformation, so you can write it in terms of either of them. The starting point here is the Lagrangian, because that's how we get the equations of motion. Okay. So the beauty of the Lagrangian is Noether's theorem, which states that if the Lagrangian is invariant under a continuous transformation,
then that invariance leads to a conserved quantity.
Okay, this is Noether's theorem in classical mechanics, but it holds good even when you go to quantum mechanics.
So when you say the Lagrangian is invariant, it means it's symmetric under a certain transformation.
So what does symmetry mean here? If the laws of physics don't change under a certain transformation, then you call it a symmetry transformation. For example, take this Lagrangian; I'm using one dimension, but you can extend to as many dimensions as you want. In general it is a function of x, ẋ, and t. Suppose we take t out: there's no explicit time dependence, so it's just a function of x and ẋ. Now suppose L is independent of x, only a function of ẋ. For example, L is simply the kinetic energy of a free particle:
L = ½mẋ², which is independent of x. What does it lead to? If L is only a function of ẋ, then ∂L/∂x = 0, and the equation of motion tells you that d/dt(∂L/∂ẋ) = 0 along the physical path.
What does it mean?
It means that ∂L/∂ẋ is a constant of motion.
And we call it the momentum; p = ∂L/∂ẋ is the definition of momentum.
So what is the symmetry here? It's space translation: I can take this system, whose Lagrangian depends only on ẋ, put it at any point in space, and start the dynamics.
Okay. So it's independent of X.
So for such a system, momentum is conserved.
Okay.
Right: a free particle has only kinetic energy, and if there is no force acting, it will continue to move with the same momentum for all times.
So momentum is a constant of motion: space-translation invariance leads to momentum conservation. That's a symmetry.
Okay. And this is a fundamental symmetry. It's not going to change when you go to quantum mechanics.
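Translation invariance and momentum conservation can be seen in a small simulation. This Python sketch assumes two particles on a line coupled by a spring, V = (k/2)(x₁ − x₂ − d)², with illustrative values for the masses, k, and d. Because V depends only on x₁ − x₂, shifting both particles by the same amount leaves L unchanged; the two forces are equal and opposite, so the total momentum stays fixed (up to rounding) while each particle's own momentum changes.

```python
# Assumptions: two particles on a line with masses m1, m2, coupled by a spring
# V = k/2 * (x1 - x2 - d)**2. V depends only on x1 - x2, so the Lagrangian is
# unchanged by the translation x1 -> x1 + a, x2 -> x2 + a.
m1, m2, k, d = 1.0, 3.0, 2.0, 1.0

def forces(x1, x2):
    f1 = -k * (x1 - x2 - d)   # force on particle 1, -dV/dx1
    return f1, -f1            # -dV/dx2 is exactly the opposite

def simulate(x1, x2, v1, v2, dt=1e-3, steps=5000):
    """Velocity-Verlet integration; records total momentum and particle 1's momentum."""
    p_total, p1_history = [], []
    f1, f2 = forces(x1, x2)
    for _ in range(steps):
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        x1 += dt * v1
        x2 += dt * v2
        f1, f2 = forces(x1, x2)
        v1 += 0.5 * dt * f1 / m1
        v2 += 0.5 * dt * f2 / m2
        p_total.append(m1 * v1 + m2 * v2)
        p1_history.append(m1 * v1)
    return p_total, p1_history

p_total, p1 = simulate(x1=2.0, x2=0.0, v1=0.0, v2=0.5)
print("spread of total momentum:     ", max(p_total) - min(p_total))
print("spread of particle-1 momentum:", max(p1) - min(p1))
```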
Similarly, what about time translation?
Yes, but we have to define energy through it. Here, since I called it x, you can take x as position.
But in some other Lagrangian, x may not be position. For example, the Lagrangian in electrodynamics is a function of fields. So I'm just giving you this as an example of an invariance. If you look at time translation, you get the following.
Let us take the total time derivative of L. This L is a function of x and ẋ with no explicit time dependence, but let me keep that term for now: dL/dt = (∂L/∂x)ẋ + (∂L/∂ẋ)ẍ + ∂L/∂t.
So I'll use the equation of motion, ∂L/∂x = d/dt(∂L/∂ẋ) = ṗ,
and write this as dL/dt = ṗẋ + pẍ + ∂L/∂t = d/dt(pẋ) + ∂L/∂t, where ∂L/∂ẋ is the momentum p. Combining the terms, with ∂L/∂t = 0, I get d/dt(L − ẋp) = 0.
Huh?
I have some problem with my ear. So... >> Won't it be ẍ?
This is okay.
This is okay. And this is ∂L/∂t, which vanishes: no explicit dependence on time.
Okay. So then I take this and use the equation of motion.
If I use the equation of motion, I can replace the partial derivative with respect to x by the time derivative of ∂L/∂ẋ.
Correct?
And ∂L/∂ẋ is the definition of momentum. That gives the conserved quantity in the next line.
>> Huh?
>> Oh, ẍ. Correct, thank you.
>> Yeah, correct. Thank you very much.
So this is equal to zero.
So time-translation invariance, when L does not depend explicitly on time, gives you this conserved quantity, which I call E. I wrote it as L − ẋp; actually it's written as ẋp − L. Both combinations are conserved, but the definition of energy is E = ẋp − L. This is the definition of energy for any mechanical system, and it turns out to be the energy, right?
E = ẋp − L. For example, take a Lagrangian in one dimension, L(x, ẋ) = ½mẋ² − V(x): kinetic energy minus an arbitrary potential, T − V. That's the form of the Lagrangian of a particle moving in a potential in one dimension. Put it here: the momentum is p = mẋ, so E = mẋ² − ½mẋ² + V(x) = ½mẋ² + V(x), which is nothing but the energy of the system. That's why we call it energy. But for some systems this quantity may not have the interpretation of energy at all.
But in general we define this as energy because in normal mechanical systems it corresponds to energy.
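The recipe E = ẋp − L with p = ∂L/∂ẋ works for any Lagrangian, not just quadratic potentials. As a sketch (Python; m = 1 and the anharmonic potential V(x) = x⁴/4 are illustrative assumptions), the function below computes p by a numerical partial derivative of whatever Lagrangian it is given and returns ẋp − L, which comes out equal to T + V here.

```python
# Assumptions: m = 1 and an anharmonic potential V(x) = x**4 / 4, chosen only for illustration.
def lagrangian(x, v):
    """L = T - V with T = v**2/2 and V = x**4/4."""
    return 0.5 * v * v - 0.25 * x ** 4

def energy_function(L, x, v, h=1e-5):
    """E = x_dot * p - L, with p = dL/dx_dot taken by a central difference in v."""
    p = (L(x, v + h) - L(x, v - h)) / (2 * h)
    return v * p - L(x, v)

for x, v in [(0.0, 1.0), (1.0, 0.0), (0.7, -1.3)]:
    E = energy_function(lagrangian, x, v)
    print(f"x = {x}, v = {v}: E = {E:.8f}, T + V = {0.5 * v * v + 0.25 * x ** 4:.8f}")
```

Passing in a different Lagrangian, for instance one whose kinetic term is not quadratic in ẋ, is a quick way to see that ẋp − L need not coincide with T + V.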
So invariance of the Lagrangian under a continuous transformation leads to a conserved quantity, a constant of motion. Yeah?
Oh, take any Lagrangian whose kinetic term is not of this quadratic form,
and try to do that; then E need not be T + V. For general mechanical systems the Lagrangian has the form L = T − V, where T is the kinetic energy and V the potential energy. For most mechanical systems that form works, and then you get E = T + V. In fact, we called it energy simply because we started with mechanical systems; when you generalize, it does not necessarily have the interpretation of energy.
Huh?
>> Nonlinear.
>> Uh say it again please.
>> So if it is nonlinear dynamics, like a double pendulum, does it work?
>> Yeah. In classical mechanics this always holds; we have not assumed whether the system is linear or nonlinear. Nothing has been assumed.
Yeah. So, for example, you can take things like the Lorenz attractor and try to work it out. If you can write a Lagrangian for the system, then you can follow this.
Yeah.
So the last step, sir? >> This step?
>> Oh yeah.
>> Oh, because this is dL/dt, >> which is equal to d/dt of this, ẋp.
Right.
>> So I can take this over to this side,
so it's d/dt(L − ẋp) = 0. >> That's how it comes. Yeah. >> I've taken it to this side; in fact, I should take it to the other side,
so that it's ẋp − L. Then the conserved quantity comes out as T + V rather than −(T + V), which is why it gets the interpretation of energy; so it's a convention that dictates whether you write L − ẋp or ẋp − L. Okay, so this is the property of continuous transformations. But in quantum mechanics we not only have continuous transformations; discrete transformations also play a role. For example, Newton's equations are invariant if you take t to −t, but there is no consequence such as a conserved quantity. All that happens is that, given the state of the system now, you can work out its state in the future, or, if you want to know what it was in the past, you can work that out too. So time-reversal symmetry is already there, but there's no further consequence.
Okay. But in quantum mechanics, discrete symmetries have consequences, in the sense that an invariance under a discrete symmetry leads to certain properties of the system. So I'll just mention them, because I think my time is coming up.
My left ear is completely gone because of something I did, and I'm not able to hear with it. It's usually okay; I have to get it set right. So I'm sorry if I sometimes can't hear you very well.
So symmetries can also be continuous or discrete. Discrete symmetries are relevant in quantum mechanics, and in particle physics we use them all the time. One such discrete symmetry is parity.
Now I'm moving away from classical mechanics. I chose that example because Noether's theorem originated in classical mechanics and is valid across the board, in quantum mechanics also. But now, back to quantum mechanics: parity is a discrete symmetry which takes a vector r to −r and momentum p to −p.
It's like inverting the coordinate system.
Okay, it's a discrete symmetry. There's no continuous transformation which takes you from r to −r, except in two dimensions.
I'm talking about three dimensions here. Two dimensions is very special: you can take a vector in a plane and rotate it by π.
Then from one quadrant you go to the opposite quadrant, where x and y go to −x and −y. That's not a parity transformation; it's a rotation. So in two dimensions a parity transformation is one where x goes to −x and y to y, or y goes to −y and x to x. If you write it in matrix form, it has determinant −1. We will come back to this question later when we talk about rotation groups. For now, I'll tell you what happens in quantum mechanics when there is a discrete symmetry. This is a discrete transformation; I haven't said symmetry yet.
I I haven't said what the symmetry is yet. I'll tell you.
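The determinant criterion can be checked directly. In this Python sketch (the matrices are written out by hand for illustration), a rotation by π in the plane sends (x, y) to (−x, −y) and has determinant +1, a genuine 2D parity (x → −x, y → y) has determinant −1, and the 3D inversion r → −r also has determinant −1, which is why no rotation can produce it in three dimensions.

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    """Cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

c, s = math.cos(math.pi), math.sin(math.pi)
rotation_by_pi = [[c, -s], [s, c]]                    # (x, y) -> (-x, -y), a member of a continuous family
reflection_2d = [[-1, 0], [0, 1]]                     # x -> -x, y -> y: a 2D parity
inversion_3d = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]   # r -> -r in three dimensions

print(apply(rotation_by_pi, [1.0, 2.0]))    # approximately [-1.0, -2.0]
print(det2(rotation_by_pi), det2(reflection_2d), det3(inversion_3d))   # +1, -1, -1 (up to rounding)
```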
So if P is the parity operator, I define it through P† r P = −r:
it takes r to −r. I can define this as an operator relation.
Now if you operate with it twice, it should come back to r:
applying it again, −r goes to +r. So the property of parity is P² = 1. It is also unitary, P†P = 1, so P† = P⁻¹; and since P² = 1, P⁻¹ = P, which means P is Hermitian as well.
Okay.
So this is the property of the parity transformation. How does it affect our results? Let us look at the Schrödinger equation.
For simplicity I will use the one-dimensional,
stationary Schrödinger equation: −(ħ²/2m) d²ψ(x)/dx² + V(x)ψ(x) = Eψ(x).
Okay.
So now if I apply the parity transformation to this, x goes to −x; that is the parity transformation in one dimension. So the first term becomes −(ħ²/2m) d²ψ(−x)/dx².
The second derivative is unchanged under x → −x, so this term is unaffected,
plus V(−x)ψ(−x) = Eψ(−x) under the parity transformation.
Okay. Of course the two equations are different, because one has V(x) and the other V(−x).
But suppose V(x) = V(−x).
Can you think of any example where the potential is invariant under parity?
The harmonic oscillator. It's a parabola: x going to −x, nothing changes. So the potential is invariant under parity, and therefore the Hamiltonian of a one-dimensional harmonic oscillator is also invariant under this parity transformation. What this means is that if V(x) = V(−x), the Hamiltonian is invariant under parity, and therefore ψ(x) and ψ(−x) are both solutions with the same energy E.
They're solutions of the same equation.
Okay. And since P acting on ψ(x) takes it to ψ(−x), repeated operation
gives P ψ(−x) = ψ(x).
But this is nothing but P² = 1.
So, since P² = 1, the eigenvalues of P are ±1: P ψ(x) = ±ψ(x). Given that both ψ(x) and ψ(−x) are solutions, this means the following:
ψ(x) going to ψ(x) under parity is one possibility, and ψ(x) going to −ψ(x) is the other.
So the ones which change sign under the parity operation are called odd parity states.
And these are called the even parity states.
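The statements P² = 1 and "eigenvalues ±1" can be made concrete on a sampled wavefunction. This Python sketch assumes a grid symmetric about x = 0 (an illustrative choice), so that P just reverses the order of the samples. Any state splits into (1 + P)ψ/2, which P leaves unchanged (even, eigenvalue +1), and (1 − P)ψ/2, which P flips in sign (odd, eigenvalue −1).

```python
import math

# Assumption: the wavefunction is sampled on a grid symmetric about x = 0,
# so the parity operation x -> -x just reverses the order of the samples.
xs = [-2.0 + 0.5 * j for j in range(9)]          # -2.0, -1.5, ..., 2.0

def parity(psi):
    """P psi(x) = psi(-x) on the symmetric grid."""
    return psi[::-1]

psi = [math.exp(-(x - 0.4) ** 2) for x in xs]     # an illustrative state with no definite parity
even = [(a + b) / 2 for a, b in zip(psi, parity(psi))]   # (1 + P) psi / 2
odd = [(a - b) / 2 for a, b in zip(psi, parity(psi))]    # (1 - P) psi / 2

print(parity(parity(psi)) == psi)                                      # P^2 = 1
print(all(abs(p - e) < 1e-15 for p, e in zip(parity(even), even)))   # eigenvalue +1
print(all(abs(p + o) < 1e-15 for p, o in zip(parity(odd), odd)))     # eigenvalue -1
```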
So if the Hamiltonian is invariant under parity, P commutes with H, [P, H] = 0, and the eigenstates of the Hamiltonian can be chosen to have definite parity, either odd or even: they are also eigenstates of the parity operator. (In one dimension, bound states are non-degenerate, so they automatically have definite parity.)
This is the commutation we saw earlier, and correspondingly the states of the Hamiltonian carry definite parity, either odd or even. If you remember your harmonic oscillator, what is the ground state? e^(−αx²) with some constant α, apart from normalization; it's a Gaussian. So under x → −x, what happens? e^(−αx²) remains the same. So the ground state of a one-dimensional harmonic oscillator carries even parity: it is an energy eigenstate, but simultaneously it is a parity eigenstate with even parity. So commuting observables have simultaneous eigenstates.
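Whether H commutes with P can be tested on a discretized Hamiltonian. This Python sketch assumes ħ = m = 1 and a three-point finite-difference second derivative on a grid symmetric about 0; the grid size and spacing are arbitrary. Since P² = 1, [P, H] = 0 is the same as PHP = H, and PHP is just H with its rows and columns reversed. The even potential V = x²/2 passes the test; adding a linear term 0.3x breaks it.

```python
# Assumptions: hbar = m = 1, a 3-point finite-difference Laplacian on a grid symmetric about 0.
N, dx = 11, 0.4
xs = [(j - (N - 1) / 2) * dx for j in range(N)]

def hamiltonian(V):
    """Matrix of H = -1/2 d^2/dx^2 + V(x) on the grid (Dirichlet boundaries)."""
    H = [[0.0] * N for _ in range(N)]
    for i in range(N):
        H[i][i] = 1.0 / dx ** 2 + V(xs[i])     # -1/2 * (-2/dx^2) on the diagonal
        if i > 0:
            H[i][i - 1] = -0.5 / dx ** 2
        if i < N - 1:
            H[i][i + 1] = -0.5 / dx ** 2
    return H

def commutes_with_parity(H, tol=1e-12):
    """(P H P)_ij = H_{N-1-i, N-1-j}; with P^2 = 1, [P, H] = 0 is equivalent to P H P = H."""
    return all(abs(H[N - 1 - i][N - 1 - j] - H[i][j]) < tol for i in range(N) for j in range(N))

print(commutes_with_parity(hamiltonian(lambda x: 0.5 * x * x)))            # True
print(commutes_with_parity(hamiltonian(lambda x: 0.5 * x * x + 0.3 * x)))  # False
```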
Yeah.
>> Yeah.
Oh no: ψ(−x) = −ψ(x) here, and this is −ψ(−x). Thank you. Thank you very much. Yeah. Okay. So then take the first excitation of the harmonic oscillator. What is it?
Very simple.
Come on, this is the first problem you do.
The ground state is a Gaussian, e^(−αx²), where the constant α depends on the oscillator's spring constant. What about the first excited state? Suppose you use this result: what should it be? You can make a guess; you don't need to solve anything. What should its form be? Sorry?
>> x times a Gaussian, because it should be odd.
It's a parity state, so it's x times a Gaussian. If you go to the second excited state, it is x² minus a constant (some constants floating around) times the Gaussian. For the next one, the leading term will be x³. And what are these polynomials? They are the Hermite polynomials. Hermite polynomials have definite parity, even or odd: H_n(−x) = (−1)ⁿ H_n(x).
That's all. The Hermite equation is therefore parity invariant,
and that's what you have in the one-dimensional harmonic oscillator.
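The parity rule for the oscillator states can be checked with the Hermite recurrence H_{n+1}(x) = 2xH_n(x) − 2nH_{n−1}(x). This Python sketch uses physicists' Hermite polynomials and assumes units with mω/ħ = 1, so the nth state is H_n(x)e^(−x²/2) up to a normalization constant (omitted here, since it does not affect parity). The ratio ψ_n(−x)/ψ_n(x) comes out as (−1)ⁿ.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x          # H_0, H_1
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def oscillator_state(n, x):
    """Unnormalized oscillator eigenfunction H_n(x) exp(-x^2/2), assuming m*omega/hbar = 1."""
    return hermite(n, x) * math.exp(-x * x / 2)

x = 0.83
for n in range(5):
    ratio = oscillator_state(n, -x) / oscillator_state(n, x)
    print(n, round(ratio))   # +1 for even n, -1 for odd n
```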
Okay, I think time is over. One more thing I wanted to discuss was time-reversal invariance, which, like parity, is a discrete symmetry; you can see what it does to the Schrödinger equation. It differs from parity in that the equation becomes time-reversal invariant only if you simultaneously complex-conjugate it (for a real Hamiltonian, ψ(x, t) → ψ*(x, −t)). And then there is another discrete symmetry, a relational symmetry that you would have heard about: permutation symmetry. It's not a property of a single particle. But suppose you have two particles.
Okay. Depending on the nature of the particles, the two-particle wave function is either symmetric or antisymmetric under their exchange.
What are they?
Huh?
Huh? Bosons and fermions. This is what you have heard. A many-fermion wave function is antisymmetric under the exchange of any two of them: interchange any two and it picks up a negative sign. A many-boson wave function is symmetric.
Okay, this is an example of a discrete symmetry. Of course, we cannot derive this discrete symmetry from the invariance of some Lagrangian. It's an intrinsic property of all particles, elementary and even composite: either they are bosons or fermions. Of course, you also hear of other types, like anyons, but those appear as quasiparticles, excitations of systems of many particles which are essentially made up of bosons and fermions. So when you take an asymptotically free particle, it's either a boson or a fermion.
These other types occur only in interacting many-particle systems.
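Exchange symmetry for two identical particles can be illustrated with explicit wavefunctions. This Python sketch ignores spin and uses two illustrative single-particle orbitals (the unnormalized n = 0 and n = 1 oscillator shapes); neither choice comes from the lecture. The antisymmetric (fermion) combination changes sign when x₁ and x₂ are swapped and vanishes when both particles sit at the same point, while the symmetric (boson) combination is unchanged.

```python
import math

# Assumption: two single-particle orbitals chosen only for illustration (spin ignored).
def phi_a(x):
    return math.exp(-x * x / 2)

def phi_b(x):
    return x * math.exp(-x * x / 2)

def fermion_pair(x1, x2):
    """Antisymmetrized (Slater-determinant) two-particle wavefunction."""
    return (phi_a(x1) * phi_b(x2) - phi_b(x1) * phi_a(x2)) / math.sqrt(2)

def boson_pair(x1, x2):
    """Symmetrized two-particle wavefunction."""
    return (phi_a(x1) * phi_b(x2) + phi_b(x1) * phi_a(x2)) / math.sqrt(2)

x1, x2 = 0.3, -1.1
print(fermion_pair(x1, x2), fermion_pair(x2, x1))   # same magnitude, opposite sign
print(boson_pair(x1, x2), boson_pair(x2, x1))       # identical
print(fermion_pair(0.5, 0.5))                        # 0.0: both particles at the same position
```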
Okay, that's another example of a discrete symmetry. So symmetries can be continuous, and such continuous symmetries of the Lagrangian are governed by Noether's theorem: they lead to conserved quantities. But in addition, in quantum mechanics, and later in quantum field theory, you'll see that discrete symmetries also play a role. There's one more I could talk about, charge conjugation, but that takes me outside the essential framework I'm discussing here. The combination of charge conjugation, parity, and time reversal acting simultaneously on a state leads to a very fundamental theorem called the CPT theorem; those who go into particle physics study it. It's a foundational theorem for elementary particles. Okay, so I'll leave it at that; you can read about these things. >> We made the Lagrangian invariant under position and time and got the corresponding conserved quantities, momentum and energy, for continuous transformations.
>> So can we make it invariant under ẋ as well? We made it invariant under t and x, using the Euler–Lagrange equation. Can we make it invariant under ẋ and get some conserved quantity?
>> Well, in mechanics, if you make it independent of ẋ, it is just a function of x.
>> Okay.
>> Right. So in terms of dynamics that's not very interesting, because what is it that you want to translate? Your fundamental variables are x and t. There's one more, rotation, which I didn't do; if you do that, rotational invariance gives you conservation of angular momentum. I have postponed it; we'll do that. Okay, thanks. We can defer the questions to the lunch session, because lunch is waiting for us, and the people who are serving us also need to leave after lunch. So, let's go for lunch.
Let's thank Professor Morty and go for lunch.