Large Language Models (LLMs) are text prediction engines that function as 'autocomplete on steroids' and do not possess human-like memory; instead, they rely on context windows—the amount of text tokens they can process at once—to maintain conversation state. The context window acts as the AI's short-term working memory, with different models having varying limits (e.g., GPT-5.5 has 1 million tokens, Claude 4.6 has 200,000 tokens). When context exceeds these limits, the AI becomes bloated and forgets information. Effective context management involves starting new sessions when tasks are complete, using agent MD files to store instructions without repeatedly sending them, and utilizing compacting features to summarize long conversations and free up space.
深掘り
前提条件
- データがありません。
次のステップ
- データがありません。
深掘り
If You Don’t Understand AI Memory, You’re Losing追加:
So, a lot of people use tools like Clot, Jet GPT, or Gemini as if there's a little brain in the background keeping track of everything they say, but that's not really what is happening. These tools can feel really smart, almost like they remember you, but they don't. It just looks that way. And in this video, we are going to break that down in plain English. We will look at what a large language model is, why it feels so smart, and why it can seem brilliant one moment, confused in the next, and weirdly forgetful a few moments later.
Once you see what is really happening under the hood, you will become better at using these tools to your advantage.
So, first we need to talk about the LLM itself. What is an LLM? And let's keep it very simple. An LLM stands for large language model. and it is designed to predict what text should come after an input. So you have an input, the LLM predicts the next text and you get an output or an answer. So it's basically autocomplete on steroids. And I don't say that to be rude because of course it's more complex than this, but you know this is the core idea. So a little demo. Let's start chat GPT. Now when I ask JG GPT a question like what color is the sky and I press enter. Now let's zoom in a little bit and let's close this. Now then this chat GPT user interface that you see here on the screen sends that prompt or this question to the LLM aka the large language model and that model predicts uh the text that you see here on the screen. So this text, so this text that you see here is just the most likely text to come after this question. And that is what I mean with autocomplete on steroids. Now, one of the reasons that LLMs can predict so good or that the answers feel so smart is because they are trained on a huge amount of text and that makes them very good at pattern matching. But this does not mean that the model is thinking like a human with memories and opinions. It is just predicting text from learned patterns.
That's it. So now comes the fun part.
And let's go back to chat GPT. Now let me type something like my name is Bern and I have a YouTube channel. And I press enter. Now the LLM replies with nice to meet you. What's your channel about? And now I ask, "Yo, what is my name, bro?" And I press enter. And then the model reacts with Buren. Bam. That looks like memory. It looks like the AI remembered something about me. But if I now open a new chat and I ask him, "What is my name?" And I press enter. Then it says, "I don't know your name unless you've explicitly told me in this chat. So what changed?" The answer is what you see here on the screen. Context which is a very important concept when we talk about AI and LLMs. I mean the model itself is stateless. It doesn't remember anything unless the entire conversation is passed in again. So in this example here where I said my name is Bern and I have a YouTube channel. When I asked the follow-up question like what is my name bro? Then this chat GPT app that you see here is not sending only this question this yo what is my name bro question it is sending the whole conversation so also all these earlier messages. So the model is not remembering it in the human sense it is just seeing this whole conversation again and it's predicting the best next response based on the whole conversation. So if an AI tool feels smart, a huge part of that is not just the model, it is also about feeding the context into the model. Now there are also some tricks for feeding this context but I will show that in a bit because there's something more important that I want to explain first and that is about context window or that is the term context window and this term can be defined as the amount of text in tokens that the model can consider or remember at any one time. So here we see a new term called tokens. Now, without going into too much details cuz I want to do that in a different video, but an AI does not read text exactly like we as humans do. So, what it does is that it breaks down text into smaller chunks, which they call tokens. But for this video, you can simply see tokens as text that the AI can read. So, that being said, you can basically see the context window as the AI short-term working memory. And this working memory or context window has a limit just like our working memory has a limit. I mean every model has a max amount of tokens it can hold. So let me show that in an example.
So let's open this website and let's scroll down all the way to here. And then here we can see that the GPT 5.5 has a context size of 1 million tokens.
But if I for instance compare this with let's say cloth set 4.6 that one has 200,000 tokens. So every model has its own context size or context window. So if the amount of text that we pass into the model if that becomes too big then the AI becomes bloated like it will forget stuff it did minutes ago or it will give very bad responses. So let me show this in more detail in a demo here.
So let's start uh copilot in here. So here I I use the GitHub copilot CLI and this behind the scenes uses an LLM just like I did with chat GPT but in this case you know I do it via the terminal um cuz I can then do a bit more stuff.
So let's run this command SL context.
Then we see that I use here the model GPT 5.4 four. And that model has a context window of uh 34K tokens and as you see it already used uh 20K tokens and that 20K is coming from uh yeah this part here and that are basically the capabilities built into the model on how to read files, write files, run commands, uh search your folder structure, etc. Um and then here we see the messages. These are the context we are in. So the actual conversation in this session as I will show you in a bit. And this is the free space that you have left. And this is some reserved uh buffer space which is not really important for now. Now let's write some uh random stuff in here. Um I don't know uh something like do you know the YouTube channel def01 question and it's thinking all right so it gave an answer which you see here um it knows it and it wrote my name correctly I didn't write it correctly but uh yeah I'm not an AI now before we saw that the number of messages was zero or 0% % and when I now type in SL context then we see you know that this is already uh 611.
So you see that the context gets filled with every prompt. Now you can imagine that with simple questions like I did here, the context window does not fill up very quickly. But when we upload a large document or even an entire coding project, then it can uh fill up uh extremely fast and of course that will also cost a lot of money. So now we are also at the most important part the problem. So now that we understand how LLM memory works, we can also see the core problem. You might think that the more context you give an AI or an LLM, the better the answer will be. But as we now know, that is not necessarily true.
So important for you to remember is that you should avoid stuffing context with junk like random notes, old tasks, extra rules that are irrelevant. You should really pass in relevant, focused and clean code because yeah, if it exceeds the context window or the closer it gets to the context window, the worse the answers will be or it forgot things or you know you really see that the LLM is doing a bad job then. So this means that we must manage what goes into the context window and that means a few very practical things. First number one and one of the most simple ones is to start a new session or clear the context when you are done with a specific task. So um an example when I'm in here and I say um my name is Bernen and then it responds with got it buen. If I say what is my name? Enter again and it gives my name as I showed before. But I can also do in these CLI tools you can also do something like clear and then it clears the context. So to show this um when I type in here SL context we see this amount here the amount of tokens. Um but if I now do dash clear and I do this again uh then you know you see zero. So zero tokens are in there.
So with the SL clear command you can clear the context. Um and if you use chat GPT then you know you simply start a new chat and if you start a new chat you are also in a new context. So rule of thumb work task by task if possible and uh yeah just start a new session when you are doing something completely different. Second put repeating rules in an agent's MD file. And you have more files like this. you have a skills.mmd file, an instructions.mmd file. Um, but I won't uh dive into all of them. I just show you this one. But that being said, let me demo this quickly. So, let's open this one here. Let's clear the context and let's say that I want to give the LLM this prompt. You know, I want to give it this question like, can you explain to me what an LLM is? Uh, but I want you to do it in a friendly tone with short sentences, simple uh simple words and it must always use a simple analogy and end with one punchy takeaway. And then I press enter. We wait and it gives us a a nice response, you know, with the friendly tone and the takeaway, etc. Everything we specified.
Now, let's say I want to ask it another question like, can you explain what tokens are? So, instead of LLMs, we want to know what tokens are. And then again I paste this whole stuff here. So the disadvantage of this is that it's not really uh user friendly because every time I have to specify this but it's also unnecessarily uh filling my context window with extra tokens you know with extra text every time that I uh do a new question um or or make a new prompt. But the thing is I've prepared this a bit for you. Of course, if I go to uh this uh example folder, then in here you see an agents.md file. Now, let's open it in the terminal. And in this agents MD file, I have said uh use this guide to explain things to me. Explain. And the explain instructions are uh use a friendly tone, use simple words, use short sentences, always use a simple analogy, and end with one punchy takeaway. Now when I start uh copilot in this folder then uh it automatically sees this agents.mmd and let's now start this in private mode which I normally do.
So if I now ask the LLM, can you explain to me what an LLM is? And I ask it a question and I ask it that question. I mean, and then here you already see I should follow the custom style guide.
Um, it's also giving the answer immediately which says to use friendly tone, simple words, blah blah blah. So everything we mentioned in the MD file and it gives us uh an answer in the style that uh that we specified in the agents MD file. So this session will grow but much less because we now stopped resending the same instructions every time. Plus it makes our life a bit easier because we don't have to uh rewrite uh yeah the style guides so to say every time when we ask it a question. Now this is of course a simple example but instead of this uh simple agents MD file that I showed you here.
So instead of this simple file you can have style guides for whatever you want to use like um uh you can use this to define style guides for programming an app or creating a marketing campaign etc. Yeah. So this is really beneficial for your context and it makes your life a bit easier. So this was a short example. Then let's go to the last one which is compacting. Um so most AI CLI tools have this feature called compacting which basically means that they can take the whole conversation or a long chat how I called it here and then you can execute this compact command and what that does is that it uh summarizes it into a shorter context. So it frees up space in the context. So let me quickly show this. So let's uh start the co-pilot again. And in here you know you have this compact command and you can simply uh execute this and uh yeah that will summarize the conversation history to reduce the context. Anyways you now know way better how LLMs work.
They are just a uh next token prediction engine and what feels like memory is usually context being passed in. And as explained, the context window is the amount of text that the model can see at one time. And as you know now, context engineering is the real superpower because it decides whether the model sees the right stuff or one big chaotic mess. I hope you like the video and I see you in the next one. Cheers.
関連おすすめ
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











