
NEW Self-Improving Memory For AI (Forget Memory.md)
Added: 2026-05-17

294 views · 40 · 24:05 · code4AI · Originally published: 2026-05-16

SAGE is a self-evolving graph memory engine that couples reinforcement learning-driven graph construction with a structurally gated graph neural network, enabling AI systems to dynamically optimize their memory topology through continuous feedback loops where retrieval failures serve as reward signals to improve graph structure, achieving 0.032-second retrieval times and superior multi-hop reasoning performance compared to static memory systems.

[00:00:00]Hello community. So glad that you are back. You would say, "My goodness, what is this?" Yeah, today we talk about memory and how we design memory for artificial intelligence. And if you think that you just have a memory.md file where you just concatenate new memory elements, well, you will be in for a surprise, because we do science here. Now, you remember in my last video we talked about slow-fast training of AI, where we have an interwoven compound of multiple AI systems that are training together.

[00:00:32]We will use the same idea and apply it now, and I will show you a new system that significantly improves on our traditional AI memory systems. And my goodness, we're going to use physics to the maximum to show you what is the latest in memory. So, let's have a look.

[00:00:54]SAGE, what is it? It is a self-evolving graph memory engine that couples reinforcement-learning-based graph construction with a structurally gated graph neural network. So, this means we have multiple AIs that now write our memory file together in an optimized way.

[00:01:15]And you might say, "But wait a minute, what is the problem? Why should we do this?"

[00:01:19]What is the problem that Sage or Saget will solve? By the way, this is Sage 4.

[00:01:25]I have so many other SAGEs that cover completely different topics, but somehow people love this acronym. So, careful: SAGE can mean completely different things in completely different domains.

[00:01:39]So, let's start at the beginning: RAG. Standard retrieval-augmented generation retrieves independent text chunks via dense vector similarity, or cosine similarity, which struggles to recover long reasoning chains. No way to do this, no? And then you say, "Okay, GraphRAG." It improves upon this by extracting entities and relations from the text to build a knowledge graph prior to retrieval, and we say, "Hey, great." But unfortunately, both have a problem, you know? Because both treat external memory as a static index.
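To make the limitation concrete, here is a minimal sketch of dense retrieval by cosine similarity. The chunk names and three-dimensional vectors are made up for illustration; a real system would use learned embeddings. Each chunk is scored against the query on its own, so a chunk that is only relevant as the second hop of a reasoning chain never surfaces.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Rank chunks independently by similarity to the query and keep the best k."""
    ranked = sorted(chunk_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical toy embeddings: chunk_c holds the second-hop fact but does not
# resemble the query, so plain similarity search never returns it.
chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 0.0, 1.0],
}
print(top_k([1.0, 0.0, 0.0], chunks, k=2))  # ['chunk_a', 'chunk_b']
```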

[00:02:08]And this is something that we're going to eliminate today.

[00:02:11]So, if the initial graph construction was, let's say, noisy at the moment you built it, or you had some highly connected hubs that were dominating, or you missed some crucial bridge entities, or some edges were not really there or did not have the required strength, you know?

[00:02:29]Then all the downstream retrievals, of memories, of RAG, of GraphRAG, all of this will fail.

[00:02:37]So, some beautiful scientific minds said, "Hey, okay, you know what? Let's build a memory file, an external memory for an LLM, or to be specific, for an LLM agent, of course. It should now be a dynamic, optimizable, and differentiable substrate." And you see, with this definition the memory becomes something you have never seen before, but it will outperform everything else that you know about.

[00:03:04]So, let's have a look. Yeah, by the way, I was asked, "What is the secret of AI?

[00:03:11]What is this mythical element here? What is the alchemy of AI?" And now, you know, I just wanted to make this clear, and I put it in a simple sentence. You know, I have a feeling that AI is just pure science.

[00:03:26]But hey, this is just my idea. So, let's continue. So, we now treat graph memory, and you see we have, of course, graph structures, as a joint optimization problem. This is exactly where we left off in the video from yesterday, where we looked at a complex reasoning structure with slow and fast, or different, temporal dynamics in LLM long-form reasoning. Now we focus only on the memory, and we will have one AI that is a memory writer and another AI that is a memory reader.

[00:04:01]And of course, the writer will be a large language model that generates our topological graph, our knowledge graph, and the memory reader is of course not an LLM. Come on, this would be too easy, huh? Here we go with a graph neural network that will retrieve the information.

[00:04:18]So, we have two different types of AI architecture that we can finally combine, and may I just point to this little sentence: AI is science.

[00:04:30]Okay.

[00:04:32]Now, let's think about physics. It is also a science about coupled systems. You know, systems that are coupled, like a pendulum, huh?

[00:04:39]So, this means when the reader of our memory fails to retrieve the correct evidence path, the failure is put to use, since it's a coupled system, a self-learning system.

[00:04:50]The failure is used as a reward signal to update the writer's generation policy, our π_θ (pi theta).

[00:04:57]So, you see, this is, if you want, a self-generating, self-optimizing memory.

[00:05:03]Okay. Who would have thought that a memory should be static, huh? So, the system continuously evolves the graph's topology, the real structure of the graph, to maximize the retrieval accuracy of the reader. So, remember, this is a coupled system. Imagine it with some loaded springs. Imagine it as a joint optimization, a mathematical optimization problem, whatever you have in physics, huh?

[00:05:30]So... oh, I just noticed I was a little bit... Okay.

[00:05:35]In case you don't notice, AI is science.

[00:05:38]So, the simple question is now, "Hey, you might ask, how do we rewrite the topology of a dynamic graph, given the complexities that we are going to encounter?" And I know that you already know the solution, so let's go through it together.

[00:05:51]Now, the memory writer AI, what is it?

[00:05:54]What is the task?

[00:05:57]Now, the task is simple: to construct, to build the graph, and to optimize the graph representation of a certain textual complexity, or a medical or financial complexity, and to optimize this via the reinforcement learning algorithm that we already know, huh? So, what is it? As I told you, it's a policy-based LLM. Pi theta, π_θ, is the policy being optimized: it takes some raw text, dialogue history, whatever, as input, and outputs a structured JSON array of entity-relation triples, (subject, relation, object), exactly the elements that we need to incrementally build up our beautiful knowledge graph in your particular domain, with your domain knowledge, your complexity, and so on.
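As a rough illustration of that output format, the sketch below parses a JSON array of triples and adds them incrementally to an adjacency-list graph. The field names `subject`, `relation`, and `object`, the example triples, and the helper `add_triples` are assumptions for illustration; the video does not show the paper's exact schema.

```python
import json
from collections import defaultdict

# Hypothetical writer output: a JSON array of (subject, relation, object) triples.
writer_output = """[
  {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},
  {"subject": "Warsaw", "relation": "capital_of", "object": "Poland"}
]"""

def add_triples(graph, raw_json):
    """Incrementally add triples to an adjacency-list knowledge graph."""
    for triple in json.loads(raw_json):
        subj, rel, obj = triple["subject"], triple["relation"], triple["object"]
        graph[subj].append((rel, obj))
        graph.setdefault(obj, [])  # make sure the object exists as a node too
    return graph

graph = add_triples(defaultdict(list), writer_output)
for node, edges in graph.items():
    print(node, "->", edges)
```

Calling `add_triples` again with a later batch of triples extends the same graph, which is the incremental build-up described above.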

[00:06:43]Now, the memory writer does all this; it is trained to build the graph, and trained, of course, on your data, huh? And here you see, for our RL optimization, group relative policy optimization, our good old friend GRPO.
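The core idea of GRPO is that several candidate outputs are sampled for the same input and each one is scored relative to its own group, so no separate value network is needed. A minimal sketch of that group-relative advantage follows; the reward values are invented, and using the population standard deviation with a small `eps` is one common choice, not necessarily what the paper does.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled graph's reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for three candidate graphs the writer sampled from one text.
advantages = group_relative_advantages([0.2, 0.5, 0.8])
print(advantages)  # below-average graphs get negative, above-average positive values
```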

[00:06:57]So, the environment reward is determined by the reader's downstream utility.

[00:07:02]And the reward function, there's a lot of mathematics, but I want to make it clear, just evaluates the recall, the precision, and of course, whether the retrieved evidence contains the exact entities that are required to deduce the answer.
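A minimal sketch of a reward with those three ingredients is shown below. The additive combination, the weights, and the function name `retrieval_reward` are assumptions for illustration; the paper's actual reward is more involved.

```python
def retrieval_reward(retrieved, gold, answer_entities,
                     w_recall=1.0, w_precision=1.0, w_answer=1.0):
    """Score the reader's retrieved evidence against the gold evidence nodes."""
    retrieved, gold = set(retrieved), set(gold)
    answer_entities = set(answer_entities)
    hits = len(retrieved & gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    # 1.0 only if every entity needed for the answer was retrieved.
    answer_covered = 1.0 if answer_entities <= retrieved else 0.0
    return w_recall * recall + w_precision * precision + w_answer * answer_covered

# Two of three gold nodes found, one spurious node, answer entity present.
reward = retrieval_reward(["a", "b", "x"], ["a", "b", "c"], ["b"])
print(round(reward, 3))  # 2.333
```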

[00:07:16]So, beautifully, you say, "Hey, that's simple." Of course, it's simple.

[00:07:20]The other element is the memory reader, and now it gets interesting, huh? This is a graph neural network with structurally gated message passing.

[00:07:30]And you say, "Hold on. I'm new to AI.

[00:07:33]I know nothing about this." Never mind, you go to my channel, you put in GNN, and you have your complete course about graph neural networks.

[00:07:43]Four years ago, my goodness, we coded graph neural networks here in my playlist. Look at this. We were working with DGL on PyTorch and PyG code examples in graph ML.

[00:07:55]Long gone are the times when this was easy, you know, the good old times when there was just DGL, the Deep Graph Library, and we could optimize everything. Today everything is so much more complicated, but also mathematically more challenging, and this is a nice thing. And you see, just a year ago we talked about LLMs with a knowledge graph and a GNN. Is this really the truth? So you see, we already had the idea a year ago.

[00:08:20]Just the concrete practical implementation.

[00:08:24]This is happening now. But this implementation here, this was the GNN.

[00:08:28]And you might say, but what about this message passing? Well, you are lucky, because 3 years ago we also talked about, you see, GraphSAGE, also a SAGE here, the GCN, and graph isomorphism, explained visually. Message passing we already coded together 4 years ago. Then message passing for graphs in simpler examples, 4 years ago and 3 years ago. We went on to topological message passing in GNNs on simplicial complexes and on CW networks. So a little bit of advanced mathematics, but no, we stay simple today. We stay with simple message passing in a GNN, but structurally gated. Now we come to the interesting part of the paper.

[00:09:13]So what is it in general? A graph foundation model, a graph neural network designed to process the query and output a probability distribution over the graph's nodes to select the relevant evidence, no? Of course, what else?

[00:09:26]Now they have two modifications. And now here, if you have a look at the paper, those modifications are the core insight. Those modifications are mathematically really heavy.

[00:09:38]But I want to make it easy so that everybody understands the AI. So I tell you what it does, what the effect is, and why we need it. And if you want to jump into a deep dive of the mathematics, you just open the paper and enjoy it. So, the first modification is structured query planning.

[00:09:55]So instead of using our raw query vector, a single vector, to initialize the node activations, the model decomposes the query, you know, we know this, no?

[00:10:05]Make it simpler, reduce the complexity level: multiple simple queries. So it decomposes the query into structured retrieval intents, explicit entities, aliases, and hard constraints that we have on the system, to create simply a broader initial activation state, no? We widen it out, we smooth it out. We don't have a single peak as a vector, but we have some soft addressing across multiple anchor nodes. We have a broad basis. This is, beautifully, structured query planning.
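The "soft addressing" idea can be sketched as spreading the initial activation over several anchor nodes instead of placing all of it on one. In this minimal sketch the node names, the anchor scores, and the softmax normalization are assumptions for illustration; how the paper derives the scores from the decomposed query is not covered in the video.

```python
import math

def initial_activation(nodes, anchor_scores):
    """Spread the starting activation over several anchors with a softmax.

    anchor_scores maps anchor node -> match score from the decomposed query
    (entities, aliases, constraints). Every anchor must be one of `nodes`.
    """
    if not anchor_scores:
        raise ValueError("need at least one anchor node")
    exp_scores = {n: math.exp(s) for n, s in anchor_scores.items()}
    total = sum(exp_scores.values())
    return [exp_scores.get(n, 0.0) / total for n in nodes]

nodes = ["curie", "warsaw", "poland", "radium"]
# Hypothetical scores: the query mentions Curie explicitly and radium via an alias.
activation = initial_activation(nodes, {"curie": 2.0, "radium": 1.0})
print([round(a, 3) for a in activation])  # [0.731, 0.0, 0.0, 0.269]
```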

[00:10:38]And, as the second modification, the authors modify the message passing layer by computing a vector gate for each edge.

[00:10:46]Now you may say, what is it and why do we do it? Now, what it is, is simple. What is a gate? They implement the gate as parameterized by a multi-layer perceptron, or MLP, that takes local topological features as input, let's say the node degree, or any similarity measure you have for neighborhoods in a graph, you know, remember graph theory in mathematics at university. My goodness, we had a complete semester just about classical graph-theoretical functions and elements and whatever. So, classical stuff, beautiful.

[00:11:16]But it now allows the graph neural network, and this is the reason why we do it, to learn to mathematically dampen the message passing through some noisy hubs, you know, these monster hubs that are disturbing the harmony in the network, no?

[00:11:31]But on the other side, we want the little sparse bridges that we have, the little edges that absolutely shine in the background, to be taken care of, too, so that we integrate them in our long, multi-hop reasoning structure, no?

[00:11:45]So, this is exactly what will help us to do this. It will mathematically dampen the messages passing through these noisy monster hubs, while also preserving the signal transmission across these tiny sparse bridge edges that we have in our graph structure.
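A toy version of degree-based gating can make the effect visible. This sketch replaces the paper's learned MLP and vector gate with a single hand-set logistic unit on the log out-degree of the source node, producing one scalar gate per edge; the weights `w` and `b`, the node names, and the function `gated_step` are all assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_step(edges, activation, w=-1.0, b=2.0):
    """One message-passing step where each edge is scaled by a structural gate.

    The gate depends on the source node's out-degree; a negative w means that
    high-degree hubs pass on less of their activation per edge.
    """
    degree = {}
    for src, _ in edges:
        degree[src] = degree.get(src, 0) + 1
    out = dict.fromkeys(activation, 0.0)
    for src, dst in edges:
        gate = sigmoid(w * math.log1p(degree[src]) + b)
        out[dst] += gate * activation[src]
    return out

# A hub with five outgoing edges and a sparse bridge with a single edge.
edges = [("hub", f"h{i}") for i in range(1, 6)] + [("bridge", "target")]
activation = {"hub": 1.0, "bridge": 1.0, "target": 0.0,
              **{f"h{i}": 0.0 for i in range(1, 6)}}
result = gated_step(edges, activation)
print(round(result["h1"], 3), round(result["target"], 3))  # 0.552 0.787
```

With equal starting activation, the message over the bridge edge arrives stronger than any single message leaving the hub, which is the dampening behavior described above. In the actual method the gate is learned rather than hand-set.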

[00:12:04]This is all there is.

[00:12:06]So, you see, beautiful.

[00:12:10]Now, the self-evolving loop: this is how to combine them, or, if you want, the training regimen that connects the writer and the reader AI.

[00:12:21]So, how do we do this? How do we train it? How do we let it optimize? Now, simple, no?

[00:12:26]First, the reader AI is frozen, and the writer's policy is updated via reinforcement learning using the reader's retrieval metrics, there are multiple, as the reward structure, no? So, the reader's reward now becomes important for the writer's policy update via reinforcement learning.

[00:12:45]Next step is simple. Now, the writer is frozen and generates a new set of improved knowledge graphs. So, the writer AI understood, "Hey, I understand that the reasoning topology, the interwoven net of why we reason in a particular way, is now different. So, now I rebuild my graph structure. I make the graph more precise, absolutely aligned to the new reasoning structure, to the textual reasoning structure," no?

[00:13:12]A new knowledge graph.

[00:13:13]And then the reader is updated on this new graph distribution to adapt to the writer's changing topological structure. So, you see, again, exactly like in my last video, where we talked about the temporal dimension, about fast and slow, and we had reinforcement learning and GEPA. Here we now have, if you want, two AIs, an LLM and a graph neural network, that write a knowledge graph in an optimized way and continuously update it.
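The three alternating phases can be written down as a schedule. This is only a skeleton under stated assumptions: `update_writer`, `rebuild_graphs`, and `update_reader` are hypothetical callables standing in for the RL update, the graph regeneration, and the GNN training, none of which are implemented here.

```python
def alternate_training(rounds, update_writer, rebuild_graphs, update_reader):
    """Run the writer/reader alternation and return the order of phases."""
    log = []
    graphs = rebuild_graphs()  # initial graphs from the untrained writer
    for r in range(rounds):
        # Phase 1: reader frozen; writer policy updated from the reader's
        # retrieval metrics on the current graphs.
        update_writer(graphs)
        log.append((r, "writer"))
        # Phase 2: writer frozen; it regenerates an improved set of graphs.
        graphs = rebuild_graphs()
        log.append((r, "graphs"))
        # Phase 3: reader trained on the new graph distribution.
        update_reader(graphs)
        log.append((r, "reader"))
    return log

# Stub callables, just to show the schedule.
schedule = alternate_training(
    rounds=2,
    update_writer=lambda graphs: None,
    rebuild_graphs=lambda: ["graph"],
    update_reader=lambda graphs: None,
)
print(schedule)
```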

[00:13:43]Yes, this is computationally heavy, and what you don't know yet is that it is extremely heavy mathematically, but I want to make this video so that everybody understands why we do it, what is, if you want, the trick, the beauty, the mathematical operation, and how we achieve something that is outperforming everything else that we have.

[00:14:05]Now, I know a lot of you say, "GitHub."

[00:14:07]Here you have the GitHub with all the files.

[00:14:09]Everything is available for you.

[00:14:10]The author of the GitHub repository even says, "Hey, support me with a cup of coffee." So, if you want to; I think this is a great project.

[00:14:18]But let's come now to the results.

[00:14:21]Okay.

[00:14:22]I told you it should outperform everything that we have on how to build memory for AI, no? Everything from a simple AI with, I don't know, memory.md concatenation, or maybe you have a gated LLM that adds just some particular knowledge to the memory structure. But what you don't want is to start with a blank sheet when you open up your machine and your AI comes online, no? No. You want to have the complete memory of everything that happened to you and your AI. So, let's have a look.

[00:14:57]Now, by explicitly optimizing the graph topology via reinforcement learning feedback, SAGE in this form drastically improves multi-hop reasoning.

[00:15:09]And you see, just look at the terms, no?

[00:15:11]AI science, graph topology, reinforcement learning feedback. Yes.

[00:15:16]Nice.

[00:15:17]Let's look at the data. Now, what's really nice: we have HotpotQA, MuSiQue, and 2WikiMultiHopQA, so three different benchmarks. And they give us a lot of baselines; you see here RAG and memory systems.

[00:15:30]And I think this is beautiful. They show us, hey, look, this one was published or presented in '24, this one in '25.

[00:15:39]And at the end SAGE, of course, '26, May '26.

[00:15:42]So, you see here the exact match, EM, and the F1 score. This is now the result for multi-hop question answering performance on these three benchmarks, and the average rank is the last column.

[00:15:56]And you see it outperforms almost all the other systems.

[00:16:00]But what's really nice is that you can now compare: if you use, for example, Hyper Rag, or a chain of thought, or you use GFM-RAG, or whatever.

[00:16:10]You can now see the improvements that you are able to achieve if you switch to the new system. I think this is really, really nice.

[00:16:20]Now, I already mentioned it. Maybe you noticed it.

[00:16:24]This is a complete story, but I just want to tell you, I wanted to give you this presentation as a simple introduction so that you immediately understand the complete complexity of everything.

[00:16:37]You have now a complete understanding of what this methodology is and why it is.

[00:16:42]But now, to code this, you have to build the mathematics for it, no? Because what do you want to code if you don't have the mathematics? So, I just want to tell you, this paper has 62 pages of some beautiful mathematics. And just to give you a feeling, by page 27 they already have 86 formulas. And these are not of the easy kind.

[00:17:05]But absolutely beautiful: writer-induced graph distribution shift and target graph calibration. And they really try to be as precise as possible. They have a lot of propositions and ideas and mathematical proofs, and not only for the writer-induced dynamics; they also go for reader stability and dynamic graph evolution, and they ask themselves, hey, can we do this? What do we know about Lipschitz conditions, what do we know about the padding alignment, and all the other complexity that, if you're not into AI mathematics, you would never have thought about. But I think this paper is so nice because they have not just a mathematical annex of a few pages, but 62 pages of pure mathematics explaining how to write memory better than ever before.

[00:17:56]And if you look here at their data, you have to say, yeah. But you know what?

[00:18:00]The best thing I haven't shown you yet.

[00:18:03]Because this has an advantage that is just amazing.

[00:18:08]Where have you been? Wait a minute.

[00:18:10]Here.

[00:18:11]Yeah, you have tables over tables of exact data. If you can do a deep dive, please do it. The training results for the memory writer are here, and they also put a lot of effort into showing you in ablation studies, hey, it is really important to do what I just presented to you. You have no shortcut. You have no simplification.

[00:18:36]You really have to jump through all those hoops.

[00:18:38]You have to do it, because here, for example, you have the recall-only reinforcement learning. If the writer is trained solely on recall, the precision is high. Yeah, but the usability drops significantly, and the authors tell us this proves that merely accumulating signal mass is insufficient for the graph topology.

[00:18:57]The topology must be structurally continuous.

[00:19:01]You remember I told you about the differentiable structure of this. It must be structurally continuous to support multi-hop reasoning.

[00:19:09]So, I have a feeling that they really invested quite some amount of time and thinking and deep thoughts and mathematics and code. Absolutely beautiful. So, if you want to get a feeling for the leading and bleeding edge of AI memory, I highly recommend this particular paper.

[00:19:31]Yeah, and I told you there is something beautiful, yeah?

[00:19:34]Have a look at this. Now we look at the retrieval efficiency comparison, with the retrieval time in seconds, on our three benchmarks.

[00:19:45]And now look at the iterative retrieval methods, the classical ones: one of the best is at about 3 seconds, yeah?

[00:19:52]And now look at SAGE.

[00:19:54]It is 0.03 seconds.

[00:19:58]You might say, "How is this possible?"

[00:20:00]Think about it. Yeah, of course, this is why we built a continuous, differentiable graph optimization.

[00:20:07]SAGE shifts the immense computational burden of multi-hop reasoning offline. You do this continuously.

[00:20:20]The reinforcement-learning-driven writer, our writer AI, expends the compute to continuously pre-optimize the geometry of the graph, so that when the query really comes in, the online reading phase, when we really have to execute the query, is rendered a trivial single-pass matrix multiplication.

[00:20:41]And therefore, it takes only a tiny tiny fraction of a second compared to all the other models.
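The offline/online split can be illustrated with a deliberately simplified linear model. This sketch assumes the propagation is purely linear, so that L layers can be composed into one matrix ahead of time; a real GNN reader has nonlinearities and learned gates, so this only illustrates why the online step can be cheap. The matrix `T`, the helper functions, and the three-node chain are all made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matvec(matrix, vec):
    """Multiply a matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Offline: T[i][j] is the (already gated) weight of the edge from node j to
# node i. Here node 0 -> node 1 -> node 2 forms a two-hop chain.
T = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
P = matmul(T, T)  # two-hop operator, computed once before any query arrives

# Online: the query activates node 0; one matrix-vector product scores all nodes.
query_activation = [1.0, 0.0, 0.0]
scores = matvec(P, query_activation)
print(scores)  # [0.0, 0.0, 1.0]
```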

[00:20:48]So, this is nice if you have time-critical jobs.

[00:20:52]For memory, this is a solution where you continuously improve the graph representation of your memory complexity in your domain knowledge.

[00:21:05]If you do this, if you continuously let the system learn what the best representation is for all the tasks this human user asks of me all the time, then you have the perfect representation.

[00:21:19]And we carry all the computational burden offline. So, we do have it.

[00:21:23]We cannot skip it.

[00:21:25]But we put it offline, and then when it's the real job online, it's extremely fast.

[00:21:33]And I think that's beautiful. The authors established the graph as a memory.

[00:21:37]Finally, it's not static indexing anymore, like you learn it in the books and you see it in all the videos.

[00:21:43]But now, now we can say, "Hey, this is a fully differentiable, underline it two, three, four times, self-improving substrate."

[00:21:54]However, the complexity is not simple.

[00:21:57]And I made this video in particular after reading the paper to enable you to enjoy the idea, to have the complete idea, and then have a deep dive into the mathematics, because I hope that this will help you to understand the paper.

[00:22:11]Because, as already mentioned, I don't know if you noticed, but AI is science.

[00:22:17]Now, what I really like: you can kind of reduce it to the max. And you can say, "Okay, so what have we chosen to solve?" No? The question was, "How do we dynamically evolve the topology and the structural gating?" You remember the two modifications to the message passing in our graph neural network over a heterogeneous knowledge graph. So, in a way that the correct answer is generated extremely fast. And the correct answer naturally becomes the global attractor.

[00:22:50]Or, if you want to be in graph neural network notation: maximize the target node probability mass, no?

[00:22:56]Have a look at my videos about GNN.

[00:22:59]This is what we just solved. And now just look at this formulation and tell me, what is AI?

[00:23:08]Now, if you are a subscriber of my channel and you appreciate this channel, you know it is not that easy, no? Because the question, or at least the entry to the question, would be this.

[00:23:19]The entire objective of SAGE, this means evolving the writer's topology and the reader's gating function, is to optimize the graph transition matrix in a way that after L layers of message passing, the probability mass successfully concentrates on the correct evidence nodes. So, this means by minimizing the multi-positive cross-entropy loss during training, the system ensures that the ground-truth nodes naturally, underline, achieve the highest activation logits across the entire candidate set. And this is the way we achieve a solution with SAGE.
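One common way to write a cross-entropy loss with several correct targets is the negative log of the total softmax probability assigned to all positive nodes. The sketch below uses that form as an assumption; the exact loss in the paper may differ, and the logits are invented.

```python
import math

def multi_positive_ce(logits, positives):
    """Negative log of the softmax mass that falls on the positive nodes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    positive_mass = sum(exps[i] for i in positives) / total
    return -math.log(positive_mass)

# Nodes 0 and 1 are the ground-truth evidence nodes, node 2 is a distractor.
uniform = multi_positive_ce([0.0, 0.0, 0.0], positives=[0, 1])
trained = multi_positive_ce([5.0, 5.0, -5.0], positives=[0, 1])
print(round(uniform, 4), trained < uniform)  # 0.4055 True
```

The loss shrinks toward zero as the logits of the ground-truth nodes rise above the rest, which matches the stated goal of concentrating the probability mass on the correct evidence nodes.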

[00:23:53]I hope you enjoyed it. I hope you had a little bit of fun. I hope you understood that there was an underlying message that you hardly noticed. AI is science.

[00:24:02]And I hope to see you in my next video.

#artificial intelligence #AI models #LLM #VLM #VLA
