Simon provides a pragmatic and well-structured blueprint that transforms the abstract idea of AI memory into a production-ready architecture. It is a grounded guide for developers who want to build agents that are both context-aware and securely managed.
Deep Dive
Claude Code Agentic OS… It Remembers Everything
For a tool this good, Claude Code's memory is embarrassingly bad out of the box. Decisions you've already made, context you've explained before, and work you've already done: it forgets all of it. And after a while, that becomes a tax on every single session you run.
And it's a tax on your time. So over the last few months, I've been tearing apart the best memory systems out there.
We've looked at Hermes, GBrain, memsearch, and a bunch of others to work out what they're each doing differently. Then I rebuilt the best parts of each inside my own agent operating system. Memory that can find a six-month-old decision from a question that you ask in completely different words. Memory that points to the exact conversation behind every answer it gives you, so you can actually trust it. And memory that a whole team can share, where everyone gets what they need and no one sees what they shouldn't. And by the end of this video, you'll have the blueprint to build this yourself. So let's start off by talking about perfect memory.
So for a business, I've defined this as four specific things. And like I said, each one is lifted from a different open source project that solved that piece of the puzzle much better than I've seen anywhere else.
So number one, it cites its sources. When it tells you something, it tells you where that information came from. Not just which file, but the actual conversation context is returned with it: exactly the words that were used, and on which date that was agreed. And if it doesn't actually know, it's going to tell you so instead of making it up, which is super important. We all need to know if we're receiving trustworthy information, in particular when you're working with others across a team and it might be conveying their decisions to you, not necessarily your own, which you can more easily remember. Now, this specific part of recalling information was inspired by GBrain, the project that Garry Tan from Y Combinator built out, and I'd highly recommend checking out some of the principles there.
Number two is quick recall of recent context. All the latest things that you've been discussing day-to-day with Claude Code, your pricing, who your active clients are, your current decisions, all of those things should sit close and accessible to every session that you run. The trick here is a small curated snapshot of context that loads silently at the start of every session using a hook. It should also be capped so that it doesn't bloat the context too much and contribute to context rot.
And you might recognize this as the exact frozen snapshot pattern from the Hermes agent. It injects a capped amount of key context every time you talk to it. And this is going to save you loads of time versus digging back through a larger database of mostly irrelevant memories or not having any memories stored at all.
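The capped snapshot idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names, the 4,000-character cap and the `build_snapshot` helper are hypothetical, and how the resulting string reaches the session depends on how your session-start hook is configured.

```python
from pathlib import Path

# Hypothetical file names for the curated context; adjust to your own layout.
SNAPSHOT_FILES = ["soul.md", "user.md", "memory.md"]
MAX_CHARS = 4000  # assumed cap; tune it to your context budget


def build_snapshot(memory_dir: Path, max_chars: int = MAX_CHARS) -> str:
    """Concatenate the curated files in order, stopping once the cap is hit."""
    parts: list[str] = []
    used = 0
    for name in SNAPSHOT_FILES:
        path = memory_dir / name
        if not path.exists():
            continue  # a missing file is simply skipped
        remaining = max_chars - used
        if remaining <= 0:
            break
        body = path.read_text(encoding="utf-8").strip()
        # Truncate the section so the whole snapshot never exceeds the cap.
        section = f"## {name}\n{body}\n"[:remaining]
        parts.append(section)
        used += len(section)
    return "".join(parts)
```

A hook script could call `build_snapshot(Path("memory"))` and print the result. Because files are read in a fixed order, the highest-priority file goes first and survives truncation.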
Number three is long-term search by meaning. Something you or another team member agreed on, let's say 6 months ago, buried somewhere in a meeting transcript, should be returned when it's relevant. Not because you searched for the exact words you used in that meeting, but because you asked about something similar and the agent was able to recognize that that context was important and relevant. What I've just described is semantic search. The approach comes from vector databases more generally, but for agent memory it's used heavily by a few different open source frameworks, including memsearch. So the idea is: split up all the conversations you have with it, summarize them or keep them word for word, chunk them, and turn them into vectors that can then be searched by meaning. You can pair that with a hybrid approach that also allows for keyword search. So if you search for something like payment processing, it's going to find all sorts of results, but it will even find things that don't keyword match, like Stripe, because it understands the context behind Stripe and payment processing. So even when they share no words, you're able to recall that information.
And then last but not least, number four is scoped access across clients and your team. Your work with one client and another client should be kept totally isolated. All of those conversations need isolation, and you might need that across projects, too. So when you bring a teammate in, they can see their clients and only their clients. The memory knows who's asking because all of the access to that shared brain is scoped. And this one's the GBrain company brain idea again: one shared brain with every memory tagged by exactly who owns it, and every query that comes into the shared memory is filtered by the person asking. So they can only access what they're supposed to be accessing.
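The scoping rule can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Memory` and `User` types are hypothetical, the four scope names (system, team, client, private) are the tags named later in the video, and the owner is assumed to see every row. In the setup described in the video the filter is enforced inside the database rather than in application code like this.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Memory:
    text: str
    scope: str                    # "system", "team", "client" or "private"
    owner: str                    # user who created the row
    client: Optional[str] = None  # set only on client-scoped rows


@dataclass
class User:
    name: str
    is_owner: bool = False
    clients: set[str] = field(default_factory=set)


def can_see(user: User, row: Memory) -> bool:
    """Decide whether one memory row is visible to the person asking."""
    if user.is_owner:
        return True  # assumption: the database owner sees every row
    if row.scope in ("system", "team"):
        return True
    if row.scope == "client":
        return row.client in user.clients
    if row.scope == "private":
        return row.owner == user.name
    return False  # unknown scope: deny by default


def visible(user: User, rows: list[Memory]) -> list[Memory]:
    """Filter a result set down to what this user may read."""
    return [r for r in rows if can_see(user, r)]
```

Denying unknown scopes by default means a mistagged row is hidden rather than leaked.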
So pulling it all together for the perfect memory: we've got citations from GBrain, memory injection from Hermes, search by meaning from memsearch, and then team scoping from GBrain again.
Now, before we show you exactly how we implemented this, let me quickly show you why Claude Code out of the box is not good enough. Every memory system anywhere basically answers the same three questions. Where and how does it store what you tell it? What does it inject into the short-term context when you start a new session, so the stuff that's kept local? And how does it recall the right thing when you ask for it? Now, I'll be blunt. Claude Code is just weak at all three out of the box.
So let's take storage. Claude Code has an auto memory function, which is effectively notes that Claude writes itself based on the corrections you give it and your preferences. But to be honest, it just saves the odd thing that it decides matters, so you don't end up capturing much at all. As an example, I can go into Claude Code and run the /memory command. It will run me through the different sets of memory.
Auto memory, you can see, is on. Let's go and open the auto memory folder. What we can see, by the way, in a repo that I've been running for months, is quite literally two files: an index file, memory.md, which has one reference point, for a specific project that I did, and then effectively one single project file. So we've been running this for months, and the memory index has one record. Unless you're telling it, "Please remember this fact," it's barely going to capture anything, and it's going to be really hard to recall memory if you haven't saved any. Right?
So the next step is injection. When you open a session, it loads your CLAUDE.md and very little else out of the box. It does load the auto memory, the memory.md file, but as you saw, not much is being captured, so we might as well not have it in context at all. There's no curated snapshot of here's what actually matters about this business, or here's the brand context, or here are your recent memories, which are really important. So it's dependent on you, not on Claude itself: you need to inject that context into the session yourself.
Still very, very manual. And then we've got recall, which is probably the biggest gap. There's basically no search at all. If we ask it to recall some information we've told it, the only option it has is to go through the auto memory file, which we saw doesn't capture very much, or to search specific sessions from conversations that you previously had, which as you can imagine means trawling through that session context. It's super token-heavy and often can't even search by meaning. It's just searching for keywords that kind of match the intent of what you've asked. Now, you can use resume to pick up previous conversations, but you need to know the conversation ID or, again, add a search term. So things like "What did we agree in that client meeting 6 months ago?" will not come back reliably, and often won't come back with any matches at all. That lack of capability is pretty much a dealbreaker the moment you start running real clients and projects across a series of months.
Now, because we know how poor Claude Code is out of the box at these different functions, it led us down the route of assessing what was available on the open source and paid market. First you need to understand what decisions to make around storage, injection and recall, to decide which frameworks are actually best for your use case. So I'm going to walk you through the logic and the questions that we asked ourselves to decide which framework we wanted at each point in this decision tree.
So first of all, do you need to store all of your conversations and recall them exactly, word for word, verbatim? Or are you happy summarizing most of that conversation and then recalling key snippets of it? If you want complete word-for-word recall, then the best in the market are something like MemPalace and claude-mem. But if you want a curated set of context from every conversation, you can use something like memsearch, and OpenClaw follows these patterns too. One of the things that makes these frameworks really powerful is that they save those conversations and then you can rebuild them later from the markdown, so they're completely portable.
Then the second fork is: at the start of every session, do you want to load in none, some or all of the context? I would not recommend loading in all of the context, which is basically like overstuffing your CLAUDE.md and firing in thousands of lines of context that are going to make it really hard to recall information. Or do you want to take the approach that Hermes took and inject something like a frozen snapshot? They inject a soul.md, a user.md, a memory.md, which is a curated set of recent context memories, not like Claude's auto memory, and they also inject the daily memory of the work that's been done today. So it's basically a capped amount of frozen memory that gets injected into every conversation.
Then how thorough does your recall need to be? Do you need to search by meaning, or is keyword fine?
Hermes recall out of the box searches by keyword, versus something like memsearch, which has hybrid vector and keyword search, so we can search by meaning as well as by keyword. So like we mentioned earlier, if we search for something like payment processing, it's going to identify that Stripe is a relevant thing to recall and come back with more relevant context.
And then the second part of recall is how it's delivered back to you. Do you want things delivered as pure facts and pure chunks from the vector database, or do you want them delivered in a synthesized way, with a cited answer telling you this was the source, this is the line we read it from, and this was the date? If you do, then you're going to need to add on a layer like the one from Garry Tan's GBrain, where it reranks all of the results and then writes an answer that cites exactly where it's come from. But almost more importantly, it admits when it can't find an answer.
So you can see that we've based the features we've implemented, the things we've cherry-picked from different frameworks, on our specific use cases, and that's exactly what you should do too. You can by all means install one of these frameworks, Hermes, memsearch and so on, off the shelf. But I highly encourage you to know how it's tackling storage, injection and recall.
Now, for those who watched my earlier memory videos: for the long-term store, we originally picked memsearch, but we ended up replacing it with a locally hosted PGlite and pgvector store. We kept the memsearch logic, meaning markdown first, daily logs, and the semantic and keyword hybrid search, but we moved it to PGlite and pgvector for three reasons. Firstly, there are no external dependencies, so it can just run locally on your PC if you don't need to share it with others. Two, it can install on Windows.
memsearch was a real pain on Windows, as you had to have a cloud account, which we found out as we started implementing it. And three, the big one: it lets us scope memory per user. So for our community members who want to run this across a team, instead of using PGlite and pgvector locally, it helps them set up a Railway Postgres database. It's cloud hosted, with row-level security that scopes access for specific users across projects, across clients, and any other scope that they want to give them. So this was all super important for us, specifically as part of our foundations for team access.
So in practice then what does this actually look like for our flow end to end? Well, let's say that I had a conversation with Claude the other day and I told it we've decided to introduce a third pricing tier because it lowers the barrier to entry for our community.
Now, the first thing that happens is the stop hook fires. After every turn, it's trying to work out: do I need to save this into short-term memory as a durable fact? And then, whether or not it's saved into short-term memory, it's also embedding that into our long-term searchable memory. So it's going to promote things into short-term memory based on its judgment.
So, is it a decision? Is it a price change? Is it a preference or an environment factor that I might want to come up in every single session? If so, it's going to put that into the short-term capped memory in a memory.md file, and it's going to run through the existing file and work out whether this duplicates or overwrites information that we've already got there. Then down here, we've got our Claude working context. So when you're in that context window, what's being loaded in? All of that short-term memory is injected into the next session. It's a frozen snapshot, so when you restart Claude and go into the next session, it's going to be injected into that conversation. And, similar to Hermes, we also inject things like soul, user, memory, and today. We're basically taking recent facts and injecting them into the working context.
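The promote, deduplicate and cap logic described above might look something like the following sketch. It is an assumption-heavy simplification: in the video the judgment about what counts as durable is made by the model, whereas here the `kind` label is simply given, duplicates are detected by a hypothetical `topic` key, and the 20-entry cap is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

# The durable categories named in the talk; anything else is not promoted.
DURABLE_KINDS = {"decision", "price_change", "preference", "environment"}
MAX_ENTRIES = 20  # assumed cap on short-term memory


@dataclass
class Entry:
    topic: str     # hypothetical key used to spot duplicates and overwrites
    kind: str
    text: str
    saved_on: date


def promote(entries: list[Entry], candidate: Entry,
            max_entries: int = MAX_ENTRIES) -> list[Entry]:
    """Return the short-term list after considering one candidate fact."""
    if candidate.kind not in DURABLE_KINDS:
        return entries  # not durable: short-term memory is left untouched
    # Drop any earlier entry on the same topic so the new fact overwrites it.
    kept = [e for e in entries if e.topic != candidate.topic]
    kept.append(candidate)
    # Enforce the cap by discarding the earliest-added entries first.
    return kept[max(0, len(kept) - max_entries):]


def render(entries: list[Entry]) -> str:
    """Render the entries as markdown bullets for the memory file."""
    return "\n".join(
        f"- [{e.saved_on.isoformat()}] ({e.kind}) {e.text}" for e in entries
    )
```

A stop hook could call `promote` once per turn and write `render(...)` back to the memory file, while the long-term embedding step runs separately regardless of the outcome.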
Now, irrespective of that, everything is filed long-term. It goes through our pipeline to be chunked and embedded as vectors, and is therefore searchable later. So if later we ask, "What did we agree on plans?" and we've not mentioned pricing tiers, we've not mentioned communities, it has all the context to put those together and work out whether something is relevant or not. Now, there's a three-tier retrieval system.
First, it's just going to read that short-term context, so it already has relevant, or potentially relevant, information in the context. But if it can't find it in that short-term memory, it's going to dig deeper. It's now going to turn the query into a vector, and that means we can search for similar vectors in the database. So it's going to do vector and keyword search if it's needed. It's going to pull back the top five to 10 relevant bits of information based on our query.
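The first two tiers can be sketched as follows. This is a minimal illustration under several assumptions: embeddings are taken as given (no real embedding model is called), the tier-one check is a crude word-overlap stand-in for the agent reading its own context, and the two rankings are merged with reciprocal rank fusion, a common fusion method chosen here for illustration rather than one the video names.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]  # from whatever embedding model you use (not shown)
    source: str          # file and date, kept so an answer can cite it


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> int:
    """Count distinct query words that also appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def rrf(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion over several rankings of chunk indices."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, idx in enumerate(ranking, start=1):
            scores[idx] = scores.get(idx, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda i: scores[i], reverse=True)


def retrieve(query: str, query_vec: list[float], short_term: list[str],
             chunks: list[Chunk], top_k: int = 5) -> tuple[str, list]:
    # Tier 1: use the already-loaded short-term snapshot if anything matches.
    hits = [line for line in short_term if keyword_overlap(query, line) > 0]
    if hits:
        return "short_term", hits
    # Tier 2: rank long-term chunks by meaning and by keywords, then fuse.
    ids = range(len(chunks))
    by_vector = sorted(ids, key=lambda i: cosine(query_vec, chunks[i].vector),
                       reverse=True)
    kw = {i: keyword_overlap(query, chunks[i].text) for i in ids}
    by_keyword = sorted((i for i in ids if kw[i] > 0),
                        key=lambda i: kw[i], reverse=True)
    fused = rrf([by_vector, by_keyword])
    return "long_term", [chunks[i] for i in fused[:top_k]]
```

Because the vector ranking covers every chunk, a chunk about Stripe can still surface for a payment processing query even when the keyword ranking is empty.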
And then it's going to rerank them to work out which one is most likely to be the best answer for our original question. And then, because we're citing and synthesizing those reranked results, it might come back into the conversation and say, "You agreed a third pricing tier would lower the barrier to entry for your community. It was decided by Simon. The price was $37 starting. I've pulled it from this file, this line, and this is the date, or this is the time since that was made." And it might also come back with some confirmation: "I've checked through other conversations and no new discussions about pricing have been had since that day." So it's synthesized, it's cited, it's honest about its gaps, and you basically have a multi-level storage, injection and recall system that helps you work day-to-day inside that Claude working context.
Now, that's the pipeline running on all new conversations going forward. But here's the question we actually asked ourselves. We already have months of sessions sitting there that aren't registered in our memory. So all the decisions and all the work we've done before setting this system up would basically be lost. All of it's just sitting there in that session history doing nothing. So we wanted one more capability: to take everything that you've already said and turn it into memory. It takes all of your previous session history and automatically creates long-term and short-term memories from it. So what this actually means is that the day you set this up, you're not starting from zero. You start with everything you've ever told Claude in that project already remembered, searchable by meaning and cited. So your back catalog becomes your brain from day one.
Now, I've not found this fourth and final part, about team OS or team operations, working properly anywhere else. So we built the logic ourselves, and this is inspired by Garry Tan's GBrain setup. This is something still in development. We've got the foundations in place with our PGlite vector database and soon-to-be-hosted Railway database, but we'll be releasing the team OS in the next few weeks, and that will include scoped access for anyone on your team to whatever you want in the database. So there are two principles we wanted here.
Firstly, allow non-technical team members to contribute. So we're enabling some key files, like brand context and CLAUDE.md, to have their source of truth in Notion or Google Drive. Claude Code is going to handle things like skills and memory functions, everything you don't need your team to touch or see, and then it's all going to be backed up and version controlled in GitHub on your behalf.
Then secondly, we want to scope access for individual users. So we're using a Postgres database with row-level security. Every row is going to be tagged as system, team, client or private, and then every query is going to be filtered by who's asking, enforced in the database. So the owner of the database could potentially see everything. A junior team member on a specific client will see their systems, their team, that client, and their own work. They don't get access to any client that they're not supposed to access, or any scope that they're not given by the owner.
So, there you have it. That's every decision we made and every consideration that led us down the path to create our own agent OS, with memory good enough to search by meaning and then point to the exact conversation behind every answer, citing it so you can actually trust it.
Now, if you'd rather just plug and play with a one-line install, then you can grab it in our paid community, the Agentic Academy, in the description below.