
This Agent Self-Evolves (Fully explained)
Added: 2026-05-06

1,953 views | 94 | 17:48 | AIJasonZ | Original Release: 2026-04-21

This breakdown offers a practical blueprint for building agents that actually learn from experience rather than just following static instructions. It is a rare, clear-eyed look at the tiered memory architecture required for true AI autonomy.

[00:00:01]Thanks HubSpot for sponsoring this video. For the past few weeks, a massive amount of progress has been made on making your agents self-evolving: from the AutoAgent project, a concept evolved from Andrej Karpathy's autoresearch that uses Claude Code or Codex to self-evolve an agent harness for a specific set of tasks, and which achieved number one on SpreadsheetBench and number one on Terminal-Bench as well; to Claude Code's leaked source code, where people found a hidden autoDream feature that gets Claude Code to autonomously extract learnings and best practices from the conversation; to the super popular Hermes Agent, which almost took the growth away from OpenClaw because the agent remembers what it learns and gets more capable over time.

[00:00:43]The question is: what is the actual mechanism behind all of these self-learning approaches, and what is the state-of-the-art implementation that you can take away for your own agent building? This is what I want to take you through today: the state-of-the-art way of building a self-evolving agent that gets smarter the more you use it. So first of all, you should break those different projects down into two groups. AutoAgent and autoresearch are very different creatures compared with the rest of those self-evolving agent setups. AutoAgent or autoresearch is a mechanism to improve the agent harness or software itself, which means the goal of AutoAgent is to produce an agent harness that can complete a specific type of task better, while Hermes Agent, autoDream and many other self-learning skills really focus on in-context learning, or memory. So they serve very different purposes. AutoAgent or autoresearch is fundamentally a for loop. The user defines a vision or PRD in a program.md file that clearly explains what this agent or model should do, and you get the latest agent harness, like Claude Code or Codex, to read this program.md and make improvements to the system itself, which can be the agent harness runtime or a specific model and script. Then the agent runs the evaluation to compare the performance of this new version against the baseline, decides whether to keep or discard the improvements, and repeats this loop indefinitely. The result is a system that is fixed once it's produced, rather than a mechanism that makes your existing agent learn continuously. And to run this loop, you need a dataset of tasks and a programmatic way to evaluate and verify the performance, and in many cases you probably don't have that large dataset or a deterministic way to verify the work.
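
A minimal sketch of the keep-or-discard loop described above. The `Harness` type and the `propose` and `evaluate` callables are hypothetical stand-ins: in the real projects a coding agent proposes the change after reading program.md and a benchmark suite does the scoring, and neither is shown in the video.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Harness:
    """Hypothetical stand-in for one version of an agent harness."""
    version: int
    config: dict


def improvement_loop(
    baseline: Harness,
    propose: Callable[[Harness, str], Harness],
    evaluate: Callable[[Harness], float],
    program_md: str,
    iterations: int,
) -> Harness:
    """Propose a change, score it, keep it only if it beats the current best."""
    best = baseline
    best_score = evaluate(best)
    for _ in range(iterations):
        candidate = propose(best, program_md)  # e.g. a coding agent edits the harness
        score = evaluate(candidate)            # needs a task set with a programmatic check
        if score > best_score:
            best, best_score = candidate, score  # keep the improvement
        # otherwise the candidate is discarded and the loop retries from `best`
    return best
```

The loop only works if `evaluate` is cheap and deterministic enough to compare versions, which is exactly the dataset and verification requirement mentioned above.
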
A useful comparison: this AutoAgent approach is almost like training or fine-tuning a model, because the output is the model or agent harness itself, and once produced, this harness or model is more or less frozen. What Hermes Agent, autoDream and other self-learning skills introduce instead is an in-context learning and memory mechanism that makes sure the agent actually remembers its actions and feedback, so that it can make a better judgment call next time. That means you get an agent that grows smarter the longer you use it. And this second branch is the part that is much more practically useful today. Claude Code, OpenClaw and Hermes Agent all have their own setup for the self-evolving part, and I'm going to take you through the implementation of each one of them, so that by the end you have a good understanding of the differences between the implementations and of what a state-of-the-art implementation of this type of in-context self-learning mechanism looks like.

But before we dive into this, one thing I think a lot of people get wrong with agents right now is that they assume more agentic is always better. In reality there's a spectrum. There are different ways you can deliver a large language model based system, from a single large language model call, to a workflow-based system that chains steps together, just like you do in Zapier and n8n, and at the other end you have a fully agentic system that can make decisions, generate skills and evolve over time. Sophisticated builders don't always go for a fully agentic system, because it costs more tokens and can be slower. Instead, you choose the right architecture and setup based on the use case. Sometimes you want something deterministic and predictable, and other times you want something more flexible and adaptive. That's why this AI agent cheat sheet from HubSpot is actually really useful.

[00:03:59]It covers the fundamentals of different agentic systems. It breaks down and compares different production large language model systems: how they are architected, what they are good at, and which types of use case they suit best.

[00:04:09]So you can decide what kind of system actually fits your use case. It also has a list of tips and pitfalls that will really help you make your agent system much more effective. So if you're building agents, it is a solid reference for learning and thinking through the architecture decisions. I put a link in the description below so you can download it for free, and thanks again to HubSpot for sponsoring this video. Now let's get back to the right harness setup for in-context self-learning. At a high level, the goal is to make sure the agent actually learns continuously from its own actions and feedback.

[00:04:38]There are three main pillars that empower that. The first is memory, which normally contains important facts, like a user.md or CLAUDE.md file, and normally there is a separation between hot memory, which is always loaded into the agent's system prompt, and warm memory, which is loaded on demand. The second is skills: a skill quite often contains the domain knowledge the agent needs to execute a very specific type of task. The third is history, which logs the raw conversation threads so the agent can refer back to them. Each agent harness, like Claude Code, OpenClaw or Hermes Agent, touches different parts of those three pillars.

Let's first take a look at Claude Code and how it implemented a three-layer memory system that many people don't know about. When Claude Code was first introduced, we only had the CLAUDE.md file, and whatever is in this file is fed into the agent's system prompt. This is where most people started to put a lot of preferences and additional guardrails to steer the agent's behavior. The problem is that this file very quickly became bloated and too large. A common practice then was to put an index, or table of contents, of other files into CLAUDE.md, with a description telling the agent when to read and update which file. With that, people had already built this type of hot and warm memory setup, where hot memory is always part of the system prompt and warm memory is loaded on demand. This setup is roughly how 99% of people use Claude Code today. But many people don't know that Claude Code has evolved a lot and already has a three-layer memory system in place. There's an article from Arton with a very detailed breakdown of how the memory system works, which is very useful, so I highly recommend you go check it out. At a high level, Claude Code has already introduced an auto memory feature that you can turn on.
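
A minimal sketch of the hot and warm split described above, assuming a hypothetical harness with two helpers: `build_system_prompt` always injects the index file, and `load_warm_memory` is what a file-reading tool would call on demand. Neither function name comes from Claude Code.

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, hot_file: Path) -> str:
    """Hot memory: the index file is injected into every system prompt."""
    hot = hot_file.read_text(encoding="utf-8") if hot_file.exists() else ""
    return f"{base_prompt}\n\n# Memory (always loaded)\n{hot}".strip()


def load_warm_memory(memory_dir: Path, relative_name: str) -> str:
    """Warm memory: a file named in the index, read only when the agent asks."""
    root = memory_dir.resolve()
    target = (memory_dir / relative_name).resolve()
    if root not in target.parents:
        # Refuse paths such as "../secrets.md" that leave the memory folder.
        raise ValueError("path escapes the memory directory")
    return target.read_text(encoding="utf-8")
```

The index stays small because it only holds one line per file, with a hint about when to read it; the details live in the warm files and cost tokens only when they are actually loaded.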

[00:06:18]This auto memory feature is basically an instruction to the agent to do something similar to what some of you are already doing. Once it is turned on, Claude Code gets a special prompt about when to save memory and what kinds of things should be considered worth saving, and the agent starts saving those memory files into the .claude folder for each individual project. It has a very specific structure, built around a MEMORY.md file.

[00:06:41]Think of this MEMORY.md file as an index, or table of contents, of all the memory files. Claude Code has an organization convention for different types of memory: it could be something related to the user, a piece of feedback they gave, something related to a certain project, or a reference doc. So if you open the .claude folder for your specific project, you might see a memory folder with this MEMORY.md file, which just holds the table of contents, while each specific memory file contains the main details. The process is basically that you talk to Claude Code, and because of the special prompt it has, if Claude Code notices something worth remembering about the user or their feedback, it will try to create a file and index it in MEMORY.md. This MEMORY.md file is automatically loaded as part of the agent's system prompt, so the agent knows which memories exist and can read those files on demand. It kind of works in many situations, but the problem is that this system is purely prompt based. To make it work, you have to make sure the agent remembers to create and update those memories, and we know large language models can easily forget and skip steps. That causes a problem, because those memories get outdated very fast, and outdated information can pollute the context and hurt performance.

That's why they introduced the autoDream feature. It was exposed in the Claude Code source code leak, where people found this hidden feature called autoDream, which does memory consolidation. It's basically a background process that is triggered after certain sessions finish. It starts a new Claude Code session with a special prompt that asks Claude Code to first look at what's already in memory, then check the conversation history to see if any memory is outdated, then consolidate all the different memories and update the index. This process is triggered while your Claude Code is not running: it gathers all the sessions, reads memory, stores results and consolidates.

So this is the three-layer memory system Claude Code has. It evolved from a single CLAUDE.md file, to an auto-extracted memory system, and now to a background async process that autonomously keeps memory up to date. Even though it is actually pretty simple, it more or less represents the state-of-the-art setup for memory itself: you should have hot and warm memory, where hot memory is always loaded into the system prompt and normally includes an index or table of contents of the warm memory that can be loaded on demand; then you give the agent instructions about when and where to write those memories; and you add some async process to update them automatically.
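
A minimal sketch of an indexed memory folder with a consolidation pass, in the spirit of what is described above. The index line format, the function names and the `is_outdated` callback are assumptions for illustration; in the real feature an LLM session makes the "is this outdated" call, and the on-disk format is not shown in the video.

```python
from pathlib import Path
from typing import Callable, List

INDEX_NAME = "MEMORY.md"  # index file that is always loaded into the prompt


def save_memory(memory_dir: Path, name: str, kind: str, summary: str, body: str) -> None:
    """Write one memory file and add (or refresh) its line in the index."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    (memory_dir / f"{name}.md").write_text(body, encoding="utf-8")
    index = memory_dir / INDEX_NAME
    lines = index.read_text(encoding="utf-8").splitlines() if index.exists() else []
    marker = f"] {name}.md:"
    lines = [line for line in lines if marker not in line]  # drop the stale entry
    lines.append(f"- [{kind}] {name}.md: {summary}")
    index.write_text("\n".join(lines) + "\n", encoding="utf-8")


def consolidate(memory_dir: Path, is_outdated: Callable[[str], bool]) -> List[str]:
    """Background pass: delete outdated or missing memories and rewrite the index."""
    index = memory_dir / INDEX_NAME
    if not index.exists():
        return []
    kept, removed = [], []
    for line in index.read_text(encoding="utf-8").splitlines():
        if not line.startswith("- [") or "] " not in line:
            kept.append(line)  # leave unrecognized lines untouched
            continue
        file_name = line.split("] ", 1)[1].split(":", 1)[0]
        path = memory_dir / file_name
        if not path.exists() or is_outdated(path.read_text(encoding="utf-8")):
            path.unlink(missing_ok=True)
            removed.append(file_name)
        else:
            kept.append(line)
    index.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return removed
```

Running `consolidate` from a scheduled job rather than from the main agent is the point of the design: the index stays accurate even when the agent forgets to maintain it mid-task.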

[00:09:09]But the limitation of the Claude Code setup is that it mainly has mechanisms for handling this kind of factual memory. There are other very important pieces that need to be filled in, like skills, which are domain knowledge, and auditable history. Claude Code does have a skill feature and does keep a raw conversation log, but the conversation log, for example, is not searchable. It is there, but it is not designed for the agent to search across, because that doesn't really make sense in a coding agent context. And skills, even though they're supported, more or less rely on a human to find a skill and equip Claude Code with it.

It's those gaps that made people feel OpenClaw was so much smarter than other agents when they first tried it, because it treats memory as a first-class citizen. It has a list of more clearly defined memory files, each representing a different aspect. It has a bootstrap.md file that instructs the agent to chat with the user and collect information from them proactively. It has a daily log that provides a high-level snapshot of the interactions between human and agent. And most importantly, it has a memory search tool out of the box, which searches across all those memory files as well as the raw conversation history. That's what makes OpenClaw feel like it just remembers things across all your sessions. Another aspect is skills: the OpenClaw agent has a very specific instruction telling it that it can use ClawHub to search for more relevant skills, and that it can add, remove and update skills on the go. When you look at the OpenClaw setup, it's actually still very simple, but they designed the whole system to make sure this type of self-improvement is at the core of the agent harness. However, it still has problems. When you use OpenClaw, you will notice that all the memory creation, skill creation and memory search still require a human to prompt it properly.
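
A minimal sketch of a memory search tool that covers both memory files and raw conversation logs, as described above. It uses plain keyword matching as a stand-in; the function name, the file extensions and the scoring are assumptions, and the video does not show how OpenClaw's own search is implemented.

```python
from pathlib import Path
from typing import List, Tuple

SEARCHABLE_SUFFIXES = {".md", ".txt", ".jsonl"}  # assumed file types


def memory_search(roots: List[Path], query: str, limit: int = 5) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) hits, best keyword overlap first."""
    terms = [term for term in query.lower().split() if term]
    scored = []
    for root in roots:
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in SEARCHABLE_SUFFIXES:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                lowered = line.lower()
                score = sum(term in lowered for term in terms)
                if score:
                    scored.append((score, path.name, lineno, line.strip()))
    scored.sort(key=lambda hit: -hit[0])  # stable sort keeps discovery order on ties
    return [(name, lineno, line) for _, name, lineno, line in scored[:limit]]
```

Passing both the memory folder and the log folder as `roots` is what lets one tool call answer "have we talked about this before", regardless of where the fact was written down.
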
And there is no async, proactive process that autonomously updates those memories. This is the gap where Hermes Agent comes in and tries to solve it. It basically introduces two concepts that make the agent feel much better: one is autonomous skill generation, the other is a memory reviewer. Autonomous skill creation is the crux of the system. Hermes Agent has a mechanism that counts the number of steps the agent takes, and every time the agent runs more than 10 steps without creating any skills, it spins up a new sub-agent that does not block the main agent process but runs in the background, reviews what has been done, and decides whether there is any useful skill that can be created to make this complex process more stable. The prompt of the skill reviewer agent basically looks something like this: review the conversation above and consider saving or updating a skill if appropriate, focusing on whether a non-trivial approach was used to complete a task, one that required trial and error or changing course due to experimental findings along the way. From there the agent creates a skill in a format like this. The agent is equipped with a skill manager tool that allows it to create a new skill, patch or edit an existing skill, delete one, or write and remove files from a skill. They also add a proactive prompt to the main agent saying: when using a skill and finding it outdated, incomplete or wrong, patch it immediately. Don't wait to be asked.

[00:12:07]Skills that go unmaintained become liabilities. It is this flow and system that makes Hermes Agent feel so much smarter in terms of extracting its learnings and doing better next time. And because it gives the agent the ability to create skills itself, they also add a safety scan: when the agent tries to create a new skill, it goes through the skill guard Python file, where they define a whole bunch of reject patterns. Once any of those is detected, the creation automatically fails, the skill is deleted, and a message is sent back to the agent so it knows how to adjust the skill. If it's all good, the skill is saved. So this first part, autonomous skill generation, is basically an autonomous process that makes sure domain or procedural knowledge is saved and maintained automatically.

On the other side, they do the same thing for general memory and facts. Out of the box, Hermes Agent has four main tiers of memory. There is a user.md file, which mainly contains who the user is: preferences, style, workflow habits. There is a memory.md file, which contains environment facts about the project: conventions, operating system and so on. Those two are part of the system prompt every time. Then they use skills for domain knowledge, loaded on demand, and finally raw history: every conversation is saved to a local SQLite DB that can be searched and retrieved using session search. And if you need it, they also have a way for you to plug in a semantic memory layer like Mem0 or Honcho.
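
A minimal sketch of the two skill-side mechanisms described above: a step counter that triggers a background skill review, and a pattern-based safety scan for newly written skills. The class and function names, the reject patterns and the choice to fire on exactly the 10th step are all assumptions; the real reject list and trigger code are not shown in the video.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

REVIEW_AFTER_STEPS = 10  # threshold mentioned in the video; exact comparison is assumed

# Hypothetical reject patterns, for illustration only.
REJECT_PATTERNS = [
    r"rm\s+-rf\s+/",
    r"curl[^\n|]*\|\s*(sh|bash)",
    r"(?i)ignore (all )?previous instructions",
]


def scan_skill(text: str) -> Optional[str]:
    """Return the first reject pattern found in a skill, or None if it looks clean."""
    for pattern in REJECT_PATTERNS:
        if re.search(pattern, text):
            return pattern
    return None


@dataclass
class SkillReviewTrigger:
    """Counts agent steps and asks for a skill review when none was created lately."""
    spawn_reviewer: Callable[[], None]  # a real harness would start this in the background
    steps_since_skill: int = 0

    def on_step(self, created_skill: bool = False) -> bool:
        if created_skill:
            self.steps_since_skill = 0
            return False
        self.steps_since_skill += 1
        if self.steps_since_skill >= REVIEW_AFTER_STEPS:
            self.steps_since_skill = 0
            self.spawn_reviewer()
            return True
        return False
```

When `scan_skill` returns a pattern, the caller would delete the draft skill and send the matched pattern back to the agent as feedback, which mirrors the fail, delete and notify flow described above.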

[00:13:29]Apart from the skills and the raw conversation history, the main things the agent manages are just those two files, memory.md and user.md, and each of them has a very strict character cap; in total it is less than 4,000 characters. So you can see that they really try to push the agent to use skills as the way to maintain most of the task knowledge. They have a similar type of async background process to extract memory. It counts the number of agent turns, and only after 10 turns, if no memory extraction has happened before, it spawns a new memory reviewer agent with a special prompt: Has the user revealed things about themselves, their persona, desires, preferences? Has the user expressed expectations about how you should behave?

[00:14:10]If so, save them to those two files. So this is how Hermes Agent works. You basically have hot memory that is autonomously extracted every 10 turns, warm memory in the form of all sorts of skills that are, again, autonomously extracted every time there are more than 10 steps, and cold memory: the conversation history and a semantic long-term DB that the agent can search. After going through this, you can map out how the different agents work and understand why Hermes Agent just feels smarter: it has those async autonomous processes across skill and memory creation and updates, as well as a way for the agent to search the raw conversation log. This is more or less the state-of-the-art implementation to follow if you want to build any kind of in-context self-learning into your own agent too.
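
A minimal sketch of a character-capped hot memory, assuming a hypothetical `BoundedMemory` class. The video only says the two files together stay under 4,000 characters, so the cap passed in is up to the caller; the per-file limits and the entry format are not shown.

```python
from typing import List


class BoundedMemory:
    """Hot-memory file with a hard character cap (hypothetical helper)."""

    def __init__(self, limit_chars: int) -> None:
        self.limit_chars = limit_chars
        self.entries: List[str] = []

    def size(self) -> int:
        return len("\n".join(self.entries))

    def add(self, entry: str) -> bool:
        """Append an entry; refuse it if the file would exceed the cap."""
        candidate = self.entries + [entry.strip()]
        if len("\n".join(candidate)) > self.limit_chars:
            return False  # the caller must replace or remove something first
        self.entries = candidate
        return True

    def replace(self, old_substring: str, new_entry: str) -> bool:
        """Swap the first entry containing old_substring, still honoring the cap."""
        for i, existing in enumerate(self.entries):
            if old_substring in existing:
                candidate = self.entries[:i] + [new_entry.strip()] + self.entries[i + 1:]
                if len("\n".join(candidate)) > self.limit_chars:
                    return False
                self.entries = candidate
                return True
        return False

    def render(self, title: str) -> str:
        """Text block to inject into the system prompt."""
        return f"# {title}\n" + "\n".join(self.entries)
```

A refused `add` is the useful signal here: it forces the reviewer to rewrite or drop an older fact instead of letting the always-loaded prompt grow without bound.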

[00:14:52]You basically use skills to capture domain knowledge, memory for facts, and a searchable, auditable raw history. And ideally you have an async process, so you don't rely on the agent or a human to extract and maintain the snapshot of knowledge.
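
A minimal sketch of a searchable raw history backed by SQLite, using only Python's standard `sqlite3` module. The table layout and the function names `open_history`, `log_message` and `session_search` are assumptions for illustration, not Hermes Agent's actual schema, and a substring `LIKE` match stands in for whatever search it really uses.

```python
import sqlite3
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the history database. Schema is an illustrative assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, session_id TEXT NOT NULL, "
        "role TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def log_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    """Append one raw message; `with conn` commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )


def session_search(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[Tuple[str, str, str]]:
    """Substring search over all sessions, newest first."""
    # Escape LIKE wildcards so the query is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT session_id, role, content FROM messages "
        "WHERE content LIKE ? ESCAPE '\\' ORDER BY id DESC LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return rows.fetchall()
```

Because every message is logged unconditionally, this layer does not depend on the agent deciding what is worth remembering, which is what makes the history auditable.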

[00:15:06]And if you're already using OpenClaw, you actually don't have to switch to Hermes Agent to get this type of really good self-learning experience. There are different skills on the market that are already available to plug in and enhance your OpenClaw or Claude Code memory and self-learning setup. Here are three skills that I tested and that I found take a pretty novel approach. I put a table here so you can look at the details, but their setup is very similar to what we just discussed; it's just that, implementation-wise, each one has its own pros and cons. The most popular one is the self-improving agent skill.

[00:15:35]It introduces a simple memory structure. Apart from OpenClaw's own memory, it has a learnings folder with learnings, errors and feature request .md files. And it makes pretty smart use of hooks to make sure memory creation and updates are followed more reliably. For example, it uses the UserPromptSubmit hook: every time the user sends a message, it captures that and feeds in a small piece of prompt to make sure the agent follows this memory generation pattern. It also has a PostToolUse hook.

[00:16:04]After every bash command, it checks the command's result to see if it matches any error pattern. If the command did produce errors, it appends an error-detected reminder prompt as part of the tool result.
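
A minimal sketch of the two hook behaviors described above, written as plain Python functions. The function names, the reminder texts and the error patterns are assumptions for illustration; the skill's real hook scripts and the harness's hook payload format are not shown in the video.

```python
import re

# Hypothetical error patterns and reminder texts.
ERROR_PATTERNS = [
    r"(?i)\berror\b",
    r"Traceback \(most recent call last\)",
    r"(?i)command not found",
    r"(?i)permission denied",
]
MEMORY_REMINDER = "[reminder] If this turn teaches you something reusable, record it in the learnings folder."
ERROR_REMINDER = "[error detected] Log the cause and the fix in the errors file once resolved."


def on_user_prompt_submit(prompt: str) -> str:
    """Runs on every user message: append a short memory reminder."""
    return f"{prompt}\n\n{MEMORY_REMINDER}"


def on_post_tool_use(tool_name: str, output: str) -> str:
    """Runs after a tool call: flag bash output that matches an error pattern."""
    if tool_name != "bash":
        return output
    if any(re.search(pattern, output) for pattern in ERROR_PATTERNS):
        return f"{output}\n\n{ERROR_REMINDER}"
    return output
```

The reminders are injected by code on every matching event, so they do not depend on the agent remembering an instruction buried in a long system prompt.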

[00:16:17]And for OpenClaw, when it is bootstrapped, there is also a self-improvement reminder .md file that is injected as part of the system prompt. So if you already have an agent that you've used for a while, you don't have to suddenly switch to another one, though the migration from OpenClaw to Hermes Agent is actually pretty simple: they have just one command to migrate everything over. So this is basically the state of the art in how teams achieve in-context self-learning agent behavior, and as you can see, it's actually surprisingly simple. If you're building your own agent harness, I hope this is useful. Meanwhile, if you want to learn more, I also have a more detailed breakdown of different agent memory and harness setups, with step-by-step modules, in AI Builder Club, where we have a group of top AI builders who are launching agent products. We have weekly workshops where I or other industry experts come and share the latest tips and practical learnings. We also recently launched a new platform called Koolit. It's also a self-improving agent, one that monitors all the critical data across your business, prioritizes growth actions autonomously, and reviews the results every day and every week, so you can drive growth autonomously. You just give it your company website and connect all your business data sources and integrations.

[00:17:22]It will analyze across the different data sources, build the organization memory, and start taking actions autonomously across content, leads, ads or any other growth operations. And most importantly, it remembers all the actions it ever took, so you can review them and improve next time. We're opening early access to members of AI Builder Club. So if you're interested, I put links to both AI Builder Club and Koolit in the description below so you can check them out. Thank you, and I'll see you next time.
