Long-running agents are AI systems designed to handle multi-week processes by maintaining durable state machines, persistent sessions, event-driven wake-ups, and multi-agent delegation, enabling them to pause cleanly, survive restarts, wake on signals, and delegate work rather than relying on continuous chat history like normal agents.
Deep Dive
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
Let's talk about long-running agents.
You may have heard the term, and some of you may have already built one. But for those of us who are still learning, myself included, there are a few interesting questions. The first one is: why that name? And how is it actually different from the multi-agent systems that are already out there? Honestly, I had these same questions. So instead of jumping into a technical explanation, let's look at a real-world scenario.
And this is something you and I can probably relate to. Let's say that we build an HR onboarding agent, and its whole job is to help a brand-new hire through everything they need in their first couple of weeks.
Weeks. That word, weeks, turns out to be the whole point. Let me show you why.
On day one, the agent sends the welcome packet, including the documents the employee needs to sign, and checks off the first task. And then nothing, because now it's waiting on a human. The new hire has to read through the documentation; they might sign it right away, or it might take a couple of days. A few days later the signed documents come back, and the agent needs to wake up. It needs to remember what has already been done.
Then the agent sets up their email, their Slack account, and so on. Next it hands the laptop request to a different team, say the IT team: different team, different system. Again the agent has to wait, this time on a laptop being shipped to the employee's house, so a few more days go by. The laptop finally shows up, and the agent wakes up one more time. It sends the employee a day-to-day schedule, and the onboarding process is complete. If you take a step back and look at what just happened, this wasn't a chat or a continuous conversation. It was a process that ran for almost two whole weeks, and it spent almost all of that time doing nothing but waiting on humans. And this is a real-world scenario.
So here is the obvious question, and it's the one I kept asking myself: couldn't a normal chatbot agent or a multi-agent system do this? The answer is actually no, and the why is really the whole point of this video.
See, a normal agent keeps everything in its chat history, every message and every result, and it resends that whole pile to the model on every call.
For a 5- or 10-minute chat, that's totally fine given the context windows we have today.
But stretch it to 2 weeks and it starts falling apart: the history fills up the context window, and token costs keep climbing because every call pays for the entire history again.
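To put rough numbers on why resending history gets expensive, here is a back-of-the-envelope sketch. It assumes every turn adds the same number of tokens (500 here) and that the alternative sends only the new turn plus a fixed-size state summary (200 here); both figures are made up for illustration, output tokens are ignored, and in practice the naive version would hit the context-window limit long before 1,000 calls.

```python
def tokens_resending_history(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every call resends the whole chat history.

    Call k carries k turns of history, so the total is
    tokens_per_turn * (1 + 2 + ... + turns) = tokens_per_turn * turns * (turns + 1) / 2.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def tokens_with_checkpoint(turns: int, tokens_per_turn: int, state_tokens: int) -> int:
    """Total prompt tokens when each call sends only the new turn plus a compact state summary."""
    return turns * (tokens_per_turn + state_tokens)


if __name__ == "__main__":
    # Illustrative assumptions: 500 tokens per turn, a 200-token state summary.
    for turns in (10, 100, 1000):
        naive = tokens_resending_history(turns, 500)
        compact = tokens_with_checkpoint(turns, 500, 200)
        print(f"{turns:>5} calls: resend-history={naive:>12,}  checkpointed={compact:>10,}")
```

With these illustrative numbers, resending history costs 27,500 prompt tokens over 10 calls but 250,250,000 over 1,000 calls, because the total grows with the square of the number of calls; the checkpointed version grows linearly and reaches 700,000.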
This is where a long-running agent comes in. And here's the part that surprised me: the fix isn't a bigger context window. It's a different architecture.
It's also not about giving the agent more tools, an MCP server, or a bunch of skills, because all of that is about what the agent can do. This is about whether it can survive the wait, which here stretches over 2 weeks. That's a totally different problem. You can hand it every tool in the world, every kind of context, every skill, and it will still fall apart by day three.
When you strip it all the way down, a long-running agent is really just an agent that can do four things a normal one cannot. First, it can pause cleanly.
Second, it can survive a full restart: you can kill the server and it doesn't lose a thing.
Third, it can wake up the moment a signal comes in, instead of sitting there checking over and over. Think of a doorbell: you don't stand at the peephole for 3 days, you just wait for the bell.
And fourth, it can hand work off instead of trying to do everything itself.
So this is the answer to the question I opened with. A plain multi-agent system is really just agents talking to agents.
A long-running agent uses that same delegation for something a lot harder: staying alive and staying coherent for weeks. And that's the real difference.
The best part is that you don't need a brand-new framework for this. You can take an agent you already have, in our case one built with Google's Agent Development Kit (ADK), and give it four properties: a durable state machine, persistent sessions, event-driven wake-up, and delegation. With those, you have a long-running agent.
So let's build a long-running agent, or rather walk through its code, so that this becomes crystal clear.
But before that, a quick disclaimer.
All opinions are my own and do not belong to my employer. All right, with that, let's get into it. To showcase this, I wanted to actually build an app. I'm in Antigravity, where I gave it a prompt to build me a long-running HR onboarding agent, and this is the UI it created. I'll take you through each of these components so that you can understand the overall structure.
If you look at this, there are four components. Component one is the onboarding state machine. Component two is the persistent checkpoint. Component three is webhooks and events, which wake the agent up and put it back to sleep. And the last piece, component four, is the multi-agent handoff.
These are the same four things we discussed in the first part of the video.
For component one, think about the onboarding process. It isn't a sequential process that runs immediately, step after step; it's broken up into explicit steps with handoffs in between, and that's what the state machine captures. The second component is persistence, and this is a very important one. The active database is onboarding.db, which is a SQLite database. Every time the agent finishes a step, it writes its progress to disk, in this case to SQLite, so nothing lives only in memory. If the server died right now and was restarted, the agent would come back to this exact step and nothing would be lost. That's why component two, persistence, is essential for any kind of long-running agent.
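Components one and two can be sketched together in plain Python with the standard-library sqlite3 module. This is not the code from the demo or an ADK API: the step names, the Step, TRANSITIONS, CheckpointStore, and advance names, and the single-table schema are all hypothetical, loosely following the onboarding flow described above. The point is that the current step lives in onboarding.db rather than in memory, so a fresh process can pick up exactly where the old one stopped.

```python
import sqlite3
from enum import Enum


class Step(Enum):
    """Hypothetical onboarding steps, loosely following the demo's flow."""
    NOT_STARTED = "not_started"
    WELCOME_SENT = "welcome_sent"            # waiting on the new hire to sign documents
    AWAITING_HARDWARE = "awaiting_hardware"  # accounts set up, laptop handed off to IT
    COMPLETED = "completed"


# (current step, incoming event) -> next step; any other pair is rejected.
TRANSITIONS = {
    (Step.NOT_STARTED, "start"): Step.WELCOME_SENT,
    (Step.WELCOME_SENT, "documents_signed"): Step.AWAITING_HARDWARE,
    (Step.AWAITING_HARDWARE, "hardware_delivered"): Step.COMPLETED,
}


class CheckpointStore:
    """Persists one row per employee so progress survives a process restart."""

    def __init__(self, path: str = "onboarding.db") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS onboarding ("
            "employee_id TEXT PRIMARY KEY, step TEXT NOT NULL)"
        )
        self.conn.commit()

    def load(self, employee_id: str) -> Step:
        row = self.conn.execute(
            "SELECT step FROM onboarding WHERE employee_id = ?", (employee_id,)
        ).fetchone()
        return Step(row[0]) if row else Step.NOT_STARTED

    def save(self, employee_id: str, step: Step) -> None:
        # Commit immediately so the step is on disk before the agent goes back to sleep.
        self.conn.execute(
            "INSERT OR REPLACE INTO onboarding (employee_id, step) VALUES (?, ?)",
            (employee_id, step.value),
        )
        self.conn.commit()

    def close(self) -> None:
        self.conn.close()


def advance(store: CheckpointStore, employee_id: str, event: str) -> Step:
    """Apply one event: load the checkpoint, take the transition, write it back."""
    current = store.load(employee_id)
    nxt = TRANSITIONS.get((current, event))
    if nxt is None:
        raise ValueError(f"event {event!r} is not valid in step {current.value!r}")
    store.save(employee_id, nxt)
    return nxt


if __name__ == "__main__":
    import os
    import tempfile

    db_path = os.path.join(tempfile.mkdtemp(), "onboarding.db")
    store = CheckpointStore(db_path)
    advance(store, "emp-42", "start")
    store.close()                       # simulate the server dying

    store = CheckpointStore(db_path)    # "restart": reopen the same file
    print(store.load("emp-42"))         # resumes at the welcome_sent step
    print(advance(store, "emp-42", "documents_signed"))
    store.close()
```

Rejecting any (step, event) pair that is not in the table also keeps a late or duplicate webhook from re-running or skipping a step.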
The third component is the agent's wake-up status. This shows whether the agent is dormant, meaning asleep and doing nothing, or awake. As soon as an event is triggered, it hits the webhook and the agent wakes up.
Once it has completed its steps, it goes back to sleep. It sits in a loop, checking whether the job is done or not. The last piece is delegation, which I mentioned before: instead of one agent doing everything, you delegate the work to multiple agents. Those are the components to look at.
Now, if I start the simulation, you'll see the code panel reflect the changes as well. Let me zoom in a little so we can see it in action. I'm going to trigger the webhook event, and you can see that the first thing it does, as I mentioned, is send the welcome email. This also changes the agent's state, and the coordinator wakes up, inspects the active state, and updates the step to "welcome sent" in the database. Once that's done, it goes back to sleep. So it has completed one step and is now waiting for the human to respond. That means once the human sends the documents back, an event needs to be triggered, and that's what we're doing here: we trigger the documents-signed event, the agent is awake again, and it now works with another department.
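The wake-up and sleep cycle can be approximated with a blocking queue: the coordinator's loop calls queue.Queue.get(), which blocks without polling until the webhook handler puts an event on the queue, much like waiting for the doorbell. This is a hypothetical single-process sketch: WebhookEvent, Coordinator, on_webhook, and the step strings are illustrative names, the HTTP endpoint itself is not shown, and a plain dict stands in for the SQLite checkpoint. In a real deployment the process might not stay up between events at all, in which case the webhook would start the agent and it would reload its step from the checkpoint.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class WebhookEvent:
    employee_id: str
    name: str  # e.g. "start", "documents_signed", "hardware_delivered"


# Hypothetical step table; a dict below stands in for the SQLite checkpoint.
NEXT_STEP = {
    ("not_started", "start"): "welcome_sent",
    ("welcome_sent", "documents_signed"): "awaiting_hardware",
    ("awaiting_hardware", "hardware_delivered"): "completed",
}


class Coordinator:
    """Sleeps on a queue and wakes only when a webhook delivers an event."""

    def __init__(self) -> None:
        self.inbox: queue.Queue = queue.Queue()
        self.checkpoint: dict = {}  # employee_id -> current step
        self.log: list = []

    def on_webhook(self, event: WebhookEvent) -> None:
        # Called by the HTTP layer (not shown); it only "rings the doorbell".
        self.inbox.put(event)

    def run(self) -> None:
        while True:
            event = self.inbox.get()  # blocks, without polling, until an event arrives
            if event is None:         # shutdown sentinel
                return
            current = self.checkpoint.get(event.employee_id, "not_started")
            nxt = NEXT_STEP.get((current, event.name))
            if nxt is None:
                self.log.append(f"ignored {event.name} in {current}")
                continue
            self.checkpoint[event.employee_id] = nxt  # record progress, then sleep again
            self.log.append(f"{current} -> {nxt}")


if __name__ == "__main__":
    coordinator = Coordinator()
    worker = threading.Thread(target=coordinator.run)
    worker.start()
    for name in ("start", "documents_signed", "hardware_delivered"):
        coordinator.on_webhook(WebhookEvent("emp-42", name))
    coordinator.inbox.put(None)  # stop the loop so the demo exits
    worker.join()
    print("\n".join(coordinator.log))
```

Between events the worker thread consumes no CPU, which is the practical difference between waiting for the bell and standing at the peephole.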
You'll see it move on to component four as well, handing off to the IT provisioning department. After the handoff, the agent waits for the hardware to be delivered, and once it arrives, the agent reports back and updates the status to completed. You can see each of these steps happening and how each component plays its part, and you just saw the agent go back to sleep after updating the checkpoint in the database. These are the four new components you add to your agent code, and together they turn your existing agent into a long-running agent. I wanted to show you, literally step by step, exactly how this works and what the different components are, and I really hope this demo helped. I'll also share the official blog post that the Google Cloud folks wrote, which was the inspiration for this demo. As always, if you have any questions, let me know.
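Delegation (component four) can be sketched as a coordinator that routes each kind of task to the sub-agent that owns it and records the outstanding handoff. ITProvisioningAgent, OnboardingCoordinator, Task, and the ticket format are hypothetical stand-ins, not ADK classes; a real sub-agent would typically be another agent rather than a plain Python object.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Task:
    kind: str
    employee_id: str
    details: dict = field(default_factory=dict)


class SubAgent(Protocol):
    def handle(self, task: Task) -> str: ...


class ITProvisioningAgent:
    """Hypothetical specialist that owns laptop orders and returns a ticket id."""

    def __init__(self) -> None:
        self.tickets: dict = {}

    def handle(self, task: Task) -> str:
        ticket_id = f"IT-{len(self.tickets) + 1}"
        self.tickets[ticket_id] = task
        return ticket_id


class OnboardingCoordinator:
    """Routes each task kind to the sub-agent that owns it instead of doing the work itself."""

    def __init__(self, delegates: dict) -> None:
        self.delegates = delegates  # task kind -> SubAgent
        self.pending: dict = {}     # employee_id -> outstanding ticket

    def delegate(self, task: Task) -> str:
        agent = self.delegates.get(task.kind)
        if agent is None:
            raise LookupError(f"no sub-agent registered for {task.kind!r}")
        ticket = agent.handle(task)
        self.pending[task.employee_id] = ticket  # in a real system, checkpoint this too
        return ticket

    def on_hardware_delivered(self, employee_id: str) -> str:
        # Raises KeyError if nothing was delegated for this employee.
        ticket = self.pending.pop(employee_id)
        return f"{employee_id}: {ticket} closed, onboarding completed"


if __name__ == "__main__":
    it_agent = ITProvisioningAgent()
    coordinator = OnboardingCoordinator({"laptop_request": it_agent})
    ticket = coordinator.delegate(Task("laptop_request", "emp-42", {"ship_to": "home"}))
    print(ticket)  # IT-1
    print(coordinator.on_hardware_delivered("emp-42"))
```

Keeping the pending handoff in the coordinator's state (and, in a real system, in the checkpoint) is what lets it match the later hardware-delivered event to the right ticket, even after a restart.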
And if you liked the video, please let me know as well, and if you're new to the channel, please hit that subscribe button; it helps me continue the journey. Thank you very much for your time. Thank you for watching, and I'll see you in the next one.