Giving AI agents access to a persistent file system (sandbox) fundamentally changes their behavior by enabling them to follow through on long tasks, stay on track, and build on their own prior work, rather than just adding storage capability.
Deep dive
Give Your Agent a Computer - Nico Albanese, Vercel
I now have my live deployment here, and I can continue to the dashboard, where my project shows one out of five items on the production checklist.
Is that all? Were you able to get that?
>> Cool.
>> So, we're going to jump over to the repository on GitHub by clicking that big repo button. Then I'm going to quit out of Slack and all these things and jump into a terminal, head over to a directory that I'm happy to pop this in, and git clone it. Oops, that's not it. I need to copy it, obviously. Click the big green Code button, copy the clone URL, and git clone the AIE London demo.
Awesome. I'm going to jump into that repo there. And then we jump back to the docs site. By the way, this site is at nicoalbanese.com/aie, which will redirect you here.
If you don't have the Vercel CLI installed, you can install it with npm i -g vercel.
That will probably take you to a Vercel login the first time you use it,
which will pop up this sign-up process that you can allow, and then you should be signed in. You can check that with vercel whoami.
Yes.
Good thing is that in a small room like this, the Wi-Fi shouldn't be an issue.
>> Were you able to log in?
>> Cool.
>> Yep.
>> Perfect.
Our IM team is doing awesome work with the Vercel CLI. I am biased, but I am a big fan. So once we are all set up and logged in, we go into the repository that we just cloned and we run vercel link. We say yes; it should be able to find the project if you have the same name. Then we want to pull down our environment variables. That should give us (which I probably shouldn't show, but it gets invalidated) an OIDC token, which we'll be using to authenticate with both the AI Gateway for our inference and Vercel Sandbox.
Yeah, we get that a lot as well. Internally, you can imagine we have a ton of teams. I think I'm on seven or eight different Vercel teams, so authenticating with the right one, or finding the projects, can sometimes be challenging, but we're making it easier every single time.
>> Yeah, you may have already had a token in there from >> a work one.
Nice.
Oh, of the Vercel CLI.
>> Oh, yeah. If it's installed with npm, you can do npm install -g vercel@latest, and that should get you onto the latest one. I had that recently as well, because we just invalidated the old authentication method in favor of a new one.
You can also run which vercel, and that should tell you what you installed it with.
See, mine is with pnpm instead of npm, which is that classic frustrating problem of which package manager owns this. Were you able to set things up? Yeah.
I love your shirt, by the way.
>> Homemade.
>> It looks It looks very professional.
>> For those watching, it's a TypeScript shirt. It's very cool. TypeScript till I die. Dying on that hill.
Is it getting set up or it's installing new?
>> Great.
>> Yeah.
>> Which is the vercel link.
By the way, if you haven't seen, we did ship a new landing page. Finally, after two years, we replaced my terrible landing page. Um, which this was definitely not done by me. This is design engineers on our team. Very, very cool. Um, live download figures.
I love showing this work because it looks so good.
Oh, and if you didn't know, if you've used OpenCode, OpenCode is all built on the AI SDK as well. And this is the most Dax quote ever.
OpenCode uses AI SDK. All right.
We all set up?
>> Cool.
>> So, we're good to go. The first thing we're going to run is pnpm install or npm install. If you're using npm in 2026, maybe use pnpm or Bun. I love Bun. Then we can just check that it is working. This is a fresh Next.js app, so we shouldn't have any issues.
If you run the dev server and then head to localhost:3000, you should see the wonderful, and actually pretty new, starting screen.
>> Is that all set up for everyone?
>> Yes.
>> Amazing.
>> So we're going to start with our agent. We are going to build an agent.
The first thing we have to do is install a few dependencies because we are working with JavaScript, right? We love installing dependencies. So we're going to jump into the terminal and run pnpm add ai @ai-sdk/react zod. Don't worry, we're right at the beginning, so you're not missing anything.
If you want to follow along on your computer, I should write it down. Is there a whiteboard or something? No, there isn't. The site that we're on is nicoalbanese.com/aie.
That will get you to the project setup, which you can follow right here.
We're literally just starting off with building out the agent. As I was saying, we are building with JavaScript,
so we've got to install some dependencies.
We are going to be installing the AI SDK, which is this very nice two-letter npm package right here, our React adapter, which we'll see in a little bit, and zod, which we'll be using to define a few schemas later on in the project.
So the last time I did an AIE workshop was with AI SDK 4, and back then everything was built around pretty much four primitives: generateText, streamText, generateObject, and streamObject. Those still exist, although we've pushed as much as possible into the text generation functions, so you can now do structured outputs with just generateText and streamText. But we've also been working on providing a more object-oriented approach to building agents with the SDK. Part of that was that we saw in our own applications how much they were ballooning by having all of the LLM logic at the call site. You had your api/chat route.ts, a Next.js route handler, and in there you had 2,000 lines of code because you had the tools defined inline and the system prompt defined inline.
Part of the beauty of the AI SDK is that it is lightweight JavaScript. You can define this once in code in a monorepo and then use it anywhere, whether it's a Next.js app or a simple Bun server. Whatever it is, it's plain JavaScript. This was our first foray, I would say, into that, which is what we call a ToolLoopAgent. Now, the mastermind behind the AI SDK is German, so naming is something we hold quite dearly. This is one of our shorter APIs, but we find that it is quite obvious as to what it does. It is a tool-loop-using agent. It kind of does what it says on the tin. So to build our first agent, all we need to do is import this ToolLoopAgent and then specify a model.
So we're going to do this. We're going to create a new file called agent.ts in a lib folder. I'm actually going to open this up in an editor so it's a little easier for me, and I'm going to bump up the font so everyone can see.
Is that better? Can we read? Yeah. Cool.
So we're going to head into the app.
Actually, we can do it at the top level: we're going to create a new file in a lib folder, and it's going to be called agent.ts.
Now in here we'll copy this snippet and, like I said before, we're importing ToolLoopAgent and specifying our model. The cool thing here: if you've used the AI SDK before, you may have seen this syntax where you import a provider.
So you'd import openai from, if I can type, @ai-sdk/openai, and this would create a model provider instance in which you could then specify a model ID. One of the cool things that we ship in AI SDK 6 is the concept of a global provider, which allows you to effectively attach a provider to every single AI SDK function that you have in your application. By default that is the AI Gateway, so by specifying just plain strings you can access any model in the AI Gateway. If you want, you can override this and use any provider as that global provider. But this just makes it really easy to get started. So we are going to be using, what did we say? GPT-5.4 mini.
I always hope to keep these kinds of things up to date. I did this last night, so this is pretty up to date, but when this video goes out, it probably will not be. I haven't used 5.4 mini a lot; I've used 5.4 a ton. But this gives us a cheap and fast way to experiment with our agent as we go. So this is our agent definition. It's kind of surprising how little it is, but it literally is just this reusable agent using GPT-5.4 mini. We're then going to have to create a way to call our model, and we're going to do that by building a route handler. That's going to be in a folder called api, which has a folder called chat. In there, with Next.js's conventions, we can define a POST handler in a route.ts file.
So again, that's app/api/chat/route.ts.
One of the cool things, going back to why you might want to use this instead of just streamText out of the box, is that this allows us to keep all of our agent definition in one place while having just the concerns for streaming from that agent in another place. We've abstracted a lot of the complexity of that streaming into really one line of code, this createAgentUIStreamResponse. Again, if you've used the AI SDK before, what's happening here is literally the same as const result = streamText(...): you have your streamText call and then you return result.toUIMessageStreamResponse().
So it's effectively the same thing.
And on your agent, you also have myAgent.stream, which is your agent's streaming function.
I'm going to get rid of this. That gives us our way of actually calling our agent. You can see that our POST handler is taking in some messages and just passing them in alongside our agent. Now we need to create a page and a client to call that endpoint. So we can head to page.tsx, our homepage, and replace it with the text that's here.
What we've got here is the now infamous useChat hook, which was actually the first component of the SDK when we launched, whatever, three years ago. This manages all of our message state and helps us send those messages off to our route handler. We destructure messages, which will be our message store (it will all be on the client today), error, and the sendMessage function. We check if there's an error; otherwise, in our markup, we map through the messages in our message state, and for each message we map through the parts. Have you used the AI SDK before, or is this the first time seeing it? Okay, great, so most of you have seen this, and I can fly through it. But yeah, this is how we're rendering on the client. We can now run pnpm run dev, and we should be able to say hi and get back our response. So we've got our chatbot, or our agent, as we should say since it's 2026, ready to go.
Now, the first and most basic thing that we'll want to do to our agent is change its behavior. One very core way of doing that is altering its system prompt, or its instructions, and you can do that with the AI SDK in your agent definition by passing in an instructions parameter. So I can say, respond like a cowboy. Now when I go back to our agent and say hi, we should see: howdy partner, how can I help you today? With a nice cowboy emoji as well.
This goes back to what I mentioned to Casper right at the beginning. What I want to do today is communicate what I believe to be the core building blocks for building agents in 2026. The first is an agent runtime, a way to build your harness, effectively: how do you manage the loop, how do you manage context across the loop, that kind of thing. Then there are the tools themselves that you're passing into that runtime. And finally, a computer, or some kind of sandboxed file system, for the agent to be able to persist state or execute code within its runs, the runs being the sessions that it's working on.
So this is one of the building blocks, the instructions. I think a lot of folks scoff at the system prompt as being something from 2023, when we were first really starting, but we'll see today how we use the instructions alongside those two other components to really get the most out of an agent and influence its behavior. So that's the starting point. We've built our basic agent here.
The first thing that I think a lot of folks will want to do is give the agent the ability to pull in more context in some way. A basic thing we can do today, because we don't have a great use case, is bring in web search. The way we do that is we jump back to the terminal, I stop my dev server, and I run pnpm add @ai-sdk/openai.
The reason we're adding the OpenAI provider is to get the tool definition for something called a provider tool. There are roughly three types of tools, and I can actually find this right here, because it's a good help for me.
First, you've got custom tools, which are tools that you define yourself. You provide a description, an input schema, and an execute function. This effectively gives the agent the ability to run any arbitrary function that you want to give it, based on the context of the conversation.
We then have provider-defined tools. These are quite interesting. They are tools that an LLM provider will usually post-train their models to use effectively. Anthropic has a bash tool which, alongside all that subsidized Claude Code usage, they're training the model to use a lot better. They also have a tool for computer use. The idea here is that you define what should happen when the agent calls it, but they've worked really hard on the description and the input schema to make sure that the agent is very effective at calling it.
The final type is provider-executed tools, which exist in the LLM provider's infrastructure. The classic example is the web search tool; Anthropic has a web search tool as well. You don't provide the tool, or what should happen when the tool is called; you opt in to the LLM provider being able to use it. If the agent decides to use it, they will literally execute it on their server, add the tool result to the message state, and return all of that back to you. The really nice thing is that we don't have to write any more code; you get it out of the box. The bad thing about provider-executed tools is that you're obviously tied to a single provider. But for this demo today, that's not so much of an issue, and it allows us to move quite quickly.
What's happening when we import the OpenAI provider here is that we're literally just using it to augment the request that ultimately goes off to OpenAI to include that opt-in flag: I want the web search tool included. So we add that into our agent definition, and you'll see we remove our instructions as well. That is literally it. Well, I need to run the dev server as well, pnpm run dev, but I can now jump back to my agent and ask, when is AIE London, AI Engineer Summit London? I think that's what this is called. We'll see a long pause, which is terrible UI, but eventually we should see our response come back in here: our augmented context, our RAG bot, if you will, using the old archaic language. Our agent now has the ability to pull in relevant context via web search.
There are some cool things that we can do here. If you pass an object into this tool factory function, you can specify things like the user location. I just use the language server to help me here, but I can say that we're in London.
That allows you to ask, when is AI Engineer Summit? We don't include London, and in theory we'll get a more London-oriented answer. Yeah, there you go. Those parameters are now being passed into the request that ultimately goes off to OpenAI.
So cool, we now have a way of augmenting our context, and we've provided our agent with its first tool.
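To make the first of the three tool types concrete, here is a minimal sketch in plain TypeScript of the three pieces a custom tool carries: a description, an input schema, and an execute function. The names `CustomTool`, `parseCityInput`, `weatherTool`, and `callTool` are hypothetical and chosen for illustration; this is not the AI SDK's own tool helper, and a real project would more likely express the schema with zod.

```typescript
// Hypothetical shape of a custom tool: the three pieces named in the talk.
type CustomTool<Input, Output> = {
  description: string; // tells the model when the tool is useful
  parseInput: (raw: unknown) => Input; // stands in for an input schema
  execute: (input: Input) => Promise<Output>; // arbitrary code you own
};

type WeatherInput = { city: string };

// Minimal hand-written validation; throws on malformed model output.
function parseCityInput(raw: unknown): WeatherInput {
  if (
    typeof raw === "object" &&
    raw !== null &&
    typeof (raw as { city?: unknown }).city === "string"
  ) {
    return { city: (raw as { city: string }).city };
  }
  throw new Error("invalid input: expected { city: string }");
}

const weatherTool: CustomTool<WeatherInput, string> = {
  description: "Look up the current weather for a city.",
  parseInput: parseCityInput,
  // Assumption: a canned answer instead of a real weather API.
  execute: async ({ city }) => `It is sunny in ${city}.`,
};

// What a runtime does when the model emits a tool call:
// validate the raw input first, then run your function.
async function callTool<I, O>(tool: CustomTool<I, O>, rawInput: unknown): Promise<O> {
  return tool.execute(tool.parseInput(rawInput));
}
```

With a provider-executed tool such as web search, none of this code lives in your application; only the opt-in travels with the request.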
But the problem, if we show this again, when is AI Engineer Summit (you guys already saw this), is that nothing is showing up in the UI right now. The user has no idea what's going on while the agent is actually going through multiple steps and calling a tool. So what we really want to do is render this in our markup in page.tsx.
We want to describe what should be rendered in the UI when different tools are being used, specifically our web search tool.
So we could go in here and add a case. The AI SDK follows a convention of a tool- prefix followed by your tool name. We'd go to our agent and see that it's called web_search.
So we'd say tool-web_search and then define some kind of markup here, say a div that reads web search.
My god, I'm typing. I haven't typed in months. You can see now that our agent has called web search. But this process is terrible. What if we wanted to see the input? There's probably a query going in here, but it's typed as unknown. That's ugly. We don't want that. So I'm going to delete all of that; that's why I typed it and it's not on the page. This is where we leverage another really awesome component of the AI SDK, which is the end-to-end type system that we've built. The big assumption that goes through every single AI SDK API decision is that we want the agent definition to be the source of truth that everything inherits and derives from. So we spend a lot of time across tool calls and across the UI library making sure that it can flow nicely. Where we have to start is the atomic unit of state in our application, which is the message. We can get a typed message by using this type helper, InferAgentUIMessage. So we jump back into our agent definition and pass in the type of our agent. Now this agent UI message is fully type-safe based on the tools that we pass. If we head to the route handler, we can update it to define that the messages coming in are of our custom UI message type. But most importantly, we can go back into the page and type useChat as having MyAgentUIMessage.
Once you've done that (it's imported from our lib/agent file), when we scroll back down and check for the different part types, we'll see that we have our web search tool here now. And if you go in here and check the input and output types, you'll see that they are typed, which is very cool.
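The benefit of typed parts can be sketched without the SDK. The types below are a hypothetical, heavily simplified stand-in for what the inferred message type provides: the part name `tool-web_search`, the two states, and the `query` and `sources` fields are assumptions for illustration, and the real types are richer.

```typescript
// Hypothetical, simplified message parts; not the SDK's actual types.
type TextPart = { type: "text"; text: string };

type WebSearchPart =
  | { type: "tool-web_search"; state: "input-available"; input: { query: string } }
  | {
      type: "tool-web_search";
      state: "output-available";
      input: { query: string };
      output: { sources: string[] };
    };

type MessagePart = TextPart | WebSearchPart;

// `type` and `state` are literal discriminants, so TypeScript narrows `part`
// in each branch: `input.query` is a string rather than `unknown`, and
// `output` is only reachable once the state says it exists.
function renderPart(part: MessagePart): string {
  switch (part.type) {
    case "text":
      return part.text;
    case "tool-web_search":
      if (part.state === "input-available") {
        return `Searching the web for "${part.input.query}"...`;
      }
      return `Found ${part.output.sources.length} sources for "${part.input.query}"`;
  }
}
```

The pending branch is what gives the UI something to show during the long pause while the tool call is in flight.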
I'm not going to bore you with trying to type out and build a component on the spot, because I can't design at all, and I don't think any of us design anything by hand anymore.
But you'll see, if we jump back, we now have a nice inline component for our web search. We can try that out again and ask, who is Nico Albanese? Sorry, kind of egoistic. But now you can see our component has that pending state inline, which is really cool. So what do we have next? That's augmenting our context done.
Now on to what I think is the most interesting component in 2026, which is providing our agent with a computer, a sandbox to interact with. To take a second here: we saw this really take off internally. We have an agent called d0, the chat-with-your-data agent we were talking about, which has access to pretty much all of the Vercel backend, our entire admin panel, all of Salesforce, all of these kinds of things. I think it was our head of data who was getting destroyed by being tagged in everything, and he naturally wanted a replacement Slackbot that could do all of that and help scale his team as well.
There was a really interesting point where it went from an agent that had all of these tools, would use maybe five or ten of them at a time, and would return an answer that was somewhat hallucinated, to what happened when he added a file system to it along with some instructions. The instructions said: for every piece of work, every session, you have a scratch pad in the file system, where you store an initial plan that becomes your reference for exactly what you should be doing in each step, and you have a directory for your research where you collect everything.
Two things were happening there. For one, the agent started actually following through and completing entire tasks, because things weren't just being layered into this insanely long context window where the initial instruction was getting thrown away. The instructions were: create this plan file, with the objective right at the top. Right below that, the instructions were: follow this plan file to a T and check things off as you go. All of a sudden you have this fascinating thing where the agent is reading the file back in and reminding itself, okay, this is my objective, so it stays on track more. And at the end you get this really great artifact that shows exactly what the agent did and what work went into it. It was this emergent behavior that we saw: pairing agents with a file system, whether a full computer or just a virtual file system, worked very, very well. We've now seen it across pretty much every agent that we build internally.
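The plan-file pattern described above can be sketched with a tiny in-memory file system. Everything here is hypothetical (the `VirtualFs` class, the `plan.md` file name, the checkbox format); the talk only specifies that the objective sits at the top of the plan and that steps get checked off as the agent goes.

```typescript
// A tiny in-memory stand-in for the agent's file system (hypothetical).
class VirtualFs {
  private files = new Map<string, string>();

  write(path: string, content: string): void {
    this.files.set(path, content);
  }

  read(path: string): string {
    const content = this.files.get(path);
    if (content === undefined) throw new Error(`no such file: ${path}`);
    return content;
  }
}

// Write the plan with the objective at the very top, then one checkbox per step.
function writePlan(fs: VirtualFs, objective: string, steps: string[]): void {
  const lines = [`# Objective: ${objective}`, "", ...steps.map((s) => `- [ ] ${s}`)];
  fs.write("plan.md", lines.join("\n"));
}

// Check off one step; the agent would do this after finishing each step.
function checkOff(fs: VirtualFs, step: string): void {
  const updated = fs.read("plan.md").replace(`- [ ] ${step}`, () => `- [x] ${step}`);
  fs.write("plan.md", updated);
}

// Re-reading the plan is how the agent reminds itself what is left to do.
function remainingSteps(fs: VirtualFs): string[] {
  return fs
    .read("plan.md")
    .split("\n")
    .filter((line) => line.startsWith("- [ ] "))
    .map((line) => line.slice("- [ ] ".length));
}
```

Because the plan lives in a file rather than deep in the message history, the objective is re-read on demand instead of competing with hundreds of tool results, and the finished file doubles as the artifact of what was done.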
We've got an agent for the GTM team. We've got the data one, like I said. We have a customer support one that pushed our customer support tickets down by something like 90%, with, I think, a 95%... people were actually saying thank you, which we haven't really seen before. All of these are underpinned by this file-system-backed agent approach, and that's what I wanted to show off today. We are going to be using a Vercel product for this, Vercel Sandbox. I have built a lot with different sandboxes over the years, and there's something really cool just shipping now, in beta, called named sandboxes, or persistent sandboxes. I think I have them up right here. That's what we're going to be taking advantage of today. One of the issues that you have with sandboxes in general is that, unless you're running one on your own computer, they are fundamentally ephemeral, right?
Whether a provider has a sandbox that can live for 30 days or five hours, at the end it's done. So we were thinking about how we could work around those challenges. What the team ended up coming up with is this really cool concept: every sandbox has a name, every sandbox can have sessions, and sessions are just instances of that underlying sandbox item, as we can call it. Every time you want to use a sandbox in your product or in your code, you reference it by its name. Behind the scenes, Vercel will either see if there is an active session for that sandbox and route you to it, or it will optimistically spin up a new one and route you to that. So in your code, you simplify all of this lifecycle management: is my computer running, which one should I point to?
I'll show, if I have time at the end, something I've been working on, for which I built out an entire lifecycle management system where I was literally tarring the entire file system after every single request, storing it in blob storage, and then spinning up a new sandbox. It was terrible, and this is just so much easier. Where we'll be getting to today, and I hate to drop the OpenClaw thing, is a specific computer that we'll be able to reference throughout every single request. Behind the scenes, Vercel will actually spin it down after a timeout, after inactivity, but it will snapshot the file system, so the state remains the same. When you make subsequent requests, it will spin up a new one with that snapshotted state. So it effectively feels like the exact same machine, which is really cool. This is in beta right now, but I'm already using it a ton, and it's working really, really well, and we've got a lot more coming here. So the first thing that we're going to do is jump... By the way, any questions so far?
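The routing behavior described for named sandboxes can be modeled in a few lines. This is a toy simulation written for this transcript, not the Vercel Sandbox API: the `NamedSandboxes` class, the `Session` shape, the explicit `now` argument, and the idle timeout are all assumptions made so the lifecycle is easy to follow.

```typescript
// Hypothetical model of name-based routing; NOT the Vercel Sandbox API.
type Session = { id: number; files: Map<string, string>; lastUsedAt: number };

class NamedSandboxes {
  private active = new Map<string, Session>();
  private snapshots = new Map<string, Map<string, string>>();
  private nextId = 1;
  private idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // Callers only ever pass a name; the session lifecycle is resolved here.
  // `now` is passed in explicitly to keep the sketch deterministic.
  get(name: string, now: number): Session {
    const current = this.active.get(name);
    if (current && now - current.lastUsedAt <= this.idleTimeoutMs) {
      current.lastUsedAt = now;
      return current; // route to the session that is already running
    }
    if (current) {
      // Idle for too long: snapshot the file system before spinning down.
      this.snapshots.set(name, new Map(current.files));
    }
    // Start a new session, restoring the snapshot if one exists.
    const snapshot = this.snapshots.get(name);
    const files = snapshot ? new Map(snapshot) : new Map<string, string>();
    const session: Session = { id: this.nextId++, files, lastUsedAt: now };
    this.active.set(name, session);
    return session;
  }
}
```

The calling code never asks whether a machine is running. It asks for a name and gets back something that behaves like the same computer, even when the underlying session is new.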
>> Yeah, of course.
>> Yeah.
>> Yeah.
Yeah, great. So the question, because I don't know if the mic picks it up, is: for every agent request, do you send the entire history?
>> I think that's the default >> with useChat. Yes. That is the default, right here. Every time we make a request with sendMessage, this takes the entire message history and sends it over to the route for your agent. Now, you'll always want to send... well, that's a bold claim. Your message history is your context going into your agent, right? So you'll always want that to be bigger: as much of your conversation history as you would like. But sending all of those messages over the wire every single time is not really something that you'll do. The more classic pattern is sending just the most recent message. You can do that in your client with prepare... I don't write code anymore, so I can't remember what the name is. It's probably on the transport, which is new.
And then we have, in here, where is it?
Oh, we're not importing anything from ai. I'll just show you very quickly. Actually, you know what I should be doing is just showing you in the docs. That's even better. I don't know why I'm trying to type this all out. Basically, you can specify to send just the most recent message, and then you fetch all of the messages on the server in your endpoint, put those together, and send them to the model. But I think where you're leaning is more around the context thing, right? You don't want to send old, irrelevant things to the model, >> right? Like the context engineering side of things, pruning stuff and making things go away. Is that where you're going? >> Yeah. So that's where things get harder. They're simple to do with the SDK, but it requires trade-offs and decisions from the developer that we don't necessarily impose from the framework. We have a few approaches here, and most of them are based on two things. For one, behind the scenes in our endpoint, we take our UI messages, these messages, and we use a function under the hood called convertToModelMessages, which strips away a lot of the UI-specific things like timestamps, IDs, and all that kind of stuff. I believe you can also configure it to ignore certain things or convert data parts to different structures, to alter the context that's coming in. In general, you own the messages that are coming in. So you could do a messages.map and go through and say, if the message role is, I don't know, assistant, and m.parts includes... what is it, if there's a tool call. This is where the typing starts getting fun.
What is the type here? Basically, you can go through and say, I want to strip out this type of tool call. You can totally do that. What we see bigger codebases doing is actually using a function that we have called prepareStep. Think about everything that was happening in our previous web search agent: we send a message, which spins up a new assistant message. In the first step of that assistant message, it called web search, and then the agent runtime took all of that context and decided what to do next. It could have used web search again, could have used the computer, whatever it might be, but each of those is an individual step. With the AI SDK, you have this callback that runs before every single step, and it allows you to modify any of the parameters going into that specific step. What do you get in here? You get the messages, the context, the model that you're using, the step number, and the steps data, meaning what has happened so far. And then you can return any of those top-level parameters. So you could return a completely different message state. Let's say the step number is greater than 20. These are the messages that are actively running, so you could say messages.slice and take just the last five. In this way, running at the beginning of every single step, you've got this sliding filter that takes only the most recent five messages.
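Both ideas in this answer, stripping tool parts and the sliding window, can be sketched as plain functions. The `Message` and `Part` types are simplified, hypothetical stand-ins, and this `prepareStep` is a standalone function that only mirrors the shape described in the talk (step number and messages in, optional overrides out); it is not wired into the SDK here.

```typescript
// Hypothetical, simplified message shapes for illustration.
type Part = { type: string; text?: string };
type Message = { role: "user" | "assistant"; parts: Part[] };

// Drop every tool part (type starting with "tool-") from assistant messages.
function stripToolParts(messages: Message[]): Message[] {
  return messages.map((m) =>
    m.role === "assistant"
      ? { ...m, parts: m.parts.filter((p) => !p.type.startsWith("tool-")) }
      : m,
  );
}

// Runs before each step and returns overrides for that step only.
// Past step 20, keep just the five most recent messages.
function prepareStep(args: { stepNumber: number; messages: Message[] }): {
  messages?: Message[];
} {
  if (args.stepNumber > 20) {
    return { messages: args.messages.slice(-5) };
  }
  return {}; // no overrides: the step sees the full history
}
```

A production version would likely also need to keep each tool call together with its result when slicing, since cutting between the two can leave the model with a dangling call.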
Now, this is a very interesting, long rabbit hole that we can go down, because I've been thinking about this a lot. I was telling Casper before, I've been working on a coding agent for a few months now, and yesterday it ran for, I wonder if I can find the exact photo, 104 minutes in one single turn. Let's see if we have the exact one so I can prove it to you. I think it was right here. Yeah. It ran for 104 minutes, used 316 tool calls, changed 29 files, and used only 32% of GPT-5.4's context window. I have zero compaction running on this.
I think this is kind of a hot take today, because everyone's got their own systems. I was previously stripping away tool calls from earlier in the message history when I hit a certain threshold, so that I would stay between 40% and 60% of the context window. But you have a really big problem when you do that, which is that you invalidate the input cache every single time. Which is, like, oh yeah, of course. But it was important when the token windows were, I think, 400k. When I was first building this, the main models I was using were Opus, with I think the 400k token window, and GPT-5.3 Codex, both with 400k token windows, so you would hit that 40% pretty quickly. With these million-token windows, I'm not seeing as much of an issue.
So what I'm building instead is two things. One is more comprehensive sub-agents. I have sub-agents implemented here, and we can talk about that a lot as well: taking dedicated pieces of work that can be somewhat independent, putting them off the main context thread, and then returning just a thousand tokens, the summary of whatever the objective was, back into it. So that's just efficient use of the context window in a session.
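The sub-agent idea can be sketched as a function that does its work in an isolated thread and hands back only a bounded summary. This is a hypothetical illustration: `runSubAgent`, the `work` callback standing in for a model and tool loop, and the character cap used as a crude proxy for a token budget are all assumptions.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Hypothetical sub-agent runner. `work` stands in for a model/tool loop that
// appends to its own private thread and returns a final report.
function runSubAgent(
  objective: string,
  work: (thread: Msg[]) => string,
  maxSummaryChars: number, // crude stand-in for a token budget
): Msg {
  const thread: Msg[] = [{ role: "user", content: objective }]; // isolated context
  const report = work(thread);
  // Only a bounded summary crosses back into the main thread.
  const summary = report.length > maxSummaryChars ? report.slice(0, maxSummaryChars) : report;
  return { role: "assistant", content: `Sub-agent result for "${objective}": ${summary}` };
}

// Usage: the main thread grows by one message no matter how noisy the sub-task was.
const mainThread: Msg[] = [{ role: "user", content: "Refactor the billing module" }];
const subResult = runSubAgent(
  "List files that import billing",
  (thread) => {
    thread.push({ role: "assistant", content: "read 40 files..." });
    thread.push({ role: "assistant", content: "ran a search..." });
    return "x".repeat(5000); // a long raw report
  },
  100,
);
mainThread.push(subResult);
```

The intermediate messages stay in the sub-agent's thread and are discarded, which is what keeps the main context window small without rewriting anything already in it.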
The other is having a dedicated tool. The folks at Amp did a really cool implementation of this: giving the agent a handoff tool, which at a certain point the agent can call to generate some kind of context that it wants to pass into a fully new thread, and that becomes the main thread to start off with. I feel you should push as much as you can into sub-agent territory, because this compaction by LLM summarization is lossy. You saw the infamous Twitter thread from, I think, one of the heads of AI at Meta AI, who asked her OpenClaw something like "can you archive emails from yesterday," and it just deleted her entire inbox. She was telling it to stop, and it was like, "no, you asked me to do this, I'm going to keep going." The reason was that so many emails were brought into the context by an inefficient tool.
That triggered auto-compaction, and the emails overwhelmed her initial instructions. So her initial instruction, which was "don't do this," was gone. That's what scares me a lot about it. I don't have a great system, whether you just keep the user messages and strip out everything else.
I don't know. But that's my long tangent here, and I can go into it a little bit more. My whole point is: having used my coding agent for 14 hours a day, for every single piece of work I've done in the last four months, compaction has not been an issue for me. And like I was showing you earlier, I have a 95% cache token read ratio on that as well. That to me is more valuable in terms of speed, performance, and cost. That was a very long-winded rabbit hole, and we can go into it more as well. But this is how you would manipulate context between steps. The last thing I'll mention is that we were very intentional about this API being functional, right? This is a function that runs on every single step, rather than returning a new message array and persisting that to the steps afterwards. It's very easy to reason about, because it's an empty function that starts fresh at every single invocation.
Your messages coming in here are the aggregated set of messages throughout.
And it also means that when you do persist at the very end, you know you're getting the full message history. Sorry. Was that helpful?
>> Cool. All right. And this is good as well, because I know I'm doing kind of basic stuff and you guys have already used it. So please do stop me with interesting questions like that.
I'm happy to blabber. So we're getting to the fun part now. I think we installed the beta, right? No, we didn't. So we're going to go through and run this command: pnpm add @vercel/sandbox@beta.
And then we are going to create a new lib file called sandbox.ts.
Like I was explaining before with these persistent sandboxes, we're just going to create a function that leverages this functionality. We pass in a name, and it checks whether that sandbox exists. If it does, it returns that. Otherwise, it creates it and returns that sandbox instance.
So we've got that set up.
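The get-or-create helper described above could look something like the sketch below. The `Sandbox` and `SandboxProvider` shapes are hypothetical and deliberately tiny; the real @vercel/sandbox API is not shown in the talk and may differ, so an in-memory provider stands in for it here.

```typescript
// Hypothetical minimal shapes; the real @vercel/sandbox API may differ.
interface Sandbox {
  name: string;
}

interface SandboxProvider {
  get(name: string): Promise<Sandbox | null>;
  create(name: string): Promise<Sandbox>;
}

// In-memory provider so the sketch runs without any network calls.
class InMemoryProvider implements SandboxProvider {
  private sandboxes = new Map<string, Sandbox>();

  async get(name: string): Promise<Sandbox | null> {
    return this.sandboxes.get(name) ?? null;
  }

  async create(name: string): Promise<Sandbox> {
    const sandbox: Sandbox = { name };
    this.sandboxes.set(name, sandbox);
    return sandbox;
  }
}

// Return the sandbox with this name if it exists, otherwise create it.
async function getOrCreateSandbox(provider: SandboxProvider, name: string): Promise<Sandbox> {
  const existing = await provider.get(name);
  if (existing) return existing;
  return provider.create(name);
}
```

Keying on a stable name is what makes the sandbox persistent from the caller's point of view: two requests with the same name end up on the same machine state.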
Now this is some new stuff that I don't think any of you will have seen if you haven't used AI SDK 6 before. We've been using the ToolLoopAgent, and we've been calling it just like this: pass in our agent, pass in our messages. But very rarely is your context the only dynamic thing in your agent. A lot of the time there are different structured inputs that should change the way your agent behaves. The classic example I give is a customer support agent: one of the things you'll want coming in is the customer ID, and maybe the customer type, whether they're enterprise or hobby, for example, with Vercel. The behavior of that model might change based on that. If you're running a system for hundreds of thousands of people, maybe you're an airline. I think you're in travel, right? Or no?
>> Construction.
>> Construction. So travel works; we'll use that for now. Imagine you're British Airways and you've got a chatbot serving hundreds of thousands of customers every day. Maybe for a blue member, who's at the bottom tier, you want to give them GPT-5.4 mini, but for a gold member you want to give them GPT-5.4 Pro or something like that. These are structured inputs that fundamentally alter and augment the behavior. And the way we used to do that before was functional style.
You'd say const createAgent, and you'd have some kind of input that would ultimately change the behavior, and you're doing if statements and all that kind of stuff. We saw that and thought (and it's frustrated some people online, which I didn't realize it would) these are call options: options that you pass at call time that change the way the model behaves. The way you define those is with a call options schema that you pass in to your agent definition. So I'm going to copy over all of this code right here, and I may strip away some of it so it's easier to see. Maybe we'll close this off and focus just on the call options schema. What we want for our agent, this agent with a computer, is that it takes in a sandbox on each invocation, and the agent is going to interact with that sandbox. So we define a Zod object for the call options we expect, and it's going to be an instance of our Sandbox class from the Vercel sandbox package.
We can then pass that call options schema into our ToolLoopAgent definition.
And if we go to our route, you can see that in our createAgentUIStreamResponse there is an options key, and in there we now have this type-safe options object that we need to pass in.
You can see we've got an error here: our options expect a sandbox.
>> Howdy.
>> No worries. I didn't know you have.
>> Okay. Yeah. So we have our call options schema. Now we actually need to do something with our call options. This isn't going to make a ton of sense right now, but it will as we go into it. We want our sandbox available within our agent runtime for any tool to use. The way we're going to do it is with something that's kind of a loaded term here: context, within the AI SDK agent runtime. This context is more similar to React context than it is to agent context. If you're familiar with React context, you have a provider, and any component within that nested component tree can access arbitrary values within it. This is the same idea: you can pass any kind of arbitrary data, variables, functions, whatever they might be, into this object, and then access it in your tool execute functions. So what we're doing here is saying our agent now expects a sandbox to come in every single time it's used. Then this prepareCall function runs, only once, when our agent is called. It takes that sandbox off of the call options and injects it into our runtime context, this runtime state that you have across agent invocations.
Sorry, not across agent invocations: across the entire agent run, across steps. That's the big thing. I know this is a lot right now. We'll see it in a second and it will make a lot more sense.
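The flow from call options to runtime context can be sketched with a miniature, hypothetical runtime. Nothing here is the AI SDK itself: `runAgent`, `CallOptions`, `Context`, and `Step` are invented names that only imitate the idea that a prepare-call hook runs once per run and every step then shares the context it produced.

```typescript
type Sandbox = { name: string };

// What the caller must pass each time the agent is invoked.
type CallOptions = { sandbox: Sandbox };

// What tools can read during the run.
type Context = { sandbox: Sandbox };

type Step = (context: Context, stepNumber: number) => string;

// Hypothetical miniature runtime: prepareCall runs once, then all steps share its context.
function runAgent(options: CallOptions, steps: Step[]): { outputs: string[]; prepareCallRuns: number } {
  let prepareCallRuns = 0;

  const prepareCall = (opts: CallOptions): Context => {
    prepareCallRuns++;
    // Lift the sandbox off the call options and into the run-wide context.
    return { sandbox: opts.sandbox };
  };

  const context = prepareCall(options);
  const outputs = steps.map((step, i) => step(context, i));
  return { outputs, prepareCallRuns };
}

const result = runAgent({ sandbox: { name: "demo" } }, [
  (ctx, n) => `step ${n} sees ${ctx.sandbox.name}`,
  (ctx, n) => `step ${n} sees ${ctx.sandbox.name}`,
]);

console.log(result.outputs); // ["step 0 sees demo", "step 1 sees demo"]
console.log(result.prepareCallRuns); // 1
```

Leaving `sandbox` out of the options object is a compile-time error in this sketch, which is the same kind of end-to-end type safety the talk points to.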
Because we are going to define our first tool.
So we're going to define a tool in a file called tools.ts, and this is going to be a bash tool. Now, we feel very strongly that bash is all you need. In a lot of cases these agents are really, really good at writing bash commands. So for this whole session we are going to have just this one tool, bash, that is going to do everything for us. So how do you define a tool with the AI SDK? You guys have all used the AI SDK before, but there are three components. Description: super important, I can't overstate it enough. This is what the model uses to decide whether to use your tool, and it can also influence how it uses your tool. Then input schema: obviously this is what that tool needs in order to run. And then we have the execute function, which is the code that will be executed every single time the agent uses it. And now you'll see where the context comes in. It's on the second argument of the execute function, so in your tool runtime you can pull it off, access any of that runtime state, and then run commands. This is a cool way of making your tool effectively wholly independent from the agent it's interacting with; it just expects this input to come in and can then use it. And we have a cool update coming in AI SDK 7 which adds typing for the context, and it types the main context on the top-level agent. So if you were to pull in a tool that expects some kind of context, that will then throw an error in your agent definition saying you need to provide this, which is really cool. That's directly a result of me pestering Lars for three months, as I've been using this pattern a ton. So what we have behind the scenes is: we're literally just pulling the sandbox off the context, running the bash command that the agent generated, and then returning the standard out, any errors, and the exit code.
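The three parts of the tool, and the way execute reads the sandbox from its second argument, can be sketched as follows. This is a plain object rather than the SDK's tool helper; `Sandbox.run`, `parseInput` (standing in for an input schema), the `context` key, and `fakeSandbox` are all assumptions made so the example runs on its own.

```typescript
type CommandResult = { stdout: string; stderr: string; exitCode: number };

// Hypothetical sandbox surface: one method that runs a shell command.
interface Sandbox {
  run(command: string): Promise<CommandResult>;
}

type ToolContext = { sandbox: Sandbox };

const bashTool = {
  // 1. Description: what the model reads to decide whether and how to use the tool.
  description: "Run a bash command in the sandbox and return stdout, stderr, and the exit code.",

  // 2. Input schema stand-in: checks the input is an object with a string `command`.
  parseInput(input: unknown): { command: string } {
    if (typeof input === "object" && input !== null) {
      const command = (input as { command?: unknown }).command;
      if (typeof command === "string") return { command };
    }
    throw new Error("bash tool expects { command: string }");
  },

  // 3. Execute: pulls the sandbox off the context passed as the second argument.
  async execute(
    input: { command: string },
    { context }: { context: ToolContext },
  ): Promise<CommandResult> {
    return context.sandbox.run(input.command);
  },
};

// Fake sandbox so the sketch runs without a real machine behind it.
const fakeSandbox: Sandbox = {
  async run(command: string): Promise<CommandResult> {
    return command.startsWith("ls")
      ? { stdout: "memories.md\n", stderr: "", exitCode: 0 }
      : { stdout: "", stderr: `unknown command: ${command}`, exitCode: 127 };
  },
};
```

Because the tool only depends on the shape of the context, the same definition can be reused by any agent that injects a sandbox.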
So we can head back to our agent, which I don't think is running.
So we'll run pnpm dev, and we could say something like: run ls -la.
Nope.
Why isn't it? Oh, see, it's not working.
Because we haven't provided the sandbox, of course. If I had followed the instructions, I would have known that.
There are two next steps that are quite important.
I was getting way ahead of myself. One, we need to give our tool to our agent. So we're going to head back to our agent definition, and I'm just going to replace it all. You'll see that now, on line 21, we are passing in the bash tool that we just created. And then, most importantly, we need to update our route handler to get said sandbox and pass it in to the agent run. We're going to do that with the helper function we created before, create or get sandbox, specifying a sandbox name. Here it's just a random ID; in an application, you'd probably have these names tied to a user's name or to a session. That becomes this persistent sandbox that you can use across invocations. So we get our sandbox here, you can see, and then we pass it in to our call options. Again, if you didn't pass this in, we would get an error, because it's end-to-end type safe. And who doesn't love end-to-end type safety?
So the final thing that we're going to want to do: it will work right now, but you obviously need to wait, and you don't see the commands actually running inline in the terminal. So we can define a component for our bash tool, which we'll do here by providing a case for tool-bash.
So I'm going to jump back to the page, copy this in, and we can look at it again. Before, we had what should be rendered if the model or the user has some text, and what should be rendered in the UI if the model uses web search. And then finally, what should be rendered in the UI if the model uses bash. This I obviously didn't create, because it both looks good and works. So we can say: run ls -la.
And now we should see our terminal spun up. It's running the command, and in what's returned we'll see that we are in a Vercel sandbox.
I know this is a small leap for us today, but it's a huge jump for our agent to now have this. But you'll notice, and I go back to instructions and behavior, if I said, "What do you see?"
The agent is oblivious. It now knows that it's got a bash tool, but it doesn't really know when to use it, how to use it, or what to use it for. So this is when we could jump into our agent's instructions and say: you are an agent with a computer that you can access with bash. If the user asks what you can see, use ls. Something like that. This is a terrible system prompt, but now you can ask, "What do you see?" Naturally, it's going to start using the computer a little bit more. This is not hugely helpful right now. I think a natural progression from having access to a file system is: let's store things in it, right?
And the pretty natural next step for storing things with an agent is memory.
I think a lot of people are thinking about what the ideal memory is. My hot take is that memory is a file that you store in your sandbox, with some actual deterministic code for pulling it in and injecting it into the system prompt, and then potentially some kind of structure in your file system for different memory types. So you'll have a core memories.md, which is the stuff that gets sent in on every single turn, and in there is probably some information like: to search conversation history, it's stored in conversations.jsonl, or something like that. The point being, the file system becomes this playground, this environment for you to store a lot of this information in a structured way. And the beauty here is that these agents are so good at generating bash commands that they can use things like find, ls, grep, glob, all of these kinds of things. So the final thing we're going to play around with here is adding this concept of persistent memory. I'm going to copy this in quickly.
I explained the basic idea of what I was thinking about, which is: let's have a file in our file system called memories.md.
The agent will know to put new memories in there, and then in that prepareCall, which runs once before every single agent run,
let's fetch that file and inject it into the system prompt, into the instructions, with some kind of context around it. So that's what I'm doing here. I'm fetching memories.md, getting the string itself, and then I have the instructions from before.
"You're a coding agent with access to a computer. You have a memories.md file that you can read and write to. You should always add any facts the user shares to memories.md." And then, if we have any memories, "here are your memories." Otherwise, "no memories yet."
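The deterministic injection step described above can be sketched as a pure function that takes whatever was read from memories.md and returns the full instructions. How the file is fetched from the sandbox is left out; `buildInstructions` and its handling of a missing or empty file are assumptions for illustration, while the prompt wording follows the talk.

```typescript
const BASE_INSTRUCTIONS = [
  "You are a coding agent with access to a computer.",
  "You have a memories.md file that you can read and write to.",
  "You should always add any facts the user shares to memories.md.",
].join(" ");

// `memories` is the raw contents of memories.md, or null if the file does not exist yet.
function buildInstructions(memories: string | null): string {
  const trimmed = memories?.trim() ?? "";
  const memorySection =
    trimmed.length > 0 ? `Here are your memories:\n${trimmed}` : "No memories yet.";
  return `${BASE_INSTRUCTIONS}\n\n${memorySection}`;
}

console.log(buildInstructions("- User's name is Nico\n"));
```

Running this once at the start of each agent run means the model always starts with the current file contents, without having to decide to read the file itself.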
And we could even ask something like... no, actually that's fine. We can go like this. So now I could say, "Hey, my name is Nico." Again, really kind of dumb examples here, but we'll see the beginning of it. "User's name is Nico." And now, if I refresh and say hi, we'll see that it's doing some behavior that we don't want, which is adding some kind of arbitrary "user greeted with hi." But you can see that, regardless, it had Nico in the file system. And I could say something here like: don't record everything, only record important memories.
>> You're absolutely right.
>> "Important facts the user shares. Only record important memories." Let's try.
You see, this explains what's broken with most agents: even me, as a native English speaker, I kind of write conflicting instructions that an agent will obviously take very literally. So I say hi, and now it's done it again. This is also kind of a dumber model... or no, we moved up to GPT-5.4, so it's not even that. I would say, "don't save greetings." And this is bad, because this is the pink elephant problem. You don't really want to mention behavior you don't want it to do, because it's now activating parameters and we don't really know what it's doing. But let's see.
Man, really? So that doesn't work. Yeah, exactly. In fact, that nuked my memories. So let's delete them.
Okay, great. Let's remove that. So now, I mean, this is kind of the fun part of building this: we're literally trying to get this machine to use this persistent state.
And having built so much with sandboxes now, I find it really cool that we just have that ID, that name, and it's just persisting. Whether it's actively running or not, we have that persistent state. I did work on some prompts before that make the memory system better. So you can see: "do not save trivial interactions like greetings, small talk, or information that can be derived from the codebase itself." We're going to copy this over and see how it does. And now, hopefully, when we go "hi"... I've also added system prompts here to make it a little bit more inquisitive. So, a bit like OpenClaw's onboarding, it's like, "what's your name?" So: "Hi, I'm Nico." And now it's adding the user's name. "What do you do?" "I work on..." why am I doing this? "I work on the AI SDK at Vercel." I don't know why I'm typing so much today. So then we get that added, and now when I pop into a new chat and say hi, we've got that memory that's persisted across these sessions. Now another really cool thing that you can start doing, just by leveraging the bash tool we have here, is using something we already have in the environment and something the agent is already very good at, which is generating code and executing code. A classic thing you'll see, which I think is why OpenClaw was so exciting and why agents have really taken off this year, is this idea of agents modifying or extending themselves. A big part of that is less about modifying itself and more about giving it the feedback loop in the environment, where it can build, run code, evaluate the output, and iterate on that. I think that's also why coding has naturally become the first place this has really gone crazy: we've got compilers, we've got type checkers. It is the perfect environment for that REPL loop, basically.
So I've added another thing here, telling the agent that if there are any repeatable tasks, it should make a Python script out of it, and then add those Python scripts, or a description of them, to the end of our memories.md, and use that to decide what to do next. So if the user asks you to get the weather, make a weather script. And if the user asks to get the weather again, use the script that you already have access to.
These are the building blocks of this agent that learns, that builds on itself as we go.
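One way to picture the "add the script description to the end of memories.md" step is the sketch below. In the talk the agent does this itself with bash; the `registerScript` function, the `## Scripts` heading, and the entry format are all invented here, and the sketch assumes the scripts section is the last section of the file.

```typescript
type ScriptEntry = { path: string; description: string };

const SCRIPTS_HEADING = "## Scripts";

// Append a script entry to the end of the memories text, creating the
// "## Scripts" heading if needed and skipping paths that are already listed.
// Assumes the scripts section, when present, is the last section of the file.
function registerScript(memories: string, entry: ScriptEntry): string {
  if (memories.includes(`- ${entry.path}:`)) return memories;

  const line = `- ${entry.path}: ${entry.description}`;
  const base = memories.trimEnd();

  if (!base.includes(SCRIPTS_HEADING)) {
    const separator = base.length > 0 ? "\n\n" : "";
    return `${base}${separator}${SCRIPTS_HEADING}\n${line}\n`;
  }
  return `${base}\n${line}\n`;
}

const before = "- User's name is Nico\n";
const after = registerScript(before, {
  path: "scripts/weather.py",
  description: "get the current weather for a city",
});
console.log(after);
```

Since memories.md is injected into the instructions on every run, a listed script becomes something the agent can find and reuse the next time a similar request comes in.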
So we can give this a go and see how it does. We say: get the weather in London, use Python. I'm going to force it a little bit here, given we're doing a demo. And we'll see what it does.
So it's going to write some Python here, and we'll see it's gotten the weather, and now it's added that to our memories.md. And it's pretty cool: this was all one assistant turn, just using bash, and now it's got this tool for getting the weather. So now we could say: get weather in SF.
Let's try. Will it work? And look, it used it. It is not minus 11, though; it got Quebec, Canada for some reason. We'll have to modify it slightly.
Yes. Do this.
Do this. Oops. I probably broke everything here. Amazing, it actually worked. Okay, so it's 12 degrees Celsius. But you can see now, "I prefer..." See, it's cool. The more it's asking, the more it's building itself, the more it's learning about what I like. And now this computer is my agent's dedicated playground and workspace for it to help me out with anything. So that was the basis for what I wanted to go through today. What I did want to show you is that I've built a very complex agent system on top of these exact patterns, which I now use every single day to get my work done. So I wanted to, I mean, shill: this is coming out later today. I'm holding myself to that. And this is effectively like Cursor background agents, but using the concepts that we had today.
It uses the AI SDK, uses that exact pattern, uses AI Gateway for inference, and uses Workflow, so it can run infinitely and each of the steps is durable. Each LLM step is matched to a durable workflow step, and if any step fails, it literally retries until it gets a result. And this also builds on what we were talking about before with sub-agents. So I can create a new session and be like... "Yeah, we have to finish." I'm done anyway; this is more me showing off. But: "spin up a sub-agent to explore this project." And after it creates... this is on my hotspot, so it's a little bit slow.
Effectively what I'm trying to show is that this pattern scales to a pretty large system. So yeah, we've got the sub-agent spinning up, and this runs in the background. This is being used by 23 people at Vercel. I've put 3.8 billion tokens through this in the last month or two. That cache read ratio I was telling you about, that I'm very proud of: 91%. It's responsible for close to 350 PRs, although this doesn't include all of them that closed. And if we go back here, you'll see it has resumable streams, and you can see our sub-agent was going off finding and searching all of this stuff off of the main thread. It used 30,000 tokens, but just returned about 500 at the very end. So that keeps us within literally 7,000 tokens for the main agent thread.
So yeah, this hopefully will be out later today.
I'm obviously here if you guys have questions after, if we're getting kicked out. Yeah.