This guide effectively shifts the focus from simple prompt engineering to the robust system architecture required for reliable, production-grade AI agents. It provides a necessary reality check on the infrastructure—like state management and error recovery—that truly defines a successful agentic deployment.
Deep Dive
Agentic AI System Design (from the basics)
A lot of engineers I know want to start designing, building, and deploying their own agents right now, but agentic software feels like a new category and they're not sure how to do it. So, in this video, I'm going to show you exactly how I go about building new agentic products from scratch: starting with the simplest possible agent, then giving it some tools, and then showing you how we can handle things like failures. Today, we're going to be building with a combination of the OpenAI Agents SDK and today's sponsor, Agent Span, which is a free, open-source library you can use to build and manage durable agents. Let's get into it.
So, to start, we're looking at the simplest possible version of an agent that you can construct using the OpenAI Agents SDK. This is a Python example, and quite literally with this, you've now built your first agent. But it's not a lot more useful than just a plain large language model. If you want it to do more than that, we have to give it some tools. Here are two sample tool definitions we could give to our agent.
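A minimal sketch of what two such tools could look like, not the exact code from the video. The weather data is stubbed so the example runs offline, and the names `get_weather`, `calculate`, and `build_greeter_agent` are assumptions for illustration. Only `build_greeter_agent` touches the OpenAI Agents SDK (`Agent` and `function_tool` from the `agents` package), and it imports it lazily so the tools can be tried on their own.

```python
import ast
import operator

# Stubbed data so the sketch runs offline; a real tool would call a weather API.
_FAKE_WEATHER = {"tokyo": "18°C and clear", "london": "11°C and rainy"}


def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    report = _FAKE_WEATHER.get(city.strip().lower())
    if report is None:
        return f"No weather data available for {city}."
    return f"The weather in {city} is {report}."


# Only these arithmetic operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str) -> float:
    """Evaluate a basic arithmetic expression such as '2 + 3 * 4'."""

    def evaluate(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError(f"Unsupported expression: {expression!r}")

    # Parse into a syntax tree instead of calling eval() on model-supplied text.
    return evaluate(ast.parse(expression, mode="eval").body)


def build_greeter_agent():
    """Wire both tools into an agent. Requires `pip install openai-agents`."""
    from agents import Agent, function_tool

    return Agent(
        name="Greeter",
        instructions="Greet the user. Use the tools for weather and arithmetic.",
        tools=[function_tool(get_weather), function_tool(calculate)],
    )
```

With the SDK installed and `OPENAI_API_KEY` set, `Runner.run_sync(build_greeter_agent(), "What's the weather in Tokyo?").final_output` would run the agent once. The calculator walks a parsed syntax tree rather than using `eval`, since the expression text comes from the model.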
Now, we have a simple greeter agent which can get the weather in a certain city, so it can use real-time data outside of its own internal memory, and do math. As you might know, large language models aren't the best at doing math without a tool. With just this, you can build a simple agentic application.
I've built a simple travel agent demo. Let's say I want to plan a trip to Japan. Well, the way this works is by invoking the OpenAI Agents SDK under the hood. I've given it some tools like find places and review itinerary. With that, it's able to go through and build out an itinerary for my trip using agents. I've built a simple sidebar here to show you exactly what's happening with the tool calls under the hood, but this isn't something that you'd normally show to customers, at least not in this way. Now, once it's done, you can see we've built a simple itinerary.
This all looks great, but now let's go back to the whiteboard. Once we start introducing more components to our agent, for example, sub-agents or more tools, we obviously increase our chance of failure. A web search could fail. We could be depending on an external API which has an issue, and then we have to build custom scaffolding for all of those things within our server itself.
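The tool-call sidebar in the demo can be approximated with a small registry that records every call it dispatches. This is a sketch under assumptions: `ToolRegistry` and `ToolCall` are hypothetical names, the two tools are stubs named after the ones mentioned above, and in the real demo the SDK does the dispatching.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    """One entry in the trace: what was called, with what, and how it ended."""
    tool: str
    arguments: dict
    result: Any = None
    error: Optional[str] = None


@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def register(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Decorator that makes a function callable by name."""
        self.tools[fn.__name__] = fn
        return fn

    def dispatch(self, name: str, **arguments: Any) -> Any:
        """Run a tool by name and record the call, including failures."""
        call = ToolCall(tool=name, arguments=arguments)
        self.trace.append(call)
        try:
            call.result = self.tools[name](**arguments)
        except Exception as exc:
            call.error = f"{type(exc).__name__}: {exc}"
            raise
        return call.result


registry = ToolRegistry()


@registry.register
def find_places(city: str) -> list:
    """Stub: a real tool would query a places API."""
    stubbed = {"kyoto": ["Fushimi Inari Shrine", "Kinkaku-ji"]}
    return stubbed.get(city.lower(), [])


@registry.register
def review_itinerary(places: list) -> str:
    """Stub: summarize the stops picked so far."""
    return f"Itinerary covers {len(places)} places: " + ", ".join(places)
```

Calling `registry.dispatch("find_places", city="Kyoto")` appends one `ToolCall` to `registry.trace`, which is the kind of record a debugging sidebar would render.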
This might feel familiar to you if you've worked on distributed systems before. You need things like state management, failure recovery, logging, replays, and retries. All of this stuff now has to be built into your server, which makes building your own agents a lot more complex. I'm going to go ahead and simulate a tool failure here to show you what this looks like when we don't have that kind of stuff in place. So, as you can see here, imagine our provider was unreachable. This is super common. Look at the availability for some of the API providers right now. Now, with this, our agent is dead in the water. We don't have retries, we don't have a great way to expose exactly what went wrong, and we have to have custom logging in place to track all of this stuff. Huge pain.
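To make the missing piece concrete, here is a minimal sketch of the retry scaffolding you would otherwise write by hand. `ProviderUnreachable` and `call_with_retries` are hypothetical names, and the exponential backoff schedule is an assumption rather than anything prescribed in the video.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ProviderUnreachable(Exception):
    """Hypothetical transient error raised when an external API is down."""


def call_with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return fn()
        except ProviderUnreachable:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

Only the transient error type is retried; any other exception propagates immediately, so a bug in a tool is not hidden behind repeated attempts. The `sleep` parameter is injectable so the wrapper can be tested without real delays.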
So, you could go ahead and try to build all of this yourself.
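As one slice of what that do-it-yourself scaffolding involves, here is a sketch of state management: each step's result is written to a checkpoint file, so a restarted run skips the steps that already finished. `run_with_checkpoints` and the JSON file layout are assumptions for illustration, and step results must be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    steps: list,  # list of (step_name, step_function) pairs, run in order
    checkpoint_path: Path,
) -> dict:
    """Run named steps in order, persisting each result after it completes."""
    # Load results from an earlier, interrupted run if a checkpoint exists.
    if checkpoint_path.exists():
        state = json.loads(checkpoint_path.read_text())
    else:
        state = {}

    for name, step in steps:
        if name in state:
            continue  # already finished in an earlier run
        # Each step receives the results of all previous steps.
        state[name] = step(state)
        # Persist immediately so a crash in a later step loses nothing.
        checkpoint_path.write_text(json.dumps(state))
    return state
```

If a later step raises, the exception propagates and the file keeps the completed results; calling `run_with_checkpoints` again with the same path resumes from the failed step. A real system would also need locking, replay, and log storage, which is where the complexity piles up.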
This is actually something I tried on a personal project I'm working on before I found out about Agent Span. It was a complete nightmare. So, here's how Agent Span works and how you can add it to your project right now. Agent Span is a runtime for your agents. It doesn't need to replace what you already have in place, like the OpenAI Agents SDK code we just wrote to declare our agents in the first place. So, now instead of your server actually managing all the agents, we start an Agent Span server which manages them all. And of course, you can host this server yourself. Then, that server actually manages the agents, handles the tool calls, and does everything for you.
So, now let's try running this through Agent Span, and we'll leave that same failure in place so we can see what this looks like. The first thing I'm going to show you is the actual Agent Span environment. When you have this running locally alongside your other servers or deployed somewhere, you can actually monitor all of your agents' tool calls to see exactly what they're doing and what's happening under the hood. As you can see, our research agent is now running, and we can click into this execution to see what's going on. It's using GPT-4o mini through the OpenAI Agents SDK under the hood, and you can see that on this find places call, there was actually some failure, as indicated by the two attempts. So, let's click into this and see what's happening.
Attempt one failed, so that's the same failure that we saw earlier with the OpenAI Agents SDK that halted the entire run. But this time, it was able to finish on the retry. Let's go back to Trip Span to see if this worked out for us.
It's still wrapping up the entire run, but as you can see, now our users don't see any failure, and we didn't have to write any of that custom scaffolding code to actually make this work.
So, just to round it out, here's exactly what this ends up looking like in a real production setting. Just like any other standard application, you have users who can come to your website or program via some UI. Let's just say they're coming in through Google Chrome. Those requests go to a load balancer which can send them to any number of your back-end servers. These servers handle everything that they've always handled: billing, payments, database integrations. All that fun normal business logic is still inside your servers. Your servers can interact with Agent Span servers, which you can host with Kubernetes or whatever you want, and those servers manage the runtime for your agents. If an agent breaks down and needs to be restarted, a tool call fails, or you need to wait a couple days for human approval before resuming an agent, all of that stuff is handled by Agent Span servers.
One other design pattern that I think is worth mentioning here is just building with Agent Span directly. Instead of wrapping something like the Anthropic SDK or the OpenAI Agents SDK with the Agent Span runtime, you can actually declare your agents directly with Agent Span, so you get that durability from day one, and you don't get locked into a vendor-specific SDK. Here's what that would look like in practice. As you can see, we're able to declare an agent directly using Agent Span syntax right here. We can tell it what model it has and what tools it has access to. The rest of the architecture is pretty much the same.
If you haven't tried to build your first AI agent or agentic project yet, I highly suggest you do it.
It's a lot easier than you think, and it's a great way to peel back what all these LLM providers and GPT wrapper products are actually building right now. And of course, I suggest you check out Agent Span as your runtime when you do it. I'm using this across two different projects right now, and it's actually been extremely helpful for debugging and making sure things work in a production setting. If you want to give it a try yourself, I've linked their GitHub in the description, and feel free to comment with any questions around projects you might want to build with agents or any other videos you want to see from me. And if you want to practice system design every day, check out my app and Discord server, which are also linked in the description down below.
Thanks for watching, and we'll see you guys next week.