
Agentic RAG in OpenAI Agents SDK with MCP. 3 of my favorite things in 1 video!
Added: 2026-05-14

898 views · 79 · 12:07 · Edward.Donner · Original Release: 2026-05-13

Donner masterfully demonstrates how MCP bridges the gap between OpenAI’s SDK and real-world data, turning agentic RAG into a practical, modular architecture. This is a concise blueprint for building AI that doesn't just process information but actively manages its own knowledge and tools.

[00:00:00]Well, I have a cheeky little video for you today that combines three of my favorite things: agentic RAG, OpenAI Agents SDK and MCP. You're not going to believe this. Check this out. I'm in a repo called expert. There's going to be a link in the description. You can clone it. You can follow the README instructions if you wish or you can just watch what I do right now. And I'm going to go to a Jupyter notebook called ingest.ipynb and this is where I'm going to show you all three things together in one. So, I'm going to start by running some imports and then I'm going to set up which model I'm going to use. Now, if you wish you can just use GPT-4 mini or any of OpenAI's models just by using this code here, but I'm going to use instead Cerebras, the platform for super high-speed inference so that the chat can be blazingly fast.

[00:00:47]So, I set my Cerebras API key and then I'm going through this LiteLLM model.

[00:00:52]That's the way I set up my model to use Cerebras with the OpenAI Agents SDK.
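A minimal sketch of the model setup described here, not the notebook's actual cell. It assumes the SDK's LiteLLM extension is installed (`openai-agents[litellm]`), that the key lives in a `CEREBRAS_API_KEY` environment variable, and that `cerebras/gpt-oss-120b` is the right LiteLLM route name; the `gpt-4.1-mini` fallback string and the `build_model` helper are also illustrative choices.

```python
import os


def build_model(use_cerebras: bool = True):
    """Return a value that can be passed as `model=` to an Agent."""
    if not use_cerebras:
        # A plain string selects an OpenAI-hosted model by name (assumed name).
        return "gpt-4.1-mini"

    # Imported lazily so the OpenAI-only path has no extra dependency.
    from agents.extensions.models.litellm_model import LitellmModel

    return LitellmModel(
        model="cerebras/gpt-oss-120b",  # assumed LiteLLM route for Cerebras
        api_key=os.environ["CEREBRAS_API_KEY"],  # assumed variable name
    )
```

Calling `build_model()` gives the Cerebras-backed model; `build_model(use_cerebras=False)` falls back to a model-name string.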

[00:00:58]And obviously I cover the OpenAI Agents SDK in more detail in my course, AI Engineer Agentic Track. And if you're new to it, then just get the gist here.

[00:01:07]But otherwise, if you know this well, you can create an agent just by saying Agent, give it a name, pass in the model. I'm using Cerebras. You could just use a string and be using an OpenAI model and then you say response is Runner.run. You pass in the agent, you pass in the user prompt, the message and you get back the final output and we get 2 + 2 is 4 in pretty quick time. Let's do that again. There we go. It's fast.
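The agent call described here could look roughly like the sketch below. The `ask` helper, the agent name and the default model string are assumptions for illustration; `Agent`, `Runner.run` and `final_output` come from the OpenAI Agents SDK.

```python
import asyncio


async def ask(question: str, model="gpt-4.1-mini") -> str:
    """Create a bare agent and run a single prompt through it."""
    from agents import Agent, Runner  # pip install openai-agents

    agent = Agent(name="Assistant", model=model)  # name is arbitrary
    result = await Runner.run(agent, question)
    return result.final_output


def main() -> None:
    # Outside a notebook, the coroutine needs an event loop.
    print(asyncio.run(ask("What is 2 + 2?")))
```

In a Jupyter notebook an event loop is already running, so `await ask("What is 2 + 2?")` can be used directly in a cell instead of `main()`.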

[00:01:32]Cerebras is fast. Okay, in the next cell I define a few constants, but a couple of them are really important. These first ones are not. I've just got a list of some websites that I want to look at for gathering some memories and then I've got this which just sets up a path and it's going to create a new directory called knowledge and then it's going to create potentially a vector database inside it. All right. And now on to the main event, these two constants right here, two sets of parameters. I want to equip my agent with tools. I want it to be able to retrieve information from the internet, and I want it to be able to find memories from that information, and store them locally in a vector database.

[00:02:11]And that sounds like a lot of coding, a lot of work. We've got to build all of that, and then we've got to describe it in natural language, and provide it as tools to our agent. Well, it's not a lot of work. It's very little work, because MCP does all of that for us. Someone else has built the tools, and we can reuse them thanks to MCP. Anthropic has built a tool called fetch, and these are the parameters that describe the fetch MCP server, Anthropic's MCP server for using the tool that they've built. And Qdrant, the open-source vector data store, that company has made an official MCP server called mcp-server-qdrant, and with these parameters, we can just take advantage of the work that they've done, and we can just take these tools and give them to our agent. So, after I run this cell, you can see that it's created a knowledge directory, but so far it's empty. Let's change that. Here are our instructions for our agent. Now, I'm making an agent that's going to be an expert about me and my courses to answer questions about my courses. You should make something that's an expert about something that you want to have a question answerer on, particularly if it involves searching for information online. So, look, the context I'm giving my agent, you've got expert knowledge about me with particular focus on my online courses, and then I'm saying, "Your job is to populate your memories with information retrieved from a website that you've been told about. Use your MCP tools to do this. Also, make sure that you check your memories so you don't add duplicates." Those seem like sensible instructions. And now we need to tell it about the MCP servers.
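The two parameter sets are plain dictionaries describing how to launch each MCP server as a subprocess. The sketch below is an assumption about their shape, not a copy of the notebook: it launches both servers with `uvx`, and the directory layout and collection name are hypothetical. `QDRANT_LOCAL_PATH` and `COLLECTION_NAME` are settings read by `mcp-server-qdrant`.

```python
from pathlib import Path

# Hypothetical location for the local vector database.
KNOWLEDGE_DIR = Path("knowledge")

# Fetch server: exposes a single tool that retrieves a URL.
fetch_params = {"command": "uvx", "args": ["mcp-server-fetch"]}

# Qdrant server: stores and finds memories in a local, file-based database.
qdrant_params = {
    "command": "uvx",
    "args": ["mcp-server-qdrant"],
    "env": {
        "QDRANT_LOCAL_PATH": str(KNOWLEDGE_DIR / "qdrant"),  # assumed subfolder
        "COLLECTION_NAME": "memories",  # assumed collection name
    },
}
```

Nothing is started at this point; the dictionaries are only handed to the SDK later, when the servers are opened.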

[00:03:49]And OpenAI Agents SDK makes it incredibly easy to use MCP servers. You just say with MCPServerStdio and then you pass in the parameters and optionally a timeout and then you can work with that MCP server. Let me ask it what tools it provides. This is the fetch MCP server and it provides only one tool and it's called fetch and it fetches a URL from the internet. And here is the other MCP server from Qdrant, the vector data store, and it provides two tools called qdrant-find and qdrant-store, which are to look up memories and to keep memories just as it sounds. So, there we have it, two MCP servers, a total of three tools between them. We now want to equip our agent with those tools. Okay, in this next cell is where everything comes together. We want to use both MCP servers and that's what these first two lines do. Then we want to create a new agent. We give it a name. We give it a model, Cerebras. We give it the instructions we defined above about storing memories that it finds online.
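Asking a server which tools it provides might be sketched as follows. The `list_tool_names` helper and the 30-second default are assumptions; `MCPServerStdio`, its `client_session_timeout_seconds` argument and `list_tools()` are part of the Agents SDK.

```python
import asyncio


async def list_tool_names(params: dict, timeout_seconds: float = 30) -> list:
    """Start an MCP server over stdio and return the names of its tools."""
    from agents.mcp import MCPServerStdio  # pip install openai-agents

    async with MCPServerStdio(
        params=params, client_session_timeout_seconds=timeout_seconds
    ) as server:
        tools = await server.list_tools()
    return [tool.name for tool in tools]
```

For example, `await list_tool_names({"command": "uvx", "args": ["mcp-server-fetch"]})` in a notebook cell should return the single fetch tool mentioned above, assuming `uvx` is available on the machine.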

[00:04:53]We tell it both of the MCP servers that it's using, the fetch one and the Qdrant one, and by doing that we are equipping it with those tools.

[00:05:03]And in the next line I define the task, add unique memories with information from this website URL, and that URL has a yellow squiggly cuz I haven't defined it yet, but I will do it in just a second, you'll see. And now we say Runner.run, we tell it an agent, we give it the task, and we give it a max turns of 50 because we want to give it plenty of flexibility to use its MCP tools. It needs to look things up on the internet, it needs to use its tools to see what's already in its memory, it needs to add things to the memory and it needs to do a fine job of it. So, plenty of turns there and then we will display the final output.

[00:05:40]So, it only remains for me to tell it what URL to do. Well, you may remember above I have like a list of three URLs.

[00:05:46]So, I'm going to say for URL in URLs.

[00:05:50]And I actually want it to do each URL three times. So, I just do times three, and that's going to make that be three sets of URLs. And then, I'm just going to press tab to move all of this along, and there, that is all I need. When I run this, it is going to set off my agent to retrieve things from the internet, and then to look for memories, and to add any new memories into our vector data store. We hope to gather tons of memories and add them thanks to our MCP servers.
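Putting the pieces together, the ingestion loop could be sketched like this. It is an approximation under assumptions: the task wording, the agent name, the 60-second timeout and the helper names are illustrative, and unlike the notebook (which indents the whole cell under the loop) this version opens both servers once and reuses them for every URL.

```python
import asyncio


def build_tasks(urls: list, passes: int = 3) -> list:
    """One task per URL, repeated `passes` times (list repetition, as in `urls * 3`)."""
    return [
        f"Add unique memories with information from this website: {url}"
        for url in urls * passes
    ]


async def ingest(urls, model, instructions, fetch_params, qdrant_params) -> None:
    """Run the memory-gathering agent over every task with both MCP servers attached."""
    from agents import Agent, Runner
    from agents.mcp import MCPServerStdio

    async with MCPServerStdio(
        params=fetch_params, client_session_timeout_seconds=60
    ) as fetch:
        async with MCPServerStdio(
            params=qdrant_params, client_session_timeout_seconds=60
        ) as qdrant:
            agent = Agent(
                name="Ingester",  # hypothetical name
                model=model,
                instructions=instructions,
                mcp_servers=[fetch, qdrant],  # this is what equips the tools
            )
            for task in build_tasks(urls):
                # Generous turn budget so the agent can fetch, check and store.
                result = await Runner.run(agent, task, max_turns=50)
                print(result.final_output)
```

With three URLs and three passes, `build_tasks` yields nine agent runs.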

[00:06:20]And so, I'm kicking this off right now, and it's off. And even though Cerebras is quite fast, the searching on the internet and the Qdrant lookups are going to take a while. So, I'm going to be off to get a coffee, and I will see you when it completes in a second.

[00:06:34]Well, it's 3 minutes later for me. There was no time to get a coffee. It was pretty quick. And if I look in the knowledge folder, there is indeed a vector database there. And now, let's see if it's been populated. Here is some Qdrant code to look into that and see what's there. And wow, yes, there's tons of information. There are in fact 90 different records in here, and that makes sense because we iterated three times through three websites and asked for 10 memories for each. So, that does indeed come to 90. So, there's 90 bits of information stored in our vector data store with a vector key.
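The record count can be checked both by arithmetic and by asking Qdrant. The sketch below is not the notebook's inspection code; the helper names are hypothetical, and the path and collection name must match whatever was passed to the MCP server. It uses `qdrant-client` in local (file-based) mode.

```python
def expected_records(passes: int, sites: int, memories_per_run: int) -> int:
    """3 passes x 3 websites x 10 memories per run = 90 records."""
    return passes * sites * memories_per_run


def count_memories(path: str, collection: str) -> int:
    """Open the local Qdrant database and count the stored points."""
    from qdrant_client import QdrantClient  # pip install qdrant-client

    client = QdrantClient(path=path)  # local mode, no server needed
    try:
        return client.count(collection_name=collection, exact=True).count
    finally:
        client.close()
```

Local mode generally allows only one process to open the folder at a time, so a check like this is best run after the ingestion servers have shut down.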

[00:07:11]And I'm sure the RAG people are thinking, "What embedding model did this use?" And the answer is by default, the Qdrant MCP server uses the famous all-MiniLM-L6-v2 from sentence transformers that is extremely fast and capable. That's how it created vectors against each of these that are now put in the vector data store. And they will be used as the index to retrieve any of these chunks of data.
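To make "vectors used as the index" concrete, here is a toy sketch of similarity lookup in plain Python. It is only an illustration: the three-dimensional vectors and texts are made up (all-MiniLM-L6-v2 produces 384-dimensional vectors), and cosine similarity is assumed as the distance measure, which is a common choice for this model.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_match(query: list, store: dict) -> str:
    """Return the stored text whose vector is closest to the query vector."""
    return max(store, key=lambda text: cosine(query, store[text]))


# Made-up vectors standing in for real embeddings.
store = {
    "course on agents": [0.9, 0.1, 0.0],
    "course on RAG": [0.1, 0.9, 0.0],
}
best = top_match([0.8, 0.2, 0.1], store)
```

Here `best` is the agents entry, because the query vector points in nearly the same direction as that entry's vector.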

[00:07:37]Okay, now it's time for real agentic RAG. We've got a vector data store populated with 90 memories. It's time for our agent to be able to be an expert about me and my courses. And if you've been following along, hopefully you've picked some other area of expertise to quiz your agent on. So, now expert instructions. You're an expert about me and my online courses. You're answering questions about the courses. Okay?

[00:08:05]Those are the instructions. And we're now going to use sessions with OpenAI Agents SDK, a way to be able to store the conversation history. It's so easy. You just do SQLiteSession and give it a unique name like this one, test conversation. And now we are going to have our chat function. We're going to have a Gradio UI, and it's going to call the chat function every time we want to call the AI, our agent, to get a response. And now this is the punchline. This is agentic RAG with MCP written in a chat callback function in four lines of code.

[00:08:40]We do a with MCPServerStdio. We pass in the parameters for our MCP server, the Qdrant vector store, and then we call Agent. We give it a name, expert.

[00:08:53]We give it the model, the Cerebras model, the expert instructions, and the MCP servers that we're supplying there.

[00:09:00]The MCP server is our memory, our Qdrant vector data store. And then we call Runner.run. We pass in the agent. We use the message from the user as our user prompt, and we give it a session, this SQLiteSession, to keep the conversation history, and then we return whatever comes back from the agent.
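The chat callback described above could be sketched as follows. This is an approximation, not the notebook's code: the `make_chat` and `build_ui` helpers, the instruction text, the session name and the timeout are assumptions. `SQLiteSession`, the `session=` argument of `Runner.run`, and `gr.ChatInterface` are real APIs.

```python
EXPERT_INSTRUCTIONS = (
    "You are an expert about the author and their online courses. "
    "Answer questions about the courses, using your memory tools to look things up."
)  # paraphrased, not the notebook's exact wording


def make_chat(model, qdrant_params: dict, session_id: str = "test_conversation"):
    """Build a Gradio-style callback backed by an agent with Qdrant memory."""
    from agents import Agent, Runner, SQLiteSession
    from agents.mcp import MCPServerStdio

    session = SQLiteSession(session_id)  # conversation history (short-term memory)

    async def chat(message: str, history: list) -> str:
        async with MCPServerStdio(
            params=qdrant_params, client_session_timeout_seconds=30
        ) as memory:
            agent = Agent(
                name="Expert",
                model=model,
                instructions=EXPERT_INSTRUCTIONS,
                mcp_servers=[memory],  # vector store tools (long-term memory)
            )
            result = await Runner.run(agent, message, session=session)
            return result.final_output

    return chat


def build_ui(chat_fn):
    """Wrap the callback in a Gradio chat interface."""
    import gradio as gr  # pip install gradio

    return gr.ChatInterface(fn=chat_fn, title="Expert")
```

A possible use is `build_ui(make_chat(model, qdrant_params)).launch()`. The `history` argument that Gradio passes in is ignored here, because the session already keeps track of the conversation.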

[00:09:21]And it's worth mentioning that when people use a session like this to store conversation history, it's sometimes called short-term memory. And when you use tools with a vector store to keep track of memories, that's sometimes called long-term memory. So, we actually have short-term and long-term memory associated with our agent here. And it only remains for me to bring up the Gradio user interface. We use gr.ChatInterface. We give it the callback function chat. And also, I wanted to add some CSS and JavaScript to spice it up and make it look pretty, but I can't do that cuz I'm hopeless at that stuff.

[00:09:55]Luckily, I know someone who's great.

[00:09:57]Claude Code whipped it up in a flash and it looks sharp. Here is our user interface. Here is agentic RAG. It took us about 10 minutes to get here. And I've been rather selfish and made this whole thing about me and my website and an expert about me, but hopefully you've made this about you or about something else that interests you or something to do with your work, some other field that it has researched and added to its memories. All right, let's see if it can answer some questions.

[00:10:23]And we'll ask it which course covers MCP? Tell us, Agentic. And there we go, it gets it right. AI Engineer Agentic Track. And remember, this is the open-source model gpt-oss-120b running on Cerebras. So it's an open-source cheap model and it's nice that it's got this expertise.

[00:10:42]And let's ask it, and which course covers RAG? Let's see what it can do with this.

[00:10:50]Uh there we go. AI Engineer Core Track.

[00:10:52]It's absolutely correct. It does appear to know about the courses. And we'll just ask it what are Ed's interests?

[00:11:03]To see if it has general knowledge from those websites. Off it goes. Bam! Lots of stuff about me and I hope you spotted how fast it was to reply and plenty of information there retrieved from its memory.

[00:11:16]Now, I hope you've educated this on yourself or maybe you've taken some other source of information. And what you may find is that it's not perfect.

[00:11:23]It's a bit janky. It answers many questions, but it does get stuck from time to time. And that's because we're only just getting started. I have plans to turn this into something that's a real expert, and adding the short-term and long-term memory is just the starting point, but I did want to show you how, thanks to MCP servers, we could get this kind of capability and we could just do it all in like 10 minutes. And that's my quick video that brings together my three favorite things and just to spice it up, I also threw Cerebras and short-term memory into the mix. And if you like this and if you want more agentic RAG content, then you know what to do, like and subscribe and I will get on the case. I will see you soon for another video.

#AI #Agent #Agentic #Agentic RAG #RAG
