This session provides a crucial architectural blueprint for moving AI agents from experimental hacks to enterprise-grade systems through standardized tool integration. It is a must-watch for developers looking to leverage the Model Context Protocol within the robust Spring ecosystem.
Deep Dive
Beyond local tools: Deep dive into MCP with Spring AI by James Ward / Maximilian Schellhorn
All right, welcome everyone. Thanks for being here.
I'm James Ward. I work at AWS on agentic things, including being the Amazon representative to the Agentic AI Foundation Technical Committee, where MCP lives, so I get to work on the standards side of things. So, happy to be here to learn more about MCP.
Yeah, and I'm Max, Maximilian Schellhorn. I'm a principal solutions architect working with AWS.
And yeah, James and I, we want to take you a bit on a journey today through the MCP landscape. So we will start off with looking at the protocol, see how we can write MCP servers, we look at tools, resources, prompts, elicitation, sampling, and also new things such as MCP apps. And we will then see how do we make them enterprise ready? So how do we scale MCP servers? How do we secure them? And last but not least, we will talk a little bit about context efficient MCP. There's a lot of discussion also on Twitter around MCP is dead and why not just use CLIs and all that kind of stuff. So we want to shed some light on it on how we write context efficient MCP and where they make a lot of sense.
So with that, let's dive right into it.
So why are we here, right? I think most of you have interacted with some form of LLM in the last couple of years and they have also evolved quite a bit, right? So they have reasoning capabilities. But as you know, they are still stateless in a sense, right? So they do not really retain information, they cannot call tools on their own, they have a certain cutoff date. So with the LLM itself, you cannot really do a lot of things. So that's why you equip it with additional things such as memory, like keeping track of your conversation or adding additional context and steering files and markdown files, so that they have additional knowledge on top of what the LLM already knows.
And we can further then add tools to it, right? To retrieve additional information on demand and act on certain things. And I want to clear a little bit the fuss around that and that's basically the three components that you need to have an actual agent, not more, not less. So those I would say are the three core ingredients to to build an actual agent. And with that now we are able to act and observe our environment.
Right? So a lot of AWS customers are building and deploying agents as well, and this is why we are here today.
We wanted to add a quick recap to all have the same baseline to understand where does MCP come from and why it's relevant. So we have a simple introduction to tool calling. Bear with us for a second if you already know that, we go through it rather quickly.
So I said the LLM itself cannot do things. So if I ask an LLM about the current weather, by design it cannot get it, right? So what do we do is if I ask what is the current weather in Berlin for example, what I do is in addition to my prompt, I send a list of tools that the LLM could use. So in that case, I have a get weather tool and a description when to use that.
So the LLM will figure out I obviously cannot get the current weather. That's why let me use a tool for this out of the list that the user or the app initially provided. So it will say please use that tool. And now the very important thing is that now the application actually is making the call.
That can be a local Java function call that you have in your code that is called get weather for example. And then we will return the result of that function back to the LLM and the funny thing is for the LLM now it's really just a summarization task essentially saying, okay, I got the result of that tool, I have the initial prompt. So therefore let me return the weather in Berlin is currently sunny. And this is like the entire thing of how tool calling works and how we are able to get real time and additional data into our context.
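The round trip described above can be sketched in plain Java. This is a minimal illustration, not the speakers' code: there is no real LLM here, and `modelPicksTool`, `TOOLS`, and the hard-coded "sunny" result are hypothetical stand-ins. The point it shows is that the application, not the model, executes the tool.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the application side of tool calling; no real LLM is involved.
class ToolCallingSketch {

    // What the model sends back when it wants a tool run: a tool name plus one argument.
    record ToolRequest(String toolName, String argument) {}

    // The tools the application offers, keyed by the name advertised to the model.
    static final Map<String, Function<String, String>> TOOLS =
            Map.of("getWeather", city -> "sunny"); // stand-in for a real weather lookup

    // Stand-in for the model's first turn: it cannot know the weather, so it asks for a tool.
    static ToolRequest modelPicksTool(String prompt) {
        String city = prompt.contains("Berlin") ? "Berlin" : "unknown";
        return new ToolRequest("getWeather", city);
    }

    // The application executes the requested tool and hands the result back.
    static String answer(String prompt) {
        ToolRequest request = modelPicksTool(prompt);
        String toolResult = TOOLS.get(request.toolName()).apply(request.argument());
        // Stand-in for the model's second turn: combine the prompt and the tool result.
        return "The weather in " + request.argument() + " is currently " + toolResult + ".";
    }

    public static void main(String[] args) {
        System.out.println(answer("What is the current weather in Berlin?"));
    }
}
```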
So that being said, if we would write this get weather function, we can just write it as a Java function, right? So we would write a get weather implementation. But if we want to get maybe customers from our CRM, we write a get customers, we would write get tasks, get friends and so on and so forth. So the problem is that you as a developer would be responsible for writing all those different functions in your code base in every application that you want to build that has this capability, and the vendors who provide those APIs would rely on you, the app builder, to actually integrate with their APIs in an agentic way.
So MCP now is essentially standardizing this, right? So instead of you manually writing those functions, we have a standardized MCP client and a standardized server interface, and therefore whenever the application has an MCP client, it can connect to the MCP server out of the box, right? So this is the high-level explanation. Under the hood, it's rather simple. It's actually just a JSON-RPC call. So this is the standardization that both the client and the server agree upon. For example, with a method we define what we want to call, like tools/call with the name get weather and certain arguments, right? So this is the whole standardization, and that allows us now to connect any client, any app, to any server.
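To make the wire format concrete, here is a small sketch that builds the kind of JSON-RPC 2.0 request described above. The tool name `get_weather` and the `city` argument are illustrative assumptions, and the string is assembled by hand only to show its shape.

```java
// Sketch of the JSON-RPC 2.0 request behind a tool call.
// "get_weather" and "city" are illustrative names, not taken from a real server.
class ToolCallMessage {

    // Built by hand to show the wire shape; a real client would use a JSON library
    // so that values are escaped properly.
    static String toolsCall(int id, String toolName, String argName, String argValue) {
        return """
                {"jsonrpc":"2.0","id":%d,"method":"tools/call",\
                "params":{"name":"%s","arguments":{"%s":"%s"}}}"""
                .formatted(id, toolName, argName, argValue);
    }

    public static void main(String[] args) {
        System.out.println(toolsCall(1, "get_weather", "city", "Berlin"));
    }
}
```

Any client that can produce this message and any server that can parse it can talk to each other, which is the whole point of the standardization.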
There's a bit more than tool calling.
For that we will talk about resources and prompts and there's elicitation and sampling. But I think it would be better to look into code how we actually write it with Spring AI and how can you get started with your first MCP server?
James. Yeah, so let's dive into looking at the code with Spring AI. So first let me go over to IntelliJ here. So I have a dependency on the Spring AI Starter MCP Server WebFlux library. That, underneath the covers, depends on another library, which is the MCP Java SDK. So the Spring AI team created the first Java SDK for MCP, which many people now use, and then they wrapped that with some additional Spring stuff. So we're at the Spring I/O conference, so we're going to be using Spring. You could of course use that MCP SDK outside of Spring, but why would you?
You know, just use it with Spring. So great, so we've we've got our dependency there. Now let's go dive into some code.
So I'm going to pull up my Spring Boot application and then let's just go start looking at some examples. So the first one that we're going to look at is the most simple thing that we could possibly do with MCP. We're going to define an MCP tool and we give it a description.
That description is important because that description is what the LLM is then going to use to decide when it should actually use this tool call. And so then this particular one is just going to add two numbers together. So great, so my server's up and running. We're going to experience this in a few different places. Let's start with one, which is an AI coding assistant called Kiro. This is one from AWS and it's a good one. I use it a lot and it does well for me.
I've set up Kiro to be connected to that Spring AI MCP demo server. This is a remote server and we'll talk about different transport options in a little bit. We can also look at the tools here and, sure enough, we see that add tool that was defined there. So let's actually give this a try and see if the LLM does it right. I'm going to say add six and seven. And what I'm hoping will happen is the LLM will say, all right, I've got a tool that can do addition.
I can now take the thing that the user said and I can decide to call that MCP tool with those parameters that have been parsed out. So sure enough, looks like the LLM did figure out that we have a tool to help it do this, and now let's hit yes. You need to approve initially. You can approve kind of forever or you can approve just once.
And great, now we see that my tool was called, my MCP tool was called. It returned the number. That went back to the LLM, which did that summarization of the initial prompt and the result that came back. So that last line that we see there where it's 6 + 7 = 13, that's coming from the LLM but we know that that 13 actually came from my tool call.
It wasn't just hallucinating. Okay, so that's our quick example of that. We can also do this in another context, which is IntelliJ, and the Kiro IDE and many other coding IDEs now support connecting to MCP servers as well. And so in this case, I've configured Kiro again to point to my MCP server, and then we should be able to come in here and ask it a similar thing. Let's add six and seven, and then we should see a kind of similar experience, but this time in the IDE. And the nice thing about MCP is that it's standardized. We can use this with any of our coding assistants. So if you're using something else, great, you can use the same MCP servers. And let's see, yes, I do want to approve that. Oh no, what is this asking me?
I don't know.
"Ask"? Oh, I mistyped it. That should be "add six and seven". See, the LLM is smarter than I am, that's for sure.
Okay, so now it should find the right tool.
Thanks for also verifying we're using Opus 4.6 for adding two numbers, right?
Right. Opus may know how to do it. What is the input required? What is it saying?
I don't know. Well, let's see, it should work.
I really confused it there with ask six and seven. It's like, what does that even mean? I think you get the idea. We can use these MCP servers, they're standardized across many different ways, and I'm sure many of you have used your MCP servers with your AI coding tools. That's great. So, we can also use these MCP servers for our own agents. And if you saw the talk Josh and I gave this morning, we went into that side of things, where if you want to build your own AI agent, whether that's for coding or for your business use cases, building your back-end Java systems that have agents integrated into them, then you can use MCP as an integration technology for those as well. And so, looks like that really confused that thing. That's okay. Let's go back to the web browser. There's this tool that we can use when we're building MCP servers to inspect the MCP server and see what's actually going on with it. So, I'm going to connect to my Spring AI MCP server, and then we see a bunch of things here.
But the first thing I want to show you is that there's a life cycle to the MCP protocol. So, the first message that actually gets sent from the client, in this case the inspector, to the MCP server is initialize. And when we do the initialization, then we can see some different metadata about the capabilities of my particular MCP server. And so, that's the first step in the protocol. Then, the client usually will ask for the list of tools that are available. So, if you're building an agent with Spring AI, by default it'll do that list tools for you automatically. But here in the inspector, we can go do it manually. So, we can go over to the tools tab, hit list tools, and then I can see the add here, and we should be able to now invoke that thing with our parameters and hit run tool, and then we see the response. So, in the inspector, there's no LLM. This is just like a web UI for my MCP server. And then we can see the actual protocol for listing the tools and the tool calls, and that's what the insides of the JSON-RPC messages between the client and server are. So, I'm going to walk through a few other examples covering some of the other areas that MCP covers, but anything to add so far? You're doing great, James.
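The order of messages in that life cycle (initialize, then tools/list, then tools/call) can be sketched with a tiny in-memory dispatcher. This is a hypothetical stand-in, not the Spring AI or MCP Java SDK API: the result maps are heavily simplified, and `demo-server`, `a`, and `b` are assumed names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an MCP server; it only mirrors the order of the messages.
class LifecycleSketch {

    // Handles a request by method name and returns a simplified result object.
    static Map<String, Object> handle(String method, Map<String, Object> params) {
        return switch (method) {
            // Step 1: the client initializes and learns the server's capabilities.
            case "initialize" -> Map.of(
                    "serverInfo", Map.of("name", "demo-server"), // illustrative name
                    "capabilities", Map.of("tools", Map.of()));
            // Step 2: the client asks which tools exist; the description guides the LLM.
            case "tools/list" -> Map.of("tools", List.of(
                    Map.of("name", "add", "description", "Adds two numbers")));
            // Step 3: the client invokes a tool with arguments.
            case "tools/call" -> {
                @SuppressWarnings("unchecked")
                Map<String, Integer> arguments = (Map<String, Integer>) params.get("arguments");
                int sum = arguments.get("a") + arguments.get("b");
                yield Map.of("content", List.of(Map.of("type", "text", "text", String.valueOf(sum))));
            }
            default -> throw new IllegalArgumentException("Unknown method: " + method);
        };
    }

    public static void main(String[] args) {
        System.out.println(handle("initialize", Map.of()));
        System.out.println(handle("tools/list", Map.of()));
        System.out.println(handle("tools/call",
                Map.of("name", "add", "arguments", Map.of("a", 6, "b", 7))));
    }
}
```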
Okay, great. All right, so let's go on to the next example here, where we want to get a little more advanced in how we define our MCP tools. So, in this case I've got a record that is the multiply result. So, I want to actually get a structured thing out of this tool call instead of just untyped primitives. I want actual schemas, I want things to be typed. And so, now we can set some additional parameters on our MCP tool annotation.
So, we've got our name, we've got our description, then we've got this other one, which is generateOutputSchema = true, and that's going to look at the type that is coming out of this function and then include in the tool definition the actual schema of not just the input parameters, those are always included, but also the schema for the output type. And then you could use that. Let's say you've got your agent that you're building, and you want to actually get typed objects out of those MCP tool calls, then this would be the way to have that metadata so you could do that. And then there's also some annotations that we can provide.
These can be indicated to the user, or your agent can use these annotations to handle things differently. So, we can add annotations, but then here's my multiply. I've got the parameters, I've got descriptions on my parameters, I've got whether they're required, and then I'm going to return this multiply result. So, let's go give that one a try here in the MCP inspector. So, that's multiply, and let's multiply 6 * 3, and oh, we can see here the actual... So, we know that when we did that list tools, it included the output schema.
So, we see that now rendered here in the web browser. That's just indicating that sure enough we get that schema as we expected. But now, let's go run this tool and see that now we should get a typed object, and it renders it both in a structured content and unstructured content, which looks the exact same for this particular case, but it does validate that what we actually got back conformed to the schema.
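The idea of deriving an output schema from the return type can be illustrated with plain reflection over a record. This is only a sketch of the concept under assumptions: the field names of `MultiplyResult` are guessed, the "schema" is a simple name-to-type map rather than full JSON Schema, and none of this is how Spring AI implements it.

```java
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;

// Concept sketch: derive a minimal output schema from a typed tool result.
class OutputSchemaSketch {

    // Typed tool result in the spirit of the multiply example (field names are assumed).
    record MultiplyResult(int a, int b, int product) {}

    // Maps a few Java types to JSON Schema type names; everything else is treated as an object.
    static String jsonType(Class<?> type) {
        if (type == int.class || type == long.class || type == Integer.class || type == Long.class) {
            return "integer";
        }
        if (type == double.class || type == Double.class) return "number";
        if (type == boolean.class || type == Boolean.class) return "boolean";
        if (type == String.class) return "string";
        return "object";
    }

    // Property name -> JSON type, in the declaration order of the record components.
    static Map<String, String> schemaOf(Class<? extends Record> recordType) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (RecordComponent component : recordType.getRecordComponents()) {
            properties.put(component.getName(), jsonType(component.getType()));
        }
        return properties;
    }

    static MultiplyResult multiply(int a, int b) {
        return new MultiplyResult(a, b, a * b);
    }

    public static void main(String[] args) {
        System.out.println(schemaOf(MultiplyResult.class));
        System.out.println(multiply(6, 3));
    }
}
```

A client that receives such a schema alongside the tool definition can validate the structured result or map it onto its own typed object.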
So, output schema's another great part of tool calling. Then we can start to do some additional things with MCP. One of those is the ability to do logging, and to do this we need to pass in this McpSyncRequestContext, and this allows us now to call an info log.
And so, this will now do a log from the server to the client. And so, this is the opportunity for your agent to start to collect logging information and potentially log it in your logging system, whatever. And so, let's just go see what that looks like in MCP inspector. Subtract is the one here. I didn't build this UI, so don't blame me, but it's a little weird. Got to scroll down there, and then we should get subtract, and we see that notification pop up down here in the bottom right.
But obviously, if you're building your own AI agent or you're in an AI code assistant, it's going to decide what it wants to do with those log messages.
Okay, so that's logging. A variation on logging is the ability to do progress updates. So, in the divide one, what I'm doing is I'm again injecting that context in, and then, if I have a long-running task, I can provide progress updates. And the idea of this one is that when you're in your AI code assistant or using an agent, you can actually see a progress bar for a given task. So, pretty easy to do that. I won't run that one, we'll keep moving, but I think you get the idea.
There is a way in Spring AI to use the reactive, Reactor-based ways to have your tool calls return Mono or Flux. That is a mode that you have to enable, and so by default I'm in synchronous mode, not asynchronous mode, and so that particular call won't work unless I flip the server over to be in asynchronous mode. But you certainly can do Monos and Fluxes.
Okay, on to the next one, which is a more advanced and pretty new feature in MCP, which is called elicitations.
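A rough sketch of the progress idea, assuming a plain callback in place of the injected request context: the tool reports a (progress, total) pair after each slice of work, and the client decides how to display it. The `slowDivide` name and the step count are hypothetical.

```java
import java.util.function.BiConsumer;

// Concept sketch: a long-running tool that reports progress through a callback.
// The callback stands in for sending a progress notification to the MCP client.
class ProgressSketch {

    static double slowDivide(double a, double b, int steps, BiConsumer<Integer, Integer> reportProgress) {
        for (int step = 1; step <= steps; step++) {
            // A real tool would do one slice of its long-running work here.
            reportProgress.accept(step, steps); // progress so far, total expected
        }
        return a / b;
    }

    public static void main(String[] args) {
        double result = slowDivide(10, 4, 4,
                (done, total) -> System.out.println("progress " + done + "/" + total));
        System.out.println(result);
    }
}
```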
So, the point of elicitations is if something is a required parameter to call the tool, then you're going to encode that in the input parameters. But if something is a derived thing that I sometimes need to get some additional data from the user, uh then we're going to use an elicitation to get that data.
So, the example that I like to use with this is: let's say that you've got a flight search MCP tool, you've got your agent that's going to be able to search flights, and the user has a user profile. You've used Spring Security to know who the user is, but let's say that it was optional in your system to select a preferred airline. And so, sometimes the user is not going to have a preferred airline. Well, when you come into this function call to search for flights, maybe you want to look at whether they have set a preferred airline, and if they have not, then you want to elicit from them what their preferred airline is, and then pass it in and continue your tool call.
And so, elicitations are just part of the standard MCP specification, and of course implemented in Spring AI. And so, the way that we define this, and I've kind of created a weird way to be able to trigger the elicitation. The whole point of an elicitation is it's something that sometimes needs to be provided by the user. If you need it all the time, put it into the parameters.
And so, the way that I've simulated "sometimes" is just to do a random Boolean, and if it's true, then I can check to make sure that elicitations are enabled, and then we can elicit a random number from the user, and then when the user has accepted and given us that number, then we set it, and then we can continue, which says, "Hey, if we got the random number, just return it, otherwise return an actual random number." So, kind of a little bit of a convoluted example, but let me run this one so you can see how this actually looks. So, this is our tool called random, and let me get to it there. Run. Okay, so sometimes this is going to do an elicitation like it just did. And so, now I need to provide it.
So, this would be like providing my preferred airline, and then it then continues with that tool call with the number that I provided, and obviously you could do whatever you want with that value. You could then store that in their user profile, uh whatever you want to do there, um but then continues with the the tool call.
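The flight-search scenario can be sketched as follows, assuming a plain function in place of the real round trip to the client. The MCP specification defines accept, decline, and cancel as the possible user actions; everything else here (`searchFlights`, `ElicitResult`, the airline name "ExampleAir") is hypothetical.

```java
import java.util.Optional;
import java.util.function.Function;

// Concept sketch of elicitation: ask the user only when the value is missing.
class ElicitationSketch {

    // The three user actions defined for elicitation responses.
    enum Action { ACCEPT, DECLINE, CANCEL }

    record ElicitResult(Action action, String value) {}

    // storedPreference is what the user profile holds; elicit stands in for the
    // server-to-client request that prompts the user for input.
    static String searchFlights(String destination, Optional<String> storedPreference,
                                Function<String, ElicitResult> elicit) {
        String airline = storedPreference.orElseGet(() -> {
            ElicitResult answer = elicit.apply("Which airline do you prefer?");
            // Fall back gracefully if the user declines or cancels.
            return answer.action() == Action.ACCEPT ? answer.value() : "any airline";
        });
        return "Flights to " + destination + " on " + airline;
    }

    public static void main(String[] args) {
        System.out.println(searchFlights("Barcelona", Optional.empty(),
                question -> new ElicitResult(Action.ACCEPT, "ExampleAir")));
    }
}
```

When the profile already has a preference, the elicitation function is never called, which matches the rule of thumb from the talk: values needed every time belong in the tool parameters, and values needed only sometimes are elicited.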
Uh okay. Is that clear?
Absolutely fantastic, James.
A bit of an advanced one. Okay, now on to an even more complicated, advanced one, which is called sampling. Sampling is this weird thing that you can do in MCP, where you've got your MCP server, and your MCP server may or may not have access to an LLM. If it does, great, you can have it call the LLM and do things. But what if you want your MCP server that's not connected to an LLM to actually use the host, the thing that's connected to the MCP server, and use that LLM there? And so, sampling allows kind of this reverse thing, where the MCP server calls back to the client, the agent side, and then can use the LLM on that side. And so, I'll show you what this looks like in MCP inspector so you can kind of get an idea for the flow here, but this is our loud joke. And when I run this, it's going to do a sampling request.
Normally, this would be handled on your agent side, and then you would decide, "Okay, great, I got a sampling request from the server. Now, let me give that to the LLM." And then it's going to return back to the MCP server the result of that sampling. So, this is the flow.
Let's run it. We can see here's the sampling request. I don't have access to an LLM in the MCP inspector, but I can provide the text, you know, I just simulated what an LLM would say, and then in this particular code, it upper-cased what we got back from the LLM and responded with that. So, that's a pretty simple API. We should check in that context to make sure that sampling is enabled and have some backup plan if it doesn't work. But, so that's sampling. Is that clear?
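A sketch of the sampling flow, with a plain function standing in for the client's LLM. An empty Optional models a client that does not support sampling, which is where the backup plan mentioned above comes in. The names and the fallback text are hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

// Concept sketch of sampling: the server borrows the client's LLM, then post-processes the text.
class SamplingSketch {

    // clientLlm stands in for the sampling round trip back to the agent side;
    // Optional.empty() means the client did not advertise sampling support.
    static String loudJoke(Optional<Function<String, String>> clientLlm) {
        String joke = clientLlm
                .map(llm -> llm.apply("Tell me a short joke."))
                .orElse("No joke available without sampling."); // backup plan
        return joke.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Function<String, String> fakeLlm = prompt -> "Why did the bean cross the road?";
        System.out.println(loudJoke(Optional.of(fakeLlm)));
        System.out.println(loudJoke(Optional.empty()));
    }
}
```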
Got it. Clear. Great. Okay, moving on to some other parts of the MCP specification and the way that we can use it. So, tool calls are kind of our general-purpose way to do all these integrations over MCP. Then we also have a few other things. One is resources. Resources allow us to represent files, potentially database records, that sort of thing. And these can be either kind of static or dynamic.
So, on the static side, let's list our resources. We see that I have a server info resource, and then I can get, you know, mime type and get the contents of that static resource. If you've used Claude.ai, these actually get used by MCP servers where you can actually pop resources from an MCP server in as files into Claude by just navigating through the UI. Pretty nice there.
And then I can also do resource templates. So, these are where the path is dynamic, and I can dynamically figure out based on the the path that's provided what I want to return. And so, in this case, I'm using a key, which actually, let's go look at the code real quick so we can make some sense of this.
So, this is my static one. No parameterization on that URI. And then we've got the dynamic one, which takes a key parameter, and then we're going to do something different based on what's provided in that key.
Again, descriptions so that the LLM knows when it may want to pull these resources in. But, let's go see our list here. We've got a database URL, for example. We can read that resource and then get it back. So, you can use this for text like we're seeing here. You can use it for images, whatever kind of resources that you want.
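The difference between a static resource and a resource template comes down to whether the URI carries a parameter. Here is a small sketch of the dynamic case, assuming a made-up `config://{key}` template and a made-up backing map; the real demo's URIs and values are not shown in the talk.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Concept sketch of a resource template: the {key} part of the URI selects the content.
class ResourceTemplateSketch {

    // Hypothetical template "config://{key}", expressed as a regex with one capture group.
    static final Pattern TEMPLATE = Pattern.compile("config://([^/]+)");

    // Hypothetical backing data; this could just as well be a database or object storage.
    static final Map<String, String> VALUES =
            Map.of("database-url", "jdbc:postgresql://localhost/demo");

    // Returns the resource content, or empty if the URI does not match or the key is unknown.
    static Optional<String> read(String uri) {
        Matcher matcher = TEMPLATE.matcher(uri);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        return Optional.ofNullable(VALUES.get(matcher.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(read("config://database-url"));
        System.out.println(read("config://missing"));
    }
}
```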
Anything to add? No, that's great. Maybe, yeah, you see the completions here. So, when James typed, right, the client actually sends a completion request, and the server can send back a list of all the available resources. You can think about also putting resources on S3, right, and dynamically, when you hit the proper path, for example, you would then pull that resource from S3. I think the important differentiation is that tools are model controlled. So, the model decides when to call a tool, while resources are a bit different depending on the application. So, Claude Desktop, for example, would show you the resources like you would add a local file. So, you see all remote resources there as well. But you could also take the resources and build up a local RAG index, for example. So, it's really up to the application to decide what to do with the resources, and not up to the model itself. Yeah.
Yeah, and so as you're building your own AI agents, you're going to have to make those decisions on how you actually use all these different parts of MCP. If you're not building the agent, you're using, you know, a code assistant, whatever, then it's really up to that agent to decide how it wants to use these different parts of MCP. And there's a gigantic kind of matrix of compatibility for which agents support which parts of MCP. But I'll show you the next one, prompts. We kind of saw that a little bit earlier, but with prompts, we can get a prompt, and many of the code assistants already do support prompts, because what I can do is provide a kind of shortcut to a larger prompt through this prompt thing. And let me actually show you this one over in the Kiro CLI. So, when I use prompts, I can first list them, and we'll see that I have an MCP server and it has this greeting prompt, and then I can use this @ syntax and say @greeting, and then I can provide a name.
And what that is essentially doing is expanding this particular call, @greeting with my name, into a larger prompt and then sending that over to the LLM. In this case, it's not very exciting. It just says hello James. So, I'm actually telling the LLM hello James, which is kind of weird. And then the LLM is going to respond with that.
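The expansion step can be sketched as a simple template lookup. This assumes a hypothetical registry with a single `greeting` prompt and a `{name}` placeholder syntax; it only illustrates how a short prompt reference turns into the message that is actually sent to the LLM.

```java
import java.util.List;
import java.util.Map;

// Concept sketch of prompt expansion: a short reference plus arguments becomes a full message.
class PromptSketch {

    record PromptMessage(String role, String text) {}

    // Hypothetical prompt registry: prompt name -> template with {placeholders}.
    static final Map<String, String> PROMPTS = Map.of("greeting", "Hello {name}");

    static List<PromptMessage> getPrompt(String promptName, Map<String, String> arguments) {
        String text = PROMPTS.get(promptName);
        if (text == null) {
            throw new IllegalArgumentException("Unknown prompt: " + promptName);
        }
        // Replace each {argument} placeholder with the value the user provided.
        for (Map.Entry<String, String> argument : arguments.entrySet()) {
            text = text.replace("{" + argument.getKey() + "}", argument.getValue());
        }
        return List.of(new PromptMessage("user", text));
    }

    public static void main(String[] args) {
        System.out.println(getPrompt("greeting", Map.of("name", "James")));
    }
}
```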
So. I mean, a good use case, for example, for prompts is think about if you provide the MCP servers to your users, maybe you want to give them already specific prompt templates on how to use the MCP server or certain workflows, how they should interact with your application, right? So, it can be a very good starting point, right? Imagine your user connects to your MCP. They don't know what to do with it. You can have a prompt that maybe guides you through a structured process or something like that. Yeah.
Yeah, kind of like a dash dash help.
Yeah. Yeah. Should we take a look at how we can combine those things? Yeah, so that was a recent addition, right? Yeah, so this is a new part of MCP called MCP apps. It's an extension to MCP. And what it allows us to do is render rich content into our agents that have rich UIs. If you're in a text UI, great, just return text, that's fine. But if you're in a web UI or a mobile UI or something like that, you can actually return not just text, but richer content. And so, I'll show you what the view of this looks like within MCP Inspector. We've got the shopping list.
Shopping list. MCP apps use MCP resources to deliver the HTML that is then going to get rendered in the browser. And then we do a tool call to trigger opening or launching an MCP app. And so, a lot of that code you've already seen before with MCP resource and MCP tool. There's a few additional details in here that set up the context and the metadata for an MCP app. But let's look at what this actually looks like.
Here I can flip over to apps, go to my shopping list app, and see here's my web UI. So, this actually would get rendered in the agent interactions. If you're in Claude.ai, you actually would see this almost like a portlet. We're kind of back to the old days of portlets. We're rendering this rich content from this MCP server, but rendering it within our agent experience. Yeah. And this is also very powerful, right? I mean, I work a lot with software companies, for example, that provide software as a service, right, where you log in to the UI. And they think that maybe in the future, right, user interaction might not happen in their normal UI, but maybe their users will just have Claude Code or ChatGPT or Claude Desktop, right? Then they will basically connect to your MCP server, and you render the entire interaction with your system as a UI component.
There was also a great example at the MCP developer summit where Shopify showed how you would basically render product search already in there. So, the customers would not even have to leave their chat environment to interact with your system. So, I think that's a very powerful component in the future. Yeah.
Definitely. Okay, that's our kind of quick run-through. We're going to see some other stuff later, but our quick run-through of the MCP kind of specification, many of the the features that you can do in MCP. And again, it depends on your client, your agent to determine how these different features get used. If you're building your own agent, use all of them. You know, use resources, use prompts, use apps, all those kinds of things. If you're using a pre-built agent, a frontier agent like Claude.ai or an AI coding assistant, then it's going to be up to that to decide how it wants to use these different features within MCP.
Okay, back to slides.
Great. What we have not covered so far is there are quite some new additions to the spec that are quite interesting. And one is tasks. So, we just saw the synchronous tool call, but think about a process that might take 10 minutes, half an hour, right? So, tasks will allow you to spin off some sort of asynchronous work, and the client will poll to see if the if it's ready at a later stage, right? So, this is task. There's currently an open pull request in the Java SDK. So, it's currently being worked on to also make that available.
And another important one, I think we can research it later together, is a URL-based elicitation. So, you saw the elicitation where we would add like text in a form style, but there might be times when you get from an MCP server a URL that the user should click to authorize and basically negotiate some form of security credentials that should not pass the MCP client, right? And this is what URL-based elicitation is for.
Good. I want to recap real quick on the thing we mentioned before, right? We have the standardization on the JSON RPC layer, right? So, this is basically what happens between the client and the server. But, it doesn't matter on which channel this actually happens, right?
So, you can obviously send that also locally. So, you might have heard about local MCP servers. Probably most of you have installed one, right? So, you provide a command locally that you configure, like spinning up a jar via Java or npx or something like that. And then what actually happens if you connect to that is that a new process will be spun up locally, and then the communication happens over standard in and standard out. But the most important thing is it's always the same messages here that get exchanged.
Now, what we can do is we can just switch the transport. So, instead of doing local standard IO, we can just use streamable HTTP, which is the other official transport, right? So, instead of communicating via standard in standard out, we use HTTP. This is what you have seen James doing with localhost, right? So, now those very same messages are exchanged over HTTP. I think there are some fancy projects where someone built even SQS queues in between to exchange those things, but you get the idea, right? It's always the same messages.
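To illustrate that only the transport changes and not the message, here is a sketch that sends one JSON-RPC message two ways: as a newline-terminated line on a stream (as the stdio transport does) and as the body of an HTTP POST (as streamable HTTP does). The endpoint URL is a placeholder assumption, and the HTTP request is only built, not sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Concept sketch: the same JSON-RPC message carried by two different transports.
class TransportSketch {

    static final String MESSAGE = "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/list\"}";

    // stdio: one JSON-RPC message per line, written to the local server process's standard input.
    static void sendOverStdio(String message, OutputStream serverStdin) {
        try {
            serverStdin.write((message + "\n").getBytes(StandardCharsets.UTF_8));
            serverStdin.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streamable HTTP: the same message becomes the body of a POST request.
    // The client accepts either a plain JSON response or a server-sent event stream.
    static HttpRequest asHttpPost(String message, String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json, text/event-stream")
                .POST(HttpRequest.BodyPublishers.ofString(message))
                .build();
    }

    public static void main(String[] args) {
        ByteArrayOutputStream fakeStdin = new ByteArrayOutputStream(); // stands in for a process pipe
        sendOverStdio(MESSAGE, fakeStdin);
        System.out.print(fakeStdin.toString(StandardCharsets.UTF_8));

        HttpRequest request = asHttpPost(MESSAGE, "http://localhost:8080/mcp"); // placeholder URL
        System.out.println(request.method() + " " + request.uri());
    }
}
```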
On local MCP servers, they can be useful for certain things if you want to control local programs and things like that. They're quite easy to build, easy to set up. But, I would say it's also rather a developer-focused setup, right?
Your PMs or whatever, right, might not have installed Java or something locally to be able to spin up those local processes and things like that. Also, there's a bit of a supply chain risk, like why would you install random MCP servers from the internet on your local machine, right? You do not know what they actually do, right? And there's a great blog article on MCP at enterprise scale that I think Cloudflare recently published, where they also say that they see that as a supply chain risk if just everyone in your company installs random local MCP servers. So, also with our customers, we see a lot that the preferred option would be a remotely hosted MCP server that runs on some form of compute. And then also for your users, it's much simpler. They just put in a URL and they can connect to it, right?
And it will also be updated and maintained because how would you update a local binary in a streamlined way, right?
Now, you might have noticed the "streamable" in the streamable HTTP part, which might be a bit new, like what does that mean? You have seen those stateful interactions where the server actually requests something from the client. And that usually is not really possible with plain HTTP. Like, how does the server address the client back, right?
So, that's why for those stateful operations, there will be a server-sent events channel that will be set up so that the server can reach back out to the client. There are a couple of scaling challenges associated with that, but we will talk about that in a second.
So, in a nutshell, right? The remote MCP servers can be managed and maintained centrally, and they also have an authorization framework. We will talk about that in a second. They're easy to set up. They're scalable. We will talk about that in a second. And yeah, obviously there is now for you as the developer some form of operational overhead, right? You need to deploy them. You need to scale them. So, you need to think about how you do that in a scalable way. But maybe James, do you want to show a quick remote MCP... Yeah.
>> Yeah. And do you want to share the streamable switch to stateless here?
>> And it comes a bit later. We can do it later as well. Okay. Great. Okay. So, I'm going to show you a quick demo of a remote MCP server that I've set up. It's called javadocs.dev, and javadocs.dev provides javadocs for libraries on Maven Central.
And so, I provide an MCP server for javadocs.dev. And I've added that into my claude.ai in this case. There's the URL up top. There are all the tools.
And I want to just show you how this looks in this particular example, where I've got my connector. I've got javadocs enabled. And now I can ask it a question. So, all I did was add the URL.
That's all that it took to set this up: add that one URL for the MCP server into my agent. And now I should be able to say something like, what is the latest version of the Spring AI Agent Core library? And that's brand new. We just announced it yesterday. And so, hopefully this all works. But what it should do is figure out, great, I have a tool, an MCP tool, that can help me figure this question out. And so, now it should go off and search for that particular artifact. That was all natural language. I didn't give it Maven coordinates. So, first what it has to do is resolve to the Maven coordinates.
And then once it gets the Maven coordinates, then it should be able to get the latest version, which is another MCP call on the server. And sure enough, 1.0, which we just announced yesterday.
So, cool. Works end to end. So, that's our example of just like how easy it is to add these remote MCP servers to our agents and then have all these additional tools available to us. Like obviously the model doesn't know anything about this particular library, but by augmenting it with MCP tools, great, we can get this additional information to our AI.
Good. Do you want to show a more advanced one with the security? Yeah, sure. So, let's go on to the next one here. Let's go create a new chat. For this one, Josh Long and I are building a beautiful Spring AI book. And so, I've added the MCP server for the book, and that's the URL. All I had to do is plug in that URL. But this one requires authentication because, you know, we don't want to just give away our book to everyone. It's going to be some really good stuff. So, we'll have the book, and you can get the book, but I don't think anybody actually reads anymore. Instead, what we want to do is provide the book to the AI so that the AI can read the book for you. And so, we have an MCP server for that. So, now we need to authenticate to this MCP server. When I hit connect, it's actually going to redirect me to our Spring Auth server. And it's asking me to put in my email address, because I've of course purchased the book. Now it can email me a login pin, and I need my actual phone to get into my email here. But I should get a pin emailed to me if everything works. And then, you know, I can't take my pin. It's a one-time pin, but if one of you types it in faster than I do, then I'm screwed.
Okay. So, verify that pin. What that's going to do is authenticate me.
And that's just all Spring Auth stuff.
So, just an identity server from Spring Security. But great. Now I am connected to the beautiful Spring AI MCP server, if my network works. And then I'll be able to make some calls to this MCP server that are going to be authenticated. So, let's go give this a try. Let's make sure that that's enabled. Great. Now that's enabled. So, now I can say, tell me about the beautiful Spring AI book.
And then what that should see is, great, I've got a tool that can do that. And we used MCP apps to render this rich, you know, MCP app within the response there.
So, end to end auth with MCP apps.
Pretty cool.
Awesome. Okay. Great. Back to slides.
Yeah, back to slides.
Awesome, because we want to dive a little bit deeper into what actually happened under the hood during the entire auth flow, to give you an idea of the standard behind it, right?
So, in general, MCP uses OAuth, and we tried to cover the latest spec from November. The other one is currently still in draft mode. So, there are a lot of changes in authorization, a lot of back and forth. I'm sure you know the pain, Daniel. However, what happens is that when you do the very first call to the MCP server, you're not authenticated, right? So, the first thing that happens is that the server responds with a 401 Unauthorized, but it gives you a WWW-Authenticate header back with the URL for additional metadata on how to actually authorize against that MCP server, right? So, we have protected resource metadata that is hosted under some form of well-known URL. And this then has information about the authorization server, for example. And then there is another resource that needs to be fetched, which is the authorization server metadata, which has additional information on the token endpoints and all those things that are needed to get a proper token from the authorization server.
And the next step, the fourth step, is that there must be some form of client identification, right? So, to get a valid token, you usually need to have a client ID, right? So, we need to know that, okay, this is for example claude.ai. Those are the callback URLs where we exchange the authorization code later. So, the client must be known to some extent to the authorization server. So, there must be some form of registration or pre-registration here. I will talk about that in a second. But let's assume this is done. We can then talk to the authorization server, get the actual JWT, and then we are able to use that JWT to talk to the MCP server.
So, this is pretty much what happened in what James showed. This is what happens under the covers. And if all of the MCP servers follow that authorization framework, we have this plug-and-play mechanism where we can just log in and use those flows in between.
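The discovery steps just described can be sketched in a few lines of Python. This is a simplified illustration, not the Spring Security or MCP SDK implementation: the hosts `mcp.example.com` and `auth.example.com` are hypothetical, and `FAKE_WEB` stands in for real HTTP GETs. The field names follow the OAuth protected resource metadata and authorization server metadata documents.

```python
import re

# Hypothetical documents standing in for real HTTP responses.
FAKE_WEB = {
    "https://mcp.example.com/.well-known/oauth-protected-resource": {
        "resource": "https://mcp.example.com/mcp",
        "authorization_servers": ["https://auth.example.com"],
    },
    "https://auth.example.com/.well-known/oauth-authorization-server": {
        "issuer": "https://auth.example.com",
        "authorization_endpoint": "https://auth.example.com/oauth2/authorize",
        "token_endpoint": "https://auth.example.com/oauth2/token",
    },
}

def resource_metadata_url(www_authenticate):
    """Steps 1-2: read the metadata URL out of the 401 challenge header."""
    match = re.search(r'resource_metadata="([^"]+)"', www_authenticate)
    if match is None:
        raise ValueError("no resource_metadata parameter in challenge")
    return match.group(1)

def discover(www_authenticate, fetch=FAKE_WEB.__getitem__):
    """Steps 2-3: follow the chain from challenge to token endpoint."""
    protected_resource = fetch(resource_metadata_url(www_authenticate))
    issuer = protected_resource["authorization_servers"][0]
    auth_server = fetch(
        issuer.rstrip("/") + "/.well-known/oauth-authorization-server"
    )
    return {
        "issuer": issuer,
        "authorization_endpoint": auth_server["authorization_endpoint"],
        "token_endpoint": auth_server["token_endpoint"],
    }

# The header a server might send along with its 401 response.
CHALLENGE = (
    'Bearer resource_metadata='
    '"https://mcp.example.com/.well-known/oauth-protected-resource"'
)

endpoints = discover(CHALLENGE)
print(endpoints["token_endpoint"])
```

Client identification and the actual authorization code exchange come after this and are left out of the sketch.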
Now, client registration is a very tough one because, I mean, it's something that is debated a lot. I will start with the simplest one, and probably the most common one in enterprise environments, which is that you already have a pre-registered client. So, you already have some form of client ID and client secret, for example. So, you would pre-register Claude Desktop in your authorization server. You put in the callback URLs, right? And this also gives you control over which clients can actually talk to your MCP server. And sometimes in internal or company environments, for example, you might want to limit that. You might not want to allow any client to talk to the specific server. So, this is one.
Now, if you want to allow any client, like Cursor or something else, an inspector, and all that, you would basically have to have additional clients, application clients, registered at the authorization server. So, there was one method that is called dynamic client registration. The client would realize, oh, we are not registered. Now, let me call a registration endpoint and send all the information, my callback URLs and additional information. And the authorization server would then dynamically add that client. In the first version of the spec, that used to be a public endpoint, right?
And that's a bit tough because you could essentially DDoS this endpoint, right?
And just register as many clients as you want. And also, not many enterprise-ready authorization servers supported dynamic client registration.
So, it was a bit of a problem. It was like, how would we actually do that?
Some built a wrapper around that to call an API to register something with the authorization server. So, I think the ecosystem kind of agreed that's not the best way forward, right? So, there is something now in development that is called client ID metadata document. The idea is that instead of doing dynamic client registration, the client ID itself is a URL that must be hosted under HTTPS and that serves validated client information. So, think about Claude Desktop having an HTTPS URL with all the information that the authorization server would need. So, when we send that, the authorization server gets this document and then knows, "Okay, this is that client and this is what I follow with them, right?"
And then I have an additional trust boundary in a sense.
As far as I know, this is also very new.
Not a lot of authorization servers support this yet. I know that the Spring authorization server is working on that. So, there will be support for this at a later point.
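To make the client ID metadata document idea a bit more concrete, here is a hedged Python sketch of the check an authorization server might run. Since the mechanism is still in development, treat the field names (`client_id`, `client_name`, `redirect_uris`) and the exact rules as assumptions; the client URL and the `HOSTED_DOCS` lookup are hypothetical stand-ins for a real HTTPS fetch.

```python
from urllib.parse import urlparse

# Hypothetical document a client vendor would host at its client ID URL.
HOSTED_DOCS = {
    "https://client.example.com/oauth/client.json": {
        "client_id": "https://client.example.com/oauth/client.json",
        "client_name": "Example Desktop Client",
        "redirect_uris": ["https://client.example.com/callback"],
    }
}

def validate_client(client_id, redirect_uri, fetch=HOSTED_DOCS.get):
    """Return True if the client ID URL vouches for this redirect URI."""
    parsed = urlparse(client_id)
    # The client ID must itself be an HTTPS URL.
    if parsed.scheme != "https" or not parsed.netloc:
        return False
    document = fetch(client_id)
    # The document must exist and claim the same client ID it is served from.
    if document is None or document.get("client_id") != client_id:
        return False
    # Only callback URLs listed in the document are accepted.
    return redirect_uri in document.get("redirect_uris", [])

CLIENT = "https://client.example.com/oauth/client.json"
print(validate_client(CLIENT, "https://client.example.com/callback"))
print(validate_client(CLIENT, "https://attacker.example.net/callback"))
```

Compared with a public registration endpoint, nothing gets written to the authorization server here: it only reads a document that the client vendor controls.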
There are a couple of other things.
There are auth extensions. You can use client credentials. You could technically also use API keys for internal deployments. In MCP Security, there was a module to allow that, right? Because if it's not publicly reachable, you might want to use a different security mechanism, for example. Or you could just YOLO and make it public. For things like documentation servers, that would make sense. If the only question that you have now after this auth section is "help", then you probably want to look at two things. You can solve most of those things at the application layer with MCP Security, which Daniel is driving. So, there's a lot of development to simplify those aspects so that you do not have to care about a lot of those things. You can also solve them, depending on which provider you use, at some form of infrastructure layer. So, for example, if you deploy an MCP server on Agent Core runtime, you get the redirect and header handling and so on out of the box.
That was auth. Anything to add, James?
No, that was a lot. Yeah. That wasn't a lot. Okay. But luckily, Spring makes it easy. Thank you, Daniel.
>> Yes.
>> [laughter] >> All right. So, now let's say we have our secured remote MCP server. Now, how do we actually scale it, right? Locally, it's rather simple. We just spin up a process per client that we have. But if we now put an MCP server somewhere remotely, right?
And suddenly, I have a lot of clients hitting that, it might be a scalability issue, right? It might be overwhelmed.
So, how do we usually scale systems? We scale them horizontally, right? So, we just add a lot more remote MCP servers.
And then when a request comes, we would essentially load balance between those different servers, right? In a round-robin fashion.
However, there is a bit of a challenge because by default, by design, if you use the MCP protocol as is, you remember the initialization request that James showed before, right? Technically, this same initialization request and follow-up request should go to the very same server instance, right? And you might ask yourself, "Okay, easy, right?
We just do sticky sessions." You know the problem with sticky sessions, right? You might overwhelm a certain server. And sometimes, you do not even have control over the stickiness if you use a serverless environment. You might just get round-robin over a thousand sandboxes, and you do not have control over that. So, I want to give you three ways to address this challenge.
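The routing problem can be shown with a toy simulation in Python. Everything here is made up for illustration (three in-memory instances, a single `session-1`, and a byte-sum routing key); it is not how any real load balancer or MCP server works, but it shows why a session that was initialized on one instance fails on the others.

```python
import itertools

class Instance:
    """A toy MCP server instance that keeps session state in memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()

    def handle(self, session_id, method):
        if method == "initialize":
            self.sessions.add(session_id)
            return "ok"
        # Follow-up requests only work where the session was initialized.
        return "ok" if session_id in self.sessions else "unknown session"

def make_round_robin():
    """Router that ignores the session and rotates through instances."""
    counter = itertools.count()
    return lambda instances, session_id: instances[next(counter) % len(instances)]

def sticky(instances, session_id):
    """Router that always maps the same session to the same instance."""
    return instances[sum(session_id.encode()) % len(instances)]

def run(route):
    instances = [Instance("a"), Instance("b"), Instance("c")]
    results = []
    for method in ["initialize", "tools/list", "tools/call"]:
        target = route(instances, "session-1")
        results.append(target.handle("session-1", method))
    return results

print(run(make_round_robin()))  # follow-up requests land on other instances
print(run(sticky))              # every request lands on the same instance
```

Sticky routing fixes the lookup, at the cost of uneven load and of needing a router you actually control.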
The first one is you do not care about it because it will be improved at the protocol level, right? So, there are various SEPs to make MCP servers stateless, where they want to get rid of the initialization call and really make that part of every request, so that you do not have these stateful or sequential requests happening after each other.
The second thing is you can avoid state altogether. Spring has a configuration for that which is called stateless. And then you do not need to care about that anymore. Now, the important caveat is you cannot use things that inherently rely on state, such as sampling and elicitation. So, those features will not be available to you anymore, right? Because you just do stateless tool calling, okay?
And if you still want to use those, the last thing is you can solve it at the infrastructure level, right? So, for example, if you deploy the MCP server on Agent Core runtime, you get a dedicated sandbox per MCP session. So, the user will always hit their own MCP server instance, and then you can have the stateful processing. We have an example in the GitHub repo that we will share later that shows how you would deploy that. You would basically solve the problem at the infrastructure layer.
Good. Anything to add? Nope. That was great. Yeah, I mean it's super easy to switch to the stateless mode. My javadocs server, the Spring AI book one, those all run in stateless mode.
So, it just makes it super easy. I don't need any of the state features in those cases. So, yeah, super easy to switch it over to stateless. And if you want to just do basic load balancing, you know, multiple instances, that's the easy way to go.
>> Yeah. I think most of the MCP servers that are currently out there are in fact stateless. So, they do not make use of sampling and elicitation, right? But I think at a later point when we talk about stateful interactions between components that are long-running, I think state is something that we have to accept in the future, right? So, I would rather use those feature and evolve the protocol and the infrastructure to be able to have this long-running exchanges.
Good. That gives us 5 minutes to talk about context-efficient MCP. You think we can make it? Yeah, I mean I heard that MCP was dead. Skills are replacing it. And so, the thing that you hear, you know, if you're watching Twitter and all that, is that people are like, "Hey, you know, MCP, it uses too much context. And so, let's just get rid of the whole thing."
It turns out there's some techniques that we can use to improve that. So, that's what we're going to go through.
So, for example here, so the problem is now imagine you've learned about the MCP servers. Now, we add 10 MCP servers.
They all bring 50 tools. Now, the descriptions that we talked about, right, the ones that let the model decide which tool to call, also go into the context. So, you saw here, I think we have almost 31%. I mean, it's a bit of a bloated example, right? But it can eat up a lot of your context if you just smash in all the tool descriptions to let the model figure it out, right? So, the problem is accuracy: if you have a lot more tools, the model gets confused and doesn't know what to call anymore. And also cost, because you pay for the tokens that the tool list actually consumes, right? So, a lot of people say, yeah, let's just replace MCP servers with CLIs, right?
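To put rough numbers on the tool-list cost described above, here is a back-of-the-envelope calculation in Python. The 10 servers and 50 tools per server come from the talk; the 120 tokens per tool definition and the 200,000-token context window are assumptions picked for illustration, not measurements from the demo.

```python
def tool_context_share(servers, tools_per_server, tokens_per_tool, context_window):
    """Return (tokens used by tool definitions, fraction of the window)."""
    total_tokens = servers * tools_per_server * tokens_per_tool
    return total_tokens, total_tokens / context_window

# 10 servers x 50 tools are from the talk; the other two values are assumed.
total, share = tool_context_share(
    servers=10,
    tools_per_server=50,
    tokens_per_tool=120,
    context_window=200_000,
)
print(f"{total} tokens, {share:.0%} of the context window")
```

Under these assumptions, the tool definitions alone take up roughly a third of the window before the first user message, and that cost is paid again on every request that sends the full tool list.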
So, the idea is that instead of using a GitHub MCP server, for example, I just use my GitHub CLI locally. And if I have a bash tool, why not, right? The model will be able to figure out what CLI to call. And for local developer setups, coding agents, something that has access to your local machine, that can be a good solution. However, there are very important trade-offs to it, right? Like how do you discover and maintain the CLIs that you want to install locally? How do you distribute them? And as soon as you start running those MCP servers or your clients or whatever on remote compute, you usually do not have a GitHub CLI installed where your application runs to just randomly do local bash calls, right? So, in scalable remote enterprise deployments, you usually don't have that option at all. And how do you keep them updated and maintained, right? So, they can be a solution, but I think rather on the local side of things.
Yep. Yeah, one thing I see when people do these comparisons, MCP versus CLIs, is that one of the things they usually ignore is that oftentimes the agent, if it's trying to figure out how to use a CLI, has to do a bunch of actual turns with the LLM to figure it out. And so, it's like, you know, running the CLI with --help, and then it's looking at the output and being like, "Okay, let me try these parameters." And then it's like, "Oh, that didn't work.
Let me try these parameters in a different way." And so, there is actually quite a bit of context in real-world use of CLIs in this case as well.
It's just like, you know, the imagination is that, "Oh, you know, the LLM's never going to have to figure anything out." Turns out, nope. We either provide that through schema and metadata around our MCP tools, or we let the LLM figure that out with CLIs, or we put it into a skill that then also has to be read into context to tell it how to use a CLI.
Yeah. And I mean, also the authorization aspects, right? There's usually no agreed-upon authorization standard for how you would use a CLI, right? You might export a token as a... Right, you get the point.
Now, what we also often see, if you use MCP a lot to retrieve a lot of context, for example, I'm not talking about actionable context, is that people say, "Okay, we can also just use a skill."
I'm not sure if you're familiar. Who has used a skill before? Okay, a couple of people. So, the idea is you have markdown files. Each has a description at the top. And the LLM, or the actual agent, would only read that initial description. And when it figures out it needs to do more, it will read the rest afterwards. So, the only thing that goes into context is the actual YAML front matter that you see at the top. And then you basically discover additional knowledge dynamically, and you can combine that with local script files and Python scripts that you can execute, right? But again, there are also some trade-offs that we outlined here. So, in the last 2 minutes, we want to give you an idea of what you would do to make MCP itself more context-efficient. And the first one is to only give your agent the tools that it actually needs, right? For example, in Spring, you can do tool filtering, so you only give the agent the actual tools that it needs.
You can also manage that with a centralized gateway. So, you have a gateway, and on the gateway, you would control the actual tools that your client needs instead of making all of them available.
And the second part is progressive disclosure. So, first, only give the relevant tools, and then can you dynamically load those tools? And there is in Spring a tool search tool. There's a great blog post about that that you only give the client a tool search tool, right? So, in that case, you only expose a search tool and give it the task, and then it would return the list of tools that are relevant for the current task.
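As a toy illustration of the tool search idea, the following Python sketch exposes a single `search_tools` function that returns only the tool names relevant to a task. The registry contents, the stopword list, and the keyword-overlap scoring are all assumptions made for this example; this is not the Spring AI tool search tool, which may rank tools quite differently.

```python
# Hypothetical registry: tool name -> description kept out of the context.
REGISTRY = {
    "search_artifacts": "Search Maven Central for artifacts by name",
    "get_latest_version": "Get the latest version of a Maven artifact",
    "create_issue": "Create an issue in a GitHub repository",
    "send_email": "Send an email message to a recipient",
}

STOPWORDS = {"a", "an", "the", "of", "in", "to", "for", "by"}

def keywords(text):
    """Lowercase the text and drop common filler words."""
    return {word for word in text.lower().split() if word not in STOPWORDS}

def search_tools(task, registry=REGISTRY, limit=2):
    """The only tool the model sees up front: find tools for a task."""
    wanted = keywords(task)
    scored = []
    for name, description in registry.items():
        overlap = len(wanted & keywords(description))
        if overlap:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

print(search_tools("find the latest version of a maven artifact"))
```

Only the definitions of the returned tools would then be loaded into the context, instead of the whole registry.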
So, in the initial context, you only have the search tool, and only when you want to do certain things do you get the list of relevant tools, right? And the tool search tool is a good way of doing that. And the last thing, which is quite an advanced and currently explored concept, is the concept of code mode. And the idea is, let's say I want to add, subtract, and multiply. If I let the model do that, it would do three round trips. It would do an add tool call, a subtract tool call, a multiply tool call: three round trips, right? Now, if I already know that I only want to have the final result of that, you could let the agent write code, in the form of Java or Python code, that makes the tool calls in the code itself, and you only let the agent execute the code, so you only get the final result. So, you're basically saving a lot of context if you do not need those intermediate results, for example. There's a great blog post on that, and this is to be further explored.
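The add, subtract, multiply example can be sketched in Python to show the difference in what ends up in the context. The tool functions, the input numbers, and the script are hypothetical; in a real setup the agent would write the script, and it would need a properly isolated sandbox, since emptying `__builtins__` as done here is not a security boundary.

```python
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

def multiply(a, b):
    return a * b

TOOLS = {"add": add, "subtract": subtract, "multiply": multiply}

def step_by_step():
    """Classic tool calling: every intermediate result enters the context."""
    context = []
    first = TOOLS["add"](2, 3)
    context.append(first)       # round trip 1
    second = TOOLS["subtract"](first, 1)
    context.append(second)      # round trip 2
    third = TOOLS["multiply"](second, 4)
    context.append(third)       # round trip 3
    return context

def code_mode(script):
    """Code mode: run agent-written code, return only the final result."""
    namespace = dict(TOOLS)
    exec(script, {"__builtins__": {}}, namespace)
    return [namespace["result"]]

# What an agent might write instead of three separate tool calls.
SCRIPT = "result = multiply(subtract(add(2, 3), 1), 4)"

print(step_by_step())     # three values enter the context
print(code_mode(SCRIPT))  # one value enters the context
```

Both paths compute the same final value; the code-mode path just keeps the intermediate values out of the model's context.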
With that, I think we're 1 minute over time, James.
>> Yes. I hope that's okay.
Oh, you could show the URL to the code. This is the URL to the code repository for everything that we shared today, also the deployments, and also code mode, with a bit of an example of how that would work.
And with that, if I do not see any more phones, I would say, "Thank you so much, and I hope there was something useful for you in this session." Thank you.
>> [applause]