AI agents are systems that use language models to interact with tools and solve complex problems through iterative tool calls. Kotlin's type safety and JVM compatibility make it well suited to building reliable agent frameworks such as Koog, which orchestrates LLM interactions through graph-based strategies and tool registries.
Deep Dive
Building AI Agents in Kotlin • Anton Arhipov • YOW! 2025
Hello.
So let's do a quick start with the slides. I have a lot of demos to show you, so I'm going to rush through the slides very quickly. A little bit of extra introduction of myself: I come from this little country in the north, very, very far north. Population 1.3 million, so it's basically the CBD area of Brisbane. I'm a developer advocate at JetBrains. I've worked with the Kotlin team for about five years, but my background is in Java. I've been building tools for Java developers for 15 years and did some enterprise development with Java in the past as well. So I've seen EJB 2.1.
So what is this talk about? There is the self-explanatory title, of course, but it's going to be an introduction to the Koog framework that we built at JetBrains for ourselves. We are making it general purpose, so you will be able to build your agents with it as well, not only in Kotlin but also in Java; there will be a Java API as well. Obviously it's a shameless placement of Kotlin, because I like the language and I think it's well positioned for the things we are going to use it for. We're also going to learn a few things about LLMs, especially the local ones. When I started working on this content and with the framework half a year ago, there were a lot of questions: can we make agents with local models, not with the frontier models? I did a lot of experiments on how local models respond to the things I want to do with them, and this is the fun part I want to share with you today.
One of the philosophical questions I have when we talk about building agents is: what is the problem you're solving with the agent, and why do you actually want to build an agent and not just deterministic software? Not just because it's a hot topic. What makes the problem so unique that you want to apply AI agents to solve it? I couldn't really find a lot of good information about what makes a problem that unique, or how to prove that you have to attack it with an AI agent. But when you have a question like this and you don't have anyone to ask, what would you do today? Go to ChatGPT, right?
You ask ChatGPT, and it actually proposes some interesting arguments for why you would use an AI agent as opposed to something else. For myself I derived that if your problem space is a vague one, for instance generating programs as with coding agents, then an AI agent is probably a good thing to do, because it's very hard to generate a program deterministically for all the corner cases, limitations, and constraints with just a deterministic generator. There are obviously more attributes for why you would use an agent, for instance human-like interaction with the system, like natural language input, but that's not the only attribute that should be there. The solution space should be huge enough, and there are a lot of other attributes that contribute to this decision.
But assume we are now convinced that we are building an agent instead of writing a program. What makes Kotlin a good choice? Some time ago, a couple of years ago, if you wanted to build an AI agent it would have been Python. Our developers at JetBrains also started with Python, and we quickly realized that it doesn't really work out too well, because the products where the agents are going to run, the products that are going to use this agent framework, are written in Java. They have the domain encoded in Java classes and so on. And the distance is important: the software needs to be self-sufficient. So if we want to integrate AI agents into our products, they need to run on the Java Virtual Machine.
The team lead of the Koog framework wanted to publish a public blog post on why JetBrains actually wants to build something on the JVM with Koog and not in Python, and he asked me for a review of his blog post. I thought: it doesn't work out, whatever you're writing here doesn't make sense to me. What I read here is a skill issue: you don't want to use Python, you want to use Kotlin. Why? But he was very furious and he wanted to publish it, and of course he published it on Reddit, and the very first response was: it's a skill issue, of course. But the thing is, it's not a glorified client or a window for communicating with an LLM that we are building. We need to orchestrate stuff. There's an internal loop inside the workflow. There are the domain objects. So there's more to it than just making the LLM calls, and that is the main reason; somehow he failed to encode it in that blog post. So, Kotlin. Why Kotlin?
Because it runs on the JVM. Actually it runs not only on the JVM: for our internal projects we also use the capability of the Kotlin compiler to translate to various platforms, to the web, to iOS, for building other kinds of agents that target the desired platform, not only the JVM. But also the capabilities of the language itself, the type safety and the DSL capabilities, are working out pretty great. I hope I will be able to demonstrate that to you today.
Let's quickly talk about the high-level view of the system when we use Koog for the agent implementation.
So we have the user, who communicates with the agent or the software, and there is an environment with the tools. The tools can be any kind of tools: system tools, services, whatever. And then we have a bunch of LLMs that we can integrate with. The user makes an input to the system, and the agent makes a request to the LLM.
There will be a response. The response can be of two kinds in this model: an assistant message or a tool call. The assistant message is technically just a text response. Or the model can respond with a tool call, saying that we want to make a tool call to continue with the task at hand. Then the agent delegates the tool calls into invocations of the real tools, and we feed the result of the invocation back to the LLM. The loop continues until we receive an assistant message at the end, and this signals to the agent that the task is done.
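The loop described above can be sketched in plain Kotlin. This is a minimal model under stated assumptions, not Koog's implementation: the types `LlmResponse`, `AssistantMessage`, and `ToolCall`, the function `runAgent`, and the string-based history are all hypothetical names introduced for illustration, and the "LLM" is a lambda supplied by the caller.

```kotlin
// Hypothetical types modelling the two kinds of response described in the talk.
sealed interface LlmResponse
data class AssistantMessage(val text: String) : LlmResponse
data class ToolCall(val tool: String, val argument: String) : LlmResponse

// Ask the model, run any requested tool, feed the result back,
// and stop as soon as the model answers with an assistant message.
fun runAgent(
    input: String,
    tools: Map<String, (String) -> String>,
    maxIterations: Int = 10,
    llm: (history: List<String>) -> LlmResponse,
): String {
    val history = mutableListOf("user: $input")
    repeat(maxIterations) {
        when (val response = llm(history)) {
            is AssistantMessage -> return response.text
            is ToolCall -> {
                val tool = tools[response.tool] ?: error("Unknown tool: ${response.tool}")
                history += "tool ${response.tool}: ${tool(response.argument)}"
            }
        }
    }
    error("No final answer after $maxIterations iterations")
}

fun main() {
    // A faked tool and a scripted "model": first a tool call, then a final answer.
    val tools = mapOf("temperature" to { _: String -> "30" })
    val answer = runAgent("What is the temperature in Brisbane?", tools) { history ->
        if (history.none { it.startsWith("tool ") }) ToolCall("temperature", "Brisbane")
        else AssistantMessage("It is ${history.last().substringAfter(": ")}°C in Brisbane.")
    }
    println(answer)
}
```

The iteration cap mirrors the "how many iterations we allow the agent to do" setting mentioned in the talk; without it, a model that never produces an assistant message would keep the loop running.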
The whole strategy of how we operate with that is encoded in the brain, in the agent. This is where the orchestration happens, and this is where we can tailor things for the agents and make them smarter; this is basically where we define what we want the agent to do. Let's quickly look at the mapping of this diagram into the code. There's the AIAgent API inside the Koog framework. Basically an instance of the agent is going to do the work for us, so we are creating an instance here. We integrate with tools.
There is a tool registry that we define in our program. Then there is a strategy. We're going to look into what the strategy is; this is probably the most important part of the agent. There is a default strategy with tool calls: if we receive a tool call response from the LLM, we make a call to the tool, receive the response from the tool, and pass the result to the model again, until the LLM responds with an assistant message. That's the default strategy; we don't have to override it, but we can. Then there is a part for additional settings for the agent: what kind of model we want to run, how many iterations we allow the agent to do, the system prompt, and so on. There's more. And the prompt executor is basically the client that will talk to the LLM, because this is partially an integration issue: different LLMs may require a different schema, and we need to adjust for that.
Those are all my slides. I will dive into the IDE now and we will see how things work. They will fail, of course; they are unpredictable. So as not to rely on the external network, I'm going to run on a local instance of Ollama. Those are the models I currently have on my machine, whatever fits in memory. There's a very nice model, Ministral, released probably last week; I downloaded it a week or so ago, and it is very, very fast.
It looks very nice, I mean the size of it and the execution speed. Sometimes you have to adjust for some behavior, but it seems very promising for small applications, and I'm going to use it just for the speed. But we're going to see some other models in action as well. So this is the IDE.
Let me close this thing. What do we need to do first? Basically hello world: what does it look like? I have a bunch of helper methods, like ministral, to get an instance of the model. I will show you what it looks like. That's the definition I have: an Ollama client that resolves the model by name. All we need is to get the executor, which is part of the API; it's a simple LLM prompt executor with the Ollama client, and then the model itself. So let me jump back, and I'm going to extract them into variables, the executor and the model. Then I'm going to use this executor and the model to define a basic agent, with no definition of the strategy, just those parameters, and that's it. The only thing I need to do is specify a system prompt, which could be a very generic one: you are a helpful AI assistant, respond to the user queries.
That's it. For the last bit, we need to tell the agent to run, right? agent.run, what is the capital of France, for instance. If we run this we're going to see some output, but we're not going to see the result. We can see some statements: it loaded some Ollama model, it started the execution of the graph. "No enforced execution point" means that there's no persistence involved, so it doesn't try to start from some persisted state.
But where's the output? The output is basically the result of this run method, so we also want to print it. Now, this example doesn't make any sense if we talk about agents. We just created a simple client that makes a query to the LLM and gets a response. What we want is agentic behavior, with the loop where the tools are involved, and to see at least one iteration of that loop where some tool is involved. So, tools; sorry, it should be tool registry. Yep.
There are tools provided out of the box in the Koog framework. For instance, to interact with the user I could add the say-to-user tool, and for the purity of the experiment I should actually remove this System.out.println statement, so that during the execution I would see the loop and see the tool invocation right there. Let me run the example.
And we still don't see the result. Why is that? Well, it depends on the model: will the model decide to execute some tool, or will it not? We can influence that using the system prompt. For instance: you should use the say-to-user tool to print the result.
Let's try that.
Still failing, right?
So there is a lot of trial and error, adjusting the system prompt to convince the model to finally start using the tool. Or you can switch to a more sophisticated model.
For instance, gpt-oss is a far more sophisticated one, but it works slower.
I was trying to do that with a smaller model that is faster, just for the speed. And here we see, after switching the model, that we get some agent output.
So the first outcome of this experiment is that tool calling very much depends on the system prompt and the model that we use.
Now that we are successful with this experiment, why don't we try something more interesting, more complicated? After the model tells us the result, why don't we ask it to ask the user a follow-up question?
Ask a follow-up question using, what was it, the ask-user tool, and I'm going to add that to the registry as well. Let's see how it works out. Maybe it won't; you know, it's all probabilistic.
How was it? Yeah.
Uh-huh. "Would you like to know more about Paris?" Great stuff. What is the temperature in Paris?
So it makes a request to the LLM again, and, well: "I'm sorry, but I don't have real-time weather data." So I need to supply some other tool for that. LLMs are sometimes smart enough when you ask them to do something, but they can also hallucinate and tell you that they did the work without actually having tools for it, and they do that so convincingly that you might not notice they are actually lying. So here we are building a workflow already, and if I want to make it more predictable, more deterministic, I might start adding conditions to the prompt: what you should ask, or what kind of answer you should provide after I submit some task to you. And this makes it very unpredictable how the LLM will interact with me. I actually want a workflow, especially if I start building business applications, or coding agents maybe.
So this is not a great idea. This morning I tried this example multiple times and managed to run this LLM into a loop. It basically started responding that the capital of France is Paris over and over again, without stopping and without asking me.
And when you see the LLM going into a loop, that basically means it will not get out of the loop, because it follows the pattern.
Let's stop that. Let's not experiment with breaking the flow.
But you can just trust me that it can break. So what else can we do for the agent; what kind of APIs do we have? We can handle events, for instance, and this is very useful when you are developing something and want to see how the agent behaves. For instance, we can do something on LLM call starting, or on tool call starting, or tool call finishing. These kinds of hooks allow me to debug stuff, even integrate with other logic in my application, and handle errors as well. So that's a handy API that I'm going to use further in my examples.
So let's talk about tools a bit more. Tools might be just any code in your application. There is a tool API, of course, but there is also an annotation that can make any function behave as a tool: one single annotation. What the LLM needs to improve the resolution of those tools is a better description, and the LLM description annotation helps us with that. So there is a temperature tool: "The tool provides the real-time temperature in Celsius. It integrates with weather.com."
This way I'm just faking the tool: I want to make the LLM think that I'm really providing a service, or real data from the service, and it's going to return 30. That's it.
And we're going to ask it to provide the current temperature; let's be more specific, in Brisbane.
In this case we're going to need a city parameter, a string. We could give it a description as well, but the city will be encoded into the tool description using reflection, so I'm going to skip adding the description for now.
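To make the idea of "an annotation plus reflection yields a tool description" concrete, here is a minimal sketch using only the JVM's built-in reflection. The annotation `Describe`, the class `WeatherTools`, and the `ToolSpec` record are hypothetical stand-ins, not Koog's annotations or types. Plain Java reflection does not expose parameter names by default, so this sketch collects parameter types only.

```kotlin
// Hypothetical stand-in annotation; Kotlin annotations are retained at runtime by default.
@Target(AnnotationTarget.FUNCTION)
annotation class Describe(val text: String)

class WeatherTools {
    // Faked tool, as in the demo: it always returns 30.
    @Describe("Provides the real-time temperature in Celsius for a city.")
    fun temperature(city: String): Int = 30
}

// What a registry could send to the model for each tool.
data class ToolSpec(val name: String, val description: String, val parameterTypes: List<String>)

// Collects every annotated method of `target` into a tool specification.
fun describeTools(target: Any): List<ToolSpec> =
    target.javaClass.declaredMethods
        .filter { it.isAnnotationPresent(Describe::class.java) }
        .map { method ->
            ToolSpec(
                name = method.name,
                description = method.getAnnotation(Describe::class.java).text,
                parameterTypes = method.parameterTypes.map { it.simpleName },
            )
        }

fun main() {
    describeTools(WeatherTools()).forEach(::println)
}
```

The function name and the description text are the main hints the model gets, which is why a vague name or a missing description makes tool resolution less reliable.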
What else do we want to see? We're going to see the tool call. When we execute the example we are looking for an invocation of this function, and we will check the result and what conclusion the LLM will make if I get 30°C in Brisbane: what should I wear?
What kind of clothes should I wear?
Let's see.
Okay. So we can see that the tool has been resolved, and the other tool call also happened. The model is interacting with me through the tools.
That's great. That's too positive, actually; I wanted this example to fail first, but you never know if it will fail or not. So the temperature is 30° Celsius right now, and we are great, everything's nice. But what else could happen?
For instance, we forgot to add the description.
Maybe the function name could be something like, I don't know, some tool, not the temperature tool, so nothing in the name indicates that it can provide a temperature. It might return some kind of unrealistic number as well. Let's see what happens then.
Some tool: it still decided that the city parameter kind of matches what we want, and that is the only reference we have in the code right now that somehow indicates that you can do something with the city. But let's see. Yeah: "I'm sorry, but I don't have real-time weather data, so I can't give you the exact current temperature in Brisbane right now." The tool returned an unrealistic value of 300. And depending on the model you have connected to your agent, you might get wildly different results.
Sometimes they confidently lie that the temperature is plus 24° and therefore you should be wearing a t-shirt. Or some other model might say: okay, we have a tool, I recognize the tools, and there are some registered examples here, so here's the tool we need to call. But the model does not generate the tool call response, so the agent doesn't know what to do with that. It's just an assistant message; the model does not instruct our agent to actually run the tool even though it recognized it. So it's very unreliable in this sense.
And this made me think: why don't we have anything in the framework that would at least help us express the contract that, for this kind of request, to implement this task there needs to be at least one tool call reliably generated by the model, and if there isn't, then we have to retry and retry, and maybe switch the model in this execution as well? So this is a part of the orchestration that you actually need to do in the agent to handle situations like this.
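The contract described here, "at least one tool call, otherwise retry and possibly switch the model", could be sketched as follows. This is an illustration of the idea only: the talk states that the framework does not provide it, and `ModelReply`, `Text`, `ToolRequest`, and `requireToolCall` are hypothetical names. The model is represented by a caller-supplied lambda.

```kotlin
// Hypothetical reply types; not part of any framework.
sealed interface ModelReply
data class Text(val content: String) : ModelReply
data class ToolRequest(val tool: String) : ModelReply

// Tries each model in order, up to `attemptsPerModel` times each,
// and returns the first tool call produced, or null if none ever is.
fun requireToolCall(
    models: List<String>,
    attemptsPerModel: Int,
    ask: (model: String, attempt: Int) -> ModelReply,
): ToolRequest? {
    for (model in models) {
        for (attempt in 1..attemptsPerModel) {
            val reply = ask(model, attempt)
            if (reply is ToolRequest) return reply
        }
    }
    return null
}

fun main() {
    // Scripted behaviour: the small model only chats, the large one calls the tool.
    val result = requireToolCall(listOf("small", "large"), attemptsPerModel = 2) { model, _ ->
        if (model == "large") ToolRequest("temperature") else Text("It is probably 24 degrees.")
    }
    println(result)
}
```

Returning null instead of throwing leaves the decision about what to do after all retries fail to the caller, which fits an orchestration layer that may have further fallbacks.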
My request is not implemented in the framework, but there is another way to enforce tool execution, and for this I need to explain what kind of strategies we have in the framework and how you can implement one. So I will delete this example to start from scratch. There is an agent, the same agent that we have seen here. The only thing that changes is the strategy, which we are going to implement right now using the DSL. Imagine the agent is just going to get a random number as a string. The string is a valid number, so we can convert it, just for simplicity for this demo. That's going to be our first task: to convert this string that represents a number back to an actual number, to be able to do some math maybe. So there's going to be a val, a string-to-int node. Strategies are graph based in Koog, and it uses the DSL capabilities of Kotlin heavily. node is a function that generates a custom node, and the node may have an input and an output as well. The implementation is defined in a lambda. So here we have it.toInt(), and that should kind of work.
Yeah, it doesn't match the type. As you can see, the compiler is already telling me: you're returning an integer from the lambda, but your initial definition was that the node should return a string.
So your operation doesn't match the type you have declared. That's my mistake, of course. And another one: for instance, we want to increment the integer, with a node that takes an integer and just increments it. So we have two nodes right now: one that converts the string to an integer, and another that increments the integer. How do we connect it all together?
There is a DSL, or rather an API I would say, that allows us to draw the edges between the nodes. The initial node is always nodeStart, which is always present, and obviously there's nodeFinish as well.
So from nodeStart we could forward to string-to-int.
And as you can see, it just connects and compiles. But if we had integer right here as an input, then this would not compile, right? So it's all type safe. Why wouldn't it compile?
Because we expect an integer but we get a string as an input, and this input type is declared on the strategy level. So let me change it back. Yeah, now it compiles. Now we need other edges. From string-to-int we need to forward to the increment node, and from the increment node we need to forward to the finish node, nodeFinish. Why doesn't it work? Obviously some type safety issue again. Our strategy declares that the output is going to be a string; that means nodeFinish expects a string as input, and the inc node is currently declared as returning an integer. Now we have two options: either fix the types on the strategy level, like this, or transform the data on the fly, with transformed, and say that it should be a string. That works as well. So this DSL allows us to connect the nodes based on the types, do transformations, do conditions. What kind of conditions could we do here? For instance, based on the nature of the number that has been created by the inc node, we could make a conditional edge into some other node, for instance Spanish: we can ask the LLM to do a translation or spelling for us, like "spell it in Spanish", and do the same for Italian, "spell it in Italian".
And what do we want to do then? For instance, for odd numbers spell in Spanish, and for even numbers spell in Italian. So an edge from inc forwardTo Italian, for instance, onCondition, and then the condition could be something like this: modulo two equals zero. And the other one for Spanish: doesn't equal zero. Something like this. So we basically have a graph right now that is statically checked for the types. In the first nodes we are not doing any work with the LLMs, but then in some places, conditionally, we may get additional help from the LLMs and supply what we expect the LLM to do, what kind of response we want the LLM to provide. So let me run that. Let's see. Okay, so serialization, I guess.
Yeah. So this is a little bit unpredictable, but let me try again. Somehow the Ollama client is not too stable. Ah, it doesn't match the edges, of course. Currently the compiler cannot check the completeness of the graph, and I have forgotten that I need to connect the Spanish node to the finish node, nodeFinish, without transformation, and I need to connect Italian to nodeFinish as well.
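The type-safety point of the graph demo can be shown with a small self-contained model. The sketch below is not Koog's DSL: `Node`, the infix `forwardTo`, and `branch` are hypothetical definitions that only borrow the vocabulary of the talk, and the two "spelling" nodes return placeholder strings where the demo would prompt an LLM. What it demonstrates is that the compiler rejects a connection whose output type does not match the next node's input type.

```kotlin
// A named, typed step. Illustrative only; not Koog's node type.
class Node<I, O>(val name: String, val action: (I) -> O)

// Connecting two nodes compiles only if the first node's output type
// equals the second node's input type.
infix fun <A, B, C> Node<A, B>.forwardTo(next: Node<B, C>): Node<A, C> =
    Node("$name -> ${next.name}") { input -> next.action(action(input)) }

// Conditional routing between two nodes with the same input and output types.
fun <I, O> branch(test: (I) -> Boolean, ifTrue: Node<I, O>, ifFalse: Node<I, O>): Node<I, O> =
    Node("${ifTrue.name} | ${ifFalse.name}") { input ->
        if (test(input)) ifTrue.action(input) else ifFalse.action(input)
    }

val stringToInt = Node<String, Int>("stringToInt") { it.toInt() }
val inc = Node<Int, Int>("inc") { it + 1 }

// Placeholders for the LLM-backed nodes from the demo.
val spellInItalian = Node<Int, String>("italian") { "italian($it)" }
val spellInSpanish = Node<Int, String>("spanish") { "spanish($it)" }

// String in, String out: even numbers go to Italian, odd numbers to Spanish.
val strategy: Node<String, String> =
    stringToInt forwardTo inc forwardTo branch({ it % 2 == 0 }, spellInItalian, spellInSpanish)

// val broken = inc forwardTo stringToInt // does not compile: Int output, String input

fun main() {
    println(strategy.action("41")) // 41 + 1 = 42, even
    println(strategy.action("42")) // 42 + 1 = 43, odd
}
```

Because composition here always yields a complete function, this model cannot reproduce the missing-edge problem from the demo; in an explicit edge list, completeness is a runtime property, which matches the speaker's remark that the compiler cannot check it.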
Great. So that works. A few things that might be interesting regarding Kotlin. You have probably noticed that we are using this by syntax instead of the equals sign. Some of you may know what it is; for others it's probably completely unfamiliar. The by syntax means we are using delegates. In Kotlin we have delegates, and this technically means that the type returned by this function contains common logic for accessing the properties. I have an example of that as well. It's not the Koog framework's code; it's a custom example. I have a node delegate that implements the getValue and setValue methods. This way we can reuse the logic for properties and delegate it to some external classes, and this makes it very convenient to compose those graphs in the Koog framework.
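A delegate of the kind described can be written with the standard library alone. This is a minimal sketch, not the speaker's example or Koog's code: the class name `NodeDelegate` is taken from the talk, but its body and the `accessLog` list are assumptions made for illustration. It shows that a property declared with `by` routes every read and write through `getValue` and `setValue`, and that the delegate learns the property's name.

```kotlin
import kotlin.reflect.KProperty

// Records every property access so the delegation is visible.
val accessLog = mutableListOf<String>()

// A delegate holding a value; the access logic lives here, not in the property.
class NodeDelegate<T>(private var value: T) {
    operator fun getValue(thisRef: Any?, property: KProperty<*>): T {
        accessLog += "get ${property.name}"
        return value
    }

    operator fun setValue(thisRef: Any?, property: KProperty<*>, newValue: T) {
        accessLog += "set ${property.name}"
        value = newValue
    }
}

fun main() {
    var counter by NodeDelegate(1) // `by` instead of `=`: reads and writes go through the delegate
    counter = counter + 1
    println(counter)
    println(accessLog)
}
```

Access to `property.name` is one reason delegates suit a graph DSL: a declaration such as `val stringToInt by node(...)` can name the node after the variable without repeating the name as a string.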
Okay, so we talked about strategies quickly; those strategies are graphs consisting of nodes. There is also an API that allows you to create subgraphs, so your strategies may become more granular: you have subgraphs of nodes and you may compose them as you wish. But subgraphs have one interesting capability that affects tool calling: in a way, it enforces the tool calling for me. So here's the same example with the temperature. Say I make the temperature tool call fail, so that the LLM is not recognizing the tool and is not calling it. If I put the invocation logic into a subgraph of the strategy, the subgraph has this interesting additional behavior where it appends a small extra system prompt that tells the model it should only talk to the user via tools, and that to get out of the subgraph it needs to invoke the final tool that signals that the work in the subgraph is finished. So if I execute that, we will probably see... yeah. Since it's all asynchronous, sometimes you can see the output written out of order, but when I executed it, there was an enforcement happening in the subgraph for the tool calling. So it's not really designed for enforcing the invocation of tools, but it has this nice side effect that if I wrap some operation into a subgraph, in most cases it enforces the execution of tools.
There is no way to fall back and say: oh, you did not execute this tool, therefore I want you to try again. So currently it's not really a remediation for this problem, but it's a nice side effect that it improves tool calling.
Since I only have 5 minutes, and I promised more in this session: I have a few examples where I iteratively built a coding agent that can actually generate code for me. I probably will run it once, maybe not this example, but there is a repository that you can download and experiment with. Technically, creating a coding agent with a very simple strategy, even the default strategy, can get you to 50% on SWE-bench. So a very simple agent with 50 lines of code is capable of generating pretty good code for single-shot tasks with no additional effort. It's the other 50%, or not even 50 but the other 10%: if you check the results on the benchmarks for different coding agents, they are barely getting to 60 or 70. So you can imagine that to get to 90% on the benchmarks they will need a lot more effort than they are capable of today, even with frontier models. But doing something simple is very easy. So here I have a system prompt: hey, you're a coding assistant, you should be creating the code in some simple directory. Here are the tools. I will give it the tool to execute commands. If I'm brave enough, I can let it execute commands in my shell as it needs, or I can switch to non-brave mode where I have to confirm every execution, and it will get me pretty far.
But even though the strategy that is encoded in the agent is statically checked, because that's the language, when you start building something more sophisticated you want to start working with the real domain model: let's say a project, a directory, a development plan, whatever else. One example that I made was that I want some domain objects to be present, and I want the agent to map the responses from the LLM back into the domain objects, like a plan and a plan item. For instance, I want to ask the agent to first generate a plan for how it is going to build the app for me and then stick to that plan iteratively. In my DSL, first I execute this request: hey, create me a plan. That's the prompt: create a minimal list of tasks as a plan for how to implement the request. That's going to be, for instance, a FizzBuzz application in Kotlin. I will get some response, and it will be mapped onto the plan domain object that really exists in my application, and then in a type-safe way I can start doing some work with that, like items.forEach and so on.
That's nice, but how predictable is it that the LLM will generate a response that maps onto my domain model? For this, the LLM providers provide the ability to supply a schema; there's structured output. If you go and check the structured output formats, for instance, there's the JSON schema that I can generate during the request to the LLM. And even though this example works, it works because it's so simple: it's just two fields in the plan object. So the chances are very high that I can just tune it with a system prompt saying that it should be JSON format, there should be an id of type integer, and there's a task that is a string.
So with very little effort I was able to tell the LLM that I want the response to be mapped onto some domain object. But if the structure of the response is much more sophisticated, I would have to do a bit more work. So here is the same example but with structured output, where I have to annotate those domain objects with more information about what the attributes of those domain objects are and what they represent, and then create some examples to be added to the request where I start generating the plan, and force the model to respond in a structured way.
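The mapping from a model response onto the two-field plan object can be sketched as below. It makes several assumptions: `Plan` and `PlanItem` follow the field names mentioned in the talk (an integer `id` and a string `task`), but their exact shape in the speaker's project is not shown, and `parsePlan` is a hypothetical helper. To stay dependency-free the sketch uses a regular expression instead of a JSON library; real code would more likely use a serialization library together with the provider's structured-output support.

```kotlin
// Domain objects with the two fields mentioned in the talk.
data class PlanItem(val id: Int, val task: String)
data class Plan(val items: List<PlanItem>)

// Matches objects of the exact shape {"id": <digits>, "task": "<text without quotes>"}.
private val itemPattern =
    Regex("""\{\s*"id"\s*:\s*(\d+)\s*,\s*"task"\s*:\s*"([^"]*)"\s*\}""")

// Returns a Plan if at least one well-formed item is found, otherwise null,
// so the caller can detect a response that ignored the requested format.
fun parsePlan(response: String): Plan? {
    val items = itemPattern.findAll(response)
        .mapNotNull { match ->
            match.groupValues[1].toIntOrNull()?.let { PlanItem(it, match.groupValues[2]) }
        }
        .toList()
    return if (items.isEmpty()) null else Plan(items)
}

fun main() {
    val response = """[{"id": 1, "task": "Create main function"}, {"id": 2, "task": "Print FizzBuzz for 1..100"}]"""
    parsePlan(response)?.items?.forEach { println("${it.id}: ${it.task}") }
    println(parsePlan("Sure! Here is how I would build the app."))
}
```

The null result for free-form text is the useful part: it turns "the model did not follow the schema" into a value the agent can branch on, for example by re-prompting.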
This way I can actually get the domain objects that I'm going to use further in my agent. So one important topic when you start building something with agents is domain modeling, and that is also a reason why you want to keep the agent code within the same runtime as your application. If your application is running on the JVM, you already have your own domain objects there in your application, and you want to reuse them in the agent as well. So, I'm out of time. Thank you very much for coming, and I hope it wasn't time wasted. I have these links for you, if you liked my talk, if you like the topic, if you're interested in any of this. Please rate the talk in the conference app, and thank you for coming again.