In agentic AI systems, the deterministic harness matters more than the underlying model for successful task completion: one analysis of Claude Code found that about 98.4% of its codebase is deterministic, non-AI logic. The harness provides the structure that guides the model's behavior, keeps token costs down, and makes tool interactions reliable. The recent /goal commands in Codex and Claude Code are an example: the harness keeps an agent working through iterative loops until a predefined condition is met.
Deep Dive
Goals in Agents - 010 Agentic Thinking
Welcome back, Agentic Thinkers. Well, I don't know; we don't have an audience name yet. We've got to figure out what our audience name is. But welcome back to Agentic Thinking with Mike and Matias. Hello, Matias.
>> Yes. Hello. How are you?
>> I'm doing well. Good afternoon, and good morning to those of you still in the US. >> We are an international podcast. We have people from all over the world joining us today.
>> Let's unpack some of the news items. A lot has been happening, and as always, AI news does not disappoint.
>> New ideas and new concepts are coming out all the time, and we have some really interesting topics that have come up since this time last week. So let's kick things off, Matias, with your topic. I think another feature has been released in Codex >> and then immediately copied by Anthropic in Claude Code as well. Let's step into this one. What's this new feature called?
>> Yeah, lots to unpack here. They both released the new goal command, /goal. If I remember correctly, and I'm pretty sure I'm right on that, Codex released it a week ago. If you look at the Claude Code changelog, yesterday's Claude Code release mentions /goal.
>> Look at that.
>> So they've been playing catch-up, arguably, but very, very quickly. In a way, /goal is an evolution of the Ralph loop, I would say. Since January or so, people have been running Ralph loops, basically running the coding agent in a loop from the outside. The vendors have now brought that into the agent itself: the /goal command lets you provide a long-running, complex goal, and the harness itself ensures that it keeps going in a loop until a certain condition, a certain goal, is reached. We've got two links here to the respective documentation for both Codex and Claude Code. It's all brand new, and I haven't really experimented with it much, but it was obvious that something like this would be coming. Without some kind of orchestrating concept around your coding harness, you are limited, and you end up being a bottleneck yourself: you may start a long-running turn, but you still need to react to it. In most cases you don't get a one-shot deliverable; you have to come back and challenge the harness. I think that's what this new goal command is meant to do for you. But it's all based on you explicitly providing the validation step.
You don't get that for free.
So you still have to do some upfront work: you have to give the model a way to validate, that is, a way to determine whether or not your prompt has been successfully delivered on.
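The loop described above, a harness that keeps re-running the agent until a validation step passes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Codex or Claude Code implement /goal: `agent_step` and `goal_reached` are hypothetical stand-ins for "run one agent turn" and "the validation check you supply up front", and `max_iterations` is an assumed safety cap so a goal that is never met cannot loop forever.

```python
from typing import Callable, Tuple


def run_until_goal(
    agent_step: Callable[[str, int], str],
    goal_reached: Callable[[str], bool],
    prompt: str,
    max_iterations: int = 10,
) -> Tuple[str, int]:
    """Re-run a (hypothetical) agent step until a user-supplied check passes.

    Returns the final output and the number of iterations it took.
    Raises RuntimeError if the goal is not reached within max_iterations.
    """
    for iteration in range(1, max_iterations + 1):
        # One agent turn; in a real harness this would call the model and tools.
        output = agent_step(prompt, iteration)
        # The validation step: the part you have to provide up front.
        if goal_reached(output):
            return output, iteration
        # Feed the failed attempt back so the next turn can try to correct it.
        prompt = f"{prompt}\n\nPrevious attempt did not meet the goal:\n{output}"
    raise RuntimeError(f"goal not reached after {max_iterations} iterations")


if __name__ == "__main__":
    # Toy agent that only "succeeds" on its third attempt.
    def toy_agent(prompt: str, iteration: int) -> str:
        return "all checks passed" if iteration >= 3 else f"attempt {iteration}"

    result, turns = run_until_goal(toy_agent, lambda out: out == "all checks passed", "Build the site")
    print(result, turns)
```

The cap matters as much as the check: without it, a goal that can never be satisfied would burn tokens indefinitely.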
>> This, I think, is the trick of this whole system. You could tell an agent to build a website, but if there's no definition of success at the end of that prompt... I think we've probably been using agents and harnesses inadequately for a while now. What this does is take more of our rigor around engineering and app development processes and bring it closer to the agents. As I've learned to use agents, my first interaction was a chat window: I'd give it a random prompt and it would come back with some results. Now this is evolving into these looping experiences, where I want it to go build something, but then you need to check the build of what it created. Test-driven development is a good example of this.
Build a test for this thing, then make an API that meets the requirements of that test, and then you can add more tests. Test-driven development is becoming more front and center for our team and how we use agents: start by building the tests, and then figure out how to make the APIs pass them. So I really like this goal concept.
I might even compare it to a milestone or a requirement. It's really causing me to think a lot more about the expectation: what do I require it to deliver? By defining the outcome more clearly, I can let it chew a lot longer on internal states to figure something out. Have you used any of these features yet, Matias? Have you explored them?
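As a small illustration of the test-first workflow, here the tests are written before the code they check, and the same test run doubles as a success condition a goal loop could poll. `slugify` is a hypothetical example function chosen only to keep the sketch short; nothing here is specific to any agent product.

```python
import io
import re
import unittest


# Step 1: write the tests first. They encode the definition of success.
class TestSlugify(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("Agents, Goals & Loops!"), "agents-goals-loops")


# Step 2: write (or have the agent write) code until the tests pass.
def slugify(text: str) -> str:
    """Lower-case the text and join runs of letters and digits with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


def tests_pass() -> bool:
    """Run the suite quietly and report success; usable as a goal check."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSlugify)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print(tests_pass())
```

Adding a new requirement means adding a new test method; the goal check itself does not change.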
>> Well, not those specific features, because they've only just been released, but conceptually I've been in that space for a while now. My personal state of the art at the moment is to use different coding harnesses in conjunction. So that would be my big criticism, or the reason why I wouldn't actually use those features: they keep you in the same ecosystem. By definition, if you're using /goal inside Codex or /goal inside Claude Code, everything happens in Codex or in Claude Code. >> Yes.
>> I want to be able to combine different harnesses, particularly those two, arguably the strongest in the market today, and have them criticize each other, right?
So, architecturally, I explicitly want to run that loop outside of either of them, and that's what I've been doing, with some success actually. Of course, it means you need subscriptions or some kind of billing setup with different providers, and a much more complicated setup. So /goal within a single harness is definitely something that will solve problems for lots of people, I would say. >> What you describe reminds me of something. Let me rematerialize what you're describing in another way, just to check how I comprehend it.
>> Yeah.
>> Just because I built code with Claude doesn't mean Claude should be the one reviewing that code. I want a differently trained model, >> something with an outside perspective, that doesn't have the same weights or the same information in it. The large language model that's building the code might be a simpler model, GPT-5.4 or something like o3-mini. We've given it instructions and it built something, but the testing agent should be a different model on a different stack to give you a better perspective. And now that we can do this at scale, with a lot more automation in place, why not have two reviewers?
>> I'm going to build in o3-mini and review in Claude Opus, maybe Sonnet, maybe even GPT-5.5 or something like that. >> Or the other way around, which I think makes even more sense in a way. Why not have two implementers and one reviewer, and the reviewer picks which solution to go with? Right.
>> Yes, sure, exactly. And now that this is all programmable... This reminds me of a project from another gentleman on the internet whom I don't know personally but follow very closely: Matt Pocock. I think you know him as well. He's a really good educator, really diving into the AI space, and he has this project called Sand Castle.
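The "two implementers, one reviewer" arrangement, run from outside any single harness, might look like the sketch below. The implementers and reviewer are hypothetical callables: in practice each would wrap a different harness or model (for example, one Codex-backed and one Claude Code-backed), but no real CLI or API is invoked here. The toy reviewer simply executes each candidate and checks the result, standing in for a reviewing model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Candidate:
    implementer: str
    solution: str


def implement_and_review(
    task: str,
    implementers: Dict[str, Callable[[str], str]],
    reviewer: Callable[[str, str], float],
) -> Candidate:
    """Collect one solution per implementer, then let a single reviewer pick.

    The reviewer should be a different model or harness from the implementers,
    so it is not grading its own work. Ties go to the first implementer listed.
    """
    candidates = [Candidate(name, build(task)) for name, build in implementers.items()]
    return max(candidates, key=lambda c: reviewer(task, c.solution))


# Hypothetical stand-ins for two different harnesses.
IMPLEMENTERS = {
    "harness_a": lambda task: "def add(a, b):\n    return a + b",
    "harness_b": lambda task: "def add(a, b):\n    return a - b",
}


def toy_reviewer(task: str, solution: str) -> float:
    """Stand-in for a reviewing model: run the candidate and score it."""
    namespace: dict = {}
    exec(solution, namespace)  # acceptable for a toy; never exec untrusted code for real
    return 1.0 if namespace["add"](2, 3) == 5 else 0.0


if __name__ == "__main__":
    winner = implement_and_review("write add(a, b)", IMPLEMENTERS, toy_reviewer)
    print(winner.implementer)
```

Keeping the loop outside both harnesses is what makes the cross-checking possible: neither implementer sees, or grades, its own output.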
Have you heard of this Sand Castle?
>> Absolutely.
>> Yeah.
>> So it's a level above just a harness; he's doing a lot more here. He's sandboxing an entire agent, putting it in its own little Docker container, running some code in it, and then letting it interact. To your point, having multiple harnesses or sandboxes that you orchestrate together so they work as a unit is, I think, really important and very powerful.
>> All right, this is a really interesting concept. I also want to talk about the fact that Codex did it first.
I want to talk about how long it took Anthropic to add a similar feature. What are your thoughts on this one?
>> Yeah, totally. That's the other big one to unpack here. Obviously it's no coincidence that these things were released in such quick succession, right?
>> Yes.
>> Clearly these companies don't coordinate. So if you want to guess what could have happened here: it could have been the case that both companies were concurrently working on the same feature and Codex just happened to release it first. >> Sure.
>> What I think is more likely: given where the industry is at, it's pretty clear that this kind of feature was needed, and it was something engineering teams needed to evolve towards with their agents.
>> Yes. >> So it was conceptually there.
Codex productized it by making it a feature in Codex. Anthropic effectively had it already, and then they managed to productize it within days.
>> Yes. >> One, because they actually have an engineering stack now that allows them to ship that fast, and two, because the underlying architecture needed for the feature was already there. They just didn't think of releasing a feature called goal before Codex did. That's what I'm guessing happened here. And it's enormous to think that this kind of thing can happen in days nowadays.
>> Even if you missed the first news item about goals, all of a sudden the program you're using, which isn't the one that created the feature, gets the feature anyway. I think the concept here is that the speed of feature copying >> between different tools is now really high.
>> You can copy a feature into your product in a couple of days now. It's going to be very interesting to see how the software market handles this in general, for any program, whatever it is: Teams, Zoom, Salesforce. For any piece of software that ships something, you see a new feature appear, think "that's a pretty interesting feature, we want to make sure we have that," and you can build it. And maybe to your point, Matias, we don't see behind the curtain of what's happening on the development side at Anthropic, but they may well have had it already. I think they're dogfooding. From what I've seen in the past, they have everyone independently building features with Claude Code, things they find useful, and they immediately dogfood those items across the entire company.
Everyone has access and can start using them, and the items that get used most then turn into real features.
They productize them and push them out the door. The idea is that their whole company is the software development firm: anyone at any level can build a feature inside the product, and everyone gets it immediately. So you're probably right.
I think what happened here is that they probably already had a feature people were using, and they said, "Oh, wow,
Codex released this thing a week before us. We already have something similar on the shelf. Let's focus our attention on that feature for a little bit.
Put a couple of days of cycles on it.
Okay, now that our agent has refined it and made it better, let's push it out."
>> Mhm.
>> So that is really interesting to me. >> And the other reason it's so easy for them to turn things around so quickly: they don't have to worry about UI, right?
>> Those are TUI tools. Obviously there's a bit of UI now, but it's fairly simple and standardized. From a software development point of view, it's a huge advantage when you don't have to worry about a web UI and testing it. >> Just CLI, everything CLI >> these days.
>> So let's talk about the speed of things. I want to shift the conversation slightly to something else. Have you heard of a company called Cerebras? Cerebras AI is the company, and I'll put the link here in the chat window.
>> Matias, I stumbled across this in one of my feeds.
>> This is a chip company. It makes AI accelerator chips, an alternative to GPUs. So for scale, I'm going to throw a question at you, and if you don't know the answer directly, that's totally fine. When you run models, whether on Anthropic, from Codex, or locally on your machine, >> do you have a number? Do you know how many tokens per second are generated? Tokens aren't exactly the same thing as words, but my mind proxies the number of words in versus the number of words out as tokens.
>> That's probably not a great analogy, but that's how my mind thinks about it right now. Anyway, tokens per second is a thing.
>> And so my question to you is: with the models you use today, roughly how many tokens per second do you think you're getting?
>> Well, it's not a metric I generally look at. I'm only familiar with it when I run models locally, because LM Studio, for instance, which I generally use for local models, exposes tokens per second very nicely. There, depending on the model, I get somewhere between 60 and 100 tokens per second. I don't know about any cloud-hosted models.
>> Yeah. It must be a very different number there, I guess.
>> Yes. And now I'm actually a little more interested in the tokens-per-second number because of Cerebras, and I promise I'll get you to the meat of the conversation in a second.
>> I didn't really understand what was going on, so I've been going back to the debug mode in Copilot, >> and it tells you how many tokens were used and how long it took.
>> True.
>> Yes.
>> But it gives you time and tokens; it doesn't actually tell you tokens per second. So I'm doing some back-of-the-napkin math >> to work that out, right? Okay.
I'm going to literally pull up VS Code right now and see if I have a session with some debug output in it. But say it took 3 seconds and I ran 2,000 tokens, something like that. Or whatever the number is: 150 tokens, or it took 15 seconds to do something. Say I ran a thousand tokens and it took 3 seconds, right?
That's how I think they arrive at the tokens-per-second figure.
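The back-of-the-napkin math is just tokens divided by elapsed seconds. A minimal sketch using the two clearly paired examples mentioned above; note that the elapsed time in a debug log may also include time-to-first-token, network latency, and tool calls, so this gives a lower bound on the model's raw generation rate rather than the rate itself.

```python
def tokens_per_second(tokens: int, elapsed_seconds: float) -> float:
    """Throughput from a token count and the wall-clock time of a request."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return tokens / elapsed_seconds


if __name__ == "__main__":
    # (tokens, seconds) pairs taken from the examples in the conversation.
    for tokens, seconds in [(2_000, 3), (1_000, 3)]:
        rate = tokens_per_second(tokens, seconds)
        print(f"{tokens:>5} tokens in {seconds} s -> {rate:6.1f} tok/s")
```

This works out to roughly 667 and 333 tokens per second, well above the 60 to 100 Matias sees locally.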
Okay, so why do I bring this up? Cerebras AI. If you look at the homepage of their website, they're talking about blazing-fast inference powered by the world's fastest and largest processor ever. For comparison, if I've read the numbers right, an Nvidia GPU today, something like a B200, has around 22 billion transistors. By comparison, the Cerebras chip isn't a conventional single chip; it's an entire wafer.
When you manufacture computer chips, you build them up on silicon wafers, and a wafer is normally cut into many individual chips. Here, the entire wafer is the chip.
>> So this is a massive difference in scale.
And it's packing 1.2 trillion transistors, if I remember what I saw correctly.
>> So we're going from 22 billion to 1.2 trillion transistors to do the inferencing.
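The scale jump can be worked out directly. The first ratio uses the figures as quoted in the conversation. Published specifications differ from those recollections: Nvidia lists the B200 at 208 billion transistors, and Cerebras lists its current WSE-3 at 4 trillion, which gives the second, smaller ratio.

$$
\frac{1.2\times10^{12}}{22\times10^{9}} = \frac{1200}{22} \approx 54.5
\qquad\qquad
\frac{4\times10^{12}}{208\times10^{9}} = \frac{4000}{208} \approx 19.2
$$

Either way, a single wafer-scale part carries well over an order of magnitude more transistors than a single GPU package.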
>> This sucker is fast, and I'm floored by the performance. So I was trying to get my head around their whole MO, and think of this:
This is the token economy at scale.
When I look at the AI world in general, everything boils down to how fast I can run tokens through a system. It doesn't matter if you're Microsoft or running locally on your own machine; everything boils down to speed and access to cheap tokens. The whole system runs on this.
This is blowing my mind. You were just talking about 60 to 100 tokens per second. This sucker is pushing 1,800 tokens per second in certain scenarios.
So I was like, "No way. There's no way this could be true." I went into the program. I was so bought into this that I bought API access and threw some money down.
I was like, "I need to test this. What the heck is this thing doing?" So I got in.
I was really excited about this. I just threw some simple tasks at it.
Build me an HTML website with this, this, and this on it.
It built the whole site, the full HTML, over a thousand lines, in less than 3 seconds. I couldn't prompt it quickly enough. By the time I hit enter, the answer was appearing. You know how, when you work with an agent and send it a message, it starts writing text and putting tokens out for you? You scroll the window and can kind of see it thinking as it streams text back, and you watch it think through the text on the screen.
Zero of that. You can't scroll as fast as the thing is producing the code.
>> Mhm.
>> I was floored. One, it was very impressive to see how fast this thing could produce tokens, and I was using some pretty large models on their demo website. If you go to Cerebras AI and play with the website, the models they're providing are open-source models: Qwen, and a GLM model from Z.ai, which is not xAI, the company you'd think of today, but another open-source model provider. But these models are really big. They're hundreds of billions of parameters in size, so they're substantially large, and they're ripping tokens like there's no tomorrow. And so I thought to myself: we are seeing the acceleration of the entire AI field. To me, this is an inflection point. Say the models we were using before could produce 300 tokens per second, or whatever they produce on the big hosted systems. If you can double, triple, or 20x that, this is going to greatly change the entire span of the system. And I'm looking at this going, what does this mean for me?
>> What does this mean for what I can build?
>> If you can render an entire website in 3 seconds, do you even need a website? Do you just give the AI agent what you're trying to sell, and when a user shows up, based on the information it has about that user, it literally materializes the website out of thin air? None of it's real; it's always generated in real time, exactly from what users are clicking on.
>> This makes UI generation so fast that you're going to change what you build.
Anyway, I just want to pause there. What are your thoughts?
>> Okay, I'm going to challenge that a little bit, if I may. >> Right. Yeah. >> I would argue that real-world agents don't generally have token generation and token flow as a bottleneck at all.
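To make the speed difference concrete, the sketch below converts throughput into wall-clock time for a single response. The 5,000-token output size is an assumption for a single-page HTML site, not a figure from the conversation; the rates are the ones mentioned (about 100 tokens per second locally, a hypothetical 300 on hosted models, and the quoted 1,800 peak). It ignores time-to-first-token and network latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Pure decode time for a response, ignoring latency before the first token."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 5_000  # assumption: rough size of a single-page HTML site

RATES = {
    "local model": 100,
    "hosted (hypothetical)": 300,
    "quoted Cerebras peak": 1_800,
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label:<22} {generation_seconds(OUTPUT_TOKENS, rate):5.1f} s")
```

With these assumptions the same page takes 50.0, about 16.7, and about 2.8 seconds respectively: the difference between watching text stream by and having it appear almost as you hit enter.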
Real-world agents have value because of tool interactions. There's a lot of I/O on your file system, but most importantly I/O with external services. In fact, I would argue that a good agentic system is one that very rarely relies on inference. A good agentic system, particularly from a cost perspective, is one with a huge degree of determinism in it, where only certain aspects rely on inference, as in a call to an LLM, if and when absolutely necessary. Otherwise things can get very wasteful. We've talked about this on the podcast many times: consuming tokens is getting more and more expensive right now, and it's definitely not going in any other direction. So anyone is well advised to design an agentic system in such a way that the number of tokens sent back and forth is massively minimized. What you described may well be really impressive, but the scenario you described was: you give a prompt, and it creates something for you out of thin air. Basically a greenfield project; you get something produced out of nothing. In reality you have brownfield projects, in the sense that you want to do an iteration, or you want it to perform some actions on your emails, or on some other inputs in a messaging service or similar.
So this is where agents have to interact with the real world, and that's where you cannot speed things up at all using hardware. As I'm thinking about that, one very interesting research paper comes to mind, which I looked at recently. Remember a while ago when Anthropic had this massive, presumably unintended leak, where they accidentally published Claude Code together with its source maps? That meant loads of people out there were able to completely reverse-engineer the Claude Code harness. There's a research project that came out of that, published in early April, almost exactly a month ago today. The research paper is published as a GitHub repository called Dive into Claude Code; very interesting, and I can absolutely recommend it. One of its headlines is fascinating: they say 1.6% of the Claude Code codebase is AI decision logic and 98.4% is deterministic. That 98.4% is the non-AI harness that makes Claude Code what it is and makes it as good as it is. And this is what real-world agentic systems should look like. If anything, that's proof that we need to invest much more in harnesses, much more in agent design outside of the inference model and outside of prompting. And it just happens to be something I'm very deeply invested in right now. I've been doing a lot of work around different harnesses, hint hint. So it's definitely something I very much believe in, because I've experienced it.
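A minimal sketch of the "mostly deterministic, inference only when necessary" design described here: cheap rules handle the cases they can, and a model is called only as a fallback. The email-subject rules and the `llm` callable are hypothetical; this illustrates the principle and is not a reconstruction of how Claude Code is structured.

```python
from typing import Callable, Dict, List, Tuple

# Deterministic rules: cheap, predictable and testable. (Hypothetical examples.)
RULES: List[Tuple[Callable[[str], bool], str]] = [
    (lambda subject: subject.lower().startswith("invoice"), "billing"),
    (lambda subject: "unsubscribe" in subject.lower(), "newsletter"),
]


def classify(subject: str, llm: Callable[[str], str], stats: Dict[str, int]) -> str:
    """Try every deterministic rule first; fall back to inference only if none match."""
    for matches, label in RULES:
        if matches(subject):
            stats["rule"] += 1
            return label
    stats["llm"] += 1  # the only path that costs tokens
    return llm(subject)


if __name__ == "__main__":
    def fake_llm(subject: str) -> str:
        return "needs-human"  # stand-in for a real model call

    stats = {"rule": 0, "llm": 0}
    inbox = ["Invoice #42", "Weekly digest - unsubscribe here", "Can we move our meeting?"]
    print([classify(s, fake_llm, stats) for s in inbox], stats)
```

Every case a rule absorbs is a model call that never happens, which is where the cost gap between well- and poorly designed agentic systems comes from.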
>> So I agree with your point, and let me be clear: harnesses are what make large language models sing, bar none.
And I love the paper you're presenting here. I've heard other statistics, like 60% of any good large language model product is the harness, and it's probably even more now, with these looping and deterministic systems built on top. I think what you're describing is that the deterministic system you put on top of the large language model guardrails it, so you get better, more accurate results out of it. It's kind of a wild animal, and you've got to wrangle it to some degree; that's where skills and harnesses come from. But where my mind goes is that anytime you have a leap in the speed of anything, brand-new businesses, solutions, and technology build on top of it. Let me pull out another analogy here.
Google saw speed as an advantage too. Google decided: look, if we want more people using Google, we need to provide faster, cheaper internet for everyone across the US and in many more areas. So Google decided to invest in fiber for many more cities across the US, and that drops the price of internet access. When speed jumps an order of magnitude above everything else, it spawns more video usage, more YouTube usage, and more applications, and applications get bigger. So anytime we see a substantial improvement in some technology stack, I think we get things we don't know about yet; it generally drives new innovation in that space. We're seeing that kind of shift here. So, to your point, I still think dollars per million tokens is the right story here. When you look at their documentation, they say they can reduce the cost of token generation by 30% or something like that. That's just the chip: how many million tokens I can produce for a certain amount of dollars. But there are a lot of other costs, and data center providers will tell you about this: it's not just the chip and the electricity to run the tokens, it's the power grid, the cooling systems, and the building you've got to put it all in. Anytime you can consolidate that into a more compact system or package, it gets better. I don't know if you saw the recent announcement, but I believe Nvidia just announced a satellite inference chip module. Did you see this one?
>> No.
>> I believe Nvidia just announced an inference chip that you can send up in satellites.
Well, this sounds just like SpaceX trying to put data centers in the cloud, 100%. It sounds like exactly what they're going to do: move all this inference into space, into satellites, where you have unlimited power because of the sun and unlimited cooling because you're in space. Two of these really big problems look like they're getting solved by Nvidia building a new chip. And I'm like, "Yeah, I can see the writing on the wall on this one. This is clear as day to me.
You're trying to build data centers in space." But this is what we're walking into. I see this as the first couple of steps; over the next couple of months we're going to see companies really doubling down on inference speed, dropping the cost per million tokens, and making it more efficient. Because to your point, Matias, the harness runs on my local machine, >> typically, >> or in a VM or something like that.
>> That's a fixed, known cost. I bought the computer on my desk; running the harness costs me zero new dollars other than electricity. The only expensive variable in this system right now is the large language model cost. That's the expensive piece. I could run a harness all day long with no large language model, and it might cost me 30 cents, a dollar, maybe $5 in electricity. It's nothing. So the deterministic side of software is cheap.
The non-deterministic side of software is expensive. I keep coming back to this whenever I talk to people at Microsoft or to PMs and try to communicate it.
I say: we need the non-deterministic nature of large language models to build more deterministic software for us, because that's what runs cheaply and effectively. The hard part is getting the software written, but then the cheap part is running it on systems that are well known and understood.
Does what I'm describing make sense?
>> So I agree with your point, but I also want to expand the idea slightly into this new realm. We're seeing a gear shift: we're accelerating into another gear, into this new world.
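The argument above, pay for inference once to produce deterministic code and then run that code almost for free, reduces to simple arithmetic on a price per million tokens. All prices and token counts below are illustrative assumptions, not quotes from any provider, and the marginal cost of running the generated code is treated as zero, in line with the point that the harness side costs little more than electricity.

```python
PRICE_PER_MILLION = 10.0    # assumption: USD per million tokens
TOKENS_PER_REQUEST = 2_000  # assumption: tokens if a model handles each request
CODEGEN_TOKENS = 200_000    # assumption: one-time tokens to have a model write the code


def token_cost(tokens: int, usd_per_million: float = PRICE_PER_MILLION) -> float:
    """Dollar cost of a number of tokens at a given price per million tokens."""
    return tokens / 1_000_000 * usd_per_million


def inference_every_time(requests: int) -> float:
    """Non-deterministic path: pay for tokens on every request."""
    return requests * token_cost(TOKENS_PER_REQUEST)


def generate_once(requests: int) -> float:
    """Deterministic path: pay once for code generation, then run it for ~free.

    `requests` is unused on purpose: in this model the cost does not grow with volume.
    """
    return token_cost(CODEGEN_TOKENS)


BREAK_EVEN_REQUESTS = CODEGEN_TOKENS / TOKENS_PER_REQUEST

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(f"{n:>6} requests: per-request ${inference_every_time(n):8.2f} "
              f"vs generate-once ${generate_once(n):.2f}")
    print("break-even at", BREAK_EVEN_REQUESTS, "requests")
```

With these assumed numbers the two approaches cost the same at 100 requests; beyond that, every additional request widens the gap in favor of the generated deterministic code.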
>> Well, it reminds me a bit of when multi-core CPUs appeared, decades ago. Exactly the same kind of shift: you're suddenly integrating multiples of the same compute capability into the same physical space. It sounds like exactly the same pattern here, and that always comes with cost efficiencies in terms of energy consumption, production, all of that. So yeah, totally. I didn't mean to argue against that at all. I was just trying to say there is a lot we can do from a cost-control point of view as we build systems, and there are probably orders of magnitude in cost between well-designed and poorly designed agentic systems. You brought up a CFO story a few weeks ago on the podcast, and we said this is definitely going to become more and more important moving forward. So there is still huge variety in that space. >> I believe that was the CTO of Uber, saying they had so many agentic experiences being run for developers that they burned through their entire R&D budget in the first quarter of the year. I think it was somewhere on the order of three billion dollars of research and development spend, and it was all spent in the first quarter. So I sent you a couple of links, Matias, and also put them in the chat here: the cloud data center is literally in the clouds now. We're talking clouds and data centers.
>> Amazing. >> So Nvidia announced that it is launching space computing, rocketing AI into orbit.
Clearly this is SpaceX and Nvidia working together to build stuff in the cloud here. One of the compute units they're trying to push into space is the Vera Rubin platform, the next generation of AI inferencing, now launching into space and into satellites. This is really exciting from my perspective. The whole purpose is that nobody wants a data center in their backyard. I've heard so many bad stories. People don't want the higher electricity costs.
They don't want the noise, and data centers use a lot of clean water. There are a lot of challenges with data centers showing up in your area, and right now it feels like they're being built everywhere across the US. >> Mm-hmm.
>> So I think the next mission, and a possible goal here, is to start shoving these AI inference machines up into space. At the end of the day, all of this feels really good to me, because it's going to drive down the price of tokens, which will make it more affordable for me to run inference all day long to help me build more of these deterministic systems.
By the way, I just shared with you the GitHub link to the research paper I talked about earlier, so hopefully you can make it available in the show notes as well.
>> Let me put that in the show notes, and I'll also add the company. It's VA Lab, I guess, is the name of the company.
>> Uh, yes. >> If I pronounce it right.
>> And so VA Lab is talking about, again to go back to your article from earlier, the idea that the majority of an agent's success depends on how well the harness wields it.
Right? That's really the thesis of the paper.
>> Correct. The tools you give to the agent, the way the agentic loop is orchestrated, what it does with your prompts, whether and how it compresses your chat history: there's an infinite number of possible optimizations there, which is also why we have such a proliferation of harnesses out there. If you go out, as I have done recently, and look at them all in a comparative way, your job is almost never done.
>> Mhm.
>> But it just shows that this is an area that requires a lot more research, innovation, and experimentation nowadays. No one harness is the perfect harness for every single use case. There's an argument to be made that in certain domains, or for certain applications, a dedicated harness may well be the right choice.
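The harness levers listed above (tool dispatch, loop orchestration, chat-history compression) can be sketched as a tiny agentic loop. This is a minimal illustration under stated assumptions, not any real harness's code: the model is a plain callable, and the `CALL <tool> <arg>` reply convention and the "keep the last N messages" compression rule are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def compress_history(history: List[Message], keep_last: int) -> List[Message]:
    """Keep the system prompt plus the last `keep_last` messages; stub out the rest."""
    system, rest = history[:1], history[1:]
    if len(rest) <= keep_last:
        return history
    dropped = len(rest) - keep_last
    summary = Message("system", f"[{dropped} earlier messages compacted]")
    return system + [summary] + rest[-keep_last:]


def run_agent(model: Callable[[List[Message]], Message],
              tools: Dict[str, Callable[[str], str]],
              prompt: str, max_steps: int = 10, keep_last: int = 6) -> str:
    """Loop: call the model on compressed context, run requested tools, stop on a final answer."""
    history = [Message("system", "You are a coding agent."), Message("user", prompt)]
    for _ in range(max_steps):
        reply = model(compress_history(history, keep_last))
        history.append(reply)
        # Hypothetical convention: "CALL <tool> <arg>" requests a tool; anything else is final.
        if not reply.content.startswith("CALL "):
            return reply.content
        _, _, rest = reply.content.partition(" ")
        name, _, arg = rest.partition(" ")
        result = tools[name](arg) if name in tools else f"unknown tool: {name}"
        history.append(Message("tool", result))
    return "stopped: step budget exhausted"
```

Each lever is a separate design decision here: swapping `compress_history` for a summarizing strategy, or changing the step budget, changes cost and behavior without touching the model.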
>> That's interesting to me. Matias, if I had to take your comment and rematerialize it into another thought, is this the idea of hyper-personalized harnesses? In the same way we have a general-purpose compute unit, the CPU, and then we specialized that into graphics processing units, and now we're building inference processing units: the more we specialize on a specific thing, the more we can tune the performance of the harness, the system, or the GPU to do something very specific, and the more we hone in on that, the higher the performance we can get out of it. I think this is the same principle Apple uses. Why do they build their own silicon? Why do they build their own chips? Because they can build chips that pair very well with their software, and the two run very efficiently together.
There's no extra waste. When we go back to the CPU in your computer, it has to compute a lot of different things in general. It's serving so many different purposes.
It's not really good at any one thing, but it does everything kind of okay.
This seems like the same principle here with harnesses.
>> You can build a general harness that does everything okay, >> but nothing really, really well.
>> And so, tuning the harness for the workload.
Here's a loop. Here's a Ralph loop, or here's a goals loop. Let's build a harness that just handles that experience. I need a different harness for interacting with my work data; this is maybe where Cowork comes in, right? That's a slightly different harness. There's a CLI harness, which is a different way to work on something. I've also experienced this a bit with GitHub Copilot, because I'm juggling between the CLI, VS Code, and the web browser. Those are three separate harnesses. They may be similar in some ways, but each implements the harness slightly differently.
>> Mm.
>> And obviously, all the harnesses that are generally available nowadays offer you extensive ways of customizing them. You give them context files, things like AGENTS.md or CLAUDE.md, or other agent instruction files; that's a way for you to customize the agentic experience. Then you give them tools, skills, and MCP servers. All of that ultimately impacts what the agent has access to and how it's grounded. But what you cannot influence at all as a general user is the fixed system prompt that comes with the harness.
>> Mhm.
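A goals loop (or a "Ralph" loop) as described here keeps re-running an agent until a predefined condition is met. The sketch below shows only that loop shape under stated assumptions; it is not how Codex or Claude Code implement their goal commands. `attempt` and `goal_met` are hypothetical stand-ins for one agent run and for a deterministic completion check such as "the test suite passes".

```python
from typing import Callable, Tuple


def goal_loop(attempt: Callable[[int, str], str],
              goal_met: Callable[[str], bool],
              max_iterations: int = 20) -> Tuple[bool, int, str]:
    """Re-run an agent attempt until a deterministic check passes or the budget runs out.

    `attempt(iteration, feedback)` stands in for one agent run (hypothetical);
    `goal_met(output)` stands in for the predefined completion condition.
    """
    output, feedback = "", ""
    for iteration in range(1, max_iterations + 1):
        output = attempt(iteration, feedback)
        if goal_met(output):
            return True, iteration, output
        # Feed the failed check back so the next attempt can build on it.
        feedback = f"iteration {iteration}: goal not met, last output was {output!r}"
    return False, max_iterations, output


# Toy stand-ins: each "attempt" fixes one more of three failing tests.
result = goal_loop(lambda i, fb: f"{max(3 - i, 0)} failing",
                   lambda out: out.startswith("0 "))
print(result)  # (True, 3, '0 failing')
```

The check is deliberately ordinary code rather than a model call, which is what makes the stopping condition predictable; the iteration cap keeps a goal that is never met from burning tokens indefinitely.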
>> Nor can you change the built-in tools, the very fundamental things like read, grep, write, or make a to-do, or what kind of memory capabilities there are, if any. That's definitely something lots of modern harnesses have and use extensively. So this is where harnesses sometimes differ significantly, and this is where your ability to tweak them is limited. That's where you either have to make choices, or you ultimately decide, I'm building my own, because I really want to tweak my agentic experience much further than I can just by providing agent instruction files or skills files, right?
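The built-in tools named above (read, grep, write, a to-do list) are simple primitives. As a hedged illustration of how small they can be, here is a minimal local version of each; the function names mirror the tools mentioned in the conversation, but the signatures and the in-memory to-do list are assumptions, not any harness's actual tool definitions.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical in-memory to-do list; real harnesses may persist this differently.
TODOS: List[str] = []


def read(path: str) -> str:
    """Return the full text of a file."""
    return Path(path).read_text(encoding="utf-8")


def grep(pattern: str, path: str) -> List[str]:
    """Return 'line_number: line' for every line matching a regular expression."""
    regex = re.compile(pattern)
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [f"{n}: {line}" for n, line in enumerate(lines, start=1) if regex.search(line)]


def write(path: str, content: str) -> str:
    """Overwrite a file and report what was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} chars to {path}"


def todo(item: str) -> List[str]:
    """Append an item to the to-do list and return the current list."""
    TODOS.append(item)
    return list(TODOS)


# A fixed registry like this is the part a general user usually cannot change.
BUILTIN_TOOLS: Dict[str, Callable] = {"read": read, "grep": grep, "write": write, "todo": todo}
```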
So...
>> Wow. Okay. I never thought of it that way, but all of these skills and MCP servers are just parameters you're using to customize the harness. I didn't think of it that way, but that's a really good mental model for unpacking what a harness does, right? The harness is simple. It does some things.
This new word, "opinionated", seems to be really popular now. >> Yeah. >> An opinionated approach to how you use the model, an opinionated approach to how the harness works, and then these other features are all parameters.
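The "opinionated harness plus parameters" mental model can be expressed as a small data model: a frozen set of defaults the user cannot change, and a separate set of user-supplied parameters. This is a sketch of the idea only; the default system prompt, the skill and server names, and the `skill:`/`mcp:` prefixes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HarnessDefaults:
    """The opinionated part a general user cannot change (hypothetical values)."""
    system_prompt: str = "You are a coding agent. Use tools and verify before finishing."
    builtin_tools: Tuple[str, ...] = ("read", "grep", "write", "todo")


@dataclass
class UserParameters:
    """What a user can tune: context files, skills, and MCP servers."""
    context_files: List[str] = field(default_factory=list)  # e.g. AGENTS.md, CLAUDE.md
    skills: List[str] = field(default_factory=list)
    mcp_servers: List[str] = field(default_factory=list)


def effective_setup(defaults: HarnessDefaults, params: UserParameters) -> Dict[str, object]:
    """Combine the fixed defaults with user parameters into what the model actually sees."""
    return {
        "system_prompt": defaults.system_prompt,
        "instructions": [f"<contents of {p}>" for p in params.context_files],
        "tools": list(defaults.builtin_tools)
        + [f"skill:{s}" for s in params.skills]
        + [f"mcp:{m}" for m in params.mcp_servers],
    }


setup = effective_setup(
    HarnessDefaults(),
    UserParameters(context_files=["AGENTS.md"], skills=["semantic-model-review"],
                   mcp_servers=["github"]),
)
```

Freezing the defaults mirrors the point made in the conversation: everything a user supplies is layered on top of choices the harness has already made.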
Interesting. I didn't think of it that way. All right, it's been really great talking. A lot of cool stuff came out this week.
We're going to continue this Agentic Thinking experience. Matias has some more demos coming up. Right now, we're working through a github.com interaction with agents: we are making issues, we're changing our semantic model, and we will continue modifying our semantic model using our agents. We were going through different ways you can customize your harness >> Indeed. >> in order to get better output for your semantic models. So I believe on Friday we're going to continue down this path a bit further. I think we got through two of the different ways you can customize your harness.
>> Yeah, we've got two more to go. We're going to keep refining the output so we get less of an intern building semantic models and more of an expert, or at least move toward that expert level of modifying, changing, and working with semantic models.
>> Anything else Matias you want to close off on?
>> No, nothing at this point.
>> It's an exciting world we live in right now. This is really neat. Stay tuned for more information and content around this. Also, if you like this kind of content and going deeper, with the news on Tuesdays and the demos on Fridays, or if you want more on the things you're working through, unpacking, or struggling with, let us know in the comments on the various social media platforms. We're looking to tailor this show and/or program toward what you're interested in learning about. Thank you very much, Matias. I appreciate your time today. Thank you all for listening and watching, and we'll see you next time.
>> Yep. Thanks everyone.