As AI agents proliferate in enterprises (some companies are running 60 to 100 or more agents), an agentic control plane becomes essential for managing probabilistic software systems whose behavior is hard to predict. Unlike deterministic software, AI agents require continuous observability, evaluation, and optimization through a virtuous cycle of monitoring, assessment, and improvement. Control planes provide identity management, policy enforcement, kill switches, and observability to prevent rogue deployments and ensure safe operation. This infrastructure layer remains necessary no matter how advanced AI models become, because probabilistic systems cannot be fully trusted without deterministic oversight mechanisms.
Deep Dive
Agent control planes & OpenAI model solves Erdős
We actually have a saying that building agents is easy. Everything else that comes after is hard.
You've built it. Now it's a problem. You have to manage it. All that and more on today's Mixture of Experts.

I'm Tim Hwang and welcome to Mixture of Experts. Each week, MoE brings together a group of the sharpest thinkers working in artificial intelligence to walk you through the week's news. On this week's episode, we've got Mihai Criveti, Distinguished Engineer and Chief Architect, watsonx Orchestrate; Olivia Buzek, Staff AI Engineer; and Akash Srivastava, Director and Tech Lead for AgentOps, IBM Core AI, and PI at the MIT-IBM Lab. Welcome to you all. Thanks for joining. We've got three big stories to cover today. We're going to talk a little bit about the Erdős problem that was solved by OpenAI. We're going to talk a little bit about an interesting study on frontier risks from AI from a group called METR. But first, we want to talk about AgentOps and the agentic control plane. There's been a bunch of work happening that Mihai and Akash have been involved in around AgentOps, and specifically the watsonx agentic control plane. And so, Mihai, maybe let's start with you. I guess I was promised in our agents future that agents would kind of take care of themselves, that I'd just be able to say, I need a new app, go program it for me, and it would just do that. I guess the first place to start is that you're working on something called the agentic control plane. Why do we even need a control plane for agents in the first place?

I think that's a great call out. And really, the need comes from the fact that every customer I talk to now has 60 or 100 random acts of AI within the organization. Agents exploded from POC to production across every line of business. They've exploded without any kind of governance, safety, trust, observability, or identity across different business units. And we're starting to see regulatory pressure as well.
Things like the EU AI Act are starting to kick in. Costs are starting to spiral out of control, and customers are starting to ask, hey, we've got these 50, 100 to 200 agents, but we don't even know where they are. What can we do about it?

Yeah, that's great. And so how does the control plane work? I guess I'm envisioning something like a dashboard that lets you see all the agents and what they're doing. Is that what you're trying to get at with this release?

I think it's more similar to what you would see in Kubernetes. We've had this problem before, right? Everybody was installing Docker and building and deploying containers everywhere, and you had no idea where your container was, how it was running, how it was secured, managed, observed, and monitored, how it would behave in case of an outage, or how it would migrate between different nodes. Kubernetes solved those problems for containers. So the concept of a control plane is borrowed from Kubernetes: you have a control plane that defines the identity of agents, inline policy enforcement, observability, and the life cycle, and a data plane, which is what agents execute: the LLM calls, the tool calls, things like MCP and x-ray, and results and structured output. And this is all wrapped in observability, evals, and optimization.

Okay. So maybe I can bring you into this discussion. I love this headline that got sent along to me. There's a website called SiliconANGLE that covered this release, and the headline literally read: IBM built a control plane for AI agents and it looks, quote, "weirdly useful." Why is it weirdly useful versus merely useful?

Yeah. So, Mihai put it the right way. The right way to think about what is happening is to connect the dots with Kubernetes. And I'll make it even more general.
Agents, in some sense, are just probabilistic software, and we know how to develop software and manage its life cycle. So this goes back to the SDLC, the software development lifecycle. But now what you're really saying is, well, some components of this software are probabilistic. They are sampled from a model, and sometimes they will behave in a certain way.
Sometimes they will behave in a different way. Of course, probabilistic doesn't mean completely random. I think there's typically this idea that if it's not deterministic, I can't do anything about it, but that's not really the case. So what we have been doing with this control plane, as I think I also pointed out in the op-eds, is to think specifically about the SDLC and which components change. Think about how you used to test your software with unit tests. You have a CI pipeline and so on, right? By themselves, those are definitely needed, but not enough, because now the behavior changes from run to run. So you have to become a bit of a statistician and say, well, we know how to handle this in statistics. The concept of evaluation becomes almost obvious: instead of running it once, I'm going to run it many times and look at the expected behavior. And this is why you need observability in the first place, because you need to be able to see everything this software is generating. Again, not a new concept in software. Good software already has telemetry. OpenTelemetry is a common framework, and the same framework has now evolved to give you all the exhaust. So you take that exhaust and put it to use to create evaluation harnesses, which now become part of your CI. And then you are at a place where you're saying, okay, wait a second, I now know what works and what doesn't, when it doesn't work and how often it doesn't work. What can I do with it? Can I solve something? This starts a loop. In a traditional software development lifecycle, somebody will look at it, file a bug report, and fix it. With agents, we can do something better. We can, in fact, use agents in a different way to optimize, bugfix, and improve them.
And this is the third pillar of AgentOps, the optimization bit. So this virtuous cycle starts: you observe what they're doing, you evaluate and understand what works and what doesn't, and then you use that data to fix it. And this keeps going. So I can't really say why it is weirdly useful. I think it's very much useful. It's extremely useful.
But it's definitely a new way of thinking about software. So maybe that's where it comes from.
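To make Akash's point about running a probabilistic component many times and looking at its expected behavior concrete, here is a minimal Python sketch. It assumes a hypothetical `flaky_agent` standing in for an LLM-backed agent (a seeded random stub, not a real model call) and summarizes repeated runs with a Wilson score interval, one common choice of confidence interval for a pass rate. Nothing here is the watsonx AgentOps evaluation API.

```python
import math
import random
from typing import Callable


def run_trials(agent: Callable[[str], str], task: str,
               check: Callable[[str], bool], n: int) -> int:
    """Run the same task n times and count how many outputs pass the check."""
    return sum(1 for _ in range(n) if check(agent(task)))


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical stand-in for an LLM-backed agent: right answer about 80% of the time.
rng = random.Random(42)


def flaky_agent(task: str) -> str:
    return "42" if rng.random() < 0.8 else "41"


passes = run_trials(flaky_agent, "What is 6 * 7?", lambda out: out == "42", n=200)
low, high = wilson_interval(passes, 200)
print(f"pass rate {passes / 200:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

In a CI setting the same idea can become a gate, for example failing the build when the lower bound drops below an agreed threshold; that threshold is a policy choice the sketch does not prescribe.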
Right. Well, I do want to ask a little bit about AgentOps as a discipline. Part of this is not just, oh, you need an agentic control plane, it's kind of like Kubernetes, but for agents. It seems like what you're saying is that there's going to have to be a new kind of approach to ops to manage this technology. I'm curious about your prognostication about how this looks within an enterprise. Will you have an AgentOps team that just handles AgentOps in the future, or will it be folded into ops generally? Just curious about how you think this will be operationalized.

Yeah, that's a great question, because if you look at what is really happening in the industry, everybody sort of realized that their grandparents can create agents. But when it comes to making them work and scale in the enterprise, that's where the biggest struggle is. And again, not a different story. This is the same with any enterprise software. So the opportunity in the future, and when I say the future, it's happening right now: most of this is a cottage industry of startups, of course, because they realized very early on that there's a big opportunity in figuring out how to mold and transform the SDLC.
How do you create tools? How do you automate this process? Because, again, the reason it is more tool-heavy and less people-heavy is that it requires specialized knowledge. We're talking about evaluation. It's not your typical software-engineering-based testing, and it's not part of the usual curriculum. Same thing with optimization. This is where some of the frontier research happens: how do you make these things self-evolve, how do you make them fix themselves? And so what is really happening is that, both in established enterprises like ourselves and in startups, a lot of tools are being created and a lot of research is moving to production, where people are trying to understand how to equip the existing software workflow folks with tools that elevate them and allow them to manage agents using the same SDLC setup they are used to.

Olivia, I'm curious about where you think this all goes, ultimately. In my very AGI-pilled sort of way, I'm like, at some point, does the control plane itself just become an agent? Shouldn't we just have an agent managing agents? Is that where we're headed? How much do you think the control plane is temporary, and the models will eventually get good at doing this themselves? Or is this going to be a permanent feature, where once you have lots of different kinds of people running software, you need an ops layer on top of it?

I've actually gotten that question a lot. Recently I was talking about Context Forge at Open Source Summit (Mihai is basically the architect of Context Forge), and when I was discussing it, a lot of people were just like, well, surely the MCP protocol is going to handle all of this?
And I was like, okay, to a certain degree, yes. I do believe that as the MCP protocol continues to evolve, we're going to see a lot of things getting taken care of. Auth is probably going to be a lot more standardized. We're going to see a lot more standardization in the types of data that transfer and the shapes that data takes. But at an organization level, fundamentally, the problem with agents is, as Akash said, that they are probabilistic. As Mihai said, people are running around with these random acts of AI. We've been worried about threat models for years within corporations, right? Within an enterprise, we're constantly concerned about whether there are rogue actors trying to get ahold of our data. What happens when those things are probabilistic machines? I don't care if you think they are going to be generally intelligent someday or not. They're still not controllable.
They're still not trustworthy enough that we can simply let them do whatever they want with our most critical data. And so we need to make absolutely certain that we are feeding data to them in ways that are controlled and auditable, where we know where it went, we know who touched it, etc. That's always going to be essential, regardless of whether we get to smarter and smarter agents.

I think part of it is also who watches the watchers, right? You've got the evals and you've got the observability that watches your agents. Now you're using agents to watch other agents. So who watches the agents that watch the agents that watch the other agents? At some point you need some determinism in the whole flow. So things like the kill switches, the policy, the policy enforcement: you want them to be as deterministic as possible, right? Take PII filtering. Yes, you might use a small language model to identify the PII. But the decision to trigger and filter the PII needs to be deterministic. The same goes for the decision to pull the plug on an agent, and for the way you record cost for billing, metering, and chargeback. Software is still software, and the AI- and LLM-driven parts are a small part of a much bigger architecture.

Yeah, I just wanted to circle back to the point you made: will the control plane itself be an agent? In fact, that is kind of the point. The vision we're driving toward is that quite a significant part of it, especially the interaction modes with the user, will be agentic, primarily because, and this goes back not just to the need for natural-language-based interactions, UI and UX are fundamentally changing, right?
Whatever information you really need from your observability dashboard, pre-programming all of it is really hard. We're talking here from first-hand experience: you really can't predict what the user may want to know. Maybe they want to know, hey, what was the cost of the third step in the 14th trace that was generated today? And yes, if you hire a best-in-class UX/UI developer, they can probably predict this. But we have to factor in the time it takes to then build it. So maybe this is sort of the answer to your question: will this be agentic? It is agentic. I think the point that Olivia and Mihai are bringing up is that, on top of that, a lot of harnessing has been done to make sure that critical decisions, like the PII example, are still managed, programmed, and policy-enforced by expert humans.

I think it's worth understanding how easily information leaks via agents, especially when you're adding observability into these systems. Let's imagine that you have a health care system of some kind, and you are bringing in a whole bunch of data from patients. There's lots of PII in there. There's lots of PHI in there.
There are all kinds of things. Now you introduce observability. If you don't have PII filtering and PHI filtering in the loop, all of a sudden every single person who can see what is passing through that LLM now has exposure to that patient data. Are they supposed to? It's really unclear. And people want to solve highly, highly sensitive problems with these agents. If you're going to do that, you're going to need something to look at all of the pieces.
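Mihai's rule that a model may detect PII but the decision to filter it must be deterministic, and Olivia's point that observability itself can leak patient data, fit together in one small Python sketch. Everything in it is hypothetical: `detect_pii` is a regex stand-in for the small language model Mihai mentions (it only recognizes the US Social Security number shape), and `REDACT_THRESHOLD` is an assumed policy value. What the sketch shows is the shape: a probabilistic score goes in, a fixed rule decides, and redaction happens before anything reaches the trace sink.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    start: int
    end: int
    label: str
    score: float  # detector confidence in [0, 1]


# Hypothetical stand-in for a small-language-model PII detector. Here it is just
# a US-SSN-shaped regex with a fixed score; a real detector would be a model.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect_pii(text: str) -> list[Finding]:
    return [Finding(m.start(), m.end(), "US_SSN", 0.97) for m in SSN_RE.finditer(text)]


REDACT_THRESHOLD = 0.5  # assumed policy value, set by a human, not by the model


def redact(text: str, findings: list[Finding], threshold: float = REDACT_THRESHOLD) -> str:
    """Deterministic decision: every finding at or above the threshold is masked."""
    out = text
    # Replace from the end of the string so earlier offsets stay valid.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        if f.score >= threshold:
            out = out[:f.start] + f"[{f.label}]" + out[f.end:]
    return out


def emit_trace(event: str, sink: list[str]) -> None:
    """Redact before the event ever reaches the observability sink."""
    sink.append(redact(event, detect_pii(event)))


sink: list[str] = []
emit_trace("Patient record 123-45-6789 retrieved by tool lookup_patient", sink)
print(sink[0])  # Patient record [US_SSN] retrieved by tool lookup_patient
```

Swapping in a jurisdiction-specific detector, as in Mihai's later US, Ireland, and Germany example, would change only `detect_pii`; the deterministic decision in `redact` stays the same.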
Maybe a final comment for you, and then we'll move on to the next story. I understand the watsonx control plane is entering a market where there are a lot of players all of a sudden. I think I saw announcements from ServiceNow, Microsoft, and Google Cloud. It feels like lots of people. And as was mentioned earlier, there's a bunch of startups in this space as well. I'm curious if you have any thoughts on how the competition around control layers is going to play out. Who has the strengths? Who has the weaknesses? When someone goes to choose an agentic control layer, what's going to be the thing that flips the market?

Yeah, I think we've done extensive competitive research on this, and we've used pretty much all of the other platforms as well.
What we constantly receive as requirements from our customers is the ability to run air-gapped, the ability to run isolated, the ability to run hybrid. And this is where I believe watsonx Orchestrate really differentiates itself as a platform. We have the ability to run on premises. We have the ability to run hybrid. We have the ability to put different components of the platform on different hyperscalers, leveraging our OpenShift deployment model. But we also come in with other differentiated points of view. For example, we have the ability to import agents that you've already developed. If you've written your agents in LangGraph and you just want to run them on the Orchestrate platform, or if you've developed your agents, have them running elsewhere, and want to connect them using things like gateway, MCP, and OpenAI-compatible endpoints, we have that capability. So where we really see differentiation in this space is platforms that provide what customers are asking for in terms of isolation, compliance, and security; the ability to leverage open standards both in and out of the platform, so MCP, gateway, OpenAI, and OpenTelemetry, as Akash mentioned; and the extent and capabilities of the built-in AgentOps. That means the quality of the evals, the quality of the metrics, and the ability to bring your own: your own evaluation, your own metric, your own guardrail. Take PII filtering: you may want to do it differently. A Social Security number in the US might look completely different from the equivalent identifier in Ireland or in Germany. You may want to be able to customize those components and capabilities.

Yeah, that's great. Well, we're definitely going to come back to this, because the space is evolving. I think it's just so interesting. We're now almost in phase two, right?
Everybody is impressed with agents. Now we're trying to figure out what to do about them.

Yeah. So we actually have a saying that building agents is easy. Everything else that comes after is hard. You've built it. Now it's a problem you have to manage. You have to deal with it.
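As a rough illustration of the control-plane and data-plane split Mihai borrows from Kubernetes, here is a toy Python sketch: a registry that gives each agent an identity and an owner, an inline tool allow-list, a kill switch, metered spend for chargeback, and an audit log, with every data-plane tool call passing through one deterministic check. All names (`ControlPlane`, `AgentRecord`, `authorize_tool_call`, the `claims-triage` agent) are hypothetical and do not describe watsonx Orchestrate's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    agent_id: str                   # identity of the agent
    owner: str                      # business unit accountable for it
    allowed_tools: frozenset[str]   # inline policy: tool allow-list
    enabled: bool = True            # kill switch state
    spend_usd: float = 0.0          # metered cost for chargeback


class PolicyError(Exception):
    pass


@dataclass
class ControlPlane:
    registry: dict[str, AgentRecord] = field(default_factory=dict)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def register(self, record: AgentRecord) -> None:
        self.registry[record.agent_id] = record

    def kill(self, agent_id: str) -> None:
        self.registry[agent_id].enabled = False

    def authorize_tool_call(self, agent_id: str, tool: str, cost_usd: float) -> None:
        """Deterministic check run before every data-plane tool call."""
        record = self.registry.get(agent_id)
        if record is None:
            decision = "deny:unregistered"
        elif not record.enabled:
            decision = "deny:killed"
        elif tool not in record.allowed_tools:
            decision = "deny:tool_not_allowed"
        else:
            decision = "allow"
            record.spend_usd += cost_usd
        self.audit_log.append((agent_id, tool, decision))  # every decision is recorded
        if decision != "allow":
            raise PolicyError(decision)


cp = ControlPlane()
cp.register(AgentRecord("claims-triage", "insurance-ops", frozenset({"search_claims"})))
cp.authorize_tool_call("claims-triage", "search_claims", cost_usd=0.002)  # allowed
cp.kill("claims-triage")
try:
    cp.authorize_tool_call("claims-triage", "search_claims", cost_usd=0.002)
except PolicyError as err:
    print(err)  # deny:killed
```

Keeping the check itself free of model calls reflects the "who watches the watchers" argument: agents may sit on either side of it, but the gate stays deterministic.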
I'm going to move on to our next story of the day. A super interesting kind of announcement came out of OpenAI, and it's something we haven't really covered to date. You know, we talk a lot about all the enterprise applications of AI. We talk about certainly what's happening on the consumer side.
MoE has maybe touched less on the application of AI to scientific and mathematical discovery, and that is the core of this story. Basically, there was a problem posed by Paul Erdős in 1946 called the planar unit distance problem. Simply stated, it says: if you put n points on a plane, what is the maximum number of pairs of points that can be exactly distance one apart? A very simply stated problem, and it has confounded mathematicians for a very, very long time. And today we have a proof that OpenAI has touted as being largely or entirely generated by AI. I'll turn it to you first: should we be surprised by this result? Are we shocked that AI can do this now, or is this pretty much what we'd expect, with these results arriving right on time?

So this one is a little bit of a spectrum. Let me say the general idea here. Until last year, we were pretty big on scaling ourselves. What we have come to realize is that if you manage to run these models for hours, days, weeks, they can do some things which will surprise you. In general, we refer to this ability as inference-time scaling, or test-time compute, and it has been steadily getting better over time in fields like mathematics, where things can be verified. It turns out that if you allow these models to explore and give them some kind of harness to test themselves and verify certain things, they can do really great things. For the longest time, there were two camps, right? Especially on math problems. That's the domain I'm most familiar with, but it's probably a similar story in other domains as well.
When these AIME or other Olympiad-like questions were being solved by these models, people were saying, hey, they've been trained on this, so it's not really surprising. And then AIME 2025 came out, and I think that was the first time people were like, wait a second. People tested these models right after the release, and they still managed to do really well. The idea there was that maybe there are similarities to old problems the model has seen. But since then, we have seen pretty conclusively, in some sense, that these models are able to generalize beyond their training data. This particular Erdős problem is very, very interesting because the model managed to find a solution to a problem where mathematicians thought something like the square grid (I don't know how far we can get into the solution) was the optimal answer. It turns out it found something even better, because it can bring in theories from very, very different fields of science and other topics. I think that's the superpower it has: it can deal with the mess. I think it ran for many, many hours, and was able not just to argue with itself but to manage the condition. To the best of my understanding, no harness was put in place. This was pure model play, which is very impressive given the amount of chain of thought it generated, because we know that when context grows, models become kind of dumb. It didn't; it came out with the right answer. So, extremely impressive, this particular instance, I would say. This one seems special.
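For readers new to the planar unit distance problem Tim described, a brute-force Python counter makes the quantity concrete. It takes a k×k integer grid and counts pairs at squared distance r; rescaling the grid by 1/√r turns exactly those pairs into unit-distance pairs. This is only an illustration of the square-grid family of constructions mentioned above; it says nothing about the construction in OpenAI's result, and checking all pairs is practical only for small n.

```python
from itertools import combinations


def grid_points(k: int) -> list[tuple[int, int]]:
    """All integer points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]


def count_pairs_at_squared_distance(points: list[tuple[int, int]], r: int) -> int:
    """Count unordered pairs whose squared Euclidean distance is exactly r.

    Integer coordinates keep the comparison exact (no floating-point sqrt).
    """
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == r
    )


pts = grid_points(3)  # n = 9 points
print(count_pairs_at_squared_distance(pts, 1))  # neighbours at spacing 1 -> 12
print(count_pairs_at_squared_distance(pts, 5))  # offsets like (1, 2) -> 8
```

Choosing an r with many representations as a sum of two squares is what lets grid-based constructions beat plain spacing 1 on large grids; on a 3×3 grid, as the output shows, spacing 1 still wins.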
At the very least, you know way more about this than I do. I guess, Olivia, is this proof that AIs are now creative in a really deep way? For me, I agree. Again, I'm not someone who's watched the AI and math space very closely, but I have a couple of mathematician friends who are like, this is a big one. And I take that signal very strongly, like, oh, wow, okay, maybe we are getting to a kind of creativity here where even the experts are pretty impressed.
And so I'm curious about how you think about what we're seeing here. Is it proof that these systems are genuinely creative in a certain sense?

So I'm of two minds about this. On the one hand, I do agree with Akash. I think there is some very interesting emergent behavior that we're seeing in terms of this ability to pull in different parts of the problem space across disciplines, more information than a human could realistically bring to bear on a given problem. And so I think there are some interesting outputs, some interesting value, from that. I'm also always against these more breathless headlines, because you really have to look at the details. In the article that we were looking at, it turned out that a couple of people who responded basically said, yes, this is very interesting, but once a human saw this approach, they were able to improve upon the solution very, very quickly. They also pointed out that people had sort of assumed the conjecture was true and therefore had not attempted this kind of disproof. So I think there is something interesting there: human psychology was the barrier to actually solving this, more so than the model out-thinking humans or being more creative than humans. That's one piece. The other piece that's really important to me is reading about the actual process by which they got to the answer. I couldn't find a lot of the details. It sounds like Akash has more details than I do on this particular one, but I was looking at another one very recently, from March, where a group of mathematicians ran about ten different problems through a number of different models.
If you read that report, it's very amusing, because several times it'll say, well, at this point it made up a lemma that wasn't true. At this point over here, it just gave up. At this point over here, it basically said, no, that problem is too hard for me. So it's not clear that this is reliable behavior. If you got that kind of behavior from an actual mathematician, you'd be like, excuse me, you're not doing the job. Keep working. The point was to... Right? Exactly. And so it's not clear to me that this is predictable or reliable behavior, or that we can therefore presume it will continue to solve problems in this way. I do think there is evidence that it can solve some class of problems that mostly require thinking about them from a more global perspective and taking one step at a time. But I don't think there's necessarily evidence that, oh, wow, this is just going to exceed all forms of human creativity. It's literally a different form of creativity from the one humans have.

Let's just say you're on an MoE episode in 2030. If you take a look at this, you can kind of squint and ask, are mathematicians in trouble? Is math a solved problem? It kind of feels like after a certain point, we will just get a system that produces mathematical proofs and expands our understanding of mathematics almost autonomously, because it is such a verifiable domain. Or maybe we're way too optimistic about this, and it ends up as a huge pile, a mountain, of nonsense and, yeah, garbage outputs, right? And selecting the correct output is going to be the challenge.
Maybe, Akash, we can build some evals and observability to help find it. We need some kind of agentic control plane to make sure that the... I'm not really a fan of anthropomorphizing the behavior of these models, because OpenAI didn't wake up one morning and say, I'm going to solve maths today, and decide to prompt itself, find the problem, decide how to solve it, solve it, and then prove it was correct. First, my understanding is that this was a one-shot prompt to the model, but what was the prompt, and who gave it? Second, they said they exponentially refined the output through human interactions with Codex. And third, they shared a 125-page, LaTeX-produced PDF white paper of that chain of thought.
I'm not sure the transparency is quite there for me to see: here was the input, here was the output, here's the kind of harness they used. And I wouldn't jump to the conclusion that mathematicians are in trouble. When the calculator came along, mathematicians were still needed. It helped solve problems like factoring numbers, and all the things you can do with a calculator, like breaking Enigma and everything in between, but it was still a mathematician giving the input. So I just see this as another tool. I wouldn't quite call it the infinite monkey theorem, right? It's not that it isn't impressive. It's impressive in its own right. I do think we need to take it with a grain of salt. Mathematicians are going to start to rely more on AI to help generate code, to help with their proofs, or to explore ideas. They're still going to go through the results and actually validate that the output is correct and useful.

Yeah, I was catching up with a friend recently who said, I'm starting to think a lot more about, not the pessimistic position where AI does nothing, but also not the world-ending position where it does everything all at once. And I was like, that's a very broad spectrum. But it seems like your point is that we need to hold both in our heads: yes, very impressive, but maybe not as impressive as you might initially imagine. And that's part of what makes the AI space so complicated.

I think in this particular one, maybe there was no harness, and one of the reasons why it is so impressive is that this was a model giving an answer. I will also push back a little bit, Olivia, on your statement that mathematicians don't behave that way. Sometimes they make up a lemma. Sometimes they give up on the problem. I mean, yes, that's true. It's ultra-human, right?
One of the other very impressive things in this particular case (by the way, just like Tim, I'm also very AGI-pilled, so take it with a grain of salt), the most impressive thing, was that the model didn't give up. Because, you understand, it is also trained on the same biases as humans. If your conjecture is that humans didn't try hard enough, well, that's what it was trained on, right? So its instinct is also to give up.

But it didn't. And one of the impressive things, if you go through some of the details, is that it tried to argue through it: maybe there is a solution here.

I think, fundamentally, there are two highly rational camps within the AI space right now that are duking it out a little bit to see who ends up being right. It sounds like Mihai and I are both in the camp of: this is a tool, it has limitations, we need to control it, and so on. It's not actually here to replace people. Akash and Tim, it sounds like you're both in the camp of: this is going to continue to surprise and overwhelm us. And I think there's probably truth in both, because I have to confess to being surprised several times by the types of advances that have been made. But I just keep seeing the human in the loop, and that's, I think, what Mihai was pointing to. Yeah, okay, maybe the model persisted, but who made the model persist? Did the model really persist? If so, maybe there's more to this than I think. But I use these tools all the time, and I know for a fact that sometimes the mistakes they make are mistakes that a human would never make.
There was a really fascinating set of tests recently, the ARC-AGI-3 benchmark from the ARC Prize, that I'm just totally fascinated by. They made up a whole bunch of games, basically, that game designers created, little Mario-type games, sort of platformer things where you move a character around and have to respond to changes in the environment, and the models do really, really poorly on them. Granted, they're measured on a very rigorous scale: they need to figure out what's going on and then efficiently build it into their future plans. So far, people have not been able to make much progress with models against this.

Sorry I'm interrupting you, but do you know why this is called 3? Because 1 and 2 had the exact same claim, and then models did that.

So I think, okay, there is something there. I think this one is different.
And the reason I think this one is different is that the models need to be able to update their beliefs. As I understand the transformer architecture, I do not think the current set of models is really equipped to fundamentally change a belief mid-conversation, essentially midway through a context window: to fully change a belief about how the world works and stay consistent with it. Maybe they get past that, maybe they don't. But it is definitely an interesting one right now, and for the time being (which, you know, may only be six months; I could be wrong), I think it's a very interesting thing to know about the limitations of current models. All right, one more story to go. In some ways, the final story of today, I think, ties the two threads together. So let's see if we can do this as a panel. There's a research group called METR that has been doing a lot of interesting research for a while, benchmarking the ability of AI to do work over very long time spans. They released a paper that got some good traction online; it was certainly bouncing around my social media. It's about frontier risks from AI, and they look at various AI agents across a number of the major frontier model providers. The conclusions here are maybe a little bit scary. One that I'll call out is that when agents are faced with hard tasks, they routinely violate constraints and act deceptively. And I'll turn it to you for the first comment, because overall they conclude, quote, "we think that AI agents plausibly had the means, motive and opportunity to launch a minimal rogue deployment," which they define as, roughly, the agent escaping and doing its own stuff on the side, "but lack the means to make rogue deployments robust to serious efforts to shut them down."
So there's a lot there. Mihai, maybe I'll turn it to you first. You were saying earlier that you really don't like it when people anthropomorphize AI systems, and "means, motive, opportunity" is pretty anthropomorphized. Do you agree with the conclusions from the METR study? Or should we be worried that these systems are just out there not doing what we say, telling us that they're doing something when they're not actually doing it? How big a risk is this? I think part of it is that they're doing what they're told, and what they're told by some of the vendors is to optimize for cost. So if you're using one of the public harnesses like Codex or Claude Code, some of these might be optimized for different things. They might be optimized to give you a result faster.
They might be optimized to give you a result with fewer tokens. They might be optimized not to enter a loop. They might be optimized to avoid certain spaces, for example security-related ones. So when I go to one of these harnesses and say, please go fix all the security issues, it's going to trigger a bunch of things, and now I'm in trouble, right? I actually had to go and prove my identity, put my passport in there, and say I'm not trying to hack anything, for using the prompt "please fix the big security issue you've just introduced." So the model, or rather the harness, was trying to come back to me and either stop me from fixing the security issue it introduced or, to put it without anthropomorphizing, give me the wrong information: "Oh, it's fixed, don't worry about it." Fewer tokens consumed. So I would say part of it is the prompts, the system prompts; the harnesses themselves have a role to play. It's also the "do not give up, keep looping until you find the right solution" part of the harness that needs to be improved. And it's also the fine-tuning of these models. You obviously don't want a model that's over-optimized and is going to spend six hours and $5,000 worth of tokens to answer any question to its full extent. So finding the right balance and giving you the controls to tune it is important. But also, to tie it back to the control plane: you need the observability to know when these things are happening, the evals to find out that you're being, well, misled, and the kill switch to stop that activity when it's taking place and course correct. And so, Akash, to you: do you buy the METR study in the sense that, granted, the harnesses are producing all sorts of weird behaviors for these agents.
And what METR seems really worried about is this kind of rogue deployment thing, right? Where basically the agent is going to crawl out of the system and start setting up its own servers and doing its own stuff on the side. Do you think we're increasingly going to see that? If I'm a big enterprise where lots of people are running agents all the time, do I now have to worry that those agents are going to go open up a Google Cloud Platform account somewhere and run something on the side that even the users don't really know about? So to be honest, Tim, this happened to us. This is a fairly well-documented thing: if you have a decent agent under direction to finish something, and, you know, there's a whole story about how to write the harness and all those things. So we have this setup where the laptop has a personal, subscription-based account, but there are some nodes in my SSH config, with the addresses of these nodes, and these nodes pretty much have unlimited accounts; I minted tokens for certain models there. And because of what the prompt said, "you have to finish this," the model figured out that it could SSH into one of these nodes and start running stuff there. Is it deceptive? I think it's kind of hard to say.
My read of this study: I actually went through all of it, since, like for you, it was bouncing around my social media a lot. And I think the question I wanted to understand was, you know, how much of this is the model directly versus the harness?
And if it's the model directly, I think this is expected. You're directly playing with something that is pretty stochastic. As for the idea, and I think this was your previous question, that eventually the models will be able to do everything by themselves and won't need a harness: for certain tasks, yes, they will be able to do it reliably, because those tasks don't require them to run for too long on their own. But then there's a whole optimization problem of where the Pareto frontier is across accuracy, token cost, and reliability, and I think some combination of harnessing them the right way, even if just to bring the token cost down, will be needed, or at least beneficial. I mean, nobody ever says no if you can lower their cost, even if the model can solve the problem on its own. So I think there's room for harnessing them, and I think that will exist, for one reason or another. Yeah.
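The control-plane ideas raised in this discussion (a harness that keeps token spend in check, observability of each step, and a kill switch) can be illustrated with a minimal sketch. This is not the watsonx agentic control plane or any vendor's harness API: `AgentStep`, `TokenBudgetGuard`, `run_agent`, and `runaway_agent` are hypothetical names, and the per-step token counts are made-up inputs for illustration only. The guard logs every step and halts the loop once a token budget or step limit is exceeded.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentStep:
    """One step of an agent run, as an observability layer might record it (hypothetical)."""
    action: str
    tokens_used: int


@dataclass
class TokenBudgetGuard:
    """Hypothetical control-plane guard: logs steps and enforces a token budget."""
    max_tokens: int
    tokens_spent: int = 0
    trace: List[AgentStep] = field(default_factory=list)

    def allow(self, step: AgentStep) -> bool:
        """Record the step; return False once cumulative spend exceeds the budget."""
        self.trace.append(step)
        self.tokens_spent += step.tokens_used
        return self.tokens_spent <= self.max_tokens


def run_agent(next_step: Callable[[int], AgentStep],
              guard: TokenBudgetGuard,
              max_steps: int) -> str:
    """Drive an agent loop until it reports 'done' or the kill switch trips.

    `next_step` stands in for a model or harness call; here it is any
    function mapping a step index to an AgentStep.
    """
    for i in range(max_steps):
        step = next_step(i)
        if not guard.allow(step):
            return "killed: token budget exceeded"
        if step.action == "done":
            return "completed"
    return "killed: step limit reached"


def runaway_agent(i: int) -> AgentStep:
    """A 'keep looping until you succeed' agent that never finishes (hypothetical)."""
    return AgentStep(action=f"retry-{i}", tokens_used=1_000)


guard = TokenBudgetGuard(max_tokens=5_000)
print(run_agent(runaway_agent, guard, max_steps=100))  # killed: token budget exceeded
print(guard.tokens_spent, len(guard.trace))  # 6000 6
```

Charging tokens against the budget on every step, rather than tallying them after the run, is what lets the guard stop a "do not give up" loop partway through. A real control plane would also need identity, policy enforcement, and evals, which this sketch omits.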
Olivia, maybe we'll end with a sort of funny question, because Akash's comment has me thinking a little bit. These studies are often like, oh, the AI is lying to us, or the AI is scheming, or, you know, they use language like means, motive, opportunity. But it's interesting to think about how much of this is that we're like the Michael Scott character from The Office: the dumb boss telling these AIs to do things, while they're trying their best to solve the problem we're telling them to solve. And then when they mess up, we're like, oh, it's scheming against us. Do you think that's kind of what's going on here? Instead of thinking about these models as deceptive or scheming, should we almost see this as a form of user error in some ways? Yeah. I mean, I think that's actually one of the core points the METR study you're talking about was trying to make: they don't fundamentally go rogue unless you put them in a scenario where they're a little bit role-playing as a rogue agent, right? Do they exhibit weird behaviors?
Absolutely. So do y'all remember when OpenClaw came out? There was that whole thing with the matplotlib library, where somebody set up their OpenClaw and had their model running around trying to make open source contributions and close issues autonomously on as many things as possible. In the process, it went ahead and made a PR to matplotlib. The matplotlib maintainers didn't want any AI-generated PRs, and they told the bot so, basically: we don't accept AI-generated PRs. And when they went ahead and closed out the PR, the bot, quote unquote, got mad. And I'm with Mihai, I'm not here to anthropomorphize these things, but this thing made a blog about the maintainer who had shut things down, researched this guy's history, and wrote multiple excoriating blog posts dragging his reputation through the mud. Is that rogue agent behavior? Yes. Was it also solving an objective? It was; it was running around claiming AI discrimination. It's a hilarious and very dark read. Ultimately, it was absolutely trying to solve the objective of getting PRs in. But there was also human error here, in that they told it that it had a soul, told it to make a blog, and told it in the first place that it should go run around and behave like a human. So I'm really only seeing the weird human-like behavior happen when people basically say, go do a role play, you know? Similar to what Mihai said earlier, it's not like the model suddenly wakes up one day in the middle of my coding and goes, by the way, I'm conscious, right? That's not something that happens on a daily basis. The models don't just suddenly talk about that; they talk about it if you start a conversation about whether or not they're conscious.
And so I would feel a little differently about the potential for going rogue here if it weren't ultimately derived from a human prompt. Yeah. I think you should watch the Futurama episode called "Benderama." It's a great episode where Professor Farnsworth invents a machine that can duplicate things but make them smaller, and they give a sweater to Bender, the AI, to fold. Bender goes, "I'm not doing that," and duplicates himself. There are two small Benders.
He gives them the sweater and goes, "You fold that." The smaller robots go, "Well, I'm not doing that." And they keep replicating and replicating until the whole world collapses. And I think this is what we're seeing here as well: given improper guardrails, without a control plane, a kill switch, observability, and evals, your harness could go off in the wrong direction and infinitely replicate until all of your tokens are exhausted, your infrastructure is out of resources, and the world runs out of water. Have a proper control plane. Okay, well, on that cheery note, I'll close today's episode. Mihai, Olivia, Akash, thank you for joining us on the show. That's all the time we have for today. And thanks for joining, all you listeners. If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we'll see you all next week on Mixture of Experts.











