I Tested Claude's New Managed Agents... and I'm not impressed
Added: 2026-05-06

104 views | 10 | 21:41 | jonathan.mclemore | Original Release: 2026-04-29

McLemore provides a sobering reality check on managed agents, correctly identifying how sequential tasking creates a "telephone effect" that compromises production reliability. His advocacy for deterministic orchestration over vendor-locked convenience is a vital lesson for building robust, professional-grade AI systems.

[00:00:00]Claude Managed Agents, right? So, you've probably seen this announcement for a few weeks now. Anthropic drops it and everyone loses their mind. Agents are here, the future is here. My job, what are we going to do? And then you realize, after your feed fills up with it, it's really just 30 demos of the same thing.

[00:00:16]So, there's no real utility behind it other than getting to flash some, you know, special React Flow dashboard of all these agents working together. But here's the thing. I actually sat down and, you know, played with these things for a while. Like, I've used it enough to hit all the ceilings and all the walls I can find, and I realized there is a real heap and bound to it. There's a reason I still rock with OpenClaw and, you know, I run this on my own machine without rate limits, and for my models I use tons of different things. And lo and behold, the thing that Anthropic built the hype around, funnily enough, this thing is, uh, quite similar to Kimi's computer from Moonshot AI, where you get to deploy multiple agents and they all get to work on your behalf. However, if you're like me and after you dig into this, you're like, "Wait a second, that is not cheap." I'll break some news to you.

[00:01:00]Neither is Anthropic's. So, OpenClaw here would actually be the only instance where this is affordable. And I'll explain why. So, for those who are actually technical and looking at this being like, "Yeah, I don't know if I should, uh, trust this guy." That's cool. I don't blame you. If anything, the hands are mine to catch. I run Plinko Solutions. My name is Jonathan McLemore. I build agent workflows for real businesses. I've got daily projects in production running in tax firms and on the sales floor for mid-market and enterprise sales teams. Uh, so this isn't theory for me. I actually, you know, build the stuff and I ship it on the regular. So what does that mean in plain English? Alex Hormozi talks around this dirt-to-cloud concept.

[00:01:33]>> I'll consider cloud to dirt knowledge.

[00:01:35]So, like, vertical integration of knowledge. How do we connect this API to this other API? Basically understanding everything down to where it all connects at an API layer and how this all actually gets wired together and works together, all the way up to the level of, like, where is this actually being used day to day in a business that's actually useful for them, and what are the options to do that? Spoiler alert: it's not always going to be an OpenClaw or Paperclip or any of these. It might just be something in arm's reach. I'm saying that because I felt the problems, and when I'm telling you there are paths out there, you shouldn't just succumb to the ecosystem being pushed towards you. I really mean that. So let's get into it.

[00:02:10]What is Claude Managed Agents? By definition, it is a pre-built, configurable agent harness. So, let's stop there. What does that mean? Pre-built, meaning it's done for you.

[00:02:19]Configurable, meaning you can go in and tweak it. Agent harness. So, the best way to talk about an agent harness is, what does it mean to harness agents, right? Think about the model as a horse and the harness being the saddle. In this day and age, we are bountiful with horses. But lo and behold, not a lot of people know how to actually ride them. Of course, that's a theory here in terms of the digital side of things, but if you give them the proper saddles and the proper harness to do so, then it becomes that much more possible. So, what does that look like in the real world? Well, for starters, if you can afford it, Anthropic essentially just rebuilt a hosted version of Kimi's agent swarm.

[00:02:56]So, the idea is you write some big instructions. It goes through an orchestration layer, which essentially is a bigger agent that points instructions to smaller agents, or specific agents (they don't have to be smaller), and it gets to work.
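The routing idea just described can be sketched in a few lines. This is an illustration only, under loud assumptions: the worker functions, the `Task` type, and the `WORKERS` registry are hypothetical, and a plain dictionary lookup stands in for the "bigger agent" that would decide routing in a real system. It is not how Anthropic's product or any tool named in the video is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    kind: str      # which specialist should handle this task
    payload: str   # the instruction text handed to that specialist


# Hypothetical specialist workers; a real worker would call a model here.
def research_worker(payload: str) -> str:
    return f"research notes for: {payload}"


def writing_worker(payload: str) -> str:
    return f"draft for: {payload}"


# Registry the orchestrator consults to find the right specialist.
WORKERS: Dict[str, Callable[[str], str]] = {
    "research": research_worker,
    "write": writing_worker,
}


def orchestrate(tasks: List[Task]) -> List[str]:
    """Route each task to the worker registered for its kind."""
    results = []
    for task in tasks:
        worker = WORKERS.get(task.kind)
        if worker is None:
            raise ValueError(f"no worker registered for kind {task.kind!r}")
        results.append(worker(task.payload))
    return results
```

Each task here goes straight from the orchestrator to one specialist and back, with no worker-to-worker handoff.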

[00:03:09]So if you can imagine, on the front end it's like a mail room. You walk in, you drop off a letter, someone in the back sorts it, stamps it, sends it off. You don't see anything that happens in the back office. You just see the front desk. And that's essentially what Claude Managed Agents is. It's a front desk where you walk in, you drop off a letter of instructions, and it goes through this orchestration layer that lives inside of these environments. And I'll get into detail about what all that means. And generally what you kind of need to know here is, after it stamps it and sends it off, the envelopes, when compared to the rest of the world, are a little bit pricier than they need to be, right? And I know flashing a bunch of numbers on screen isn't going to prove that. So let me explain. The moment you look under the hood, you realize you've given up three core things. One, you're stuck with Anthropic's models. Meaning, if Claude bumps the price next week, you don't really have much of a choice, right? If Anthropic decides that Opus, you know, needs to be priced a bit higher to balance things off, if they raise the prices on you next quarter, you kind of just have to deal with it, especially if you've built everything already on this.

[00:04:09]Two, the rate limits are brutal. And I'll show you exactly what I mean by that and what the internet's been going on about. Three, your data, your logs, your context, all of it lives on Anthropic's servers when you put everything in its environment, right?

[00:04:22]So, it's not yours. Which, for some of you, if you're in a regulated space, if your work actually cares about that stuff, um, and if you're distrusting, just like me, that's a full stop. No. So, let's talk about that rate. This one actually frustrates me because, again, these are real quotes from real people.

[00:04:35]People are hitting the weekly cap in 2 days. I've offloaded it to build websites, and within its five-hour context, I've used 50% of it just to map out the WordPress. When I say WordPress, I'm talking about just HTML and CSS, the basics that go on a website, anchored in the brief that I gave it around the offers, the concepts, all those things, and it drinks through the context. Right now, imagine I did that with multiple agents. It would just be five times that trouble. And I'll explain why. So within Anthropic, you get charged not just for input and output like any other model, but you get charged for the cache writes.

[00:05:10]So notice that every time you're talking to it and the session's been going on for a while, it will compact and pull in the conversation so that the agent can then use it later on, right? They don't expect you to start a new thread every time the agent gets bloated. They charge you for the act of compacting it. Other models don't do this. So when you compound that, you realize, if you're not just asking it to help you draft emails, which, by the way, a single agent can do in its managed product, you're building an engine that you need to run reliably again and again. If you need your agent to work after hours and delegate efficiently, rate limits like this are a dealbreaker. So that made me go looking for an alternative. So, quick pause. If you like the SOPs and the standard operating stuff that we're talking about here in terms of living markdown files, the stuff I'm showing you today, I drop them all for free in the Skool community. So if you're interested, link in the description. Um, we're growing week by week. So, super exciting. Come say hi.
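To see how the billing described above compounds across agents, here is a rough sketch. It assumes the three billed categories the speaker names (input, output, and cache writes) are each priced per million tokens. The `Prices` fields and the functions are hypothetical, and any numbers passed in are placeholders, not Anthropic's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    # Placeholder per-million-token prices; NOT real vendor rates.
    input_per_m: float
    output_per_m: float
    cache_write_per_m: float


def session_cost(prices: Prices, input_tokens: int, output_tokens: int,
                 cache_write_tokens: int) -> float:
    """Cost of one agent session across the three billed categories."""
    total = (
        input_tokens * prices.input_per_m
        + output_tokens * prices.output_per_m
        + cache_write_tokens * prices.cache_write_per_m
    )
    return total / 1_000_000


def fleet_cost(prices: Prices, agents: int, input_tokens: int,
               output_tokens: int, cache_write_tokens: int) -> float:
    """Cost when several agents each run a session of the same size."""
    return agents * session_cost(
        prices, input_tokens, output_tokens, cache_write_tokens
    )
```

Under this simple model, five agents doing the same work cost five times one agent, which is the "five times that trouble" point; real usage will vary with how much context each agent actually consumes.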

[00:06:04]All right. So what is it? This scares a lot of people. It's self-hosted, living on a server or hardware like a Mac Mini. And it's your own personal AI agent. Runs on your own hardware. No flat fees. You just pay for the usage.

[00:06:18]Your limits aren't dependent on a single provider. It's actually significantly harder to hit your limits here. I'm rotating between GPT-5.4, access to Qwen, Kimi, Grok, MiniMax, and an endless amount of different LLMs if you want to tie them in. Bonus points: if you know your horse well, you get to get the most out of those models based on the output you're looking for. So, if you can imagine, it's the difference between, and I'll change analogies from the horse stuff for now, it's the difference between renting an office downtown and building a shed in your backyard. At first, you might think the shed is a worse idea. The office has a doorman, fancy lobby, coffee. But if someone else owns those walls, you are limited to those walls and what they give you. If it's in your own shed, you own everything. You get to decide what goes in, what gets to be built, and what gets to live there. And the best way to kind of approach this is through understanding the concept of skills. So, what do we mean by that? I've got this guide for free in the Skool community talking around multi-agent orchestration, and we built this heavily on the principles of consulting firms that for many years had been working in this space and, uh, making this stuff work, right? So here's the part I really want you to pay attention to. We know that agents are configured with living markdown files. Given that these are pre-built and they're configurable agent harnesses in managed infrastructure, meaning it's their servers, we understand that if we can replicate that in our own setups, we can have the advantage of real-time multiple agents. So before I go any further, I need to mention Paperclip. So this has been the talk of the town for good reason. Essentially it's another open-source project that everyone's been talking about. You know, 53,000 stars on GitHub in 2 months. It's essentially a supervisor-and-worker multi-agent orchestration. It's actually well built.

[00:08:05]But spoiler alert, why am I not running my business and my engines on Paperclip? Because Paperclip plays telephone. So if you can imagine, a lot of different people have kind of tossed Paperclip out the window for this exact reason, and I wouldn't recommend doing that. I think there's real value to it. So let's assume you have a supervisor agent. This supervisor agent sends a task to worker one. You have worker one, and worker one finishes. They hand it off to worker two. Worker two then hands it off to worker three. So on and so forth. 10, 15 iterations on the same list for a single job. Every handoff is a chance for intent to drift.

[00:08:40]So when intent drifts, this is known as the Markov chain. Consider this. Every step is 90% reliable. If you chain five of those steps, you end up at 59%.

[00:08:52]Not 90. 59. I'll say that again for the people in the back. 90% chained five times is 59%. That chain in a business does not work, right? And this has to do a lot with the probability and the intent to drift. So if you're able to, you know, use Paperclip as a research function and a user interface, a harness to operate and see things, understanding that what you get out of the horse doesn't live here, it lives here in the single agent. All right? And it doesn't have to be an OpenClaw. This can be your Codex. A harness is also Antigravity, Claude Code, clockwork. All these tools are harnesses that you get to take advantage of and you get to use.
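The 90%-to-59% figure can be checked with a few lines. This is a minimal sketch assuming every handoff succeeds independently and with the same probability; the function name `chain_reliability` is a hypothetical helper, not part of any tool mentioned in the video.

```python
def chain_reliability(step_reliability: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds,
    assuming the steps are independent and equally reliable."""
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_reliability ** steps


if __name__ == "__main__":
    # Compare one step against the 5, 10, and 15 handoffs discussed above.
    for steps in (1, 5, 10, 15):
        print(steps, round(chain_reliability(0.9, steps), 2))
```

At five steps the result rounds to 0.59, matching the number quoted; the 10 and 15 handoffs mentioned earlier land lower still.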

[00:09:36]So with that in mind, let's put these three side by side. For Claude Managed Agents, you're stuck with Claude, right?

[00:09:41]Whatever Anthropic gives you, that's what you get. With Paperclip, you get to use anything. Same thing with OpenClaw.

[00:09:46]Uh, minus Anthropic because of bans and whatnot. Rate limits: with Anthropic, you're paying for the caching. You're paying for the cropping of information, right, which an agent does anyway. With Paperclip, you just got to hope that that happens.

[00:10:00]If you don't engineer it, you don't see it. In fact, it actually bites you if you don't. For OpenClaw, it is conscious of it. It knows when it's been going on for a while and it also compacts.

[00:10:09]Difference is, it doesn't charge you for it. Data ownership: everything lives on Anthropic's servers if it's Claude Managed Agents. For Paperclip, it's wherever you put it. OpenClaw, it's the same story. For the architecture, meaning, when does it go from 90 to 59%?

[00:10:22]We don't know. That's a black box, right? For Anthropic, that just happens.

[00:10:25]We never see it. For Paperclip, you get to see when the ship is on fire. You get to see when the plane is heading towards the mountain, right? And it's very apparent when the multi-agent swarm's not actually working together and they're all just kind of, like, confused and errors are flying everywhere. You're like, "This doesn't really work and it's not worth the squeeze." That means it's not your harness. That's where most people give up, right? It's not economically valuable to take a terrible saddle and put it on a horse that doesn't really have much experience. That's where the tuning comes into place. This idea behind a Directive, Orchestration, and Execution framework, a DOE, is how you actually go from 59 to 90%, time and time again, reliably. We'll get into that in a bit.

[00:11:04]The last piece is your reliability. So, because of the rate limits, you can't really trust Anthropic to not bump the prices and, you know, if they get a whole country to, now, in the same way Copilot's on every Microsoft computer, you get an adoption like that and it doubles, tell me they're not increasing their prices, right? Especially at the enterprise point. For Paperclip, if you don't have a well-working individual worker, the telephone effect will eat you alive. The individual worker on OpenClaw, whatever it is, if it has deterministic scripts, meaning proven bricks, things that are working, it's not something to worry about. So, let's take you into that DOE framework. So, skills alone are not enough. They define what to do and what's possible. But if you don't have a decision-making framework in place, these things tend to fall apart. So, what does that look like? This has been used by many consulting firms and has come across my table for a while now. I would say since last June, 2025. And truthfully, once I discovered it, it really changed everything for how I work with AI. But before we get there, what is it? Directives: the what, the standard operating procedure, the mission, the output, the allowed tools, what does success look like? Those are all things that take business acumen.

[00:12:13]Understanding business intent is different from implementation. The implementation part is actually the most expensive thing. And this is usually where people hire somebody else. They hire somebody to manage a project. They hire somebody to potentially use their proprietary LLM, their own AI, or hope that they have the brain for it, to then run everything that needs to be run, to then delegate and actually hand off things to different agents so that they can then evaluate the state, pick the next best step, have an agreement, handle the unforeseen, handle the failures, and continue to work on it until they get to where success lives. If you have spent countless hours and you have lost your sleep, and, you know, there's that saying around sleep schedules where you don't really lose sleep but you lose the, uh, the part of you that can go to sleep. If you have gotten that far, you know what I'm talking about here. It is expensive to implement and build on the unknown.
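The three DOE layers named so far (a directive that states the mission, allowed tools, and success test; an orchestration step that evaluates state and picks the next move; and execution by deterministic scripts) can be sketched as follows. Everything here is an assumption for illustration: the `Directive` fields, the `dedupe` and `count` scripts, and the fixed "next unused tool" rule, which stands in for the agent that would make that choice in a real setup. It is not the framework from the speaker's guide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Directive:
    mission: str                        # the "what", in plain language
    allowed_tools: List[str]            # scripts the run is permitted to use
    is_success: Callable[[dict], bool]  # what success looks like


# Execution layer: deterministic scripts, no model inference involved.
def dedupe(state: dict) -> dict:
    return {**state, "leads": sorted(set(state["leads"]))}


def count(state: dict) -> dict:
    return {**state, "count": len(state["leads"])}


SCRIPTS: Dict[str, Callable[[dict], dict]] = {"dedupe": dedupe, "count": count}


def run(directive: Directive, state: dict, max_steps: int = 10) -> dict:
    """Orchestration loop: check state, pick the next step, execute it."""
    done: List[str] = []
    for _ in range(max_steps):
        if directive.is_success(state):
            return state
        # Stand-in decision rule: take the next allowed tool not yet used.
        remaining = [t for t in directive.allowed_tools if t not in done]
        if not remaining:
            break
        tool = remaining[0]
        state = SCRIPTS[tool](state)
        done.append(tool)
    if directive.is_success(state):
        return state
    raise RuntimeError(f"directive not satisfied: {directive.mission}")
```

A short usage example: `run(Directive("count unique leads", ["dedupe", "count"], lambda s: "count" in s), {"leads": ["a", "b", "a"]})` returns a state whose `count` is 2. Because the scripts are plain functions, the same input always produces the same output.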

[00:13:06]And typically when you put a number of different agents to do that, it is not just expensive in cost but expensive in time. The unknown is an expensive territory. So how do we work against that? We understand that in the execution, when things actually happen, it lives in scripts, right? So they don't take reason, they don't take inference. They take inputs and they follow steps to then reliably perform the same real-world effects that scripts today perform in businesses. If you're interested in digging more into this, again, in the Skool community, free PDF, check it out. You can feed it into your LLM of choice and ask, "Hey, how do I integrate this today to actually build something worthwhile that I can use in my day-to-day business?" All right, so I mentioned that this changed everything for me back in June. Reason being, this was a curious DM over Reddit.

[00:13:52]Um, an Indonesian guy who had this framework working within his own virtual private servers, uh, making around 7K a month, basically performing the task that a legal clerk would do, and he was able to do it with multiple agents, right? This thing made money, right? At its core, uh, I won't say names, but they were able to actually build something worthwhile. Mentally, from an individual standpoint, he didn't quite have it all together, and what I was able to extrapolate from that is, even though he had a great harness and he had a great horse, he himself as a horseman was not equipped to take that thing where he needed to go. So that was interesting for me, 'cause I saw this as an opportunity of, like, how can I take what I know and be a better horseman myself, to have skills and understand what walls and what ceilings am I hitting in my world that I am limiting myself from. And by using multiple agents I was able to, and continually; it's not a one-and-done situation. Again, no exaggeration, no fugazi headline of this being insane. But once you realize single agent for single task equals single outcome, and you iron that out for multiple tasks, for multiple agents, for multiples of the same outcomes, you start to see palpable results in your day-to-day. So, I hope that made sense and I hope that was valuable. So, here's the thing. Anthropic agrees with me.

[00:15:13]Check this out. So, they published a piece on their own multi-agent research system. It's an orchestrator-worker pattern, right? So, Opus 4 as the big lead, Sonnet 4 as the workers, and it showed a 90.2% improvement over single agent for research tasks. They say parallelization, meaning having multiple run at the same time, cut research time by 90%. But here's the catch. It also burned 15 times the tokens of a single chat. So, what does that mean in plain English, with all that caching and all the pricing stuff we talked about? This approach works, but it's expensive. And the only reason it works is because they designed the orchestration layer, the research pattern, really well and carefully. So those workers that are now doing carefully, narrowly defined tasks within their own environments, they're not playing telephone and they're not drifting from their intent. And that's exactly what the Directive, Orchestration, and Execution framework gives you. It's tight orchestration. It's deterministic.

[00:16:07]You can tell it's going to work every single time. Anthropic figured that out and they went ahead and built managed agents that basically put the orchestration layer on their own back.

[00:16:17]So you can't tune it, you can't play with it. It's just a black box. You pay them and you hope. I'd rather have my hands on that. So here's what that looks like. If this is clicking for you and you're really excited about all this stuff, again, Skool community, it's all there for free. Check it out. See you there. So this is the part where I get to walk you through the fun stuff. All right. So, you'll find it within the guide, but I essentially have a whole slew of different skills, different access points, different ways that it can access LinkedIn Sales Navigator. It also understands what an ideal customer profile looks like. And, um, it understands where I'm targeting them based on our previous conversation and research. And because of the connection through the Chrome MCP, which is a whole connector in itself, you can set that up with really even OpenClaw. This is just the harness I'm showcasing because, you know, I'm not anti-Claude or anti-Anthropic. I'm just pro "it works."

[00:17:06]All right. This is achievable under the 110 Max plan. Um, and I'll show you off rip where we're at with our usage right now. So, usage at this current point, snapshot, is 48% in 37 minutes. So, ideally, before the recording is done, you get to see what this looks like. As you see, it's going ahead. It's working. It's pulling transcripts. It's pulling details. Very shortly it'll actually open a browser using my own cookie and my own sessions and actually get to work. The way it's able to do this reliably, and I'll show you, like, I guess you'll see it in the guide, is that Directive, Orchestration, and Execution kind of setup, and then also the loop, so that I can go and do it reliably, right? So instead of me even prompting this here, which is, you know, for demo's sake, I can set this up, and I do have it set up where it does it weekly for me already, and I'm able to then work with it as an outbound prospecting list. So this pattern, again, works in any harness that you take, right? So even n8n, Make.com: early on I was doing the exact same thing, where I would set up a bigger agent and I would have it orchestrate across smaller agents that then produced deterministic, reliable outputs. So what does this look like? As you can see, there's quite the amount of things happening on my computer. This is using my own instance. So it is using my own processing power. Um, I'll extend this so you can see it operating as this cooks.

[00:18:19]There you go. It's automatically saving to the list. My mouse, I'll raise my hands.

[00:18:23]I'm not doing any of this. It's no magic. There's also if you're interested in the shorts on the YouTube channel, I also show you how to connect this to make.com. So through natural language, it can go and build this for you on the regular, which I know is pretty crazy.

[00:18:35]Um, early on I used to use natural language and plug it into GPT saying, "Hey, help me make this automation." And they would just spit in my face. Times are different. These, uh, these clankers, they're moving different. There we go. List. It's going ahead and adding to that. Lo and behold. So I'll show you the actual prospected list afterwards.

[00:18:54]And again, you can connect this to a full chain of, like, connecting to your CRM where you're working a call list, where you're, um, even working an email list, working a LinkedIn list, which is great if you have LinkedIn Sales Navigator. If you don't have this tool, lo and behold, you can do this with whatever tools you use. Again, this is a principle that works across the table.

[00:19:15]I'm not just giving you, you know, things that work once. These are things you can apply across the board. It genuinely makes a difference. I didn't add a single one of these. These are just kind of thrown in here. So, let me pull back for a second while it's wrapping this all up. There we go. Gave you the list and the output's done and it's ready to move it across for email and calling. So, let me pull back for a second. Salesforce just dropped their state of sales 2026.

[00:19:38]4,000 sales professionals surveyed.

[00:19:40]Here's what the data says. Sales reps still only spend 30% of their week actually selling. So, 70% goes to admin, research, data entry, internal meetings, right? 48% of those reps say they don't even have enough bandwidth to do outreach. Right? Half of your pipeline team is saying out loud, "I cannot find the time to prospect," right?

[00:20:02]"I can't do what these sales development reps are doing," right? "We need the team," right? "We need the talent." That narrative is the part that should make you sit up, because now sellers who use AI agents for prospecting expect a 34% cut in research time. Top-performing reps are 1.7 times more likely to already be using these tools. And teams running AI in the past year grew revenue 83%. Teams that didn't, 66%. That's a statistically significant gap. And it's widening, right? A DOE framework that can do research for you in about 15 minutes every day per rep, running on whatever harness you choose to go with, in the background while you continue to do things. That's the game. That's what we're doing here. So, here's what I want you to walk away with. Amplification, not automation. Right? I don't want these agents to replace people. I want agents to hand those people 40 prospects before they open their laptop. Achieve data entry that they would have had to sit down for hours doing, but now they get to go have dinner with their family.

[00:21:03]They don't have to stay late anymore. If you build with this in mind, you can see the difference. You build it with Anthropic and their any model. You build it within one environment, you are renting the runway. If you build it with open source and your own tools, you own everything. So fail fast, touch some grass, let it cook, ship what you have, right? I was going to record this on my phone and you know did equipment doesn't really matter. Taste matters. All right.

[00:21:28]So if you actually want to build this, again, code is free, community is free.

[00:21:31]Link in the description. It's what we do here. Plinko Solutions, if you're interested, on the mid-market, in putting it into your business to actually make it work. So that being said, till next

#ai agents #multi agent systems #claude code #orchestration #ai automation
