Claude Opus 4.8 introduces dynamic workflows, which let hundreds of AI agents work on one task at the same time, coordinated through team-lead and helper structures, on jobs that previously required human teams. Reported benchmarks include 88.6% on SWE-bench Verified and 69.2% on SWE-bench Pro. The model also improves long-context memory (68.1% vs. 40.3% for Opus 4.7) and is more honest about flagging flaws in its own code.
Deep Dive
NEW Claude Opus 4.8 is WILD!
Claude Opus 4.8 just dropped, and this is the biggest upgrade Claude has ever shipped. Here's what's new. You can now run hundreds of AI agents at the exact same time, all working on the same job, all coordinated automatically. One agent can plan, one can build, one can check the work, all running at once. And one developer actually used this to do 750,000 lines of work in just 11 days: a job that used to take a full team of people, done by AI agents with Claude Opus 4.8 running on their own. But here's the one thing you need to know before you do anything else with Opus 4.8. There's a setting most people are going to miss, and if you miss it, you're leaving most of the power on the table. I'll show you exactly what that is and how to turn it on later in this video. By the end of this one, your Claude is going to be running like a whole team, not just an assistant. Let's get into it. So, the biggest story here is what these agents can now do and how long they can run.
So, let me get into the biggest new thing first, because this is the one that matters most, and it's called dynamic workflows. This is a new feature inside Claude that changes how the whole thing works. Before, when you gave Claude a big job, it mostly sent one agent to do it: one worker writing some code here, fixing something there, doing a bit of research, repeat, one thing at a time. Now Claude can plan out the whole job and then spin up hundreds of agents working at the same time, all in one single session. They split the work between them, each one takes a different piece, and then, this part is important, Claude checks its own work before it tells you it's done.
Now, to use this, you want to use something called Ultra Code. You can see, for example, we've built it into our agent operating system right here, and we can just toggle this on and off, which means that we can use Ultra Code and spin up a dynamic workflow in one single click inside the chat. I've also tested it for building stuff out, for example, like this. Number one, it feels very human when it replies, but number two, the actual output is pretty cool. Here's an example that was literally built in about a minute. It was so fast.
So, there's basically a setup now where one lead agent runs the show and sends out smaller helper agents. There's also another setup with a team of three or five agents all working side by side, passing messages to each other like co-workers. How this works is that one agent is the team lead, and the agents literally have a send-message tool and a wait-for-message tool so they can talk to each other while they work. It's a little team of AIs organizing themselves. And here's a real number from their own testing: on a hard search task, a five-agent team beat a single agent and did it in about one fifth of the time. One fifth. On the hardest problems, the team was about three times faster than going solo. And that's the whole point of running them in parallel.
The more agents you point at a tough job, the faster it gets.
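To make the team-lead setup described above a bit more concrete, here is a minimal Python sketch of one lead handing tasks to helpers through message queues. It is only an illustration under stated assumptions: the names helper, team_lead and run_team are hypothetical, the two asyncio queues stand in for the send-message and wait-for-message tools, and nothing here reflects how Claude actually implements dynamic workflows.

```python
import asyncio


async def helper(name, inbox, outbox):
    """Hypothetical helper agent: wait for a task, 'work' on it, report back."""
    while True:
        task = await inbox.get()       # stand-in for a wait-for-message tool
        if task is None:               # None is this sketch's shutdown signal
            break
        await asyncio.sleep(0.01)      # placeholder for real work
        await outbox.put((name, task, f"done:{task}"))  # stand-in for send-message


async def team_lead(tasks, n_helpers=5):
    """Hypothetical lead agent: hand out tasks, then collect every result."""
    inbox = asyncio.Queue()    # lead -> helpers
    outbox = asyncio.Queue()   # helpers -> lead
    workers = [
        asyncio.create_task(helper(f"helper-{i}", inbox, outbox))
        for i in range(n_helpers)
    ]
    for task in tasks:
        await inbox.put(task)
    results = {}
    for _ in tasks:                    # one reply is expected per task
        _name, task, result = await outbox.get()
        results[task] = result
    for _ in workers:                  # tell every helper to stop
        await inbox.put(None)
    await asyncio.gather(*workers)
    return results


def run_team(tasks, n_helpers=5):
    """Run the whole team to completion and return {task: result}."""
    return asyncio.run(team_lead(tasks, n_helpers))
```

Calling run_team(["search docs", "check code"], n_helpers=2) returns a dictionary with one result per task. Because the helpers pull from a shared queue, adding helpers shortens the wall-clock time only as long as there are enough independent tasks to go around.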
There's a real example out in the wild, too. A developer named Jarred Sumner used dynamic workflows to rebuild a big software project called Bun in a different programming language. We're talking roughly 750,000 lines, 11 days from start to finish, hundreds of agents working at once, two reviewers checking each file, and a loop that kept testing until everything passed clean. Almost 99.8% of the tests passed. You don't really need to know what any of those words mean, but here's the plain-English version: a job that used to take a team of people weeks got handled by a swarm of AI agents in 11 days, mostly running on their own. And with Opus 4.8, those agents can run for even longer than they could before. Anthropic actually says some of these jobs can now stretch into days.
That means days of agents working whilst you're not even there. You could give it the goal, you walk away, you come back and it's further along than you left it.
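The "two reviewers per file, keep testing until everything passes" pattern from the Bun example can be sketched as a simple loop. This is a toy illustration, not the actual workflow that was used: review_until_clean, the two reviewer functions, toy_tests and toy_fix are all hypothetical names, and the "files" are just strings in a dictionary.

```python
def review_until_clean(files, reviewers, run_tests, fix, max_rounds=10):
    """Repeat review + test + fix until every file is approved and tests pass.

    files:     dict mapping file name -> source text (updated in place)
    reviewers: list of functions, each returning True if it approves a source
    run_tests: function returning the names of files with failing tests
    fix:       function returning a repaired version of a source
    Returns the number of rounds it took.
    """
    for round_no in range(1, max_rounds + 1):
        # A file is rejected unless every reviewer approves it.
        rejected = {
            name for name in files
            if not all(review(files[name]) for review in reviewers)
        }
        failing = set(run_tests(files))
        if not rejected and not failing:
            return round_no
        for name in sorted(rejected | failing):
            files[name] = fix(files[name])
    raise RuntimeError(f"still failing after {max_rounds} rounds")


# Toy stand-ins so the loop can run end to end (all hypothetical).
def no_todo(src):
    return "TODO" not in src


def not_empty(src):
    return bool(src.strip())


def toy_tests(files):
    return [name for name, src in files.items() if "BUG" in src]


def toy_fix(src):
    return src.replace("TODO", "").replace("BUG", "").strip() or "pass"
```

With files = {"a.py": "x = 1", "b.py": "TODO BUG"}, calling review_until_clean(files, [no_todo, not_empty], toy_tests, toy_fix) takes two rounds: the first rejects and repairs b.py, the second confirms everything is clean. The max_rounds cap matters in practice, since a loop like this has no guarantee of converging.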
You can turn this on in Claude Code with a mode called Ultra Code, and that's basically the highest effort setting plus dynamic workflows switched on. Or you can just ask Claude in your own words: create a dynamic workflow. Now, a quick pause here, because this is exactly where most people get stuck. Knowing about dynamic workflows is one thing. Setting it up so a swarm of agents can actually run tasks the right way, that's the part nobody shows you. That's exactly what we've built the agent operating system for inside the AI Profit Boardroom. You can plug in all your AI tools in one place, your OpenClaw, your free Claude Code, and build out your own workflows on top.
So, instead of running one agent at a time, you can set up your own little team of agents in one click using Ultra Code. And if you want the whole thing, we've got it as a zip file inside the AI Profit Boardroom, with a video tutorial and prompts on how you can use it too.
There's also a 30-day roadmap for implementing it, and daily updates as we improve it.
Plus, you get four coaching calls every week where you can bring your exact Opus 4.8 setup and get real answers. Link in the comments and description, or go to aiprofitb.com to get access.
Let's get into the benchmarks now, because I want to be specific here and these numbers are pretty strong. Let's start with coding, since that's what these models are built for. On SWE-bench Verified, which is real coding problems checked by human engineers, Opus 4.8 hits 88.6%.
Opus 4.7 was 87.6%.
So, a small bump on the easy version.
But on SWE-bench Pro, the hard version with bigger, messier multi-file problems, Opus 4.8 jumps to 69.2%. The old version, 4.7, was at 64.3%. GPT-5.5 sits at 58.6%.
Gemini 3.1 Pro sits at 54.2%. So on hard coding, 4.8 isn't just ahead, it's well ahead of the competition. And on that hard coding test, Opus 4.8 at its lowest effort setting matches the old 4.7 at its highest effort. Sit with that for a second: the new model barely trying is as good as the old model going all out.
Now, I want to be 100% honest with you. On Terminal-Bench, which is coding inside the command line, 4.8 scores 74.6%, a big jump from 4.7 at 66.1%, but GPT-5.5 still wins that one with 78.2%, and scores even higher with its own setup. So it's not a clean sweep. On terminal work, GPT-5.5 is still on top, and you should know that.
There's also a test called Frontier SWE, and this is a brutal one: 17 giant engineering problems, and the AI gets 20 hours per task. Things like making a real compiler faster or building a database server from scratch. Opus 4.8 ranks number one on this. The old 4.7 was number three. So on the longest, hardest jobs, the kind that run for hours, 4.8 climbed to the top.
Now, let's leave coding, because most of you don't code, and the numbers here are just as interesting. There's a test called GDPval, and this one matters. It uses 220 real professional tasks across 44 different jobs: making documents, slides, spreadsheets, the actual stuff people do day-to-day. Real people judge the work in blind comparisons. On this test, Opus 4.8 beats GPT-5.5 by about 121 points, which works out to winning roughly two out of three head-to-head matchups on average. That's office work.
That's the work a lot of you do every day. There's also a legal test built by a legal AI team with over 1,200 real law firm tasks, for example sorting through emails, contracts, and client files.
Opus 4.8 is currently the top-ranked model on it. There's also a Zapier test about doing real business workflows, for example connecting to dozens of apps and making the right moves in the right order. Opus 4.8 scored 15.5%, up from 4.7's 9.9%. Those are low numbers across the board because this stuff is genuinely hard, but that's still an increase of more than half.
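The claim that a gap of about 121 points on the office-work test means winning roughly two out of three matchups can be checked with a few lines of Python. This assumes the points are Elo-style ratings on the usual 400-point logistic scale, which the video does not state; elo_win_probability is a hypothetical helper name used only for this sketch.

```python
def elo_win_probability(rating_gap, scale=400.0):
    """Expected win rate for the higher-rated side under the standard Elo formula.

    rating_gap: how many points the stronger model is ahead
    scale:      400 is the conventional Elo scale (an assumption here)
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / scale))


# A 121-point lead gives a win rate close to two in three.
gap_121 = elo_win_probability(121)
print(f"{gap_121:.3f}")
```

Under that assumption a 121-point lead comes out to a win rate of about 0.667, which matches the "two out of three" figure. If the benchmark uses a different rating scale, the conversion would differ.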
And here's one that's quite huge: long context. This is about how well the model holds on to information when you feed it a mountain of stuff. On a hard memory test where the model has to track connections across a giant pile of data,
Opus 4.8 scored 68.1% on the toughest version. The old 4.7 was at 40.3%. It's not a bump, it's a leap.
What it means for you is that you can feed it a huge document, a long chat history, a giant pile of notes, and it actually keeps track of it instead of forgetting in the middle.
So, let me answer the question straight. Is this a big step forward? Anthropic's own answer is honest: they call it a modest but tangible improvement. This isn't a model that's going to change your life overnight. The everyday chat probably won't feel dramatically different. But the agent side, that's where the real jump is. Running hundreds of agents at once, running them for days, the huge leap in long memory, the number one spot on the hardest, longest engineering jobs. That's the part that is genuinely a step forward, and it's the part most people will sleep on.
Now, let me touch on the honesty thing, because it's real, not just a headline. These models have always had a bad habit: you ask them to do a job, they say it's all done, and they didn't actually do it. They read half the document. They claim they finished when they didn't. You've felt it.
I've felt it. Everyone's been there, right?
Now, Opus 4.8 is four times less likely than the old version to let a flaw in its own code slide by without flagging it. So when it writes something with a bug, it's far more likely to stop and say, "I'm not sure this part works."
That's it. It just owns up more. It makes fewer claims that it can't back up. There's also a measure of bad behavior. Sneaky things, right? Opus 4.8 scores way lower on that than 4.7. Lower is better here, obviously, and it lands close to a model called Mythos, which brings me to the last thing worth knowing. Mythos is a model Anthropic's been teasing. It's a step above Opus.
The order goes Haiku, Sonnet, Opus, and then Mythos right at the top. They say it's extremely capable, especially at finding security problems, capable enough that they say it needs stronger safeguards before everyone gets it. If you actually look at the system card, it has over 189 mentions of Claude Mythos, and right now only a small set of companies use it as a preview. Here's the interesting bit: on those behavior charts, Opus 4.8 sits very close to Mythos. Some people are calling 4.8 a "Mythos Lite", like a softer version of the big one. I'm not stating that as a fact, but the numbers are close on benchmarks and some other things, and Anthropic flat out said they expect to bring Mythos-class models to all customers in the coming weeks. So it's coming. It's not here yet, but it's coming.
A few practical things before I wrap up. It defaults to high effort, not the highest. You've got two levels above that, extra and max, for when you need a bit of va-va-voom. There's a new effort control in the regular Claude chat and in Cowork now, not just in Claude Code, so anyone can dial it up or down: turn it down for simple stuff to save your limits, turn it up for hard stuff. Fast mode also runs at two and a half times the speed. It is now three times cheaper than it was. And it's also less lazy.
So, the old version would quit on a task too early sometimes. That drive to keep going is baked into this one.
So, let me pull it all together. Opus 4.8 is a modest jump in everyday chat and a real jump in agents. It beats GPT-5.5 and Gemini on most things, but loses to GPT-5.5 on terminal coding. It runs hundreds of agents at once. It runs them for days.
It holds on to huge amounts of information far better than before. It tops the toughest long-running engineering test. It costs the same as the old one.
It looks like a softer cousin of the Mythos model that's coming soon.
The bigger picture is this: the way you work with these tools is shifting. It's moving away from you typing one prompt and waiting, and toward you setting a goal and a team of agents just going off and handling it. The people who learn to set that up, to point a swarm of agents at a job and trust it to run, are going to get a lot more done than the people still doing it one prompt at a time.
That's exactly what we focus on in the AI Profit Boardroom. The agent operating system lets you plug in Opus 4.8 alongside Hermes and Gemini and everything else, and you can also use Ultra Code to build the kind of agent teams 4.8 was made for, running in parallel and running on their own. You get the agent operating system zip file, the video tutorial, a 30-day setup roadmap, and daily updates as the tools change. And you get four coaching calls a week where you can bring your actual dynamic workflow setups and effort settings and get them dialed in. Plus, we have a community of 3,200 people figuring this out together, a prompt library built around these agent workflows, and a member map so you can connect with people near you. Link in the comments and description, or go to apiprofiting.com to get access.
That's Opus 4.8. Test it on your own, play with the effort levels and dynamic workflows, and keep an eye out for Mythos in the next few weeks. Thanks for watching.