Anthropic is shifting the focus from raw power to cognitive honesty, proving that reliability is the new benchmark for high-end models. This update makes Opus a much more practical tool for complex workflows without increasing the cost for developers.
深掘り
前提条件
- データがありません。
次のステップ
- データがありません。
深掘り
Anthropic Drops The Opus 4.8 BOMB追加:
Anthropic just released Claude Opus 4.8 today. So, in this video, I'm going to very quickly run you through what's changed and what you need to be paying attention to with this brand new model.
So, let's just jump into the benchmarks right away. So, we have Opus 4.8 over here highlighted and compared to Opus 4.7, GPT 5.5, and Gemini 3.1 Pro, Opus pretty much clears them all in every single category except agentic terminal coding, which is the Terminal Bench 2.1.
There, it scores a 74.6, which is still a huge leap forward from Opus 4.7, yet it still falls behind GPT 5.5. But everything else, the Sweet Bench Pro, multidisciplinary reasoning, agentic computer use, knowledge work, as well as agentic financial analysis, it pulls ahead of the rest of the pack. Now, we all take benchmarks with a large grain of salt at this point, but it is nice to see these large leaps forward from what they reported with Opus 4.7, really not that long ago. I mean, what, it was just a few months ago 4.7 was released and we already have 4.8, and we're going up from 64 to 69 on agentic coding. Like, this is good stuff. Now, one of the big improvements of 4.8 versus 4.7, according to Anthropic, is its honesty.
And by honesty, we are saying that this AI model, when you tell it to do something, if it can't do it or if it hasn't done it, it's actually going to tell you. This is a really big deal if you've used these models at all over these last few years, where you tell it to do something, like, "Hey, take a look at this giant transcript and actually read it and tell me what you did." And then when you look at its output and you actually interrogate it, it'll say something like, "Well, I actually just kind of summarized it. I didn't read the whole thing." Like, this is a major problem. And if you've been using AI for any sort of real work, you know how important it is to create all these tests to actually like make sure it does what it says it's doing. But Anthropic is saying, "Hey, this might not be an issue as much with 4.8 versus some of the previous models." Specifically, they say according to their evaluations, which you can take take at inside of their system card, which is about 250 pages long. They say it shows that Opus 4.8 is around four times less likely than its predecessor to allow flaws encoded as written to pass unremarked.
So, again, it's going to be much more honest about what's not working versus what is, and it's not going to gaslight you. They also assess that 4.8 has rates of misaligned behavior such as deception or cooperation with misuse that are substantially lower than Opus 4.7 and are similar to Mythos. And you can see that misaligned behavior right here, where Opus 4.7 and especially Sonnet 4.6 would have some of these tendencies, and we don't really see that as much with with Mythos or Opus 4.8. Now, beyond the model itself, there's a few more updates Anthropic has pushed forward. The first one is dynamic workflows. Now, dynamic workflows is similar to goals. The idea is that we can now put Claude code on a very complex task, and it's going to work on it over time, spawning tens to hundreds of parallel agents in a single session to make sure the work is actually completed. As you will know, there's a lot of problems that even if you do something in plan mode and break it out into a bunch of tasks, they're just too much for Claude code to handle at once. This dynamic workflows is the answer to that problem, and I'll be doing a deep dive on dynamic workflows very shortly. But, if you want to try it today, there's two real options. The first is to use plain language and say, "Hey, Claude, create a dynamic workflow." Or, switch on the new Claude code specific setting called ultra code.
Another big change for Claude.ai, the actual chatbot and co-work, this isn't really the case with code, is that they now have more controls when it comes to selecting how much effort Claude puts into the response, right? We've had this with Claude code for a while, with like high versus extra high versus max. Well, that's now on the side of things like Claude.ai and co-work. And lastly, if you're someone who's been using the messages API, it now accepts system entries inside the message array. This is really nice because you can update Claude's instructions mid-task. This is kind of similar to Codex and like the steer feature versus the queue feature when you give it an additional prompt.
Of note, Opus also defaults to high effort, not extra high. Remember with Opus 4.7 where they showed us that graph, they were were telling us, "Hey, extra high is kind of where you want to go." So, just understand 4.8 is on high and you still have two levels above that you can go if you want to get a little more effort from this new model. And in case you're wondering about token usage, they have increased rate limits in Claude code to accommodate the higher token usage of higher effort levels, which is really nice. So, that's your down and dirty overview of the brand new Claude Opus 4.8. Remember it has the exact same pricing as Opus 4.7, so you're not paying anything extra for this new power as well. As always, let me know what you thought. Make sure to check out Chase AI Plus in the linked comment if you want to get your hands on my Claude code masterclass, and I'll see you around.
関連おすすめ
OpenHuman VS Hermes AI: Who Wins?
JulianGoldieSEO
285 views•2026-05-29
BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2
aimmediahouse
122 views•2026-06-03
Long-Running Agents — Build an Agent That Never Forgets with Google ADK
suryakunju
142 views•2026-05-30
This computer is made from real human brain cells. And you can buy it.
Talktmsmedia
3K views•2026-05-28
I Made the Same Anime Fight Scene in Every AI Video Generator
NobleGooseAnime
295 views•2026-05-30
Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S
cnnnews18
3K views•2026-06-01
I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)
AICodingDaily
298 views•2026-05-29
3D Platformer Update - NO CAPES
SolarLune
294 views•2026-05-30











