
Claude Opus 4.8 Just Beat GPT-5.5 (EVERYTHING You Need To Know)
Added: 2026-05-31

995 views | 12 | 14:49 | JesusMartinezCrypto | Original Release: 2026-05-28

Claude Opus 4.8 introduces an effort ladder (low, medium, high, extra high, max, ultra code) that lets users control how much reasoning the model spends on a task, with higher effort levels enabling dynamic workflows that coordinate hundreds of parallel sub-agents for complex tasks. The model demonstrates significant improvements in agentic coding (4x fewer errors), multi-disciplinary reasoning, and long-horizon task execution through the /goal command, which enables sustained reasoning over extended periods. However, it remains the most expensive model available: fast mode is reportedly 3x cheaper than the previous fast mode but, by the speaker's estimate, still about double the standard price, and the fragmented interface (separate apps for chat, co-work, and code) presents usability challenges compared to unified platforms.

[00:00:00]Claude Opus 4.8 just came out and I did some initial testing. It looks pretty awesome. I want to go ahead and go over the improvements over 4.7, the ways that I'm going to be using it, and whether or not it's actually better than GPT 5.5.

[00:00:15]Um depending on I guess your tax bracket, [laughter] um it is better in some ways and depending on your tax bracket, it is very much not better in other ways. So, let's get right into it. So, I actually generated this using Opus 4.8. Uh the difference between the older model Opus 4.7 and Opus 4.8 is that you can now choose the effort level for your task. And so, for example, if I go over here, I actually am currently running it on Claude Co-work as we are speaking to make this video.

[00:00:48]Um but, if I go ahead and actually go over here onto Opus 4.8, you could see that um the effort can be changed, which is amazing, amazing, amazing. Make sure you're putting this thing on on max if you're doing any writing or reasoning work. If you're just asking it normal questions, anything below like high is probably going to be uh perfectly fine.

[00:01:08]But, as I see it, uh I think it's much better at actually writing and and making presentations. It's something the 4.8 model is showing a lot of strength in. I mean, this looks absolutely incredible. Look at this. This is just like such a good dissect of the entire situation. So, what is it better at doing than 4.7? Apparently, it's marginally better at agentic coding. It makes 4x less errors on release. It's 4x more careful with its own code. It defaults to high effort. So, for coding, it spends about the same tokens as the 4.7 default while doing better work.

[00:01:42]Reach for extra high on hard or long-running asynchronous jobs. Claude Code rate limits were raised to absorb the extra usage that is most likely going to happen. Another thing that happened here is it is better at multi-disciplinary reasoning, which is apparently Humanity's Last Exam. Very interesting.

[00:02:01]Um it is apparently using like sub-agents to be able to do more of the critical tasks. So, for example, like if you're using the extra high or the max reasoning level, it will actually pull up different agents to be able to uh do the task that you're doing.

[00:02:16]Okay, so here goes. Let's see what it pulled up for us as a title. Let's see if this is a good title. Claude Opus 4.8 is here, the next leap for AI crypto agents.

[00:02:24]Claude Opus 4.8 just dropped and this is wild for crypto AI.

[00:02:28]Uh okay, maybe not the crypto part. I think it's pulling some of the crypto information. That's totally irrelevant.

[00:02:34]But, you know, that's that's part of like my own memory dictation. That's my error. But, anyways, better at agentic financial analysis by uh small margin.

[00:02:42]So, just kind of across the board a little bit better at some of these key benchmarks. Um it also is apparently really good at long-horizon tasks. So, what I can imagine us doing is something like this, for example, /goal. Um by the time I wake up, I want you to do an analysis on all of the AI channels out there and give me a report uh through HTML uh breaking it all down for me, so we can make content. Give me 10 video ideas and also titles to accompany them. Something like that would be kind of what you'd use it for. The reasoning is like uh if you just prompted normally, it's just going to do the one task at hand. But, if you do a /goal, it will pretty much look through everything. And for the next couple hours, it will try to give you the best output, which I think is pretty cool. Like you're going to get better outputs. It's going to take you much longer, but if you're going to sleep or something or you're doing a much longer term coding task, like let's say you really need to fix this thing on a website and, you know, you don't really know how to fix it, [laughter] then you do /goal. And it will try to figure out how to fix it. If it does it wrong, it will keep trying until it goes ahead and actually fixes it, which is cool. So, /goal is a new thing that's coming along. Fast mode is another one. Uh no one was using fast mode on Opus apparently, according to a couple of these YouTubers that I was checking out that were testing Opus 4.8 before. Uh fast mode is uh according to them competitive with GPT 5.5's fast mode. So, fast mode on GPT 5.5, just so you know, is 1.5 times faster, so it's 50% faster, but it costs 250% more.

[00:04:27]From what I remember about the Opus fast mode, I think it was six times more expensive. So, if it's 3x cheaper, it's got to be what, like it's double the amount of cost. The thing is like Opus is the most expensive model out there by far, and so that 2x cost, it's a big leap. Now, I think like if you are someone that doesn't have a lot of capital, you're someone that wants the leanest and meanest model, this is not going to be your model. Like this isn't going to be your model at all. I also think that the harness that uh Claude is built on sucks. It's terrible. Uh for example, like I have Claude chat, I have Claude co-work, and I have Claude code.
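The fast-mode arithmetic worked out aloud here can be checked with a short sketch. The figures are the speaker's recollections from the video (an older fast mode at roughly 6x the standard price, now "3x cheaper"), not published pricing, and the variable names are made up for illustration.

```python
# Speaker's recollected figures, not official pricing.
OLD_FAST_MULTIPLIER = 6.0  # older Opus fast mode: about 6x the standard price
CHEAPER_FACTOR = 3.0       # new fast mode described as "3x cheaper" than the old one

# Dividing the old multiplier by the reduction gives the new multiplier
# relative to the standard (non-fast) price.
new_fast_multiplier = OLD_FAST_MULTIPLIER / CHEAPER_FACTOR

print(f"New fast mode is about {new_fast_multiplier:.0f}x the standard price")
```

Under those assumptions the result is 2x, which matches the "double the cost" estimate in the video.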

[00:05:12]I I have these three different places I need to click on for all my different tasks. It's very annoying. Versus if I go over to to Codex uh right now, I just have one place to do everything. I don't need to do anything different. So, I'm hoping that on whatever Opus 5 or the Mythos 1 release, that they'll combine all three of these together, and I think it'll be a lot easier to convince people to make it a daily driver, to make it like your main reasoning model. Because right now, GPT is still cheaper, uh GPT is still very capable, and also has has everything in one app.

[00:05:44]Codex is just absolutely incredible. Uh so, I still think in that regard Claude has some catching up to do. I imagine they know that. I do think that from a reasoning perspective, Opus is king.

[00:05:54]Opus 4.8 is the best model out there. It didn't take that long for this to generate. It was actually shorter than when I was generating stuff on Opus 4.7.

[00:06:02]So, it does seem marginally faster. It also has this here, a dynamic workflows.

[00:06:06]For the hardest tasks, Claude writes an orchestration script on the fly, spins up hundreds of coordinated sub-agents in parallel, and verifies its own work before reporting back. This is so cool.

[00:06:15]This is like really cool. Use the word workflow in a prompt or set effort to ultra code. Okay, let's see. It's powerful but token hungry. Burns through usage fast. Start with a scoped task to get a feel for it. It's available on Max. I have a Max plan. Okay, so I actually have a Max plan. So, let's see if I can actually do it here.

[00:06:35]Uh use the word workflow in the prompt.

[00:06:38]Workflow, build out the thumbnail and and optimize it.

[00:06:45]Let's see.

[00:06:47]I I'm not sure if it's going to work on Co-work or if it's something for Claude Code. I'm assuming this is for Claude Code. It says Yeah, it says Claude Code research preview. So, I don't think it's actually going to do anything. Maybe if I go to Claude Code it'll do it. Okay, let's see here. So, it's Opus 4.8 uh 1 million context.

[00:07:03]Okay, click that. We'll put it on max.

[00:07:06]And for the sake of this video, we'll we'll just like burn through all my tokens. We'll burn through everything.

[00:07:11]Uh so, we've got workflow.

[00:07:13]Um Oh, look. Look. Look. Look. Look.

[00:07:15]Look. It actually pulled that up.

[00:07:17]Okay, that's very cool. Uh build me a simple game on an island that looks pretty.

[00:07:26]Uh yeah, I don't know. Let's Let's just like select the folder for the local session. We'll uh give >> [snorts] >> uh let's see. Downloads.

[00:07:37]All right, I give it a folder. Trust workspace. All right, let's do this. So, let's see how fast it produces this. I'm actually really interested in seeing what happens here. I'm going to I'm going to actually pull it here on the side while we have the the video running at the same exact time. Let's Let's actually do that here.

[00:07:51]Um anyway, so let's continue. So, migrate in one command: run slash Claude API migrate in Claude Code. It updates your model strings and suggests prompt improvements tuned for 4.8. So, if you're coming into 4.8 right now, run this Claude API migrate. Very interesting. Man, I probably should have run that here.

[00:08:10]And we have the effort ladder, which is low, medium, high, extra high, max, and then ultra code. So, ultra code is extra high plus Claude auto triggers dynamic workflows when warranted, which I'm assuming is this right here.
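One way to picture the effort ladder read off the slide is as an ordered type. This is only a sketch: the `Effort` enum and `step_up` helper are hypothetical names for illustration, not part of any Anthropic API, and the ordering simply follows the order the levels are listed in the video.

```python
from enum import IntEnum


class Effort(IntEnum):
    """Effort levels in the order they are listed in the video (hypothetical model)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    EXTRA_HIGH = 3
    MAX = 4
    ULTRA_CODE = 5  # described as extra high plus auto-triggered dynamic workflows


def step_up(level: Effort) -> Effort:
    """Return the next rung on the ladder, staying put if already on the last rung."""
    return Effort(min(level + 1, Effort.ULTRA_CODE))


print(step_up(Effort.HIGH).name)
```

An ordered enum makes comparisons such as `Effort.HIGH < Effort.MAX` read naturally, which is handy if you want to enforce a minimum level for certain kinds of work.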

[00:08:21]We have prompting it well. Start at extra high for coding and agentic work.

[00:08:26]And then use a minimum of high for anything intelligence sensitive. Effort is now respected strictly. At low/medium, it scopes tightly to what you asked. Okay, so low/medium isn't going to give you a lot. And then if you see shallow reasoning, raise the effort.

[00:08:39]Don't prompt around it. So, if you're getting if you're getting a bad prompt, you probably just need to put more effort. At high/extra high, give it room. Set max tokens large. So, you're going to have to give it a lot a lot of tokens. This is going to be very expensive. Like this is going to be a very expensive model to use. Let's be honest. This is going to be very expensive. So, in terms of the fundamentals, you should be very clear and direct. You should say why you're doing what you're doing, which in this case I didn't.

[00:09:03][laughter] You should show examples that are relevant.

[00:09:06]You should structure with XML. Give it a role. And then also put long context.
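The fundamentals listed here (be clear and direct, say why, show relevant examples, structure with XML, give it a role) could be combined into a prompt along these lines. This is a rough sketch: `build_prompt` and the tag names are made up for illustration, since the video does not show the exact tags to use.

```python
def build_prompt(role: str, task: str, reason: str, examples: list[str], context: str = "") -> str:
    """Assemble an XML-structured prompt from the pieces named in the video.

    Tag names are illustrative assumptions, not a required schema.
    """
    parts = [f"<role>{role}</role>"]
    if context:
        # Long context goes in before the task that refers to it.
        parts.append(f"<context>\n{context}\n</context>")
    parts.append(f"<task>{task}</task>")
    parts.append(f"<reason>{reason}</reason>")
    if examples:
        body = "\n".join(f"<example>{e}</example>" for e in examples)
        parts.append(f"<examples>\n{body}\n</examples>")
    return "\n".join(parts)


prompt = build_prompt(
    role="You are a game developer.",
    task="Build a simple browser game set on an island.",
    reason="It is a demo for a video, so it should look good at first glance.",
    examples=["A crab catching falling fruit on a beach at sunset."],
)
print(prompt)
```

Stating the reason explicitly is the piece the speaker admits to skipping in the demo prompt.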

[00:09:11]So, in this instance I put it on the 1 million context model. And apparently these are the quirks to prompt around for 4.8. Um you don't need to to shout anymore. So, you don't have to put critical you must.

[00:09:24]It over-triggers it. Okay, so you can just talk normally to it. All right, that's cool. It self-calibrates how long it needs to be. Say what to do and it reads you literally. So, it's a lot more direct, which I like. I I like the old model that was just beating around the bush every 5 seconds. All right, so it says it's almost done thinking. Maybe by the end of this video it'll actually be done. And other things, you should be very explicit about the action. Parallel tool calls are a default strength.

[00:09:49]And that's mostly it. Apparently, so okay, what you need to take from this, it is more honest, it has sharper judgment, and also longer solo runs.

[00:09:57]So, it is able to do slash goal, which means like it's going to work for much longer, and it's going to have 4x better coding outputs apparently because it's much more careful about its own code, so it doesn't mess up as often. I I found that Codex was just better. It would just come out with better code than 4.7.

[00:10:15]So, I'm glad that Anthropic released a coding model with less flaws. We'll see if it's better than than 4.

[00:10:23]4.7 or 5.5 for GPT. And then also it seems like they're using sub-agents now. Like I bought a Mac Mini, and I bought a Mac Studio, and I'm running right now GPT on it because I don't want to run the API on Claude, it's really expensive. It's like it's really expensive, guys. I'm paying $100 a month for my plan, I still run into rate limits.

[00:10:42]But, and from what I've been able to take at least, I I really like Claude for its writing capabilities. I really think it's really good at writing, it's really good at reasoning, it's really good at UI. It's really good at the front-facing stuff in your business. If you need someone to talk to about your business decisions from the back end, I still think GPT image 2.0 is probably going to be the best. Okay, look at look over here. I have my design direction, a golden hour tropical sunset with a cute crab that scuttles along the beach catching falling fruit. Let me check the directory, then build it as a single self-contained HTML file.

[00:11:18]And then it has an unrelated Okay, so it's still creating. Okay, so this is going to this is going to be using lots lots and lots of tokens, it looks like.

[00:11:25]Look at look at it. It's like it's munching all of my tokens. It's eating everything in between.

[00:11:29]But, if we compare it to GPT 5.5 cuz I think that's the one a lot of you are going to compare to, um agentic terminal coding is still better on 5.5, apparently. So, if you're like, "Oh, actually on the terminal," um 5.5 is going to be better. Uh other things, it seems like across the board it's just like it's not that much better. It's not that much better. It's a little bit better, but it's it's like if you go over and you look up the artificial Uh I'm not spelling it right, I don't care. A index, artificial intelligence index.

[00:12:00]And okay, come on, guys.

[00:12:05]Artificial intelligence index.

[00:12:09]Give it to me. Give it to me. Okay, great. If I go to the artificial intelligence index, then you're going to see for yourself that it is very, very expensive. Very expensive. Uh very, very, very, very, very, very, very expensive. In fact, like Opus 4.7 is like the most expensive model. It's [laughter] It's the most expensive model. And so, 4.8 doesn't solve any of that. It's still equally as expensive. Yes, the fast mode is cheaper, but it's still equally as expensive. It's a very expensive model. And so, not very excited to see that. Like, you could already see this thing's munching tokens right now. It's It's eating up tokens.

[00:12:42]It's taking 5 minutes and it's just like eating up all of my tokens. 5.5 Okay, so this is what it ended up coming up with.

[00:12:47]We've got Coco Cave, sunset's pouring fruit off the palm, scoot little Coco the crab across the sand and catch all you can, but mind the spiky urchins.

[00:12:55]Okay, so we move with our mouse.

[00:12:58]Interesting. Some catching coconuts. It actually doesn't look bad. This is This is pretty solid. So, I'm guessing I'm supposed to dodge these little coconuts.

[00:13:07]That's also bad for me. We've got the score. If I lose all of my lives, what happens? And then I lose, I'm assuming.

[00:13:15]So, what hits me?

[00:13:19]I'm getting points.

[00:13:22]Okay.

[00:13:25]This is interesting. So, spike maybe here?

[00:13:28]And then play again. That That looks all right. That's what it looks all right. I mean, it is an island game. I would say that's that's pretty solid. Imagine like I didn't really give it a lot of context there. So, I could definitely fix it.

[00:13:38]So, pretty pretty solid. I mean, the crab's legs a little derpy. Kind of looks like a squid. But overall, not bad. Not bad. Thousand tokens already.

[00:13:45][laughter] It's taking all my money. Uh but, just in general, it's just a very expensive model. So, I think like if you're someone that can afford the $100 a month plan, the $200 a month plan, and you're also doing writing and reasoning tasks, then 4.8 is probably going to be perfectly fine for you. It's going to be solid. But, I think for everything else, people are still just going to use GPT until Claude comes up with a better solution to this weird thing that they've got going on. They have three different tabs for some reason. Why do we have three tabs? We should have one tab for everything. On on Codex, co-work is the equivalent of like Codex working on your computer, which it does. And then code is like more of it coding, which Codex does. And then chat is like the regular chat that you do with your ChatGPT, which Codex also does. Like I don't really see why like there would be three different chats for the same exact thing. So, that's in essence Opus 4.8. I hope that this video gave you the context that you needed to use it in its best form. Thank you so much for watching the video. Subscribe if you want more AI agentic content. Until next time, I'll see you later.

