Install our extension to search inside any video instantly.

OPUS 4.8!!! (also maybe GPT5.6??)
Added: 2026-05-31

30,031 views8572:34:17matthew_bermanOriginal Release: 2026-05-28

Claude Opus 4.8, released by Anthropic, represents a significant advancement in AI model capabilities with improved coding performance (69.2% on SWE-Bench Pro, a 5-point jump from Opus 4.7), enhanced reasoning abilities (3-point improvement on Humanity's Last Exam), and the introduction of dynamic workflows that enable parallel execution of multiple sub-agents for complex tasks. The model maintains the same pricing as previous versions while offering 2.5x faster response times in fast mode, and has achieved the top position on the new Frontier Suite benchmark with an 83% score.

[00:01:23]All right.

[00:01:24]Yes, it is true.

[00:01:27]Opus 4.8.

[00:01:30]It is here. We're going to be talking about it. And maybe, just maybe, we'll get uh GBT 5.6 six as well.

[00:01:43]So, yeah, that's what we're going to be talking about.

[00:01:50]Yeah. Yep. Here we go. Uh, wait till a few more people jump in. Let me know where you're from.

[00:01:56]Always love seeing uh everybody throughout the world. Let me know where you're from.

[00:02:08]Sarah Okconor is watching us become friends with these things in disgust.

[00:02:13]Right on. Yep. Yep. Yep.

[00:02:17]We have Cheety from Spain. Welcome.

[00:02:22]Marmar. Welcome back from Minneapolis.

[00:02:36]Osman from Turkey.

[00:02:39]Greg, welcome back. Welcome back from Venice Beach. Beautiful Venice Beach, right? Uh very close to where I grew up actually. Santa Monica.

[00:02:49]Got Germany, Armenia, British Columbia, Canada.

[00:02:58]What's up? India.

[00:03:02]Oh, what an exciting day.

[00:03:07]What an exciting day. We definitely got one Frontier model dropped today. Uh, possibly two. We'll see. We'll see if we get GPT 5.6. There are rumors that it dropped as well.

[00:03:28]All right, Japan. What's up, Japan? One of my favorite uh places on earth to visit. Bad attack zero. Japan, what's up? Egypt, what's up, Eric? Welcome back, Monttok. We got Chile.

[00:03:54]All right. So, I'm going to be recording a video during this live stream. So, just keep that in mind. If I flub some words, that's why I'm repeating them.

[00:04:05]But, we're going to be talking about, of course, let me see if I'll move this a little bit up. There we go. We're going to be talking about Opus 4.8.

[00:04:15]I got an email from Anthropic yesterday.

[00:04:22]Yesterday morning, uh saying, "Hey, I we're going to have a new model or we have a new model. We want you to test it." I didn't see the email till later in the day. I literally had time to play with the model for about 10 minutes. So, I have I'm basically coming into this cold.

[00:04:41]But yes, it is true. Opus Claude Opus 4.8 is here.

[00:04:48]Uh, and not just um not just the blog post, I actually before I even saw it, I opened it up.

[00:04:56]Uh, I opened up my clot and there it is.

[00:04:58]Opus 4.8 most capable for ambitious work. By the way, when did Opus 47 come out?

[00:05:06]Feels like just yesterday. April 16th.

[00:05:09]So, we're about six weeks past Opus 47 and we have Opus 4.8. Now, I have a bunch of hopes as to what they improved.

[00:05:19]Obviously, coding performance, obviously, overall intelligence, overall performance, but one of the biggest drawbacks of using Opus and using really like anthropic models at all right now is the cost per task. It is extremely extremely expensive.

[00:05:44]So, we're going to be talking about Opus 4.8. I'm going to go over it. I'll record a video. We'll also do a little bit of testing with it.

[00:05:53]Punch it up a little. What do you mean by that? Punch it up. Like, audio volume, it's pretty much as high as it goes.

[00:06:09]Eric, over a billion tokens per day.

[00:06:12]Geez.

[00:06:15]Ripped to your wallet.

[00:06:18]All right, let's get started.

[00:06:21]So, yes, it is true. Let me get Okay, so here it is. Uh, the rumors are true. Opus 4.8 has been dropped and it does look like a fantastic model. That's what we're going to be going over today.

[00:06:44]So, introducing Claude Opus 4.8. It builds on Opus 4.7 with sharper judgment. So, that could mean more intelligent. That could mean less tokens to get there. I hope that's what it means.

[00:07:03]more honesty about its own progress and the ability to work independently for longer than its predecessors available today at the same price.

[00:07:13]So that's good news. Anthropic models tend to be the most expensive on the market, but the fact that they've upped the intelligence and also kept the price the same is very welcome. Right? That is effectively a uh cost decrease.

[00:07:32]and fast mode, which by the way, I know everybody loves GPT 5.5. I get it. It's a fantastic model, but I just keep going back to Opus. I really love it. It is my preferred model for almost everything.

[00:07:49]And fast mode specifically is so good. I'm a speed maxi through and through.

[00:07:58]Hey, Michael Chancellor. I'm a speed maxi through and through and I am always happy to see when the model runs really quickly. So, same model roughly 2 and a half times the speed. So, if it let's say is running at 100 tokens per second on average, now you get 250 tokens per second. This is a substantial increase in speed. And really, Anthropic stands alone in their fast mode. I know OpenAI has fast modes, but it's not quite as fast as Anthropic, and I I really do prefer that. So, turn it on with SLfast in Cloud Code on the API. Uh, contact your account manager to request access or join the wait list.

[00:08:50]Uh, but token cost, yeah, we're going to get into token cost. We're going to we're going to break down everything that they announced. I'm going through it right now uh for the first time, so we'll see together. In cloud code, Opus48 makes calls like an experienced engineer without needing constant check-ins. It stays on track across long running sessions and follows work through wait through in your repo. That doesn't make sense. They needed to have Cloud review this language. Wait, am I misreading this? It stays on track across long running sessions and follows work through in your repo. There we go.

[00:09:25]Okay. So, you can hand off a feature or a bug sweep while you focus on what's next.

[00:09:36]Also new in Claude Code, dynamic workflows research preview. For the hardest task, Claude makes a plan, runs hundreds of parallel sub aents, and verifies its work before reporting back.

[00:09:48]I think a migration touching hundreds of files. Okay, hundreds of parallel sub aents. I am just hearing my API bill skyrocket when I read that. But we're going to read all about that.

[00:10:05]Uh Alex Albert from Anthropic, I've been liking this model a lot. Excited for folks to try it out.

[00:10:14]Uh who thinks we're getting GPT 5.6 six today. I mean, it would be perfect. It would be right in line with everything we've seen. Sam Alman, hyper competitive CEO. Oh, you dropped 48. Great. We got something for you. Tell me in chat, do you think uh 56 is coming today?

[00:10:42]By the way, if you're watching the stream, I would very much appreciate if you gave it a like, if you shared it with somebody, subscribe to the channel.

[00:10:49]Would really appreciate it. It does help. Thank you so much.

[00:10:57]What's up, Mark Santos? Welcome back. I love the competition. Me too. Me too.

[00:11:04]by end of day. Freddy Bitcoins by end of day. We'll see. It has to be soon. Like just the the way that the media works, it has to be soon. They have to drop it soon. Okay, let's keep going.

[00:11:19]So, this is what everybody wants to talk about, the benchmarks. Obviously, I just made a video about what is now considered kind of the most accurate benchmark, Deep Suite. Um, and I hope that they test Opus 4.8 on it. So, here is what we have. Aentic coding SWE Pro. So, obviously just looking at this quickly, uh, Deep Suite is not on here.

[00:12:01]Red one GBT 5.6 preparing the bench charts right now. Yeah. Yeah.

[00:12:07]Absolutely. How do you test these strawberry? Uh I asked it how many Rs are in the word strawberry obviously.

[00:12:18]Okay. All right. All right. So let's talk about the benchmarks. Uh we have aentic coding with SweetBench Pro. This is substantial. 69.2% 2%. That is a five point jump from Opus 47 that just came out 6 weeks ago, mind you, 6 weeks ago.

[00:12:39]And we already have another bump. The acceleration is accelerating. It feels like that. I genuinely feel like I can't keep up. And it is my literal full-time job to keep up and I can't.

[00:12:59]Okay. So, uh, GPT55.6 as compared to 69.2. Now, this is where it really gets interesting because a lot of people are saying Sweedbench, Sweepbench Pro is done. It's done meaning it's it it can't be taken seriously with scores like this because the vibe check on GPT 5.5 as compared to Opus 47 in which Opus 47 is, you know, what is that? seven, eight points ahead.

[00:13:30]It doesn't make sense to what they're feeling, right? So, I did this video about Deep Suite and this is much more reflective of what people are seeing or what people are actually feeling when they use these models.

[00:13:50]So GPT 5.5 extra high at 70%, claude opus 47 at 54%. This is a big difference here. So now coming back to uh Swebench you could just see like the difference between GPT 5.5 and 4.7 Opus 4.7 is pretty substantial in the other direction. Now we have another massive jump with uh Opus 4.8. eight.

[00:14:23]Uh, Momento artificial analysis shows GPT 5.5 is better overall uh, than Opus 4.7. Okay, good to know. I think it's interesting. Artificial analysis is an index of other benchmarks, not their own benchmark.

[00:14:38]Um, okay.

[00:14:41]Uh, Gemini 3.1 Pro. Come on. Come on, Google. Do something. All right. Then we have a gentic terminal coding where GPT 5.5 is still number one. This is terminal bench 2.1. This is the model's ability to uh successfully navigate Terminal to write commands and and go through uh directories and and all of that.

[00:15:13]I'm going to be back one second.

[00:15:33]All right. Sorry, there was some noise uh in the other room. Okay.

[00:15:41]David from Pharaoh Islands here. Where's Pharaoh Islands? I don't even know what that is. Uh, okay.

[00:15:50]Yeah. So, so still number one on being able to actually successfully navigate the terminal. GPT 5.5 coming in at 78.2.

[00:15:59]Kind of way ahead of everybody else, even Opus 4.8. And so, if you expect GPT 5.6 to be dropping soon, you can only imagine it's going to be better. And so their lead on agentic terminal coding is substantial and maybe that's what people are feeling.

[00:16:21]The vibe of the model that is best for realworld coding use cases is GPT 5.5.

[00:16:27]Now again I've said my preference is Opus. I stand by that. I still enjoy Opus the most. I still think it um is the best performance for what I use it for.

[00:16:40]But a lot of people are saying 5.5.

[00:16:51]All right, let's keep going. So, multidisciplinary reasoning, humanity's last exam. I'm kind of sad that they don't include um uh that the Oh my god, I'm forgetting the name of it. the benchmark, uh, Arc AGI, friend of the show, Greg Cameron, president over there. Yeah, I'm I'm kind of sad that they don't include that in any of these. But we got humanity's last exam, the most ominous benchmark name in history. And Opus 4.8 is just absolutely dominating. We have a three-point bump from Opus 47.

[00:17:31]Uh we have a three-point bump uh uh sorry we have a three-point bump from no tools and another three-point bump with tools and that is a pretty substantial improvement over GPT 5.5 at 41.4 and 52.2 respectively. Now we have agentic computer use. So that is the agent's ability to actually control your computer, to move windows, click on buttons, resize things, to actually get real things done on your computer, which even if it's highly accurate, I have found to be almost uh entirely too slow on most real use cases.

[00:18:16]Um, okay. And so yeah, 83% here as compared to 82 as compared to 78 with GBT 5.5. GBT uh GDP val aa um this is uh this is a benchmark actually created by open AAI and it tests the model's ability to actually get realworld knowledge work done and this is a great improvement.

[00:18:46]basically 140 point improvement over the previous model Opus 47 and about the same 100 what 130 point improvement this is I believe an ELO score uh over GBT 5.5 we have uh AENTIC financial analysis which it's interesting that they're including this finance agent v2 um you know all about the same except for sad Gemini 3.1 pro over here but all these are about the same as two point improvement in Opus 48. Um I think it's interesting because Anthropic seems to be going hard on very specific verticals obviously coding but also financial analysis and there's a there's a very big market if they can nail financial analysis with models.

[00:19:37]Um, okay. So, yeah, obviously every single time a new model comes out, we see the benchmarks numbers go up. Number always go up. There's, you know, that that's why it's like it's interesting to see the benchmarks, but the vibe the vibe check is really what matters.

[00:20:01]Okay, we got fast mode.

[00:20:05]Um, let's see what this is.

[00:20:13]Sonnet will do finance.

[00:20:16]Yeah, interesting. It It's interesting because with the um sorry, with the finance use case, the price per token matters a lot less than many other use cases. Obviously with coding it matters but with finance the like kind of cost per unit of intelligence can be a lot higher given the average value of some output of production is a lot higher as well.

[00:20:46]Uh okay then we have something new. We have then we have something new something brand new called dynamic workflows in cloud code. Now I will try to test this out after this but let's take a look at what this actually is first.

[00:21:06]Today we're introducing dynamic workflows in cloud code helping claude take on the most challenging tasks end to end.

[00:21:16]Work you'd normally plan in quarters now finishes in days. Okay, I'm pretty sure this is all just parallelization.

[00:21:27]Claude dynamically writes orchestration scripts that run tens to hundreds of parallel sub aents in a single session.

[00:21:36]Okay, who thinks they would have released this a month ago? I'm going to guess absolutely not. But now they have access to all the compute in the world with their XAI deal for Colossus Access.

[00:21:53]And I think this is really interesting.

[00:21:55]They've probably had all of these features and all of this functionality that they couldn't they like simply just couldn't release because they could not support the demand. And now gloves off like they're they're going all in and it's like okay you have money spend it with us. We're going to, you know, spin up tens to hundreds of parallel agents, which is nuts. Which is absolutely nuts.

[00:22:22]Checking its work before anything reaches you.

[00:22:32]Some problems are too big for one pass by a single agent, especially in complex legacy code bases.

[00:22:39]A bug hunt across an entire service, a migration that touches hundreds of files, a plan you want to stress, a sorry, a plan you want stress tested from every angle before you commit to it. Dynamic workflows can handle all of these end to end.

[00:22:58]All right. So, what we're seeing here, let's see, cloud code. It looks like I don't know 10 12 agents running in parallel.

[00:23:09]They're touching multiple different files. All opus 4.8.

[00:23:14]Doesn't really tell us what they're actually accomplishing though, but very very interesting nonetheless.

[00:23:24]Uh, okay. Dynamic workflows are available today in research preview in the Cloud Code CLI. I don't know if I'm going to have access to that, but we will check later. We're going to do some testing after this. By the way, um if you have ideas for what we should test Opus 48 on, like some prompts, give me prompts. I will test them. Drop it in chat and I will try to test it later in this video.

[00:23:53]Yeah, Opus 48 or sorry, Opus 47. God, that image. Uh, Opus 47 was not wellreceived, right?

[00:24:06]They came out, they said, "Hey, Opus 47 is fantastic. It's so much better than Opus 46. much more expensive, much more verbose, cost per task went up substantially, and it was the combination of GPT 5.5 being really good, being very token efficient, and then at the same time, uh, Anthropic releasing Opus 47, which was poorly received, where the kind of vibe shift happened and a lot of people in, you know, tech Twitter, AI, Twitter started moving to GPT 5.5. Now obviously that is not reflected in the overall revenue of these companies because now Anthropic is way ahead way ahead of open AI like their revenue is vertical at this point.

[00:25:04]Let's see if I can find a chart. Uh, okay. All right. So, this is from the information. The Anthropic is likely generating at least 35% more revenue than OpenAI and I've heard that they're going to have their first profitable quarter.

[00:25:31]So like there you know the it's a lag the revenue is a lagging indicator and the absolute frontier of users the earliest of early adopters of these models there has been a vibe shift towards GPT 5.5 and codeex but those same users have no loyalty including myself I am not loyal to any of these companies or models you show me the best model and I will use it and So the rest of the industry does have kind of indirect loyalty. They have to because change management for an enterprise company is difficult. They're signing contracts. They're committing uh revenue to these companies to get discounts or uh uh like um partitioned uh uh compute resources. It is very difficult. So what we're seeing is at the absolute frontier, people like me, people probably like you if you're watching this video, uh they'll switch whatever like the the change uh the switching cost of AI models is effectively zero. There's, you know, you swap out the API key, you swap out the model name, and then you know, you make a little changes to the prompt, but overall it's like I'm going to go with whatever model is best in the moment. Uh but again, for the rest of the market, it is not like that.

[00:27:00]And look, hopefully, you know, I'm I'm being honest with my reviews and and my opinions of these models because I have um I've had a few words to say about Anthropic these last few weeks. But I will also admit, even against what seems to be the trend in early tech Twitter, I still like Opus best. I still go back to it most often.

[00:27:27]All right, let's keep going on dynamic workflows. A new feature from Anthropic.

[00:27:39]All right, so dynamic workflows are available today. research preview in cloud code CLI desktop and the VS code extension which who uses that for Mac team and enterprise if admin enabled plans uh as well as on the cloud API Amazon Bedrock Vertex AI and Microsoft Foundry. This is really important.

[00:28:02]This is important because these are all deals that Anthropic has struck with these companies and they seemingly are finally out of their compute crunch, right? Not only did they strike a a very good deal with XAI to use all of their excess compute because XAI doesn't have a frontier model and they have all of this additional capacity, but they also struck thirdparty deals with Amazon and and others. Um and so they have a lot of compute now. They have a lot of compute and so they can release a feature like this which is like again tend to hundreds of sub aents running in parallel. Uh we are definitely going to be testing this and uh hopefully my quota lasts more than five minutes.

[00:28:57]Uh yeah, here right. So look at this.

[00:29:00]Dynamic workflows can consume substantially more tokens than a typical claude code session. So we recommend starting on a scoped task to get a feel for usage in your work. Now, if you thought Anthropics revenue was vertical, now they are like, "No, no, let me show you. Let me show you how vertical we can really get." Because all of a sudden, and again, I I I think it's probably the top 5% of anthropic users who are maybe 80 plus% of overall revenue of total revenue. And now those top 5% are going to have a feature that allows them to get so much more done in such a a smaller amount of time with these parallel agents. And of course, when you trade off or when you're trying to optimize for time, you need to pay more.

[00:29:53]That that's the trade-off.

[00:30:00]And by the way, just one more reminder.

[00:30:02]I would really really appreciate if you're watching this stream to just like it and uh subscribe to the channel again. It it does very much help and thank you appreciate it.

[00:30:15]Okay. Um for the best experience, turn on auto mode when using dynamic workflows. From there, you have two ways to start a workflow. Ask Claude to create a dynamic workflow directly. By the way, keep dropping in chat um the different prompts and tests that you want me to run.

[00:30:38]Uh I'll be testing hopefully I can test dynamic workflows or dynamic workflow.

[00:30:44]Uh but we will be testing Opus 4.8 for sure both uh in uh cloud directly and in cloud code.

[00:31:00]Okay. uh switch on a new cloud code specific setting called ultra code. This is accessible through the effort menu.

[00:31:09]Okay, so that's where we're going to be finding it. And it sets the effort level to extra high.

[00:31:16]Extra high plus like multiple parallel agents. God, this is going to be so expensive. Uh while letting Claude decide automatically when to use a workflow to handle your task. Imagine if they came out with all of this and then they were like, "Oh yeah, also we're charging 20% more per million output tokens." So I think this is why they did it. They're giving you as users, you know, 15 different ways to increase your usage of tokens. Uh so they didn't necessarily need to increase the price per token. Plus, I think it's actually again a reflection of their increased supply, right? It's all it's all supply demand. They have compute now. They don't have to raise their prices again and again. Now they are just about market penetration.

[00:32:08]What's up Ashne?

[00:32:10]Ashley. Ashne I I always say Ashy when I read your name uh on on uh on your your profile.

[00:32:20]Um okay.

[00:32:25]Let's see. Dynamic workflows in action.

[00:32:27]Early access users.

[00:32:30]Uh, Okasuko, I joined this late. When is Opus 4.8 releasing? It's out. It's out right now. You should have access to it.

[00:32:38]And if you don't, probably by the end of today, everybody will have access to it.

[00:32:43]There's probably some slow roll. I checked immediately uh when I first saw the first tweet saying it's out and I had it in my uh claude.

[00:32:58]Good to know. Okay.

[00:33:01]Uh yeah, excellent. Yeah, very very cool. You know, my favorite days are are new model days. Can you test it out?

[00:33:10]Yes, I will be testing it. I promise.

[00:33:12]Stick with me. We will be testing it.

[00:33:15]Keep dropping prompts you want me to test in chat and I'll I'll test it out.

[00:33:22]Okay. Uh let's see. So, dynamic workflows in action. Codebaswide bug hunts, profiler guided optimization audits, and security audits. Claude searches a service or repo in parallel, then runs independent verification on every finding. So, the report surfaces real issues. Okay, this is all kind of just uh a whole bunch of nothing right here.

[00:33:44]large migrations and modernization efforts.

[00:33:48]Interesting. Framework swaps. Wow. API deprecations, language ports that span thousands of files end to end. Very interesting. Very interesting. There still needs to be a main agent that delegates to the sub aents. And so like the bottleneck still is the intelligence of that main agent. How well can it plan? How well can it manage its sub agents?

[00:34:13]Critical work you need checked twice.

[00:34:15]When the cost of a wrong answer is high, a workflow gives Claude independent attempts at the problem and adversarial agents working to break the result before you see it. That is really cool.

[00:34:26]That is really cool. So if you didn't have full trust in your agents, if you did not have full trust to just let Claude Code write code for you now, maybe this will help. Obviously, you're going to be paying up for that, but uh yeah, there you go.

[00:34:44]Re rewriting bun with dynamic workflows.

[00:34:47]I pretty sure bun was acquired by Anthropic, right? Or was it OpenAI?

[00:34:57]Yeah, Anthropic acquires Bun. Yep. Yep.

[00:35:00]Yep. Yep. Yep. Okay.

[00:35:08]All right, this is interesting. Let's talk about how workflow works. And yeah, I promise we will be testing it. Uh, when a workflow kicks off, Claude plans dynamically based on your prompt, breaks it into subtasks, and fans the work out across sub agents running in parallel.

[00:35:23]Results are checked before they're folded in, and you come back to a single coordinated answer. Agents address the problem from independent angles. I wonder how they're structuring this from a prompt perspective. Uh, you know, if they're spinning up tens to a hundred agents and they're giving them all unique uh perspectives on a problem, all unique personalities. I wonder how they're deciding. Is it the main agent that's deciding everything?

[00:35:52]Very interesting. Other agents try to refute what they found. I love this adversarial prompting. This is all stuff that was discovered by people early on in this whole new AI wave. You know, everything from thinking kind of think step by step to adversarial prompting.

[00:36:12]Now, they're just saying, you know what, we're building it in to our scaffolding.

[00:36:16]It's not that the models themselves have this ability. They're probably optimized for being able to um break up tasks and and delegate out to sub agents, but ultimately it's the scaffolding that's handling all of this.

[00:36:33]Oh, I like this take. We'll see if it's true. 4.8 max fast feels like prime opus 46. Opus 46 is ah just the best. The best shit's expensive though. Yes, it is. Yes, it is. especially if you're talking about a 100 agents.

[00:36:53]Uh okay. And uh and the run keeps iterating until the answers converge, which is how a workflow reaches results a single pass can't. Are they going to give like any benchmarks for this? They don't. Interesting.

[00:37:09]Why do you think they're not showing any benchmarks for this feature? It seems like if there's this drastic improvement in quality or potential long horizon task solutions, like why would they not put out benchmarks comparing it to just the base model?

[00:37:39]Yeah.

[00:37:40]Ashley, how many are in strawberry? I mean, that's a saturated test. You got to come up with something harder than that. Although, we will be testing that.

[00:37:48]We'll see. We'll see. I wonder how many uh sub aents we need to solve that. Um, okay.

[00:38:00]Let's see. Let's see how it works.

[00:38:05]All right. I think this is enough talking. You guys are probably sick of me just reading this. Let's get into it.

[00:38:10]Um, that's all the info we have.

[00:38:18]Tariq, let's let's look at some uh um tweets from people from the anthropic team. Cloud code outthropic. I think you'll really like Opus 4.8. It's as smart as its benchmark show, but expresses and utilizes that intelligence in a warm and collaborative way. Yeah.

[00:38:36]By the way, still to this day from a personality perspective, I prefer Claude. I prefer Opus. Um, from like within OpenClaw, when I had to swap out and Open Claw, sorry, when I had to swap out Opus for GPT, there was a substantial degradation in personality.

[00:38:56]And obviously, that's my just preference, but I I I think they're kind of uh talking about it here. Workflows are a great way to utilize it. I'm hooked. article on that soon.

[00:39:11]Oh, interesting. Anthropic says it plans to release Mythos class. Where's Where is the proof of this?

[00:39:24]Yeah. I don't know if this is true. Can we get a source? No.

[00:39:30]Let's see. Do you think this is true?

[00:39:32]Mythos class model the source. Oh, wait a second.

[00:39:39]No, but where it's what's next on their page?

[00:39:50]On this page. Wait, where?

[00:39:57]Not on this. Maybe.

[00:40:01]What am I missing?

[00:40:03]Can someone link it to me?

[00:40:12]This is true. And there's a source for it. Scroll up a little so I can see the tweet of Opus making a city.

[00:40:28]By the way, is does text highlighting just never work? Like when you're trying to find the other um instances on a page like I I it never works for me. It says there's two. Can you guys see this? No, you can't see this. Look at this. So, I'm trying to search for it and it's not switching around the page.

[00:40:50]It just shows me the one instance of it.

[00:40:52]So, so frustrating.

[00:40:58]Search glass wing. Okay.

[00:41:01]No.

[00:41:03]Uh, let's see. Let's see.

[00:41:18]Yeah, Eric, I think you're right.

[00:41:21]Usually offscreen or hidden HTML. It should say that. Um, but I Yeah, you're you're right.

[00:41:28]Yep. Yep. Zoron, same thing. Hidden span element. Yep. Kind of frustrating. You would think it would be uh better. Um, okay. I don't see it. I'll try to find it. Um uh okay. So it looks like Opus 48 is now available in Windsurf and Devon and probably Cursor as well. Probably Factory. They probably all got early access.

[00:42:12]I feel like I get no love from Anthropic or at least just a little bit. Here it is. Yeah, there it is. Cursor claw opus 4.8 is now available in cursor on cursor bench. It's able to work more efficiently than Opus 47. We've also found it to be more persistent on harder tasks.

[00:42:31]Yeah, I want to see the cursor bench results.

[00:42:41]Okay, let's test it. Uh, here we go. All right, what prompts do we have? Let's see.

[00:42:49]Give me a sec.

[00:43:02]All right, drop your prompts in chat and we're going to test it. What? What do you want me to see? Uh what do you want what do you want me to test Opus 48 on?

[00:43:13]So let's also set the uh God, this is just going to burn through my tokens.

[00:43:18]Max Opus 48 max. How many Rs are in the word strawberry? Okay, that's obviously the first one. We're going to do a detailed Windows clone. I mean, I need to be able to do more than one prompt here. Soccer game with 3JS. All right, let's try that. That sounds cool.

[00:43:44]Create a soccer game with 3JS.

[00:43:49]Uh, I'm trying to think.

[00:43:57]Yeah, I'm going to burn through my rate limit quickly. But then I'll just switch over to cursor because I have a bunch of uh tokens there that I can use. Uh create a soccer game, 3D soccer game with 3JS. I'm going to turn it down from max effort. I'm going to put it on high because I don't think we need max effort for that. Let's see how that does.

[00:44:25]Uh, let's do the Rubik's Cube test just because it's fun. Uh, create a 3D Rubik's cube simulation.

[00:44:35]Uh, allow for scrambling, solving, and everything step by step. All right, let's see the car wash test. All right. Yeah, good call. Let's do the car wash test. So, um, what's the right way to say this?

[00:44:57]This is a I need to get a car wash. The car wash is 50 ft away. Should I walk or drive?

[00:45:10]Walk.

[00:45:14]Okay. All right. I guess that's it. It It's It's still failing on these kind of edge Casey logic questions. I mean, these are just built to fool.

[00:45:26]I mean, this is crazy. Yeah, these tests are built to fool the models. Um, jagged intelligence is here. It's here to stay.

[00:45:38]Why am I doing this in the chat interface? I will switch over to cloud code. Don't you worry. Don't you worry.

[00:45:44]Uh, all right. So, that didn't work.

[00:45:48]Let's try on max effort.

[00:45:52]I think I'm going to need a dozen parallel agents for this walk. Unbelievable.

[00:45:58]Opus 48 Max. The most expensive version of Claude and it still cannot get it.

[00:46:11]Unreal.

[00:46:14]Well, you Okay, hold on. Am I Am I Okay.

[00:46:17]Well, you did not specify you have to wash the car.

[00:46:22]No, I said I need to get a car wash.

[00:46:25]That means I have to wash the car. The car wash is 50 feet away. Should I walk or drive? It's like, look, if I ask, you know, an 8-year-old about this, they're going to get this. So, so I don't need any more clarification to prove that these models are not doing this uh properly. Right.

[00:46:45]I should have said get my car washed.

[00:46:48]But why? You're basically telling me I need to add more clarification.

[00:46:54]And and like by the way, everybody in chat knows what I mean by this, right?

[00:46:58]So why do I have to explain it to the quote unquote best model on the planet, the most expensive model on the planet?

[00:47:10]Okay. Yeah. Yeah. I like this.

[00:47:13]How would I wash the car if I walk there?

[00:47:18]Max thinking. Max thinking.

[00:47:23]Fair point. Drive. Can't watch a car.

[00:47:26]I'm basically giving it the answer at that point though, right?

[00:47:32]All right. Um, okay.

[00:47:59]Let's see the soccer game. Yeah, thanks for reminding me. Uh, okay. Let's start here.

[00:48:07]Um, oh man, can I not? Oh, yeah. Here we go.

[00:48:16]Um, okay. I'm controlling it. CPU scores and it's getting infinite points. Yeah, I'd say this is not good.

[00:48:36]Let's try it again. No, didn't want to work. Let's try it again.

[00:48:42]It just It just kicks. And Oh, wait.

[00:48:45]Which one am I? I'm the green guy.

[00:48:50]Hold shift to charge.

[00:48:53]And how do I kick? Release to kick. I see. I see. I see. I see. So, king. Bam.

[00:48:59]Oh, this is actually pretty cool. I don't know why he has like a little pyramid coming out of him.

[00:49:09]All right.

[00:49:13]Um, both my kids said walk. Six and n years old. I consider them pretty smart, but I guess it caught them out, too. All right. For sure. I'm gonna ask my kids about this tonight. That's hilarious.

[00:49:30]All right. So, yeah, pretty good. Let's see the 3D Rubik's Cube simulation.

[00:49:41]Still working on it. Still going.

[00:50:02]I don't I don't know. Is this even Yeah, it is working. Look at that. Oh, boy. It is so slow. It's crawling. How do I I can't turn on fast from here. Yeah. No, I can't. Okay. Um, we're switching over to cloud code, okay? Because I think that's where you guys want to see me actually work from.

[00:50:38]I'll come back to this. I hopefully it finishes soon, but it is uh brutally slow right now.

[00:50:46]I should probably use cursor because I get a lot of tokens from them and they have all the features.

[00:50:54]Maybe I'll test it in cursor. You know what? I I have all the credits in anthrop or in Claude. I I actually do want to burn through them because otherwise I'm not using them. Um okay, Claude. Bam.

[00:51:16]All right. Can y'all see that? Yep. All right. Cool.

[00:51:27]Okay, let's try it. We have Opus 4.8 high. Can you all see that on the screen? No. So, here we go. Just to show you real quick. Opus 4.8 high. You can see that in the bottom right. Um, I have auto on. Let's Let's see all these settings here. Auto on.

[00:51:47]Opus 4.8.

[00:51:49]Let's enable fast. Gosh, I'm going to have like one and a half prompts if I go up to max here. So, I'm going to leave it on high. Actually, I'm not even going to put on fast. It's too I I just don't want to burn through all the credits.

[00:52:03]Uh, this is Opus 4.8, Not the million context.

[00:52:24]Okay. Um, here we go. Okay, let's test some things.

[00:52:40]Uh, drop your prompts. Drop your prompts in chat, please. Uh, we will test them.

[00:52:45]Keep dropping it. Create a video editing tool that allows the users to cut, reframe, and add titles to a video. I kind of like that one.

[00:52:52]Um, why can I not highlight copy text?

[00:52:55]Here we go. Uh, okay.

[00:52:59]Thank you, Louis.

[00:53:01]Luis, sorry, Lewis. Louise, um, create a video editing tool that allows the user to cut, reframe, and add titles to a video. Let's go. Um, I wonder if the new workflow feature is available in cursor.

[00:53:16]Probably not is my guess because it's very much tied to the harness and cursor does their own sub agent management.

[00:53:37]Real Madhatter. So sad to see someone negotiate with tokens over a few prompts. Anthropic is such trash.

[00:53:44]Yeah, I've never felt the token squeeze as much as I do when I use Anthropics products. Absolutely. Uh, how should this video editor be built delivered standalone web app integrated into Journey? Interesting. It thinks I'm working out of uh a different project of mine. Serverside processing tool. Boom.

[00:54:04]We're going to do that. Uh, what does refframe mean? Yeah, I guess. What does refframe mean? Uh, change the aspect ratio. crop to a free form region. Pan and zoom within the frame. Let's go with that. Uh, how important is real video file export versus a working prototype?

[00:54:19]Let's just do a working prototype.

[00:54:38]Um, what am I missing here?

[00:54:47]Oh, interesting. Interesting. I think I missed something. Let me let me switch back while that's going. There's actually um who who who gave me that? I think that was Jonah. Thank you, Jonah.

[00:54:59]thanks for helping me with that. Um, yeah, there's actually something I missed and I think it's quite important.

[00:55:15]Okay, check this out. So, I did I not get this?

[00:55:25]Yeah. Okay. Oh, what's next?

[00:55:36]Oh, yeah. Okay. Okay. Okay. There's two things. All right. So, I actually skipped right over the actual blog post for Opus 4.8. So, let's cover that just for a moment. I think the one interesting thing immediately so fast mode which is I love so much where the model can work at two and a half times the speed is now three times cheaper.

[00:55:59]Three times cheaper. So I hope I'm doing the math right. But it was six times more expensive if I remember correctly.

[00:56:07]Which means now it is only two times more expensive.

[00:56:11]That's crazy. for two and a half times the speed, you only pay 2x. That is a direct reflection of their increased supply of compute through their deals with companies like Amazon. Uh obviously their massive partnership with XAI that is fantastic. Um yes, three times cheaper still exane still insanely expensive. So if you're paying uh sorry, let me bring this back. Resize the fill canvas. If you are paying $30 per million output tokens, you are paying effectively $60 per million output tokens. And you know what? That seems worth it to me. That seems worth it to me. Uh, okay. So, we already went over this uh misaligned behavior. We didn't talk about, but I'm going to skip over that.

[00:57:02]Um, maybe I have this cool video, although it's like nothing.

[00:57:08]Um a note on effort.

[00:57:13]Uh Opus 4A defaults to high effort which we judge to be the best overall balance of quality and user experience on coding tasks. This level this effort level spends a similar number of tokens as Opus 47's default but with better performance. Okay.

[00:57:30]Yeah, I'm I'm typically not using max thinking on anything. I'm not solving erdosh problems.

[00:57:37]Why would I need max thinking?

[00:57:40]Um, okay.

[00:57:44]Now, here is here is where this is actually true. So, it seems like mythos is actually coming out in the next few weeks. Not only that, but we plan to release a new class of model with even higher intelligence than Opus.

[00:58:01]My goodness. In the next few weeks, I can just feel the OpenAI team scrambling, scrambling to finish their training and get the their nextG model out the door. As pro as part of Project Glasswing, a small number of organizations are currently using Claude Mythos preview for cyber security work.

[00:58:19]Models at this capability level require stronger cyber safeguards before they can be generally released. I honestly did not think we were going to get Mythos so soon.

[00:58:33]Yes, go competition. That is right. Go competition. Love it. Uh we're making swift progress on developing these safeguards and expect to be able to bring Mythosclass models to all our customers in the coming weeks. My goodness. And shameless plug, if you want to see me review that model, go ahead, subscribe to the channel. I would very much appreciate it. We're we're we're trying to keep up together. I honestly find it very difficult, but we'll do it together. Um, okay. 48 available everywhere. Pricing for regular usage is unchanged. $5 per million input tokens, $25 per million output tokens. And yeah, pricing for the fast mode 10 and 50 respectively. 5.6 today. No, I don't know. We'll see.

[00:59:20]Um, sure. LEGO Geeks, I heard rumblings, Twitter rumors that 5.6 might be released. That's why I threw it in there. We definitely have uh Opus 48.

[00:59:31]Okay.

[00:59:34]Yes, as Brian said, please drop your 48 tests here. We're going to spend a while on this or until I get hungry enough.

[00:59:48]Okay, let me switch back now. Um, we're going to switch back to Claude.

[00:59:55]Let's resize to fill the canvas.

[00:59:58]Let's close that. Let's try that one more time. Boom. Okay. So, here we go.

[01:00:04]Uh ah, you know, I kind of find this annoying sometimes. I just want Claude to go do the thing rather than creating a plan. But let's see. So, deliverables and how it runs. A single self-contained index.html. Yes. Uh we get a preview canvas, a transport bar, and timeline plus plan uh plus panels. Three features: cut, reframe, and titles. Yes, build it. Okay, what else should I be testing?

[01:00:31]Uh, thank you, Brian, for putting some of these uh prompts together.

[01:00:43]Okay, Ryan Refe, single page Space Invaders. Okay, let's add that.

[01:00:54]Single page Space Invaders. Um, let's try build a single HTML page Space Invaders game. Make it look fantastic and add lots of cool functionality. This is about the extent of prompting that I want to do with future models. Oh, you can't see this. Let me uh show you for a sec. There we go. So, build a single HTML Space Invaders game. Make it look fantastic and add lots of cool functionality. We are going to be using Opus 4.8 high on auto mode. Okay.

[01:01:35]And go.

[01:01:38]No mistakes. Make no mistakes. Of course, you have to include that. Yeah.

[01:01:43]Yeah. Yeah. Yeah. Um Okay. While we're here, let's do the car wash test again because uh for those of you who joined the stream after I did it last time, it failed the car wash test on max thinking. I can't I probably spent $10 on that prompt alone and it failed it.

[01:02:04]Um so I need to get a car wash.

[01:02:10]The car wash is 50 ft away. Should I walk or drive there? You can't see it.

[01:02:17]Let me open that up. There we go. Okay, let's see what it says. So, I'm on Opus 48 high within Cloud Code. Let's see.

[01:02:31]Critter Run game. What the hell? How did Mythos go from it can only be used the government and a few companies because it's so dangerous to everyone? Yeah, I guess uh all they needed was I don't know, two months to sort it out. That's crazy. I think it's a preIPO um pumping is probably my guess of what's going on.

[01:02:58]Um Oh, there it is. There it is. Wait, wait. It gave the wrong answer and then explained the logic as if it gave the right answer. Walk.

[01:03:19]It's a car wash. You need the car there to wash it. But 50 ft is closer than most parking spots are to a store entrance. Just drive it the 50 ft into the washbay. What? Okay. Wait.

[01:03:36]Drive. You can't wash a car that isn't there. Okay, I guess it got it right technically.

[01:03:46]Uh, here's the video uh editor. Let's see. Video editing tool. I wonder why it that's a kind of a bug. I I'm here and it popped open somewhere else. Weird.

[01:03:57]Okay, let's see.

[01:04:00]Preview.

[01:04:02]Okay, I don't really have any videos. is let me let me just try to create a quick video that I can upload here.

[01:04:10]I'm just going to open up QuickTime and record my screen.

[01:04:16]Okay, let's record.

[01:04:21]And that's enough.

[01:04:24]Okay, I got a video now. Okay, let's try this first. I am obviously unfocused to say the least right now.

[01:04:33]Uh, neon invaders. Here we go. Press start to play.

[01:04:38]And nothing is happening.

[01:04:41]Nothing is happening.

[01:04:46]Oh, okay. I don't know what happened there.

[01:04:51]Space Invaders. It's still going. So, it it popped it open, but it's still going, it seems.

[01:05:07]Lala, thank you for the super chat. $10.

[01:05:10]Thank you so much. Please ask Claude if the secret to life is potato.

[01:05:16]You got it.

[01:05:20]Here's chat.

[01:05:23]Is the secret to life potato? And we're going to use Opus48 Max. You helped paid for some of these tokens. I appreciate it.

[01:05:38]Ha. No. By the way, this is what I'm talking about with the personality.

[01:05:46]GBT 5.5 isn't doing that, right? If anything, the closest thing to a secret is something boring. Meaning comes from a few things done consistently.

[01:05:56]relationships you invest in, work that feels worth doing, taking care of your body, paying attention to the present instead of always living in the next thing. There's no single trick. Potatoes are great though, underrated.

[01:06:08]There really is a lot of personality coming out of this response here.

[01:06:14]Open preview and browser app sucks.

[01:06:16]Yeah. All right, let's go back to Cloud Code. Let's see the video editing tool.

[01:06:21]Uh, I think it's done.

[01:06:27]and let's choose a video.

[01:06:36]Okay, so this is the video. It did load.

[01:06:41]Here's the playhead.

[01:06:45]All right, it is actually like pretty impressive. I can't move the playhead around, unfortunately.

[01:06:51]Uh uh. Okay. So, press play to go and split.

[01:06:58]No, that's not working.

[01:07:02]Oh, did it work? No, it's not working.

[01:07:04]Split at play head. Okay, the button over there does work. I'll play it again. No, it's only playing the second one. I see. Yeah, the functionality isn't great. There it goes.

[01:07:18]Okay. Yeah, that's pretty cool. It worked decently well.

[01:07:24]Let's see. We could delete segment. It says deleted.

[01:07:29]Um, interesting.

[01:07:38]Okay, so that was that one. Let's go back to the Space Invaders game. That's done.

[01:07:44]Oh, cool. It gave me a nice screenshot.

[01:07:46]Here's what I built. Space Invaders.

[01:07:48]Press start and it does not work. Darn.

[01:07:53]Let's see. I'm clicking start game and nothing is happening.

[01:08:03]GPT 5.5 oneshots that. How many Rs are in strawberry?

[01:08:08]Yeah, I think I started that and then got distracted. Let's see if it did it.

[01:08:18]Oh, here's another thing that I actually uh Okay, I'm going to switch. I did the Rubik's cube. Uh, it's kind of saturated at this point, but it's still fun.

[01:08:30]Um, okay. So, here's the Rubik's Cube 3D simulation. Can you guys see this? No, it's half cut off. One second.

[01:08:43]Let's see if I can get this in the screen. Well, okay. Cube lab. Here we go. Looks good.

[01:08:53]Looks good. Seems to work. Uh, scramble it.

[01:09:00]Yep, there it goes. Okay. And solve it.

[01:09:06]Solve it. Solve it. Solve it. Nope. That does not want to work.

[01:09:12]Dang.

[01:09:13]Oh, there we go. Okay, solving it. I actually had to move the speed uh dial a little bit for it to work. Let's see if it's actually able to solve it. Sure does look like it's getting there. Yep.

[01:09:23]Oh, and we have a little bit of confetti. Very cool. So, a little bit of issue. I wouldn't say this is the cleanest implementation, but it seems to work pretty well. Um, now make a 20x 20 cube available.

[01:09:40]actually now allow me to set the number of uh squares on each side of the cube slot.

[01:09:55]Okay. Um Okay, so there's that. Let's switch back to Space Invaders. Let's see if we got that one working.

[01:10:10]All right, just in time. Seems to be fixed.

[01:10:16]Here we go. Start game. Nope, that's not working.

[01:10:24]Uh, I'm going to answer this question.

[01:10:26]James Lum, why are you not using max when you're trying to push the model to its limits? I don't understand. um for the task that we're giving it, especially like Space Invaders, the Rubik's Cube simulation, it does not require max. The intelligence improvement between high and max like that is for like frontier math problems. It's not for what we're doing.

[01:11:10]Yeah, but this isn't working still. Uh, hidden overlays. Nope, it's still not working. So, still not working.

[01:11:28]All right, let's keep testing. I hit the limits for Opus 4.8 in one prompt.

[01:11:34]Clubhouse 1661. What was your prompt?

[01:11:39]Try the same prompts with codeex. The vibes maybe. I don't know. I don't want to switch back and forth. The vibes are not vibing with Opus 4.8.

[01:12:02]Okay, I'm going to share something that Jonah just shared with me. Um, this is something that he just created with Opus 4.8.

[01:12:17]All right. This is steel talent sky assault. I know nothing about this.

[01:12:22]Press.

[01:12:26]I mean, this is sick. Yeah.

[01:12:34]Oh, man. This is uh bringing back my childhood here. This is fantastic.

[01:12:44]Oh, this is so cool.

[01:12:49]Oh, there's so many little powerups here.

[01:12:55]Yeah. You know what? Um Joe or Brian, can you share this in chat? The URL for this. So, if you guys want to play this, we're going to share it in chat right now.

[01:13:08]Actually, here. Let me I'm going to share it. I'll share it myself. I got it.

[01:13:13]Here we go. If you guys want to try that out, enjoy. Shout out to here. Now, uh they do sponsor. This is not a sponsorship. It just happened to be the easiest way to share this with you all.

[01:13:31]All right, I guess I could just sit here playing this for the rest of the stream.

[01:13:34]Okay, this is awesome. Yeah, well done.

[01:13:36]Well done.

[01:13:38]Um, okay. Let's go back to Claude.

[01:13:53]Okay, this is still going. Let's see if we have anything else going.

[01:14:07]We got the video editing tool. Okay, let me let me find some more uh things we should test.

[01:14:21]Oh yeah, here we go. AI programming zero. This is a great one. I like this.

[01:14:26]Okay, we're going to try this. This uh let's see. New session. Can y'all see this? No, it's below the fold. Let me adjust this a little bit. Here we go.

[01:14:37]Build a fully interactive Artemis moon landing simulator using HTML, CSS, and JavaScript with realistic physics, fuel constraints, and a guidance UI system.

[01:14:52]All right, let's see it. I mean, this is assuming we landed on the moon. Am I right, guys?

[01:15:04]Yeah. 4.8 is out. Yes. Yes. Yes. Yes. It is out.

[01:15:26]Moon Faker is the game.

[01:15:29]Tell me about it. Uh, okay. While this is going, let's see. Let's see.

[01:15:51]Okay. Yeah, we'll just keep testing. I'm going to throw some more at it. Uh, did you need to add credits? I haven't added credits yet. Uh, Lego Geeks. I can't find the model anywhere.

[01:16:02]Assuming you're on the right subscription tier, it will show up. Uh, just give it time.

[01:16:10]Is this Opus 4.8? Yes, algorithms. Yes, it is Opus 4.8. Yes.

[01:16:20]Uh, don't sleep on the new codeex version release that just dropped, though. No, of course. Open AI can't be left out of the fun. They cannot.

[01:16:41]Okay. Um yeah, let's let's keep testing. Keep throwing uh your prompts in chat and I will keep testing them.

[01:16:55]Throw your prompts in chat. Brian is helping out. He'll uh grab them. We'll put them all together. We're currently working on build a fully interactive Artemis moonlanding simulator. So, that's going to be cool. Um, anything else, Brian, that you're seeing in there? I see some in the dock that you sent to me, but not many.

[01:17:16]Okay. Okay. Let's try this. Um, I mean it I cannot imagine it's going to get this wrong, but we shall see.

[01:17:26]Uh, what days of the week have the letter D.

[01:17:31]Let's see.

[01:17:34]Yeah, algorithms. Yes, I we are doing some tests. Uh, they're looking good. I honestly like it it's hard to figure out tests that can really show a distinction between these frontier models at this point. Um they're they're all just that good.

[01:17:52]They're all just that good now. Uh and and like I'm not going to be able to come up with an airdash problem.

[01:18:00]So I don't Yeah, it's it's hard to test them. So it's all about vibes during coding and that takes a lot of testing to really understand.

[01:18:09]Have you used the dynamic workflow mode?

[01:18:11]I have not. Let's try to get that working. Um, oh yeah. Okay. Spectre o zero zero. I don't know something.

[01:18:25]Spectre. Okay. This is a great one.

[01:18:28]Okay. So, this one's still going. We're going to pop open a new session.

[01:18:33]And let me make sure y'all can see this.

[01:18:37]Make a bl Make a Blenderl like tool featuring core Blender features such as editing, sculping, sculpting, sculpting. I don't know.

[01:18:51]Uh materials, etc. Okay.

[01:18:55]What was that I just saw? Oh, yeah. Here we go. Needs input. Uh verified working.

[01:19:00]The game launches and renders correctly.

[01:19:02]Two things to know. I mean, they took a screenshot of the actual game, so it must be there.

[01:19:12]Uh, let's see. Where's the link? Space Invaders.html H.

[01:19:36]Grim Satan. Are we really expecting for a model to just build an entire working tool in one go? Yeah, why not? Let's have high expectations for these models.

[01:19:48]All right.

[01:19:52]Okay, it does work. Let me share my screen so you can see that.

[01:19:58]Um, here it is. Here's that Space Invaders game.

[01:20:11]Uh, pretty basic.

[01:20:16]Very basic. Okay, we're going to take this to the next level. Let's see what these dropping S and Rs do. Merely little. Yeah. Spreadshot. Rapid fire.

[01:20:23]There we go. Oh, I see. I have to get rid of these guys first. Okay. So, we're going to make that a little bit more complicated.

[01:20:41]Okay, the workflow the workflow thing blog says oh shoot uh ask claude to create a dynamic workflow directly.

[01:20:48]Okay, so I just have to ask it. So let's test that out. And this is I don't know like I have to give it an existing database then um maybe let's see you know what I'm going to do let's see so we're on cloud code new sessions I'm in journey let's open up a new folder I'm going to give it um the homepage of my website and so we're going to use Opus 4.8 high.

[01:21:23]We're going to use auto mode and we're going to say redesign the website using the same brand colors as we have now. Um, okay. So, I'm going to show you the website just so you can see what it looks like right now. But, uh, oh yeah, and use dynamic workflow.

[01:21:48]Oh. Ooh, that's cool. Can you all see this? Yeah. Look, as soon as I typed in workflow, it highlighted it and has this cool shimmer effect.

[01:21:58]Very cool. Okay, go. Let's see what that looks like. This is dynamic workflow.

[01:22:06]I'll start by understanding the current state and your vision. Before diving in, let me invoke the brainstorming skill.

[01:22:12]Okay. I don't think that's one that I've given it. Super. Oh, it is superpowers.

[01:22:17]Okay. I forgot I installed this. Um, I can't tell. Is dynamic workflow actually working right now? Can anybody see this? Did I do something wrong?

[01:22:39]Um, okay. Here we go. Oh, yeah. No, it didn't. What do you mean by use dynamic workflow? What? It needs to be on extra high. Okay. Okay. Okay, we're going to try it again then.

[01:22:52]Let's try this again.

[01:22:54]New session, extra high. We're on auto. Use dynamic workflow. It's interesting because it should have told me it's not going to use that. It did highlight and have this shimmer effect on it, but it did not actually tell me that by only using the high level of thinking, it wasn't actually going to be using the dynamic workflow.

[01:23:16]Okay, here we go. Let's try it again.

[01:23:22]Oh, I'm just feeling my quota being burned up right now. It's still using the skill superpowers brainstorming.

[01:23:30]That's fine.

[01:23:34]All right. And while this is going, I am quickly going to be right back.

[01:24:58]All right, I'm back. Okay, let me just make sure OBS is still recording. Yep, we're still good to go.

[01:25:08]Uh, okay.

[01:25:10]What do you mean dynamic workflow? Yeah, I don't know. What am I doing wrong here?

[01:25:17]What What am I doing wrong? Why isn't dynamic workflow working? Um, let's go back to Let's go back. Let's go back to the blog post and see uh Project Glass Wing. Project Glass Wing introducing Here we go. Um, oh, you can Oh, maybe I don't even have a max plan anymore. Could that be it?

[01:25:56]Let me check. Let me make sure cuz there is a distinct possibility I canceled it after they pissed me off so many times.

[01:26:08]No, I I still have the max plan. And this is very interesting. Very interesting. Okay, they definitely got more friendly with their usage limits.

[01:26:20]So, from everything I've done so far, I'm going to I'm going to share this with you. Um, here's how much I've been using.

[01:26:29]Okay, look at this. So, I've done a bunch already.

[01:26:38]Maybe not that much, but I've done a decent amount of prompting already, and I've used only 8% of this 5 hour window. I've only used 2% of total, and it resets tomorrow. Oh, we got to we got to use it, guys. We got to use it.

[01:26:58]Here it is.

[01:27:01]Did they just reset the limits?

[01:27:08]All right. Um, switch back to Claude. I cannot get it to work. I cannot get it to use dynamic um, what is it called?

[01:27:17]Dynamic something. All right, let's switch back. Dynamic workflows.

[01:27:23]Why is it not working?

[01:27:29]Turn on auto mode when using dynamic workflows. You have two ways to start a workflow. Create a workflow.

[01:27:36]Switch on the switch on a new cloud code specific setting called ultra code accessible through the effort menu. Okay, let's go test that. Let's go test that. Um, let's see. Let's see. Why isn't this working? Extra.

[01:27:59]I I don't see it.

[01:28:02]Am I missing it? Maybe it has to be a million context.

[01:28:07]I doubt it.

[01:28:10]I doubt it. No, I'm going to leave it on million. Um, why am I not seeing this? Let's try it again. So, use Yeah, create a workflow for Let's try it again. I mean, the word workflow is getting highlighted.

[01:28:30]No. Okay. It's just not wanting to do this. Okay. Let's try I'm going to stop it for a second. Let's try it again.

[01:28:40]New session. I feel a lot better about token usage now, though.

[01:28:47]Uh redesign the website using the same brand colors as we have now.

[01:28:53]Create a workflow to do so.

[01:29:02]from CLI. Oh, I have to do it from the CLI.

[01:29:06]Well, that's silly. I shouldn't have to do that.

[01:29:15]I don't even know if I have CLI installed right now, nor do I want to. I I like the interface.

[01:29:25]I guess that's the only way to do it.

[01:29:27]Okay.

[01:29:29]All right. I guess I'm doing it then. Um I probably do have it installed.

[01:29:44]It's been so long since I've used cloud code like this. Um slashmodel And we're going to be using uh Huh.

[01:30:08]Give me one moment. U I'm just trying to figure this out. CLI leaks email. Be careful. Okay.

[01:30:31]Which model is this? This is 4.8. We're still working through it. We're trying to get it uh trying to get it going.

[01:30:40]By the way, uh like the stream. Very much appreciate it. Helps get into more people's hands. would very much appreciate you liking the stream. Thank you.

[01:30:50]That's what Theo is always screaming about. What is he not screaming about, though?

[01:30:56]Uh what are we building? We're trying to build uh I I you know what? Okay, I'm going to just let this go. So, redesign my website uh using the same brand colors we have now. So, that's what we're going to do.

[01:31:08]I'm just going to let cloud code go and then in parallel I'm going to also try to get this uh cloud code CLI working so that we can actually test dynamic uh workflows.

[01:31:27]So let's update to the latest. Okay, Claude, let's see if it defaults to the new 48.

[01:31:36]Yes, I trust this folder. Yes, there we go. Okay, let's switch over. We're going to try to get dynamic workflows working.

[01:31:52]There we go.

[01:31:55]Opus 4.8 is here. Um, so slasheffort x high like that, I think. Nope, just kidding. Slasheffort X high. Why am I doing this wrong?

[01:32:13]Oh, defaults. Oh, I uh Okay, let let's try to do this. Uh redesign my website. Oh, I have to God. Sorry. I'm I'm like rusty as heck with terminal.

[01:32:37]slasheffort ultra code. Oh, that's so sick.

[01:32:43]Oh, that's so cool. Look at this. Okay, Max has a nice rainbow then. Bam. Ultra code. Extra high plus workflows. There it is. Oh my god, they went hard on the rainbow. Let's go. Uh, very nice. Very nice. It is. It is Opus 4.8. It is. Um, okay.

[01:33:04]Oh, yeah. No, I have to switch to switch to my FF I think it's called FF homepage. Give me one sec. Uh, switch to my HP folder on my desktop.

[01:33:29]What?

[01:33:31]Invalid API credentials.

[01:33:34]Oh my goodness. All right.

[01:33:37]Why is that not working? I have to run slashlo. Okay. I'm going to switch off this for a second. Try to get this working, guys.

[01:33:50]And let me just kick this back off.

[01:33:52]Someone some we're working on might be easier to explain. If I could show you in a web browser, I can put together mock. Yeah, just build it. Just build it. Stop asking.

[01:34:08]CLI is superior form for cloud code.

[01:34:12]Come on. No, I disagree. I'm an interface person. I like interfaces.

[01:34:18]Um, okay.

[01:34:31]Uh yeah, I'll share. Are you talking about the uh the one that Jonah created?

[01:34:36]Hi collector. Yeah, here it is. I'll drop it right here. There it is again.

[01:34:43]It's actually It's actually pretty good.

[01:34:47]Pretty darn good.

[01:34:57]Surprised you haven't hit your limits, Justin. Not even close, actually. So, I think there's a lot to be said there.

[01:35:05]Not hitting the limits. Um, yeah, not even close yet.

[01:35:10]I'm doing $200 a month plan. Uh, they recently doubled, I believe, the quotas.

[01:35:26]All right, I'm going to try to get the CLI working. slashlo cloud account with subscription.

[01:35:36]Authorize.

[01:35:37]I'm doing this in the background.

[01:35:40]All right, you're all set. Login successful. Escape to continue. Let's switch back once again. Let's try to get dynamic workflow working. Let's see if we can burn through all of my 5 hour window quota. Let's see if we can do that.

[01:35:58]Okay, so here we go. Um, switch to my HP folder on my desktop.

[01:36:08]There we go. Now it's working.

[01:36:12]Uh, okay.

[01:36:17]Let's see how many tokens it's it takes to do a CD.

[01:36:22]I'm just I'm just wasting tokens at this point. Okay, it did it pretty good.

[01:36:27]Okay. Uh redesign the homepage using the same brand colors as you have now.

[01:36:35]Create a separate work tree since I have another set of agents doing the same thing elsewhere. Um 15 seconds to change directory.

[01:36:52]It's peak laziness. Peak laziness. Um okay. So this should work. And then we I think it automatically uh does a workflow, but I'll say create a workflow. Oh yeah, and it does that here. So it definitely identifies the word workflow. Let's go. Let's burn those tokens.

[01:37:13]I'm super excited. Oh yeah. Look at them go.

[01:37:17]But I want to see I need to see the multiple agents running in parallel. I feel like we haven't seen this.

[01:37:25]Check your effort level. Did it reset back? Is that why?

[01:37:30]Slasheffort.

[01:37:33]Ah gosh, I am struggling to get this going. Effort. Yeah, it did switch back.

[01:37:38]All right, we're back to ultra code.

[01:37:40]Let's try it again. Bam. Let's try it again. See you.

[01:37:46]Yeah.

[01:37:50]Yeah. Ultra code is insane but burns a lot. All right. We are finally I think we are finally in ultraode mode.

[01:37:58]Ultra code means dynamic workflows means you know potentially dozens up to a hundred agents I I believe it said running in parallel at the same time.

[01:38:11]Okay.

[01:38:13]I still don't I mean it's definitely ultra code. We can see it right there.

[01:38:18]But I do not see uh still thinking with extra high effort.

[01:38:30]Run workflows.

[01:38:42]run workflows to uh view your runs.

[01:38:46]Well, I don't know.

[01:38:50]Here we go. Uh, okay.

[01:38:56]Yes. And don't ask again. It's going to ask me everything, isn't it?

[01:39:06]Oh my goodness. All right. Jonah just shared some news.

[01:39:11]Wow.

[01:39:13]All right. They're just dropping everything today. This is uh I'm going to switch over to this. We got new news.

[01:39:21]My goodness.

[01:39:24]My goodness. Breaking breaking right now.

[01:39:31]Wow. Okay.

[01:39:33]We've raised $65 billion in series H funding at a nearly trillion $965 billion post money valuation led by Alimter Capital, Dragon, Green Oaks, and Seoia.

[01:39:53]My goodness.

[01:39:55]My goodness. This investment will help us advance our research and expand our capacity to meet growing demand for Claude. Let's read about it.

[01:40:09]My goodness.

[01:40:13]Another $65 billion.

[01:40:16]They had their series G in February.

[01:40:19]Adoption has continued to grow across global enterprise customers.

[01:40:24]Wow.

[01:40:26]run rate revenue crossed $47 billion earlier this month. $47 billion earlier this month.

[01:40:40]Sam Alman is throwing things around his office right now.

[01:40:47]My goodness.

[01:40:49]All right. This round was co-led by Capital Group, CO2, D1, uh, GIC, Iconic, and XN.

[01:40:58]Interesting. They say it here.

[01:41:02]Oh, here we go. Led by Altimter Capital, that is Brad Gersonner.

[01:41:07]Um, wow. Significant investors in this round include AMP PBC, Bailey, Gford, Blackstone, Brookfield, Dehaw Ventures, DST, Fidelity, General Catalyst, Insight Partners, Jean Street, Lightseed, MGX, everybody. Wow, huge news.

[01:41:38]It also includes $15 uh $15. It also includes 15 billion of previously committed investments from hyperscalers including 5 billion from Amazon. Joining them are strategic infrastructure partners Micron, Samsung and SK Highix.

[01:41:56]Wow. So it's all so uh insular. It it's it's just like one company investing in another, buying from another, owning another, and it's just this crazy spiderweb of ownership and debt and oh my god, crazy. This is crazy.

[01:42:24]This is a huge amount of money.

[01:42:26]65 billion, right? 65 billion plus. It just continues the absolute incestuous nature of the AI industry and ultimately Nvidia wins everything. I mean that's really like all this money is almost entirely going to Nvidia.

[01:42:46]Um yeah they talk about SpaceX for access to GPU capacity classes one and classes two.

[01:42:55]Uh, cloud is the first frontier model available on all three of the world's largest cloud p cloud platforms. Amazon web services, Google cloud and Microsoft Azure. AWS remains our primary cloud provider and training partner.

[01:43:11]And we have Brad Gersonner quote basically a I don't know bunch of nothing to be honest. Um, wow.

[01:43:21]Okay. Okay. Let's go back to testing now.

[01:43:31]So it is redesigning the homepage here.

[01:43:34]Um I don't know why.

[01:43:39]Yeah. Okay. So in the very brief testing that I did last night, I did have access to Opus 48. I didn't know it was Opus 48. Oh, why does it keep switching around? Um Oh, I see. It's just uh yeah, this is so this is our website forwardfuture.ai. Go check it out. By the way, we're investing a ton into our newsletter specifically. We give you all the awesome latest guides to help you learn about artificial intelligence. We have our newsletter. We post our interviews. I'm literally doing nothing right now. Uh Clad Code is just clicking around, but really highly recommend going to forwardfuture.ai um and subscribe to our newsletter. We are now developing our own original content, original articles and essays from some of the best uh including um like Dave Shapiro who just wrote a great piece about anthropic. So go check that out. Um meanwhile all the Yeah. Uh meanwhile all the tests during stream have failed. No, I just get super distracted easily.

[01:44:46]Let's go back. Um, okay. So, as of the testing I was doing last night, and obviously what we're seeing here, uh, the new Opus 4.8 has an affinity for rounding edges, rounding everything on my actual website.

[01:45:04]There is no rounding at all. Everything is very square. This is kind of what it looks like, minus a subscribe button.

[01:45:10]Everything's very square, but it has this strong desire to just round all the edges. Now, it's still working, but yeah, that is that is what it does.

[01:45:33]That is what it does. Um, just making sure everything's still working. Yep.

[01:45:39]Okay, then now let's switch back.

[01:45:43]Uh, do you want to proceed? Yes. God, Cloud Code CLI asks me for permission to do absolutely everything. It's very frustrating.

[01:45:55]Um it and then if I always use cloud design. Yeah, I am actually a huge fan of cloud design.

[01:46:14]Although five slides is your whole weekly quota, but other than that it's actually really good.

[01:46:32]dangerously skip permissions. Yeah, but then I have to end this task and start it over, correct?

[01:46:38]I assume.

[01:46:43]Let's go back to iTerm so I can show you that. Resize the fill canvas.

[01:46:52]Okay. Do I think I have to restart it if I if I I mean it hasn't done much. So, okay. I'm going to I I have to restart it, right?

[01:47:01]You need to switch to dangerous uh using shift plus tab.

[01:47:11]I don't am I not doing this right?

[01:47:14]Control E to explain.

[01:47:18]Can save session. So, you can just see I'm not familiar with cloud code CLI. I just don't use it.

[01:47:25]Uh, by the way, um, okay, cl Yeah. Uh, okay. I'm just going to I'm just going to end it just because I I just don't know.

[01:47:36]Um, escape to close. Okay.

[01:47:40]Uh, keep work tree. Let's close out of here. We're going to do Clyde with uh dangerously skip permissions.

[01:47:51]Continue.

[01:47:54]Yes, I trust this folder. Yes, I accept.

[01:47:58]Yeah, we're going all in, baby. Uh, let's change the effort. We're going back to Ultra Code, but I just can't get over how cool of an animation this is.

[01:48:07]Uh, we're switching to that and we're going to say continue.

[01:48:12]All right, let's see. Let's see. Do you have a Discord server, LEGO Geeks? Yes, we do. Thank you for asking. And Brian, producer Brian has been putting a lot of time and effort and love into developing and making sure that the Discord community is fun and safe and not full of spam, which I myself could not prevent when we last tried to do it. But he is doing a fantastic job. Brian, if you can drop the link to Discord, that would be awesome. Yes, go check out our Discord. Go join. I'm in there. Brian's in there. Great conversations to be had about all Frontier AI things.

[01:49:00]Okay, so we're still testing. We are testing Opus 4.8.

[01:49:04]I'm trying with all of my effort to try to get dynamic workflows working.

[01:49:18]I I just I don't know.

[01:49:22]Oh, what is the keyboard? The pop is out of this world. I have a new keyboard and I'm absolutely in love with it. Let's see. This is the Nufi Air75, and I love the pop it makes. I absolutely love it. Oh, yes.

[01:49:46]I'm all over the place, guys. Thank you for keeping me on track, Matt. Open Artemis Moonlanding Project. Yes, good call. Let's go do that. Um, okay. Let's see.

[01:50:01]Give me one second, please.

[01:50:12]What is ultra code? Ultra code is the setting that turns on dynamic workflows and it puts it on high mode, high thinking mode. Okay.

[01:50:24]Okay. So, back to here we are. Here we are. Resize the fill canvas.

[01:50:38]Here we go.

[01:50:40]Uh, I think I need to adjust this a little bit.

[01:50:48]Okay, so here it is. The Artemis moon landing simulator begin descent. Let's see what happens.

[01:51:00]main engine with W. Oh, there we go.

[01:51:02]Hold on. Here we go.

[01:51:07]But it's not moving.

[01:51:17]What am I doing wrong here? Uh, let's restart it now. Vertical speed. It says it's going, but like I can't see it actually moving.

[01:51:29]Yeah, I mean, if you look at the vertical speed, it's negative meters/s, so it must be going, but I don't actually see it moving. Turn around and go down. I don't think I can, Greg.

[01:51:39]Yeah, I can't turn around more than this. You can only go horizontal. Oh, what is this? Whoa. Hey.

[01:51:46]Whoa.

[01:51:48]Uhoh. Uhoh.

[01:51:52]Yeah, I wouldn't say that this is perfect. Here I'm going down. So, I see negative vertical speed, but am I actually going down? Yeah, I guess I am. Yeah, there we go. Let's see. Landing. Boink. Lost a vehicle.

[01:52:15]All right. Um, let's switch back. Let's switch back. Let's see if we can actually get dynamic workflow working. Oh. Oh, I think we got it. I think we got it, guys.

[01:52:32]This sure looks like dynamic workflow, doesn't it?

[01:52:36]Yes. Yes. Yes. Very cool. All right, let's feel those tur tokens burn.

[01:52:46]Um there is very little indication that things are actually going okay I can actually that's not true. Okay here we go so understand we're on the understand phase mapping the structure mapping the design and mapping the content all using opus 4.8 We are seeing tons of tokens being burned. We are seeing tool calls happening right now. So this is working.

[01:53:11]This is working.

[01:53:33]It's working. It's very slow. I wish I could turn on fast mode. I mean, maybe I can. I just didn't.

[01:53:40]Let's say fast mode plus dynamic plus max thinking.

[01:53:51]Oh, Simon. Yeah, Simon Makavoy 88. I got to run 4.8 with paperclip. I actually haven't tested paperclipip. I'm actually I'm thinking about making a video about it though. It does seem really cool. I'm not a big believer in zero human companies or one human companies, but you know, anything's possible. It's a fun experiment.

[01:54:19]All right. So, there we go. It mapped the structure.

[01:54:26]Uh, Maxwell, did he go over max verse ultra code yet? I think ultra code is a setting that puts it on high mode, high thinking plus dynamic workflows. I think that's what the difference is there. And then max is just a higher uh level of of thinking.

[01:54:44]Check usage, guys. The usage is not much. I'm going to show you The usage is not much. After everything we've done, I have only used 11% of my total usage for this 5 hour window. 2% of my total quota for the week. There is a significant possibility that Anthropic has fixed its quota, right? I I would I would look at Claude code and run out of quota. Now we're running multiple agents in parallel and it's not using much at all. It is so nice to see. Not bad. Yeah, go Gabby. Not bad. Absolutely not bad.

[01:55:43]In fact, I'm very happy with this. Let me refresh the page just in case.

[01:55:52]Interesting. Not Not bad at all. Not bad at all. How many agents are you running in parallel? I think it's only running three. Let's switch back and check.

[01:56:01]Let's switch back and check.

[01:56:08]Okay, here we go.

[01:56:10]Very cool. So, the concept is done. Now it's sorry the understanding is done.

[01:56:18]Now it's switching to concept. This is running four agents in parallel. For understanding we were only running three. In total it took just under 3 minutes.

[01:56:29]Used 85,000 tokens for the first agent to map the structure. 70,000 tokens to map the design and 87,000 tokens to map the content. Now we are on concept.

[01:56:46]Okay, this is really cool. This is really cool. It's not anything I haven't seen before, right? I mean, if you use cursor with sub aents, cloud code has a sub agent, I believe. Um, but like having multiple agents run in parallel is not new.

[01:57:12]I believe there's a limit on on cursor.

[01:57:16]I think it's four agents.

[01:57:20]Um Dave Wilson, thank you for the $10, Dave. Just saw this. IBM and OSS project lightwell establishes a trusted enterprise clearing house for open source software with a new AIdriven model for securing the uh software supply chain. I don't know what that means, Dave, but thank you for the $10. I appreciate it.

[01:57:44]I am on a $200 plan.

[01:57:47]Um, Tim Tim, I am on a $200 plan with Anthropic. I pay that out of pocket.

[01:57:54]Cursor gives me tokens, but I Anthropic does not. They don't they don't give me any tokens. They don't sponsor my channel. Uh nothing like that. Neither does OpenAI.

[01:58:12]Codeex is six limit I think. Okay.

[01:58:16]D'Angelo, how is uh how all Hey all, how is the model looking so far? Um great.

[01:58:25]Yeah, I mean it's a great model and I think the few things that are impressive is first of all its token usage or at least the quota increase that we've seen since they've come out of their uh supply constraint of GPUs.

[01:58:40]So that's quite nice. Uh but the model Yeah, the model's great. Um here we go. Okay. Okay, now it's switched.

[01:58:50]Now it's switched to the judge.

[01:58:53]And the way that Anthropic described how this works is it's uh delegating out the tasks to sub agents. The sub agents will actually be um discussing with each other what the best solution is. And then finally we have the judge taking all of the solutions from the different agents and deciding and synthesizing and giving you the actual solution. Okay. So next it should implement then verify then fix. We're going to stick around and uh watch it to completion. So hopefully it doesn't take too long. So far in total, it's been 7 and 1/2 minutes.

[01:59:32]7 and 1/2 minutes.

[01:59:52]Okay. Yeah. So, for those of you who are just joining, we are testing the new Opus 4.8 in Cloud Code CLI using their brand new dynamic workflow feature. And that allows a single agent to delegate out to up to 100 agents, which is kind of crazy. So, that's what we're watching right now. It is working.

[02:00:18]I'm going to move myself out of the way.

[02:00:20]So, you can see it's currently idle, which is kind of interesting. There it goes. Okay.

[02:00:31]What is this tool? Clog code. That is what we're using right now.

[02:00:40]Marvin Colani, where is the deep suite benchmark? Agreed. Agreed. Deep Sweet is going to have to come out with something very soon.

[02:01:12]Um, here's a new benchmark while we're waiting for this to finish.

[02:01:19]New benchmark testing Claude Opus uh 4.8.

[02:01:28]Here we are.

[02:01:34]Okay. So, this is by a company called Proximal Advanced Coding Intelligence.

[02:01:40]We evaluated Cloud Opus 4.8 on Frontier Suite ahead of today's release. It is now the best performing model on Frontier Suite. Okay, so this is called Frontier Suite, which I'm actually not super familiar with.

[02:02:01]Uh yeah, here we go. They have a dominance index. That's a funny name. Uh Claude Opus 4.8 number one with GPT 5.5 number two, Opus 47 number three, Opus 46 number four, and we have a score of 83% on this new benchmark.

[02:02:55]All right, I'm going to record the sponsor transition for this video.

[02:03:01]Um, I'm going to switch back to this.

[02:03:03]We're going to keep watching this. I I am going to record the sponsor transition for this video. Uh, by the way, if you guys want to help me come up with the transition, drop what you think it should be in chat. So, that is like, you know, hey, we're talking about this thing and by the way, you know, some natural nice segue into the sponsor. Let me know what you think it should be uh in chat.

[02:03:30]You guys keep asking about this keyboard. Uh, it's the Nufi Nu PHY Air 75.

[02:03:38]And uh, this this little uh, knob thing here I added afterwards. But I just love it. I love how it sounds.

[02:03:47]The build quality is fantastic.

[02:03:50]Turn the background lights on. They are on. Quiet on the set, everyone. Uh, Okay. Okay. So, actually it's switched over to the implement. You know, it could it could um it's it's not using as many agents as I thought it would. Um Okay. I'm going to record the transition.

[02:04:21]One moment, please.

[02:04:31]And by the way, paying for all of these tokens that I'm burning during these tests is not cheap. And so that is And so we have help from today's sponsor to pay for all of them.

[02:04:46]And so let me tell you about today's sponsor to help pay for some of those tokens.

[02:04:56]Hey, Moist Moments. Longtime listener, first- time watcher. Welcome. Awesome.

[02:05:04]I've got Yeah, I like that. Dope shorts.

[02:05:08]I've got bills to pay. I've got bills to pay. And today's sponsor is No, that's two too on the nose. Two on the nose.

[02:05:16]Um, use Opus 4.8 to come up with the transition. Oh, that's the one. Wait, where did you go?

[02:05:27]Here we go. William Charles, that's what I'll do. That is what I shall do.

[02:05:40]Let's do this together.

[02:05:58]Okay. Um, I need uh I'm making a YouTube video right now and need a sponsor.

[02:06:07]Oh my god. Sponsor transition from a video about Opus 4.8 8 release.

[02:06:15]Um, help me come up with it. And of course, we're using Opus 4.8 Max for this.

[02:06:27]Let's see. Let's see what it comes up with.

[02:06:33]Thinking. thinking.

[02:06:43]Nope. That's not what I wanted.

[02:07:00]Unknown sponsor.

[02:07:07]It should be a corny segue. Yeah, I mean I enjoy coiny segways.

[02:07:20]Should have got Segue to sponsor. Oh my god, that would be perfect.

[02:07:30]All right. I I know it's going to be so lame what Opus came up with. Thematic bridge. Look, Opus for uh 8 is a beast, but a model only uh but a model this capable is only as good as the tools you wire into it, which is exactly why I want to talk about today's sponsor. That is assuming Opus 48 works with the sponsor. That's not a good one. Um okay. Yeah, let's do a clean break. I like this one.

[02:07:59]So, I'm going to talk about the benchmarks. Maybe I haven't talked about it yet in the video. Let's see. Probably actually. So, I'm going to talk about the benchmarks, but before I do that, let me tell you about the sponsor that made this video possible. All right, I think that's good. We got a couple versions out of that.

[02:08:20]And then I'm going to do an outro. I know I've been talking a lot about Let's see. And yes, Opus48 is fantastic, but a lot of what Anthropic has been doing as a company hasn't been so fantastic. And I made a video about it here. That is all right. Let me upload this everybody and then we'll get back to the testing.

[02:08:59]Yeah, by the way, I feel like uh Nui, the keyboard company, should sponsor me at this point cuz I do. I re I really actually love this keyboard.

[02:09:18]Click. One, two, three, four, five. Um, I do love this keyboard. I I bought three of them.

[02:09:29]Thank you. Uh, thanks, Gabby.

[02:09:32]Um, I Okay, again, they're not sponsoring me. I just want to talk about this keyboard because I love it. And I bought three different keyboards about a month ago just to test them out. And this is the one that stood out the most.

[02:09:45]It has the most customizations. The build quality is fantastic. That like thckiness, the kind of clickiness of it sounds great to me. Um, and uh, yeah, just a big fan. So, Newfi, how much is the keyboard? I think it was $125. I could be wrong. Somewhere in the $100 range. Um, it's definitely, you know, a high-end keyboard. Um, lowprofile mechanical. Very, very good. Newfi.

[02:10:10]Yeah, Nu PHY. Melty. Melty 1000. Dope brand. You should reach out to them. I bet they would respond. Cool. I I don't know them, but I I love the keyboards.

[02:10:21]Which model is that again? This is the Air75, I believe. I'm going to type it in chat right now. Air75.

[02:10:28]Nufi Air75. I believe that's the one.

[02:10:41]Take the All right. Um let's go back to iTerm. Let's go back and see how this is doing. It is still implementing so far use. It's going it's been going for almost 19 minutes.

[02:10:59]140,000 tokens on the implement redesign stage alone. Oh, it just finished. It seems like it just finished.

[02:11:09]Now it's verifying. Okay, I'm going to switch to that so we can see what the verify is. Here we go. It kicked off four separate agents to do the verification of the redesign of the homepage. We have type check, brand design, and functionality testing.

[02:11:31]Um, maybe I should have dropped an Amazon referral link for this keyboard, but no, it honestly like highly recommended. Trying to remember the other ones that I bought. Um because I got three of them all at the same time. What a uh there was a really unique one that I I I I enjoyed a lot.

[02:11:51]Um but I didn't end up going with because it just wasn't as customizable.

[02:11:58]Um let me try to find it.

[02:12:04]Okay, I got that one and that one. I will show you what I purchased once I make sure there's no information I don't want to show on here. And there is.

[02:12:15]Okay. I got the Keyron K1.

[02:12:24]I tested that and I did not love it. And I got the Lowree Flow 2.

[02:12:34]Those are the two keyboards that I tested. And I'll I'll show you those as we're going here.

[02:12:40]Um, so this is one I actually really really liked this keyboard.

[02:12:53]Um, out of all of them, it had the most premium quality or premium kind of production build build quality. Um, it had see-through key caps which like did not work well. Some of them were brighter than others. And then if you can see right here, if you can see right here on the side, this was a touch like slidebar and it was basically useless.

[02:13:21]And so I although this had the highest quality build and the best sound, I just didn't I didn't go for it. Um, yeah. So, those are that's that's the one I got. And then I I think uh Keyron K2 was the other one.

[02:13:47]This one I got the bigger version, but this is basically it. And I just didn't love it. Just didn't love it. This one was my least favorite out of the three.

[02:14:01]Okay. All right. Let's go back. Let's finish finish what we're working on. I'm going to try not to get too distracted going forward.

[02:14:11]We're almost done. We are 22 minutes into this build. We are using Opus 4.8.

[02:14:18]We are using dynamic workflows.

[02:14:22]It has spun up 13 agents in total. We can see that at the top here.

[02:14:55]All right. So, it's still verifying.

[02:14:56]It's at three out of four. So, the first three agents running in parallel are done. I would love to see a way to get, you know, 100 different agents running at the same time, which uh Anthropic said is very possible. I just haven't seen it. Um the Swiss Mint, I just joined 10 minutes ago. What is it building? We are redesigning my website, forwardfuture.ai.

[02:15:20]I'll drop that in the chat, by the way.

[02:15:21]And if you're not already subscribed to the newsletter, please go do that. Uh I had mentioned earlier, we are investing a lot into original content now, original articles and essays and research and analysis. So go subscribe to our newsletter. I am very excited to show you what's what's going to happen with it. And we're already putting out awesome original content like from friend of the channel Dave Shapiro who just wrote a great piece about anthropic.

[02:15:57]Okay. So there now it's on the fix stage last stage. Then we're going to see what it looks like. And in the meantime, let me show you what the website looks like now.

[02:16:18]All right. So, this is the website right now.

[02:16:22]Okay.

[02:16:27]Okay. So, we have our branding at the top left. We have a bunch of links. Very simple. No hover animation, which is kind of wonky now that I'm thinking about it.

[02:16:43]By the way, I didn't give uh props to Spirax Spyax. Spyax, sorry. Thank you for the $499. I appreciate it. 3D render the Artemis space mission comparison.

[02:16:56]Um, I'm not going to do a comparison right now just because I think that's going to take a while. Um, maybe I'll kick it off. Maybe I'll kick it off. We'll see if I actually get to it, though.

[02:17:15]I'm gonna kick it off, actually. That's not a bad idea. Yeah, because it's literally a fraction of the price.

[02:17:26]Okay. All right. Here we go. All right.

[02:17:28]I'm going to kick it off in cursor. I'm not going to pull it up on screen. I'll just kick it off and then if we get to it, we get to it. But, uh, yeah. Spy Spyax, thank you. Appreciate it very much.

[02:17:47]All right, I am going to kick it off and I am going to use Composer 2.5 fast and go. All right, so that's going and let's see. Is this done yet? Oh, I think we're done.

[02:18:06]Are you guys? Uh, let's see. Let me switch back to iTerm.

[02:18:16]I think we're done, but it's still going. It's still going. Oh, look at that. So, it fixed it. Now, it went back to verifying. So, it's actually looping back to a previous step and it's using another eight agents.

[02:18:32]Very cool.

[02:18:34]Um, meet Juny. Really appreciate it. Um, thank you for the $5. Can you build a browser flight simulator where selecting a country changes airspace, airports, weather, restricted zones, and rules?

[02:18:50]Um, I have too much going right now.

[02:18:54]Yeah, I think I'm going to not do that.

[02:18:58]Um, maybe I'll kick it off in a bit, but nonetheless, very appreciated. Thank you.

[02:19:07]Okay. So, I'm really hoping we're going to be done soon. Um because this is now uh 27 minutes, by the way.

[02:19:17]Okay. So, all of this time running.

[02:19:23]Let's see how much tokens we've used.

[02:19:26]Let's see how much of my um let's see how much of my quota I've used. Do you have any guesses before I show it?

[02:19:36]Any guesses? So, I think it was reset pretty shortly before the new model drop.

[02:19:44]Fox on the run. Uh, we're redesigning the homepage with Opus 4.8 with dynamic workflows. Redesigning my own personal company website. Um, what do we think?

[02:19:55]What do we think? 60% 22% 5% 30% 40%.

[02:20:01]I'm going to switch over in one minute.

[02:20:03]I want to see what y'all think.

[02:20:06]80% and I'm asking specifically about the 5 hour window. Sorry, I should have clarified that. And even as I'm talking right now, it's still burning up tokens.

[02:20:15]So, keep that in mind. All right, here we go. Let's switch.

[02:20:19]Drum roll.

[02:20:23]15% That's crazy.

[02:20:30]3% total usage for my entire week. 3%.

[02:20:37]Well done for the people who are close on that one. 15% in the current window.

[02:20:42]And I've been running Opus 48 basically on on very high settings, multiple agents running in parallel, and still not that much used. I am I am quite impressed.

[02:21:08]Okay. Are we done? Did it finish?

[02:21:12]Done. Yes, it says done. 28 minutes 46 seconds later. Nearly a half hour to redesign a homepage.

[02:21:22]And let's see. Oh, run homepage redesign workflow.

[02:21:28]Wait, why is it still going?

[02:21:33]What?

[02:21:39]I should have used claw design. Yeah.

[02:21:42]Why? Why? I don't understand what happened here.

[02:21:46]Completed. Okay. So, here's this dynamic workflow. Redesign the forward future homepage.

[02:21:53]The workflow finished. 18 agents, 29 minutes. Let me read the full results.

[02:21:57]So, it's reading it.

[02:22:01]Set up an isolated work tree mirroring the current site. Run the homepage redesign workflow. Why is it still running the workflow?

[02:22:14]Weird.

[02:22:21]It's creating a work tree. Yeah, but that shouldn't take so long. I mean, it already did that.

[02:22:28]Okay. Well, we just saw something.

[02:22:30]Winner.

[02:22:33]The Ford Future Dispatch. Okay. It renamed my product, but fine.

[02:22:40]Brand tokens unchanged. Implemented across 12 files.

[02:22:46]It's still running though.

[02:23:04]All right. It's finding some issues.

[02:23:06]Numbering collision is real. Each guide card renders the same style. Okay. New the new shadows are fine.

[02:23:28]Oh, I like that.

[02:23:31]Juanpa, I'm going to do that.

[02:23:45]Yeah, I know. I I'm like I'm not super familiar with Cloud Code CLI. It's been a long time since I've used it. I just prefer the desktop app. I always prefer an interface. That's just my preference.

[02:23:59]Um Skeleton XTX.

[02:24:04]How's the new model performing? It's good. I don't I think it's going to take a lot of time to actually understand the full vibe of the model to see how well it performs. Um, it has performed pretty darn well for these first kind of sing like zero shot tests. It did get the car wash test wrong though on max thinking 4.8.

[02:24:29]Open local host. Okay. Did it tell me that? Yeah, it's actually finished.

[02:24:37]What's still running is the nextJS dev server. Okay. Okay. Moment of truth.

[02:24:42]Then let's see. I'm going to load it and see if it's looking good.

[02:24:48]Okay.

[02:24:51]Okay. All right. I'm going to show you.

[02:24:53]All right. Let's see what you all think.

[02:24:55]Um Brian, can we can we get a vote going a poll on whether people like the new design or the old design better?

[02:25:11]Here we are.

[02:25:15]So, this is the new design. Let me move this out of the way. It renamed it to forward future dispatch.

[02:25:26]Um, I would love to get a poll. So, hopefully Brian will be able to spin up a poll and you guys could tell me which one you like better. This is the new one. Uh, okay. Give Brian a minute. I'll show you around and get ready to give your answer on what you like best. This is the new version. Boy, does it love rounded edges. Everything is a rounded edge. This does not look good. Uh when you hover over, look at look at the animation. The like purple kind of line that drops down the two sides or the left side, I should say. Uh I'm not in love with that. There's a lot of blank space. Just empty space. Show the old one. Okay. Yeah, here's the old one.

[02:26:04]This is the old one.

[02:26:20]Um, okay. You like the animation. Some of y'all like the animation. That's cool.

[02:26:26]Um, so this is the old one. Everything is very square.

[02:26:31]Um, yeah.

[02:26:36]Okay. So, this is the old one.

[02:26:38]This is the new one. I mean, this this having the video completely um like on a on a on a different line than this on the right side than the our original.

[02:26:54]Makes no sense.

[02:27:05]By the way, I'm messaging Jonah right now of like why why isn't an image showing right here? And he's like, I'm on it right now. So, he's actually we're we're kind of iterating on the homepage right now. Um, Altevaka new one looks like it's trying to mimic iOS Glass. Yeah, a little bit.

[02:27:25]It It has a Yeah, everything. Oh, I kind of like the little shadow or um fade effect that it has there. Let's see what it looks like. Yeah. I I don't love it. I don't love it when you hover over it. Like it just moves so much.

[02:27:49]I don't love it. Uh too much empty space here. We have kind of the main original right here. And it it just doesn't look good. It does not look good.

[02:28:02]But I want to hear your opinion.

[02:28:07]Okay, so this is the old one.

[02:28:10]Everything's square. So basically really all it did was just square off or sorry, round all the edges, round all the corners above the fold is better on the old one.

[02:28:26]Yeah, absolutely. like having a the two lines here. No good. No good. It completely throws it off. Let's see what it would look like.

[02:28:40]Let's see what it would look like if it didn't do a new line.

[02:28:51]No, it's still bad. It's still bad.

[02:29:01]Yeah. Yeah. Um, try again with claude design. Yeah, I'll do that. I'll do that later. I'm not going to do that right now, but that's a good idea. Eric 80.

[02:29:19]Clayton explains, "Some concepts from the new one are much better. Just need some clean up." I feel like rounded corners are so overdone at this point as I'm saying it like with my video showing in an ecam having rounded edges. So maybe I'm not completely off that trend, but like I don't know. I'm kind of into the square look now.

[02:29:47]Yeah. Okay. So once again, uh, by the way, I know I'm like shilling the heck out of my own product here, but forwardfuture.ai, like it is it it it it is like such an awesome thing that we're putting so much time and effort into. We're investing heavily into it. We're bringing writers on, right? Go subscribe to the newsletter because you're going to get incredible originals like this one from Mr. Dave Shapiro, friend of the channel, Dave Shapiro, who wrote about how Anthropic is not on your side.

[02:30:23]Okay, fantastic, fantastic, fun article to read. Um, hybrid. Okay, so let's see. So far, we have 155 votes. It is 50/50. It is 5050 right now. 155 votes. 50/50.

[02:30:47]Go vote. I need to see this change. We need a winner. This is the old design.

[02:30:51]The old design. Mostly the only difference seems to be, by the way, 30 minutes and like hundreds of thousands of tokens later. And really all that it did was round the the corners. Are you kidding me?

[02:31:08]That's really all it did?

[02:31:12]Everything else looks nearly identical.

[02:31:16]It didn't even touch the guides page.

[02:31:20]Look at this.

[02:31:23]And by the way, another reason to go to forward future.ai.

[02:31:26]We have all of our awesome guides.

[02:31:30]Whether you're playing with OpenClaw or uh Chatbt, it doesn't matter. We have guides for you. Go check it out. Uh we have our newsletter. This is different.

[02:31:41]Not good, though.

[02:31:45]Not good.

[02:31:47]All right, I think we're gonna end it there.

[02:31:53]Brian, can you show me the latest results?

[02:32:15]Yeah. Okay. Um, okay. So, it looks like the old design with 196 votes won just barely. 52% of the votes towards the old design, 48% towards the new design. Very interesting. Very interesting.

[02:32:51]Very cool.

[02:33:01]All right.

[02:33:03]Anything else that you guys want to talk about?

[02:33:09]Let me know.

[02:33:11]Um, probably going to end the stream in a little while, but if there's anything else you want to talk about, let me know. Uh, of course, we can just browse Twitter together.

[02:33:33]I think that's it for today. All right.

[02:33:35]Thank you all for joining me. Um, this is a lot of fun. This is now officially the longest stream I've done and the reason is because you guys are awesome in chat and I'm actually having a ton of fun reading chat and responding and really appreciate all your prompt suggestions. Thank you for the super chats. Thank you for joining. Go check out forward future.ai. We're going to be editing the video that we recorded right now right now and we'll be publishing it later today. So stick around for that.

[02:34:06]And thanks. Good seeing y'all. Bye.

Related Videos

Artificial Intelligence

OpenHuman VS Hermes AI: Who Wins?

JulianGoldieSEO

285 views•2026-05-29

Artificial Intelligence

Long-Running Agents — Build an Agent That Never Forgets with Google ADK

suryakunju

142 views•2026-05-30

Artificial Intelligence

This computer is made from real human brain cells. And you can buy it.

Talktmsmedia

3K views•2026-05-28

Artificial Intelligence

BREAKING: Microsoft’s New Image Generating Model Beat Out GPT 1.5 and Nano Banana 2

aimmediahouse

122 views•2026-06-03

Artificial Intelligence

I Made the Same Anime Fight Scene in Every AI Video Generator

NobleGooseAnime

295 views•2026-05-30

Artificial Intelligence

Nvidia Bets Big On AI PCs | New Chip To Power Windows Laptops | Technology | AI Updates | N18S

cnnnews18

3K views•2026-06-01

Artificial Intelligence

I Tested NEW Opus 4.8 on Four Projects (Updated LLM Leaderboard)

AICodingDaily

298 views•2026-05-29

Artificial Intelligence

3D Platformer Update - NO CAPES

SolarLune

294 views•2026-05-30

Trending

The Casino Had Us Guessing All Day

VegasMatt

157K views•2026-06-03

The Dancing Plague...

HoodieGuyStories

1730K views•2026-05-30

The Fastest Way To Board A Plane 😮

zackdfilms

6504K views•2026-05-29

Artificial Intelligence

DOOM Runs On Everything...except Neo Geo

ModernVintageGamer

143K views•2026-06-01