Claude Opus 4.8 introduces an effort control that lets users choose how much effort Claude applies to a response: higher effort enables deeper reasoning but consumes more tokens and uses up rate limits faster, while lower effort responds more quickly. Claude Code also now supports dynamic workflows that can orchestrate hundreds of parallel sub-agents in a single session for complex tasks like large-scale code migrations, though this feature can significantly increase costs and exhaust usage limits quickly.
Deep Dive
Claude Opus 4.8, New Copilot Studio Agents, ChatGPT Agent updates and 7 other AI features
Another day, another world's best AI model.
That's right. Anthropic just released their latest version of Opus, Opus 4.8.
And well, at least according to most benchmarks, it is the best model available. And although the reviews are a bit mixed, it looks like, at least for now, Anthropic will regain that top spot. So that's probably an AI update that you didn't miss this week, but there's a handful-ish that you probably did miss that flew under your radar. Like, did you know that you can now build an agent in Microsoft Copilot Studio without code that can actually use your PC? Or did you see that ChatGPT's agents got a pretty big under-the-hood update that I literally didn't see anyone talk about? Like, not exaggerating, I didn't see this anywhere. So yeah, unless you're glued to a bunch of RSS feeds and Twitter drops, or reading company blog posts like I used to read the back of a cereal box on Saturday mornings, you're probably going to miss some of the most important AI updates that you can start using today. Don't worry, I work for you. I stay up late and wake up early so you know which AI features are worth your time and which ones you should avoid.
So, this is our new Friday features.
Let's get into it. So, on today's show, here's what you're going to learn. You're going to learn the sneaky way that Anthropic's powerful new Claude Opus 4.8 might end up costing you more money. You're going to see why Microsoft's agent updates bring them up to speed with everyone else. And speaking of agents, you're going to learn about a small ChatGPT agent update that is going to make a big difference if you start using it today. All right, let's get into it. Welcome to Everyday AI. My name is Jordan Wilson, and yeah, we do this thing every day, Monday through Friday at least. It's an unedited, unscripted daily live stream podcast and free daily newsletter helping everyday business leaders like you and me make sense of the barrage of AI updates. I tell you what's important and what's not.
You use that information and become the AI wizard in your company. Everyone's like, "How does this person know so much?" Well, it's because you listen to me, and I don't sleep and neither do my agents. All right, so if that's what you're trying to do, be the smartest person in AI at your company, our website is where you make that happen.
youreverydayai.com.
Go sign up for the free daily newsletter. We recap the highlights of each day's show as well as all of the other AI news that you need to know to keep up and get ahead. Speaking of keeping up, if you want to keep up with me, I'll actually be in San Francisco next week. I'll be out there, I think, Monday through Wednesday. So, if you happen to be in San Francisco, make sure to hit me up. In the show notes, I always leave my LinkedIn. Just make sure to leave me a message, otherwise I'm not going to know who you are. Stranger danger. Anyways, I'll be out checking out the Microsoft Build conference. So, let's get into it.
Our first one. This one sounds very small, but it's actually pretty big, and I'm going to give you some random ideas on this one. NotebookLM and Google Drive are now working a little bit better together with the latest update. NotebookLM now has Google Drive sync.
So, here's what it is. It's automatic syncing between Google Drive and NotebookLM. When a Google Doc, Sheet, or Slides file changes, the notebook updates automatically if you've added one of those as a source in any of your notebooks, without you having to manually re-upload it. NotebookLM also strictly respects file deletions and permissions, so files that are removed or have access revoked also sync and will drop out of that notebook. All right.
And this is available to all Google Workspace customers and users with a personal Google account that have access to NotebookLM. So essentially everyone.
So it already started to roll out. And Google did say this will take about two weeks. So make sure you go check it out.
There's no admin control for this, no end-user setting, or anything like that. So if you don't know NotebookLM, okay, first of all, how? I use it literally multiple times a day. I used it so much today I was like, how did I function before this? And this was actually a little kind of quote-unquote cheat code, because there was a way to manually do this before that most people didn't know about, but having it automatic now, I think, opens up a lot of new use cases. And here's the quick 30-second primer on NotebookLM if you are super new here. It's Google's amazing grounded AI app powered by the latest Gemini models, but it is grounded in your data. So you can add different sources, whether those are static files that you upload or dynamic links to your Google Sheets, your Google Docs, your Google Slides, etc. So that's the big unlock here because, well, guess what? Those docs are live. They're dynamic. So, one thing, just to put this little bug in your ear: I constantly have agents updating my Google Docs.
Now, I don't have to do anything else in NotebookLM. So, this piece to me is huge. And this is what I'm going to try once it does roll out to my account. I wasn't in the first tranche of rollouts here. I've been using Codex to go into my NotebookLM using browser use and just update all of my notebooks. Now, I don't have to do that because it's going to sync automatically. All the agent is going to have to do, as an example, is create a new custom audio overview for me every single day. As long as I have those connections already set up in a notebook, maybe a certain Google Sheet I use to track important metrics, or a Google Doc that an agent updates every single day, I don't have to configure it each day, add a new notebook, all that. So, this right here, y'all, this is that personal assistant. We finally have it. So, this is a small feature. It's like, oh, these two things that already work together.
Well, now they sync. So, just add in one more layer. I can't wait. And maybe I'll reach out to the team at Google and give this as a suggestion: if this could just automatically give you a new audio summary every day, a new audio overview, that would be amazing. There are literally very popular apps where this is all they do. So, if you couldn't tell, this is a small one that I'm personally geeked about. But if you are a Google user or just a NotebookLM fan and you love those audio overviews or any of the other artifacts in the NotebookLM Studio, this is one you should definitely be trying out.
All right, our next update, and yeah, I literally forgot where I found this one. It is a random help article on the OpenAI website. This wasn't in the normal release notes. This wasn't on OpenAI's Twitter account or in a blog post. It was in the ChatGPT Enterprise and Edu release notes, which is not the page that I normally check. So, I almost missed this. But this is actually another one that I'm excited for. A small one that I don't think you should overlook. So, here's what's new: updates to workspace agents. There are new controls and capabilities. If you don't know, workspace agents are shared and reusable team agents, kind of like an evolution of custom GPTs, that run multi-step workflows in the cloud across connected business tools. And you can build them with no code, right?
Conversationally. So, right now, workspace agents are only available on team plans. So that's if you're on Business, Edu, Enterprise, etc. And I think that these have been slept on because of all the Codex hype.
Actually, right now, workspace agents are one of only like three things that I use on the ChatGPT web app. It's deep research, workspace agents, and Canvas. Although Canvas just yesterday got kind of removed and replaced with writing blocks. So, you're not going to see the Canvas button there anymore, FYI. But the Canvas feature and functionality is still there. I was talking to the team and encouraging them to add a toggle back, and it sounds like they might. Anyways, here's what's new. There's new model and thinking control for the workspace agents. So now there's a model selection that appears in the composer when you are building a workspace agent, and the thinking effort controls have moved into the model picker. Before, you didn't have this fine-tuned control when you created a workspace agent, right? It just used the latest model, GPT-5.5, but there wasn't a lot of information on what version of GPT-5.5 it was using. Because technically, I'm doing the math here, there's instant, there are four levels of thinking, and there are two levels of pro. So there are technically seven variants that you can choose from. So, not having that control was a problem, right? I've said this many times on the show before. Although the GPT-5.5 instant model has gotten better, and it was also updated yesterday, I still don't use it, right?
I always use a thinking model and I would encourage you to do the same. So, this is pretty big. Just the capability to set the model and the thinking level is a big upgrade for those workspace agents, which you can run around the clock, connected to all of your apps and connectors as well.
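To make the "seven variants" math concrete, here is a minimal Python sketch. The episode only gives the counts (one instant option, four thinking levels, two pro levels); the label names below are made up for illustration and are not OpenAI's actual model-picker names.

```python
# Counts come from the episode; the label format is a hypothetical placeholder.
VARIANT_COUNTS = {"instant": 1, "thinking": 4, "pro": 2}


def list_variants(counts):
    """Expand per-family counts into one label per selectable variant."""
    variants = []
    for family, n in counts.items():
        if n == 1:
            # A family with a single option needs no level suffix.
            variants.append(family)
        else:
            variants.extend(f"{family}-level-{i}" for i in range(1, n + 1))
    return variants


variants = list_variants(VARIANT_COUNTS)
print(len(variants))  # 1 + 4 + 2
print(variants)
```

The point of the count is simply that "GPT-5.5" on its own does not say which of those seven options an agent is running on, which is why the new picker matters.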
So in the agent builder you can add those tools, apps, custom MCPs, skills, files. I mean, workspace agents are getting absolutely slept on, and I think the reason is that Codex is just blowing up. So here's why it's useful. Well, you can pick the model and you can dial in the thinking effort, and that lets builders match the horsepower to the task. You can use lighter models for simple, high-volume steps, or you can use thinking and pro at higher effort for more complex reasoning. And then there are also updated connector action constraints that add control beyond write-action approvals by restricting how specific connectors can be used. So here is the little tidbit from OpenAI. They said: new controls and capabilities for ChatGPT workspace agents. We're rolling out new model, admin, app access, and responsibility capabilities for ChatGPT workspace agents in Enterprise and Edu. It says Enterprise and Edu, but I did verify that the model capabilities are also available in the normal Business plan. So there are normal Business plans, and there's Enterprise and Edu. Although they don't denote it in this help article, the model and reasoning effort controls are available in the Business plan. Aside from that, there are also updated role-based publishing permissions, so workspace admins can control which roles can publish agents to the shared workspace directory. And then you also have the guided agent setup: ChatGPT now asks setup questions to help users create useful agents more quickly. And there are a couple of other smaller things. All right, so a pretty nice update there as well.
All right, next we have a new Anthropic update that is not their new Opus 4.8. So here's what's new and why you might want to keep an eye on your Claude bill. All right. So, this is the new Claude Code dynamic workflows.
So this is a new Claude Code capability where Claude can plan the work, then run hundreds of parallel sub-agents in a single session, then verify its outputs before reporting back to the user. So for example, Claude Code with Opus 4.8 can carry out codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, with the existing test suite as its bar. So here's who has access: it's available now if you use Claude Code on Enterprise, Team, and Max plans. I have tested this out a little and, my gosh, goodbye to your rate limits. You thought Anthropic's rate limits were tough before? Try using hundreds of agents in parallel. I had the $200 Max plan and I was able to run one prompt. I saw other people on the $100 Max plan who tried to test this new feature out. Actually, I don't know anyone that was able to, and they hit their limits. So yeah, like I said, good luck. There was that story, and I don't know if it's true or not, that some large company accidentally spent $500 million on, I think it was, their Claude bill because they weren't looking at usage. So yeah, Claude rolled out a lot of things where they're like, oh, look at all these great features, but all it really does is run up your bill and exhaust your limits very quickly.
Although in theory it's a really cool and useful feature. So if you are Scrooge McDuck or the Monopoly man and have play money, this is great. So here's what Anthropic says. They said, "Today we're introducing dynamic workflows in Claude Code, helping Claude take on the most challenging tasks end to end. Work you'd normally plan in quarters now finishes in days. Claude dynamically writes orchestration scripts that run tens to hundreds of parallel sub-agents in a single session, checking its work before anything reaches you. Some problems are too big for one pass by a single agent, especially in complex legacy codebases: a bug hunt across an entire service, a migration that touches hundreds of files, a plan you want to stress test from every angle before you commit to it. Dynamic workflows can handle all of those end to end." So yeah, I tested this once, but on a very small codebase, right? Just on a tool that I've been building off and on for the past three to four months. Who knows if I'll ever release it. I've thought about it. It's actually really cool. I use it every single day.
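As a rough picture of the pattern in that Anthropic quote (plan the work, fan it out to parallel sub-agents, check the results before anything is reported), here is a minimal Python sketch. Every name in it (`plan`, `run_subagent`, `verify`, `orchestrate`) is hypothetical, and the "sub-agent" is just a string rename standing in for a model call. It illustrates the shape of the workflow, not how Claude Code actually implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(files, chunk_size):
    """Split the work into independent subtasks (here: fixed-size batches of files)."""
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


def run_subagent(batch):
    """Stand-in for one sub-agent; a real one would call a model and edit code."""
    return {name: name.replace("legacy_", "modern_") for name in batch}


def verify(result):
    """Stand-in for the test-suite bar: accept only fully migrated names."""
    return all(new.startswith("modern_") for new in result.values())


def orchestrate(files, chunk_size=2, max_workers=4):
    """Plan, run the batches in parallel, and merge only verified results."""
    batches = plan(files, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subagent, batches))
    merged, rejected = {}, 0
    for result in results:
        if verify(result):
            merged.update(result)
        else:
            rejected += 1
    return merged, rejected


files = [f"legacy_module_{i}.py" for i in range(5)]
report, rejected = orchestrate(files)
print(len(report), rejected)
```

Note that every batch is its own unit of work. In the real feature each of those units is a model session that consumes tokens, which is why fanning out to hundreds of them eats through usage limits so quickly.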
But it's a small codebase. So if I had a larger codebase, I don't even think I would be able to do this on the $200 Max plan. I think this is just for API credits. It's like every time someone hits that feature, Anthropic adds another, I don't know, billion dollars to its valuation or something like that. So, why it's useful: it just collapses those large multi-day engineering efforts, when you're talking about mass migrations and repo-wide refactors, into a single orchestrated session instead of manual subtask management. So, obviously, who's going to find this useful? Engineering teams, right? And just enterprise development organizations. So, a good one if you have the budget and, well, if you're really a dedicated engineering team.
All right, the next one, which I think will probably be more widely used, is the new, well, newish Copilot Studio computer-using agents. These were actually announced quite a while ago, but they are just now generally available. Yeah, it's kind of the problem with Microsoft and Google. Sometimes at their conferences they announce things and they're released right away. And then other times they go to these frontier programs or trusted testers, which is like 0.1% of the general public, and it stays there for like 6 to 12 months and you never hear about it again. Well, this is one of those. When the computer-using agents inside Copilot Studio first came out, I think I actually went out and bought a laptop, a PC, just to use this, and they just didn't become generally available until now.
So, I might have to go find that laptop.
All right, so here's what it is. Computer-using agents are now generally available in Copilot Studio, letting organizations build agents that interact directly with websites and desktop applications through the user interface to automate processes where the underlying systems lack APIs. There are also new enterprise capabilities that let organizations manage credentials more securely, choose models best suited for different automation scenarios, and build automations that adapt to changing interfaces instead of breaking when a screen or web page changes. Computer-using agents can now be embedded directly into multi-step workflows as well. That feature has moved into preview. So who has access? Well, right now everyone does. It is generally available. So if you have a Microsoft Copilot Studio license, you will have access to it. So why is it useful? Well, I think, number one, it's for when your organization only uses Microsoft Copilot and you hear all these cool things happening in Claude Code and Codex, and how I'm talking about my agents literally using every single application on my computer. I'm looking at my current agent that's in goal mode now. It's at 25 hours running straight, right? How do you do that?
Well, you give it computer use. Because yes, MCPs, these Model Context Protocol servers, are great for bringing in data from all these different services. Microsoft Copilot, like others, has apps, integrations, connectors, whatever you want to call them, that bring your dynamic data sources in. But what about all those other things that don't have MCPs, don't have APIs, or don't have apps or connectors supported within Microsoft Copilot Studio? Right? The simplest thing for me, as an example, is looking at my podcast stats, but sometimes that takes so many clicks it's almost impossible. Or one thing I like to do is look at release threads on Twitter, since when a company releases something, that's kind of their preferred channel now. I like to see what people are talking about, what's working, and what isn't. I don't want to sit there and scroll through all of those, right? Social media is distracting. So, that's an example of a computer-using agent that can take over your browser and really do anything that there's not already a direct connector for. Here's what Microsoft says: "Computer-using agents are now generally available. With computer-using agents now generally available in Copilot Studio, organizations can build agents that interact directly with websites and desktop applications through the user interface. This helps you automate processes that previously relied on brittle scripts or manual workarounds because the underlying systems lacked APIs. With the new release also comes new enterprise-ready capabilities designed to help you operationalize UI automations more confidently. Organizations can now manage credentials more securely, choose models best suited for different automation scenarios, and build more resilient automations that can adapt to changing interfaces instead of breaking whenever a screen or web page changes."
All right, so a pretty exciting one.
And our next AI feature that hopefully you didn't miss. With this one there are a couple of new features, but it's going to look completely new. And I'm guessing this is in preparation for Microsoft Build next week. Microsoft is rolling out, and it's live now on some accounts but still rolling out to everyone, a brand new design for Microsoft 365 Copilot with Work IQ. So here's what's new: a new design for Microsoft 365 Copilot built on progressive disclosure. Microsoft says that Copilot begins with a clear, readable response, then adds structure and next-step support as you refine what you need, with formatting when it improves clarity, suggested prompts when they deepen work, and follow-up actions when they move it forward. This progression is powered by Work IQ, which is now a little bit easier to see and use. It is a drop-down toggle to either include work data, work data in memory, or to not include it. So there's a little toggle where you can turn Work IQ on by default. Which is great, because then you can see in the responses the thinking trace, the searches, and what data it is automatically going to pull in from that Work IQ graph.
So this new progression is powered by Work IQ, the intelligence layer that you can see when it's active and control directly. It draws on your emails, files, chats, meetings, all that. Work IQ adapts to the depth your work requires, including the ability to choose between AI models when that can surface more relevant results. So who has access to this right now? Well, it's rolling out now. This is the consumer 365 Copilot app experience, right, which is distinct from Copilot Studio. So here's why it's useful. Well, it aims to make Copilot's controls feel less intrusive while keeping AI assistance close to the Word, Excel, and PowerPoint work surfaces. Also, by grounding in your broader context and not just individual artifacts, Work IQ helps Copilot support significant shifts, like performance review cycles or an org change. So who's going to find this useful? Well, I mean, anyone that uses Microsoft 365 Copilot daily, right? And if you didn't like the previous, or I guess technically the current, Copilot UI, because this is a slow rollout even though it is live now, you might like this one a little bit more. The big thing that I'm seeing, and again, I talked about this on the show yesterday:
It looks like everything else, right? Copilot maybe wasn't the best-designed experience previously, but it was at least a little bit unique, whether you liked that or not. The new design, from my perspective, is a little flatter, technically a little cleaner, more monochromatic and less colorful. So on the surface it's a design aesthetic, but there are also some new features and new tweaks that Microsoft says make it a little bit easier to use. So, there's that.
Let me know if this is something that you guys want to dig into more once it's released. All right. Next one.
This one is pretty impressive: Dubbing V2 from ElevenLabs. So, what is this? Well, as you can guess, it's the second version of dubbing. All right. So, ElevenLabs is obviously a leader in text-to-speech, but this is a new dubbing model that translates spoken content across languages while preserving the original delivery. Dubbing V2 conditions on the source performance, not a transcript. That's the key thing here.
So your tone, emotion, and delivery carry across every language. Right now it supports 90-plus languages and accents, enabling localization for international audiences. This has already launched and it's available in the ElevenLabs Dubbing Studio. It's web-based, and it works for everyone from individual creators through enterprise teams on the ElevenLabs platform. So here's why it's useful.
Well, it's conditioning on the actual source performance rather than a transcript, meaning the dubbed output keeps the speaker's emotion and pacing, which transcript-based dubbing tends to flatten. Right, that's obviously a little bit better for someone that's very emotive like me. If you don't watch the video version of the podcast, sometimes I'm flailing my arms in the air and making crazy faces.
It's funny. I have people that sometimes send screenshots when I'm making a very unflattering face or something like that. But it captures that same emotion. I haven't had a chance to try this just yet because this one literally just came out. But the demos that I've seen, from people that I kind of know or that I watch or listen to, are really good, right? Not just preserving the tone of voice across multiple languages, but really capturing the emotion. It actually sounds like these people whose voices I've heard before. So obviously, who's going to find this valuable? I mean, content creators, podcasters like me, but also video teams, HR departments, people in learning and development. I actually think there's a ton of use cases. People doing international business. If you're a global corporation and you want to have a more localized and friendly onboarding for the new 100 people that you train every single week, whatever it may be, this is pretty big. And the results are impressive, because normally I wouldn't put a text-to-speech update on this Friday features show, but this one was that good. So, this is how ElevenLabs describes it. They say, "Dubbing V2 brings high quality dubbing to creators, marketers, and studios. Fully automated with no pipeline to build." They say this supports source audio, source text, and target text. The full pipeline of translation, cloning, dubbing, and syncing runs automatically with no manual intervention.
They say it's perfectly synced. It is an audio-to-audio model, so it doesn't require a transcript or text. And they say it's close to human quality. So, again, a pretty impressive one where I'm like, "Yeah, I think people need to hear about this."
All right, speaking of hearing about things, our last big Friday feature update. You probably didn't miss this one. There's a new, not quite undisputed, but probably king of the hill when it comes to AI models.
Not harnesses though, so keep that in mind. This goes to the point that I made in yesterday's show about how all models are starting to be the same, and it's more about the harness, the tool calling, what works under the hood, which I know is technically part of a model. But as an example, the harness of Codex using GPT-5.5 is, by all measures, much better than the harness of Claude Code now using Opus 4.8. So just a quick distinction there for our audience. But let's talk about Claude Opus 4.8. It is good. It is impressive. Claude Opus 4.8 is an upgrade over Opus 4.7, with improvements across benchmarks for coding, agentic skills, reasoning, and practical knowledge work, and it is a more effective collaborator. And there is a new effort control on Claude. Hey, another sneaky way that you're probably going to burn through your usage pretty quickly.
So there is a new effort control on Claude and Cowork that lets users choose how much effort Claude puts into a response. A higher effort thinks more frequently and more deeply, and a lower effort responds faster and uses rate limits more slowly. So yeah, I'm actually super glad that Anthropic did this, because with 4.7 they took away extended thinking and introduced adaptive thinking, and adaptive thinking was a disaster, right?
It was a toggle, and essentially I would always want it to think, and I would instruct it to think, and it would never actually use its reasoning and logic abilities, which were great when it did use them. So, the 4.7 toggle for that thinking was terrible. So, big props to Anthropic for bringing it back, but they brought it back in a way that might have you burn through your token usage, right? So, just keep that in mind. They are not kidding when they say it uses more of your limits. Also, Anthropic says Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked, reflecting an emphasis on what they call honesty and flagging uncertainty. I am not too certain about that in my very limited experience, but I did watch a couple of review videos from people that had early access, and they called this out as well.
Even people that are normally very pro-Anthropic called this out: the honesty thing and the hallucinations claim didn't seem to check out in some initial vibe tests.
So, we will see where it lands on that once it goes through all of the benchmarks. So, who has access? Well, everyone. And the cool thing is it is priced the same as Opus 4.7.
So I think this is like the first time in a very long time that Anthropic has upgraded the model without an upgrade in price. However, it is token inefficient AF. Right, there are a lot of very helpful charts. If you're looking at what models to use, I highly recommend looking at Artificial Analysis, and they have a great intelligence-per-cost kind of chart. And yes, Opus 4.8 does great on benchmarks, but it is extremely token inefficient in achieving the level of intelligence that it does.
It uses way more tokens than anyone else and it is not even close. And then add in these quote-unquote new reasoning levels, and yeah, to pass the quality bar for whatever artifact you're working on, whatever output you're trying to create in Claude, it's going to cost a ton more. So it's obviously available system-wide: on the workbench, in Claude on the web, in Claude Code, all that good stuff. So here is what Anthropic says about Opus 4.8. They say, "We are upgrading Claude Opus to a new version, Claude Opus 4.8. It builds on Opus 4.7 with improvements across benchmarks and is a more effective collaborator."
Opus 4.8 launches alongside several new features, one of which we already talked about. Users on Claude.ai now have control over the amount of effort Claude puts into a task. Claude Code has a new dynamic workflows feature, which we already covered, that allows it to tackle very large-scale problems. And there is also a new fast mode for Opus 4.8 where the model can work at 2.5 times the speed. I like how they phrase this. They said it's now three times cheaper than it was for previous models.
So yeah, it used to be a 6x cost. Now it's a 2x cost. So don't let that fool you, like, oh, it's cheaper, oh my gosh, let me do this 2.5x speed thing. Yeah, it's double the cost.
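The "three times cheaper but still double the cost" point is just arithmetic, sketched below in Python. The 6x and 2x multipliers are the figures quoted in this episode, not confirmed pricing, and the $10 job is a made-up number for illustration.

```python
def fast_mode_costs(standard_cost, old_multiplier, new_multiplier):
    """Compare the old and new fast-mode price against the standard price."""
    return {
        "standard": standard_cost,
        "old_fast": standard_cost * old_multiplier,
        "new_fast": standard_cost * new_multiplier,
        # How many times cheaper the new fast mode is than the old fast mode.
        "times_cheaper_than_old_fast": old_multiplier / new_multiplier,
    }


# Hypothetical job that costs $10 at standard speed.
costs = fast_mode_costs(10.0, old_multiplier=6.0, new_multiplier=2.0)
print(costs)
```

Both statements hold at once: the new fast mode is three times cheaper than the old fast mode, and it still costs twice what the same job costs at standard speed.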
All right, but the capabilities, yes, they are impressive. On the benchmarks, for the most part, at least on the ones that Anthropic handpicked, it outperforms. They only included five, no, six benchmarks here. And it is tops except, interestingly enough, on Terminal-Bench, which is the one that they should be winning on, which is agentic terminal coding. Right, GPT-5.5 is still very far ahead on that one. I do have a more comprehensive benchmark list, and as you'll see from my list, there are actually two others, which Anthropic left off their list, where GPT-5.5 was winning. So that's why I'm like, okay, what's the best model in the world? I mean, if you look at Artificial Analysis, the Intelligence Index, which I think is the best indicator, not everything is fully out yet for Opus 4.8, but it does look like it will come in ahead of GPT-5.5 by a slight to moderate margin. But here's the thing. At any point, right, we're hearing that GPT-5.6 could be here any day. The normal Codex Thursday release was pushed back to today. So maybe by the time you read today's newsletter we'll see what new things we have from OpenAI via Codex, or maybe they'll sneak in 5.6. Not sure. But regardless, we do know that OpenAI's next model is around the corner. Google already said that their Gemini 3.5 Pro is going to be released soon.
So, interestingly enough, Anthropic's strategy here, not sure how it's going to work. Obviously they have Mythos around the corner, and they did say that it was going to start rolling out in the coming weeks. So whether that's two weeks or 10, we'll see. But my assumption on how this is going to play out: OpenAI and Google are going to come out with models that top Opus 4.8 in the next few weeks. Once they do that, Anthropic will then release Mythos. And my thought is that Mythos will probably have a decent lead, at least on most benchmarks.
Although, even though it's not publicly available, there are already, I think, three benchmarks that Mythos has been passed on. When it was first revealed, everyone was like, "Oh my gosh, Mythos is going to be the benchmark king forever." It's not even released yet and some of its reported benchmarks have already been surpassed.
Anyway, I do expect that to be the case. So, at least for king of the hill when it comes to models only, Anthropic probably took the lead back with Opus 4.8. We're going to see OpenAI take the lead back with 5.6. I assume Google will recapture the lead with 3.5 Pro in June, and then probably shortly thereafter we'll have Mythos, and I think Mythos might hold on to it for, you know, maybe two months. Who knows?
All right. My first impressions of Opus 4.8: really good in some respects, absolutely terrible in others. Here's what I mean.
The stuff that you would expect a Claude model to be good at, it is really good at. Specifically front-end design: amazing. Creating fully functioning games, apps, websites, HTML designs, whatever. So, so good.
It's a different story for certain things that I like to use these models for. I will call out another benchmark that it is behind on, and maybe that is one of the reasons: MCP Atlas, which is multi-step workflow orchestration with tools via the Model Context Protocol. That's interesting considering Anthropic invented this, and they're behind Gemini 3.5 Flash. So that's one it's behind on, and it's behind on BrowseComp as well. It is actually the worst model on BrowseComp, and that's a model's ability to research questions by searching the live web.
And my gosh, did I ever find that out by using Opus a little bit today, actually, in preparation for the show. And not just that, but there's this new thing where Anthropic says it's really prioritizing honesty and refusals, which I actually found absolutely infuriating. So I think it was a combination. I told Opus 4.8 to do some research on a transcript from one of my shows. I know that I mentioned the different tiers of risk for the EU AI Act, and I wanted to see what those were, and it essentially said, "Nope, not going to search the web, because these tiers don't exist." And I'm like, "Yeah, they do," blah, blah, blah. So I had to tell it three different times.
And it was using this new honesty language about how it's not going to tell me something that's not factual. So, at least for me, in my very limited use, I'm not loving the new 4.8. I know this is weird. I think I'm still using 4.6. I really like Opus 4.6. I think it's a great model. It's just this new honesty thing, and there are some other quirks, but mainly, even on a $200 a month plan, I can't use it how I want to. That's the reality. I can't really use any Claude models the way I want to on a $200 a month plan. But maybe that's just a new reality as we shift from token maxing to token efficiency. Maybe even people like myself, who have seemingly unlimited budgets, at least to spend on the subscription side, are going to have to cut back a little bit.
All right, so that's it. Let me know: should we do a dedicated Opus 4.8 show next week? If you're still listening at this point, that's how I know. Sometimes I leave little things at the end. Just put "Opus 4.8" in the comments on Spotify or in the comments on LinkedIn. I've got my arbitrary number. If we get that many, I'll do a dedicated show, and I'll take my time and put it through the wringer for you guys. So, that's it.
That's a wrap. A lot of new things, from ChatGPT agents, to Copilot Studio agents that can use your computer, to small little things like Google Drive and NotebookLM syncing. I think that there are a lot of new capabilities that were unlocked and you probably had no clue. I hope this was helpful. If so, make sure you go to youreverydayai.com and sign up for the free daily newsletter. And remember, if you are in San Francisco, check out the show notes. I always leave the link to my LinkedIn. Hit me up if you want to chat AI, whatever it is, or if you're going to be at the Build conference, let me know. Make sure to tell me you're from the podcast, though. Otherwise, I'm going to assume stranger danger. Thank you for tuning in, and I hope to see you back next week and every day for more Everyday AI.