The annual summer AI slowdown panic is an inevitable market phenomenon driven by token shortages, rising inference costs, and the transition from subsidized to pay-per-use pricing models, which creates temporary constraints on experimentation but ultimately benefits the industry by forcing sustainable market adaptation and revealing genuine implementation challenges that distinguish between theoretical capabilities and real-world engineering success.
The Annual AI Slowdown Panic Is Here
Today on the AI Daily Brief, the annual summer AI slowdown panic has arrived a little early this year. Before that, in the headlines, a new coding benchmark that's getting rave reviews. The AI Daily Brief is a daily podcast and video covering the most important news and discussions in AI.
All right, friends, quick announcements before we dive in. Quick thing. The most important way that the podcast has grown over the last few years is when people share it internally with their work colleagues. And I realize that the podcast as it is can be fairly dense and actually sort of difficult to transmit into that sort of work setting. I've got a survey up on the website right now about how I can help make that easier.
It's only a couple questions. It'll take you less than a minute to do and I would so appreciate it if you take the time to let me know how I can make aidb for teams work better. You can find a link right on the main page there at aidbrief.ai.
We kick off today with a new benchmark that has people pretty excited. Now, if you are a regular listener, you might remember my episode from back a couple months ago called Why AI Needs Better Benchmarks. Effectively, the lament of that piece is that most of the benchmarks we have either are or are getting saturated incredibly quickly.
And even if they're not, they are highly susceptible to gaming in a way that makes their value in terms of understanding how good a model actually is pretty low. One of the ways that this shows up is a real disconnect between what benchmarks say when a model is first released and what people go on to experience. One of the areas where this has been on display recently is agentic coding, where people's lived experience with the models has been fairly different from what the benchmarks suggest. Well, now we have a new entrant to the field called DeepSWE. The benchmark comes from a company called Datacurve, and in their announcement, Datacurve's Serena Ge writes, "On public leaderboards, top models often look relatively close in capability.
DeepSWE shows where they actually diverge." The goal is tasks that reflect the realistic experience of developers in their day-to-day work. Now, coming to a critique of what has previously existed, Serena writes, "We wanted tasks that reflect realistic, novel engineering work. The SWE-bench family scrapes existing GitHub issues and PRs, which causes two problems: memorization, i.e. models have already seen the solution, and triviality, i.e. most tasks are small.
DeepSWE tasks are built from scratch, keeping prompts intentionally short and natural while requiring significantly more code to solve." On the initial benchmarking run, Datacurve found that GPT-5.5 was head and shoulders above the competition with a score of 70%. GPT-5.4 was in second place with 56%, narrowly beating Opus 4.7 at 54%. Results rapidly trail off from there, suggesting the benchmark is very good at identifying the handful of models that are truly able to handle long-horizon coding tasks. To give one example of the difference in performance, Kimi K2.6 narrowly beat GPT-5.4 on Terminal-Bench 2.0 and SWE-Bench Pro, while on DeepSWE, GPT-5.4 beat Kimi K2.6 by more than 30 percentage points. In fact, all of the Chinese models look pretty far behind on this benchmark. Kimi was the highest scoring with 24%, but DeepSeek V4 is way down the leaderboard at just 8%. Beyond the simple pass/fail dimension, Datacurve also published cost, speed, and token efficiency findings, with GPT-5.5 once again being the clear leader in all three. Compared to Opus 4.7, GPT-5.5 used around half as many tokens, completing the run in less than half the time and costing around a third as much.
This obviously has big implications as we move into AI's trade-off era, one that effectively revolves around token shortages. In addition to just the results, there are a bunch of things that people are responding positively to about how DeepSWE does things. The tasks require real-world workflows like parsing repos, working across multiple files, tool use, and long-context reasoning. And in addition, Datacurve isn't uploading their solutions to GitHub, to prevent them from being included in training data.
Developer and entrepreneur Siqi Chen summed up the feelings of many when he wrote, "This benchmark very much matches the vibes for my real-world long-horizon usage." Adds Y Combinator CEO Garry Tan, "This is the new standard for engineering evals." Chubby noted that this is a real alignment with what was rising as the pro-Codex and 5.5 vibes, putting into numbers what validated people's feelings. A couple more interesting things from deeper in the notes: Datacurve designed a qualitative evaluation harness to figure out why models fail tasks. The evaluation found that the biggest difference between the leading models and the rest was self-verification.
GPT-5.4 and Opus 4.7 wrote their own tests to verify their work over 80% of the time, while the weaker models were far less likely to take this approach. Datacurve also found a distinct failure pattern for Anthropic's models. Claude often missed stated requirements in a multi-part prompt. For example, if a task required support for both sync and async, Claude would often do one and forget the other. OpenAI models were unlikely to make this same error, and this prompt adherence was consistent across multiple runs. Datacurve did note a few limitations. Most notably, their benchmark harness forced models to use bash commands, which Datacurve wrote could hold the models below their native ceiling. The testing also strips out synergy from native harnesses like Claude Code or Codex, potentially degrading performance in an uneven manner. Overall, when I was imploring the world to have better benchmarks, this is exactly the type of thing that I was hoping for. So, I'm very excited to see where this one goes. Moving now from the realm of benchmarks to the realm of narrative, a portion of AI leaders at least are finally starting to change their tune on the AI jobs apocalypse.
One of my big beefs with the frontier labs for the last several years has been the way that they've seemed to jump at any chance to tell everyone how likely it is that the technology that they're building, inexorably for some reason, is going to inevitably steal everyone's livelihoods. Now, of course, my much bigger beef with the messaging is the fact that they actually believe it, which regular listeners will know I simply do not. And before we're all done here, I will spend hours and hours and hours explaining exactly why I disagree.
But regardless, whether it's based on a changing assessment of what is likely to happen or just, you know, like a third-grade-level analysis of how terrible the communication strategy has been, it does seem like OpenAI at least has changed their tune. This week, Sam Altman is reinforcing his new talking points that actually it looks like people will probably continue to have jobs despite powerful new work tools being introduced. During an interview on Tuesday, he said, "I don't think we're going to have the kind of jobs apocalypse that some of the companies in our space advocate or talk about. I thought there would have been more impact on entry-level white-collar jobs being eliminated by now than has actually happened." With a healthy dose of humility, in fact, Altman suggested the industry had miscalculated how easily people could be replaced by computers. He continued, "I now think I understand more about why it hasn't, and I'm obviously grateful, but that is an area where my intuitions were just off."
He went on to explain that the human part of employment can't be replaced by AI, adding, "We really do care about our interactions with people, which updated me to thinking that the jobs picture is likely to be very different than we thought." Now, many economists have attempted to make similar points over the past year, but you know, when push comes to shove, "you're all going to lose your jobs" is a much better headline. In any case, for those economists, the argument is typically that task automation is categorically different from job automation and/or that the frictions of deploying AI at an organizational level provide a natural speed limit to any change. Up till now, those arguments have been a bit theoretical, but we are starting to hear practical case studies that explain the difficulty of mass AI replacement. Last week, for example, Goldman Sachs CEO David Solomon published an op-ed in the New York Times declaring the AI job apocalypse concern overblown. Now, he was not Pollyannaish about the situation, citing Goldman Sachs economists who believe a quarter of work hours will be automated over the next decade. Within his own firm, his estimate was that AI had already displaced 16% of entry-level tasks.
Solomon's argument was that AI, like previous technological revolutions, will create more jobs than it destroys and generate a productivity boom. He observed that markets rarely deploy productivity to sell the same product at a lower cost. Instead, they use new tools to deliver a better product at the same price. Giving an example from his own world of investment banking, he wrote that this might look like delivering more comprehensive analysis on a faster timeline with higher-touch client service. Ultimately, the thing that is encouraging to me is not just the shift in tone, but the actual first-principles thinking and observation of real-world phenomena that's going into these changed estimations of just how disruptive AI is likely to be. Finally, in the funding world, the inference layer is gathering the next big wave of startup funding as the token crunch crunches. The Information reports that Baseten is closing in on a billion-dollar fundraising round that would value the startup at $11 billion. Baseten is a neocloud of sorts, providing a vertically integrated solution for fine-tuning open-source models and deploying them in production. Baseten doesn't own their GPUs, instead serving as a middleman and value-added reseller for larger cloud providers. This round would see their value more than double from their last fundraising round, announced just 3 months ago. The growth in valuation is in line with some incredible revenue numbers so far this year. Sources said Baseten saw annualized revenue triple from $200 million to $600 million during the first quarter, with their run rate increasing 20x since March of last year.
OpenRouter is another beneficiary of the funding surge, becoming the latest AI unicorn this week. They announced a $113 million Series B on Tuesday led by CapitalG, which is the investment arm of Google parent Alphabet. Sources said the round valued OpenRouter at $1.3 billion, double their value from their Series A last June. As the name suggests, OpenRouter is a token routing service.
Basically, it's a way for a customer to get access to lots of different AI models through a single platform. So, for example, if you're designing some application that is at least a little bit model agnostic and you want to optimize for factors like performance or cost, or simply have some redundancy, you can build on top of OpenRouter instead of chunky APIs from all the different model providers directly. Like Baseten, OpenRouter's business is absolutely booming.
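To make the routing idea a little more concrete, here is a minimal sketch of cost-aware routing with fallback. Everything in it is an assumption for illustration: the model names, prices, quality scores, and the `fake_call` stand-in are hypothetical, and none of it reflects OpenRouter's actual API or selection logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model label, not a real product
    cost_per_mtok: float  # hypothetical price in USD per million tokens
    quality: float        # hypothetical quality score between 0 and 1


def rank_models(options: list[ModelOption], min_quality: float) -> list[ModelOption]:
    """Return the models that meet the quality bar, cheapest first."""
    eligible = [m for m in options if m.quality >= min_quality]
    return sorted(eligible, key=lambda m: m.cost_per_mtok)


def route(prompt: str, options: list[ModelOption], min_quality: float,
          call: Callable[[str, str], str]) -> tuple[str, str]:
    """Try eligible models in cost order, falling back when a call fails."""
    errors = []
    for model in rank_models(options, min_quality):
        try:
            return model.name, call(model.name, prompt)
        except RuntimeError as exc:
            errors.append(f"{model.name}: {exc}")
    raise RuntimeError("all candidate models failed: " + "; ".join(errors))


# Hypothetical catalog, invented purely for this example.
CATALOG = [
    ModelOption("frontier-large", 15.0, 0.95),
    ModelOption("mid-tier", 3.0, 0.80),
    ModelOption("small-cheap", 0.5, 0.60),
]


def fake_call(model_name: str, prompt: str) -> str:
    """Stand-in for a provider call; simulates an outage on one model."""
    if model_name == "mid-tier":
        raise RuntimeError("provider outage")
    return f"[{model_name}] answered: {prompt}"


chosen, reply = route("Summarize this ticket", CATALOG, min_quality=0.75, call=fake_call)
print(chosen)
print(reply)
```

In this toy run, the cheapest model that clears the quality bar is tried first, and when it fails the request falls through to the next option. That is the redundancy benefit described above, separate from the cost benefit.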
Current OpenRouter investor Menlo Ventures reported that the company is now serving 100 trillion tokens per month, a 5x increase from where they were 6 months ago. Menlo also noted that their revenue run rate has already doubled since the round was opened in February. These raises demonstrate just how much focus the AI industry now has on inference and serving models, above and beyond simply training runs.
Dylan Brislot of Nebas posted, "Sam Altman recently said, 'We have to become an AI inference company now.'" Editor's note: I'm pretty sure he said, "We are an inference company now." But regardless, the point remains. Dylan continues, "Feels like that sentence is the cleanest reorg of the year and kind of went under the radar. The frame the public still uses is training. Who had the biggest cluster, the most data, the best post-training pipeline, the boldest scaling bet? That story is still real, but it's not where the marginal dollar goes in 2026. The marginal dollar goes to serving a reasoning model that has to think for 10 seconds before it answers.
Hold a million-token context without falling over. Fan out to a tool, come back, verify itself, and bill you for every token in the trajectory. The training run is amortized. The serving run repeats every time a user opens the app." Congrats to Baseten and OpenRouter. But for now, that is going to do it for the headlines. Next up, the main episode.
Welcome back to the AI daily brief.
Every year, like clockwork, the summer sees some AI slowdown panic. Now, the particular nature of the narrative has changed each year, but it has come without fail every single time. It appears to me that we might be getting ours a little early this year. And with the Memorial Day holiday coming a little early and kicking off the summer in the US, sure enough, the shape of the panic is starting to reveal itself. Now, these panics are sort of an unintentional collaboration between the professional critics, in other words, the people who have made it their personality and/or business model to deny or disparage AI, and the people who are just tired and desperate for AI not to be as big a deal as it seems because thinking about adapting to it is just exhausting. Back in the summer of 2023, the narrative hit when, in June, ChatGPT had its first down month ever. Similarweb, who presented the stats, claimed it was the, quote, novelty wearing off. Pretty soon, people came to the conclusion that it was about students going home for the summer, which, if true, according to vaunted publications like Business Insider, was a bad sign for OpenAI's long-term prospects. Fast forward to 2024, and the summer panic was an early version of a pre-training wall, where a lot of the discourse was that companies were just going to run out of data to train their models on. And walking down that implication path, if there wasn't new data, then at some point models were just going to stop improving. Now, 2025 was a doozy. It was that oh-so-lovely MIT study, and I use the most aggressive air quotes possible around "study", that found that 95% of generative AI projects fail. That was of course not the only factor. GPT-5 came out to largely universal disappointment. And given that there had been a flurry of infrastructure deals signed by companies like OpenAI in the previous couple of months, the financial side of the AI bubble narrative really picked up steam.
The combination of the idea of AI not being able to get all that much better, as witnessed by GPT-5, plus not really performing inside organizations, as witnessed by MIT, had big implications, the story went, for the financial stability of the AI industry. Spoiler alert, however: these panics never last all that long. In Q4 of 2023, we had a number of companies start to release their own GPT-4 class models. Maybe most notably, in December, Google got back in the game in a big way, launching Gemini.
In September 2024, OpenAI answered the concerns about pre-training walls with a fundamentally different approach to scaling and the introduction of o1, which would become their first reasoning model. Now, in 2025, the bubble narrative actually persisted longer than the summer. It was a driving story throughout Q4 of last year, but eventually it was absolutely smashed by the combination of Claude Code, Opus 4.5, GPT-5.3 and 5.4, and the recognition that in fact not only was AI still getting better, but some major Rubicon of capability had been crossed. This of course set up the first half of this year, which has been insane, exciting, and for many completely exhausting.
Agents became real. People started to recognize the importance of harnesses, with many getting their first taste of harness engineering as they set up their OpenClaws on new Mac minis. In the enterprise world, the capability overhang became more pronounced and urgent than ever. And it has been an absolute race to catch up. Now, it is in that cauldron that we've gotten phenomena like tokenmaxxing. Tokenmaxxing, in short, is the idea of incentivizing team members to use AI as much as possible, as measured by the number of tokens they consume. We found out that Meta had a token leaderboard, but that actually this was happening in companies outside of technology as well.
Companies like Uber announced that they'd burned through their annual token budgets in just a few months, and we were truly off to the races. And alongside the massive shift from assisted AI to agentic AI came an incredible increase in revenue, as the thing that mattered for the big labs was no longer the number of seats that they could sell, but the number of tokens that those seats could consume. This is what has gotten us to OpenAI being at a $30 billion run rate and Anthropic surging to a $45 billion run rate.
Caveat, asterisk: the comparison isn't a perfect one-to-one, as they have different accounting practices. But set aside the specifics; the trend line is what matters. Revenue has skyrocketed, leading many people to question some of those bubble assumptions that had been so prominent at the end of last year. If we were, as everyone would admit, just barely scratching the surface of how much AI could be used, and already we were seeing revenue numbers like these, maybe these big infrastructure deals didn't look so crazy. As recently as the beginning of this month, on May 1st, The Atlantic published a piece called "So, About That AI Bubble", with the subhead, "Thanks to the rise of Claude Code and other AI agents, revenues are finally catching up to the hype." And yet, for those watching closely, it's been clear that there's something of a reckoning coming. Tokens are too expensive and there aren't enough of them. All of a sudden, companies are having to change their business models to be usage-based instead of seat-based.
This has caused incredible consternation, especially among prosumer-style users who were sometimes consuming five or even ten thousand dollars' worth of tokens on a $200-a-month plan. The shift from the subsidy model to the pay-per-use kind of model is now showing up everywhere. And it's clear that the AI subsidy era is well and truly over. Putting a fine point on the idea that we are shifting from a subsidy era to a trade-offs era, the US government is even at this point getting involved in the rationing of the most powerful models. Recently, when Anthropic wanted to expand access to their most powerful and still limited-access Mythos model, the White House opposed the expansion not just because of cybersecurity concerns, but because they wanted first crack at all those tokens. The sum total of this is that the very, very short golden age of agent experimentation, which lasted from the beginning of this year to the middle of this year, has come to a close. And what's bad about this is that experimentation plays an incredibly important role in figuring out how we're going to actually get the most value from these agents. The implication of agents is not doing the same stuff we were doing before, just a little bit faster and a little bit cheaper. It's doing totally new types of things in totally new ways. And I don't think that there's any way to figure that out without just actually going around and doing it. This is especially true when it's lots of non-technical folks doing totally net-new work. And so the loss of the ability to experiment freely is a genuine loss.
It also significantly increases the chance of AI inequality, where only the already resourced have access to the most advanced models, and the differential between the models that the most well-resourced have access to versus everyone else gets bigger and bigger.
And yet on the flip side, there are some good things about the place that we find ourselves as well. Certainly, the fact that we're discovering that extensive agentic usage is actually much more expensive than we thought changes the calculus on human replacement fairly significantly. Even if it's just a temporary state of affairs, there is incredible value in buying ourselves time to adapt to the transition. The question of AI disruption is not just about how much of our current work AI can do. It's about the speed with which it starts to do it and the pace of our ability to adapt. Having the most advanced agentic uses not be clear short-term financial wins gives us more time to adapt. And by the way, this sort of market-based adaptation is a way healthier and more sustainable type of adaptation than some sort of forced slowdown pronounced from on high.
Speaking of healthier markets, although it sucks for those of us who are losing some of our toys, companies being forced to make the market pay a sustainable price is obviously way healthier long-term for the sustainability of the industry as a whole. The irony of what we will see as the resurgence of the bubble narrative is that a world in which companies are continuing to subsidize usage is one that is way more likely to have a big bubble form than one where the market is adapting to the actual price of the goods being sold. Still, regardless of what's good or what's bad about how this is changing, what was completely inevitable is that this was going to generate a new bubble narrative. I discussed on a recent show that the new line from the professional AI deniers is no longer that the AI models themselves aren't useful, but that actually your vibe-coded apps are crap. And of course, it's more than that. Not only are your vibe-coded apps crap, but if those crappy vibe-coded apps aren't making money, they're not useful. And if they're not useful and not making money, then you're just wasting money. And since we're now in a token shortage, when that money wasting gets cut off, well, then of course all that revenue growth from OpenAI and Anthropic will stop. And as the market sees that, they won't have the resources they need to continue their infrastructure buildout, and the bubble will finally pop. Again, I am saying that this narrative was completely inevitable based on the changes that are happening, and of course it was going to line up with the summer season. And boy howdy, here we are. AI policy adviser Dean Ball wrote recently, "I feel us approaching yet another summer of discontent with AI, just like last year when many of my peers in the AI commentariat declared deep learning to have hit a wall because of GPT-5, blah blah blah."
And sure enough, yesterday, Uber somehow once again made big news. Following the revelation from its CTO that the company had burned through its token budget in four months, the COO said in a new interview that all that token spending wasn't worth it. Specifically, he said that there wasn't a link between that increased token usage and an increase in the number of useful consumer features that were being pushed out. And my goodness, did the professional critics jump up and down all over this, weaving basically a story just like the one I just gave you that draws a direct line from this one interview to the catastrophic failure of the entire American economy as the AI bubble bursts. And to be fair, it's not just the most dyed-in-the-wool AI deniers that are starting to walk down this path. CNBC's Deirdre Bosa writes, "Part one is companies realizing they're spending too much on AI. Part two is companies switching to cheaper AI because there are good enough models to do the job. This may not bode well for OpenAI and Anthropic valuations that assume they can hold pricing power." The argument here is that if companies start to choose, for example, cheaper Chinese versions, that could threaten the ability of OpenAI and Anthropic to charge what they want to charge, which could have big implications for their revenue growth, which could have big implications for their IPO price, which could have big implications for the way investors see AI as a whole. Adding to this, you got this wildly viral chart this week of the daily install counts of AI coding assistants in VS Code, which basically showed a plateau over the last couple of months in the number of daily installs. Rihard Jarc writes, "It's clear that growth for coding tools such as Claude Code has decelerated from the pace it was since the start of the year. It might be compute constraint related or due to many clients blowing their full-year AI budgets. Monitoring this trend very closely."
And of course, all of the AI consultants will come out of their holes to shake their heads vigorously and agree with how aimlessly companies are spending tokens because of course it becomes just an advertisement for their services. These are the same firms that were the biggest culprits in perpetuating the MIT lie last year because they got to say 95% of AI work fails. We can help you be in the 5%.
Now, as you can probably tell from my tone, I don't put a lot of stock in this resurgent bubble narrative. Professor Ethan Mollick wrote, "We aren't going to do this again so quickly, are we? Rising demand results in higher costs. Higher costs result in lower demand. It's almost like some sort of equilibrium is being achieved, but there's no indication I see that companies are finding AI less valuable over time."
Journalist Eric Thompson writes, "We're getting another round of 'the AI bubble is popping' stories with the news about Uber and Microsoft pulling back on AI subscriptions because their agent costs went crazy. Maybe, but GPU rental prices are still up 2x from where they were 4 months ago. It doesn't seem like demand is slowing down at all. When, e.g., New York City hotel prices are twice as high as they were last year, you shouldn't believe people telling you that nobody is going to New York City anymore." Maybe someone smarter than me can correct me on this logic, but if the price for accessing AI compute is skyrocketing, that's because demand is still significantly outrunning supply, which sounds to me like the opposite of the beginning of the end of a bubble.
Research firm Epoch AI put some numbers around this, trying to estimate the expansion of token supply versus the expansion of token demand. And the TL;DR is that while global inference capacity, i.e. the supply of tokens, is more than tripling each year, their estimates have global demand for tokens growing by roughly 10x per year. Now, I wasn't a math major, but a 3x expansion of supply in the face of a 10x expansion of demand certainly doesn't seem like a scenario where OpenAI or Anthropic are going to have any problem selling every token they produce. But let's go beyond the macro, because the really interesting things that are happening are the ways the market is trying to adapt to what it's spotting as this shortage. First of all, we're getting innovation in the models themselves. I've talked a bunch recently about Cursor's new Composer 2.5 model, which has jumped to third place on Artificial Analysis's coding agent index, behind only Opus 4.7 Max and GPT-5.5 Extra High, while costing 10 to 60 times less than those models. And although they didn't choose to highlight it much at I/O last week, sneakily, Google's small, cheap model Gemma 4 is seeing adoption that outpaces Chinese models like Qwen 3.5 and 3.6. Latent Space's swyx writes, "Everybody talks about the China to US catch-up. Not enough people talking about the US to China catch-up."
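To put the supply and demand figures mentioned earlier into concrete terms, here is a small sketch of how a 3x-per-year supply curve and a 10x-per-year demand curve diverge when compounded. It assumes, purely for illustration, that supply and demand start out balanced and that both growth rates hold constant. Neither assumption comes from the estimates themselves.

```python
def supply_demand_gap(supply_growth: float, demand_growth: float, years: int) -> list[float]:
    """Return the demand-to-supply ratio for year 0 through `years`.

    Assumes (hypothetically) that both start at the same level and that
    each grows by a constant multiple every year.
    """
    ratios = []
    supply = demand = 1.0  # normalized so year 0 is balanced
    for _ in range(years + 1):
        ratios.append(demand / supply)
        supply *= supply_growth
        demand *= demand_growth
    return ratios


# 3x yearly supply growth versus 10x yearly demand growth, over three years.
gaps = supply_demand_gap(3.0, 10.0, 3)
for year, gap in enumerate(gaps):
    print(f"year {year}: demand is {gap:.1f}x supply")
```

Under those assumptions the gap itself compounds at 10/3 per year, so unmet demand widens quickly unless prices rise or usage gets more efficient.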
And what about that VS Code chart? Now, first of all, I think it would be reasonable to be not all that stressed out about a plateau after a period of massive growth. Things don't trend up only forever. Growth in most areas tends to come in fits of punctuated equilibrium, where things stay pretty stable for a while and then spike up, and then stay stable for a while and then spike up again. But honestly, I don't even think that's what's going on here.
Remember, Rihard, who shared the chart, said it's clear that growth for coding tools such as Claude Code has decelerated from the pace it was since the start of the year. Developer Simon Willison bit back, "Or does it reflect that the most popular interface surfaces for coding agents these days no longer live in developer IDEs?" What he means by that is that if you're wondering what VS Code even is because you use Claude Code or Codex, you're a person who wouldn't be counted in those numbers even if you had recently adopted these tools. As Ronan Berer put it, "Cursor and VS Code are just losing market share. Lots of folks now using CLIs," i.e. the terminal interface, "or desktop apps." But are there perhaps some numbers we should put around that? Simon again shared a chart of npm installs of Codex, which counts when Codex was installed directly through a terminal interface. He points out that they were at about 100,000 a day in January and are at over a million a day right now. In fact, in the last couple of days, they've surged up to 1.5 and 1.8 million. In other words, this chart is as much or more about VS Code as it is about Claude Code or Codex. Now, I want to be clear. We are entering a new moment. And as we peel off the frenetic pace of growth of the last 6 months, there is a lot of valuable discourse to be had. As I tried to articulate before, there's a lot of good that can come out of a resource-constrained era. Entrepreneur and content creator Greg Isenberg recently talked about a trip to San Francisco, where he writes, "I heard the phrase agent debt for the first time. Like technical debt, but for agents. When you hack together an agent workflow fast and never clean it up, the system prompts conflict. The memory gets polluted. The tools overlap. Six months later, the agent is doing weird things and nobody knows why."
Now, treating agent debt as a new phenomenon of this agent era and figuring out how to deal with it is exactly the type of conversation that can be extremely valuable in this type of slower period. You're also going to continue to see, I believe, more and more resources flood in to help support better, more thoughtful adoption. It's why both OpenAI and Anthropic have spun up consulting ventures recently. Look, ultimately, for those in the know, these AI slowdown panic periods are amazing. If you are even the least bit competitive and want to be getting ahead of peers in understanding how you use these tools, there's nothing better than everyone else opting out for a couple months, hoping that this whole thing finally goes away. In any case, inevitably we will continue to track the AI slowdown panic here on the show. But for now, that is going to do it for today's AI Daily Brief. I appreciate you listening or watching as always, and until next time, peace.