As AI agents become viable at scale for enterprises, organizations face resource constraints in which token usage, inference costs, and compute limits create new business challenges. Inference costs are falling roughly 10x a year, but if usage grows 100x the constraint remains, and training competes with inference for the same compute. Leading organizations therefore need systems thinking around model selection, token management, and resilience to maintain competitive advantage.
Deep Dive
Why the agentic era is already hitting resource walls
On this trend line episode of You Can with AI, we are talking about the tokenomics of the agentic era.
Welcome back to You Can with AI. I am Nathaniel Whittemore, host of The AI Daily Brief and partner with KPMG on You Can with AI. Today we're experimenting with a new format that is meant to be a fast, conversational deep dive on one really important topic that's happening right now. Today we're discussing the tokenomics of the agentic era.
Basically, the fallout and implications of what happens when agents become viable at scale for the enterprise. We get into issues like resource management and token efficiency. And I'm joined for the conversation by KPMG's global head of AI and digital innovation, Steve Chase.
All right, Steve, welcome back to another You Can with AI. We're trying something a little bit different. We've conceptualized You Can with AI as a very practical, ground-level show that takes what's happening with AI, not just the news but what's actually happening inside companies, and tries to distill that into useful nuggets that people can take away. But one of the things we found is that there's so much happening so quickly that we wanted to try a format that could pair with the speed at which it's moving: these trending episodes, or trend line episodes. What we're going to do is dive into some topic that is very prominent in the news from the last couple of weeks or month, and double-click on it. For this first one: over the last few months, there was a broad recognition that agents had become very real, right? That recognition happened first among power users, then among non-power users who discovered Claude Cowork and Claude Code, and then it came to the enterprise. We've got all sorts of interesting evidence, including the most recent Pulse report from KPMG, that this is happening, and now we're all living in the implications of it. So I want to talk about what some of those implications are. You and I were just starting to talk about them, so let's dig into the new challenges that companies are finding as we transition to the agentic era.
>> You know what's funny? I was listening to our episode to get ready, to remind myself what we had actually covered, which was great, by the way. You had said that agents were absolutely right here and now, and we were talking a little bit about how, if you're not building those agents, you likely don't know some of the constraints. Then it sounds more like you're talking about the thing but not doing the thing, right? So I feel there's a new conversation occurring right now around resource constraints and token usage. Is there a case where an agent isn't the right answer because it's actually more expensive than a person? But then there's the question of how long that will be true, because ostensibly we're going to continue to follow this path where tokens cost less and less. And then there's maybe a countervailing view that says a group of things are quite in conflict with each other right now: training the next generation of these models, which is incredibly important, is consuming a tremendous amount of the same resources that my business potentially needs to build the agents that will run it in the next-generation enterprise. So I think we're in a really business-centric moment right now. What helps with prioritization is asking: what business outcome am I getting for the investment I'm putting into it and the cost of it, and how am I going to create both resilience around that and great outcomes for my customers or my employees or wherever I'm deploying those agents?
>> Yeah. So it's interesting. We have two sides of this conversation. One is the implications for businesses, for enterprises who are deploying this new set of capabilities. The second, inverse but obviously very related, set of implications is for the companies who are delivering the models that power these agents. I'd flip to that side for a moment, because it sets the context in which the enterprises have to play. For the first time, we are actually grappling with major resource constraints, whereas even the big companies like OpenAI could pretend that they weren't as recently as the end of last year. Yes, to some extent on the margins, they knew that if they spent all of their compute resources over here, they couldn't spend them over there. But OpenAI shut down Sora, and I don't even think that was because it wasn't working in some sense. The daily actives were still going up. It hadn't become some TikTok breakout, but it was a real, viable thing. They just couldn't dedicate that compute. It just wasn't worth it from a business standpoint, from a trajectory standpoint. And I'd never seen OpenAI make an actual constrained decision before that. And this is in a context where it's not a money thing. They raised another 122 billion. There are just actual limits to how much compute is available. And I think, to your point, there is a competition between compute for training and compute for inference. And the big new factor now is that, it turns out, when you have these superpowers that agents enable, you don't have people finishing the work that they used to do and looking down at their clock and realizing it's 3 p.m.
and they can check out early. Instead, they're looking down at their watch at 3:00 a.m., realizing that they want to stay up an extra hour because they're now into Q3's work and they just want to keep going, right? It is massively expansionary resource consumption. So basically you've got a situation where two things can be true at once. On the one hand, yes, inference costs come down dramatically. They're coming down 10x over the course of every year.
But if usage grows 100x when inference costs only come down 10x, you're still facing a very, very serious resource constraint. And that's now driving the moves that these companies are making. The other one that we should obviously talk about, in the context of what businesses have to deal with, is that more and more they're going to be eliminating the subsidy that people were getting from their subscription accounts, where you could be paying 30 bucks or 100 bucks or 200 bucks and be using $1,000 of tokens. Those days, it feels like, are coming to an end extremely quickly.
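The arithmetic behind that point can be sketched in a few lines. This is only an illustration using the round numbers quoted in the conversation (a 10x price drop, a hypothetical 100x usage growth, and a $200 plan consuming $1,000 of tokens); the function names are invented for this sketch.

```python
def spend_multiple(price_drop: float, usage_growth: float) -> float:
    """Next period's token spend as a multiple of today's spend.

    price_drop: factor by which the per-token price falls (10 means 10x cheaper).
    usage_growth: factor by which token consumption grows.
    """
    if price_drop <= 0:
        raise ValueError("price_drop must be positive")
    return usage_growth / price_drop


def subsidy_ratio(token_value_usd: float, plan_price_usd: float) -> float:
    """How many dollars of tokens are consumed per dollar of subscription paid."""
    if plan_price_usd <= 0:
        raise ValueError("plan_price_usd must be positive")
    return token_value_usd / plan_price_usd


# Prices fall 10x while usage grows 100x: total spend still rises 10x.
print(spend_multiple(price_drop=10, usage_growth=100))  # 10.0

# A $200 plan that burns $1,000 of tokens is subsidized 5-to-1.
print(subsidy_ratio(token_value_usd=1000, plan_price_usd=200))  # 5.0
```

Spend only stays flat when usage grows no faster than prices fall, which is why a 10x annual price decline does not by itself remove the constraint.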
>> Yeah, if I jump in there with how I think about that for the business: we've talked a lot about how many companies have really deployed this capability at scale, right? We know there's a group of market leaders that have, but there's also a group of companies coming up that are deploying. We can see more agent deployment, but that's not the same thing as the long-running, token-hungry agents that you were talking about, the guy at 3:00 a.m. trying to knock out that next block of capability or what have you.
As the tools become more capable, especially the coding tools, those are always a harbinger of what's to come.
That experience that individuals are having can start to occur with enterprises. And there have been more enterprise deals announced where people are bringing in, again, these highly capable, agentic, long-running, token-hungry solutions. I don't think we've seen that level of demand come into the space before quite like this, because, for what it's worth, the original waves coming from the enterprise were, I would say, reasonably low token users. So I think that's another thing coming: more pressure on the inference side, right? And you don't really know how those economics come together, what the supply and demand curves look like.
There's a lot written about that, if you just assume the two things press on each other. I also worry, almost the way it is with electricity, about peak demand being right in the middle of the workday. For an enterprise, that's a real issue. If the customer service that I've digitally deflected with my agents all of a sudden slows down right at that moment, then all of a sudden I've got a bunch of it cascading onto my humans, and I might not have as many of them, right?
So resilience becomes maybe as important, Nathaniel, and the way that you're going to build your processes for resilience, as the cost part of that would be.
>> Yeah, it certainly puts even more emphasis on the systems design need around agentic deployments. There's clearly a human capability need, in terms of training and how people use these things, but to your point, there's a real systems design need here as well, because none of the solutions are perfect. I mean, Sonnet 4.6 is great, but the reason that this is happening is Opus 4.6, not Sonnet 4.6. And it's very hard. We're not yet at the point where a lot of the cheaper models are good enough for many of the use cases that are getting people the most excited. Maybe that looks totally different in 6 months, but it's not right now. So it's not as simple as just switching to Sonnet. And frankly, from a redundancy standpoint, it's not even as simple as switching from 4.6 to GPT-5.4 if something goes down.
They have totally different behaviors in the same applications. As anyone running an OpenClaw who made a switch recently will tell you, they're really, really different experiences.
>> Yeah. Which is why it's really important that you've got your evals right. We would suggest to everyone: be multi-model and avoid lock-in in those situations, because you need the ability to move across models, and they all do different things. So you've got to have your evals right. You need to understand, for your critical workloads, what switching the model would require. As you say, you can't just think, oh, one is going to be just as good as the other. We have a robust set of evals that we are always tracking. You want to use the lower models when you can, when they're good enough for the thing you're doing, but you also want to know what they're not going to be good for. You need to know that to be able to do that systems thinking; it's a critical element of it. You can't buy that externally. You can't go to somebody else's evals. You have to know yours: the behavior, your stuff. We spend a lot of time with clients on this topic of how they're designing their environment, less about the individual agent and more about how they're designing the environment to be resilient in that way. In particular, think about which model am I using for what, and what's the eval? Because as the models get better, it's not so obvious that you can just drop a new model in there without entirely rethinking the eval.
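One way to picture the "use the lower model when it's good enough, and know when it isn't" idea is a small eval gate. The sketch below is an assumption-laden illustration, not a description of any speaker's actual tooling: the candidate models are stub functions, the costs and eval cases are made up, and exact-match scoring stands in for whatever a real eval would measure.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

EvalCase = Tuple[str, str]  # (prompt, expected answer)


class Candidate(NamedTuple):
    name: str                  # hypothetical model label
    cost_per_call: float       # illustrative relative cost
    run: Callable[[str], str]  # stand-in for a real model call


def pass_rate(run: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of eval cases the model answers exactly right."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for prompt, expected in cases if run(prompt).strip() == expected)
    return passed / len(cases)


def cheapest_passing(candidates: List[Candidate], cases: List[EvalCase],
                     threshold: float) -> Optional[str]:
    """Name of the cheapest candidate meeting the threshold, or None."""
    passing = [c for c in candidates if pass_rate(c.run, cases) >= threshold]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_call).name


# Stub models: the small one gets one of three workload questions wrong.
LARGE_ANSWERS = {"2+2": "4", "capital of France": "Paris", "refund window in days": "30"}
SMALL_ANSWERS = {**LARGE_ANSWERS, "refund window in days": "14"}
large = Candidate("large-model", 1.00, lambda p: LARGE_ANSWERS.get(p, ""))
small = Candidate("small-model", 0.10, lambda p: SMALL_ANSWERS.get(p, ""))
CASES = list(LARGE_ANSWERS.items())

print(cheapest_passing([large, small], CASES, threshold=0.9))  # large-model
print(cheapest_passing([large, small], CASES, threshold=0.6))  # small-model
```

The eval cases here belong to the workload, not to the model vendor, which mirrors the point that an organization has to know its own evals before deciding what a model switch would require.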
That's a lesson people learned a while back. This is going to cause us to maybe relearn it a couple of times. And then the other thing is resource management, which is always critically important. Most people aren't great yet at resource-managing tokens, or at knowing where that usage might be occurring, and at the visibility. We talked about where the systems are that can actually show you all that. Being able to turn somebody off who's doing something that, in the enterprise, is slowing you down. It's going to feel almost like a denial-of-service attack from inside. It's too much token use. Leaders are already doing that, and needing to do it, because frankly mistakes can happen that cause a lot of token use. So being able to see it and act on it, and to do that automatically, is very important.
>> Well, and to add another layer of complication on top of this, you have some of the most sophisticated organizations in the world, or at least some of the most technologically sophisticated, who are preaching a gospel of going in the exact opposite direction. I just got done recording an episode about how Meta now has a leaderboard set up for token maxing, for who can consume the most tokens. Their bet is effectively that it's hard enough to consume tokens that, even if you've got a bunch of people running parallel agents doing the same thing in order to win on that conspicuous-consumption board, the net benefit to the organization is better. But try to put that inside the average organization. I think the challenge then is that getting good at resource management requires more than just cutting off the heads of the top users because they're using the most. Using the most isn't, a priori, a bad thing.
The issues come when someone doesn't realize that they left some process running with their kind of sanguine version of OpenClaw that they built, and it accidentally just chunked off $10,000 worth of tokens in the last week or something like that.
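A minimal sketch of the "see it and act on it automatically" idea follows, assuming a per-agent daily dollar budget with an alert threshold. The class name, thresholds, and agent IDs are hypothetical, and a real system would read spend from provider billing or gateway logs rather than from manual calls.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenBudgetMonitor:
    """Tracks per-agent spend and flags runaway processes (illustrative only)."""
    daily_limit_usd: float
    alert_fraction: float = 0.8  # assumed: warn at 80% of the limit
    spend: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def record(self, agent_id: str, cost_usd: float) -> str:
        """Add a cost event and return 'ok', 'alert', or 'suspend'."""
        self.spend[agent_id] += cost_usd
        total = self.spend[agent_id]
        if total >= self.daily_limit_usd:
            return "suspend"  # caller would pause the agent here
        if total >= self.alert_fraction * self.daily_limit_usd:
            return "alert"    # caller would notify the owner here
        return "ok"


monitor = TokenBudgetMonitor(daily_limit_usd=100.0)
print(monitor.record("nightly-report-agent", 50.0))  # ok
print(monitor.record("nightly-report-agent", 35.0))  # alert
print(monitor.record("nightly-report-agent", 20.0))  # suspend
print(monitor.record("support-agent", 10.0))         # ok
```

Because the limit is set per agent rather than ranked across users, a heavy but intentional user can simply be given a larger budget, while a forgotten process still trips the threshold.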
>> Yeah. It actually harkens back to the early days of this. We wanted people using AI. We were tracking people's usage, right? We could demonstrate that the higher users of AI were generally performing better. So it's the same thing, right? You need to be consuming tokens, but then right behind that you do the effective-use version of that. Now, I'm not sure exactly what that measure of effective use of tokens is.
It'll be some sort of outcome measure.
Clearly that'll be what we move to after we get past the blunt instrument of just "use a lot," right? Because that's also super easy to game, like any of those per-consumption measures. I came across a really interesting thing, if you get to the question of where all this is going. I was talking to some politicians who were talking about universal basic income, or about what one of the ways is that countries are going to get involved in this, and they were talking about token taxes, right?
That might need to be one of the ways that we pay for whatever the societal impacts of these things are. So I thought it was fascinating that they were going to that point. I believe OpenAI just put out their version of a concept around that. I don't think that was exactly what they were saying; I think they were talking about agents more specifically. But anyway, interesting times right now, Nathaniel.
>> So maybe as we close out: in the last conversation we had, one of the key themes was the increasing gap between leaders and laggards.
>> Yeah.
>> And right now, when it comes to agentic usage and this sort of systems thinking, I think it's still very early. Even if you are a leader, we're all still fumbling around in the dark. But with that being said, and for the sake of trying to have some amount of actionability in these conversations, are there any early patterns that you think you're seeing among organizations you consider leaders, in terms of how they're approaching this new set of problems? Even if it's as simple as recognizing these as the new problems they have to address.
>> Yeah. Well, one in particular is that when they build something, they have a sense of how much it will cost, depending on how they solution it. You can actually tell how sophisticated someone is if they say, "I'm solving a $50 million problem with a $20 million solution; that's what I think it's going to take to run this thing. But I have another way to do it, which is a $4 million answer." Someone who actually has a point of view on that. That's the systems thinking that you were talking about.
That's one. Two, if I leave off the coding, specifically how people are running their coding areas, and look more at how they're thinking about bringing AI and agents into business processes: a legitimate trade-off conversation about what the outcome is. When do I want to get to that thing? How do I organize and prioritize around the things that are most important to me? And if they are my most important, how am I building resilience into them? So at least they've included the resilience point, and they've thought about what would happen if I lose access to this. How will I then accomplish the same outcome? What are my backup plans? So those are just a couple of them. And then they track whether it's occurring that way and use that insight in their forecasting models.
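The cost comparison and the forecast-tracking loop described above can be illustrated with a short sketch. The dollar figures are the ones quoted in the conversation; the option names, the idea of treating them as comparable run costs, and the forecast and actual values in the usage lines are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    name: str            # hypothetical label for a solution design
    run_cost_usd: float  # estimated cost to run the solution


def net_value(problem_value_usd: float, option: Option) -> float:
    """Value of solving the problem minus what the solution costs to run."""
    return problem_value_usd - option.run_cost_usd


def best_option(problem_value_usd: float, options: List[Option]) -> Option:
    """Pick the option that leaves the most value after its run cost."""
    return max(options, key=lambda o: net_value(problem_value_usd, o))


def forecast_error(forecast_usd: float, actual_usd: float) -> float:
    """Relative miss of the cost forecast (0.25 means 25% over forecast)."""
    return (actual_usd - forecast_usd) / forecast_usd


PROBLEM_VALUE = 50_000_000
options = [Option("first design", 20_000_000), Option("leaner design", 4_000_000)]

choice = best_option(PROBLEM_VALUE, options)
print(choice.name, net_value(PROBLEM_VALUE, choice))  # leaner design 46000000

# Assumed actuals: if the $4M design really costs $5M, feed the 25% miss
# back into the next forecast.
print(forecast_error(forecast_usd=4_000_000, actual_usd=5_000_000))  # 0.25
```

A real comparison would also weigh quality and resilience, not only cost, which is why the conversation pairs the cost estimate with a backup plan.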
>> No, I think that resonates. I mean, even just from the standpoint of things that I'm trying to do, there are not really good systems right now for tracking token consumption on any sort of granular basis, which means everyone's got to spin up their own things a little bit for the moment.
>> But boy, is something better than nothing, even if it's just some automated alert that yells at you when your Anthropic account is overdrawn again.
>> Well, you definitely don't want to give your Claudebot access to do an auto refill and maximize on that.
>> Yeah. Yeah. All right, Steve. This is super interesting. I think it's going to be very interesting to see over the next few months how companies manage some of these transitions. I think some of them will be solved, I imagine, by time, right? If we get a Mythos model and a Spud model that make 5.4 and 4.6 look silly, there are many more use cases that can have a lower-cost-to-serve model. But regardless of that, the organizations that do build these types of systems now are still going to be significantly advantaged relative to their peers going forward.
>> You know, Nathaniel, if I end on where I think it goes, right?
Like, I think we will start to see more responsiveness toward token efficiency. I would guess the more token-efficient model will end up being a winner. It'll be part of the characteristics of the benchmarks that we see, how many tokens were used, and we'll put more pressure on that particular one. I also think, and you've mentioned this in your own episodes, there's the opportunity around wrapper companies: for people who are wrapping to really look into how they can also orchestrate, find the lower-cost models, and stuff like that. For anybody who's building AI- or agent-integrated applications, that's going to be a critical success factor. So we're going to watch this massive exponential growth in demand and how the market reacts to it.
Typically, it won't be in a straight line, right? Some new thing is going to show up, as you said. That's what I'm really interested in: what's that maybe slightly orthogonal step that's going to make us say, "Oh, yeah. Well, that just completely changed things." Like, for example, when we saw some of what was going on in China.
>> Fascinating stuff. All right, Steve.
Great to chat again. Excited to see how this changes in a month or two.
>> Thanks, Nathaniel.