This massive GPU expansion highlights a brute-force approach to intelligence that prioritizes hardware dominance over fundamental algorithmic efficiency. It is a masterclass in infrastructure scaling that underscores the staggering capital requirements of the Agentic AI era.
Deep Dive
2026 ZKast #82 - AWS at GTC 2026: Scaling to 3 Million NVIDIA GPUs & The Future of Agentic AI
Welcome to ZKast, everyone. I'm Zeus Kerravala from ZK Research. I'm here at GTC 2026 in San Jose, at the AWS stand in the really busy expo hall. The AWS stand's pretty busy, too. I think it's all these robots we've got behind me.
>> Yeah, very dynamic.
>> Mostly for the robots, not for us.
>> Yeah. Well, just watch out. I've seen the movies. So I'm with Karthik Vetto and Rachel Jiang, both from AWS. Before we start talking about AI and AWS, a quick intro on what you do. Rachel, let's start with you.
>> Yeah, so hi everyone, this is Rachel, and I lead product marketing for AI infrastructure at AWS.
>> I'm Karthik. I'm head of product for the EC2 instance portfolio.
>> Okay. I know it's a busy show, the busiest GTC that I've been to. Any initial thoughts from the show?
>> Well, I've been coming here for almost three years running now, and every year I feel like it's getting bigger, obviously a manifestation of the gen AI interest in the market in general. So we're excited about partnering with NVIDIA and delivering the product for our customers.
>> Yeah. In fact, Rachel, during his keynote, Jensen Huang, the CEO of NVIDIA, specifically called out AWS and said they've been working with you the longest. I believe there are more NVIDIA workloads running on the AWS cloud than anywhere else. So what does that mean to AWS?
>> Yeah, so it's over 15 years of partnership. We really sweat every detail from an engineering perspective to make the system work for our joint customers. Why it really matters now is that we see customers progress from pilot to production for their AI workloads. It's not just about running a single workload on a GPU instance. It's more about the things around it: being able to run the system reliably at scale, how you control cost, how you build the observability and management around it, and the governance of the data, etc. With the NVIDIA partnership, AWS has done a great deal over the past 15 years to make it really good for customers, and we're really good at it because we've been running infrastructure at scale for over 20 years now.
>> Yeah. And Karthik, one of the data points they gave, and I want to make sure I get it right: they announced the deployment of over 1 million NVIDIA GPUs. So can you put that scale in perspective for the average enterprise? What does that mean?
>> That's right. We just announced that we're planning to add 1 million GPUs this year. Just to put that in perspective, as Rachel was saying, we were the first cloud provider to bring GPUs to customers, in 2010, before all of this was such a big market. In the 15 years from 2010 to 2025, our footprint of GPUs has grown to 2 million GPUs, and just this year we're going to bump it up to 3 million by the end of the year, which is about 50% growth compared to the existing footprint. A couple of things there: it shows how much demand there is from customers for these hardware platforms on AWS, and it's also a testament to AWS being able to scale up and add infrastructure at a massive and rapid pace. In fact, we have been doing that consistently for 20 years across infrastructure, and we're bringing that skill set to the more complex ML infrastructure.
>> Yeah. And I think initially the partnership was about just providing fast, low-cost access to GPUs, but now you're really focused on co-engineering the stack. Correct?
>> Right.
>> So yeah, our partnership with NVIDIA is very deep. In addition to just bringing their GPUs to the cloud, we've been partnering up and down the full stack, all the way from the Nemotron models on Bedrock, to optimizations on EC2 working closely with our NVIDIA engineering partners, and all the way to NVIDIA as a customer of ours with Project Ceiba, where we have delivered Grace Blackwell clusters at massive scale for them so that they could offer training and inference for their customers.
>> Yeah, the robots calmed down a little bit. So that's good. Now, AWS is the first to support the RTX Pro 4500 Blackwell Server Edition, right? So can you explain how the Nitro System changes the profile for GPUs compared to on-prem?
>> Yeah, absolutely. I don't know if you want to talk about it; I can talk about it. So the Nitro System is the foundation for our EC2 instance infrastructure. It offers three or four big benefits for our customers. One is the security aspect. We offload the management of the server to a dedicated chip sitting on the server, versus running it on the host CPU that the customer shares. So the hardware isolation is an additional layer of security for customers on all EC2 instances, CPUs or GPUs.
>> It also improves performance for customers. By offloading all of the server management to a dedicated chip, we can offer all the host resources to the customer. And last but not least, it helps us deliver new instance variants pretty fast, because it's a modular design. That's where I think the G7 instances come in as well. We recently launched the G7e with the RTX 6000, and quickly following that with the RTX Pro 4500 G7 instance shows the pace at which we can launch new instances and scale infrastructure, underpinned by the Nitro System.
>> Yeah.
>> And performance, or sustainability, is a big theme here. Performance per watt, right? And so the G7e helps with that as well, right?
>> That's right. All the new-generation GPUs, and also the investments we make on the networking infrastructure side, help customers deliver more performance per watt.
>> And with the G7, I think we're seeing about 2x improvements in performance per watt.
>> Oh, that's pretty significant.
>> So that's significant in terms of sustainability and also, obviously, in price performance too.
>> Yeah. Now, Rachel, obviously there's a big data bottleneck, right? A lot of projects fail because of data. I believe the data point was a 3x speedup for Apache Spark on EMR. So how are you doing that, and for those data scientists who are stuck, how do you help them get unstuck?
>> Yeah, I mean,
>> There's a lot of them that are stuck, right?
>> Coming from an analytics background, this is really near and dear to my heart. As data scientists and data engineers iterate on their AI/ML models, and as they report on their daily sales operations, for example, we see a lot of our customers spending hours waiting for a data pipeline to complete. That's the problem we're trying to solve. With this new architecture of Apache Spark running on NVIDIA cuDF on the latest G7 instances on Amazon EC2, we're seeing that customers are able to get 3x faster performance on this new infrastructure compared to CPU.
>> Yeah. And data is sort of a new workload where we're enabling benefits for customers with the G7 RTX Pro 4500, which I guess Jensen was also talking about in his keynote.
>> Yeah.
>> Yeah.
>> Yeah. Another theme, obviously, is agentic, right? Last year there was a lot of generative AI; now it's agentic AI. As companies move from chatbots to agents that actually complete tasks, what are some of the infrastructure considerations they need to think about?
>> Yeah, so with agentic AI, the model needs to keep reasoning and delivering output based on that reasoning, in sequential order, and that adds a lot of high-bandwidth memory requirements for the underlying infrastructure. That's why, with the announcement about a week ago, we're combining the power of Trainium and the power of Cerebras to deliver the fastest inference for those types of workloads. We're letting each chip do what it's best at: Trainium for the prefill phase of inference, which is understanding the prompt, and the Cerebras chip for the decode phase, which is generating the tokens and delivering the response to a prompt.
>> If I could just add one point on that: ultimately the goal for agentic adoption is bringing down the dollar per token, right? What we consistently hear from customers is that they have all these use cases they want to enable with agentic AI workflows, but the cost is still high today for inference, and that's part of the goal.
>> At the event there was a lot of chatter about Nemotron 3, right? And you're able to run that on Bedrock now. What's the advantage of running that on Bedrock?
>> So it's all about giving customers the model choice they'll need to run their applications in production. What we hear from customers is that when they run an application, they usually need more than one model. That's why Bedrock is that unified platform where customers can pick the model they want to run their workloads, and then it's of course optimized by the underlying infrastructure that we built across the full stack. So adding Nemotron 3 Super to that mix is that commitment to delivering the best choice, the broadest choice, for our customers.
>> Yeah. And I also noticed in the press release you had, or was it a blog?
>> It was a blog.
>> Yeah. There was a feature coming soon: reinforcement fine-tuning, right?
>> Right.
>> And so, Karthik, talk about how that's used, especially in the regulated industries, to help really refine what they're trying to do from an AI perspective versus just off-the-shelf large language models.
>> Yeah. Nowadays it's about more than just building the largest frontier models, right? It's all about how you can customize the models to fit your needs, leveraging domain-specific knowledge, whether it's finance, healthcare and life sciences, or retail, to make your model more accurate and more relevant for your business context. That's where reinforcement fine-tuning is really helpful, and that's available on Bedrock as well.
>> Okay. Now we're entering an era where we are starting to see more production deployments, and that's bringing security and trust into focus. In fact, I was in Davos earlier this year, at the NRF show, at Mobile World Congress, and
>> I'd love to be there.
>> Yeah.
>> Didn't go there.
>> Trust came up as part of all that. Now, I know within the AWS framework, as AWS tells customers, even AWS employees can't see the data, right? So explain that for your customers as we try to build these confidential computing layers into our AI initiatives.
>> Yeah. So again, back to the Nitro System story, right? All the modern EC2 instances, including the GPU-based instances, are built on the Nitro System. A key benefit is that Nitro really reinforces workload isolation, so that nobody, including AWS employees, can access your sensitive data and AI workloads. That's what makes running GPUs on AWS more secure than on other clouds. Another key differentiation for Nitro is what we internally call live updates. AWS can deliver bug fixes and system upgrades to the Nitro System without taking the system down, which means customers' workloads can continue to run while that upgrade is happening. That's really important, especially for data scientists training a large model: if the system goes down, that workload is interrupted, and it takes hours to bring that workflow back up.
>> Yeah. Now, I'm not sure who this question is right for, but GPUs change so fast today, architectures change so fast. So if you're a CTO, a data scientist, or someone who sponsors an AI initiative, how do you build an architecture that meets your needs today but is flexible enough that, as the next version of Rubin comes or the next version of Blackwell comes, you can still take advantage of that technology without having to burn down the house and start over?
>> Yeah. Broadly speaking, I think the architectures are fairly consistent, at least if you take the GPU world. What I would recommend business leaders look at is this: they have a vast range of use cases, agentic use cases, in a company, and they might need different types of GPUs across these different workloads. So it's about identifying where you need the latest and greatest GPUs, which workloads would benefit from that, and which ones are okay staying on current-generation and older-generation GPUs. We're still seeing strong demand for our current and older generation GPUs as well, because in most cases they are good enough for many use cases on the ML side. So doing that identification work is what I would recommend, given the broader use cases with different latency profile requirements, different dollar-per-token requirements, all of that.
>> Yeah. And we already deliver the broadest selection of GPU options for customers today.
>> Make your own, right?
>> Yeah. And so that's what we're continuing to do with our integration with the latest NVIDIA platforms.
>> Okay. So let's wrap up with a little bit of a rapid lightning round. Quick questions, quick answers. Okay, can I ask you both? Yes. So which GPU architecture is cooler, Rubin or Blackwell?
>> GPU architecture. Okay, I guess you're not asking about the names of the scientists. I love all the scientists: Grace, Blackwell and Rubin. Well, I have to pick the latest and greatest product, because customers always love the new GPUs, better performance, better price performance and all of that. So I have to go with Rubin.
>> Can I pick both?
>> Okay, you pick both. You can pick both.
Okay. And what's the biggest myth about AI in production? Cost, security, or complexity?
>> That's a tricky one.
>> They're all right.
>> I would say they're all, but it probably depends upon the customer and what type of workload or use case they're trying to implement. The biggest myth, I would say, is probably still complexity.
>> Yes, I would think so. And you can help a lot with that.
>> Yeah, exactly. And I think we're seeing a lot of that, going back to the point of POC to production, right? One of the major challenges for customers is that they're doing a lot of POCs, but many of them are not making it to production, and that's where someone like AWS can help customers with our expertise.
>> Yeah, I would say it's data governance.
>> Data governance. That's a good question.
>> Yeah.
>> Okay. Now here's a good one. Your favorite AWS hidden gem service that all NVIDIA customers should know about.
>> Wow. We have so many of them.
>> Is Nitro even a hidden gem now? I think that's important. Then we talk about
>> I think I have one, maybe. We announced a new service called Nova Forge at re:Invent last year.
>> So, going back to your fine-tuning use case, customers can use Nova Forge to optimize and fine-tune their Nova models, which are proprietary models offering lower dollar per token than comparable models, for your use cases, whether it's finance or healthcare, and get better accuracy and better efficiency out of the model.
>> All right. And what's one way AI has made your lives easier?
>> Maybe you want to go first.
>> Really, to understand new AI solutions better and faster, right? Because, for example,
>> AI helps you learn AI.
>> Yes.
>> Yes.
>> So I use Amazon Quick Suite all the time. It's available on my phone. For example, during Jensen's keynote, I was on my way here and I looked up all the new announcements and how they relate to different partner solutions. So that really helps.
>> Well, in my case, staying on top of my emails has been a challenge for me. I use Quick as well, and it has very good functionality where I can tell it, hey, go check an email from so-and-so, see what they're asking about, and then come back with a reply.
>> Yeah.
>> And I can just proofread it, make some edits, and send it out.
>> All right. Last question. If you had to describe GTC 2026 in the form of an emoji, what emoji would it be?
>> Energizing. Is that an emoji?
>> I think so. A lightning bolt.
>> Lightning bolt.
>> Yeah, as a guy whose name is Zeus, I'd go with the lightning bolt. And you?
>> I like the thumbs up.
>> Double thumbs up.
>> Double thumbs up.
>> All right. Anyway, thanks for your time.
>> Of course. You know, from the really busy AWS stand here at GTC, this is day three, too. I can't imagine that.
>> I'm Zeus Kerravala from ZK Research.
Thanks for watching. Give us a like and also hit that subscribe button. We'll see you next time. No other person next.
>> Nice. Thank you for wrapping up.
>> Thank you.