How Google's 8th Generation TPUs Power the Agentic Era
Added: 2026-05-05

198 views · 32 · 11:08 · googlecloud · Original version: 2026-04-22

Google’s TPU8 architecture marks a pivotal shift from brute-force scaling to latency-optimized design, addressing the critical bottlenecks of real-time agentic reasoning. This vertical integration remains Google’s most formidable moat in the escalating AI infrastructure race.

[00:00:00][TYPING] AMIN VAHDAT: Over the last few years, we have all seen the evolution of AI.

[00:00:06]We have moved from generating text and images to complex, multistep reasoning tasks.

[00:00:12]Models have evolved to keep up with the changing demands, and the infrastructure needs to evolve as well.

[00:00:18]Recognizing that the infrastructure requirements for pre-training, post-training, and real-time serving have radically diverged, I'm thrilled to announce that Google will be launching two eighth-generation TPUs, TPU 8t and TPU 8i.

[00:00:33]These next-gen TPUs will power Google's own AI infrastructure needs to build the best models, like Gemini, and these will also be available for our Google Cloud customers by the end of the year.

[00:00:43]Model sizes are growing exponentially.

[00:00:45]The new frontier models will reach trillions of parameters.

[00:00:49]To build these frontier models of tomorrow, we need infrastructure that can handle megascale pre-training without being stalled by data bottlenecks.

[00:00:57]That is why we built TPU 8t, the large-scale training powerhouse successor to the seventh-generation Ironwood.

[00:01:04]TPU 8t is optimized to reduce training time for trillion-parameter frontier models, delivering a staggering 121 exaflops of native FP4 compute and 2 petabytes of shared HBM within a single 9,600-chip superpod, a 3x increase in peak performance per superpod over the previous generation.
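Dividing the superpod figures quoted above by the chip count gives rough per-chip numbers. This is a minimal arithmetic sketch, assuming decimal SI units (1 exaflop = 10^18 FLOP/s, 1 PB = 10^15 bytes) and an even split across the 9,600 chips; the talk itself does not quote per-chip specifications.

```python
# Rough per-chip figures derived from the TPU 8t superpod numbers in the talk.
# Assumptions: decimal SI units and an even split across chips; the talk
# quotes only superpod-wide totals.

SUPERPOD_CHIPS = 9_600
SUPERPOD_FP4_FLOPS = 121e18   # 121 exaflops of native FP4 compute
SUPERPOD_HBM_BYTES = 2e15     # 2 petabytes of shared HBM

def per_chip(total: float, chips: int = SUPERPOD_CHIPS) -> float:
    """Divide a superpod-wide figure evenly across its chips."""
    return total / chips

fp4_per_chip = per_chip(SUPERPOD_FP4_FLOPS)   # about 12.6 PFLOP/s per chip
hbm_per_chip = per_chip(SUPERPOD_HBM_BYTES)   # about 208 GB per chip

if __name__ == "__main__":
    print(f"FP4 per chip: {fp4_per_chip / 1e15:.1f} PFLOP/s")
    print(f"HBM per chip: {hbm_per_chip / 1e9:.0f} GB")
```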

[00:01:26]With 2x ICI scale-up bandwidth and 4 times raw scale-out DCN bandwidth compared to the previous generation, TPU 8t drastically reduces data bottlenecks.

[00:01:37]To further accelerate the development of frontier models, we are scaling distributed training beyond a single superpod.

[00:01:43]Utilizing Pathways in conjunction with the JAX development framework, we can now scale to over 1 million TPU chips in a single training cluster.

[00:01:52]Within the cluster, up to 134,000 chips have nonblocking communication, delivering 1.6 million exaflops with near-linear scaling performance using Virgo, our brand-new, state-of-the-art networking fabric.

[00:02:06]To effectively scale data-heavy workloads, we've eliminated the training delays caused by data ingestion bottlenecks.

[00:02:12]By integrating Managed Lustre, now with an industry-leading 10 terabytes per second of throughput, and TPU Direct, we deliver 10x faster storage bandwidth and direct-to-chip RDMA access.

[00:02:24]TPU 8t redefines performance capability by moving block-scale multiplication directly inside the MXUs, eliminating VPU overhead and enabling higher-quality, low-precision flops through native in-MXU quantization.

[00:02:39]By supporting smaller block sizes, TPU 8t maintains superior model quality while staying perfectly balanced across all arithmetic and network intensities.
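Block-scaled low-precision formats give each small block of values its own shared scale, which is why smaller blocks help preserve model quality. The sketch below illustrates the idea under stated assumptions and is not TPU 8t's actual number format: it uses FP4 E2M1 elements and a real-valued per-block scale, whereas published formats such as OCP MXFP4 restrict the shared scale to a power of two. All helper names are hypothetical.

```python
# Illustrative block-scaled FP4 quantization (hypothetical helpers).
# Assumptions: FP4 E2M1 element values; one real-valued scale per block,
# chosen so the block's largest magnitude maps to the FP4 maximum of 6.0.

FP4_E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0

def quantize_fp4(x: float) -> float:
    """Round x to the nearest signed E2M1 value (saturates at +/-6, ties go low)."""
    mag = min(FP4_E2M1_MAGNITUDES, key=lambda v: abs(abs(x) - v))
    return mag if x >= 0 else -mag

def block_quantize(values: list[float], block_size: int) -> list[float]:
    """Quantize then dequantize values, sharing one scale per block."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / FP4_MAX if amax > 0 else 1.0
        out.extend(quantize_fp4(v / scale) * scale for v in block)
    return out

def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between original and dequantized values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    data = [0.1, 0.1, 6.0, 6.0]  # small values next to large outliers
    for bs in (4, 2):
        print(f"block size {bs}: MSE {mse(data, block_quantize(data, bs)):.6f}")
```

With a block size of 4, the two 0.1 entries share a scale with the 6.0 outliers and round to zero; with a block size of 2 they get their own scale and survive almost exactly.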

[00:02:50]This architectural harmony minimizes exposed vector ops time through balanced VPU scaling and finer-grained offloads to secondary threads.

[00:02:59]Ultimately, by unburdening the primary Tensor Core and easing pressure on collective communication overlap, TPU 8t allows for more flexible partitioning strategies that push the absolute limits of model flops utilization at massive scale.

[00:03:14]Whether you are training a multitrillion-parameter model or performing bulk inference, TPU 8t provides the raw arithmetic density and bisection bandwidth needed to push the boundaries of AI pre-training.

[00:03:26]As we enter the agentic era, the industry is hitting what we call the latency wall, where traditional architectures struggle with the real-time demands of autoregressive decoding and complex chain-of-thought reasoning models.

[00:03:40]That is why, alongside TPU 8t, I am also thrilled to announce a second eighth-gen TPU that directly addresses this issue.

[00:03:48]TPU 8i is our specialized post-training and inference engine.

[00:03:53]While it remains highly capable of training any state-of-the-art frontier model, we are optimizing its core features to be the best reinforcement learning and serving infrastructure for the next generation of reasoning models.

[00:04:05]A defining breakthrough for TPU 8i lies in its ICI networking architecture, which pioneers the Boardfly topology for TPUs.

[00:04:14]Moving beyond our standard 3D torus mesh, this hierarchical Boardfly design maintains a compact network diameter to reduce tail latency for the complex all-to-all collective communications that drive the next generation of Mixture-of-Experts (MoE) models.

[00:04:29]By shortening the network diameter needed for all-to-all communication, the very heartbeat of MoE and reasoning models, Boardfly achieves up to a 50% improvement in latency for communication-intensive workloads.
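Network diameter is the worst-case number of hops between two nodes, and it bounds how far the slowest message in an all-to-all exchange has to travel. The talk gives no internal details of Boardfly, so this sketch compares a 3D torus with a generic dragonfly-style stand-in of the same size (fully connected groups, one global link between each pair of groups). The second graph is a hypothetical illustration, not Google's topology.

```python
# Compare hop diameters: 3D torus vs. a hypothetical dragonfly-style graph.
from collections import deque
from itertools import product

def torus_3d(k: int) -> dict:
    """Adjacency of a k x k x k torus with wraparound links in every dimension."""
    adj = {}
    for x, y, z in product(range(k), repeat=3):
        nbrs = set()
        for dim in range(3):
            for step in (-1, 1):
                p = [x, y, z]
                p[dim] = (p[dim] + step) % k
                nbrs.add(tuple(p))
        adj[(x, y, z)] = nbrs
    return adj

def dragonfly_like(groups: int, per_group: int) -> dict:
    """Each group is a clique; router b of group a links to router a of group b."""
    adj = {(g, r): set() for g in range(groups) for r in range(per_group)}
    for g in range(groups):
        for r1, r2 in product(range(per_group), repeat=2):
            if r1 != r2:
                adj[(g, r1)].add((g, r2))
    for a, b in product(range(groups), repeat=2):
        if a != b:
            adj[(a, b % per_group)].add((b, a % per_group))
    return adj

def diameter(adj: dict) -> int:
    """Longest shortest-path hop count, found by BFS from every node."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

if __name__ == "__main__":
    print("4x4x4 torus (64 nodes):", diameter(torus_3d(4)))
    print("8 groups x 8 (64 nodes):", diameter(dragonfly_like(8, 8)))
```

Both graphs have 64 nodes, but the torus needs up to 6 hops while the hierarchical graph needs at most 3 (local, global, local).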

[00:04:41]TPU 8i also breaks the memory wall with 3x more on-chip SRAM than the previous generation to host the larger KV cache entirely on silicon, significantly reducing the idle time of the cores during long-context decoding.
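The KV cache grows linearly with context length, which is why on-chip capacity matters for long-context decoding. A back-of-the-envelope sketch; every model dimension here is hypothetical, and the talk gives neither model shapes nor TPU 8i SRAM sizes.

```python
# KV-cache size estimate for a decoder-only transformer.
# All model dimensions used below are hypothetical examples.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_value: int = 2) -> int:
    """Keys and values (factor 2) stored for every layer, KV head, and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_value

if __name__ == "__main__":
    # Hypothetical 32-layer model, 8 KV heads of width 128, 2-byte values.
    for seq_len in (8_192, 32_768, 131_072):
        size = kv_cache_bytes(32, 8, 128, seq_len)
        print(f"{seq_len:>7} tokens: {size / 2**30:.1f} GiB")
```

For this hypothetical model, an 8,192-token context needs exactly 1 GiB of keys and values, and every doubling of the context doubles the cache.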

[00:04:55]By integrating a specialized collectives acceleration engine, TPU 8i further reduces the on-chip latency of collectives by 5x.

[00:05:04]Lower latency per collective operation means less time spent waiting, directly contributing to the higher throughput required to run millions of concurrent agents instantly.
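A toy latency model makes the "less time spent waiting" argument concrete. It assumes compute and collectives do not overlap, and every timing value is a hypothetical placeholder; only the 5x reduction in on-chip collective latency comes from the talk.

```python
# Toy decode-step model: compute time plus serialized collective time.
# Assumption: no overlap between compute and communication; all timings
# are hypothetical placeholders.

def step_time_us(compute_us: float, n_collectives: int,
                 on_chip_us: float, network_us: float) -> float:
    """Step latency when each collective pays on-chip and network latency."""
    return compute_us + n_collectives * (on_chip_us + network_us)

def tokens_per_second(step_us: float, batch: int) -> float:
    """Tokens generated per second when each step emits one token per sequence."""
    return batch * 1e6 / step_us

if __name__ == "__main__":
    base = step_time_us(compute_us=2000, n_collectives=60, on_chip_us=10, network_us=15)
    fast = step_time_us(compute_us=2000, n_collectives=60, on_chip_us=10 / 5, network_us=15)
    gain = tokens_per_second(fast, 64) / tokens_per_second(base, 64)
    print(f"{base:.0f} us -> {fast:.0f} us per step, {gain:.2f}x throughput")
```

With these placeholder numbers the step time falls from 3,500 to 3,020 microseconds, roughly a 16% throughput gain; the benefit grows as collectives take a larger share of each step.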

[00:05:14]The result is a system that delivers 80% better performance per dollar for low latency serving compared to previous generations.

[00:05:21]TPU 8i doesn't just deliver speed, it delivers ultra-low latency and economic efficiency for real-time reasoning at scale.

[00:05:30]JEFF DEAN: The pace of innovation is breathtaking, and at Google DeepMind, we're at the very heart of pushing the boundaries of what's possible.

[00:05:37]But progress in AI isn't just about algorithms and ideas, it's fundamentally also powered by computation.

[00:05:43]I'm thrilled about the prospects of the incredible progress we will make with TPU 8t and TPU 8i.

[00:05:49]For over a decade, Google has been on a journey to build the world's most powerful AI accelerators-- our Tensor Processing Units, or TPUs.

[00:05:58]These engines are behind the AI breakthroughs coming out of Google and Google DeepMind.

[00:06:03]Think of AlphaFold.

[00:06:05]Running on TPUs, we predicted the 3D structure of nearly all known proteins, a monumental achievement for science recognized by a Nobel Prize.

[00:06:14]From the early days of AlphaGo mastering complex games to our most advanced Gemini models today, each generation of TPU has allowed us to tackle problems once thought intractable.

[00:06:26]But the most profound shift happened when AI looked inward.

[00:06:29]Our colleagues introduced AlphaEvolve, a coding agent designed to discover new mathematical truths.

[00:06:35]It did what humans couldn't for 56 years, improving on Strassen's algorithm for matrix multiplication in mere days.
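For reference, Strassen's 1969 algorithm multiplies 2x2 (block) matrices with 7 multiplications instead of 8; applied recursively to 4x4 matrices it uses 49 scalar multiplications, the baseline that DeepMind's published AlphaEvolve result improved to 48 for 4x4 complex-valued matrices. The sketch below implements the classical Strassen recursion with a multiplication counter. It is not AlphaEvolve's algorithm, and it assumes square matrices whose size is a power of two.

```python
# Classical Strassen recursion with a scalar-multiplication counter.
# Assumes square matrices whose side length is a power of two.

Matrix = list[list[float]]

def _add(X: Matrix, Y: Matrix) -> Matrix:
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _sub(X: Matrix, Y: Matrix) -> Matrix:
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def _split(M: Matrix):
    """Return the four quadrants (11, 12, 21, 22) of M."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A: Matrix, B: Matrix, counter: list[int]) -> Matrix:
    if len(A) == 1:
        counter[0] += 1  # one scalar multiplication
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = _split(A)
    b11, b12, b21, b22 = _split(B)
    # Seven block products instead of the naive eight.
    m1 = strassen(_add(a11, a22), _add(b11, b22), counter)
    m2 = strassen(_add(a21, a22), b11, counter)
    m3 = strassen(a11, _sub(b12, b22), counter)
    m4 = strassen(a22, _sub(b21, b11), counter)
    m5 = strassen(_add(a11, a12), b22, counter)
    m6 = strassen(_sub(a21, a11), _add(b11, b12), counter)
    m7 = strassen(_sub(a12, a22), _add(b21, b22), counter)
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom

def strassen_with_count(A: Matrix, B: Matrix) -> tuple[Matrix, int]:
    """Multiply A and B; also return the number of scalar multiplications."""
    counter = [0]
    return strassen(A, B, counter), counter[0]

def naive(A: Matrix, B: Matrix) -> Matrix:
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    A = [[i * 4 + j for j in range(4)] for i in range(4)]
    B = [[(i + j) % 3 for j in range(4)] for i in range(4)]
    product_ab, mults = strassen_with_count(A, B)
    assert product_ab == naive(A, B)
    print("4x4 scalar multiplications:", mults)
```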

[00:06:43]Even more remarkably, AlphaEvolve began optimizing the lowest levels of hardware powering our AI stacks.

[00:06:50]It proposed a circuit design so counterintuitive yet efficient that it was integrated directly into the silicon of our eighth-generation TPUs.

[00:06:58]This is the latest example of TPU brains helping design next-generation TPU bodies.

[00:07:04]Our most advanced models, the Gemini family, were trained and are served on TPUs.

[00:07:08]The ability to handle multimodality, to reason, and to generate human-quality text, images, and more requires training on vast data sets with unprecedented computational power.

[00:07:19]TPUs make this feasible.

[00:07:21]This virtuous cycle has created the foundation for a new kind of workload, agentic AI.

[00:07:26]We're moving from models that predict to agents that plan and build systems that solve multistep problems in physics, biology, and a variety of knowledge worker tasks.

[00:07:37]We're seeing the birth of automation loops, where AI agents formulate hypotheses and run experiments autonomously.

[00:07:43]This requires more than just raw power.

[00:07:46]It demands the high-bandwidth, low-latency loops that reinforcement learning and continuous learning require.

[00:07:52]Through projects like Genie, we're building infinite interactive world models.

[00:07:56]These allow our agents to practice and refine their reasoning and behavior in diverse, simulated environments before they ever act in the real world.

[00:08:04]For both Google and Google Cloud customers, TPU 8t and TPU 8i represent a fundamental advance in capabilities.

[00:08:12]These eighth-generation TPUs empower us to develop increasingly sophisticated and general-purpose iterations of the Gemini model family, accelerate frontier research in pharmaceutical discovery and climate resilience modeling with unprecedented speed, and unlock the era of autonomous scientific inquiry and real-world physical AI breakthroughs.

[00:08:33]The horizon of AI is expanding rapidly, and with the raw compute power of TPU 8t and the architectural efficiency of TPU 8i, we are providing the computational foundation for the next wave of innovation across the world.

[00:08:45]We aren't simply engineering faster processors.

[00:08:48]We're constructing the bedrock for the agentic era, a world where AI moves beyond assistance to think and act alongside us as a true partner in discovery.

[00:08:57]AMIN VAHDAT: Customers often ask if scale must come at the expense of speed.

[00:09:01]The reality of our eighth-generation TPUs is that we have transitioned away from a singular one-size-fits-all approach.

[00:09:08]While both architectures remain highly capable across the full AI lifecycle, encompassing pre-training, reinforcement learning, fine-tuning, and serving, we have purposefully optimized each system to unlock maximum efficiency and value for the most critical stages of AI development.

[00:09:27]JEFF DEAN: Exactly.

[00:09:28]While TPU 8t is the powerhouse for pre-training throughput and TPU 8i is the specialized engine for real-time inference, both use the same full-stack co-design approach.

[00:09:39]TPUs have evolved immensely since TPU v1 in 2013.

[00:09:43]At that time, we recognized that the increasing computational demands of neural networks, like those for speech recognition, would necessitate the creation of accelerators specially designed for neural net computations.

[00:09:55]That insight drove the development of the specialized hardware which is now the bedrock for Google's AI advancements, powering Gemini, Search, Ads, and YouTube.

[00:10:05]Through Cloud TPUs, we bring that same proven-at-scale infrastructure to our GCP customers.

[00:10:11]For over a decade, our journey has been about building the world's most powerful and efficient AI accelerators.

[00:10:17]Each generation has unlocked new frontiers, allowing us to tackle challenges once considered intractable.

[00:10:23]AMIN VAHDAT: Couldn't agree more, Jeff.

[00:10:25]The eighth-generation TPU chips are not simple derivatives of one another.

[00:10:29]Each is a ground-up redesign to specialize for individually demanding serving and training workloads, while still staying true to the TPU architecture and remaining fully integrated with the powerful JAX, PyTorch, XLA, and Pathways AI Hypercomputer software stack.

[00:10:46]This specialization and ground-up redesign, all in deep collaboration with Google DeepMind, will deliver unrivaled price, performance, and power efficiency.

[00:10:54]We can't wait to see what the world will build with the power of TPU 8i--

JEFF DEAN: --and TPU 8t.
