The AI memory crisis stems from fundamental physics constraints in manufacturing high-bandwidth memory (HBM): stacking 12 to 16 memory dies requires thinning each die to about 30 micrometers, thousands of through-silicon vias aligned with sub-micrometer precision, and specialized bonding processes that cannot be accelerated by capital investment alone. This explains why SK Group Chairman Chey Tae-won predicted the shortage would persist until 2030 despite massive capacity expansion efforts.
Deep Dive
The $400 Million Machine That Could Make Intel Unstoppable
On June 2nd, 2026, Jensen Huang stood in front of a booth at Computex in Taipei.
He is the CEO of Nvidia, the most powerful man in the AI industry, the person responsible for the hardware that runs virtually every meaningful AI system on Earth. He is not someone who typically asks for things publicly. He is someone other people ask. He picked up a marker. He walked up to a wafer display case containing a next-generation memory chip called HBM4E, and he wrote three words, "Please make more." Then he signed his name. That image became the most widely shared semiconductor photograph of 2026.
Not because it was technically interesting, but because it captured something no earnings call or analyst report had managed to communicate clearly. The CEO of the world's most powerful AI chip company was standing in a convention hall, writing a handwritten note on a supplier's display case, asking them to make more of the one thing that his chips cannot function without. And the man whose booth Jensen was standing at, SK Group Chairman Chey Tae-won, had told reporters just hours earlier that even if SK Hynix doubles its production capacity over the next five years, the memory shortage will still not be resolved until 2030.

I'm a chip design engineer, and I want to give you the complete picture of what is actually happening, because the AI memory crisis is not a supply chain story. It is a physics story. And once you understand the physics of why memory is so hard to build at the scale AI requires, you will understand why no amount of capital investment can close this gap quickly.
To understand the crisis Jensen walked into that exhibition hall to express, you need to understand what high-bandwidth memory is and why it is the single most physically difficult component in the entire AI infrastructure stack to manufacture at scale. Most people understand memory in terms of the RAM in their laptop or the storage in their phone. That kind of memory is designed to store information reliably and cheaply. It is optimized for cost and capacity. Speed matters, but it is secondary. High-bandwidth memory is a completely different product designed for a completely different problem, and that problem is the bottleneck that sits between a powerful processor and the data it needs to process.

Think about what a GPU actually does when it runs an AI model. It performs billions of matrix multiplication operations, mathematical computations where enormous arrays of numbers are multiplied together to produce outputs. Those operations are fast. The GPU cores themselves can execute them at extraordinary speed.
But to execute those operations, the GPU needs to constantly retrieve data from memory and write results back. The speed at which data can move between the processor and the memory is called memory bandwidth. In a conventional system, memory sits on separate chips connected to the processor by physical wires on a circuit board. Those wires carry data in parallel, but there are limits to how many wires you can fit on a circuit board and how fast each wire can switch. Those limits define the memory bandwidth ceiling of the system.
And for traditional applications such as web servers, databases, and consumer software, that bandwidth is sufficient. But AI model inference and training are not traditional applications. A large language model with hundreds of billions of parameters needs to constantly load those parameters from memory, multiply them by input data, and write outputs back. The number of memory operations per second required is orders of magnitude beyond what conventional memory interfaces can provide. When memory bandwidth becomes the limiting factor, when the GPU cores are sitting idle waiting for data to arrive from memory because the connection between them is too slow, you have what engineers call a memory wall. And the memory wall is one of the defining constraints on how fast AI can actually run.

HBM solves this problem through a completely different physical architecture. Instead of putting memory chips on a circuit board connected to the processor by wires, HBM stacks multiple memory chips vertically, directly on top of each other, connected through the silicon itself by thousands of tiny vertical pathways called through-silicon vias.
This stack of memory dies sits directly next to the GPU die on a shared silicon platform, connected by a massive 1,024-bit or 2,048-bit wide interface. By comparison, a conventional DRAM module uses a 64-bit interface. That width is why data can flow between the GPU and its memory at rates measured in terabytes per second rather than gigabytes per second.
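To put a rough number on the memory wall, here is a back-of-the-envelope sketch in Python. It assumes every weight of the model has to be read from memory once per generated token, and it ignores batching, caching, and compute time. The 70-billion-parameter model, the 16-bit weights, and both bandwidth figures are illustrative assumptions, not measurements.

```python
def min_seconds_per_token(param_count, bytes_per_param, bandwidth_bytes_per_s):
    """Lower bound on per-token latency if all weights are streamed once per token."""
    model_bytes = param_count * bytes_per_param
    return model_bytes / bandwidth_bytes_per_s


# Hypothetical workload: 70 billion parameters stored as 16-bit values.
PARAMS = 70e9
BYTES_PER_PARAM = 2

# Illustrative bandwidths only: a gigabytes-per-second class interface
# versus a terabytes-per-second class interface.
SCENARIOS = {
    "100 GB/s interface": 100e9,
    "8 TB/s interface": 8e12,
}

for label, bandwidth in SCENARIOS.items():
    seconds = min_seconds_per_token(PARAMS, BYTES_PER_PARAM, bandwidth)
    print(f"{label}: at best {1 / seconds:.1f} tokens per second")
```

Under these assumptions the slower interface cannot reach even one token per second, no matter how fast the GPU cores are. That is the memory wall in miniature.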
Nvidia's Blackwell B200 GPU, the chip that powers most of the AI infrastructure being built in 2026, uses 192 GB of HBM3E memory. That is a 140% increase from the 80 GB in the H100, and it connects to that memory through a combined memory bandwidth of 8 TB per second. 8 TB per second from memory to processor. That bandwidth is what enables a single GPU to run the largest AI models at the speeds that make real-time inference commercially viable.

And this is where the manufacturing challenge becomes clear, because building HBM is not like building conventional memory. It is not like anything that semiconductor manufacturing had to do before AI arrived. To make one HBM stack, you start with individual DRAM memory dies. These are thin slices of silicon, each carrying billions of memory cells. To create an HBM stack, you take 12 of these dies for current HBM3E, or 16 dies for the upcoming HBM4 generation, and you stack them vertically.

Before stacking, each die must be thinned to an almost unimaginable dimension. HBM4 dies must be thinned to approximately 30 micrometers, roughly 1/3 the thickness of a human hair. A human hair is already barely visible to the naked eye. 1/3 of that is on the scale of a single biological cell. At 30 micrometers, silicon is fragile in ways that semiconductor manufacturers had never previously had to manage. Handling a wafer that thin without cracking it requires equipment specifically designed for this purpose.
Processes that work fine on a standard wafer become yield killers on a 30-micrometer die.

Then you drill through those thinned dies with thousands of through-silicon vias. Each via is a vertical conductive pathway punched through the full thickness of the die using laser or plasma etching and then filled with copper. The alignment precision required for these vias is extraordinary. Each via in the bottom die must align with the corresponding via in every die above it, across the full height of the stack. Misalignment by even a fraction of a micrometer can cause an open circuit. One failed via in a stack of 16 dies renders the entire stack defective. And there are thousands of vias per die.

Then the stack is bonded together using one of two techniques. SK Hynix uses a process called mass reflow molded underfill, where solder connections are reflowed simultaneously across the entire stack in a controlled heating process. Samsung is pushing toward hybrid bonding, which forms direct metal-to-metal connections between dies without solder at all, enabling even tighter die-to-die spacing and potentially higher bandwidth. Both approaches require atomic-level surface preparation, contamination control beyond what conventional packaging lines achieve, and equipment that is specialized, expensive, and produced in limited quantities.

And then the completed HBM stack must be mounted alongside the GPU on an advanced packaging substrate called a silicon interposer. This is a thin slice of silicon that sits between the GPU die and the HBM stacks, routing the thousands of electrical connections between them at densities impossible on conventional circuit boards. TSMC's CoWoS, chip on wafer on substrate, is the industry's dominant interposer packaging technology, and CoWoS has become its own bottleneck, separate from and in many ways tighter than the HBM manufacturing bottleneck itself. Here is what that means in practice.
Nvidia holds an estimated 60 to 70% of all CoWoS capacity at TSMC's packaging facilities.
The lead time to book additional CoWoS capacity runs 52 to 78 weeks: not a few weeks, but as much as a year and a half. Companies that want to build AI accelerators using advanced packaging today are placing orders for packaging slots that will be fulfilled in late 2027, and TSMC's CoWoS capacity expansion, targeting 25% higher output by late 2026, is already slower than the growth in demand for packaging services.

This is why the AI memory crisis is actually two simultaneous crises. The first is HBM die manufacturing, making the memory chips themselves at the yields and volumes that AI demand requires. The second is advanced packaging, assembling those chips alongside GPU dies on interposers in a process so complex that TSMC's packaging lines have become as constrained as its most advanced logic fabs. Now, let's talk about the specific numbers that define where the supply gap actually sits, because the chairman of SK Group said something at Computex that deserves very careful attention.
Chairman Chey Tae-won said that memory demand is running more than 20% above supply across the entire HBM market. Not 5%, not 10%, 20%. Put differently, roughly one sixth of the demand for AI memory has no supply to meet it, and that gap is not static. It is growing, because the transition from HBM3E to HBM4 that is beginning now does not just bring a new generation of memory. It brings a fundamental increase in how much memory each GPU needs. Nvidia's Vera Rubin platform, the next generation after Blackwell that is now entering production, uses HBM4, and HBM4 requires 16 memory dies per stack instead of HBM3E's 12. That is a 33% increase in die consumption per stack. On top of that, Vera Rubin GPUs will use more HBM stacks per chip than Blackwell, which means the memory requirement per GPU is increasing dramatically at exactly the moment when the generation transition is itself consuming additional manufacturing capacity.
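The percentages in this passage are easy to mix up, so here is a small arithmetic sketch. If demand is 20% above supply, the share of demand left unserved is 0.20 / 1.20, about one sixth. The jump from 12 to 16 dies per stack is a one-third increase. The stack count per GPU in the last lines is a hypothetical placeholder, held constant only to isolate the per-stack effect.

```python
def unmet_share_of_demand(excess_over_supply):
    """Share of demand with no supply when demand = supply * (1 + excess)."""
    return excess_over_supply / (1 + excess_over_supply)


def relative_increase(new, old):
    """Fractional increase going from old to new."""
    return (new - old) / old


def dies_per_gpu(stacks_per_gpu, dies_per_stack):
    """Total DRAM dies consumed by one GPU package."""
    return stacks_per_gpu * dies_per_stack


print(f"Unmet share of demand: {unmet_share_of_demand(0.20):.1%}")
print(f"Die increase per stack: {relative_increase(16, 12):.1%}")

# Hypothetical stack count, the same for both generations.
STACKS_PER_GPU = 8
print("Dies per GPU:", dies_per_gpu(STACKS_PER_GPU, 12), "->", dies_per_gpu(STACKS_PER_GPU, 16))
```

If the newer GPU also carries more stacks, as described above, the per-GPU die count grows by more than one third.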
Micron's HBM4 product runs at 7.85 gigabits per second on a 2048-bit bus, delivering 2 terabytes per second of bandwidth per stack. That is 60% higher bandwidth than HBM3E.
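The 2 terabytes per second figure follows directly from the per-pin rate and the bus width, as this short sketch shows. The 64-bit comparison reuses the same per-pin rate purely for illustration (real DRAM modules run at different pin speeds), and the HBM3E value is only what the "60% higher" claim implies, not a datasheet number.

```python
def stack_bandwidth_tb_s(pin_rate_gbit_s, bus_width_bits):
    """Peak bandwidth in terabytes per second (1 TB = 1000 GB)."""
    gigabits_per_s = pin_rate_gbit_s * bus_width_bits
    gigabytes_per_s = gigabits_per_s / 8
    return gigabytes_per_s / 1000


hbm4 = stack_bandwidth_tb_s(7.85, 2048)
print(f"HBM4 stack: {hbm4:.2f} TB/s")

# Hypothetical comparison: a 64-bit interface at the same per-pin rate.
narrow = stack_bandwidth_tb_s(7.85, 64)
print(f"64-bit interface at the same pin rate: {narrow * 1000:.1f} GB/s")

# HBM3E bandwidth implied by the "60% higher" claim, not a datasheet value.
print(f"Implied HBM3E stack: {hbm4 / 1.6:.2f} TB/s")
```

At equal pin speed, the 2,048-bit interface moves 32 times as much data as a 64-bit one. The width does the work, not the clock.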
But producing those stacks requires thinning more dies, drilling more vias, and achieving finer alignment tolerances than any previous memory generation. The manufacturing complexity per unit is higher. The yield challenges are greater, and the ramp time to stable high-volume production is measured in years, not months. SK Hynix's entire 2026 HBM production was sold out before the year began. Not partially committed, entirely sold out. Every HBM stack SK Hynix will produce in 2026 was pre-allocated to customers, primarily Nvidia, through binding contracts signed in 2025.
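To see why more dies and more vias make yield harder, here is a minimal sketch of how stack yield compounds. It assumes each via succeeds independently with the same probability and that a single failed via scraps the whole stack. It ignores any spare vias or repair schemes a real design might include. The per-via success rate and the via count are hypothetical numbers chosen for illustration.

```python
def stack_yield(per_via_success, vias_per_die, dies_per_stack):
    """Probability that every via in every die of the stack is good."""
    total_vias = vias_per_die * dies_per_stack
    return per_via_success ** total_vias


PER_VIA_SUCCESS = 0.999999  # hypothetical: one failure per million vias
VIAS_PER_DIE = 5000         # hypothetical stand-in for "thousands of vias"

for dies in (12, 16):
    y = stack_yield(PER_VIA_SUCCESS, VIAS_PER_DIE, dies)
    print(f"{dies}-die stack: {y:.1%} of stacks have no failed via")
```

Even at one failure per million vias, several percent of stacks are lost in this toy model, and the 16-die stack loses more than the 12-die stack because it has more vias that all have to work.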
When Jensen wrote "Please make more" on that wafer, he was not expressing a preference. He was expressing a structural reality.
His company cannot get more chips from SK Hynix in 2026 because there are no more chips to get. The allocation is complete. The contracts are signed. The wafers are spoken for. And this brings us to the deeper strategic story that the three words on that wafer were actually communicating.

Microsoft, Google, and Amazon have reportedly all approached SK Hynix with offers to directly fund capacity expansion: not to place purchase orders, but to provide the capital to build the fabs and equipment needed to produce memory that doesn't exist yet. The hyperscalers are so desperate for HBM supply that they are offering to finance the construction of the factories that will eventually supply them. This is historically extraordinary. Technology companies buying capacity from chip manufacturers through purchase agreements is normal.
Technology companies offering to pay for the factories that will eventually supply them is something else entirely.
It is a measure of how acute the supply constraint has become that the largest technology companies in the world are willing to put capital at risk to fund manufacturing infrastructure they won't own and can't control.

SK Hynix's response at Computex was to announce that it would double its total wafer production capacity within 5 years. The centerpiece of this expansion is the Yongin semiconductor cluster south of Seoul, one of the largest semiconductor manufacturing investments ever attempted: approximately 120 trillion won, adding 360,000 wafers per month of DRAM capacity once the cluster reaches full production. SK Hynix's M15X facility in Cheongju is also ramping, and SK Hynix is building a $3.9 billion HBM packaging plant in Indiana, the first advanced HBM packaging facility on American soil. It is designed to integrate HBM stacks with silicon interposers domestically, reducing dependence on TSMC's CoWoS lines and creating supply chain resilience for American AI infrastructure customers. And yet, Chairman Chey said it himself: even with all of this expansion, the shortage will persist until 2030. Let me explain why.
And this is the physics argument that no amount of capital investment can change on a shorter timeline. Building a new memory fab from groundbreaking to first wafer output takes a minimum of 3 to 4 years. The construction phase alone takes 2 years for the clean room structure and utility infrastructure.
Equipment installation and calibration takes another year. Process development and yield ramp takes a further 1 to 2 years before the fab is producing commercially viable product at competitive yields. There are no shortcuts in this process. The physics of semiconductor manufacturing does not accelerate because there is more money.
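Adding up the phase durations quoted above makes the 2030 date less mysterious. This sketch simply accumulates the shortest and longest duration of each phase. The 2026 groundbreaking year is an assumption for illustration, not a date given for any specific fab.

```python
# Phase durations in years as (name, shortest, longest), from the figures above.
PHASES = [
    ("clean room and utilities construction", 2, 2),
    ("equipment installation and calibration", 1, 1),
    ("process development and yield ramp", 1, 2),
]


def cumulative_timeline(start_year, phases):
    """Return (phase name, earliest finish year, latest finish year) per phase."""
    timeline = []
    earliest = latest = start_year
    for name, shortest, longest in phases:
        earliest += shortest
        latest += longest
        timeline.append((name, earliest, latest))
    return timeline


START_YEAR = 2026  # assumed groundbreaking year

for name, earliest, latest in cumulative_timeline(START_YEAR, PHASES):
    span = str(earliest) if earliest == latest else f"{earliest} to {latest}"
    print(f"{name}: done by {span}")
```

With these inputs, a fab started in 2026 has its equipment in place by 2029 and reaches competitive yields somewhere in 2030 to 2031.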
The equipment required for HBM production comes from a small number of highly specialized suppliers: the through-silicon via drilling equipment, the wafer thinning systems, the die bonding tools, the CoWoS packaging lines. Every one of these equipment categories is supplied by companies that themselves have limited manufacturing capacity. When every HBM producer in the world simultaneously decides to expand capacity, they are all placing orders with the same equipment suppliers. Those equipment suppliers cannot double their own output immediately. They also face lead times measured in years. This is the compound constraint that Chairman Chey was describing when he said 2030.
It is not a planning failure or a lack of ambition. It is the physical reality of what it takes to build advanced semiconductor manufacturing capacity from scratch. Money flows immediately.
Equipment arrives in a year or two. Fabs get built in three to four years. Yields ramp in four to five years. And the demand that exists today is already outrunning the supply that the next five years of expansion will produce.

Now, let's talk about what this means for the economics of AI, because the HBM shortage is not an abstract supply chain problem. It has direct consequences for the cost of running AI that flow through to every company and every user of AI services. Nvidia passes much of the cost of HBM on to its customers. The cost of HBM3E memory in a Blackwell GPU represents a significant portion of the total bill of materials for the chip.
When HBM prices rise, and they have risen 15 to 22% year over year for current generation products, Nvidia's cost structure rises with them. Some of that cost increase gets absorbed in Nvidia's margins. Some of it gets passed to data center customers as higher GPU prices. And some of it flows through to cloud providers who charge higher rates for GPU compute. And eventually, some fraction of it reaches the end users whose monthly subscription costs for AI services quietly increase. There is also a more direct constraint. When HBM is in short supply, Nvidia cannot ship as many GPUs as it would otherwise ship, even if logic chip production at TSMC is running smoothly. If the HBM stacks needed to complete a Blackwell or Vera Rubin GPU are not available, the GPU cannot ship.
The entire AI chip supply chain becomes as constrained as its tightest bottleneck. And right now, CoWoS packaging capacity and HBM supply are fighting for the title of tightest bottleneck in the AI hardware ecosystem.
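The "tightest bottleneck" idea can be written as a minimum over the inputs a finished GPU needs. All quantities in this sketch, including the number of HBM stacks per GPU, are hypothetical and chosen only to show the mechanics.

```python
def shippable_gpus(logic_dies, hbm_stacks, packaging_slots, stacks_per_gpu):
    """Return (GPU count, name of the binding constraint)."""
    limits = {
        "logic dies": logic_dies,
        "HBM stacks": hbm_stacks // stacks_per_gpu,  # whole GPUs only
        "CoWoS packaging slots": packaging_slots,
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical monthly inputs: plenty of logic dies, but not enough memory.
count, constraint = shippable_gpus(
    logic_dies=1000, hbm_stacks=6400, packaging_slots=900, stacks_per_gpu=8
)
print(f"Shippable GPUs: {count} (limited by {constraint})")

# Add HBM and the limit simply moves to the next-tightest input.
count, constraint = shippable_gpus(
    logic_dies=1000, hbm_stacks=8000, packaging_slots=900, stacks_per_gpu=8
)
print(f"Shippable GPUs: {count} (limited by {constraint})")
```

Relieving one constraint only helps until the next one binds, which is why HBM supply and CoWoS capacity have to expand together.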
Samsung's entry into the HBM4 market at scale changes this picture meaningfully, but not immediately. Samsung has passed Nvidia's final quality tests for HBM4 and is expected to begin full-scale supply in the second half of 2026.
Samsung holds approximately 21% of the HBM market currently, compared to SK Hynix's 58%. If Samsung ramps HBM4 production successfully, while SK Hynix also scales, the combined output growth reduces the deficit. Micron, the third HBM producer, is targeting HBM4 production with 2 TB per second per stack bandwidth and 20% better energy efficiency than HBM3E.
A three-supplier HBM4 market with all three scaling simultaneously is meaningfully better than a one-supplier world, but better is not the same as resolved. When demand is growing 60% per year and supply is growing at 30 to 40% per year, the gap compounds in the wrong direction, regardless of how many suppliers are racing to close it. And here is the dimension of the story that will define who wins the AI infrastructure race over the next 5 years, because not every company is equally exposed to the HBM shortage.
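Returning to the growth rates quoted above, a simple compounding sketch shows why the gap widens. It starts from demand 20% above supply, with supply normalized to 1.0, and grows demand at 60% per year against supply at 30% or 40%. The four-year horizon and the assumption that both rates stay constant are simplifications made for illustration.

```python
def project_gap(demand, supply, demand_growth, supply_growth, years):
    """Return a list of (year index, demand, supply, demand/supply ratio)."""
    rows = []
    for year in range(years + 1):
        rows.append((year, demand, supply, demand / supply))
        demand *= 1 + demand_growth
        supply *= 1 + supply_growth
    return rows


# Starting point: demand 20% above supply, supply normalized to 1.0.
for supply_growth in (0.30, 0.40):
    print(f"Supply growing {supply_growth:.0%} per year:")
    for year, demand, supply, ratio in project_gap(1.2, 1.0, 0.60, supply_growth, 4):
        print(f"  year {year}: demand is {ratio:.2f}x supply")
```

Even in the better case, the ratio of demand to supply rises every year, because a higher growth rate applied to a larger base always pulls further ahead.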
Companies with first-call allocation agreements with SK Hynix and Samsung have AI hardware. Companies without those agreements wait. And the companies with the deepest pockets, the longest-term supply agreements, and the ability to offer capital co-investment to memory suppliers (Microsoft, Google, Amazon, Meta) are systematically securing allocation at the expense of smaller AI companies and emerging markets that cannot offer the same financial terms. The HBM shortage is not distributing its pain evenly. It is concentrating advanced AI hardware in the hands of the largest technology companies while leaving everyone else waiting for supply that is years away from being adequate.

This is quietly reshaping the competitive landscape of the AI industry in ways that compound over time. The companies that have abundant GPU supply train better models faster, deploy at lower cost, and capture more users. The companies that cannot get allocation fall further behind every quarter that the shortage persists.

Jensen Huang understood all of this when he stood at that booth and picked up that marker. Three words on a wafer, the most succinct possible statement of the most important constraint on the AI industry. Please make more. And the honest answer from the people who can, the answer embedded in Chairman Chey's statement about 2030, is that they are making more as fast as the physics will allow. And the physics will not allow it fast enough.
If you enjoyed this video and want the deeper research behind it, check out the AI infrastructure report 2027 linked below. It brings together months of research, industry analysis, and 50 forecasts through 2030 in one place.
Thanks for watching and I'll see you in the next video.