AI workloads are highly predictable, allowing hardware architects to eliminate complex hardware schedulers and memory management units, replacing them with software-based compilers that pre-map data movement. This approach enables independent cores with local memory to operate without waiting, while integrated networking allows multiple chips to function as a unified system. The result is an open-source AI chip architecture that achieves comparable performance to Nvidia's GPUs at significantly lower cost, demonstrating that hardware efficiency can be dramatically improved by leveraging the predictability of AI computations.
Deep Dive
This OPEN-SOURCE Chip is Faster Than a GPU (And CHEAPER!) | Tenstorrent Chips Explained
Just two weeks ago, the man behind the iPhone chips, the PlayStation, and the entire comeback of AMD stood on a stage and said, "Whatever Nvidia does, we'll do the opposite." This is Jim Keller. He spent the last four years building a chip that runs entirely on open-source architecture, and it beats Nvidia's best inference system while costing five times less to run. So if he actually built a better and cheaper AI chip, why have you never heard of it?
Because the chip that's beating Nvidia at inference isn't built like any chip you've ever seen. It doesn't use Nvidia's architecture. It doesn't use Nvidia's software. And it doesn't even compute the same way a GPU does.
And that all comes down to one decision Jim Keller made at the very beginning: he threw out everything Nvidia does and started completely from scratch. Every time you use Gemini, Claude, or any other AI tool, there's a server running somewhere, burning through electricity and billions of dollars.
OpenAI spent $3 billion last year just on their servers. And right now, Nvidia controls that bill entirely.
They charge whatever they want. That's the problem Jim Keller decided to fix.
And he started by rejecting every single assumption Nvidia's architecture is built on. You see, if you look closely at a modern GPU, you'll realize most of the silicon isn't actually doing math.
GPUs are burdened by massive hidden overheads like hardware schedulers, cache controllers, and memory management units, all constantly trying to figure out where data needs to go next. When you add it all up, less than half the chip is actually doing what you bought it to do. But Jim Keller had a critical insight. AI math is not random like rendering a video game. It's perfectly predictable. You almost always know exactly which data is needed, and when, before the chip even starts computing.
And if the workload is entirely predictable, you don't need dedicated hardware to manage unpredictability. So Keller's team literally removed all the schedulers and traffic controllers. But after deleting the traffic controllers, how do you manage the data? Simple: it all moves into software. The compiler knows the entire journey of every single piece of data before the chip even turns on.
This completely changes what the actual processor looks like. Tenstorrent's architecture is built around Tensix cores. Unlike a GPU, it's not one massive processor fighting over shared resources. There are five independent RISC-V cores per tile, and every tile has its own local SRAM. Now, you might be thinking that it's just a bunch of tiny CPUs duct-taped together. But that's the entire point: because every core has its own memory and its own instructions, no core ever sits around waiting for another one to finish its job.
And here's where it gets interesting. Because the hardware isn't doing the predicting, there's no secret sauce in the silicon. All the intelligence lives in the compiler, and that compiler is completely free and open source. Anyone can use it, and anyone can improve it. The result is 352 Tensix cores packed onto a single chip, each one executing independently, with no global clock forcing them all to work in lockstep.
But if you look at the spec sheet for this chip, you will immediately spot what looks like a fatal flaw. To run modern AI, Nvidia relies on HBM, a special type of memory that sits directly on the chip package.
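To make "moving the scheduler into the compiler" concrete, here is a minimal Python sketch. It is not Tenstorrent's compiler or API: the `Core` class, `compile_static_schedule`, the round-robin assignment, and the tile and SRAM sizes are all hypothetical. It only illustrates the principle from the transcript: every load, compute, and store step for every core is fixed before anything executes, so nothing at run time has to decide where data goes.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """A hypothetical core with its own private SRAM budget and instruction list."""
    core_id: int
    sram_bytes: int
    program: list = field(default_factory=list)


def compile_static_schedule(tiles, num_cores, tile_bytes, sram_bytes):
    """Hypothetical ahead-of-time scheduler (illustration only).

    Tiles are assigned round-robin, and each core gets an explicit list of
    load/compute/store steps. All data movement is decided here, before
    execution, so no hardware scheduler is needed at run time.
    """
    # Each core must hold at least one input tile and one output tile locally.
    if 2 * tile_bytes > sram_bytes:
        raise ValueError("tile does not fit in a core's local SRAM")
    cores = [Core(core_id=i, sram_bytes=sram_bytes) for i in range(num_cores)]
    for index, tile in enumerate(tiles):
        core = cores[index % num_cores]
        core.program.extend([("load", tile), ("compute", tile), ("store", tile)])
    return cores


cores = compile_static_schedule(
    tiles=[f"tile{i}" for i in range(6)],
    num_cores=3,
    tile_bytes=2048,
    sram_bytes=1_500_000,  # illustrative per-core SRAM budget, not a spec value
)
for core in cores:
    print(core.core_id, core.program)
```

Because each core's program only touches its own tiles, the cores never need to coordinate at run time. A real compiler would also have to respect dependencies between tiles, which this sketch deliberately ignores.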
It's insanely fast, but it's also the single biggest reason an AI chip costs more than a car. Tenstorrent looked at HBM and refused to use it outright. They decided to use standard GDDR6, the same type of cheap memory you'd find inside a PlayStation and mid-range gaming GPUs. The problem is that GDDR6 offers far lower bandwidth. Even the relatively cheap RTX 3090 has more of it.
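Why does memory bandwidth matter so much? A common back-of-the-envelope rule for single-stream text generation is that each new token requires reading every model weight from DRAM once, so tokens per second cannot exceed bandwidth divided by model size. The sketch below applies that rule of thumb. The 8-billion-parameter model at 1 byte per weight and the 512 GB/s card are illustrative assumptions; 936 GB/s is roughly the RTX 3090's published memory bandwidth.

```python
def max_decode_tokens_per_second(bandwidth_gb_s, params_billion, bytes_per_param):
    """Rough upper bound on single-stream decode speed.

    Assumes every generated token reads all weights from DRAM exactly once,
    and ignores batching, on-chip caching, KV-cache traffic, and compute limits.
    """
    model_gb = params_billion * bytes_per_param  # 1e9 params * bytes/param = GB
    return bandwidth_gb_s / model_gb


# Illustrative inputs: a hypothetical 8B-parameter model stored at 1 byte per weight.
for name, bandwidth in [("hypothetical GDDR6 card", 512), ("RTX 3090 (~936 GB/s)", 936)]:
    bound = max_decode_tokens_per_second(bandwidth, params_billion=8, bytes_per_param=1)
    print(f"{name}: at most {bound:.0f} tokens/s")
```

This bound is also why keeping weights in on-chip SRAM, or spreading them across many chips, matters: weights that stay on-chip don't have to be re-read from DRAM for every token.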
But for Jim Keller, this wasn't a limitation. Remember, they deleted the hardware, and the compiler already knows the future. Because the software maps out every single calculation before the chip even turns on, it doesn't need a massive pool of expensive memory to act as a buffer. All the space they freed up by deleting the schedulers was filled with 200 megabytes of SRAM, and raw bandwidth only matters if you don't know what's coming. The compiler knows everything. It prefetches exactly what each core needs from the slow GDDR6 pool right before it's required, so the cores are never actually waiting.
But this prefetching has its limits too. Once you move on to larger models that are dozens of gigabytes, there's no way to fit them into that small SRAM, and prefetching just can't hide the gap anymore. At some point you're back to being bottlenecked by the low memory bandwidth, and the 3090 actually starts performing better.
But here's the thing: they knew this from day one. One chip was never the product. When a company like OpenAI builds a data center, they're not buying one chip. They're buying hundreds of thousands of them. And that's where every other chipmaker runs into the exact same scaling problem. The moment you chain multiple chips together to run a massive model, they start spending more time talking to each other than actually computing. To fix this, Nvidia relies on NVLink, which is incredibly fast but costs as much as the chips themselves. Other companies use massive external InfiniBand switches, which just add layers of latency, heat, and complexity to the server rack. Tenstorrent's answer was brutal in its simplicity.
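The prefetching described above is essentially double buffering: while a core computes on one tile in SRAM, the next tile streams in from GDDR6 into a second buffer. The toy timing model below uses a hypothetical function name and made-up time units; it is not Tenstorrent code. It shows both halves of the argument: when loading a tile is faster than computing on it, transfers are almost completely hidden, and when loading is slower (as with large models that can't stay in SRAM), memory bandwidth sets the pace again.

```python
def total_time(load_times, compute_times, double_buffered):
    """Toy timing model for streaming tiles from DRAM into on-chip SRAM.

    Assumes one transfer engine and one compute unit per core. Without
    double buffering, each tile is loaded and then computed, one after the
    other. With two SRAM buffers, tile i + 1 loads while tile i computes.
    """
    if len(load_times) != len(compute_times) or not load_times:
        raise ValueError("need one load time and one compute time per tile")
    if not double_buffered:
        return sum(load_times) + sum(compute_times)
    total = load_times[0]  # the first load cannot be hidden behind any compute
    for i, compute in enumerate(compute_times):
        next_load = load_times[i + 1] if i + 1 < len(load_times) else 0
        total += max(compute, next_load)  # the slower of the two sets the step time
    return total


# Compute-bound: loads (2) are shorter than compute (3), so prefetching hides them.
print(total_time([2] * 4, [3] * 4, double_buffered=False))  # 20
print(total_time([2] * 4, [3] * 4, double_buffered=True))   # 14
# Bandwidth-bound: loads (5) exceed compute (3), so the core waits on memory.
print(total_time([5] * 4, [3] * 4, double_buffered=False))  # 32
print(total_time([5] * 4, [3] * 4, double_buffered=True))   # 23
```

In the bandwidth-bound case, the double-buffered time is the total load time plus one final compute step: exactly the situation where prefetching can no longer hide the gap.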
They baked 400-gigabit Ethernet straight into every single Blackhole chip. Every chip is simultaneously a processor and a router. And remember how their compiler pre-calculates the data movement through SRAM on a single chip?
It does the exact same thing across the entire network. The software pre-maps the data movement across every single chip in the cluster before a single calculation even starts. Because of this, the 32 chips inside a single Tenstorrent Galaxy server don't behave like 32 separate components. They behave like one unified brain. Chain 36 of those servers together and they act as one massive supercomputer built from more than a thousand chips, without any bottlenecks.
On a massive, complex model like DeepSeek R1, this architecture easily pushes 350 tokens per second. And these are third-party, community-tested benchmarks, not numbers provided by Tenstorrent. But the number that actually matters isn't the speed, it's the cost: just $6 per million tokens, compared to $30 on Nvidia. Same performance, five times cheaper to run.
So if the hardware is this efficient, the software is free, and the numbers are this undeniable, we're back to the original question. Why haven't you ever heard of this? And why isn't every data center on the planet already ripping out its Nvidia racks to buy it? Well, there are two core reasons: one is the software, and the other is Jim Keller himself. You just heard me say the software is completely free and open source. But in the world of enterprise AI, free comes with a massive asterisk.
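One common way a compiler can pre-map work across many chips is pipeline parallelism: split the model's layers into contiguous stages, give each chip one stage, and fix ahead of time which chip sends activations to which. The transcript does not say which partitioning strategy Tenstorrent's software actually uses, so treat the sketch below, including the `plan_pipeline` name and the 10-layer, 4-chip example, as a hypothetical illustration of static multi-chip placement.

```python
def plan_pipeline(num_layers, num_chips):
    """Statically assign contiguous layer ranges to chips (pipeline parallelism).

    Returns (stages, transfers): stages[c] lists the layers chip c runs, and
    transfers lists the (sender, receiver) chip pairs that pass activations.
    Everything is decided before any computation starts.
    """
    if not 0 < num_chips <= num_layers:
        raise ValueError("need between 1 and num_layers chips")
    base, extra = divmod(num_layers, num_chips)
    stages, start = [], 0
    for chip in range(num_chips):
        count = base + (1 if chip < extra else 0)  # spread leftover layers over the first chips
        stages.append(list(range(start, start + count)))
        start += count
    transfers = [(chip, chip + 1) for chip in range(num_chips - 1)]
    return stages, transfers


stages, transfers = plan_pipeline(num_layers=10, num_chips=4)
print(stages)     # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
print(transfers)  # [(0, 1), (1, 2), (2, 3)]
```

Because the transfer list is known in advance, each chip-to-chip link carries only traffic that was planned before execution. Real deployments often combine this with splitting individual layers across chips (tensor parallelism), which the sketch leaves out.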
Tenstorrent themselves proudly claimed that 90% of Hugging Face models just run out of the box on their hardware. To a developer, that sounds incredible, until you start asking about the other 10%. Enterprise buyers do not make billion-dollar infrastructure decisions on 90% compatibility. They need absolute certainty. A hospital running medical imaging AI or a massive financial institution deploying real-time fraud detection cannot afford to land in that 10%. On top of that, the Blackhole software is still incredibly new. Tenstorrent's previous-generation Wormhole chips have years of software optimization behind them; Blackhole is still catching up. But this is not just a software problem.
It points to something much deeper about the man behind all of this.
If you look closely at Jim Keller's resume, you will find the most legendary track record in silicon history. But you will also find a pattern. He designed the Zen architecture that quite literally saved AMD from bankruptcy, but he left in 2015, before the chips ever shipped. He laid the foundation for Apple's A-series chips, the architecture that makes modern iPhones the most efficient devices on Earth, and he left before they fully matured. He went to Tesla to design the custom silicon for Full Self-Driving, and he left after just 18 months. Every single project he touched went on to be a massive, industry-shifting success, but they succeeded without him every single time. The pattern isn't that Jim Keller fails.
It's that he starts fires, builds the foundation, and walks away before they become infernos. And now he is four years into Tenstorrent, which is roughly when Jim Keller historically starts looking for the door. So is this just another pit stop? Are data centers supposed to buy millions of dollars of hardware just for the chief architect to walk away?
The software gap is very real, but it's closing faster than anyone expected. Because Tenstorrent's stack is fully open source, the entire global community is helping fix it, rather than just one isolated engineering team. That open-source momentum is a much faster trajectory than the one Nvidia's CUDA originally followed. But Tenstorrent is fundamentally different from every project that came before. He isn't a hired architect this time. He's the CEO. For the first time in his career, Jim Keller isn't building someone else's future. He's building his own.
By the way, Tenstorrent isn't the only company that figured out how to threaten Nvidia's monopoly. There's another one.
Except Nvidia found out about it first and paid $20 billion to make it disappear. So what did they build that scared Nvidia so much?
Watch this video to find out.