
Tesla vs Waymo - Who is closer to Level 5?
Added: 2026-05-12

107 views315:33thinkautonomous6318Original Release: 2026-05-05

The analysis effectively captures the clash between Tesla’s data-driven scale and Waymo’s precision engineering. However, it treats Level 5 as a data-gathering race while downplaying the immense safety and regulatory hurdles that cameras alone may never clear.

[00:00:00]Is Tesla going to crush Waymo to Level 5 autonomous driving? Let me tell you what I think. So Tesla has over two million cars on the road collecting data every single day. Waymo has around 2,000 to 3,000 cars driving in just a handful of American cities. By every traditional measure, Tesla is bigger. But now there's a question nobody can really agree on: which company is going to win and get to Level 5 first? The reason no one can agree on that is that these two companies are not actually running the same race. They have different sensors, different algorithms, different business models, and honestly a different definition of what winning even means.

[00:00:38]Now, in this video, I'm going to give you a real engineering breakdown: sensors, algorithms, strategy. I want you to be fully aware of what it means to build a self-driving car company that goes to Level 5. Let's go. Now, before we compare anything, you need to understand one thing. Comparing Tesla and Waymo the way most people do, camera versus LiDAR, is like trying to compare the theory of evolution with the story of Genesis. They're two different frameworks trying to answer the same question. Now, let me give you the correct frame. Tesla sells cars.

[00:01:10]Waymo sells rides. Tesla's goal is to ship software to millions of privately owned vehicles, making each one more autonomous over time with each upgrade.

[00:01:20]Waymo's goal is to run a robotaxi service with zero safety drivers in just a few selected cities. These two different goals drive every single technical decision they make. So when they build a sensor, when they decide on an algorithm or a mapping strategy, everything is decided by this difference and this framework.

[00:01:41]And we're going to see that. So let's start right away with the first point: sensors. There's a big difference on sensors. Let me talk to you about Waymo first.

[00:01:55]A Waymo car currently has 29 cameras, six radars, and five LiDARs. And they are not generic ones. They build their own LiDARs. Even their radar is not classic 2D radar. They have built imaging radars that produce point clouds with height information. From a pure engineering standpoint, I think just in terms of calibration, this setup is probably an absolute nightmare.

[00:02:19]Calibrating a single camera with a single LiDAR is already hard. But 29 cameras with five LiDARs, and it varies by vehicle model and size, that's probably super painful to do. But it gives Waymo a sensor stack with genuine redundancy, and that's the point of what they're doing here: if a camera misses something, the LiDAR probably won't. Now look at Tesla. They have eight cameras. That's it. Just eight cameras. No LiDAR, and no radar since 2022. This is Elon Musk's conviction that a camera is the correct sensor for autonomous driving, because humans drive with their eyes and a camera is the closest thing to an eye we have. The argument is that if you build your algorithm to be good enough, cameras give you everything you need. Now, I think this started a small war online, right? And honestly, the engineers who pushed back were not wrong. There are specific scenarios where vision-only is going to fail. If you have fog, heavy rain, or snow, a LiDAR is going to send laser pulses and measure the time of flight.

[00:03:23]So it's going to have a very good measurement. You probably saw this famous fake wall video where a Tesla drove straight through a wall painted to look like a highway. It's exactly that: the algorithm cannot fix what the sensors cannot see.
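To make the time-of-flight idea concrete, here is a minimal sketch of the range calculation. It assumes an idealized pulse traveling at the vacuum speed of light and a single clean return; the function name `tof_range_m` is made up for this illustration and does not come from any vendor's software.

```python
# Idealized LiDAR time-of-flight ranging (illustrative sketch only).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_range_m(round_trip_s: float) -> float:
    """Return the distance to a target given the pulse's round-trip time.

    The pulse travels to the target and back, so the one-way distance
    is half of (speed of light * elapsed time).
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


if __name__ == "__main__":
    # A return arriving 400 nanoseconds after emission is roughly 60 m away.
    print(f"{tof_range_m(400e-9):.2f} m")
```

The point of the sketch is that the distance comes from a direct physical measurement rather than from an inference on pixels, which is why a painted wall does not fool it.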

[00:03:39]On the other hand, and this is real, Tesla's approach to sensor design is just simpler. It's less energy, less compute, less cost, less calibration overhead. There are not 10 million point clouds to process. The car costs 35k instead of 150k for the Waymo. And at the scale Tesla operates, this matters enormously. Remember, we said Tesla sells to millions. So reducing the price and the cost is their main challenge. Waymo does not sell vehicles. Okay, keep that in mind. Now, I'm going to say Waymo wins this round to Level 5, simply because I do not see a Level 5 where you just say: when it rains, when there is snow, when there is fog, et cetera, we will just not drive. That could be possible, because these are edge cases in a way.

[00:04:36]It doesn't happen every day. But a sensor stack with camera, LiDAR, and 4D imaging radar would be objectively more reliable than cameras alone. Tesla's simplicity is not a weakness. It's a deliberate bet on algorithm quality. So let me tell you a bit more about that.

[00:04:54]This is where it gets really interesting, because the public narrative is that when you think of the algorithms, Tesla is doing cutting-edge end-to-end deep learning and Waymo is stuck in an old modular architecture. And that's kind of true, but it's also wrong. So let me walk you through both. Tesla's perception system is built around what they call the HydraNet: one network, one shared backbone, multiple heads. The eight cameras feed into a CNN that extracts features from each view. Then the features are fused spatially and temporally with a transformer architecture, and from that, multiple output heads predict everything. So there is a head for the objects, one for the lanes, one for the depth, and on top of that they have a second, parallel occupancy network that predicts which voxels are going to be occupied and which are free. You can see it as a Minecraft representation of the world, a 3D dynamic map generated in real time from cameras alone. Now, in 2024 they replaced the rule-based planner with a neural network planner. So that means they have HydraNet for the objects and occupancy at the bottom, and then a planner based on deep learning. That last step is what makes the whole system fully end-to-end: every gradient flows from sensor input all the way to the driving decision. They have now updated that a bit, and they have this foundation model where they can process videos and also add prompts, text, reasoning, and things like that.

[00:06:23]So it makes them really cutting edge.
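The "one shared backbone, multiple heads" structure can be sketched in a few lines. This is a toy, not Tesla's network: the backbone here is a plain average per camera standing in for the CNN and transformer fusion, the three heads are trivial functions, and every name (`shared_backbone`, `run_hydra`, `HEADS`) is hypothetical. It only shows the key property, which is that features are computed once and reused by every head.

```python
# Toy multi-head layout: one shared feature extractor, several task heads.
# All functions are placeholders for learned networks (assumption).
from typing import Callable, Dict, List

Features = List[float]


def shared_backbone(camera_frames: List[List[float]]) -> Features:
    """Stand-in for the CNN + fusion stage: one feature per camera view."""
    return [sum(frame) / len(frame) for frame in camera_frames]


def object_head(features: Features) -> float:
    """Placeholder object score: strongest feature response."""
    return max(features)


def lane_head(features: Features) -> float:
    """Placeholder lane score: weakest feature response."""
    return min(features)


def depth_head(features: Features) -> float:
    """Placeholder depth estimate: mean of the features."""
    return sum(features) / len(features)


HEADS: Dict[str, Callable[[Features], float]] = {
    "objects": object_head,
    "lanes": lane_head,
    "depth": depth_head,
}


def run_hydra(camera_frames: List[List[float]]) -> Dict[str, float]:
    """Run the backbone once, then feed the same features to every head."""
    features = shared_backbone(camera_frames)
    return {name: head(features) for name, head in HEADS.items()}


if __name__ == "__main__":
    frames = [[0.0, 1.0], [1.0, 3.0]]  # two fake camera views
    print(run_hydra(frames))
```

Sharing the backbone is what keeps compute low: adding a new task means adding a head, not a whole new network.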

[00:06:25]Now, on the other hand, Waymo has a more classical approach. They have separate modules for perception, detection, tracking, prediction, and planning. They have a highly engineered modular stack.

[00:06:37]Their LiDAR perception is based on an architecture originally called SWFormer, a sparse window transformer that processes 3D point clouds and outputs bounding boxes. This detector is wrapped in a late-to-early temporal fusion pipeline, which basically feeds SWFormer outputs from past time steps back in. So it's like you have the former time steps sent back into the network in some kind of loop. So it's a temporal detector. It's not just frame by frame.

[00:07:07]Now, for tracking and prediction they have this stateful track transformer, a transformer that maintains track state over time rather than rerunning from scratch every single time. For planning they have shown EMMA, their end-to-end architecture. Now, it looks like it's working, and a lot of people think that Waymo has moved to end-to-end. They have not, and the shutdown that we saw last December in San Francisco proves that it's a modular architecture, but they have been exploring end-to-end to see if it's going to work. So here is the thing: both companies are now heading towards end-to-end. The modular versus end-to-end debate is kind of over now.
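The idea of a tracker that keeps state over time, instead of starting from scratch on every frame, can be illustrated with a much simpler technique than a transformer. The sketch below uses greedy nearest-neighbor association, which is a deliberate substitution and not Waymo's method; the class name `StatefulTracker` and the `max_match_dist` parameter are invented for the example.

```python
# Minimal stateful tracker using greedy nearest-neighbor matching.
# This is a stand-in technique, not a transformer-based tracker.
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    track_id: int
    position: Point
    age: int = 1  # number of frames in which this track was observed


class StatefulTracker:
    def __init__(self, max_match_dist: float = 2.0) -> None:
        self.max_match_dist = max_match_dist
        self.tracks: Dict[int, Track] = {}  # state kept between frames
        self._next_id = 0

    def update(self, detections: List[Point]) -> Dict[int, Track]:
        """Match new detections to existing tracks and update the state."""
        unmatched = set(self.tracks)
        for det in detections:
            best_id: Optional[int] = None
            best_dist = self.max_match_dist
            for tid in unmatched:
                dist = math.dist(det, self.tracks[tid].position)
                if dist <= best_dist:
                    best_id, best_dist = tid, dist
            if best_id is None:
                # No track close enough: start a new one.
                self.tracks[self._next_id] = Track(self._next_id, det)
                self._next_id += 1
            else:
                # Continue an existing track, keeping its identity and age.
                track = self.tracks[best_id]
                track.position = det
                track.age += 1
                unmatched.discard(best_id)
        for tid in unmatched:
            del self.tracks[tid]  # drop tracks with no detection this frame
        return self.tracks


if __name__ == "__main__":
    tracker = StatefulTracker()
    tracker.update([(0.0, 0.0), (10.0, 10.0)])
    print(tracker.update([(0.5, 0.0), (10.0, 10.5)]))
```

Because the track dictionary survives between calls, an object keeps the same ID and accumulates history across frames, which is the property the transcript describes.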

[00:07:49]Okay, they both want end-to-end. The question is who gets there first and, more importantly, who gets there with enough data to actually make it work.

[00:07:58]End-to-end learning is only as good as the data you train it on. Tesla has millions of cars driving every day, in every city in the world, in every weather condition, hitting every edge case. And they have been doing that for nearly a decade. So think about it. They have seen the faded lines, the construction zones, the wrong-way drivers. They even have trigger classifiers, a technology that automatically flags unusual scenarios and then sends them to retraining. For example, if you are coming out of a tunnel and that's an unusual scenario, the network will know when to record. They have Dojo, a custom supercomputer built specifically for processing this data at scale. On the other hand, Waymo has just 3,000 cars driving in just a few cities. So every time they want to add a new city, they do the mapping and drive a lot in that city to understand what rules need to be written. Now, if you want an end-to-end system, it's going to be harder for them. So on algorithms, I would say Tesla takes the point here. And not just because the architecture is more elegant or more cutting edge. It's that they actually have the data infrastructure to make end-to-end learning work. Now, if you think end-to-end is the way to reach Level 5, then from this standpoint they win. If you think modular or hybrid is the way, then it's a different story and the answer is not there. Now let's talk about strategy.
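The trigger idea mentioned above, flagging unusual moments so they can be recorded and sent to retraining, can be sketched with a hand-written rule. This is an assumption-heavy toy: the real triggers are described in the video as classifiers, whereas this example just flags a sudden change in average brightness, the kind of jump you might see when exiting a tunnel. The function names and the `jump_threshold` value are made up.

```python
# Toy "trigger": flag frames whose mean brightness jumps sharply.
# A fixed-threshold rule stands in for a learned classifier (assumption).
from typing import List, Sequence


def mean_brightness(frame: Sequence[float]) -> float:
    """Average pixel intensity of a frame, with pixels in [0, 1]."""
    return sum(frame) / len(frame)


def flag_unusual_frames(
    frames: Sequence[Sequence[float]], jump_threshold: float = 0.4
) -> List[int]:
    """Return indices of frames that differ sharply from the previous one."""
    flagged: List[int] = []
    for i in range(1, len(frames)):
        jump = abs(mean_brightness(frames[i]) - mean_brightness(frames[i - 1]))
        if jump > jump_threshold:
            flagged.append(i)  # candidate clip to record for retraining
    return flagged


if __name__ == "__main__":
    # Dark tunnel frames followed by bright daylight frames.
    clip = [[0.1, 0.1], [0.1, 0.2], [0.9, 0.8], [0.9, 0.9]]
    print(flag_unusual_frames(clip))
```

With a fleet of millions of cars, even a rare trigger fires often, which is how this kind of filter turns raw driving into a curated training set.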

[00:09:29]Not the sensors, not the code: the business logic. Because the fastest route to Level 5 is not just technical. There is also a scaling curve. So think about it. In 2016, Tesla shipped a car with basic highway assistance. Every year they had an over-the-air software update that pushed the car slightly further. More disengagements than Waymo? Absolutely.

[00:09:53]But the data collected from these disengagements is the foundation of everything they're building today. Tesla starts with massive scale and gradually improves capability. Waymo does it in exactly the opposite order. They start with an already capable system, but only in one city, and they geofence it. So they map every street, every traffic light, every corner to centimeter-level accuracy, and then they add a new city, and then a new one, and a new one. So they scale capability first, then scale geography. The thing most people miss is that Waymo has a big liability with HD maps. But it's also a big asset, because they map every square inch of every road they drive, every traffic sign, et cetera. This makes their cars dramatically more predictable and safer.

[00:10:42]It also means they cannot simply show up in any new city. They have to map it first. And this is expensive, slow, and complex. Tesla is really the opposite.

[00:10:50]The idea is that if a car is really autonomous, it should drive anywhere, on any road, without needing a pre-built map, just the basic navigation map. Tesla does use map data, but they don't require it. So if you drive a Tesla into a town that has never been mapped before, FSD could actually handle it. So when you think of growth and scaling, Tesla has an advantage here. There is the cost argument as well, of course. A Waymo vehicle used to cost 250K to build. Now it's probably down to 150K, and it's probably still falling because LiDAR prices are falling, et cetera. So time is on Waymo's side. Now, when you think of miles driven, Tesla has over 5 billion miles of Autopilot and FSD data. Waymo has 100 million. 5 billion versus 100 million. So it's really different. Tesla seems to dominate here.

[00:11:43]But these numbers are not really comparable. Waymo's 100 million are fully driverless, in complex urban environments, with zero safety driver. Tesla's 5 billion include every highway cruise, the things that are very simple, and there's a lot of that. The definition of an autonomous mile is doing a lot of work in both of these numbers, right? Because a mile is different depending on which mile you are driving. So on strategy, I'm going to call this one a draw. Waymo's approach is maybe more defensible technically. They know exactly where the car is, what it can and cannot do. But Tesla's scale and data moat is something Waymo may never be able to replicate. So they are both potentially right, for different markets and on different horizons. All right. So, the final verdict. In a bounded sense, Waymo is probably closer to Level 5, in the sense that within the cities they operate in, on the roads they have mapped, in the weather conditions their sensors handle, their car is really remarkable: driverless, no safety driver, today, in San Francisco, in Phoenix. This is real. Waymo's Level 5 has an asterisk though: it only works where Waymo says it's going to work. So if you go outside of the geofenced area, it no longer works. And that's not Level 5. In the SAE definition, it's not.
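A geofence check of the kind described above comes down to asking whether a position lies inside a service polygon. Here is a minimal sketch using the standard ray-casting test. The coordinates are made-up planar values rather than real map data, and `inside_geofence` is a hypothetical name; nothing here reflects how any operator actually stores or checks its service area.

```python
# Point-in-polygon test (ray casting) as a toy geofence check.
# Coordinates are arbitrary planar units, not real map data (assumption).
from typing import List, Tuple

Point = Tuple[float, float]


def inside_geofence(point: Point, polygon: List[Point]) -> bool:
    """Return True if `point` lies inside the closed `polygon`.

    Casts a horizontal ray to the right of the point and counts how many
    polygon edges it crosses; an odd count means the point is inside.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    service_area = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(inside_geofence((2.0, 2.0), service_area))  # inside the square
    print(inside_geofence((5.0, 2.0), service_area))  # outside the square
```

The asterisk in the verdict is exactly this boolean: the driving capability is only offered where the check returns true.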

[00:13:03]So now Tesla, on the other hand, is building a network that is capable of navigating no matter where. It just learns to navigate. It doesn't overfit to a city. It learns navigation in general. At some point, Tesla is going to take the lead simply because their car is going to be more and more capable, to the point where it's going to be able to drive basically anywhere. So I would bet on Tesla's data management flywheel. And if I had to bet on who will have the most commercially deployed autonomous rides in the next 5 years, I don't know, maybe Waymo, but frankly I don't know. As I said, they are not competing, right? So, five things to remember from this video. Let me tell you. Number one: Tesla and Waymo are not building the same thing. Tesla sells cars to individuals. Waymo sells rides as a service. Number two, on sensors: Waymo's camera plus LiDAR plus imaging radar stack is objectively safer and more reliable than cameras alone. Vision-only has a real physical ceiling. Number three, on algorithms: both companies are converging on end-to-end learning. The race now is about who has the better data pipeline, and Tesla is way ahead in that race. Four: Waymo's HD map strategy makes them safer today, but it's going to cap them eventually. So it's going to give them an advantage today, when they can scale to a new city in a couple of weeks, but Tesla is going to arrive with a mapless philosophy, and at some point it's going to work. Five: disengagement statistics between the two are almost meaningless to compare directly. The conditions, definitions, and cities are completely different. So look not at the absolute number, but at the trend. Okay, did you like this? Now, if you want to go deeper on all of these specific things we covered today, I have something very interesting for you: an end-to-end migration codex. It's a free interactive roadmap I built that shows how self-driving cars are migrating from modular architectures to end-to-end learning.
Everything we talked about, with Tesla and Waymo converging on this approach, is going to be in the codex. It's completely free and the link is on think autonomous.ai/ AI/E2 report. I will put it in the description and in the comments. And if this video was useful, subscribe to the channel. I cover AV systems, deep learning for robotics, and sensor fusion every week. And in the next video, you're going to go even deeper into the algorithms we touched on today. I will see you there.
