Gaussian splatting is a photorealistic 3D capture technology that converts 2D images into detailed 3D models, and with the new open standard (KHR Gaussian splatting extension for glTF), this technology can now be shared across platforms for applications including AR glasses, delivery robots, drones, and AI world models, enabling a unified spatial computing layer for both humans and machines.
Deep Dive
This Tech Solves Google Earth's Biggest Problem
So, this is what Google Earth has looked like for the last decade.
Awesome, but it's blocky as hell.
Now, this is the same block as a 3D Gaussian splat, and it looks indistinguishable from reality.
And all this technology is no longer a research project. It's already on Zillow listings. My AR glasses can use it to know where they are. Robots train inside them. And human pilots use them to make sense of life-and-death situations.
We've got the same file, but two maps, one for humans and one for machines.
And now there's an open standard to distribute it at planetary scale. In this video, I'm breaking down the companies scaling this new map of planet Earth and what's unlocked by humans and machines using the very same one. Let's get into it.
The term of art here is 3D Gaussian splatting. It's a method of turning 2D images and other sensor data like LiDAR into a photorealistic 3D model of the world.
>> Gaussian splats are a fundamentally different way of capturing and displaying 3D scenes. And it's fantastic for highly realistic captures of very complex scenes that can be rendered very efficiently on a whole range of existing GPU hardware. And interestingly, looking slightly more forward, Gaussian splats are perfect for AI to begin to understand the real world.
>> And by the way, if you want to go deeper first, here's a video on what it is and how you can capture them. But even though this technology is just a few years old, it is moving fast, and critically, capture has gotten radically easier in the last few years. Even 3 years ago, getting a Gaussian splat of your living room meant a high-end rig with a beefy Nvidia GPU. And today, you can do it all on your phone. Niantic Spatial's Scaniverse app turns the phone in your pocket into a scanner. You can do room-scale 3D reconstructions completely processed on device. That's right. The data doesn't even leave your phone. Pretty wild for something that used to need a workstation.
Now, if you want to step it up a notch, you've got 360 cameras. These things cost a couple hundred bucks. You put one on a monopod and walk around. Then you can take that footage and toss it into Niantic's web 360 pipeline. That's right. Just drag a couple-minute collect into this web UI, and it gets processed into a high-fidelity, city-block-scale Gaussian splat. Now, because obviously you wouldn't be able to capture something like this on your phone, the cloud can do the heavy lifting for you. Now, whether you use Niantic or any of the other tools we've covered in the past, speed is the story here, because unlike your phone or even a dedicated DSLR, the 360 camera sees everything. So, it drastically reduces the time it takes to capture a large, expansive space.
So, phones and 360 cameras: amazing at ground level. For everything bigger, we've got drones. And now we've got 360 cameras on these puppies, meaning you're not looking through a soda straw at one specific part of the scene with a limited-field-of-view camera. You can capture absolutely everything.
Antigravity has one, partnering up with Insta360. And of course, DJI has their own, called the Avata 360. So, you've got three altitudes: phone, 360 camera, and drone, all feeding the same capture layer.
>> And with these new 360 drones that are out there, you can have a 3-minute flight that covers tens of acres of reconstruction. It's just amazing. The sensors that are in people's hands today are far better than they were 10 years ago. A phone camera is amazing compared to what we were dealing with in the past. And, as I said, these 360 cameras are almost the perfect device for capturing reconstructible video. So, as consumers, being able to capture your own scenes, your own backyard, your own models is going to be very, very easy. But I see it as also applying to mapping the world again, as we see the devices getting out into the world. We've had self-driving cars for a while, but we're going to have rolling robots with cameras on them, and that data can be used to reconstruct the world. You're seeing it in early days with some of the delivery robots, but I think they're going to be ubiquitous sensors, and they won't be the highest end, but it doesn't matter anymore. We're now able to generate a really good visual model of reality just by combining large amounts of lesser sensors.
>> All right, cool tech. So, what can you do with it? Check out Zillow Sky Tour.
They're doing drone-captured, photoreal aerial flyarounds of actual homes, available on showcase listings since July 2025. So, regular people are basically scrubbing through 3D captures of houses every day without realizing that they're flying through a Gaussian splat. The US Coast Guard is using it as a training simulator. Niantic captured real helicopter landing zones.
Think ports, harbors, coastline, all the stuff that looks pretty decent in Google Earth from a high vantage point. But when you get close, as close as you need to be to land the damn thing, the detail falls apart. So they partnered with a company called Echelon Technologies that makes a flight simulator to drop these high-fidelity scans right into the mix.
Meaning Coast Guard pilots and rescue crews can now train in photorealistic copies of the exact environments they'll deploy to. And that is just the start.
>> So in the initial implementation, we're going to help them train on their own landing pads better by getting a much better visual representation using Gaussian splats of these landing pads.
But the real goal is the hard problems, where they're called out to the side of a mountain to save a person and there is in fact a possible landing area, but they have no training on it and they haven't even seen it yet. But if somebody on the ground can take their phone out, scan that area, create a 3D model of it, and send it to the pilot, the pilot can train on it or preview it on the way as they're approaching this safe area. Then they'll be in a much safer situation, because they'll know what it looks like, they'll know what angle to approach it at, and they'll also know how careful they have to be about the edges.
>> When it comes to companies like Snap, they care about consumer AR. So, they're putting the captured world inside AR glasses. You can create 3D lenses that are anchored to specific places: city squares that are hosting virtual concerts, park benches that open up into portals, multiple users with glasses and phones standing around seeing the same thing with centimeter-level accuracy.
All running Niantic's VPS technology. And the technology is really the same.
Whether you're annotating a rescue pickup spot or a place to drop your 3D content, you can annotate a 3D model of a location on your computer. And then when you go to that place, you see that annotation through your glasses or through your phone anchored precisely there.
All right, so we've seen what's possible with this tech. But here's the catch.
Until very recently, none of these tools could easily share their captured files.
Every product and pipeline was basically authoring its own quasi-format, interpreted oh so slightly differently. Until now. Ladies and gentlemen, we finally have a standardized 3D format for Gaussian splats. And perhaps the best part is that it's built into the most ubiquitous open 3D standard for the web, called glTF. So what is this amazing royalty-free open glTF format? It stands for graphics library transmission format. Think of it like the JPEG of 3D.
It's an easy way to move 3D data between different tools and across the web. And you've already used it without knowing.
Like every time you swivel a 3D model in your browser, let's say a couch on Wayfair or a sneaker on Amazon, that's a glTF model that you're loading up. But let's be honest, 3D models of small objects are kind of limiting. I want the whole damn world. That means people, places, and things. So in February of this year, the Khronos Group, the folks behind Vulkan, OpenGL, and of course glTF itself, released a new extension called KHR_gaussian_splatting, and it does exactly what it sounds like. It extends glTF 2.0's mesh primitive so that Gaussian splats can now live inside a glTF file. You've got the same container, the same standard, meaning the JPEG of 3D now supports 3D Gaussian splatting, which means it goes everywhere 3D on the web already does.
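To make "splats living inside a glTF mesh primitive" concrete, here is a minimal sketch of what such a document could look like as a Python dict. Only the extension name comes from the talk; `extensionsUsed`, `meshes`, `primitives`, `attributes`, `mode`, and `extensions` are ordinary glTF 2.0 fields. The underscore-prefixed attribute names, the accessor indices, and the `splat_primitives` helper are hypothetical placeholders, not the names the specification defines, and `mode: 0` (POINTS) is an assumption.

```python
import json

# Extension name as given in the talk; the rest of this sketch is illustrative.
EXT = "KHR_gaussian_splatting"

# A stripped-down glTF 2.0 document. Accessor indices are dummies, and the
# underscore-prefixed attribute names are placeholders, NOT spec names.
gltf = {
    "asset": {"version": "2.0"},
    "extensionsUsed": [EXT],
    "meshes": [
        {
            "primitives": [
                {
                    "mode": 0,  # 0 = POINTS in glTF 2.0 (assumed for splats)
                    "attributes": {
                        "POSITION": 0,
                        "COLOR_0": 1,
                        "_ROTATION": 2,  # placeholder name
                        "_SCALE": 3,     # placeholder name
                    },
                    "extensions": {EXT: {}},
                }
            ]
        },
        # An ordinary triangle mesh sitting in the same file.
        {"primitives": [{"mode": 4, "attributes": {"POSITION": 4}}]},
    ],
}


def splat_primitives(doc):
    """Return (mesh_index, primitive_index) pairs that declare the splat extension."""
    found = []
    for mesh_index, mesh in enumerate(doc.get("meshes", [])):
        for prim_index, prim in enumerate(mesh.get("primitives", [])):
            if EXT in prim.get("extensions", {}):
                found.append((mesh_index, prim_index))
    return found


if __name__ == "__main__":
    print(json.dumps(gltf["extensionsUsed"]))
    print(splat_primitives(gltf))
```

The point of the sketch is that splats and regular meshes share one container, so a tool that already walks `meshes` and `primitives` only has to look for one more extension key.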
>> We found that the fundamentals, and that's what we've enshrined in the existing Gaussian splat extension in glTF, that part of Gaussian splats is stable. It's the fact that you have an array of ellipsoids with, you know, color, transparency, orientation, and you feed them into the GPU to render. That part was ready for standardization. In fact, people were adopting it and shipping Gaussian splats so fast, we recognized that if we didn't do it now, we would be too late and we would miss our window of having the best standard to prevent fragmentation at the right time.
>> And critically, if a viewer doesn't support 3D Gaussian splats yet, the spec gracefully falls back to a point cloud representation. So nothing breaks. The asset still loads up. It just renders as a bunch of points instead of these ellipsoidal splats all beautifully coming together.
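The "array of ellipsoids" idea and the point-cloud fallback can be sketched in a few lines. This is a minimal illustration, not the extension's data layout: the `Splat` class and its field names are hypothetical, and the quaternion is assumed to be unit length in (w, x, y, z) order. The covariance formula, Sigma = R diag(scale^2) R^T, is the standard way 3D Gaussian splatting turns a rotation and per-axis scale into an ellipsoid.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


@dataclass
class Splat:
    """One ellipsoid; field names are illustrative, not taken from the spec."""
    position: Vec3
    color: Vec3      # RGB in [0, 1]
    opacity: float   # 0 = transparent, 1 = opaque
    rotation: Quat   # orientation of the ellipsoid
    scale: Vec3      # extent along the ellipsoid's local axes


def rotation_matrix(q: Quat) -> List[List[float]]:
    """Convert a unit quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(s: Splat) -> List[List[float]]:
    """Sigma = R * diag(scale^2) * R^T, the 3x3 shape of the Gaussian."""
    r = rotation_matrix(s.rotation)
    var = [c * c for c in s.scale]
    return [
        [sum(r[i][k] * var[k] * r[j][k] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def as_point(s: Splat) -> Tuple[Vec3, Vec3]:
    """Point-cloud fallback: keep position and color, drop the ellipsoid shape."""
    return s.position, s.color
```

A viewer that understands splats would use `covariance` to draw each ellipsoid; one that does not can still call something like `as_point` and show a colored point at the same location.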
>> Formats like glTF have traditionally been concerned with just the visual part of a 3D pipeline. But now 3D is beginning to be understood by AI with world models. glTF will step up and make sure that we're not just a container format for 3D visuals, but a container format for scene understanding and semantics. And that is going to play a key role both in the metaverse, which is kind of the humans tapping into this spatial computing realm, and in embodied AI: autonomous vehicles and robots. It's very interesting, they kind of need the same thing, right? Whether it's a person wearing smart glasses or an autonomous vehicle navigating a cityscape, they kind of need the same data. So there is a kind of big fusion about to happen, I think.
>> Now look, this may sound technical and it is, but the consequence is enormous.
With this beautifully standardized format, the splat you captured in Niantic's Scaniverse loads into Esri ArcGIS. The splat Esri publishes loads into a CesiumJS map. The splat in Cesium loads into Autodesk's pipeline or Blender. We're talking about the same file, every tool, no conversion required.
So, when you have Gaussian splats that can play nice with all the other data formats you know and love, you can do some cool things. For instance, here's me loading up a 3D Gaussian splat of the Palace of Fine Arts that's captured at drone height, blending it seamlessly into the 3D Tiles from Google. That way I have the spatial context of the surrounding area, but when the camera swivels down and looks at my splat in detail, you can see all the rich complexity that was missing in Google Earth. So, think of glTF as the container. The next question, of course, is how do you take these massive 3D scans, and some of these splats are multiple gigabytes, and put them on somebody's phone without melting the damn network? That's where SPZ comes in.
>> SPZ is the JPEG of 3D graphics. Through a lot of experimentation, we found what we think is about a 10x reduction in the file size of these original Gaussians. And this enables fast loading and high-quality visualization at a much lower cost.
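To get a feel for why a roughly 10x size reduction matters, here is a small sketch. It is not SPZ's actual encoding: `quantize_u8` is a generic lossy trick (storing a value in one byte instead of a 4-byte float) used only to show where savings of this kind can come from, and the 2 GB capture size and 20 Mbit/s link speed in the usage note are assumed numbers, not figures from the talk.

```python
import struct


def quantize_u8(values, lo, hi):
    """Lossy: map floats in [lo, hi] to one byte each."""
    span = hi - lo
    return bytes(round((min(max(v, lo), hi) - lo) / span * 255) for v in values)


def dequantize_u8(data, lo, hi):
    """Recover approximate floats from the one-byte codes."""
    span = hi - lo
    return [lo + b / 255 * span for b in data]


def download_seconds(size_bytes, megabits_per_second):
    """Transfer time for a file over a link of the given speed."""
    return size_bytes * 8 / (megabits_per_second * 1_000_000)


# Four color channel values stored two ways.
colors = [0.0, 0.25, 0.5, 1.0]
raw = struct.pack(f"<{len(colors)}f", *colors)  # float32: 4 bytes per value
packed = quantize_u8(colors, 0.0, 1.0)          # 1 byte per value

if __name__ == "__main__":
    print(len(raw), len(packed))
    print(download_seconds(2_000_000_000, 20))  # assumed 2 GB capture
    print(download_seconds(200_000_000, 20))    # the same capture, 10x smaller
```

Under those assumed numbers, the full capture takes 800 seconds to download and the 10x smaller one takes 80, which is the difference between an abandoned page and a usable one.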
>> That's how a splat goes from a multi-gigabyte capture to something 10x smaller that you can actually download over your 4G or 5G network. So now that we have the file sizes down, we're not done yet. The other thing we need to think about is a spatial index. Meaning, we need to break the world into a bunch of smaller tiles, so we only load what we're actually looking at, and at the level of detail you need for that zoom level. Cesium is basically the company that built the open standard for streaming massive geospatial data on the web. It's called 3D Tiles, and now they support 3D Gaussian splatting as well.
So when you fly into a city-scale splat, you don't need every individual splat at full resolution, only the ones that are close to the camera.
>> For those that don't know the origin story behind 3D Tiles, it's from Patrick Cozzi, who was CEO of Cesium and is now chief platform officer at Bentley, and he was one of the very early supporters and co-inventors of a lot of the glTF specifications that we have today. So he went to the Open Geospatial Consortium, which is the right place to do geospatial standards, but rather than reinventing the wheel, he said, well, let's just use glTF and put a geospatial wrapper, a layer, on top of glTF to do the things that the geospatial community cares about, which is handling terabyte-size data sets and figuring out which part you're looking at and which part you want to process right now.
>> So the system basically serves you low-resolution splats from far away, and as you push in, higher-detail tiers swap in seamlessly. Without LOD, or level of detail, you'd be downloading the entire data set just to look at one street corner. But with this more efficient tiling scheme, you can stream in hundreds of millions of splats, reconstructing massive areas all inside your browser tab. This is basically the same idea as how Google Earth works, except now it applies to massive 3D Gaussian splats. And if you think that this was just a research paper a few years ago, I mean, this is insane.
We've now got photorealistic, city-scale captures that run in your browser, and you can stream them on a freaking 4G network. Let's go.
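The "coarse from far away, detailed up close" behavior described above can be sketched as a tree traversal. This is a simplified illustration in the spirit of 3D Tiles, not Cesium's implementation: the `Tile` class, the tile names, and the 16-pixel threshold are assumptions, and a real traversal would also cull tiles outside the camera's view. The screen-space-error formula used here (geometric error times screen height, divided by twice the distance times the tangent of half the vertical field of view) is the usual refinement test for this kind of tileset.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tile:
    """A node in a level-of-detail tree (hypothetical, simplified)."""
    name: str
    center: Tuple[float, float, float]
    geometric_error: float  # metres of error if this tile is drawn as-is
    children: List["Tile"] = field(default_factory=list)


def screen_space_error(tile, camera, screen_height_px, fov_y_rad):
    """Approximate on-screen error, in pixels, of drawing this tile."""
    distance = max(math.dist(tile.center, camera), 1e-6)
    return tile.geometric_error * screen_height_px / (
        2 * distance * math.tan(fov_y_rad / 2)
    )


def select_tiles(tile, camera, max_sse_px=16.0, screen_height_px=1080,
                 fov_y_rad=math.radians(60)):
    """Return names of tiles to load: coarse when far, refined when close."""
    sse = screen_space_error(tile, camera, screen_height_px, fov_y_rad)
    if sse <= max_sse_px or not tile.children:
        return [tile.name]
    selected = []
    for child in tile.children:
        selected.extend(
            select_tiles(child, camera, max_sse_px, screen_height_px, fov_y_rad)
        )
    return selected


# A tiny two-level tileset: one coarse city tile with two detailed blocks.
city = Tile("city", (0.0, 0.0, 0.0), 8.0, [
    Tile("block-west", (-50.0, 0.0, 0.0), 0.0),
    Tile("block-east", (50.0, 0.0, 0.0), 0.0),
])

if __name__ == "__main__":
    print(select_tiles(city, (0.0, 0.0, 5000.0)))  # far away
    print(select_tiles(city, (0.0, 0.0, 100.0)))   # close in
```

From 5 km up, the single coarse tile is good enough; from 100 m, the traversal swaps in the two detailed blocks instead, so the viewer never has to fetch the whole data set at full resolution.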
>> And in fact, we haven't formally announced this, so you're getting it hot off the press. We are formulating a plan for glTF 2.1.
And the most important feature is going to be that we're integrating 3D Tiles capabilities into that new core glTF version. Now, we're still designing it.
We don't have a draft spec to circulate yet, but, you know, breaking news: that is what the glTF working group is working on right now.
>> And look, there are many different ways to create 3D Gaussian splats: open-source, commercial, and everything in between.
I'm just glad that the industry is standardizing formats and keeping them open. So whether you use Niantic or MultiSet, Postshot or LichtFeld Studio, we are not the only consumers of these maps. These things are for machines too.
>> The AI can begin to analyze these scenes and begin to do semantic labeling, so it'll recognize a chair and a door. And I think this is Brian's tagline from Niantic, which I love, so I want to acknowledge that, but making the world machine readable is a wonderful way to summarize how Gaussian splats tie into world models: an AI maintaining a model of a scene and being able to predict what's going to happen and what it looks like from any angle in the future. It's going to be the new frontier, because we've had AI doing text, we've had AI doing pictures, and now we can generate videos.
This is bringing AI into the domain of 3D, and world models are an intense area of interest right now.
One of the advantages of Gaussian splats is their visual accuracy. They are so good that they can be a replacement for the real world for robot training. One of the things we struggled with in the past is this idea of simulation to reality: robots train on a simulator, and then they're thrown out into reality and it doesn't work. And one of the reasons is that the simulator is not quite right. It doesn't visually look exactly right. Now, with Gaussian splats, we can actually do what's called real to sim. You can take a scan of the real world, this room or the factory floor, and create a model that is good enough that robots can train in a virtual system, and do that at much, much higher speed, but know that they're looking at the real world as far as their visual sensors are concerned.
>> After all, machines need a 3D model of the world so they can understand where they are within it and what's around them.
So when I worked on Google Maps, this is exactly why we built a visual positioning system: knowing exactly where you are in 3D space on the surface of the globe without GPS. But the crazy part is you can now do this on a pair of consumer Ray-Bans with open-source software.
You basically look at the world and match what you're seeing against a prior 3D model and pin yourself to the centimeter in physical space. Obviously, this is exceedingly useful for augmented reality glasses. But if you think about it, AR glasses are basically half robots. They've got all the sensors that a robot does, right? To perceive and map the world around you. Except in this case, it's the human that takes the action. Whereas with robots, they take it a step further and act themselves.
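The "match what you're seeing against a prior 3D model" step can be illustrated with a toy example. This is a deliberately simplified sketch, not how any production visual positioning system works: real systems estimate a full position and orientation from many feature correspondences, while this only looks up the best-matching entry in a small prior map. The three-number descriptors, the `localize` function, and the 0.8 similarity threshold are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pose = Tuple[float, float, float]  # x, y, z in metres within the prior map


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def localize(query: Sequence[float],
             prior_map: List[Tuple[Sequence[float], Pose]],
             min_score: float = 0.8) -> Optional[Pose]:
    """Return the pose of the best-matching map entry, or None if nothing is close."""
    best_pose, best_score = None, min_score
    for descriptor, pose in prior_map:
        score = cosine(query, descriptor)
        if score >= best_score:
            best_pose, best_score = pose, score
    return best_pose


# Toy prior map: each entry pairs a made-up view descriptor with where it was captured.
prior_map = [
    ([1.0, 0.0, 0.0], (0.0, 0.0, 0.0)),
    ([0.0, 1.0, 0.0], (10.0, 0.0, 0.0)),
    ([0.0, 0.0, 1.0], (10.0, 5.0, 0.0)),
]

if __name__ == "__main__":
    print(localize([0.1, 0.9, 0.1], prior_map))  # resembles the second entry
    print(localize([1.0, 1.0, 1.0], prior_map))  # ambiguous view, no confident match
```

The same lookup serves glasses, robots, and drones alike: whatever the device, it compares its current view with the stored map and takes the best match as its location, or reports nothing when no match is convincing.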
>> And robots have the same problem. They need to know exactly where they're standing and exactly where they're headed, so they can understand: do they need to work around that trash can that's there, or can they assume that it's not going to be there tomorrow and that they'll have a shorter path?
>> This is basically the OODA loop that we covered in our previous video. So check this out. This is Niantic Spatial's robot adventures demo. Basically, a two-minute scan with an Insta360 X5 creates this high-fidelity Gaussian splat of a trail. Then they can build a simulation, seeing how the robot would localize and navigate inside that splat.
And this to me is nuts, because you can take consumer-grade hardware, map a place, and now suddenly robots can operate within it. This is an extremely simplified version of what Waymo is doing with much more expensive hardware at city scale. And of course, the captured splat is the map. And by the way, the next generation of visual positioning models don't even need to have a perfect map of everything, because of course you can't capture absolutely everything in the world. It would just take too long. So these new algorithms can guess where they are even if you don't have a proper 3D scan of that exact location. I mean, this is basically what pros like Rainbolt do in the GeoGuessr community.
It is thus no surprise that Niantic and Coco Robotics have partnered to take this tech out of the lab and onto the sidewalk. This is perhaps the first consumer-facing deployment of delivery robots navigating sidewalks via visual positioning systems in places where GPS just doesn't work. And if you're curious about why GPS just doesn't work, you should go check out this prior video over here.
And of course, speaking of GPS not working, drones need this, too. Vantor is the aerial side of this. They create a 3D model of an entire area, and heck, the full globe, using satellite imagery. And then they can use a drone which just matches its camera view against this prior map to figure out exactly where it is. Meaning this drone can no longer be jammed. We're talking about the same capture layer, just a different form factor. Instead of glasses or robots, we're now talking about a drone.
>> I do think there will be one map to rule them all, and it will be built from the ground to the sky using this new technology with the Gaussian splats.
>> So look, this is what's clear to me talking to both Neil and Brian. We finally have the technology to both digitize reality and make it compelling for humans and machines alike. We also have the standards to distribute static scans of the world at scale. And very soon we'll have the ability to do it with dynamic captures, too. Because after all, the world is not stationary.
It is not a snapshot in time. The world is brimming with life. And the 3D representation of it should be as well.
This is why I'm so passionate about building this 4D god's eye view of the world, taking all this dynamic data that layers on top of this static substrate.
But when it comes to companies like Niantic, their goal is clear: to own that foundational layer.
>> I think we would like to become a core part of spatial computing for all of these tools. And I think what we need to do is figure out who needs them first and most critically, build for that audience, and then expand out as more and more adopters start to include it in their systems, which is much what happened with AWS.
>> The same captured world that humans can browse at leisure, that machines can train inside, and that our technology uses to know exactly where it is. And this is funny to me, because most people think of maps as a way to get you to your latte. But with augmented reality, robotics, and simulation tech, the map literally becomes the window through which you see the world. It is thus no surprise that all these players are trying to own that foundational layer that connects the world of bits and the world of atoms, especially now that the customers are humans and machines alike.
So, what do you think of these technologies? And look, while I'm all for big tech subsidizing closed maps like Apple and Google have done in the past, I really do hope that in this new kind of map for a new kind of era, there's an open-source alternative, too. All right, so if you want to go deeper into what a visual positioning system is and how it works, you'll want to check out this video over here. If you're curious about how to capture these 3D Gaussian splats, from the cheapest to the most expensive methods out there, check out this video over here. Bavo signing off, and I'll see y'all in the next one.
Cheers.