Ultra-low latency video streaming (targeting 4ms glass-to-glass latency) is essential for real-time machine control applications like remote robotics, drones, and cloud gaming, requiring synchronized multi-stream transmission over UDP with forward error correction to handle network packet loss while maintaining millisecond-level precision for feedback interaction.
How live streaming works: The challenges of low latency video streaming explained | Lex Fridman
I would love to zoom in and talk a little bit more about the distinction between downloading a file and watching it offline versus streaming. So the complexities, the challenges of streaming. Is there something we could say about what it takes to stream files? We've been talking about codecs, and I think a lot of that implies encoding and decoding without having to communicate over the network.
>> Sure.
>> Sure.
>> So can you elaborate on what's required to do the over-the-network stuff?
>> Yeah, but it is less complex than it seems compared to everything that we've talked about, especially because the most complex thing is not about streaming in terms of streaming services; it was what was done to actually broadcast through satellites.
Because in most of the modern broadcasting services you can pause and you can go on. But when you're sending live streaming, whether it's broadcast or live for streaming services, this is much more difficult because you need to encode in real time. When you go on a satellite you have a specific size of the link, right? You cannot have a burst of bandwidth even for a second, because you don't have the space for that in your total file. However, there are different types of challenges, which are interesting challenges, but I think they are less complex than the ones we saw in the late '90s and early 2000s around broadcasting and streaming through satellite.
>> They're different. They're control systems challenges, whereas some are more mathematical. I think there's a difference.
>> In the streaming world, what you have is what we call adaptive streaming, because the difficulty, and it's not really a video problem, it's mostly a CDN problem, is that you might have too many people watching the same thing at the same time, and it's a congestion of the network, right? So your player has difficulty downloading things fast enough to play them. So what happens is that locally the player is going to read a lower resolution of it. But
>> There are some very clever algorithms to do that, but most of it is quite basic, to be honest.
>> Even the buffering side is pretty basic.
>> Yeah, you start to download a segment, what we call a segment, and then you time it, right?
And if it takes more than 50% of the time to download a segment, you go down, right? And the difficulty is more about when you go up in bandwidth, in quality, but this is not very complex to do. When you encode, you're going to encode seven resolutions, right? And you're going to give the bitrate. The difficulty is to have your encoder give the same bitrate, but it's not as strict as it used to be.
So probably YouTube has to figure out the human psychology side of that, like how pissed off do you get when it's a very low bitrate, and how long should it wait before it increases the bitrate even though the connection is better? Because maybe the changes in the bitrate are what affects you psychologically.
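The download-time rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular player's algorithm: only the "more than 50% of the segment time means step down" threshold and the seven-rung ladder come from the conversation. The ladder bitrates, the `up_ratio` threshold, and the `up_after` streak length are hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical seven-rung bitrate ladder in kbit/s, lowest to highest.
LADDER_KBPS = [235, 375, 750, 1750, 3000, 4300, 5800]


@dataclass
class AbrState:
    level: int = 0        # index into LADDER_KBPS
    good_streak: int = 0  # consecutive comfortably fast downloads


def next_level(state, download_s, segment_s,
               down_ratio=0.5, up_ratio=0.25, up_after=3):
    """Pick the ladder rung for the next segment.

    download_s: how long the last segment took to download (seconds).
    segment_s:  how much playback time that segment holds (seconds).
    """
    ratio = download_s / segment_s
    if ratio > down_ratio:
        # Rule from the talk: too slow, step down one rung right away.
        state.level = max(0, state.level - 1)
        state.good_streak = 0
    elif ratio < up_ratio:
        # Assumed policy: only step up after several fast downloads in a row,
        # so quality does not flap up and down.
        state.good_streak += 1
        if state.good_streak >= up_after:
            state.level = min(len(LADDER_KBPS) - 1, state.level + 1)
            state.good_streak = 0
    else:
        state.good_streak = 0
    return state.level
```

Stepping down immediately but stepping up only after a streak is one simple way to handle the "when do you go up" question raised above; real players typically also look at buffer level and throughput estimates.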
>> I think actually the interesting one is the audio. You can kind of notice when they move from full-fat AAC to the compressed versions of AAC that use spectral band replication.
You can kind of hear it go a bit tinny, and that up and down is very jarring.
The video side is a lot smoother and it's less noticeable. It's really the audio. You can definitely feel it when it has moved you from one audio profile to another.
>> I don't know. We're surprisingly tolerant of skipping audio glitches. I'm surprised how tolerant people I know who are not video engineers are of watching sports at 30 fps, for example, whereas it should really be 60. The world is a lot more tolerant of that. But with audio, people are very... it's an immediate feedback mechanism.
>> If you hear a glitch, you realize it directly. Yeah, I get to fully realize that. I suppose one of the things I'm afraid of, as I listen to audio more and more, is that I get to notice every single tiny detail, and that you can obsess, when people in general are able to kind of blur their consumption. They can look past certain imperfections.
But then when you take, for example, a sport event that is probably going through satellite or somewhere else and goes to a central place for encoding, and then you need to encode it in all the resolutions in real time, you don't have time for QA, you need to push that to CDNs, you probably need to add DRM protection, and you need to have that on a ton of different devices, then yes, it is complex. And also you're in the web browser, or on devices very different from the ones you used for television, where you had a defined set-top box or cable box that you controlled end to end.
So it's a challenge, but less so, I think, on the networking part. As long as you agree to have 10 to 20 seconds of latency, I don't think this is very difficult.
Speaking of networking and latency: your new effort, as we mentioned, is Kyber, which is aimed at ultra-low latency. As you say, every millisecond counts, and you're applying that to remote-controlled machines like robots, drones, computers. Can you tell me about it?
>> Sure. If you start from where we used to be, right? You used to use FFmpeg to encode files, right? And then we used FFmpeg and VLC to encode in streaming services, right? And then you need to go lower and lower. And the question was, how far can we go?
>> And this question is very important because there are many use cases where you need to be fast, and it's when you have feedback interaction, right? You're not just listening to something, you're actually controlling it, right? And that's the biggest difference compared to what we've done so far: I need video to have feedback on something that is happening live, whether it's a drone flying, whether it's controlling a humanoid robot from a distance, whether it's controlling a rover, whether it's playing a video game in the cloud. Because this is what I did in a previous job, right? I was CTO of a cloud gaming startup. And this is a very interesting topic because you push the network to the limits. You need to care not about the quality, like we've done on video and talked about with x264.
You care about latency, because a millisecond is meaningful when you're controlling a car, right? Well, you've seen, you've used Waymos, right? When Waymos don't work, and that happens, even if it's 1% of the time, there is someone that is basically remote controlling that. And this is exactly the stuff that we're building. It's really an SDK, a platform, to do end-to-end control of machines.
So this comes up quite a lot in a lot of different contexts in robotics. So obviously teleoperation, teleop, is becoming more and more important, including for training robots via machine learning.
>> Yes. And what we do that is a bit different from everyone else is that we take only one socket, one connection, which is a QUIC protocol based on UDP. Which is interesting because it's done for low latency. It doesn't have two problems, what we call the TCP head-of-line problem and the HTTP head-of-line problem.
It's safer by default, but on the same wire we send multiple streams, like multiple tracks. We send audio, we send video, but we also send the commands, right? Mouse, keyboard, gamepad and so on. And we do that while maintaining coherence, right? Synchronization.
Because what people don't realize is that all the clocks actually drift. And when you're controlling a robot, a robot is going to have like two cameras, five cameras, 10 cameras, a ton of sensors, GPS, and so on. And if you want to train your robotic AI model correctly, you need to have all of those in sync and coherent. And what we've done, and it's all the stuff that we learned on VLC, in broadcast, in real time, and in MPEG-TS, that Kieran knows well, is that we account for clock drift. And so when I record a Kyber stream from a robot, I am sure that it's going to be predictable in the way you play it back. And so when you do recording and training of your AI model, you need to be sure that every time you retrain based on the data, the data is going to stay coherent, and clocks actually drift.
The existing solutions work with one camera. Once you go to five or seven, it's more complex.
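To make the clock-drift point concrete, here is a minimal Python sketch of one common way to map a drifting device clock onto a reference clock: fit a straight line (skew and offset) through paired timestamps of the same events. This is an illustration only, not Kyber's mechanism, which is not detailed in the conversation. The function names and the sample numbers are hypothetical, and the sketch assumes the paired timestamps are already free of network jitter.

```python
def fit_clock(device_ts, reference_ts):
    """Least-squares fit of reference = skew * device + offset.

    device_ts:    timestamps read from the drifting clock (seconds).
    reference_ts: timestamps of the same events on the reference clock.
    """
    n = len(device_ts)
    if n < 2 or n != len(reference_ts):
        raise ValueError("need at least two paired timestamps")
    mean_d = sum(device_ts) / n
    mean_r = sum(reference_ts) / n
    var_d = sum((d - mean_d) ** 2 for d in device_ts)
    if var_d == 0:
        raise ValueError("device timestamps must not all be equal")
    cov = sum((d - mean_d) * (r - mean_r)
              for d, r in zip(device_ts, reference_ts))
    skew = cov / var_d
    offset = mean_r - skew * mean_d
    return skew, offset


def to_reference(device_t, skew, offset):
    """Re-timestamp a device reading onto the reference timeline."""
    return skew * device_t + offset


# Hypothetical camera clock: starts 2 s off and runs 50 ppm fast.
camera = [0.0, 10.0, 20.0, 30.0]
reference = [2.0 + 0.99995 * t for t in camera]
skew, offset = fit_clock(camera, reference)
```

With one such fit per camera or sensor, every sample can be placed on a single shared timeline before recording, which is the property the speakers say matters once there are five or seven streams instead of one.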
>> So you want to make sure that the visual snapshot perfectly matches the time it actually happened.
>> Exactly. And also if you're going to control, right, I do something on a robot, I need to be sure that it is actually happening at that precise time, right? And so we have on the server, which would be a robot, a kind of re-timestamping mechanism accounting for clock drift for that, right? So that's one of the use cases of Kyber, to control robots. I also see remote drones, whether it's defense or non-defense, remote cars, remote submarines. There are many places in industry, or remote surgery, where the expert cannot go everywhere the machine is, because either it's dangerous or it's too costly. Right? So you allow people to have machines next to you, right? The goal of Kyber is to make distance disappear, because it's either projection of skills or projection of power, right?
So imagine, you've seen the Meta Ray-Ban and everyone else, right? You need to stream there, right? Because you're not going to run anything over there, right? So you need GPU power, whether it's in a cloud or on a phone, to stream that. And so all of these use cases need to be not about extremely low latency, but real-time latency for video. And so that means we're toying with the encoders so that the encoders encode a frame in a few milliseconds. And Kieran with his company also goes down to those kinds of levels of latency, because you need to optimize the local latency to the max, right? Because it's the decoder, the encoder, and so on, and this time is going to be added to your networking time.
And it's not just about low latency, it's also about reliability. We do clever things like forward error correction, right? So forward error correction is you over-transmit a bit of data, right?
A few percent. And while you over-transmit, you're allowed to lose some packets, because all of that is very difficult over an internet network where you're going to do things very far away. And if you check that all packets are delivered, you add a ton of latency. If you don't want latency, what we do is over-transmit some data so that you can reconstruct on the client side when there are things that are broken, right?
A few weeks ago we were doing a demo in Las Vegas for CES, where we had a rover that is fully 3D printed. It's very simple. It's a car, right? It's a small car with a telescopic arm, and it was actually controlled from France, right? And the video was from a webcam on a very small server, right? A small PCB was basically running and sending that to someone that is on the other side of the planet. And so there are so many use cases. You can also think about having AIs that are going to control many drones and so on. And technically we need to be amazing in video. We need to be amazing at networking. We need to care about every millisecond in networking, in encoding time, in decoding time, and also you need to integrate at a very low level.
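The forward error correction idea described above, sending a little extra data so the receiver can rebuild a lost packet without asking for a retransmission, can be shown with the simplest possible scheme: one XOR parity packet per group. This is a sketch under assumptions, not the scheme Kyber uses (the conversation does not name one). The function names are hypothetical, and the example assumes equal-length packets and at most one loss per group.

```python
def xor_parity(packets):
    """Return one parity packet: the byte-wise XOR of equal-length packets."""
    if not packets:
        raise ValueError("need at least one packet")
    size = len(packets[0])
    if any(len(p) != size for p in packets):
        raise ValueError("this sketch assumes equal-length packets")
    parity = bytearray(size)
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the single missing packet (marked None) from the rest plus parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    survivors = [p for p in received if p is not None]
    # XOR of all surviving packets and the parity equals the lost packet.
    rebuilt = xor_parity(survivors + [parity])
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired


# Sender side: four data packets plus one parity packet.
group = [b"pkt0", b"pkt1", b"pkt2", b"pkt3"]
parity = xor_parity(group)

# Receiver side: packet 1 never arrived, but no retransmission is needed.
received = [group[0], None, group[2], group[3]]
repaired = recover_missing(received, parity)
```

The overhead is one extra packet per group, so a group of four costs 25% and a group of twenty costs 5%, closer to the "few percent" mentioned above. Schemes that survive more than one loss per group, such as Reed-Solomon codes, follow the same principle with more elaborate math.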
>> So sync everything together well. But what kind of latency can you get to? When you say milliseconds, what's the goal?
>> So my goal is 4 milliseconds glass-to-glass latency.
>> What does glass to glass mean?
>> So it's easy, right? You have a computer which is running a program, right?
Probably a video game. And this one is actually running, right? It could be a robot, for example, right?
>> And you have the replica that is done through the network
>> and you want
>> if you take a 1,000 Hz camera, you can take a picture, and you want that to be at 4 milliseconds. 4 milliseconds means 240 Hz, right?
>> Yes.
>> So far we achieve 7 milliseconds from Windows to Windows or Windows to Mac. And if you look at the timing, there is around 3.5 milliseconds inside the Nvidia hardware encoder and around 2 milliseconds in the Intel decoder, right? So the encoder plus the decoder is already 6 milliseconds, right? So in order to go down, we need either to have some other type of codec or some better encoders that are faster. But 4 milliseconds would be the goal.
That's pretty nuts. I love it though. I don't think anyone's ever achieved that, right?
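The numbers quoted above can be laid out as a small latency budget. The Python sketch below only does arithmetic on the figures from the conversation (7 ms measured, about 3.5 ms in the encoder, about 2 ms in the decoder, 4 ms target). The function names are hypothetical, and it assumes, purely for illustration, that everything outside the codec stays at its current cost.

```python
def frame_period_ms(rate_hz):
    """Duration of one frame or refresh interval at the given rate."""
    return 1000.0 / rate_hz


def latency_budget(total_ms, encoder_ms, decoder_ms, target_ms):
    """Split a measured glass-to-glass latency into codec time and the rest."""
    codec_ms = encoder_ms + decoder_ms
    other_ms = total_ms - codec_ms  # capture, network, display, and so on
    return {
        "codec_ms": codec_ms,
        "other_ms": other_ms,
        # Codec time that would be allowed if other_ms stayed unchanged.
        "codec_budget_at_target_ms": target_ms - other_ms,
    }


budget = latency_budget(total_ms=7.0, encoder_ms=3.5, decoder_ms=2.0,
                        target_ms=4.0)
for name, value in budget.items():
    print(f"{name}: {value:.1f} ms")
print(f"240 Hz frame period: {frame_period_ms(240):.2f} ms")
```

Taken at face value, the encoder and decoder account for 5.5 ms of the 7 ms (rounded to 6 in the conversation), leaving about 1.5 ms for everything else, so reaching 4 ms would mean cutting codec time to roughly 2.5 ms. One 240 Hz refresh interval is about 4.17 ms, which is why 4 ms is spoken of as roughly a single 240 Hz frame.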
>> That's fast.
>> You can achieve that with custom hardware, with SDI, with professional hardware. But I want that to work over the internet. I want that to work with any robot where you're going to have a small Jetson Nano in it, or an N150, right? I want that because there are going to be millions of robots, or drones, or just rolling robots or flying robots or swimming robots, right? It's just a machine that you control, and in order
>> either you need to teleoperate them or, when everything is fully autonomous, you need to teleobserve them, right? You need to check what's happening.
>> Yeah. And in my view, in the future all those remote cars will be teleobserved by an AI model, which is just going to say, well, everything is going well, and when it's not, say, hey, there is a problem, and then you have an operator, right? And this is going to be about safety, right? When you have your humanoid taking care of your grandma or my grandma, I want to be sure that everything goes well and I'm not in one of those horrible scenarios where the robot is dangerous. Or when I'm driving, I want the car to stop when it should stop, and if needed someone takes care of that, right? And so there are so many scenarios about real time, and so the goal of Kyber is to make distance disappear for real-time control of machines.
>> It's incredible, and some of the same technology, some of the same ideas we've been talking about, are connected to what you're doing.
>> And for me it's amazingly challenging, right? Because I would say that on video I'm doing okay, but on networking I have so much more to learn, right? It's about congestion protocols, bitrate adaptation in real time. But it's quite fun. And so I created this project, and we have fundraised in the US of course, but it's open source, right? This is important, right? We've not said that, right? But everything in Kyber is open source.
>> So how do you make money?
>> It's a dual license, commercial and AGPL, right? If you remember what you said about licenses, basically if you want to use Kyber in your product, you must have your full product open source. If you want to use this amazing technology but not be open source, you pay for the commercial license, right? So the hobbyists and the very small guys who want to do that, they can use the technology and build something that is open source and cool. That's awesome. And if you're a large company, you're going to have the support, all the IP, the right of modification and so on. So yeah, it's really cool. And also I'm building robots and I love that, right?
The rover we have is 3D printed. We are finishing a demo where it's an actual wing, right? A type of drone wing that is also fully 3D printed. We are trying to do a sailboat that is 3D printed. And we'll work on some humanoids. Of course, they're not going to be very good robots, right?
It's not our job, but we're here for everyone to make robots. Cool.
Ah, you're talking to the right guy. I love robots. There's a bunch of them upstairs. And teleop is going to be really, really important, especially as the number of robots grows across the world. So, 100%