Dr. Li offers a profound look at how reinforcement learning transcends the limitations of traditional control by turning temporal history into hardware robustness. This transition from "cerebellum" to "brain" is a pivotal step toward achieving truly resilient and adaptive embodied AI.
VINMOTION INTELLIGENCE PIONEERS SEMINAR #1 | From Robot Cerebellum to Robot Brain - Dr. Zhongyu Li
Good afternoon, everyone. We will get started very soon. Welcome to those joining us on Zoom, and hello to everyone here in the room. During the Q&A session, if you have any questions, simply raise your hand, come to the front of the screen, and ask the speaker.
Can you hear us clearly via our microphone?
>> I can hear you. Yes.
>> Yes. Thank you very much. So, good afternoon from Vietnam. We are joined here by all our engineers in Vietnam, and joining you on Zoom are our Dr. Kun and also Yuch Chen, our CTO. So first of all, big greetings to you, and welcome to the first episode of our seminar. Thank you so much for joining.
>> Thanks. Thanks.
>> Yeah, nice to meet you all. Nice to meet you.
>> Okay. How are you?
>> Pretty good. Pretty good. How are you?
>> Good. Okay. Good. So I think we still have time, but
>> before moving to the main part, let me see if I can check something.
>> Okay, so just to go over a bit of the format here. I want to make it comfortable. Okay, no pressure.
>> Mhm. [laughter] So for the first part of the seminar, I will try to make it more interactive between you and me. So during the presentation, please allow me to interrupt with questions, if you don't mind.
>> Yeah, sure.
>> I don't want you to spend 40 or 50 minutes just presenting. So sometimes I will interrupt you to ask questions, and I think that could be a good time for people here at VinMotion, and also those joining online, to understand the material a bit better.
>> And then at the end I will give time for people in the conference room to ask questions. Okay?
>> Okay.
>> So that's the first part. In the second part, I also want to spend a bit of time having a casual chat with you. Okay.
>> Some people here at VinMotion may want to hear your advice, to see how you became so successful in the field. Give us advice for people in this exciting but also very noisy time for robotics. Okay.
>> Yeah.
>> Okay. So let's just wait a couple more minutes for people to settle down, and then we can get started.
>> Great. Great.
>> Yeah, sounds great.
>> And just to let you know, this is our very first seminar. Okay. So you should [laughter]
>> Yeah, I'm quite excited. It's my honor to be the first [clears throat] speaker here. Okay. A lot of pressure. [laughter]
>> Yeah. Yeah.
>> Okay. So let me know when people settle down, so that we can get going. Maybe give it three more minutes and we can get going. Okay.
>> I will give you...
>> Let me...
>> Okay. Are you ready? So hello, everyone, welcome to the first session of the VinMotion Intelligence Pioneers Seminar.
>> Hi Chen, I think there is a bit of echo. I'm not sure if the microphone is connected.
>> I think it's the mic. It's the mic.
>> Yeah, it's the microphone. It's not connected directly to Zoom, so the voice that I hear from Zoom is not really clear.
Oh, can you hear me clearly?
>> Yeah, it is slightly better now.
>> Okay. So, hello everyone. Welcome to the first session of the VinMotion Intelligence Pioneers Seminar, a global initiative by VinMotion designed to bring together the world's leading researchers and our broader community to exchange ideas, share breakthroughs, and shape the future of robotics and AI. This seminar series stands at the heart of our mission, pushing the next phase of motion intelligence. Thank you so much for joining us today.
>> Okay. Hi Tin. So for the next part I will introduce Professor Li.
Okay. It's fine. At the same time you can still fix the microphone. Okay.
Okay. Thank you all for joining. It's my great pleasure to introduce Professor Zhongyu Li as the speaker for our very first VinMotion seminar.
Professor Li is an assistant professor at CUHK, the Chinese University of Hong Kong, and he is also currently a co-director of the Hong Kong Embodied AI Lab, which is a very exciting new initiative for robotics and physical AI in Hong Kong. My prediction is that this will soon become a top robotics research center in Asia and also in the world. Before joining CUHK, he got his PhD from UC Berkeley in 2025, interestingly under the supervision of Professor Koushil Sreenath, who, as some of you may know, is also my PhD advisor. So while many of you have read a lot of well-known papers from Professor Koushil in recent years, Zhongyu is actually the one who really brought RL to the table for the lab.
And to be honest, in my opinion, he is one who made a significant contribution to bringing reinforcement learning to the field of legged robots and humanoid robots that we are seeing today.
So without further ado, it's my pleasure to welcome Professor Zhongyu Li to our seminar. Thank you again for your time and for joining us. The floor is yours.
>> Thanks. Thanks for the invite, and thanks to everyone at VinMotion. I'm very happy to be here. It's my great honor to give a talk as the first speaker of this series. As you may know, we have had a long-standing collaboration with Quan's lab, especially with EU, during my PhD.
Yes. We kept exchanging research ideas and discussing how to do robotics with legged robots. We are very happy that we have recently started our careers: he is now a tech lead at VinMotion, and I recently started my lab in Hong Kong. So my title here is "From Robot Cerebellum to Robot Brain". The self-introduction, I think, is already done, so I will just skip it. But a very interesting thing is that my PhD advisor is Professor Koushil Sreenath, who was also the advisor of Quan. Koushil always told us Quan's story from when he was a PhD student. He told us to be very hardworking and passionate in research, and he always used Quan as an example in the lab. This is the thing I really wanted to share. I'm very excited to reconnect with Quan here.
During my PhD, we did quite a lot of interesting work at Berkeley. We were among the first to introduce reinforcement learning to control a real-world bipedal robot. In July 2023, we were the first to enable a bipedal robot to finish the 400-meter dash in the real world. That was a very big thing at that time: the robot could finish one entire lap in, I think, two and a half minutes. Now in 2026, just this year, I think there was a half marathon held in Beijing, and you can see the robots finish the half marathon within an hour, which is a very big leap from that time. But the algorithms currently used are not much different from what we developed back then. We also tried to explore possible applications of legged robots at an early stage.
Back in 2021, we were the first to enable a quadrupedal robot to act as a robotic guide dog, guiding a blindfolded person to navigate in constrained environments. At that time it had a very big impact, because it was the first time we showed that legged robots are not only a testbed for control problems but can also be very helpful for the entire community. Not only the technical community, but also communities such as the blind and visually impaired community, and even the animal protection community. They are also very interested in this type of work because they really want to free animals from working for humans.
A robot dog looks like it can be a very nice replacement for this. And the interesting thing is that this year, at the same time as the half marathon in Beijing, some companies also showed demos of a quadrupedal robot guiding a blindfolded person to navigate. So now they are trying to commercialize this idea of a robotic guide dog, which I'm also very excited about. Beyond simple locomotion, we also had some fun attempts on legged robots, such as enabling the robot to play soccer: how to do soccer shooting, how to do goalkeeping. At that time we didn't really have humanoid robots, so we only had quadrupedal robots, but we made lots of attempts at robot soccer using legged robots.
As you can see, these days there are so many sports attempts using humanoids: ping pong, tennis, badminton, soccer, basketball. Every sport you can think of, people want to use a humanoid to play it. But the algorithms we see, as far as I can tell, are not much different from what we developed on the quadrupedal robot at that time, which was around 2022. So we had very early research on how to enable robots to perform dynamic locomotion control, and how to make these robots useful and fun.
Okay. So first of all, back to the research part. Why do we really need legged robots? I don't think I need to explain this to you, because you are working on humanoids, and as far as I can tell you are the best at humanoids in Vietnam, and even in the greater Asia region. So I think you share the belief that humanoids will one day be around us. If you want robots to be around humans, walking nearby, the humanoid will be the best embodiment because it resembles humans. All the environments are designed for humans: indoor environments, and even outdoor environments, are basically constructed for the human embodiment. So a robot with a human shape, which is a humanoid, can be the best embodiment to walk around us and coexist with humans. That is the application side.
On the technical side, I think the value of the humanoid is that its morphology closely resembles the human's. If we really want to use the whole-body motion data generated by humans, the humanoid has the minimum gap to utilize that data. Why is this important? Because we humans are generating tons of data every day, and this data comes from our bodies.
We have two legs, two arms, and a head, and we generate this data by using our human embodiment to interact with the world and with other people. This data can be recorded, either as YouTube videos on the internet, or as mocap data, or as egocentric data: if you wear glasses, you can record everything. So humans generate tons of data based on whole-body motion, and if you want to leverage this data, the humanoid has the minimum gap. So on the technical side, we want to explore the possibility of the humanoid because we want to unlock the potential brought by human whole-body motion data.
Okay. So my vision is to create the next generation of dynamic robots in the AI era. My past efforts span three axes: agility, safety, and intelligence. Agility is about how to enable the robot to perform agile and dynamic bipedal locomotion skills in the real world. How can we make it run faster? How can we make it jump higher and further? Once the robot is able to move around, we also think about how to enable safety on this robot: how to make sure the robot can move around while avoiding obstacles nearby.
We also want some safety guarantees. Once the robot is able to safely navigate in its environment, we then think about how to advance the intelligence of this robot: how to enable it to interact with objects as well as other agents.
So in the future, we want to keep expanding along these three axes. We can summarize my previous work as building a robot cerebellum, and I believe this is more or less ready. So now we can step further to build the robot brain, which has a more advanced, higher level of intelligence. We want advanced athletic intelligence so that the robot can better interact with the scene, and we want the robot to better understand the priority of safety when it coexists with both humans and the environment. The human must have a higher safety priority than the robot, and sometimes the robot may need to be aware that it has to sacrifice its own safety in order to ensure the safety of the human, who has the higher priority. That is also a very interesting direction.
However, solving this is very challenging, because legged locomotion is a very challenging problem. This is known as Moravec's paradox, a well-known paradox which basically means that the easiest thing for a computer could be a very hard thing for a human, and vice versa. A computer can easily win a chess game, but it is very hard for it to reach the same level of mobility and perception as a one-year-old child.
So why is this problem very hard? We can use Cassie as an example.
Cassie is a bipedal robot, which was the main testbed we used in the previous days. It is actually not just one of the first but the first commercially available bipedal robot in the world. At that time only about seven labs in the US had a Cassie. Fortunately, our lab also had access to one. This bipedal robot has 20 degrees of freedom in total. In each leg there are five actuated motors, denoted by the red arrows, and two passive joints, denoted by the blue arrows. The passive joints are connected by leaf springs.
There are no actuators at those joints. All the research we developed on Cassie actually shaped what we see in humanoids these days; basically, the research on Cassie shaped the recent directions of bipedal and humanoid robots. There are some very ahead-of-their-time designs in Cassie, such as the leaf springs, the passive joints, and the actuator design (I think it is cycloidal gears that the robot used). Some proved to be very successful designs, and some proved, as time went on, to be unnecessary. But all the research developed on Cassie had a very large influence on the humanoid community.
Anyway, it has a floating base and 20 degrees of freedom in total, so it is a nonlinear system for sure. When the robot starts to walk, it becomes a hybrid system. It is underactuated, because it has a floating base: the base is floating in the air, and the robot really needs to make contact with the environment in order to move. It also has some passive joints. But we can use the Lagrangian to write down its dynamics, which gives this line of equations that we call the full-order dynamics model of Cassie. This is a general equation for multi-link dynamics; it is basically a general form of F = ma.
It looks very simple, right? Just one line of equations to describe the model. But if you write down all the details of these equations, you obtain a huge file. This part here is only a very small part of the mass matrix in these equations. In fact, we need 3.6 megabytes of C code just to store these equations: not to evaluate them, just to store them.
Such high dimensionality and high nonlinearity make tackling problems on Cassie very hard and very challenging.
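For reference, the "one line" of full-order dynamics described above is commonly written in the standard floating-base form below. The slide itself is not part of the transcript, so the symbols are the conventional ones and should be read as an assumption rather than the speaker's exact notation:

```latex
M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q) = B\,u + J(q)^{\top}\lambda
```

Here $q \in \mathbb{R}^{20}$ collects the floating-base and joint coordinates, $M(q)$ is the mass matrix, $C(q,\dot{q})\,\dot{q}$ gathers the Coriolis and centrifugal terms, $G(q)$ is gravity, $B$ maps the motor torques $u$ (ten motors, five per leg) to generalized forces, and $J(q)^{\top}\lambda$ accounts for contact and spring forces. The equation is compact only symbolically; expanding $M(q)$ entry by entry is what produces the megabytes of code mentioned in the talk.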
Okay. So how did people previously tackle these problems? We always use Boston Dynamics as an example, because they are the best in the humanoid field. They use model-based control. They use the models we just showed to predict future trajectories and do optimization based on those predictions, with constraints such as self-collision constraints, support region constraints, and so on. So basically the model-based framework takes the physical prior of the dynamics equations, puts it into an optimization problem with a cost and constraints, solves for the so-called optimal plan online, and then lets the robot execute the optimized plan.
And it actually works well: Boston Dynamics showed that with this method they can make the robot dance. They can make the hydraulically driven humanoid dance and do all this fancy stuff. But the challenge is that they really need a very good model, because the dynamics model we derived has lots of parameters we are not 100% sure about. For example, the link mass affects the mass matrix, but we don't really know the exact mass of each link, and a humanoid can easily go to 40 degrees of freedom. So there are many uncertainties here, further coupled with joint friction ratios, joint damping ratios, and the center-of-mass offset, the position of the center of mass of each link. We don't really know these numbers. All the equations we derive actually come from the design file, the CAD file, so we don't know the exact numbers. And these parameter errors all accumulate, which means the model we use for prediction may deviate from the robot we are actually going to control. So how do we tackle this? There have been many ways.
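A toy illustration of the loop described above (predict with a model, optimize a cost under constraints, execute, replan) and of what happens when the model mass is wrong. This is a minimal sketch on a 1-D point mass with a brute-force search over a handful of forces; it is not Boston Dynamics' controller or the speaker's, and every constant and function name here is a hypothetical choice made for illustration.

```python
from itertools import product

DT = 0.1                    # control period in seconds (illustrative)
HORIZON = 4                 # look-ahead steps in each plan
FORCES = (-2.0, 0.0, 2.0)   # candidate forces in newtons
V_MAX = 1.5                 # velocity limit enforced inside the optimizer


def step(pos, vel, force, mass):
    """Advance a 1-D point mass by one explicit Euler step."""
    return pos + vel * DT, vel + (force / mass) * DT


def plan(pos, vel, target, model_mass):
    """Return the first force of the cheapest feasible predicted sequence."""
    best_cost, best_first = float("inf"), 0.0
    for seq in product(FORCES, repeat=HORIZON):
        p, v, cost, feasible = pos, vel, 0.0, True
        for f in seq:
            p, v = step(p, v, f, model_mass)      # prediction uses the MODEL mass
            if abs(v) > V_MAX:                    # constraint violated: discard plan
                feasible = False
                break
            cost += (p - target) ** 2 + 0.1 * v ** 2 + 0.01 * f ** 2
        if feasible and cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run(model_mass, true_mass, steps=60, target=1.0):
    """Receding-horizon loop: replan every step, apply only the first force."""
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        force = plan(pos, vel, target, model_mass)
        pos, vel = step(pos, vel, force, true_mass)   # the PLANT uses the true mass
    return pos, vel
```

Comparing `run(model_mass=1.0, true_mass=1.0)` with `run(model_mass=1.0, true_mass=3.0)` shows the effect discussed in the talk: the planner keeps predicting with a mass the plant does not have, so the executed motion drifts from the predicted one. On a real robot the same mismatch is spread over many masses, friction terms, and center-of-mass offsets.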
Boston Dynamics uses a sequence of optimal controllers, from simple models to complex models, from low-frequency planning optimization to high-frequency optimization. But in the end they need one essential step, which is to tune the parameters on the hardware. The engineers deploy their optimal controllers on the hardware and observe the performance, and then they use their own intelligence, their knowledge, training, and expertise, to tune the parameters of all the controllers and all the optimizations they use. And it can work well, Boston Dynamics being a very good example, but they need something like 100 of the best engineers in controls and robotics to make this happen. But what will happen if our model is so bad that we cannot do this?
This is a recent example from our lab, from a paper just last year.
This is the Berkeley Humanoid, a very small humanoid bipedal robot developed from scratch, in house, in the lab. The student leading this project, Qiayuan, was a new PhD student in the lab at that time. He developed the entire robot from scratch, starting from the motor actuators. He was very excited about model-based control, and once he finished the robot he started to build his own MPC, or optimal controller, and it worked perfectly well in the simulator.
But once he deployed this controller on the hardware, the entire leg was gone, because one of the motors just blew up.
The reason is that the model was so far off that the optimized torque went unbounded, directly hit the mechanical limits of the robot, and the entire leg was gone. So the student needed to go back, repair the robot, and re-tweak the MPC he had developed. Hopefully, after a lot of repairing, the new controller and the new model would not blow up the robot, giving him a chance to tune the parameters on hardware, which as you can tell is quite tedious. This is actually one of the reasons why the entire field moved so slowly in the past decades. People needed to iterate this process every time: build their own models, do very nice engineering and optimization on the MPC, deploy on the hardware, and if the hardware broke, repair the robot, which takes several months, and then redo the iteration. This kind of iteration is super, super slow. That is one of the reasons why, in the past decades, the entire bipedal community was limited to only a few labs around the world.
Okay. I think this is one of the reasons. After this, I told Qiayuan, the student building this robot, that our lab is very good at reinforcement learning, so why don't we directly try reinforcement learning for this? He went back, trained the policy in the simulator for two days, and then directly deployed it on the hardware. It worked perfectly fine the first time he deployed the policy on hardware, after just two days of training. That really changed Qiayuan's mind about the trade-off between MPC, or model-based, and RL-based controllers. And the question is: what has changed to make developing a locomotion controller for a humanoid this scalable these days? Let's go back and zoom out on the timeline.
Around 2020 and 2021, we were one of the first, among other colleagues (there was a lot of notable work in the community), to introduce reinforcement learning to legged locomotion control in the real world. At that time our work, shown here as an ICRA paper, showed that RL can not only control the bipedal robot but also bring very significant robustness in the real world. In this video you can see that the robot almost falls over; there is a leash, the robot gets pulled up, and it quickly recovers to a stable gait. At that time, that was a huge surprise to the community, for two reasons. First, this is definitely beyond the capacity of MPC. If we do optimal control with model-based controllers, from nonlinear control we know about the region of attraction, and this is definitely outside the region of attraction of any kind of MPC controller.
The second reason is that previously people believed RL could only control a humanoid robot in simulation, with a quite funny gait. So this was a surprise to both communities. It attracted attention from everyone in this field, and many people came into the field to try their own algorithms and test on their own robots, and we observed an explosive development of RL-based control for legged robots. In 2024, we wrote a paper to summarize the development of our algorithms, our recommendations on how to use RL to train your robots, and the lessons we learned, and to show all kinds of novel capabilities of bipedal robots in the real world. After that, people started to move toward more challenging problems, which are the humanoids.
So in the past five years, RL has actually become another paradigm, alongside model-based control, in legged locomotion. Let's go through this journey to see what happened and what the details behind this shift are.
So this is an overview of my previous work. There were several parts to my PhD. I actually studied model-based optimal control for the first two years. We can use the full-order model we just introduced, but it is too complex to optimize online. So we leverage offline optimization, and then online we have a tracking controller that enables the robot to track the optimized trajectories. In this way the robot can do quite expressive and complex motions. We can also come up with a reduced-order model that simplifies the robot dynamics and just captures the robot's walking dynamics. Since this model is simplified, it can be embedded into an MPC, an optimal controller, in real time. Then we can enable the robot to perform autonomous navigation in the real world, avoiding obstacles nearby and crouching down to travel through highly constrained environments while maintaining walking stability, with everything happening in the real world and in real time. And then I shifted my focus a bit.
I looked at the other end of the spectrum, which is to give up the models and purely enable the robot to learn locomotion skills through trial and error. Now we really need to train the robot in simulation, and the bottleneck becomes how to transfer the policy trained in simulation to the real world. We have a line of research to tackle this sim-to-real problem. After that is done, the robot can do quite dynamic locomotion skills in the real world.
Once the robot is able to do all kinds of control, we then think about how to extend its intelligence: how to enable the robot to interact with objects and also with other agents.
And finally, how do we bridge the advantages and bring the best from both sides, model-based and model-free? We came up with some safety methods for ensuring safety with learning-based controllers.
So to summarize my past and also current research directions: we have structured motion intelligence, athletic intelligence, object interaction intelligence, agent interaction intelligence, and safety. We don't have time to go through all of these parts, so we can pick the ones I think will be most interesting to share.
Let's start with learned athletic intelligence. In this work we just want to answer one question: can we obtain a general control solution for all different kinds of bipedal locomotion skills? Back then this was not obvious, because for different locomotion skills such as walking, jumping, and running, people needed to develop different models to suit each specific skill, and the control frameworks were different. For example, walking is a continuous motion. If the robot is losing balance in one second, it is okay as long as it can recover in the next second. But for jumping, especially aperiodic jumping, the jump lasts only about one second. The robot needs to do a very quick take-off, land with a very large impact, and quickly recover to stability, all within just 1.6 seconds or so.
If the robot misses at any point, the control fails. This is like finite-time stability of the system. Because of these differences between the systems, the control solutions back then were different for each skill; people would develop their own algorithms to tackle only one specific skill. So at that time we asked: can we obtain a general solution, so that we don't need to change the control solution from skill to skill, but have one solution for all? How can we obtain such a general solution? We looked at humans. We just let the robot, like humans, learn to walk, run, and jump through trial and error.
So then we brought in the idea of reinforcement learning. This is the RL-based controller, which is represented by a deep neural network.
It directly outputs the desired motor positions for the robot. These pass through a low-pass filter and are then used in joint-level PD controllers to generate the motor torques on the hardware. This RL-based controller, which we call the policy, has several inputs. First is the command. If we use jumping as an example, it is where the robot should land after a jump. If we use walking or running as an example, the command becomes the velocity the robot is going to track: vx, vy, and the turning yaw rate. We also provide a reference motion, and only a single reference motion. In the jumping case we used an animation, just a single in-place jump animation. This is not dynamically feasible, only kinematically feasible. It's just an animation, handcrafted by human animators.
But we use it as a reference motion, meaning the robot takes these motions as a reference. It wants to reproduce the motion while maintaining dynamic feasibility.
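The action path described above (policy output, then a low-pass filter, then a joint-level PD law producing torques) can be sketched as follows. This is a minimal sketch, assuming a first-order exponential filter and a PD law with zero desired velocity; the talk does not specify the filter type, the gains, or the torque limits, so `alpha`, `kp`, `kd`, and `tau_max` are placeholder values.

```python
class LowPassFilter:
    """First-order exponential smoothing of the policy's position targets."""

    def __init__(self, alpha, initial):
        self.alpha = alpha            # 0 < alpha <= 1; smaller means smoother
        self.state = list(initial)

    def update(self, target):
        self.state = [s + self.alpha * (t - s)
                      for s, t in zip(self.state, target)]
        return self.state


def pd_torques(q_des, q, dq, kp, kd, tau_max):
    """Joint-level PD law toward q_des with zero desired velocity, clipped."""
    torques = []
    for q_d, q_i, dq_i in zip(q_des, q, dq):
        tau = kp * (q_d - q_i) - kd * dq_i
        torques.append(max(-tau_max, min(tau_max, tau)))
    return torques


# One low-level control tick with made-up numbers for a single joint.
lpf = LowPassFilter(alpha=0.5, initial=[0.0])
q_des = lpf.update([1.0])                       # raw policy action: 1.0 rad
tau = pd_torques(q_des, q=[0.0], dq=[0.0], kp=100.0, kd=5.0, tau_max=30.0)
```

The filter keeps a sudden change in the policy's output from turning directly into a torque spike, and the clip stands in for whatever limits the real motor drivers enforce.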
Okay. And we provide the robot with some feedback. The feedback includes a short, four-time-step history of the robot's inputs and outputs. The inputs of the robot are the policy's outputs, the policy's actions, which are the desired motor positions. The outputs of the robot are the observed robot states, including joint positions, joint velocities, IMU readings, and so on. We also keep track of a long sequence of this I/O history that lasts about 2 seconds. This long I/O history is first encoded by a 1D CNN, a temporal CNN, and the latent is concatenated with the other inputs and passed through the MLP base. This design of the policy emphasizes the importance of both long-term and short-term I/O histories. Looking back, I think we developed this architecture around 2022. Nowadays we know the transformer, and if you look at the attention mechanism, this is actually a very simple, naive version of a transformer. But it has a very small number of parameters, so it can run inference online, because the policy needs to update at about 30 Hz while the low-level control updates at 2 kHz. What we want is that after this policy is trained, we can directly control the robot in the real world to track the given command. For example, given a landing target, the robot can just jump there; given a different landing target, the robot can jump there too, using the same controller.
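The dual-history structure described above can be sketched in a few lines of dependency-free Python: a 1D convolution over the long I/O history, a latent that is concatenated with the short history, the command, and the reference, and an MLP on top. This is only a sketch under stated assumptions. All dimensions, the kernel count, the stride, the random weights, and the time-averaging used to get a fixed-size latent are illustrative choices, not the architecture from the talk.

```python
import random

IO_DIM = 6          # toy size of one [action, observation] record
LONG_STEPS = 60     # about 2 s of history at roughly 30 Hz
SHORT_STEPS = 4     # short history, as described in the talk
CMD_DIM, REF_DIM, ACT_DIM = 3, 4, 2   # toy sizes, not the robot's


def conv1d(seq, kernels, stride):
    """Valid 1D convolution over time with ReLU. seq: [T][C], kernels: [K][W][C]."""
    width = len(kernels[0])
    frames = []
    for start in range(0, len(seq) - width + 1, stride):
        window = seq[start:start + width]
        frames.append([
            max(0.0, sum(w * x for w_row, x_row in zip(kernel, window)
                         for w, x in zip(w_row, x_row)))
            for kernel in kernels
        ])
    return frames


def dense(vec, weights, relu=True):
    """Bias-free fully connected layer."""
    out = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(seed=0, n_kernels=8, width=6, hidden=32):
    rng = random.Random(seed)
    in_dim = n_kernels + SHORT_STEPS * IO_DIM + CMD_DIM + REF_DIM
    return {
        "conv": [rand_matrix(width, IO_DIM, rng) for _ in range(n_kernels)],
        "fc1": rand_matrix(hidden, in_dim, rng),
        "fc2": rand_matrix(ACT_DIM, hidden, rng),
    }


def policy(long_hist, short_hist, command, reference, params):
    """Map histories, command, and reference to desired motor positions."""
    frames = conv1d(long_hist, params["conv"], stride=3)
    # Average over time to get a fixed-size latent (a simplification).
    latent = [sum(col) / len(frames) for col in zip(*frames)]
    short_flat = [x for record in short_hist for x in record]
    features = latent + short_flat + list(command) + list(reference)
    return dense(dense(features, params["fc1"]), params["fc2"], relu=False)
```

With the rates quoted in the talk, each policy action is held for roughly 2000 / 30 ≈ 67 ticks of the low-level PD loop, which is why the network has to stay small enough to evaluate comfortably within one 30 Hz period.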
Okay. Realizing this is quite challenging, and one of the solutions to make it happen is to leverage the long IO histories. So why is this so important? We can look back at the dynamics model we just introduced and regroup it into inputs and outputs. The inputs are the motor torques.
Okay. The motor torques are determined by the desired motor positions, which are used in the PD controllers. The outputs of the system are the robot states: joint positions, joint velocities, joint accelerations. So if we keep track of a long sequence of these input-output pairs, we can actually fit the parameters of the model in these equations, such as the mass matrix, the Coriolis terms, the Jacobian, and the external forces. This sequence of input-output pairs implicitly characterizes the parameters of the model. This is the reason this long sequence of information implicitly encodes the parameters of the robot. Then we hope the policy is going to adapt to changes in the dynamics in order to accomplish the control objective, which is tracking the commands, including the landing target as well as the reference motion. Okay. So with this information we're going to train the policy.
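To make the claim that a sequence of input-output pairs determines the model parameters concrete, here is a toy one-joint example. It assumes a simplified model, torque = m * acceleration + b * velocity, with made-up values for `M_TRUE` and `B_TRUE`, and recovers both by ordinary least squares from a logged history. The policy in the talk does not run an explicit fit like this; the sketch only shows that the information is present in the IO history.

```python
import math

M_TRUE, B_TRUE = 2.0, 0.5    # hypothetical inertia and damping of a single joint
DT = 0.05


def log_io(n=60):
    """Log (torque, velocity, acceleration) samples along a sinusoidal motion."""
    log = []
    for i in range(n):
        t = i * DT
        qd, qdd = math.cos(t), -math.sin(t)
        tau = M_TRUE * qdd + B_TRUE * qd          # input that produced this output
        log.append((tau, qd, qdd))
    return log


def fit_inertia_and_damping(log):
    """Solve the 2x2 normal equations for tau = m * qdd + b * qd."""
    s_aa = sum(qdd * qdd for _, _, qdd in log)
    s_av = sum(qdd * qd for _, qd, qdd in log)
    s_vv = sum(qd * qd for _, qd, _ in log)
    r_a = sum(qdd * tau for tau, _, qdd in log)
    r_v = sum(qd * tau for tau, qd, _ in log)
    det = s_aa * s_vv - s_av * s_av
    m = (s_vv * r_a - s_av * r_v) / det
    b = (s_aa * r_v - s_av * r_a) / det
    return m, b


m_hat, b_hat = fit_inertia_and_damping(log_io())
```

If the hardware changed (for example, more friction), the same fit on a new history would return different values, which is the sense in which the history encodes the dynamics.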
So we first train the robot in simulation. At the first stage, we just let the robot track this single in-place jumping animation, and we borrow the idea of how humans learn locomotion skills. With the information we give it, the long IO histories, the command, the reference motion, we let the robot try by itself. It randomly explores different actions, and if these actions bring a stable jump while staying close to the reference motion we provided, the robot receives a very good reward. If the robot falls over, it receives nothing, or even a penalty. Okay? It's just like how human kids learn to walk. If the kids can imitate the adults' walking gaits, they may get praised; they are rewarded. If the kids fall over, they get hurt.
So this is the penalty. As the iterations go on, human kids learn how to walk well. It's the same for the robot: as the iterations go on, the robot gradually learns to maximize the total reward it gains by minimizing the difference between its own motion and the reference motion while maintaining stability. Okay. So once this single skill is trained, we don't want to be limited by this single reference motion. The interesting thing I want to highlight here is that nowadays, if you see all the humanoid dancing video demos, all of these motion-tracking videos, people just stop at this single-task stage. One single reference motion, the robot tracks it, deploy on hardware, done. Right. This is basically the most commonly seen algorithm in humanoids these days, which we call human motion tracking. So at that time we showed, you know, the birth of this algorithm, but we didn't want to be limited to a single reference motion. We already wanted to go beyond simply tracking one reference motion. So now we want to answer another question: can the robot jump to other places while maintaining stability? We want to let the robot deviate from the reference motion, not be limited to it. So we went to the second stage, which we call the task randomization stage. We start to randomize the task, and now the reward changes. Now the reward says that if the robot jumps to the target, it receives a very, very good reward.
But if the robot misses the target, it won't receive much reward. It may even get some penalty. And the distance to the reference motion has less and less weight at this stage. So in this case, we leverage reinforcement learning to guide the robot to acquire more and more motion skills, not limited to the single reference motion we provided, by trial and error. Okay, this helps us use the reference motion in a more efficient way. Then we start to randomize the dynamics, all the dynamics parameters we can randomize in the simulator. This is now a quite commonly used technique in RL-based locomotion control. And after this stage is trained, we directly deploy the policy from simulation to the real world without any tuning on the hardware.
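The staged training described above can be illustrated with a small reward and randomization sketch. The stage names follow the talk, but the weights, the exponential reward shape, the fall penalty, and the parameter names and ranges in `DYNAMICS_RANGES` are all hypothetical choices made for this example.

```python
import math
import random

# Hypothetical weights: tracking dominates in stage 1, the task term in stage 2.
STAGES = {
    "single_task": {"w_track": 1.0, "w_task": 0.0},
    "task_randomization": {"w_track": 0.25, "w_task": 0.75},
}

# Hypothetical simulator parameters and ranges for the dynamics randomization stage.
DYNAMICS_RANGES = {
    "link_mass_scale": (0.8, 1.2),
    "joint_friction": (0.0, 0.3),
    "ground_friction": (0.4, 1.0),
}


def reward(stage, tracking_err, target_err, fell):
    """Blend reference tracking and task success; falling is penalized."""
    if fell:
        return -1.0
    w = STAGES[stage]
    return (w["w_track"] * math.exp(-tracking_err)
            + w["w_task"] * math.exp(-target_err))


def sample_dynamics(rng):
    """Draw one set of randomized dynamics parameters for an episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in DYNAMICS_RANGES.items()}


# A jump that hits the target but deviates from the reference, versus one that
# copies the reference but misses the target.
hit = dict(tracking_err=2.0, target_err=0.0, fell=False)
miss = dict(tracking_err=0.0, target_err=2.0, fell=False)
stage1 = (reward("single_task", **hit), reward("single_task", **miss))
stage2 = (reward("task_randomization", **hit), reward("task_randomization", **miss))
episode_dynamics = sample_dynamics(random.Random(0))
```

In the first stage the reference-copying jump scores higher; once the tracking weight shrinks, the jump that reaches the target wins, which is how the reward steers the robot beyond the single reference motion.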
Okay. So right now I think all of the motion-tracking-based methods are actually based on this pipeline. But the part I want to highlight as a little different is the task randomization stage, not only because people are now missing this part, but also because it can further improve the robustness of the policy. If we really want this pipeline to work, we really need the policy, the controller, to be very strong. It should adapt to the dynamics on the hardware. So as we mentioned, we need to use the sequence of IO histories to implicitly encode the dynamics of the hardware. And if we use a long history, we inevitably need an encoder to encode those dynamics, and the latent may miss very important information in the real-time feedback. So we need to compensate for the use of the long history by bypassing the encoder with the short history. The base actor has direct access to the short history, the four most recent time steps of feedback, together with the encoded long history. In the paper we show that this combination of long-term and short-term histories is very important to get the best learning performance and control performance. Okay. Second, the robot also needs to be robust to unexpected errors. Dynamics randomization is definitely one of the ways, but we also showed that task randomization can further expand the robustness of RL-based policies. We conducted a very extensive and comprehensive benchmark and ablations in both simulation and the real world to show how to structure this. So if you're interested, take a look at our papers, but the answer is what I already introduced. Okay, and then these are the results.
So we use a single framework that enables different kinds of locomotion skills, from running to jumping to walking. This is the demo of the robot transitioning directly from standing to a fast run and then doing a very sharp turn.
Okay, the robot can finish this sharp 90-degree turn within five steps, and this is not included in the reference motion. The robot learned this kind of sharp turn in the task randomization stage. We also tried a 400-meter dash. This is the first time in history that a bipedal robot finished one entire lap. And the robot finished it in two and a half minutes.
Okay. So you can see that we're controlling the yaw angle while the robot is running, the robot can run over different terrains, and it can quickly switch to standing after it crosses the finish line.
Okay, let's finish the run.
We also show faster running for the 100-meter dash. In this case you can see there's a more significant flight phase and a larger impact when the robot lands, and this happens periodically.
So the robot needs to maintain balance under this periodic impact, which is quite cool, and the robot finished this 100 m dash within 27 seconds. The cool thing is that it can quickly switch to standing afterwards using the same controller, the same policy. The robot can also run over different terrains. It doesn't really know the incoming terrain, so it runs while adapting to the change of terrain, because we have a history of the inputs and outputs. The change in this sequence actually encodes the change of the terrain.
We also did some quite cool robustness tests. While the robot is running, we can perturb it laterally.
And since the robot has learned not only the provided straight-running reference motion but also how to run laterally, it can quickly react to lateral perturbations. Here are more examples, for bipedal jumping. These four videos are all realized by the same controller.
We just give different targets, random targets, and the robot can jump to them. Let me show it again. These four videos actually contain two world records at that time. One is the farthest standing long jump, 1.4 m ahead, and one is the highest table jump: it can jump onto a 0.44 m table. All of this uses the same controller, and in the paper we show 19 different jumps using the same controller. Okay.
So we also show some walking experiments. These experiments are very old, back in 2022. At that time there were lots of novel bipedal locomotion capabilities, such as automatically switching between standing and walking. But I think the most important thing I want to highlight is the consistency. We evaluated this walking controller over roughly 490 to 550 days, around one and a half years. Over this time span we just periodically tested the control performance on the hardware.
Over this long time span the hardware may change. We ran jumping experiments and running experiments, the motors may suffer wear and tear, and the motor friction may change over time. But regardless of these changes in the dynamics parameters, the control performance is consistent: the velocity tracking is consistent and quite good over this long time span. That means our policy is able to adapt to changes in the dynamics, which I think is very important from the industry perspective.
Because if you want to do mass production, you really want the policy to be consistent. You don't want to send your engineers to tune the parameters and the policies robot by robot. You want the robot itself to adapt to changes in the dynamics, and I think the use of the IO history is a very good solution for that. Okay, so some takeaways. We have a general locomotion framework using reinforcement learning. We can do periodic motions, aperiodic motions, static motions, and transitions from all kinds of motions to static motions.
We highlight the proper design choices that bring adaptivity, we highlight the importance of task randomization to further improve robustness, and we show some novel bipedal locomotion capabilities in the real world.
So this is on the cover page of AGR and also on the list of the most papers of ag in the past three years. Okay. So once rob have learned you know.
>> Hi Zhongyu, can I have a quick question before we move forward? This is very impressive, even given some recent developments in reinforcement learning, right? So my question is: with reference to those recent developments, what lessons from the work you just presented do you think are still valid today?
>> I think the pipeline we showed is quite general, and I think the recent developments do not deviate too much from it; in fact they build on our previous research. There are several takeaways I really want to highlight. One is that motion tracking is actually a very general solution. We provide reference motions, we don't need to tune much of the framework, and the robot can reproduce different kinds of locomotion skills. That's a very nice thing. But there were also other directions, because at that time there was a discussion about whether to use reference motions or not. I'm not sure people still remember, but in my earlier years of research there was a debate about whether to use motion tracking or not.
Some researchers claimed that we want the robot to develop its own gait by whatever method, but it turned out that reference motion, or motion tracking, is more scalable. Okay. Second, the use of the IO histories I think is very important. Nowadays people use several steps, but I think long histories are very important. Okay.
>> Okay.
>> And third is definitely task randomization. Dynamics randomization is the most commonly used one, but you cannot infinitely enlarge the range of the dynamics randomization, right?
>> Okay.
>> Yeah. So task randomization gives another way to further expand the robustness. So those are the three takeaways. Yeah.
>> Okay. That's so good. I'm also particularly interested in how it can transition so well between fast running and standing. Recently people have different datasets for human-like walking, running, and so forth, but I think if you train by relying on motion mimicking, it may overfit to the motion, and it may be harder for the pipeline to learn the transitions, right? What is your suggestion for this kind of problem?
>> It's very hard, to be honest. It is very challenging to use reinforcement learning to learn multimodal distributions. Dynamic motion is one distribution, standing still is another. It's very hard for one single RL policy to learn both. That is the reason we highlight that we can do transitions from static motions to dynamic motions and back. At that time we were very addicted to RL, so we tuned the RL very well. We tuned the rewards, we tuned the episode designs, and we documented all of this in our papers, in great detail: how to train, how to combine different skills into one policy. But things have changed these days. I think two things changed, which can make it more scalable to more skills. As you can tell, although ours is a general framework, it is still skill-specific. We can do running and transition to standing, but we cannot do both running and jumping using the same controller. Nowadays we see the potential to achieve that, and I think there are two reasons. First is the capacity of the neural network. Previously we used an MLP, which is still a very small network, but now you can use a transformer, which can be a much deeper network with a larger capacity. Second is that we can leverage imitation learning. We can use reinforcement learning to learn expert policies for individual skills, use these expert policies to generate tons of data, and then use a transformer with supervised learning to distill all of these skills into one neural network. So I think this could be more promising.
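The distillation recipe in this answer (train one expert per skill, roll the experts out to generate data, then fit a single student with supervised learning) can be sketched as below. Everything here is a stand-in: the two "experts" are hand-written linear functions rather than RL-trained policies, and the student is a tiny skill-conditioned linear model trained by SGD rather than the transformer mentioned in the talk.

```python
import random


def running_expert(obs):
    return 2.0 * obs            # hypothetical expert for the "run" skill


def jumping_expert(obs):
    return -1.0 * obs + 0.5     # hypothetical expert for the "jump" skill


EXPERTS = {"run": running_expert, "jump": jumping_expert}


def collect(n_per_skill, rng):
    """Roll out each expert and record (skill, observation, expert action)."""
    data = []
    for skill, expert in EXPERTS.items():
        for _ in range(n_per_skill):
            obs = rng.uniform(-1.0, 1.0)
            data.append((skill, obs, expert(obs)))
    return data


def fit_student(data, lr=0.1, epochs=200):
    """Supervised regression onto expert actions, one shared student for all skills."""
    params = {skill: [0.0, 0.0] for skill in EXPERTS}    # [weight, bias] per skill
    for _ in range(epochs):
        for skill, obs, act in data:
            w, b = params[skill]
            err = (w * obs + b) - act
            params[skill][0] -= lr * err * obs
            params[skill][1] -= lr * err
    return params


def student(params, skill, obs):
    w, b = params[skill]
    return w * obs + b


student_params = fit_student(collect(50, random.Random(0)))
run_action = student(student_params, "run", 0.3)
jump_action = student(student_params, "jump", 0.3)
```

The student reproduces both experts from data alone, with no reward signal; scaling the same idea up is what a larger network and much more expert data would be for.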
>> Okay.
>> Yeah.
>> Thank you.
>> Yeah. Okay.
>> Thank you. Just one reminder: I hope to have a bit more time for our team to ask you some questions as well. I think we're running a bit short on time, so if you can speed up a bit, we can have more time for the team.
>> Yeah. Sure.
>> Sure. Let me quickly go through object interactions and just show some cool demos. So, not only locomotion but also object interaction. We thought about how to enable the robot to perform locomotion skills while performing manipulation skills such as soccer ball shooting. The challenge here is that the ball is soft. If you do direct sim-to-real, you observe quite a large deviation in the ball trajectories, because you cannot simulate the soft ball well in simulation. So we leverage real-world data. In this work we pre-train the planning process in simulation with a rigid ball, and then fine-tune it in the real world while the robot shoots the real soft ball. After 100 shots, as you can tell, the accuracy gets boosted. Okay, this is a very old paper, but these days people still use this philosophy: pre-train in simulation and fine-tune in the real world. And we provided a framework to realize this on a complex robot. Okay. So that is the static case. We also combined dynamic motions: when the ball is flying toward the goal, we studied the goalkeeping problem. The robot needs to select in real time which dynamic skill to perform, and also the end-effector trajectory it should follow in order to intercept the ball. Everything happens very fast, within one second, and the robot is able to do this in real time. This is a half-second shot. The robot can quickly react to this fast incoming ball and decide which skill to use, high jump, side jump, or side step, along with the end-effector trajectory, in order to intercept the ball. Okay. We also combined an imitation learning policy with the RL policies for whole-body manipulation tasks.
At that time we didn't have a humanoid, only quadrupedal robots. But the idea is still the same. We first train a general whole-body controller for joint-level control, and on top of it we use imitation learning to learn the end-effector trajectories from human demonstration data. This is now widely used in humanoid whole-body manipulation tasks. Okay. Not only single-agent but also multi-agent: we studied how these robots can collaborate with each other. You can see that one single robot cannot tow this kind of load. It has its own limit; it cannot tow a heavy load. But if we combine the efforts of three robots, they can tow a heavy load, transporting it to a given target while avoiding nearby obstacles. This is one of the first works studying multi-agent collaboration on legged robots, these quadrupedal robots. Here, at that time, we used centralized MPC to optimize the plan for the team. But it's not that scalable: although the framework can scale to 12-robot teams, the solving time explodes exponentially. It's a curse-of-dimensionality problem. So at that time the algorithm could only solve the motion planning problem for teams of up to three agents. So then we studied how to use multi-agent reinforcement learning to develop a decentralized planner for the multi-agent team.
In this video, from one-agent teams to four-agent teams, we are using the same controller, the same policy, for all of the robots you see here. Each robot decides its own actions based on its local observations while coordinating with the other agents to tow this load. With a heavy load, you can see that three agents are the main contributors and the fourth agent adjusts the load pose while navigating through this narrow space. Okay.
But for lightweight load scenarios the team takes another formation. There will be two main contributors on these tasks. And if we vary the load, the team automatically adapts its formation from the lightweight formation to the heavy-load formation. You can see an agent catch up so that three agents become the main contributors. The advantage of a decentralized method is that we can remove or add an agent very easily. Okay.
You can see that for a four-agent team, we remove one agent, and the remaining agents quickly catch up and form a three-agent team. In simulation, we scale up to 12-agent teams with a consistent inference rate, around 30 Hz, for all of these scenarios, which is quite scalable.
And then we further expanded to cooperative and competitive games, which here is soccer. So now finally we have two agents versus one agent playing soccer.
A robot can pass the ball to another agent, and that agent completes the shot. Okay. And since
>> Can I have a question?
>> Yeah go ahead.
>> So in the previous slide, what's the limitation that keeps you at 12 agents? What happens if you increase it more?
>> If we increase it more, the solving time stays consistent, so I don't think there's a limitation in what we showed in simulation. The reason is that we don't have that many robots. We don't have 12 quadrupedal robots in the real world. But I think the challenge now is because of the shape, you know, with a very large team.
>> So I think I'm mainly asking about the fundamental principle of it, right? So let's say this is decentralized
>> Mhm.
>> um reinforcement learning.
>> Yeah.
>> So does that mean that even if you scale up more let's say >> 20 50 100?
>> Yeah.
>> But the size of the problem for each agent should stay the same, right?
>> Yes. Yes.
>> The simulation complexity may increase and the training will be slower. But I think once you are able to train it, the solution, the network, the result for each agent should stay the same, right?
>> Yes.
>> So it means that in principle we should be able to deal with many, many agents if we can train it well.
>> Yes. Yes. In principle, yes. As the number of agents increases, the deployment performance will be consistent, and each agent only makes decisions based on its own local observations. So we don't need to worry about the increased number of agents, as long as the batch size and the simulation time can catch up, that is, as long as you can afford the compute. So I think this method is scalable.
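The scalability argument in this exchange rests on one structural property: every agent runs the same policy on a fixed-size local observation, so per-agent inference cost does not depend on team size. A toy sketch of that structure follows. The observation layout and the hand-written proportional "policy" are assumptions for illustration only; the real system uses a learned policy.

```python
def local_observation(agent_pos, load_pos, target_pos):
    """Fixed-size observation built only from what one agent can see locally."""
    return (load_pos[0] - agent_pos[0], load_pos[1] - agent_pos[1],
            target_pos[0] - load_pos[0], target_pos[1] - load_pos[1])


def shared_policy(obs, gain=0.5):
    """Stand-in for the learned policy: pull in the direction from load to target."""
    _, _, dx_target, dy_target = obs
    return (gain * dx_target, gain * dy_target)


def step_team(agents, load_pos, target_pos):
    """Each agent evaluates the same policy independently on its own observation."""
    return [shared_policy(local_observation(a, load_pos, target_pos)) for a in agents]


load, target = (0.0, 0.0), (4.0, 2.0)
team = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
actions_four = step_team(team, load, target)
actions_three = step_team(team[:-1], load, target)   # remove an agent: nothing to retrain
obs_sizes = {len(local_observation(a, load, target)) for a in team}
```

Adding or removing an agent only changes the length of the list; the policy input and output sizes stay the same, which matches the answer that the per-agent problem size does not grow with the team.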
>> Okay. Thank you.
>> Okay. So this is a competitive game, and it's also decentralized, so we can replace a robot with a human. In recent work we really like decentralized methods. And sometimes the robot can also fight back: the red one can intercept the ball and kick it back. Okay. So my current direction, as you can tell, is humanoids. In my current group at CUHK we want to expand along three axes, agility, safety, and intelligence, and we center on the problem of how to use data wisely in robotics. We can leverage simulation, and then we can do more advanced parkour skills, but we can also leverage real-world data. We can record trajectories in the real world while a human is interacting with objects, then replay and reconstruct them in simulation, and in simulation we can augment the real-world data. In the real world you may pick up an empty box, but in the simulator we can augment that by randomizing the load. So now the robot is not limited to picking up an empty box but can handle varying weights. We can learn purely from demonstrations, as you can see on the left-hand side, but we can also further fine-tune the accuracy of the robot's manipulation skills by learning from real-world online data through interaction, through reinforcement learning. Okay. We also studied how to learn from large-scale data with multiple modalities. For example, we input images together with a natural language instruction, such as, conditioned on these images, "go through the open doorway on the right."
The robot predicts the actions so that it actually goes through the doorway on the right.
Okay. Another example: given this image and a language instruction, you tell the robot to stop at the bed, and the robot will just stop at the bed. This is actually a VLA model. It's the future of humanoid robotics, right? A human can directly give the robot natural language instructions, and the robot can understand the language and, based on vision and its own state, take reasonable actions.
We can also fine-tune this with human feedback. Say we condition on a language description of the task, "a person is moving and making gestures as if in a boxing ring."
From this natural language it will generate the robot motions, the boxing motions, and they are probably not as good as we expect. So a human can label them, give a preference, like or dislike, and this small amount of data can fine-tune the generation quality. Okay. So in conclusion, we are currently working on how to leverage simulation data, real-world data, online and offline data, large-scale or small-scale data with multiple modalities, centered on humanoid whole-body locomotion, manipulation, and navigation problems. And we're collaborating with VinMotion. It's my great honor to collaborate with VinMotion. We're tackling a very interesting task: how to use two humanoid robots to collaboratively transport an irregularly shaped load in a congested, unstructured environment. We start from centralized motion generators and move to decentralized planners, where each robot plans its own body motions based on its own local observations while interacting with the load and the other agents, followed by the whole-body motion controllers to accomplish the task. This could be quite an exciting research direction, right? And in the very last minute I want to share one robotics challenge, a competition we are hosting at CUHK. This year we're going to have the so-called ATEC 2026, which is a robot competition, a challenge.
We call for teams from around the globe to tackle one problem: whole-body manipulation using legged robots in the wild. Three keywords. One is legged robots: quadrupedal robots with arms, humanoids with arms, whatever. Second is whole-body manipulation.
That means walking over challenging terrain while doing pick-and-place, trash disposal, treasure hunting, and fire extinguishing, all of these kinds of whole-body manipulation tasks. And the third keyword is in the wild: everything is going to be deployed in real-world, outdoor, natural environments. Okay. We're going to show what the limits of current whole-body manipulation algorithms are and what we can do in the future. So we welcome teams from industry as well as academia, from all over the world, especially from Vietnam. Anyone who is interested, universities, companies, you are welcome to join. And the prize is very crazy: the champion gets $150 USD, and we also provide different robot hardware as awards. So if you're interested, you're welcome to join, and if universities in Vietnam are interested, they're also welcome to join. It's a very good educational process for them. Okay, thanks. I'm happy to take questions.
>> Okay, thank you, Zhongyu. Thank you so much. I will let the team have a chance to ask you questions, but before that I would say thank you so much for having the slide on the potential collaboration with VinMotion. I like it a lot. But just keep in mind that we are not limited to buying 12 robots. Okay? If you can do more, let me know.
>> 100 robots will be crazy.
>> Okay. Thank you. Okay. Twins. So now if you have any question from the team, just uh go ahead and ask a question.
>> We have only 15 minutes remain. So >> uh we already have question in the chat.
Oh >> yeah, that's why we can have the the team in at at um the conference room to ask question first. Okay.
>> Okay. Any anyone? Okay. Uh you can come here and uh get the microphone.
>> Good afternoon, professor, and thank you for a very informative talk. I'm an AI engineer at VinMotion, and I have a question for you.
>> So can you share some experience or ideas on how to make one reinforcement learning policy more robust or more adaptive for multiple task objectives? For example, traversing different terrains at different velocities, with different payloads, or optimizing for different task performance, without running into mode collapse. Thank you.
>> Thank you. It's a very good question, and it's very hard. If you want robustness, all of that, terrain, load, we can randomize in simulation, right? But the RL policy can easily learn a very simple solution such as just standing in place. Standing still is the most conservative strategy for the robot.
This is a very extreme example, but the robot may actually adopt such a strategy, because as long as it stays up it gets some reward, right? But that's definitely not ideal. So at that time we did very hard work on reward tuning and episode design. This is actually the first thing we need to do. But second, beyond simply doing reward engineering, we want to highlight the use of the IO histories, because this is the key to making the policy adapt.
Okay, that means that regardless of changes in the dynamics, the control performance should be consistent. That is the ideal. Robust usually means that under all of these changes in dynamics, the robot just stays stable. But stable has lots of interpretations. Standing still is one way of being stable, right? But it fails at tracking the command. So we really want the policy to adapt to changes in the dynamics, and the way to do it is by using the long IO histories. They give the robot information about the change in dynamics. The change of terrain is implicitly encoded in the external force.
People have actually shown that this is true. The learned latent space has some dimensions that are highly correlated with changes in the external force, the impact, the ground reaction force. So based on this long I/O history, and with the proper structure, the robot can learn to leverage this information. Then, with a consistent reward, for example a tracking reward, the robot knows that it needs to track the given target. This is the way the robot can learn an adaptive policy.
>> Actually, let me give you a little more context on this. We are working on perceptive locomotion, right? So what he was talking about is mostly that, after we encode the depth, we see the mode collapse problem. So what's your take? For a blind policy, things work quite well, right? With basically any framework you find, a blind policy always works fine. But right now we are adding the depth modality, and after we encode the depth, we do have some issues going from sim-to-sim to sim-to-real.
>> Yeah, I think to avoid this, adding proper information is definitely very important. If you have more information, the robot can learn from it. Perception is definitely another source of information about what is coming. Once you have such information, what is left is to train the robot to utilize it, and how to train it becomes more a matter of reward engineering. But without this information, even with the best reward, you cannot enable the robot to learn to adapt or to avoid mode collapse. But this is for adapting to changes in the dynamics. For adapting to different skills, I think we have discussed this before: RL is probably not the best solution. We want to leverage imitation learning to increase the capacity of the neural network, then utilize more and more data and learn from it. So we can probably shift the paradigm to imitation learning, or even fine-tune the imitation policy with RL, which is also possible.
Yeah.
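The long I/O history idea from this answer can be sketched as a fixed-length buffer of past observation and action pairs that is flattened and fed to the policy. This is only an illustrative sketch: the class name `IOHistory`, the horizon, and the dimensions are assumptions made for the example, not details given in the talk.

```python
from collections import deque


class IOHistory:
    """Fixed-length buffer of (observation, action) pairs for a policy input.

    Hypothetical helper for illustration; not the speaker's implementation.
    """

    def __init__(self, horizon, obs_dim, act_dim):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        # Start with zero padding so the policy input always has the same size.
        zero_pair = [0.0] * (obs_dim + act_dim)
        self.buffer = deque((list(zero_pair) for _ in range(horizon)), maxlen=horizon)

    def push(self, obs, action):
        """Append the newest I/O pair; the oldest pair is dropped automatically."""
        if len(obs) != self.obs_dim or len(action) != self.act_dim:
            raise ValueError("unexpected observation or action size")
        self.buffer.append(list(obs) + list(action))

    def as_policy_input(self):
        """Flatten the history, oldest pair first, into one vector."""
        flat = []
        for pair in self.buffer:
            flat.extend(pair)
        return flat


history = IOHistory(horizon=3, obs_dim=2, act_dim=1)
history.push([0.1, 0.2], [0.5])
history.push([0.3, 0.4], [0.6])
policy_input = history.as_policy_input()
```

Because the same commanded action produces different observations under different payloads or terrains, a window like this carries implicit information about the dynamics, which is the property the answer relies on.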
Okay. That's good. So next question.
>> Hello professor.
>> Today I have a question for you. Could you tell me whether humanoid robots can currently truly enter the real world, with the ability to adapt to various environmental conditions, and what technologies would be used to achieve that? And from your perspective, could Vietnam become the future city of robotics?
>> It's a very interesting question. To be honest, although we showed that we can transfer zero-shot to the hardware, for walking policies and even for jumping policies, and we showed a degree of adaptivity in the walking, jumping, and running policies to changes in the dynamics and to wear and tear on the hardware, I think the true challenge is how to enable the robot to keep learning after deployment. So it's learning at test time.
Once the robot is deployed on the hardware, it generates more data, right? This data is very valuable. It's the most precious data, because it's real-world data. But we don't really utilize it these days, right? We deploy the policy and give a demo. That's it.
All of that data is wasted. But I think the deployment data is the most valuable, so the question is: how can the policy keep learning after deployment, using the real-world data to keep improving? This is another research line we are interested in: continual learning, continual RL, or what we call lifelong RL, lifelong learning. The learning never ends; we keep using the deployment data to improve the policy. I think this is the key thing to solve if you want to make humanoid robots more reliable when working in their target domain. But continual learning is still very challenging. First, real-world data is very limited, so we really need sample-efficient algorithms. Second, the policy may quickly forget previously learned skills. Third, and I think this is the most critical for humanoid or legged robots, because of the exploratory nature of reinforcement learning, the randomized or explored actions may cause instability, and the robot may just fall over because of the exploration. So we want to make the exploration more conservative and respect the stability constraints. I think these three are still open questions that will be very interesting to explore in the future.
Okay.
What next?
Okay. Okay. Love.
>> Thank you, Professor Li, for your presentation, and thank you to the professor for organizing the seminar. I have two questions for you, Professor Li. Nowadays we have the SONIC framework, and they say it is a very good motion tracker. What do you think about this framework, and can we have a very good motion tracker in the future that we can use for all tasks with high accuracy and stability?
>> Can I ask another question?
>> Another one? Yes, please.
>> The other one is about domain randomization. You said we can increase the robustness with domain randomization, but when we increase the domain randomization, the accuracy on the task may decrease. So how can we balance between them?
Okay, these are two very interesting questions.
>> Yeah, the first one. SONIC is a very good baseline for general, universal whole-body tracking, but the accuracy is still not ideal. One of the reasons, I think, is that it is a robust policy: it cannot actually adapt to the change of dynamics on hardware. This goes back to the previous question: can the policy adapt to the change of dynamics? If you learn from all of these different motions, the robot may just learn an average skill, a skill that merely resembles motion one, motion two, motion three. It learns average motions, or normal motions as we call them, and that definitely cannot beat a skill-specific policy.
This is the reason we work on skill-specific policies, because that can make the skill very powerful. So how do we further improve the accuracy of motion tracking with the recent universal whole-body motion controllers? I think the first way is, again, to further improve the adaptivity: utilize the long I/O history, and keep learning after deployment or fine-tune the policy. I think this is very important for this line of research.
The second way is that, instead of learning a universal controller, we can follow our previous line of work: learn skill-specific policies and learn only the transitions among them. In this way you make use of the best part of RL, right? For a specific skill, RL can do a very good job, and when you need a new skill, you just transition to another policy. I think this could also be possible. Let's use humans as an example. For a human athlete, when we are running, we don't really need to think about the skills we use for jumping, right? We just need to focus on running, and when after running we want to jump, we pick up our jumping skills, our jumping brain. I think learning how to transition between different skills could be another solution. I think this could be interesting. So this is the first question. For the second one: for sure, if you further increase the domain randomization, it is still the same. The robot can learn a very average performance.
It can just learn a very conservative behavior. It can just stand in place, for example. Right? If you command the robot to jump forward, backward, and laterally, the average of the commands is zero, so the robot can just stand in place or jump in place and still get some reward, right? [clears throat] In our case we solve this with three methods.
The first is definitely leveraging the long I/O history, so the robot has information about the change of dynamics. When the dynamics parameters are randomized, the sequence of I/O pairs implicitly encodes the change in those parameters. This is the first thing, and a very important one.
The second is task randomization. We cannot infinitely increase the range of the dynamics randomization: the more we increase it, the more likely the robot is to adopt a very conservative behavior, right? But we can further increase the robustness by randomizing the task, commanding the robot to jump to different locations. By learning this, the robot can also learn recovery behaviors. In the jumping paper we show a very cool demo where the robot jumps in place and we pull it backward at the apex of the jump, and the robot quickly recovers from this backward pulling force because it has learned a backward jump. So the robot quickly adopts a backward walking gait in response to the backward pulling force. The same goes for running when it is perturbed laterally, right? So task randomization can further expand the robustness in another direction. This is something we discuss a lot in the paper. If you're interested, definitely check it out.
The third thing is reward tuning, for sure. Reward is very important, but at some point you also really need a very good episode design. If the robot deviates from the command, you don't want to let the episode continue; you want to end the episode early. So reward tuning and episode design complement each other. At some point you can say episode design is another way of reward tuning. This is also very important, and we discuss it in the paper as well. So I think these are the three solutions.
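The second and third points above, task randomization alongside dynamics randomization and early episode termination, can be sketched per training episode as follows. Every name, range, and threshold here (`DYNAMICS_RANGES`, `TASK_RANGES`, `max_error_m`) is a hypothetical placeholder chosen for illustration, not a value from the talk or the paper.

```python
import random

# Hypothetical randomization ranges, for illustration only.
DYNAMICS_RANGES = {"payload_kg": (0.0, 5.0), "friction": (0.4, 1.0)}
TASK_RANGES = {"target_dx_m": (-0.5, 0.5), "target_dy_m": (-0.3, 0.3)}


def sample_uniform(ranges, rng):
    """Draw one value per named parameter, uniformly within its range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def sample_episode(rng):
    """Randomize both the dynamics and the commanded task for one episode."""
    dynamics = sample_uniform(DYNAMICS_RANGES, rng)
    command = sample_uniform(TASK_RANGES, rng)
    return dynamics, command


def should_terminate_early(tracking_error_m, max_error_m=0.5):
    """End the episode once the robot deviates too far from the command."""
    return tracking_error_m > max_error_m


rng = random.Random(0)
dynamics, command = sample_episode(rng)
```

Ending an episode early cuts off the reward a policy could otherwise keep collecting while ignoring the command, which is why the answer describes episode design as another form of reward tuning.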
>> Thank you, Professor. Sorry, can I ask one last question?
>> Yeah, yeah, go ahead.
>> Right now, for the motion tracker, we use only motion data to train the humanoid. Can we have another dataset or another kind of information for training, like contact or something else? I think motion alone is not enough for everything right now.
>> Yeah.
>> So what do you think: can we have other information for training, maybe in the near future?
>> Yeah, I think it's a very good question. I would like to explore this direction in collaboration with VinMotion on our projects. I think one good way is definitely what you mentioned, the contact plan. If we can plan the contacts and then, conditioned on the contact plan, generate motions that suit it, I think that would be a better way to do it. All the human motion data is still a limited dataset, right? It's a finite set. But we can use algorithms to generate more and more motions. And how do we generate more motions? I think conditioning on contact is definitely one of the directions, because we need to interact with the environment. We need to open the door, lift the sofa, and interact with other agents. The contact point is the most important thing. We don't really care how we reach the contact point, right?
We hold this bottle, and we don't really care whether it's this way or that way, right? We don't really care about the joint motions. So I think this could be very interesting to explore. What we want to find is the invariance of the motions across different tasks, and I think the contact point is definitely one of the solutions.
>> So should we collect a new dataset, or is there another way to extract the contact from videos or images?
>> Yeah, we can collect more data, but it's very hard to collect contact information when a human is doing teleoperation, because we really need contact sensors, force sensors, which are either very expensive or not reliable. But right now tactile sensors are improving, which I think could be useful. Another way is to use third-person-view videos of humans, right? If you can see how the human interacts at the contact points, this could also be interesting. The third one, I think, is to leverage trajectory optimization, which needs some effort, but I think it's doable. If you recall, over the past decades there was a lot of very fancy work using trajectory optimization to generate various kinds of motions in simulation, but offline. We don't really need to worry about that now, because we can use the trajectories optimized offline as reference motions and use RL to learn from them. So I think this will also be an exciting way to get more motions. Thank you.
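Using an offline-optimized trajectory as a reference motion for RL can be illustrated with a per-step tracking reward. The exponential form and the `scale` value below are a common choice in motion-imitation work and are assumed here for illustration; the talk does not specify a reward formula, and `reference_trajectory` is placeholder data.

```python
import math


def tracking_reward(state, reference, scale=5.0):
    """Reward in (0, 1] that peaks when the state matches the reference.

    Assumed form: exp(-scale * squared error). Hypothetical, not from the talk.
    """
    squared_error = sum((s - r) ** 2 for s, r in zip(state, reference))
    return math.exp(-scale * squared_error)


# Placeholder reference, standing in for a trajectory optimized offline.
reference_trajectory = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.4]]

# Perfect tracking gives the maximum reward at every step.
perfect = [tracking_reward(ref, ref) for ref in reference_trajectory]

# A state far from the reference gets a much smaller reward.
off_reference = tracking_reward([0.5, 0.5], reference_trajectory[0])
```

The reference only shapes the reward, so the learned policy is not required to reproduce the optimized trajectory exactly on hardware.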
>> Thank you, Dr. Li. Maybe we don't have more time for the seminar. I received around 300 questions for you from the registration link, but I don't think we have enough time to answer them all. One question is not related to technical things, but I think I should share it with you. Currently we have a lot of students watching your talk today. They come from computer science, from mechatronics, from electrical engineering.
What is the best major to focus on if they want to become professionals and succeed in humanoid robotics, or even become the next rising star like you? Can you recommend which major is best for them?
>> Well, it's a very interesting question. Robotics has so many things in it. You have hardware, you have controls, you have electrical engineering. Control is actually the intersection of mechanical engineering and electrical engineering, and even computer science. Beyond this we have algorithms, planning algorithms and optimization algorithms, some even from mathematics and operations research, where they also do a lot of optimization. And now there is reinforcement learning, which comes mainly from the computer science domain, from AI. So I don't really have a recommendation for the best major. I think anything related to robotics could be a very interesting major. Mechanical engineering, electrical engineering, and computer science are all closely related to robots, but from different aspects.
If you do mechanical engineering, you will know more on the hardware side: how the actuators work, how the control works, how the dynamics work, right? Electrical engineering is more the middle level: you also have controls, how the motor driver works, how the low-level and intermediate control algorithms work. Computer science is more high level: how to learn from vision and from language, the more learning-oriented, AI-oriented algorithms. All of these bring different backgrounds and a different focus to you. So the ideal is that you start from one aspect, from CS, from EE, or from ME, and then learn all of the others as time goes on, from undergrad to master's to PhD.
After this training, you may have a better, bigger picture of robotics, and then you know which part you are really excited about and which part is the real challenge. For example, nowadays AI has become very powerful. LLMs can do coding very well, so people say software engineers could easily be replaced by AI. But AI does not understand physics yet, or how the hardware works. So mechanical engineering knowledge is becoming more and more important in this field, because you really need to get down to earth: fixing a robot, working on the actuators, doing the controls, right? But AI is also very powerful, because we can enable the robot to have a sort of reasoning, a brain: how to perceive the environment, how to reason over long-horizon tasks, just like humans.
So my suggestion is: pick the one major, the aspect of robotics, that you are really interested in, but don't limit yourself to it, and learn all of the others as time goes on.
>> Okay. I think that's not a question for me, but I hope I have a chance to contribute to it somehow, right? In my opinion, it's very hard to have the best answer for anything. But something that I think is true in many different cases is this: if you want to be a rising star in any field in the future, try to pick a problem that is very challenging, right? At the time Zhongyu and I started in our field, there were very few people working on humanoids and legged robots, because we like difficult problems, right?
And now many people want to work on humanoids, maybe because they think it's feasible, right? But even if you want to work on humanoid robots, or robotics in general, try to pick something challenging enough for you to tackle over the long term. I think that may benefit you in the future, right? I don't think people will care too much about your major, right? Mechanical engineering, computer science, electrical engineering: most of us will have joint appointments in all three departments. In general, I think it depends on the problem that you will be tackling, right? And maybe just a joke here: if you want to be a rising star, don't be human. [laughter] Okay. Thank you, guys. I think this was a very, very good time. Thank you so much, and Zhongyu, I hope to have a chance to welcome you to our VinMotion office in Vietnam sometime very soon. Okay?
>> That would be great. Yeah, looking forward to it.
>> Okay, and looking forward to our collaboration as well.
>> Yes.
>> Okay, thank you. Thank you all. Bye.
Bye.
Bye. Bye. See you next time.