VINMOTION INTELLIGENCE PIONEERS SEMINAR #2 | Synthetic Data for Robot Learning - Dr. Fan Shi
Added: 2026-05-26

158 views · 141:26:13 · VinMotionOfficial · Original Release: 2026-05-25

Dr. Fan Shi masterfully demonstrates how GPU-accelerated simulation is evolving from a mere testing tool into the primary engine for robotic intelligence. This talk provides a rigorous technical roadmap for bridging the gap between synthetic training and real-world physical reliability.

[00:00:06]Okay.

[00:00:08]Enjoy.

[00:00:09]Hello everybody.

[00:00:11]Hi Professor Quan.

[00:00:12]Nice to see you.

[00:00:14]Thank you for your time.

[00:00:16]Okay. Thank you.

[00:00:18]I think it's time now, so maybe we can get started. Is everything okay there, Jin?

[00:00:27]Yeah, everything's okay now.

[00:00:29]Okay. Maybe we can get started.

[00:00:31]Okay?

[00:00:33]Good. Fan, thank you again. Thank you so much for your time. Before we start the presentation, I just want to go over very briefly the format we hope to have. For this seminar, I want it to be a bit more interactive, so I may jump in and ask you a couple of questions during the first part. I hope to keep the first part within maybe 40 to 45 minutes max, so that at the end our team has a chance to ask you a lot of questions.

[00:01:12]Our team is very excited to ask you a lot of questions, so I hope to reserve a very significant part at the end for the team's questions. Okay.

[00:01:24]Hopefully that's okay for you. >> Sure.

[00:01:27]Yeah.

[00:01:27]>> Okay. Okay. Thank you. So let me go ahead with a quick introduction. Again, thank you all for joining us today. It's my pleasure to introduce Professor Fan Shi to our seminar. Professor Shi is an assistant professor at NUS, where he holds the NUS Presidential Young Professorship. Before joining NUS, he got his PhD from the University of Tokyo, in one of the most famous labs in humanoid robotics.

[00:02:01]It's a very historic lab that you may all know. Then he moved to the ETH AI Center as a postdoc.

[00:02:09]There he also worked with very famous professors:

[00:02:12]Professor Stelian Coros and Professor Marco Hutter, who, as we all know, are very well known for their work in the field of legged robots.

[00:02:22]Since then, Professor Fan Shi and his lab have been working on different projects toward intelligent and safe robotics and AI systems.

[00:02:32]Today it is our pleasure to have him join our seminar, and we have a chance to hear from him about synthetic data for robot learning,

[00:02:41]which to me is one of the most important topics in robotics nowadays.

[00:02:47]Right? So thank you again, Fan, for joining us. Please take it away; we're looking forward to hearing more from you about this exciting topic.

[00:02:56]Okay, thank you.

[00:02:59]Thank you very much, Professor Quan, for the very nice introduction. It's really my great honor and pleasure to be here, and a lot of your work really inspired me during my PhD and postdoc.

[00:03:10]I also want to thank Rachel and Dr. Chen for organizing this seminar, and of course all the VinMotion colleagues who joined this seminar on a Saturday.

[00:03:21]Okay, as Professor Quan said, we are very happy to be interactive. Feel free to cut in if you have any questions; we can be quite interactive.

[00:03:32]Today I want to share some of our recent progress on synthetic data for robot learning. In particular, I'm happy to discuss what people can already do, where the challenges and difficulties remain, and some of our thoughts and practice in this direction. I will mainly cover three topics, which are deeply coupled. The first is dexterity, especially for manipulation: if we have a very advanced synthetic data engine, what can we gain from it? For example, we can now train an end-to-end policy for deformable cloth manipulation purely from simulation data.

[00:04:24]Second, I know VinMotion AI focuses a lot on locomotion, humanoids, and so on, so I also want to cover reliable deployment, because I believe it's something VinMotion really cares about. In this part I will introduce how we can leverage simulation to enhance safety, in particular to find weaknesses, and to improve the AI algorithm, the learning-based controller, and also our hardware.

[00:04:56]And in the last part, I want to share some of our recent work on how to leverage, for example, very efficient simulation for co-design and for policy optimization.

[00:05:08]Okay.

[00:05:09]So, yeah, let's start. I think everyone here cares about robot learning, right? Robot learning actually feels very similar to human learning and motor learning. For people, we always say "you are what you read."

[00:05:27]For neural networks, we say quality in, quality out, or, put another way, garbage in, garbage out. All of this says the same thing for robot learning: we really need high-quality data.

[00:05:42]But high quality alone is not enough. We want high-quality data that is also as cheap and scalable as possible, because being cheap is the only way to make data scalable.

[00:05:57]For us, I think we really envy the people working on self-driving, because they seem to have a lot of free data, thanks to the similar embodiment of the human driver and the robotaxi.

[00:06:12]Meanwhile, VinMotion, and we at NUS, care more about general-purpose robots.

[00:06:19]Unfortunately, the population of this type of robot is still very small.

[00:06:26]Based on some IFR calculations, there are about 4 million robots in the world. But if we talk about humanoids, or about dexterous hands, I believe the number may be in the hundreds, less than 1,000.

[00:06:42]Meanwhile, these tasks are much more difficult than self-driving.

[00:06:48]And there is no universal robot.

[00:06:51]All of this makes the data problem more and more challenging for general-purpose robots.

[00:07:00]Recently we have seen some big companies, in the US and in China, with maybe hundreds of people doing data collection.

[00:07:10]To me, it's a very smart strategy. But the problem is that the data is too expensive and cannot scale.

[00:07:20]And if we look at how they do it, they have different companies supporting them, which means it's very decentralized. Then it's very hard to make sure they collect enough failure-recovery data.

[00:07:39]In summary, we feel the quality is okay, but the data is too expensive to scale, especially to cover all the failure cases, so the resulting controller is not so robust.

[00:07:53]Meanwhile, another story happened in robotics. Especially from 2021 and 2022, people at ETH and NVIDIA worked together to build the very nice Isaac Gym and later Isaac Lab. Until today, many people around the world, even some in my lab, are still using this simulator. For those whose background is not in locomotion, let me briefly explain what's happening here: with a single ordinary GPU, such as an RTX 3090, you can parallelize more than 4,000 robots to collect a lot of high-value data. And this data is very cheap and scalable.

[00:08:39]We can see the benefit everywhere, from drone racing to legged locomotion, even to some dexterous-hand policies. And even today, Unitree still relies heavily on this kind of simulation-based reinforcement learning to generate a lot of very impressive behaviors.

[00:09:04]As for my own story, I started working with Marco Hutter in 2020, when I was an exchange student in Marco's lab.

[00:09:13]To be super honest, in 2020 most people did not believe in reinforcement learning. The reason is that it was already very hard to benchmark different controllers. So when I was there, it was the first time I decided to switch from MPC to reinforcement learning.

[00:09:38]After just one or two weeks, I already had some very nice progress. For people who are not familiar with manipulation, this is a very challenging task, because you cannot decide the contacts in advance.

[00:09:54]You don't know how to schedule the contacts.

[00:09:58]With MPC, we would need a so-called contact-implicit method, which in 2020 was very computationally hungry, because you need to optimize the contacts and the joint commands together.

[00:10:15]Later on, we also transferred this to a real humanoid robot. Because my PhD was done during COVID, we could not go to the lab, so we worked with a small humanoid robot. This is also some of the early work showing that we could transfer to a humanoid at that time.

[00:10:37]Nice, thanks. Can I ask a quick question?

[00:10:38]>> During my postdoc, we further pushed the skills to be more functional in the real world. For example, this robot can walk and, while walking, do some manipulation.

[00:10:52]For this one, an engineer from Sony visited us at that time, we collaborated, and we were able to deploy a reinforcement learning controller on Sony's product as well.

[00:11:04]And in our lab, we are continuing with these tasks.

[00:11:09]So for locomotion, we do see great success. But for manipulation, people somehow still struggle. We do see some very nice examples in simulation.

[00:11:23]But on real hardware, we still don't see this kind of very nice behavior. So what's the reason?

[00:11:33]This brings me to my main research topic today. The first point is... wait, wait, wait. If we investigate this problem very deeply, for legged robots doing locomotion, the contacts are relatively simple.

[00:11:48]For example, in this animal-like case, the contact problem is very simple and very small. And for locomotion, we can tolerate some physics inaccuracy, so we have much more room for error.

[00:12:05]But for manipulation, we are very sensitive to small interactions and small contacts. The requirements are just much higher.

[00:12:14]And if we want to simulate this scenario, it's very, very difficult, because there are usually more than 10,000 constraints that you need to solve together.

[00:12:25]So I would say that, even today, contact-rich tasks are far from solved.

[00:12:31]Therefore, in our lab, we have a very strong motivation: if we can move one step forward on this data problem, we can bring a lot of new possibilities to manipulation, right?

[00:12:48]We call this fast but accurate physics simulation. Usually, fast and accurate are in conflict, because accurate means you need more computation time to converge, and then it's slow, not fast.

[00:13:04]Fast means we only give the solver a very limited number of iterations, so it may not converge well, and then it's not accurate. Our students made a lot of low-level mathematical breakthroughs, together with GPU engineering, to make sure the whole method is fast but accurate.

[00:13:27]To be more specific about the challenges for this type of simulation: first, deformable systems are always very large scale. For example, a piece of cloth can involve more than 10,000 edges.

[00:13:49]Second, deformable objects usually have the nonlinear nature of hyperelasticity.

[00:13:59]And last but not least, contact is always non-smooth: there is always a step function in the contact response. For optimization, this means a nonlinear, non-smooth problem, which brings more trouble.

[00:14:14]So we need to solve a very large-scale problem like this, but with a very limited real-time budget. For example, if we want the simulation to run at 60 FPS, we need to solve all of this computation within about 16 milliseconds. All of this makes simulation development more difficult. In short, our method combines a local-global method with a non-smooth Newton method.

[00:14:51]The local-global part handles the deformation, and the non-smooth Newton part keeps the contacts accurate.
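To make the local-global idea concrete, here is a minimal Python sketch of the classic local-global scheme for a 2D mass-spring chain, in the style of Liu et al.'s fast mass-spring simulation. This is an illustrative assumption, not the speaker's GPU solver: the local step projects each spring onto its rest length, and the global step solves one linear system whose matrix never changes. The stiffness, mass, and the 1/60 s timestep (matching the 60 FPS budget) are made-up values.

```python
import math


def solve_tridiagonal(a, b, c, d):
    """Thomas algorithm; a = sub-diagonal, b = diagonal, c = super-diagonal, d = right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def local_global_step(pos, vel, rest, k=500.0, m=0.1, h=1.0 / 60.0, g=-9.81, iters=10):
    """One implicit-Euler step of a 2D spring chain whose node 0 is pinned (illustrative values)."""
    n = len(pos)
    w = m / (h * h)
    # Inertial target y = x + h v + h^2 * gravity (gravity acts on the second coordinate).
    y = [(px + h * vx, py + h * vy + h * h * g) for (px, py), (vx, vy) in zip(pos, vel)]
    x = [list(p) for p in y]
    x[0] = list(pos[0])  # pinned node
    # Constant tridiagonal matrix for the free nodes 1..n-1 (shared by both coordinates).
    u = n - 1
    a, b, c = [-k] * u, [w + 2.0 * k] * u, [-k] * u
    a[0], b[-1], c[-1] = 0.0, w + k, 0.0
    for _ in range(iters):
        # Local step: ideal spring vectors with the current direction and the rest length.
        d = []
        for s in range(n - 1):
            ex, ey = x[s][0] - x[s + 1][0], x[s][1] - x[s + 1][1]
            length = math.hypot(ex, ey) or 1e-12
            d.append((rest * ex / length, rest * ey / length))
        # Global step: minimize inertia + spring energy for fixed d (one linear solve per coordinate).
        for dim in range(2):
            rhs = []
            for i in range(1, n):
                r = w * y[i][dim] - k * d[i - 1][dim]
                if i < n - 1:
                    r += k * d[i][dim]
                if i == 1:
                    r += k * x[0][dim]  # pinned neighbour moves to the right-hand side
                rhs.append(r)
            for i, value in enumerate(solve_tridiagonal(a, b, c, rhs), start=1):
                x[i][dim] = value
    new_pos = [tuple(p) for p in x]
    new_vel = [((x[i][0] - pos[i][0]) / h, (x[i][1] - pos[i][1]) / h) for i in range(n)]
    return new_pos, new_vel
```

For example, starting from a vertical chain at rest, `local_global_step([(0.0, -0.1 * i) for i in range(5)], [(0.0, 0.0)] * 5, rest=0.1)` returns positions that sag slightly under gravity while every spring stays close to its rest length. Because the global matrix is constant, it could be factored once and reused, which is a large part of why this family of methods parallelizes well.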

[00:15:01]Based on some mathematical improvements, we can convert this problem to be fully GPU-parallelizable.

[00:15:13]Meanwhile, this local-global method gives us nice convergence.

[00:15:19]And because the non-smooth Newton method keeps the nonlinear features of the contact problem, it gives a very accurate contact response.
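The talk does not spell out the contact formulation, so, as an assumption, here is a generic illustration of a non-smooth Newton method for contact. A toy linear complementarity problem (contact impulses lam >= 0, gap velocities w = A lam + b >= 0, lam * w = 0) is rewritten with the Fischer-Burmeister function phi(a, b) = a + b - sqrt(a^2 + b^2), which is zero exactly when a >= 0, b >= 0 and ab = 0, and then solved with a semismooth Newton iteration plus a backtracking line search. The matrix `A` and vector `b` in the usage example are toy values.

```python
import math


def fischer_burmeister(a, b):
    """Zero exactly when a >= 0, b >= 0 and a * b == 0."""
    return a + b - math.sqrt(a * a + b * b)


def solve_dense(M, r):
    """Gaussian elimination with partial pivoting for a small dense system M x = r."""
    n = len(r)
    aug = [row[:] + [r[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= f * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def lcp_fischer_newton(A, b, iters=50, tol=1e-24):
    """Semismooth Newton on Phi(lam)_i = FB(lam_i, (A lam + b)_i) with backtracking."""
    n = len(b)

    def residual(lam):
        w = [sum(A[i][j] * lam[j] for j in range(n)) + b[i] for i in range(n)]
        return w, [fischer_burmeister(lam[i], w[i]) for i in range(n)]

    lam = [1.0] * n
    for _ in range(iters):
        w, F = residual(lam)
        merit = 0.5 * sum(f * f for f in F)
        if merit < tol:
            break
        # One element of the generalized Jacobian: J = diag(da) + diag(db) @ A.
        J = []
        for i in range(n):
            norm = math.hypot(lam[i], w[i])
            if norm < 1e-14:  # at the kink (0, 0) pick a valid generalized derivative
                da = db = 1.0 - 1.0 / math.sqrt(2.0)
            else:
                da, db = 1.0 - lam[i] / norm, 1.0 - w[i] / norm
            J.append([db * A[i][j] + (da if i == j else 0.0) for j in range(n)])
        step = solve_dense(J, [-f for f in F])
        t = 1.0
        while True:  # Armijo-style backtracking on the merit function
            trial = [lv + t * s for lv, s in zip(lam, step)]
            trial_merit = 0.5 * sum(f * f for f in residual(trial)[1])
            if trial_merit < (1.0 - 1e-4 * t) * merit or t < 1e-8:
                break
            t *= 0.5
        lam = trial
    return lam
```

For instance, `lcp_fischer_newton([[2.0, 1.0], [1.0, 2.0]], [-1.0, 1.0])` converges to approximately `[0.5, 0.0]`: the first contact is active, and the second is separating with gap velocity 1.5. Unlike smoothing or penalty approaches, the complementarity conditions are kept exactly, which is the property the talk attributes to non-smooth Newton.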

[00:15:31]We also put a lot of effort into the GPU pipeline to leverage CUDA solvers and further speed up the whole computation.

[00:15:45]So, then how's the performance?

[00:15:47]Compared to some state-of-the-art results, our error relative to the ground truth can be very, very small. State-of-the-art solvers usually have more than 1% relative error to the ground truth; we can reduce it to about 0.1% under a very limited time budget.

[00:16:11]By the way, ground truth here means, for example, running the basic Newton method for a very, very long time until it converges; we usually regard that result as the ground truth.
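The relative-error comparison can be illustrated with a small sketch. Here a Jacobi iteration on a toy 3x3 system stands in for the real solver (an assumption; the talk uses a Newton-type method). The "ground truth" is the same solver run for many iterations, and a limited iteration budget plays the role of the real-time constraint.

```python
import math


def jacobi(A, b, iters):
    """Plain Jacobi iteration from x = 0 for a fixed iteration budget."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i] for i in range(n)]
    return x


def relative_error(x, ref):
    """||x - ref|| / ||ref|| in the Euclidean norm."""
    num = math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, ref)))
    return num / math.sqrt(sum(ri * ri for ri in ref))


A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
reference = jacobi(A, b, 10_000)  # "run until converged" stands in for the ground truth
for budget in (2, 5, 10):
    print(budget, relative_error(jacobi(A, b, budget), reference))
```

The printed errors shrink as the budget grows. Claims such as "1% versus 0.1% relative error" are statements about where a solver lands on this kind of curve when the budget is fixed by the frame rate.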

[00:16:23]So far, these are just computer graphics results. But how do we bring this to robots? Because our simulation is very GPU-native, we can parallelize more than 300 environments on a single RTX 5090 GPU.

[00:16:45]Just as with other parallel simulators, we can collect a lot of data very efficiently.

[00:16:50]But still it's simulation, right? How about the real performance?

[00:16:54]This was a very important moment for us. Around February this year, AAAI happened to be in Singapore.

[00:17:03]And we were very honored to be invited to show a live demo at the NUS booth.

[00:17:11]The idea was to show that the learned policy can be very robust. We kept running this task for more than an hour. By the way, the task is to fold the towel from one corner to the opposite corner.

[00:17:24]Okay, this is very simple for humans, for sure, but still very difficult for robots. The reason is that, for this towel, even a small wrinkle is actually a new state for the robot. Deformable objects naturally have an almost infinite number of states.

[00:17:41]So, that's why it's very hard.

[00:17:44]This is what it looks like in simulation. By the way, here we use a depth image: a downward-looking RGB-D camera provides the perception input.

[00:17:58]The whole training takes around 1 hour. So with just an RTX 5090 GPU and 1 hour, you can train a very nice policy, without any human data collection.

[00:18:10]>> And we can also use a similar method for T-shirts and for pants.

[00:18:20]Okay. Sorry to interrupt. Here I want to skip ahead and show you some other results that I think you may care more about: recovery cases.

[00:18:30]One very nice feature of learning in simulation is that we can try to cover as many failure cases as possible, so that the robot learns recovery by itself, just like you did in locomotion.

[00:18:47]This is the beauty of using simulation data, because real-world data collection is very decentralized, and it's very hard to make sure all the failure cases are properly recorded.

[00:19:00]Here we show some cases with perturbations and so on, and the robot can still continue to finish its task very successfully.

[00:19:11]We also compare with other simulators, such as Genesis, Isaac Sim, and Newton. We tried our best to tune the parameters of each simulator to make the comparison fair. We can see that many existing simulators still struggle, even in what may be very basic manipulation cases.

[00:19:42]For example, here the friction has some problems in Genesis, and here multiple overlapping cloths are very hard for some existing simulators. Newton is one of the best, but we can still see some room for improvement.

[00:20:00]And here are some additional results showing the accuracy of the simulation.

[00:20:10]Here is another comparison between different simulators, and we can see that the existing ones still have room for improvement before they become truly successful simulators.

[00:20:26]The reason is that if a simulation is not realistic compared to the real world, the data is very likely not useful to you, because the mismatch between sim and real is too large.

[00:20:42]So this is one way to qualitatively compare and benchmark simulators, and to understand why we need to care about simulation and improve it.

[00:20:53]Another way to benchmark simulators is to build some very fair benchmark tasks.

[00:21:01]For example, in this task we put a rigid cube on a 10-degree slope.

[00:21:09]Then we tune the friction coefficient to figure out when the cube starts to slip and when it stays stuck. From physics common sense, we know the threshold here is tan 10°, right? If the friction coefficient is larger than tan 10°, the cube just stays stuck, because the friction is large enough.

[00:21:39]And if the friction coefficient is smaller than tan 10°, the cube starts to slip.

[00:21:46]So the critical threshold is around this value, tan 10° ≈ 0.176.
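The slope test reduces to a one-line Coulomb-friction check. As a sketch, one can bisect on the friction coefficient to find where a simulator switches from slipping to sticking, then compare that against the analytic value tan 10°. The `empirical_threshold` helper and the biased "simulator" below are hypothetical, made up for illustration.

```python
import math

SLOPE_DEG = 10.0
MU_CRITICAL = math.tan(math.radians(SLOPE_DEG))  # analytic slip threshold, about 0.1763


def cube_slips(mu, slope_deg=SLOPE_DEG):
    """Ideal Coulomb model: slip when the downhill pull exceeds maximum static friction."""
    theta = math.radians(slope_deg)
    return math.sin(theta) > mu * math.cos(theta)  # both sides per unit m*g


def empirical_threshold(slips, lo=0.0, hi=1.0, tol=1e-6):
    """Bisect on mu to locate where a slip/no-slip oracle (e.g. a simulator) stops slipping."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slips(mid):
            lo = mid  # still slipping, so the threshold lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def biased_sim(mu):
    """Hypothetical approximate simulator: only sticks once mu exceeds tan(10 deg) by 0.1."""
    return cube_slips(mu - 0.1)


print(empirical_threshold(cube_slips) - MU_CRITICAL)  # close to 0
print(empirical_threshold(biased_sim) - MU_CRITICAL)  # close to 0.1
```

The gap returned for the biased oracle is the kind of "resolution" the comparison measures: an accurate simulator flips from slip to stick right at tan 10°, while a heavily approximated one needs the coefficient to move by about 0.1 before it notices.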

[00:21:51]If we compare different methods, we can see that our method distinguishes slip from no-slip very sharply around this threshold.

[00:22:02]So it has very high resolution; it's very, very accurate.

[00:22:07]But some other very common methods may need the friction coefficient to change by more than 0.1 to distinguish between slip and no-slip.

[00:22:18]Okay?

[00:22:19]With this test, we try to build a fair comparison, and we can see that many existing simulators still make too many approximations and sacrifice a lot of accuracy.

[00:22:32]In Singapore, another application is healthcare, because Singapore cares about healthcare a lot. The human body is a very nice combination of deformable and rigid parts: our skin is deformable, while our bones are rigid.

[00:22:51]Here we show that we can simulate this very nicely and build a haptic front-end for deformable objects.

[00:23:01]This can be used for doctor training, so that doctors do not need to use real animal or even human tissue for experiments.

[00:23:12]And besides training people, for surgical robots we can also leverage simulation to train robot policies.

[00:23:21]This work is built on SOFA, a very famous physics engine from a French team at CNRS. We replaced their engine with our own physics engine and improved accuracy by more than 100 times; the speed-up was 30 times last year and is more than 100 times this year, which makes it very efficient for data collection, especially for haptics.

[00:23:52]In this case, the so-called Newton method is regarded as the ground truth, because we don't limit its computation time; we assume it has infinite computation time. Here we can see that our method is very close to the ground truth.

[00:24:09]And we only give our method a very limited number of iterations, a very short time budget.

[00:24:17]Okay.

[00:24:18]So this is the first part I want to share: for synthetic data, I think the first major bottleneck is still the sim-to-real mismatch, especially for highly deformable objects, or what we call contact-rich scenarios. This mismatch really bothers us and prevents us from training policies for deformable manipulation tasks.

[00:24:45]Second, which I will share soon: for robot learning, especially reinforcement learning, another bottleneck is that we still need tons of samples, millions or even more, as current frameworks become more and more complex.

[00:25:04]Especially for perceptive end-to-end policies, directly learning end-to-end is still a very big challenge. So here I want to share how to leverage differentiable simulation for this end-to-end learning problem.

[00:25:19]And last, robot learning is still a black-box system with no safety guarantee. How can we improve safety, both on the robot learning side and on the hardware side?

[00:25:34]Okay. After the simulation part, I'm going to share some of our recent thoughts on how to leverage the first-order gradients of differentiable simulation specifically.

[00:25:46]Earlier we showed the drone racing task. For people not familiar with it, you may feel, "Oh, it looks pretty good, right? The problem is already solved."

[00:25:58]So do we still need to improve it? What's the bottleneck? If we look very closely at this problem, they actually make a lot of simplifications. Their work is based on reinforcement learning, but the observation has only 31 dimensions as input and four dimensions as output. It's a very small problem, because they assume the gates and so on are already known.

[00:26:29]But in the real world, many tasks actually require perceptive end-to-end learning. Then the input easily has hundreds or thousands of dimensions, because pixels are very high-dimensional.

[00:26:44]So why do previous reinforcement learning controllers have problems here? Previous reinforcement learning controllers rely on a normal forward simulator.

[00:26:54]In reinforcement learning, we do a lot of rollouts in order to estimate the gradient, and we use that gradient to update the value function and the policy network. That's how PPO usually works: we do a lot of rollouts.
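To see why rollout-based reinforcement learning is sample hungry, here is a toy sketch of our own construction (not from the talk). The system is 1D, x' = x + a, with a Gaussian linear policy a ~ N(-k x, sigma^2). The gradient of the expected cost with respect to the gain k is estimated purely from sampled rollouts using the likelihood-ratio (REINFORCE-style) estimator, the same principle that policy-gradient methods such as PPO build on. Because the toy is linear, the exact gradient can also be computed for comparison.

```python
import random


def score_function_gradient(k, x0=1.0, sigma=0.3, horizon=5, n_rollouts=20000, seed=0):
    """Likelihood-ratio estimate of dJ/dk, using the simulator only as a black box.

    Toy assumptions: dynamics x' = x + a, policy a ~ N(-k x, sigma^2),
    cost J = E[sum of x_t^2 over the horizon].
    """
    rng = random.Random(seed)
    costs, scores = [], []
    for _ in range(n_rollouts):
        x, cost, score = x0, 0.0, 0.0
        for _ in range(horizon):
            eps = rng.gauss(0.0, 1.0)
            a = -k * x + sigma * eps
            score += -eps * x / sigma  # d/dk log N(a; -k x, sigma^2)
            x = x + a
            cost += x * x
        costs.append(cost)
        scores.append(score)
    baseline = sum(costs) / n_rollouts  # simple variance reduction
    return sum((c - baseline) * s for c, s in zip(costs, scores)) / n_rollouts


def exact_gradient(k, x0=1.0, sigma=0.3, horizon=5):
    """Exact dJ/dk for this linear-Gaussian toy via the recursion on m_t = E[x_t^2]."""
    m, dm, grad = x0 * x0, 0.0, 0.0
    for _ in range(horizon):
        m, dm = (1 - k) ** 2 * m + sigma ** 2, -2 * (1 - k) * m + (1 - k) ** 2 * dm
        grad += dm
    return grad


print(score_function_gradient(0.5), exact_gradient(0.5))
```

Even with 20,000 rollouts, the sampled estimate only approximates the exact value, which is why this style of training leans on massively parallel simulators.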

[00:27:10]That's why we need massively parallel simulators.

[00:27:14]But differentiable simulation is another story. It has some very nice features: it does not need so many samples, because for each sample you can directly get the gradient from the simulation. Okay.

[00:27:30]To be more specific: in simulation, starting from the current state, a control policy π generates the action, and the state and action together generate the next state, right? That's the usual process.

[00:27:47]With differentiable simulation, we can take the gradient with respect to the future states and propagate it directly back to update the policy π.
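By contrast, if the simulator is differentiable, a single rollout gives the exact gradient. The sketch below uses the same kind of illustrative toy: dynamics x' = x + a, a deterministic policy a = -k x, and a cost equal to the sum of x_t^2. It backpropagates through the rollout with a hand-written adjoint; the test checks the result against a finite difference.

```python
def rollout(k, x0=1.0, horizon=5):
    """Forward simulation with the deterministic linear policy a = -k * x."""
    xs = [x0]
    for _ in range(horizon):
        a = -k * xs[-1]        # policy pi_k(x)
        xs.append(xs[-1] + a)  # differentiable dynamics f(x, a) = x + a
    return xs


def cost_and_gradient(k, x0=1.0, horizon=5):
    """Cost J = sum_{t>=1} x_t^2 and dJ/dk via backpropagation through time."""
    xs = rollout(k, x0, horizon)
    cost = sum(x * x for x in xs[1:])
    adj, grad = 0.0, 0.0  # adj holds dJ/dx_{t+1}, accumulated from the last step backwards
    for t in range(horizon - 1, -1, -1):
        adj = 2.0 * xs[t + 1] + (1.0 - k) * adj  # dx_{t+2}/dx_{t+1} = 1 - k
        grad += adj * (-xs[t])                   # direct effect of k on x_{t+1} is -x_t
    return cost, grad


print(cost_and_gradient(0.5))
```

Here one forward pass and one backward pass replace thousands of sampled rollouts. The catch, which the talk returns to, is that the gradient is only as informative as the simulator's contact model allows.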

[00:27:57]Okay. This provides a new possibility for using simulation data efficiently.

[00:28:05]Our first experiment is a legged locomotion case where the input is a depth image. We already downsampled it, but it still has more than 200 dimensions.

[00:28:21]So we use this as the input.

[00:28:25]What we want to learn is a locomotion policy that makes the robot walk while also learning to avoid obstacles, like here.

[00:28:36]It's still a very challenging problem if we directly throw all of this at PPO, that is, vanilla reinforcement learning.

[00:28:48]So instead, we use differentiable simulation. Differentiable simulation also has one challenge here: we need to provide a gradient for obstacle avoidance before the collision happens. Otherwise the robot only receives a gradient once it hits the obstacle, and that gradient comes too late.

[00:29:06]So we use a trick called the "Pinocchio nose": even before a collision, there is already a useful gradient guiding the robot to avoid the obstacles.
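The talk only names the "Pinocchio nose" trick, so the following is a hypothetical interpretation, not the authors' formulation. A virtual probe point is attached ahead of the robot along its heading, and a smooth (softplus) penalty is applied as the probe approaches an obstacle. That penalty has a non-zero gradient while the body itself is still clear, whereas a hard contact penalty on the body stays exactly flat until impact. All names, lengths, and the softplus sharpness `beta` are illustrative assumptions.

```python
import math


def softplus(z, beta=20.0):
    """Numerically stable smooth approximation of max(z, 0)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(beta * z))) / beta


def nose_penalty(px, py, heading, obstacle, radius, nose_length=0.5, beta=20.0):
    """Hypothetical 'Pinocchio nose': penalize a probe point placed ahead of the robot."""
    nx = px + nose_length * math.cos(heading)
    ny = py + nose_length * math.sin(heading)
    clearance = math.hypot(nx - obstacle[0], ny - obstacle[1]) - radius
    return softplus(-clearance, beta)


def hard_contact_penalty(px, py, obstacle, radius, body_radius=0.2):
    """Penalty that only activates once the robot body overlaps the obstacle."""
    gap = math.hypot(px - obstacle[0], py - obstacle[1]) - radius - body_radius
    return max(-gap, 0.0)


def d_dx(f, x, h=1e-4):
    """Central finite difference, used here only to inspect the gradients."""
    return (f(x + h) - f(x - h)) / (2 * h)


obstacle, radius = (1.0, 0.0), 0.3
print(d_dx(lambda x: nose_penalty(x, 0.0, 0.0, obstacle, radius), 0.0))      # positive already
print(d_dx(lambda x: hard_contact_penalty(x, 0.0, obstacle, radius), 0.0))  # 0.0: no signal yet
```

With the robot 0.5 units short of touching the obstacle, the nose penalty already increases as the robot moves forward, so a gradient-based learner gets a signal to steer away before any actual collision.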

[00:29:17]Here we get some nice results for 2D navigation.

[00:29:22]Recently, our students pushed the methodology further to the 3D case, so the robot can navigate based on this end-to-end policy.

[00:29:43]In terms of sample efficiency, we improve three to five times compared to the previous reinforcement learning method.

[00:29:53]Another direction: previously we made rigid-body contact differentiable. In our lab, similar to the earlier deformable-object work, we can also make deformable objects differentiable. This is very new work that we submitted just last week.

[00:30:18]Here we make our fast but accurate deformable simulation differentiable, and with it we can train control policies for the robot to handle highly deformable objects.

[00:30:35]Another case is, for example, an elephant trunk, which again is a very high-dimensional soft body.

[00:30:45]We can use gradients from the simulation to update the policy so that it can do some nice drawing, controlling the trunk as a manipulation task.

[00:31:00]Another application is that we can do real-to-sim very efficiently.

[00:31:06]We collect data from the real world to get point clouds, take video as input, and use gradients from the simulation to calibrate the physical parameters. Then we use our simulation to run the forward simulation and show that the sim-to-real mismatch is very small.
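As a minimal sketch of gradient-based real-to-sim calibration, here is a toy stand-in for the point-cloud pipeline. A 1D spring-damper is simulated with semi-implicit Euler, and the derivative of each simulated position with respect to the damping coefficient is propagated alongside the state (forward-mode differentiation). Gauss-Newton updates then fit the damping to an observed trajectory. The "observations" are synthetic, generated from a known damping of 0.8, and the stiffness, mass, and timestep are assumed values.

```python
def simulate(c, k=10.0, m=1.0, h=0.01, steps=300, x0=1.0):
    """Semi-implicit Euler spring-damper; also returns d(position)/dc at every step."""
    x, v, dx, dv = x0, 0.0, 0.0, 0.0
    xs, sens = [], []
    for _ in range(steps):
        dv = dv + h * (-k * dx - v - c * dv) / m  # derivative of the velocity update w.r.t. c
        v = v + h * (-k * x - c * v) / m
        dx = dx + h * dv                          # derivative of the position update w.r.t. c
        x = x + h * v
        xs.append(x)
        sens.append(dx)
    return xs, sens


def calibrate_damping(observed, c_init=0.2, iters=30):
    """Fit the damping coefficient by Gauss-Newton on the squared position error."""
    c = c_init
    for _ in range(iters):
        xs, sens = simulate(c)
        residual = [a - b for a, b in zip(xs, observed)]
        num = sum(r * s for r, s in zip(residual, sens))
        den = sum(s * s for s in sens) or 1e-12
        c = max(c - num / den, 0.0)  # keep the physical parameter non-negative
    return c


observed, _ = simulate(0.8)  # synthetic "measurements" with known damping 0.8
print(calibrate_damping(observed))
```

Because the simulator provides the sensitivity of every position to the parameter, each update uses the whole trajectory at once, instead of probing the parameter with many extra simulations.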

[00:31:36]Okay.

[00:31:38]What we just introduced is analytic differentiable simulation.

[00:31:43]There is another type that we call neural-network-based differentiable simulation.

[00:31:49]Some people may feel, "Oh, analytic differentiable simulation is nice, but maybe it takes too much infrastructure effort, right?" So what is another way?

[00:31:59]Maybe we can use a neural network as the simulator. A neural network is naturally differentiable.

[00:32:07]And we can leverage this property for differentiable simulation.

[00:32:11]Here is some previous work, but what are its limitations?

[00:32:15]Why do we need to improve it? Most of these end-to-end neural simulators are very data hungry. They need a lot of data to fit a neural network that predicts the next state.

[00:32:33]That makes them very data hungry, especially when the data must be collected in the real world.

[00:32:39]So our question is: could we build a neural differentiable simulator using only very few-shot real-world data?

[00:32:46]That is, use a very small amount of real-world data and still be able to calibrate the simulation.

[00:32:54]So what's the challenge? Our goal is to learn from data, but only from a few shots, and to make the whole thing end-to-end differentiable.

[00:33:04]Our contribution, which will appear next month at [inaudible], is a graph neural network-based differentiable simulator that leverages geometric features of the objects.

[00:33:18]We also use few-shot real-to-sim data scaling: we calibrate the contact parameters and then use simulation to scale up the data.

[00:33:31]And we show some very nice results.

[00:33:34]More specifically, the method works like this. First, we still need some real-world trajectories. Here we put QR codes on the objects in order to properly record their trajectories.

[00:33:46]Then we use simulation to calibrate the physical parameters, like the stiffness and the friction coefficient. We use CMA-ES to calibrate the simulation's parameters.

[00:34:05]Then we can generate more interactions in the simulator.

[00:34:09]And then we use the data from the simulator to train the GNN.

[00:34:15]Okay.

[00:34:16]In more detail, we take only three real-world trajectories as input.

[00:34:24]Then we use CMA-ES to calibrate the friction coefficient, stiffness, and damping, minimizing a loss based on the position and angular trajectories.
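CMA-ES itself adapts a full covariance matrix, which is too long to reproduce here. The sketch below instead uses a deliberately simplified evolution strategy (Gaussian sampling, truncation selection, a fixed step-size decay, no covariance adaptation) to show the same calibration loop: sample parameter candidates, simulate, score them against recorded trajectories, and move toward the best ones. The toy simulator is a 1D spring-damper standing in for the contact model, friction is omitted for brevity, and the three "real" trajectories are synthetic, generated from assumed true values.

```python
import random


def simulate(k, c, x0, v0, m=1.0, h=0.01, steps=200):
    """Toy spring-damper 'contact' model integrated with semi-implicit Euler."""
    x, v, xs = x0, v0, []
    for _ in range(steps):
        v += h * (-k * x - c * v) / m
        x += h * v
        xs.append(x)
    return xs


def trajectory_loss(params, demos):
    """Mean squared position error over all recorded trajectories."""
    k, c = params
    total = 0.0
    for (x0, v0), observed in demos:
        sim = simulate(k, c, x0, v0)
        total += sum((a - b) ** 2 for a, b in zip(sim, observed)) / len(observed)
    return total / len(demos)


def simple_es(loss, mean, sigma, generations=60, pop=16, elite=4, decay=0.92, seed=0):
    """Simplified (mu, lambda) evolution strategy; NOT full CMA-ES (no covariance adaptation)."""
    rng = random.Random(seed)
    mean, sigma = list(mean), list(sigma)
    for _ in range(generations):
        candidates = [[max(mu + s * rng.gauss(0.0, 1.0), 0.0) for mu, s in zip(mean, sigma)]
                      for _ in range(pop)]
        candidates.sort(key=loss)  # best (lowest loss) first
        best = candidates[:elite]
        mean = [sum(p[i] for p in best) / elite for i in range(len(mean))]
        sigma = [s * decay for s in sigma]  # fixed schedule instead of adaptation
    return mean


TRUE_K, TRUE_C = 10.0, 0.8  # assumed "real" stiffness and damping
starts = [(1.0, 0.0), (0.5, 1.0), (-0.8, 0.5)]  # three synthetic demo trajectories
demos = [((x0, v0), simulate(TRUE_K, TRUE_C, x0, v0)) for x0, v0 in starts]
k_hat, c_hat = simple_es(lambda p: trajectory_loss(p, demos), mean=[8.0, 0.3], sigma=[1.0, 0.2])
print(k_hat, c_hat)
```

Gradient-free search like this only needs a forward simulator, which is convenient when the contact model is not differentiable. Real CMA-ES additionally learns the shape of the search distribution, which matters when parameters such as friction and damping are strongly correlated.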

[00:34:41]Using data from the calibrated simulation, we can scale up to 3,000 contact-rich trajectories.

[00:34:54]Here we can see some results.

[00:35:01]Compared to some other differentiable simulators, we achieve a relatively low error with respect to the ground-truth data.

[00:35:17]And the videos show that the real world and the neural simulator are very close.

[00:35:26]But that's not the most exciting part. This is our real motivation for a neural simulator.

[00:35:34]For very contact-rich tasks with an analytic simulator, the more objects are in contact, the more the computation grows, until it can become intractable. A neural simulator has one nice feature: we can parallelize across as many GPUs as we need. That's why people have a strong motivation to build neural simulators.

[00:36:02]Here we show that the learned GNN simulator generalizes very well to novel scenarios. During training we only had the two-cube case; we never had this kind of multi-cube case.

[00:36:15]But the results of our learned GNN simulator are still very close to the ground-truth data. This shows that the neural simulator can generalize quite well even to some very complex cases.
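One reason a graph network can handle more cubes than it saw in training is weight sharing: the same edge and node functions are applied to every pair and every object, so the model has no fixed input size. The sketch below shows one message-passing step over a fully connected graph of cubes on a line. The hand-written `edge_fn` and `node_fn` are hypothetical stand-ins for learned MLPs; this is not the speaker's GNN simulator.

```python
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity) of one cube along a line

def edge_fn(receiver: State, sender: State) -> float:
    """Hypothetical stand-in for a learned edge MLP: soft repulsion between close cubes."""
    gap = receiver[0] - sender[0]
    overlap = max(0.0, 1.0 - abs(gap))  # unit-width cubes overlap when |gap| < 1
    return 10.0 * overlap * (1.0 if gap >= 0 else -1.0)

def node_fn(state: State, aggregated: float, dt: float = 0.01) -> State:
    """Hypothetical stand-in for a learned node MLP: integrate the aggregated 'force'."""
    pos, vel = state
    vel = vel + dt * aggregated
    return (pos + dt * vel, vel)

def message_passing_step(states: List[State]) -> List[State]:
    # edge_fn and node_fn are shared across all objects and pairs,
    # so the same step works for 2 cubes or 20 without any change.
    new_states = []
    for i, s_i in enumerate(states):
        agg = sum(edge_fn(s_i, s_j) for j, s_j in enumerate(states) if j != i)
        new_states.append(node_fn(s_i, agg))
    return new_states

two = message_passing_step([(0.0, 1.0), (0.9, 0.0)])
five = message_passing_step([(0.0, 1.0), (0.9, 0.0), (1.8, 0.0), (2.7, 0.0), (3.6, 0.0)])
print(two)
print(len(five))
```

Because this toy `edge_fn` is antisymmetric, the pairwise "forces" cancel and total momentum is conserved for any number of cubes; a learned model has to acquire such structure from data.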

[00:36:34]Another way we can leverage the neural simulator is through its gradient, which we can use for learning tasks.

[00:36:42]In this task, the goal is to use a blue cube to push a green cube so that it reaches the red dot.

[00:36:50]We can leverage the gradient of the neural network to do the optimization, and after just 10 iterations we can already solve this task quite well. You can see very clearly that the loss decreases during the learning process.
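The gradient-based optimization described above can be illustrated with a toy differentiable push model. Assuming a hand-written smooth model in place of the learned neural simulator, the blue cube's push distance `a` moves the green cube by a softplus of how far the push exceeds the initial gap, and we descend the gradient of the squared distance to the target. `GAP`, `TARGET`, `SHARPNESS`, and the learning rate are hypothetical.

```python
import math

GAP, TARGET, SHARPNESS = 0.5, 1.0, 10.0  # hypothetical gap, target offset, contact smoothness

def green_displacement(a: float) -> float:
    # Smooth (softplus) contact: near zero until the blue cube closes the gap, then ~linear.
    return math.log1p(math.exp(SHARPNESS * (a - GAP))) / SHARPNESS

def loss_and_grad(a: float):
    err = green_displacement(a) - TARGET
    ddisp_da = 1.0 / (1.0 + math.exp(-SHARPNESS * (a - GAP)))  # d softplus / da = sigmoid
    return err ** 2, 2.0 * err * ddisp_da

a, lr = 0.8, 0.5  # initial push distance and step size
history = []
for _ in range(10):  # 10 gradient steps, echoing the 10 iterations in the talk
    loss, grad = loss_and_grad(a)
    history.append(loss)
    a -= lr * grad
print([round(l, 6) for l in history], round(a, 4))
```

Here the gradient is derived by hand; with a neural simulator it would come from automatic differentiation through the network instead.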

[00:37:11]Okay, so this is the second part. What I want to share is that, for more complex cases, differentiable simulators have very high potential, whether they come from analytic differentiable simulation or even from neural differentiable simulation.

[00:37:31]These simulators can help us especially with high-dimensional tasks. They can be very useful for policy learning, system identification, and so on.

[00:37:42]Okay.

[00:37:43]And the last part I want to share is about the safety of black-box neural network controllers.

[00:37:50]The motivation is that I started learning robot learning, especially reinforcement learning, in 2020, and I really saw that reinforcement learning controllers are quite robust across a lot of scenarios.

[00:38:05]But one problem actually came up: every time industrial visitors came to our lab, they said, "Okay, your neural network controller is great, but I need some guarantee. Can you give me some guarantee? Otherwise, I have strong concerns about deploying it in real products."

[00:38:29]So, in RSS 2024, we tried to answer this problem. I think it's one of the very early works to do this kind of analysis for robots.

[00:38:40]The motivation is that for reinforcement learning in locomotion, from 2019 until now, we have seen a lot of improvement: from MLP controllers to memory-enhanced and even perception-enhanced controllers, right?

[00:38:59]But meanwhile, we found something very intriguing: even the people who develop a controller are not 100% sure what its failure cases are, or what its worst case is.

[00:39:13]Second, when training neural networks, if we change the loss function or reward function a little, or change the architecture a little, right, we can get multiple different policies.

[00:39:27]But which one is more robust against the worst case?

[00:39:31]Actually, if you ask yourself, we are also not 100% sure, right? We get some policies that all look good, but what's their worst case? We don't know.

[00:39:41]And as controllers become better and better, finding their weaknesses also becomes more and more challenging.

[00:39:50]There are two reasons. First, if you have a very good controller, failures become very long-tail cases. Second, and more importantly, it's still a control problem, so failures are also time-sequential. It's not a single frame that makes your robot fall; you need a sequence of very small signals to make the robot fall.

[00:40:10]That means the dimension of the search will be very high, because time is included.

[00:40:15]So, to show how challenging the problem is, we first started from a very simple open-source Gym controller, a simple MLP controller.

[00:40:26]Here we show that with standard testing, such as random perturbations, constant perturbations, and so on, it's very hard to make even this open-source controller fall, which is very surprising.

[00:40:39]The second surprise for us was that we invited 100 experts at ETH to see who could make this robot fall.

[00:40:47]And only three people could make this very simple MLP controller fail. By the way, here is our computational method. Compared to the human winner, the computational method takes only 1.4 seconds to make the robot fall, whereas the human winner took 4 seconds.

[00:41:10]Second, the computational method causes much larger damage than the human winners.

[00:41:19]And one more very interesting thing: none of these three winners work on locomotion.

[00:41:26]In other words, all the locomotion experts failed to identify the weaknesses of the controller. This was very surprising to us, because in robot learning we feel domain expertise is very important, right?

[00:41:43]But for understanding how to evaluate your system, domain expert knowledge is still important, but maybe not that important, and it's not enough. We really need computational methods.

[00:42:00]Then the question is: what about state-of-the-art controllers? That was just a simple MLP controller, so what about a very robust one? We tried to attack one of the most robust controllers in the world, which took part in a robotics challenge. But we still found it very easy to make this robot fall.

[00:42:23]So what is our attack signal? Here are two plots to show how the attack works. First, we assume the roll, pitch, and yaw angles from the IMU have no more than 3° of error. That's very small, right? For IMU roll, pitch, and yaw angles, it's very easy to have noise of more than 3°.

[00:42:46]Second, the operating range of this robot is 2 m/s.

[00:42:51]And we assume the adversarial agent can send joystick commands to the robot within 0.5 m/s.

[00:42:59]So what we want to show is that with a very small error signal, which is very likely to be a hidden risk in your controller, we can make the robot fall and break your controller.
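The claim that a small but sequential perturbation is far more dangerous than random noise of the same size can be checked on a toy model. The sketch below assumes a linearized pendulum balanced by a PD controller with hypothetical gains, and borrows the 3° bound on observation error from the IMU budget above. A hand-written attacker that keeps hiding part of the tilt stands in for the learned adversarial policy in the talk; it is not that method.

```python
import math
import random

DT, A = 0.01, 9.81              # time step [s]; gravity/length term of the toy pendulum
KP, KD = 14.81, 10.0            # hypothetical PD gains
EPS = math.radians(3.0)         # 3 degree bound on the observation error

def rollout(perturb, steps=1000):
    """Return (largest |theta|, final theta) under bounded observation perturbations."""
    theta, omega, worst = 0.0, 0.0, 0.0
    for t in range(steps):
        delta = perturb(t, theta)
        assert abs(delta) <= EPS + 1e-12          # the attacker must respect its budget
        u = -KP * (theta + delta) - KD * omega    # controller sees a corrupted angle
        theta, omega = theta + DT * omega, omega + DT * (A * theta + u)
        worst = max(worst, abs(theta))
    return worst, theta

rng = random.Random(0)

def random_noise(t, theta):
    return rng.uniform(-EPS, EPS)

def adversary(t, theta):
    # Always report a smaller tilt than the true one, so the controller under-corrects.
    return -EPS if theta >= 0.0 else EPS

rand_worst, _ = rollout(random_noise)
adv_worst, adv_final = rollout(adversary)
steady_state = KP * EPS / (KP - A)  # analytic limit of the adversarial rollout
print(f"random: {rand_worst:.4f} rad, adversarial: {adv_worst:.4f} rad, limit: {steady_state:.4f} rad")
```

Because this toy closed loop is overdamped, the attacker's persistent bias is the worst case for it: the tilt settles near KP·ε/(KP − g/l), about three times the 3° budget, while uniform noise of the same size largely averages out.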

[00:43:14]Then people may ask, "Oh, did you just randomly find some bugs, or did you really find critical failure cases?" This experiment also surprised us, and we were happy to see that it really identifies critical failure cases.

[00:43:31]For example, when the robot is on stairs, the learned adversarial policy always first rotates the robot to be parallel to the stairs and then does a very strong attack. The reason is that when the robot dog is parallel to the stairs, it very easily crosses its legs, right? And then it falls down.

[00:43:52]But on flat terrain, you can see the same policy behaves in a totally different way: it shakes, shakes, shakes to make the robot fall.

[00:44:02]So as I said, it's the same policy, but with different behavior on different terrains.

[00:44:06]Now we are more confident in claiming that the adversarial agent is very smart.

[00:44:14]It really learns what the major weaknesses are.

[00:44:20]So what is the benefit? Once we know where the weaknesses are, first, we can avoid these weak scenarios.

[00:44:32]Second, we can fine-tune the controller together with these failure scenarios to improve it.

[00:44:40]Here are some very harsh experiments.

[00:44:44]On the left side is the original controller. Although it's already quite robust, in some very harsh environments it still struggles to survive.

[00:44:53]The right one is our new controller, fine-tuned with these adversarial-signal scenarios. In simulation, we also simulate foot perturbation forces to further enhance it.

[00:45:09]And then we invited the original controller's developer to help us do a stress test.

[00:45:15]This shows some very extreme cases for locomotion.

[00:45:23]Based on this stress test, the developer became more confident in the controller and felt it really improved.

[00:45:31]The remaining part is some outdoor experiments, which I will skip for time.

[00:45:37]The proposed method can learn failure cases not only for reinforcement learning controllers but also for MPC controllers, or any kind of controller. This one is a Mini Cheetah MPC controller previously developed by Dr. Chenyi and Professor Quan's team. We leveraged their very nice open-source code to run our attack.

[00:46:05]Here we also have some techniques to investigate diverse failure scenarios. For example, we can identify cases where the torque exceeds its limits, which, as I mentioned, terminates the robot and makes it fall, and also self-collision cases, by leveraging ensemble networks and some other reward design techniques.

[00:46:30]So the first takeaway is that attacks can serve as nice robustness indicators for comparing different controllers very efficiently.

[00:46:45]Second, something very interesting: we also found that multi-modality attacks are stronger.

[00:46:52]For our attack channels, attacking only the command space (sending joystick commands), only the observation space (joint angles), or only the perturbation space cannot really make the robot fall.

[00:47:07]However, if we combine two or three spaces together, we can generate a stronger attack signal. This also shows that failures often come from multiple factors together, not from a single factor.
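A linear toy model already shows why combining channels helps an attacker: each budget alone leaves the tilt below a fall threshold, but the effects add up. The model below reuses a linearized pendulum with a PD controller; the gains, both budgets, and `FALL_ANGLE` are hypothetical, and on a real nonlinear robot the channels interact in more complicated ways than this simple superposition.

```python
DT, A, KP, KD = 0.01, 9.81, 14.81, 10.0  # hypothetical toy pendulum and PD gains
FALL_ANGLE = 0.25                        # hypothetical "fall" threshold [rad]
OBS_BUDGET = 0.05                        # observation-channel budget [rad] (hypothetical)
PUSH_BUDGET = 0.8                        # perturbation-channel budget (hypothetical units)

def worst_tilt(obs_bias: float, push: float, steps: int = 2000) -> float:
    """Max |theta| with a biased angle observation and a constant external push."""
    theta, omega, worst = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = -KP * (theta + obs_bias) - KD * omega
        theta, omega = theta + DT * omega, omega + DT * (A * theta + u + push)
        worst = max(worst, abs(theta))
    return worst

only_obs = worst_tilt(-OBS_BUDGET, 0.0)
only_push = worst_tilt(0.0, PUSH_BUDGET)
combined = worst_tilt(-OBS_BUDGET, PUSH_BUDGET)
for name, value in [("obs only", only_obs), ("push only", only_push), ("combined", combined)]:
    print(f"{name:9s}: {value:.3f} rad, falls={value > FALL_ANGLE}")
```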

[00:47:23]And the third thing, which I think is also very important for everyone doing reinforcement learning: we found that randomization is not sufficient.

[00:47:31]When we do reinforcement learning, for example, we randomly apply pushes of 100 N, right? And we hope that after training with this randomization, the controller will survive against 100 N.

[00:47:46]But our experiments showed that randomization cannot guarantee that you are safe under 100 N.

[00:47:54]So we also need to be very careful here when training a robot for safe control.

[00:47:59]In principle, we can also apply this to other platforms, for example, manipulation, navigation, and so on.

[00:48:10]Okay, so that part is about how to identify and evaluate your weaknesses.

[00:48:16]In the other part of our safety work, we also think about how to make complex controllers safer for the robot.

[00:48:24]For example, a vanilla MPC or RL controller is usually not very compliant, right? So first, we train a controller to be very compliant, especially against external perturbations or payloads, which are very likely to occur in locomotion.

[00:48:46]Meanwhile, our controller can adjust its compliance gain to show different compliant behaviors.

[00:48:56]This shows different compliance behaviors for different safety requirements.

[00:49:06]But we also found that compliance alone is not enough, because it's not safe enough. In some cases, if the robot just blindly follows the compliant command, it still falls down.

[00:49:21]So our idea is to train a safe policy so that the robot smartly switches between the compliant state and a safe state, to further enhance safety during compliant mode. Existing work only makes the robot compliant, but we realized that in many cases this fails.

[00:49:44]So the contribution is making the robot even smarter: it identifies by itself when it should be compliant and when it should switch to a safe state.
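The idea of switching between a compliant state and a safe state can be sketched as a small mode selector. In the talk this decision is made by a learned safe policy; here a hand-written threshold rule with hysteresis stands in for it, and the compliance gain, tilt limit, and stiffness values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ModeSwitcher:
    """Hypothetical rule-based stand-in for a learned compliant/safe mode selector."""
    compliance_gain: float = 0.5   # 0 = stiff, 1 = fully compliant (hypothetical scale)
    tilt_limit: float = 0.3        # rad; beyond this, staying compliant risks a fall
    mode: str = "compliant"

    def stiffness(self, nominal: float) -> float:
        # Reduced stiffness while compliant; full stiffness in the safe state.
        if self.mode == "compliant":
            return nominal * (1.0 - self.compliance_gain)
        return nominal

    def update(self, tilt: float) -> str:
        # Leave compliant mode when following the push would risk a fall, and return
        # only once the robot is well inside the safe region again (hysteresis).
        if self.mode == "compliant" and abs(tilt) > self.tilt_limit:
            self.mode = "safe"
        elif self.mode == "safe" and abs(tilt) < 0.5 * self.tilt_limit:
            self.mode = "compliant"
        return self.mode

switcher = ModeSwitcher()
for tilt in [0.05, 0.2, 0.35, 0.25, 0.1]:
    mode = switcher.update(tilt)
    print(f"tilt={tilt:.2f} mode={mode:9s} stiffness={switcher.stiffness(400.0):.0f}")
```

The hysteresis band keeps the robot from chattering between modes when the tilt hovers near the limit.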

[00:50:01]So that is the software part. The last work I want to share is on the hardware side: we also have some thoughts on how to make robots safer.

[00:50:11]For humanoid robots especially, we really care about safety, because humanoid robots are supposed to work closely with humans in human-centered environments.

[00:50:24]So if a robot falls down, it's very likely to break its own hardware and also to hurt people nearby.

[00:50:33]In this work, we collaborated with Professor Chichilian in the Soft Robotics Lab at NUS. We developed a very nice new material for the robot, applied it as a protective coating, and deployed it on a humanoid robot, which makes the robot very safe under very high impacts. For example, here it falls down from the second floor.

[00:51:00]For example, you just throw it from the second floor. Even under such a large impact, the robot is still very safe.

[00:51:10]The key idea here is that we can leverage simulation to do the co-design very efficiently.

[00:51:18]Here is the co-design result, which includes the shape and the thickness of the protector. We can send the design directly to a laser cutter, which automatically fabricates the protector.
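A co-design loop of this kind can be sketched as a search over design parameters scored by a physics model. The sketch below assumes a crude drop-impact model in which a foam pad acts as a linear spring whose stiffness falls with thickness, and it only searches the pad thickness. The mass, drop height, material stiffness, contact area, and force limit are all hypothetical; the work in the talk also optimizes shape, with a far more detailed simulation.

```python
import math

# Hypothetical drop-impact model: the protector acts as a linear spring with k = E * area / t,
# and all potential energy of the fall is absorbed by compressing it.
MASS, HEIGHT, G = 30.0, 3.0, 9.81   # kg, m (about one storey), m/s^2
E_FOAM, AREA = 2.0e6, 0.02          # Pa, m^2 (hypothetical material and contact patch)
MAX_FORCE = 40000.0                 # N, hypothetical limit the hardware tolerates

def evaluate(thickness: float):
    k = E_FOAM * AREA / thickness
    energy = MASS * G * HEIGHT
    compression = math.sqrt(2.0 * energy / k)  # from 0.5 * k * x^2 = energy
    return k * compression, compression        # (peak force, peak compression)

def co_design(candidates):
    """Pick the thinnest (lightest) pad that stays within the force and stroke limits."""
    feasible = []
    for t in candidates:
        force, x = evaluate(t)
        if force <= MAX_FORCE and x <= 0.8 * t:  # keep 20% of the pad uncompressed
            feasible.append(t)
    return min(feasible) if feasible else None

thicknesses = [0.01 * i for i in range(1, 11)]  # 1 cm to 10 cm
best = co_design(thicknesses)
print(best, evaluate(best) if best else None)
```

In this toy model a thin pad fails the stroke limit (it bottoms out), which is why the search settles on an intermediate thickness rather than the thinnest candidate.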

[00:51:37]And again, we did a lot of experiments in different scenarios.

[00:51:42]For example, while the robot is dancing, we give it some random pushes to make it fall down.

[00:51:51]And we just repeat many kinds of failure cases.

[00:51:55]And here I want to show another one.

[00:51:58]For a very dynamic motion, the front flip, we just shut down the power during it, and that's why the robot falls down.

[00:52:11]And the robot is still very safe.

[00:52:15]And it just continues to work.

[00:52:18]And here are some more results.

[00:52:20]For example, in outdoor environments and so on.

[00:52:26]And recently we are also adding an active policy so that the robot performs active motions to stay even safer.

[00:52:35]Okay.

[00:52:36]So, with this I want to conclude today's talk. The first takeaway is that robot learning is very critical in the current age for solving a lot of very fundamental problems. But one major challenge is how to get data that is high-quality and, at the same time, scalable.

[00:53:00]For this, we are strong believers in simulation data. We believe simulation data has led to a lot of real-world successes, and in the future it will continue to do so.

[00:53:14]And for our lab, as Professor Quan introduced, we really focus on making robots smarter but also more reliable and safer.

[00:53:25]So our major data source is advanced simulation data, to make robots work even in more contact-rich cases.

[00:53:37]With this, I want to express my appreciation. First, to my supervisors during my PhD and postdoc. And I really appreciate all the PhD students and postdocs in our lab: they did all this great work, and I'm just here to speak on their behalf.

[00:53:55]I also want to thank the funding agencies for their support, and of course VinMotion for its support.

[00:54:05]So with this, I'm happy to take any questions, and I want to thank all the audience who took the time to join this seminar.

[00:54:16]Okay. Thank you so much. First, I want to check if you can hear me okay.

[00:54:25]Can you hear me well?

[00:54:27]>> Oh yeah, sorry. I could only hear you just now. Yeah. Okay. That's fine. I was trying to ask a couple of questions during the presentation, but I think you did not hear me.

[00:54:36]I'm really sorry about that. The reason is that some of the videos have sound, so I muted my microphone. Yeah. Okay. Yeah, that's fine.

[00:54:48]I think that's a good policy, because I have lots of questions here.

[00:54:52]>> [laughter] >> Okay. Very good. A very exciting talk.

[00:54:55]Thank you so much. Initially, I joined this talk expecting to learn more about your simulation work, but I found the last two works very exciting as well. Particularly the one with the AI attack agent; I think it will be very helpful for some of the work we are trying to do, right? Because, as you know, for deployment we care a lot about reliability, right? How do we make sure we have a very reliable policy, and identify all the potential failure modes so that we can prepare for the worst-case scenario? I think it's very exciting. And the last one, where you showed the soft material, right?

[00:55:40]Which [clears throat] protects the humanoid robot; I think that's also very nice.

[00:55:43]Okay. By the way, again, I have many, many questions in my notebook.

[00:55:49]But I'll select maybe a couple of them so that my team also has an opportunity to ask, right? Sure. Yeah, so maybe one question about differentiable simulation, right? I think we agree that differentiable simulation has an advantage: it may help us do policy learning faster.

[00:56:15]But I think you also mentioned in the presentation that this may cause problems with impact modeling, right? Because if the system is hybrid, differentiability becomes just an assumption, right? So can you share some thoughts on this, and on how we can address the problem? Yeah.

[00:56:40]That's a great question, and I also think it's one of the most important questions in this field: how to model impacts very accurately while also keeping them differentiable.

[00:56:56]For this, I think we have maybe two different ways to try to solve it. One way: in a very new work of ours, which we actually just submitted last week, we found there are still some tricks to handle the impact. Professor Quan is an expert on this. For a general audience who are not familiar with the problem, Professor Quan's question is that contact is usually like a step function, right?

[00:57:30]For differentiable simulation, we need some approximation of the step function in order to make it continuous. Yeah. And that can really bring trouble for the gradients. Okay, so that's for the general audience. In our new paper from last week, we build a new function that very nicely bridges the two manifolds: the static part and the dynamic part. So this is one way we are trying to solve it.

[00:58:07]Another way, where I also see some recent potential, is hybrid training: for example, not purely differentiable simulation and not purely PPO or something similar. A natural idea is to first train in differentiable simulation and then move to another, high-fidelity simulator just for few-shot fine-tuning. That is also a practical way.

[00:58:40]Yeah, but those are just some of my thoughts. It's still a very important problem in this field. [snorts] I do agree people need more time to investigate it.
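To see why a step-like contact law causes trouble for gradients, compare a hard penalty contact with a softplus-smoothed version of the same law. This is a generic smoothing trick, not the new bridging function from the speaker's paper; the stiffness `k` and smoothing `width` are hypothetical.

```python
import math

def hard_contact_force(gap: float, k: float = 1000.0) -> float:
    # Step-like contact: zero force until penetration (gap < 0), then a stiff spring.
    return k * max(0.0, -gap)

def smooth_contact_force(gap: float, k: float = 1000.0, width: float = 0.01) -> float:
    # Softplus relaxation of the same law; `width` sets how much the contact is "blurred".
    z = -gap / width
    return k * width * (z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z)))

def finite_diff(f, x: float, h: float = 1e-6) -> float:
    return (f(x + h) - f(x - h)) / (2.0 * h)

for gap in [0.05, 0.01, -0.01]:
    print(f"gap={gap:+.2f}  hard dF/dgap={finite_diff(hard_contact_force, gap):9.3f}  "
          f"smooth dF/dgap={finite_diff(smooth_contact_force, gap):9.3f}")
```

The hard law reports zero gradient while the objects are apart, so an optimizer gets no signal to bring them into contact. The smoothed law does provide a signal, at the cost of a small non-physical force at a distance: the accuracy-versus-differentiability tension raised in the question.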

[00:58:50]Okay, very good. And maybe one more general question about simulation, right? In learning, we try to make the simulation as accurate as possible. Yes. And also fast, right? But in reinforcement learning, for example, we normally also need to do domain randomization. Yes.

[00:59:11]Which, in another sense, means trying to make the simulation less accurate. Right. Mhm. So it's something of a paradox in this field. What are your thoughts on building, let's say, an optimal simulation, right? Mhm. Let's say the best simulation for training policies fast and getting very good sim-to-real transfer.

[00:59:33]Yeah. Yeah, that's also great question.

[00:59:36]It's a trade-off between diversity and accuracy. Yeah, exactly.

[00:59:43]Yeah, so I do see people debate this. Some people say, "Okay, I don't need to be so accurate, as long as I'm diverse enough. And maybe with this diversity I can be even more robust, because I can tolerate these kinds of big errors."

[01:00:03]And other people claim, "Oh, we need to be accurate enough to make our training faster."

[01:00:10]From my own personal experience, first, I do think we still need relatively accurate simulation. The reason is that if domain randomization is too heavy, the policy can become very conservative if we don't handle it well. Very conservative means, for example, that energy consumption will be very high, or, in the worst case, the robot just doesn't move in locomotion, so it always gets a sub-optimal reward. In short, domain randomization is important, but it should not be too heavy, because it usually pushes training toward very conservative or locally optimal policies.

[01:00:58]Meanwhile, we have some ongoing work on system identification.

[01:01:07]And I do see that with good system identification, which we call real-to-sim, we can make training less painful: more efficient, and with less domain randomization.

[01:01:22]The last point I want to share is that it also depends on the task. For some tasks, we can tolerate very large errors, and then heavy domain randomization is also okay. But for some very fine-grained manipulation tasks, I can imagine that with very aggressive domain randomization the task becomes very hard to learn.

[01:01:46]So in summary, my thought is: if real-to-sim comes naturally for your task, then try your best to do it, to reduce the sim-to-real mismatch and thereby reduce the effort on the learning part.

[01:02:06]Otherwise, it's just more challenging for the learning part.
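A common form of domain randomization samples each physical parameter in a range around a nominal value; much of the trade-off above is about how wide that range must be. The sketch assumes uniform multiplicative ranges, and the parameter names, nominal values, and spreads are hypothetical. It only illustrates how real-to-sim calibration can let the range shrink.

```python
import random

def sample_params(nominal: dict, spread: float, rng: random.Random) -> dict:
    """Uniform domain randomization: each value drawn from nominal * [1 - spread, 1 + spread]."""
    return {name: value * rng.uniform(1.0 - spread, 1.0 + spread) for name, value in nominal.items()}

# Hypothetical nominal values: an uncalibrated guess vs. values identified by real-to-sim.
guess = {"friction": 0.8, "motor_strength": 1.0, "mass": 12.0}
identified = {"friction": 0.62, "motor_strength": 0.93, "mass": 12.4}

rng = random.Random(0)
# Without system identification the range must be wide enough to cover the unknown truth;
# with it, a narrower spread around the identified values can suffice.
wide = [sample_params(guess, 0.5, rng) for _ in range(1000)]
narrow = [sample_params(identified, 0.1, rng) for _ in range(1000)]
print(min(p["friction"] for p in wide), max(p["friction"] for p in wide))
print(min(p["friction"] for p in narrow), max(p["friction"] for p in narrow))
```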

[01:02:10]Okay. That's very good. Maybe one last question from me.

[01:02:14]Yeah, always my pleasure to discuss.

[01:02:16]>> Okay.

[01:02:17]So I'm really interested in the work on using an attack agent, as I mentioned earlier, to train a better policy. But I think one common problem in RL is that if you fine-tune the policy, you may focus on the worst-case scenario or the failure case you are trying to solve, right? It may help you handle that.

[01:02:43]But then maybe you will fail in another scenario where you did not fail earlier, right? So it's a balance.

[01:02:51]Or maybe some other open problems. Yeah.

[01:02:54]Because you try to fix this, but then you fail in another case, right? Something like: okay, a nominal walking policy works quite well, but then you try to do push recovery a bit more aggressively, and the nominal walking part no longer works as well, right? So I think it's fairly common. Do you have any suggestions or ideas on how to avoid this, or at least how to improve it?

[01:03:19]Yeah, that's a great question. Today I didn't have time to show some statistics we collected on this, but it's a very important problem for this type of research.

[01:03:31]In short, my philosophy is: first, have a relatively good controller.

[01:03:41]A good controller with as few holes as possible. Then you use this kind of method to identify where the holes are, where the weaknesses are, and try to fill those holes. That's my philosophy. Otherwise, and we did some experiments on this in our paper, we really see that if we don't design the system properly, it's just as Professor Quan mentioned: "Okay, I found a hole on the left. I fill it." And then more holes appear on the right, and you keep going back and forth. The theoretical reason is that the adversarial attack usually has a different reward function from the locomotion or task reward. So it's just like a GAN. As people know, in a GAN you optimize two different objectives together in a min-max way. And I think people have heard that GANs are very hard to converge and very hard to train, right?

[01:04:40]This type of research is similar, because you also min-max different rewards. Even if you min-max the same reward, it's still very hard to converge to a nice result.

[01:04:53]So, practically, my suggestion is: first have a relatively good policy with as few holes as possible, and then use this method to understand where your holes are and fill them in.

[01:05:09]Yeah. And there is a trade-off: it will still make the robot's control policy a little more conservative, with a little more energy consumption, but that's acceptable.
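The GAN analogy can be made concrete with the simplest min-max problem, f(x, y) = x·y, whose equilibrium is (0, 0). Plain simultaneous gradient descent-ascent does not converge on it: it spirals outward, growing the distance from the equilibrium by a factor of √(1 + lr²) every step. This is a textbook toy, not the attacker-controller training from the talk, but it shows why naive min-max updates can cycle or drift instead of settling.

```python
import math

def simultaneous_gda(x: float, y: float, lr: float = 0.1, steps: int = 100):
    """Gradient descent-ascent on min_x max_y f(x, y) = x * y; returns distance to (0, 0)."""
    radii = []
    for _ in range(steps):
        gx, gy = y, x                    # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy  # x descends, y ascends, both updated at once
        radii.append(math.hypot(x, y))
    return radii

radii = simultaneous_gda(1.0, 1.0)
print(round(radii[0], 4), round(radii[-1], 4))
```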

[01:05:22]Okay. Thank you. Yeah, I think we have one question from the chat.

[01:05:28]Let me check the chat, and after that we can take questions from the seminar room. [snorts] Sure. So Boris said, "Thank you for sharing. Can you share any bottlenecks on the [snorts] hardware side when you do robot testing for humanoids and robot dogs?"

[01:05:50]Especially in the power system.

[01:05:55]Okay. Yeah, this is also a great question. To be honest, I'm not a hardware expert, but I think you are asking about a very important problem. I have to say, especially for small humanoid robots like the G1, right, people do face the problem of how to make the robot more powerful.

[01:06:20]Take the G1: we all know its upper body is still very weak. I forget the exact payload number for the G1, maybe 4 kg, but in any case no more than 10 kg, right? Such a small payload really limits its performance and the tasks it can do. So I do see huge potential in making robots more powerful and more lightweight, for both the lower body and the upper body.

[01:06:56]Especially since I think the next stage will be loco-manipulation.

[01:07:00]Humanoid robots will do more functional tasks, and payload will be a major issue.

[01:07:09]Maybe there are two ways: one is better motors, and the other is specific mechanical designs that make the robot more energy-efficient.

[01:07:22]Yes, thank you Boris.

[01:07:23]Okay, thank you. In the seminar room, if you have any questions, feel free to unmute and ask.

[01:07:31]Oh.

[01:07:32]Okay, do you have any questions for Dr. Fan? I see you interrupted, Dr. Fan. Oh, no, no, we were checking whether you could hear us.

[01:07:47]Yeah, we can hear you.

[01:07:48]Yeah.

[01:07:51]Okay, I expected the questions to be about manipulation, but actually many of them are about locomotion.

[01:08:03]So now, any questions about synthetic data or contact-rich manipulation from the audience here?

[01:08:17]You have 10 seconds, because I have lots of questions here.

[01:08:20]Yeah, otherwise I will ask some irrelevant questions.

[01:08:27]Um thank you.

[01:08:28]Uh can you hear me? Yeah, very well.

[01:08:31]>> Yeah, we can hear you.

[01:08:32]>> Yeah, thank you for your presentation.

[01:08:36]I have a question that's not really related to the current presentation.

[01:08:43]It is about loco-manipulation, because I'm from the locomotion team.

[01:08:49]So what is the best observation representation, or the best simulation setup, for that task, one that captures the interaction between the humanoid and different kinds of objects, like a box, a sofa, and so on?

[01:09:14]How can we model the interaction more carefully?

[01:09:21]So that's my question.

[01:09:25]Yeah, it's a very good question. I will answer it in maybe two ways.

[01:09:31]So the question is: for loco-manipulation, you want more diverse objects, right?

[01:09:38]Or maybe what the best simulator is, or how to make it work properly in simulation.

[01:09:42]The first thing I'd say is that you need some very nice digital assets.

[01:09:49]If your digital assets, maybe the mesh or the collider, are not good enough, I believe that will certainly affect your training results. So whatever simulator you're using, first make sure you use nice digital assets.

[01:10:07]They matter a lot, especially for collisions and so on.

[01:10:12]Second, for this task, especially if you don't care about the deformation of the objects, I think MuJoCo and Isaac Lab are currently very popular choices.

[01:10:28]If I were you, I would also first try these simulators, especially since they already have some nice open-source materials you can build on. Yeah. But first, as I mentioned, you need to make sure your digital assets are good enough.

[01:10:49]Hopefully with very fine resolution, especially for the contact. Thank you for your question. Do you guys have any other questions?

[01:11:11]>> Actually, we have another question in the chat. So you guys can prepare your next questions while we work on the chat question. Okay.

[01:11:21]"Can you share your personal experience with simulating deformable objects? Which material parameters" (yeah, the parameters you list here are very important) "tend to have the greatest impact on sim-to-real transfer?"

[01:11:41]"Conversely, which parameters can be fixed as constants without significant..." Uh-huh. Yeah, this is also a very nice question. I want to share a different perspective with you. When we benchmarked different simulators, because different simulators are also written with different algorithms, even the same parameters can produce very different behaviors on different simulators.

[01:12:20]So my suggestion: first, what you are listing are the most important ones. I would say you really need to tune these.

[01:12:29]Meanwhile, you can start developing an automatic real-to-sim pipeline.

[01:12:36]So try not to calibrate by hand, because for one or two parameters it's easy, but for three, four, five, or six parameters it's very hard to tune manually.

[01:12:50]Instead, try some automatic tuning. For example, CMA-ES is one very powerful way, and there are other methods. If your simulator is differentiable, you can also directly leverage the gradient. In short, because different simulators use different methods, and especially if you tune multiple parameters, it's better to develop an automatic pipeline to tune them.

[01:13:22]It will be more efficient.

[01:13:26]Yeah.
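The automatic tuning loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the speaker's pipeline: CMA-ES itself is replaced by a simpler (μ, λ) evolution strategy (elite averaging with a fixed step-size schedule, no covariance adaptation), and the "simulator" is a hypothetical one-dimensional damped spring whose stiffness and damping stand in for material parameters. The "real" trajectory is synthesized from known values (stiffness 40, damping 0.8) so the fit can be checked; in practice `simulate` would call your own simulator and `real_traj` would be a recorded trajectory. All function names are illustrative.

```python
import random


def simulate(stiffness, damping, steps=100, dt=0.01):
    """Hypothetical toy simulator: a unit-mass damped spring released from x = 1,
    integrated with semi-implicit Euler. Returns the position trajectory."""
    x, v = 1.0, 0.0
    traj = []
    for _ in range(steps):
        a = -stiffness * x - damping * v
        v += a * dt
        x += v * dt
        traj.append(x)
    return traj


def calibration_loss(params, real_traj):
    """Mean squared error between the simulated and the recorded trajectory."""
    sim_traj = simulate(*params)
    return sum((s - r) ** 2 for s, r in zip(sim_traj, real_traj)) / len(real_traj)


def evolution_strategy(loss_fn, init, sigma, pop=30, n_elite=6, iters=80,
                       decay=0.93, seed=0):
    """Simplified (mu, lambda) evolution strategy used as a stand-in for CMA-ES:
    sample a population around the current mean, keep the n_elite best candidates,
    move the mean to their average and shrink the per-parameter step sizes on a
    fixed schedule. Unlike CMA-ES there is no covariance or step-size adaptation."""
    rng = random.Random(seed)
    mean, sigma = list(init), list(sigma)
    best_params, best_loss = list(mean), loss_fn(mean)
    for _ in range(iters):
        # Sample candidates; clamp so parameters stay physically valid (positive).
        population = [
            [max(1e-3, m + rng.gauss(0.0, s)) for m, s in zip(mean, sigma)]
            for _ in range(pop)
        ]
        scored = sorted(((loss_fn(p), p) for p in population), key=lambda t: t[0])
        if scored[0][0] < best_loss:
            best_loss, best_params = scored[0]
        elite = [p for _, p in scored[:n_elite]]
        mean = [sum(values) / n_elite for values in zip(*elite)]
        sigma = [s * decay for s in sigma]
    return best_params, best_loss


# "Real" data, synthesized here from known parameters so the fit can be checked.
# In a real pipeline this would be a recorded trajectory of the physical object.
real_traj = simulate(40.0, 0.8)
params, final_loss = evolution_strategy(
    lambda p: calibration_loss(p, real_traj), init=[30.0, 0.3], sigma=[8.0, 0.8]
)
print("stiffness=%.2f damping=%.2f loss=%.2e" % (params[0], params[1], final_loss))
```

Only `simulate` and `calibration_loss` are specific to the setup; for real calibration a proper CMA-ES implementation would typically replace `evolution_strategy`, and with a differentiable simulator the same loss could instead be minimized by gradient descent.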

[01:13:27]I guess we have another question from the room.

[01:13:30]Yeah.

[01:13:31]I have been working on drone swarms, and I wonder if there are special adjustments that need to be made to reinforcement learning algorithms for swarms.

[01:13:43]Oh, okay. It's a multi-agent problem.

[01:13:46]And would the policy be trained for 10 drones? Yeah. Okay, I'll try, because I think many in the audience also have a locomotion background, right? So I'll try to make the question more general.

[01:14:00]So the question is: the proposed methods are mainly for a single agent. What if I have 10 robots? How do we solve the multi-agent case?

[01:14:11]I think usually, if you have fewer agents, you can still use a centralized reinforcement learning algorithm, which means you assume you have a central computer, a central PC, that learns to control all the agents together.

[01:14:29]But as you mentioned, if you have 10 agents or more, sometimes the centralized approach is already hard to solve. Then the decentralized approach is also a very good option. For example, you assume each agent can communicate with its nearest one or two neighbors, or sometimes with none of them at all, so each agent just makes decisions by itself.

[01:14:57]Sometimes this is also doable. For example, I remember that a professor's lab, maybe two or three years ago, showed some nice results on how to train multiple robot dogs together to carry a larger payload or larger objects.

[01:15:15]I think this kind of work can be very inspiring; you can check the details of that method. In short, with a small number of agents you can still use the centralized approach, but if you want it to be scalable, the decentralized approach is your only choice.
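The difference between the two setups shows up most directly in what each policy sees. In the minimal sketch below (hypothetical function names, 2-D positions for simplicity, policy networks omitted), a centralized policy takes the concatenated state of every agent, so its input grows with the swarm; a decentralized policy gives each agent only its own state plus the relative positions of its k nearest neighbors (k = 0 for no communication), so the input size is fixed and one shared policy can run on every agent.

```python
import math


def centralized_observation(positions):
    """Centralized setting: one policy sees every agent's state at once,
    so the input size grows linearly with the number of agents."""
    return [coord for pos in positions for coord in pos]


def decentralized_observation(positions, i, k=2):
    """Decentralized setting: agent i sees only its own position plus the
    relative positions of its k nearest neighbors (k = 0: no communication).
    The input size is fixed, so one shared policy can run on every agent."""
    me = positions[i]
    others = [p for j, p in enumerate(positions) if j != i]
    others.sort(key=lambda p: math.dist(me, p))
    obs = list(me)
    for p in others[:k]:
        obs.extend(b - a for a, b in zip(me, p))
    return obs


# Hypothetical 2-D positions of a small swarm.
positions = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
print(len(centralized_observation(positions)))   # 8 = 4 agents x 2 coordinates
print(decentralized_observation(positions, 0))   # [0.0, 0.0, 1.0, 0.0, 0.0, 2.0]
```

With 10 drones the centralized input grows to 20 numbers, while each decentralized observation stays at 6 (own position plus two neighbors); that fixed size is what lets the decentralized approach scale to larger swarms.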

[01:15:36]Okay. The next question is: if we use a neural network to directly learn the system dynamics, such as a neural operator or PRN, can it completely replace the traditional simulator? This is also a very interesting question from the audience: could we use only a neural simulator and no longer care about analytic ones? My view is, maybe not that much.

[01:16:13]Maybe hybrid is the future. The reason is, first, you still need data to train a neural simulator, right? So where does the data come from? As we mentioned, real-world data alone is very expensive and very hard to collect, and you need a lot of it. So an analytic simulator is a very good source of data for training neural simulators.

[01:16:37]People usually do that because an analytic simulator may be too slow, or hard to scale to more contact-rich cases, whereas a neural network is easier to scale thanks to the advantages of the computer architecture.

[01:17:02]So my short answer is that I believe the two kinds of simulators will coexist and benefit each other. We cannot simply say that neural simulators will replace all analytic simulators.

[01:17:19]Additionally, an analytic simulator can always give you constraint guarantees, because it uses optimization-based methods. But with neural simulators, even today, I think it is hard to guarantee hard constraints. So this is also a trade-off.
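One way to picture this trade-off in a hybrid setup: let a learned model predict the next state, then enforce the hard constraint analytically as a projection onto the feasible set. This is a minimal sketch of that one idea, not a description of any particular simulator: `learned_step` is a hypothetical stand-in for a trained neural dynamics model (a ballistic update plus a deliberate small bias playing the role of model error), and the only constraint is a floor at z = 0.

```python
G = 9.81  # gravity, m/s^2


def learned_step(z, vz, dt=0.01):
    """Hypothetical stand-in for a trained neural dynamics model. It roughly follows
    ballistic motion, with a small bias playing the role of model error, and nothing
    in it prevents predicting a state below the floor."""
    vz_next = vz - G * dt - 0.05
    return z + vz_next * dt, vz_next


def project_to_floor(z, vz):
    """Analytic hard constraint z >= 0: move to the closest feasible height and
    remove any velocity that would push further into the floor."""
    if z < 0.0:
        return 0.0, max(vz, 0.0)
    return z, vz


def hybrid_step(z, vz, dt=0.01):
    """Learned prediction followed by an analytic constraint projection."""
    return project_to_floor(*learned_step(z, vz, dt))


def rollout(step, z=1.0, vz=0.0, steps=200):
    """Drop the object from 1 m and record its height at every step."""
    heights = []
    for _ in range(steps):
        z, vz = step(z, vz)
        heights.append(z)
    return heights


print(min(rollout(learned_step)) < 0.0)  # True: the object falls through the floor
print(min(rollout(hybrid_step)))         # 0.0: the floor constraint always holds
```

The height projection is the solution of a tiny optimization problem (the nearest height satisfying z ≥ 0), which is the kind of guarantee the analytic side contributes; how velocity is treated at contact is a separate modelling choice made here only for illustration.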

[01:17:42]Okay.

[01:17:43]The next question is about the applicability of the simulation technique and simulator.

[01:17:53]We are not yet at this stage with work... Sorry, where is it?

[01:17:59]Oh, here.

[01:18:01]We are not yet working with objects with high contact density.

[01:18:10]But we are always interested in higher-quality or faster simulation to build datasets for training.

[01:18:17]Do you have some additional resources we can look at for how we can employ this in our headset lab, or do we need to apply it to our current... Okay. The question is a bit long, but I guess the question is how you could leverage our simulator, right, if I understand correctly. In short, first, I think our simulator will be open-sourced soon. Maybe around... this is May. I think maybe before April we will open-source it, and then everyone can use it.

[01:18:55]We are also very likely to open-source a version integrated with Newton.

[01:19:02]We already have some initial results on Newton.

[01:19:09]We also hope everyone can benefit from it, and meanwhile we don't want you to have to change too much of your own coding pipeline, right?

[01:19:18]So if we can integrate with Newton or some other popular simulator, I think it will be very nice for everyone who wants to use it.

[01:19:26]We don't want to say, "If you want to use our simulator, you have to adapt your training program to us." I think that's too much to ask of other people.

[01:19:34]Okay.

[01:19:36]So, yeah, I hope I answered your question.

[01:19:38]But if I misunderstood, or if you have additional questions, please let me know.

[01:19:44]I think we may have a lot of attendees online, so I don't think we will be able to address all of their questions. But just to help summarize a bit, one interesting question that maybe you can address is this: we currently have a lot of data from the internet, right?

[01:20:09]So is it better if we can reuse that data, or somehow combine it with synthetic data from simulation, for the... Yeah. Yeah.

[01:20:19]I think this is also a very important question: how to leverage internet-scale data for humanoid or human-like robots.

[01:20:31]For this, some people will say egocentric video, some people will say third-person view, and so on.

[01:20:41]For us, we really care about the real-to-sim problem. For example, given a video of a mom preparing food in the kitchen, could we leverage simulation and a foundation model to convert the objects in the video into digital assets in the simulation environment, and also recalibrate the kinematic motions, and so on? I do see some people starting to try this. For us, we want to push our efforts toward more contact-rich scenarios, because if your video just shows one person picking up, for example, my cup, I think people already have existing techniques for that, and they work quite well.

[01:21:41]But if we want to push it forward, for example, you know, to some clothes, or to this kind of tool, or things like this,

[01:21:53]I do see more potential there. Meanwhile, these kinds of objects, especially articulated objects, are much more common in the real world.

[01:22:07]So, in short, the real world has many more contact-rich tasks. If we can advance simulation, we can see more possibilities with real-to-sim, and then combine it with the existing sim-to-real pipeline.

[01:22:24]Okay. Thank you so much. In the interest of time, I will let the room ask maybe one more question before we end the seminar. Yeah.

[01:22:34]Okay, if possible, a question from me. >> So, currently I work directly with the manipulation research team, and they are trying to make synthetic data for manipulation.

[01:22:55]The big question is: for the data generated by their proposed approach, how can we evaluate the quality of the data they generate?

[01:23:14]Do you have any tools or suggestions for evaluating the quality of synthetic data?

[01:23:23]Because when you synthesize data, it might contain physically invalid or abnormal behavior.

[01:23:39]So how do you locate such problems?

[01:23:44]Mhm. This is also a very good question. To be honest, in the research community people still struggle to find a quantitative measure. People want a quantitative result, right? For now, one way is to get a qualitative result. For example, as we showed, you can do some very basic, intuitive interactions with your objects and see whether they behave as desired or whether the behavior looks too strange.

[01:24:22]I think this is maybe the most straightforward way.

[01:24:28]Another way is that you can train some simple policy, maybe from points. Let's say, still using the cup as an example.

[01:24:43]You can train a simple push policy, right? You can first see whether the policy is sim-to-real transferable before you do fancier, more complex tasks.

[01:24:56]But yeah, it is still an unsolved problem. I think people in this field don't have much good ground truth to benefit from.
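Part of the basic interaction check described above can be automated as heuristic sanity checks on each generated trajectory. The sketch below is an illustration under stated assumptions, not an established quality metric (as noted, there is no agreed quantitative one): it flags frames where an object sinks below a support surface at z = 0, where its finite-difference speed jumps above a limit (for example, a teleporting object), or where its acceleration spikes. The function name and every threshold are hypothetical and would need tuning per task.

```python
import math


def plausibility_report(positions, dt, floor_z=0.0, penetration_tol=0.005,
                        max_speed=5.0, max_accel=50.0):
    """Heuristic sanity checks on one object's synthetic trajectory.

    positions: list of (x, y, z) tuples sampled every dt seconds.
    Returns (frame_index, description) pairs for suspicious frames.
    All thresholds are hypothetical defaults to be tuned per task."""
    issues = []
    for i, (_, _, z) in enumerate(positions):
        if z < floor_z - penetration_tol:
            issues.append((i, "penetrates floor"))
    # Finite-difference velocity between frames i and i + 1, reported at frame i + 1.
    velocities = [
        [(b - a) / dt for a, b in zip(p0, p1)]
        for p0, p1 in zip(positions, positions[1:])
    ]
    for i, v in enumerate(velocities):
        if math.hypot(*v) > max_speed:
            issues.append((i + 1, "speed spike"))
    # Finite-difference acceleration centred on frame i + 1.
    for i, (v0, v1) in enumerate(zip(velocities, velocities[1:])):
        acc = [(b - a) / dt for a, b in zip(v0, v1)]
        if math.hypot(*acc) > max_accel:
            issues.append((i + 1, "acceleration spike"))
    return issues


# A hypothetical cup sliding at 0.5 m/s on a table top at z = 0, sampled at 50 Hz.
clean = [(0.01 * i, 0.0, 0.05) for i in range(10)]
glitchy = list(clean)
glitchy[5] = (0.5, 0.0, 0.05)     # the cup "teleports" for one frame
glitchy[8] = (0.08, 0.0, -0.02)   # the cup sinks into the table
print(plausibility_report(clean, dt=0.02))    # []
print(plausibility_report(glitchy, dt=0.02))
```

Checks like these only catch gross physical violations; whether the data is good enough for a task still comes down to the simple-policy test, such as checking that a basic push policy transfers before moving on to harder tasks.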

[01:25:08]Okay. Thank you so much. Thank you for your time, and thank you to all the attendees for all the exciting questions today. I still have a long list of questions myself, but hopefully we will have another chance to meet and chat in the future. Yeah, it's always my great pleasure to discuss with you. Are you going to ICLR this year? Maybe not. Okay, maybe another time. Okay, but hopefully we will see you again another time, and who knows, maybe you will drop by our VinMotion offices sometime as well.

[01:25:40]Yeah, just a three-hour flight, you know, nothing, right?

[01:25:44]Yeah. And my very good friend Dr. Chenye is also there. Yeah, yeah. I will be back.

[01:25:51]Yeah. Okay.

[01:25:53]Thank you all. Thank you so much. Thank you. Bye. Okay, thank you. Thank you so much, Professor Quan. Okay, thank you.

[01:25:58]Thank you. Bye.

[01:26:00]Uh thank you.

[01:26:02]See you in the next seminar. Bye.

[01:26:04]Okay, thank you. Bye. Bye-bye.

[01:26:07]Bye.
