This tutorial provides a rare, granular look at the sim-to-real pipeline, masterfully demystifying the complex reward shaping and hardware modeling required for functional bipedal locomotion. It is an essential roadmap for researchers looking to bridge the gap between theoretical reinforcement learning and practical robotic deployment.
Deep Dive
Secrets of Walking Robot (NVIDIA Isaac Lab and Isaac Lab Arena)
Hello and welcome to my channel. In this video, we are going to look at some of the secrets of making a bipedal robot walk, and I hope it will be helpful for people who would like to build their own bipedal robots. In the previous videos, we did a lot of things. We learned how to use Isaac Sim. We built a robot arm that was then controlled by Isaac Sim. We learned how to use Isaac Lab. We built the bipedal robot and tested it. We simulated the bipedal robot inside Isaac Lab in order to train a neural net to control it. And finally, we deployed this neural net model on the real robot.
And it's working. Now let's look in detail at how this simulation was done and how we transferred the trained neural net to the real robot. This video is going to be a little bit technical, but I hope it's going to be useful for many of my viewers. So let's get started. I would like to start by explaining how to make a URDF file.
This is a file that contains the information about the robot, and we need it in order to put the robot inside the Isaac Lab simulation. I would like to include this part in the video because quite often I need to make a URDF file, and I would like to have a guide that I can follow next time. So I'm making this part of the video mainly for myself, but I'm sure it's going to be helpful for other people too. I already explained a little bit in the previous video how I get the precise centers of mass and inertia matrices for the URDF file. Here we're going to look at how to make the URDF file itself. For the URDF file, I made two CAD models. The first model is a simplified robot, which provides the meshes for the URDF file. The second CAD model is a precise model with the precise masses of all parts. I already talked about this model in my previous video.
So what I did: I took my robot and put all its parts in a separate component, which I called robot. Then I rotated this component so that the robot is aligned with the Z-axis. In this case the front of the robot is aligned with the Y-axis, but I think it would be better to align the front of the robot with the X-axis, because this is what is usually used in Isaac Lab. But this orientation is also okay, and I worked with the Y-axis pointing forward. After this, I grounded the component in order to keep this orientation. Then I exported each component as a separate STL file, but I did it in a special way. For example, if I need to export this part, I hide all the other parts except this one. Then I export not this link separately but the whole robot component, like this, so that the exported file keeps this orientation.
This is really important. Over here, the format is STL binary. The unit type should be meter. You can set the refinement to low in order to make the file small. Like this, the entire URDF model is going to be small. It's not going to take too much space on the hard drive, and I think with smaller URDF files the simulation is going to be a little bit faster, but I'm not sure about this. When saving the file, I put the proper name for the link.
So instead of robot, I put a name like link 4. Like this, I exported all the components of the robot. Thanks to this way of exporting, all my components have the same origin, which is the origin of the initial CAD file.
Also, because we put the robot in the correct orientation from the beginning, all the rotations in our URDF file, which is over here, are zero everywhere. This simplifies things a lot, because you don't need to think about how to rotate each component in order to put it on your robot. Thanks to this, all the moments of inertia, like over here, are also the same as in our CAD model. The only thing we need to take care of is the units, because Fusion 360 gives the moment of inertia in g·mm², and for the URDF file we need kg·m². So we need to take these values and multiply them by 10 to the power of minus 9. It's quite simple to get the moments of inertia for your URDF file like this.
But what about the other parameters? The other parameters are the origin of each link, like over here, the center of mass of each link, and the origin of each joint. Let's look first at the origin of the links. For the first link it's easy: the origin is at zero. This means that the origin of the URDF file is going to be the same as the origin of the CAD file. The center of mass of the base is also the same between the URDF and the CAD file. But let's look at some other link, for example this one. This is the right foot. Over here is the origin of the part.
We take the coordinates of the joint of this foot, with a minus sign, and we put them in the origin in our URDF file. For the center of mass it's also quite simple. We take the center of mass from the CAD file and subtract the position of the joint of this part. For example, for this one we take this center of mass, subtract the position of this joint, these values, and this gives us the position of the center of mass with respect to this origin. For the joint we also need to specify the origin. Basically, we need to specify the position of one joint relative to another joint. For example, in this case it's this joint with respect to this joint.
So it's these values minus these values, and this gives us the position of one joint with respect to the other. Also, we should not forget to set the axis of rotation of the joint. For example, here we say that the axis of rotation points in the direction opposite to the X-axis.
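The bookkeeping described above (unit conversion for the inertia, then the link, center-of-mass, and joint origins) can be sketched in a few lines of Python. This is only an illustration: the function names and the example coordinates are hypothetical, and it assumes all positions are already in meters and expressed in the shared CAD frame with zero rotations, as in the export procedure described earlier.

```python
# Sketch of the URDF bookkeeping; names and numbers are illustrative only.

G_MM2_TO_KG_M2 = 1e-9  # 1 g = 1e-3 kg and 1 mm^2 = 1e-6 m^2


def inertia_to_urdf(inertia_g_mm2):
    """Convert moments of inertia from g*mm^2 (CAD export) to kg*m^2 (URDF)."""
    return {name: value * G_MM2_TO_KG_M2 for name, value in inertia_g_mm2.items()}


def _sub(a, b):
    """Component-wise a - b for 3-vectors stored as tuples."""
    return tuple(x - y for x, y in zip(a, b))


def link_origins(joint_pos_cad, com_cad):
    """Origins for a link whose frame sits at its own joint.

    The mesh keeps the CAD origin, so it is shifted by minus the joint position.
    The center of mass is the CAD center of mass minus the joint position.
    """
    zero = (0.0, 0.0, 0.0)
    return {
        "mesh_origin": _sub(zero, joint_pos_cad),
        "com_origin": _sub(com_cad, joint_pos_cad),
    }


def joint_origin(child_joint_cad, parent_joint_cad):
    """Position of the child joint relative to the parent joint."""
    return _sub(child_joint_cad, parent_joint_cad)


if __name__ == "__main__":
    # Hypothetical CAD values for a foot link, in meters.
    ankle = (0.05, 0.0, 0.10)
    knee = (0.05, 0.0, 0.30)
    foot_com = (0.06, 0.01, 0.05)
    print(link_origins(ankle, foot_com))
    print(joint_origin(ankle, knee))
    print(inertia_to_urdf({"ixx": 2.0e6, "iyy": 1.5e6, "izz": 0.8e6}))
```

Because every mesh shares the CAD origin and orientation, only translations appear here; with rotated parts the same subtraction would have to be done in the parent frame.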
So the X-axis goes this way, and the axis of rotation goes in the opposite direction. Easy. When we finish our URDF, we can use this website to check that our URDF looks like our robot. And when the URDF is ready, we need to put it inside Isaac Sim and Isaac Lab. Now let me show you how to create the USD file from the URDF file.
This one. So first we need to launch Isaac Sim. In Isaac Sim we need to create a new empty stage.
In this stage I will delete this one, because I need a completely empty stage. Now I go to File, Import, and I choose our URDF file. I'm already in the right folder. Over here I need to specify that I have a movable base, because we have a mobile robot. It's not a fixed robot arm. It's a robot with a movable base, a base which can move.
The rest I will keep at the default values, and we click Import. The files will be created in the same folder, which is okay with me. So over here we have the robot. It's not visible because there is no light at this point. The file is already created, but we need to modify it. I found that I cannot do this from here, because it did not work for me. So what I do instead is close this file by opening another empty file, and then I open the recently created USD file that we need to modify. We need to make it instanceable.
So this is the file which was created. When we open it, we see all the components of our robot. We need to select these components and make them instanceable over here. Now that it's instanceable, we save the file by pressing Ctrl+S. The file is saved. We don't need Isaac Sim anymore, so we can close it. This is the folder which was created.
We need to take the USD file and also this folder, copy both of them, and put them in the assets folder in Isaac Sim. There is a folder robots, and over here we put these two items. So this is the way to create the USD file from the URDF file.
Isaac is running on my Linux computer, but I would like to show you the file structure on my laptop because it's easier for me. These are the files which I used for my robot inside Isaac Lab. There are three main folders: actuators, assets, and tasks. Actuators contains the actuator model, assets contains everything related to the robot, and tasks contains everything related to the reinforcement learning model. Inside assets there is a folder robot which contains the USD file that we built before, and it also contains this file, which describes the robot itself. This file states which joints the robot has and their initial positions, and then for each joint it says which actuator model should be used with which parameters. The actuator model itself is stored over here, and we can see the file here.
I took the standard file for the ideal PD actuator and made some modifications. There are two main things in this actuator model: acceleration and delay.
The actuator does not react immediately with the fastest possible acceleration.
It has some kind of acceleration curve, and the signal which the actuator receives also has some delay. So the actuator does not react to new commands immediately. It reacts with some delay, and this delay is stored in the delay buffer. When we compute the torque of the actuator, we take the position, velocity, and effort (effort means torque) from this delay buffer. Then over here I do some acceleration calculations, and at the end we calculate the control action, which is over here. As you can see, it's the effort, and this effort comes from the stiffness and damping: stiffness times the position error plus damping times the velocity error. This actuator model is used in the file which describes our robot. By the way, over here there is the path to the USD file.
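To make the delay buffer and the PD law described above concrete, here is a minimal standalone sketch in plain Python. It is not the Isaac Lab actuator class and not the author's file: the class name, the method signature, and the choice to pre-fill the buffer with a "hold current position" command are assumptions. The acceleration shaping is left out because the video does not say how it is computed.

```python
from collections import deque


class DelayedPDActuator:
    """Illustrative PD actuator whose commands arrive delay_steps steps late."""

    def __init__(self, stiffness, damping, effort_limit, delay_steps):
        self.stiffness = stiffness
        self.damping = damping
        self.effort_limit = effort_limit
        # The buffer keeps delay_steps old commands plus the newest one.
        self._buffer = deque(maxlen=delay_steps + 1)

    def reset(self):
        self._buffer.clear()

    def compute(self, target_pos, target_vel, feedforward_effort, joint_pos, joint_vel):
        """Return the clipped torque for one simulation step."""
        if not self._buffer:
            # Assumption: before any command arrives, the actuator holds its pose.
            hold = (joint_pos, 0.0, 0.0)
            self._buffer.extend([hold] * self._buffer.maxlen)
        self._buffer.append((target_pos, target_vel, feedforward_effort))

        # The oldest entry is the command issued delay_steps steps ago.
        delayed_pos, delayed_vel, delayed_effort = self._buffer[0]

        # PD law: stiffness * position error + damping * velocity error.
        effort = (
            self.stiffness * (delayed_pos - joint_pos)
            + self.damping * (delayed_vel - joint_vel)
            + delayed_effort
        )
        return max(-self.effort_limit, min(self.effort_limit, effort))


if __name__ == "__main__":
    actuator = DelayedPDActuator(stiffness=10.0, damping=0.5, effort_limit=100.0, delay_steps=2)
    # Step command to 1.0 rad while the joint stays at rest at 0.0 rad.
    for step in range(4):
        print(step, actuator.compute(1.0, 0.0, 0.0, joint_pos=0.0, joint_vel=0.0))
```

With a delay of two steps, the torque stays at zero for the first two calls and only then reflects the step command, which is the lag the simulated actuator is meant to reproduce.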
So like this you are sure that this is the file which describes our robot. Over here, for each actuator there are a lot of different parameters. What I did: I set the effort limit to the theoretical limit, the velocity limit also to the theoretical limit, and the same for stiffness and damping. What I changed is that I tuned the armature, friction, and delay in order to get the simulated actuator as close to the real actuator as possible. Here the friction is a little bit higher. The delay is six time steps everywhere. Over here the friction is even higher. I'm not really sure why the friction and the armature differ between actuators, but this combination makes the simulated actuator quite close to the real one.
That's why I use it. Let me show you how close the simulated actuator is to the real one. This is a file comparing the simulated actuator with the real actuator. The real actuator is in red and the simulated actuator is in green. Over here it's the position of the actuator as a function of time. I ask the actuator to move abruptly at time zero to a certain position. Of course it cannot do this perfectly, so the actuator overshoots a little bit and then corrects. This is for the positive command, this is for the negative command, and this is for the sine command, when I change the target position with a sine wave. I do this for each actuator. For example, these three graphs are for the right leg hip rotation actuator, these are for the left leg hip rotation actuator, and so on. As you can see, the green line, the simulated actuator, is quite close to the red one but not perfect. For the hip rotation I think they are quite good. The next pair of actuators, right leg and left leg, is for the hip adduction and abduction. It's not perfect, but it's okay-ish. And there is something a little bit strange: there is a feature over here on the right leg graph, but there is no such feature on the graph for the left leg.
So it seems like maybe the right leg actuator is just a little bit different from the left leg actuator.
Here is the hip flexion and extension. It's already a little bit better. You also see some discrepancy in the sine motion.
This is the knee flexion and extension. This one is quite good, but the real actuator has a kind of interesting pattern. And these are the foot flexion and extension.
So as you can see, this comparison is not always perfect, but this is the best I was able to achieve with this actuator model. One of the possible improvements for the future is to make a better actuator model which would fit the real data better. So we talked about the robot, we talked about the actuators, and now it's time to talk about the reinforcement learning model. As I told you, it's mainly in the folder tasks. First of all, we have the agent, and here is the file which describes the neural net policy that is going to be trained and then deployed on your robot. If you want to change the number of layers, the number of nodes, or the architecture of this neural net, you should change this file. The second file, and really the most important file here, is this one. The additional files for it are over here: some additional observations, additional rewards, and additional curriculums which can be used inside this file.
So let's look at this file. This is the file with which you are going to work a lot, really a lot. It was the main file that I needed to change between different trainings. I already explained this file a little bit in the video about Isaac Lab, but let's look at it roughly here. There is the definition of the terrain. Over here are the actions from the policy, which are the output of the policy. Here are the observations, which are the input of the policy. Events means all the randomization for our environment and robot. For example, for the material you can randomize the friction, or you can change the mass of the base of your robot. Then there is probably the most important part here, the rewards.
Rewards for the reinforcement learning model. After that there is the curriculum.
Curriculum means some parts which are going to evolve during the simulation. For example, the push force.
The robot is going to be pushed randomly. At the beginning it is pushed just a little bit, and when the robot starts to walk more or less properly, it is pushed harder. So the push force increases in order to progress with the learning. Here are the commands. This is the speed with which we would like our robot to walk in the X, Y, or angular Z direction. And then there are the terminations.
So what I did in order to improve the Isaac Lab model is mainly two things. Not only these two things, but they are the main ones. The first is events. I increased the randomization for different parts. For example, I implemented randomization of the center of mass. Instead of having the center of mass of the base link fixed at a certain point, it is randomized with an uncertainty of plus or minus 5 cm in the X direction, plus or minus 2 cm in the Y direction, and plus or minus 2 cm in the Z direction. Or, for example, over here there is a randomization of the actuator model. Like this, if the actuator model on the real robot is not exactly the same as in simulation, hopefully this will be captured by the randomization.
The second thing, which I changed really a lot, is the rewards. I think in general it's a good idea to run the simulation many times, change each reward in different directions, and try to get a feel for the effect each reward has on the robot. For example, let's look at the reward action rate L2, which for me was super important. This is actually a penalty, meaning that if the robot changes its action very fast, we give a penalty to the neural net.
We're saying that this is not good: you should not change the action very fast. If we set this penalty very low, we allow the model to change actions fast. From one point of view this is a good idea, but from another point of view it means that all the motions are going to be fast, and because we don't have a perfect simulated actuator model, these fast motions will not transfer well to the real robot. The slower the motion, the better it transfers to the real robot, because it is easier for the real actuator to follow. But at the same time, if we set this penalty too high, the robot will be penalized for basically moving the actuators at all.
This means that we need to find the best balance: on the one hand we want slow motion, but at the same time the motion should be fast enough to balance and to walk.
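The trade-off described above comes down to one weighted term in the reward. As a rough sketch in plain Python (not the Isaac Lab implementation; the function names, the weight, and the example actions are made up for illustration), an action rate L2 penalty is the squared difference between consecutive actions, multiplied by a negative weight:

```python
def action_rate_l2(action, prev_action):
    """Sum of squared changes between the current and previous action."""
    return sum((a - p) ** 2 for a, p in zip(action, prev_action))


def step_reward(task_reward, action, prev_action, action_rate_weight):
    """Task reward plus the weighted penalty (the weight is negative)."""
    return task_reward + action_rate_weight * action_rate_l2(action, prev_action)


if __name__ == "__main__":
    prev = [0.0, 0.0]
    smooth = [0.05, -0.05]  # small change from the previous action
    jerky = [0.5, -0.5]     # large change from the previous action

    # Hypothetical weights: a weak and a strong penalty.
    for weight in (-0.01, -1.0):
        print(
            weight,
            step_reward(1.0, smooth, prev, weight),
            step_reward(1.0, jerky, prev, weight),
        )
```

With the weak weight both actions score almost the same, so nothing discourages fast motion. With the strong weight the jerky action loses half of its task reward, and if the weight is pushed further the policy is discouraged from moving at all.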
I think this is the main complication of reinforcement learning. You need to balance all the rewards in order to make the robot behave the way you want. Quite often, if you set one reward or penalty too high or too low, you will see different problems. So you really need to balance all the rewards and penalties against each other. It's quite a tedious task which takes weeks in the best case, but for me it actually took months. Let me show you the difference between a model trained with a low action rate penalty and one trained with a high action rate penalty. For example, here the motions are a little bit faster, and this was trained with the low action rate penalty. And over here I put the clip from the training with the higher action rate penalty.
So the motions are slower, and I saw on the physical robot that these slower motions look way better than the fast ones.
As you can see, with the higher action rate penalty it works way better. Let me show you other things which I did in Isaac Lab in order to make the walking a little bit better and a little bit more reliable.
As you can see in this video, the step height of the robot is really small. I decided that this is a bad thing, and that it's better to force the robot to lift the leg higher than it's actually doing. Of course, the robot learned to lift the foot very little because this fits the actual rewards better. So I introduced an additional reward term which penalizes the robot if it lifts the foot less than 5 cm from the ground. Let's look at the results. I will show you two short clips from when I had just started to run this policy on the robot, and I immediately saw the problem.
From these clips we see that the robot is trying to lift the foot. But because it's lifting the foot, it tries to put the foot back quite quickly, so it smashes the foot on the ground, and it's quite violent.
Actually, in reality it was way more violent than in these clips. So I decided that I need another penalty, for the impact of the foot on the ground. What I'm doing is limiting the velocity with which the foot reaches the ground.
After I implemented this new penalty, it was way better. The robot was no longer hitting the ground hard.
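The two foot penalties described above can be sketched as simple functions. This is only an illustration of the idea, not the author's reward code: the function names and the exact shape of each penalty are assumptions, and the touchdown speed limit of 0.3 m/s is a made-up value, since the video only gives the 5 cm clearance threshold.

```python
def foot_clearance_penalty(swing_peak_height, min_height=0.05):
    """Shortfall (in meters) of the swing foot's peak height below min_height."""
    return max(0.0, min_height - swing_peak_height)


def foot_impact_penalty(vertical_velocity_at_contact, max_speed=0.3):
    """Squared excess of the downward touchdown speed over max_speed (m/s).

    Negative vertical velocity means the foot is moving toward the ground.
    """
    downward_speed = max(0.0, -vertical_velocity_at_contact)
    return max(0.0, downward_speed - max_speed) ** 2


if __name__ == "__main__":
    # A shuffling step: the foot only rises 2 cm, so it is penalized.
    print(foot_clearance_penalty(0.02))
    # A high step that slams down at 0.5 m/s: clearance is fine, impact is not.
    print(foot_clearance_penalty(0.08), foot_impact_penalty(-0.5))
    # A high step with a gentle touchdown: no penalty from either term.
    print(foot_clearance_penalty(0.08), foot_impact_penalty(-0.1))
```

Both values would be multiplied by negative weights in the reward. Adding the clearance term alone reproduces the problem from the clips, because nothing limits how fast the foot comes back down.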
But I saw another problem. The feet were too close to one another, and from time to time they were even hitting each other. I found quite a simple way to keep the two feet apart. For this I used the randomization of the center of mass of the base. It's over here. As you can see, I made the randomization along the X-axis a little bit higher. The X-axis is the axis which is perpendicular to the feet. Like this, the center of mass can be at any position within plus or minus 5 cm of the actual center of mass. So the robot is kind of forced to keep the two feet apart in order to be able to balance. So I implemented this. Here's a video of this robot.
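As a sketch of the center-of-mass randomization described above, the snippet below draws a new offset for each training episode using the ranges from the video (plus or minus 5 cm in X, plus or minus 2 cm in Y and Z). It is plain Python rather than an Isaac Lab event term, the names are hypothetical, and the uniform distribution is an assumption because the video does not say how the offset is sampled.

```python
import random

# Half-widths of the randomization range in meters, as stated in the video.
COM_RANGE_M = {"x": 0.05, "y": 0.02, "z": 0.02}


def sample_com_offset(rng, ranges=COM_RANGE_M):
    """Draw one (x, y, z) offset, uniform within plus or minus each half-width."""
    return tuple(rng.uniform(-ranges[axis], ranges[axis]) for axis in ("x", "y", "z"))


def randomized_com(nominal_com, rng):
    """Shift the nominal base center of mass by a freshly sampled offset."""
    offset = sample_com_offset(rng)
    return tuple(c + o for c, o in zip(nominal_com, offset))


if __name__ == "__main__":
    rng = random.Random(0)
    nominal = (0.0, 0.0, 0.30)  # hypothetical nominal center of mass of the base
    for episode in range(3):
        print(episode, randomized_com(nominal, rng))
```

Because the policy cannot know where the center of mass ended up, a narrow stance becomes risky along the axis with the largest range, which is why the wider X range pushes the feet apart.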
As you can see, it keeps the feet a little bit further apart than before. But it also started to tilt the base like this: left, right, left, right. So I decided that the randomization of the center of mass is not enough, and I added another penalty in the rewards.
This penalty is basically to keep each foot at a certain distance from the center of the robot. What I also saw is that sometimes the policy is not really symmetric: the robot moves its right leg not in the same way as its left leg. So I decided to implement symmetry, and the good thing is that Isaac Lab allows you to do this quite easily. With this symmetry, we basically tell our policy that the left leg should be equivalent to the right leg. I'm not sure, but I think it also increases the speed of training. I also want to say that it's not always easy to see the problem.
It's not always evident what the problem currently is in the model and how we can improve it. I tried to be coherent here with my explanation and my story, but in reality it was not as easy as you might imagine from this video.
So if you are doing your own bipedal robot, be brave and don't give up. It's going to work, but it can take some time.
After many months of training different walking policies, I realized that choosing the best one is not always simple. That is why I decided to look into Isaac Lab Arena. It's not really a walking training tool yet, but it looks like the right kind of framework for future large-scale comparison of walking policies. I went through several tutorials, and here is what I found. First, Arena is still under active development, so it's changing quickly. Second, its main focus is policy evaluation and benchmarking, but it also makes it possible to set up new environments very quickly, which could be very useful for fast policy testing. You can easily switch between different objects, environments, and even robots. Right now it mainly focuses on manipulation and loco-manipulation, but it seems to be moving toward mobility as well. It is also open source, so the community can help shape how the tool develops.
Overall, once the setup is working, it's fairly easy to use. I will definitely keep following the development of Isaac Lab Arena. Thank you for watching this video till the end. I know this was not easy, because this video is quite long and a little bit technical, so it's not super exciting, but I'm pretty sure it will be useful for many of my viewers. It was also super interesting for me to look into Isaac Lab Arena. I did this only briefly, but it's a new tool for me, so it's interesting and exciting. I will continue to work on this robot. I need to do something with the state estimator. I also want to make the robot stand still when it's not commanded to go forward, backward, or in any other direction, so it should stand with both feet on the ground and not walk on the spot. I would also like to make the policy a little bit more reliable. Right now, from time to time, maybe once every five or ten minutes, it can fall. But at the same time, I know that if I don't walk to the sides very often, it's kind of stable. Anyway, thank you for watching this video till the end. Huge thank you to the people who support me via Patreon and via YouTube channel membership. Here are their names.
Thank you. You are the best. As usual, stay safe. Good luck with your projects and see you next time.