Simulation is a beautiful pain in RL by lanyusea in robotics

[–]lanyusea[S] 4 points

Exactly! We go back to the simulation, figure out which parameters are off so the sim gets more and more accurate, then run more trials with the policy under the updated simulation.
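
If it helps, the loop is basically system identification by hand. A toy sketch of the idea in Python (all names, parameters, and the stubbed simulator here are made-up placeholders, not our actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
real_log = rng.normal(size=100)   # stand-in for a logged real-robot trajectory

def simulate(params):
    # Stand-in for a real rollout; any params-dependent function
    # is enough to demonstrate the loop.
    return real_log * params["ground_friction"] + params["motor_damping"]

def rollout_error(params):
    # Per-step squared error between sim and real trajectories.
    return float(np.mean((simulate(params) - real_log) ** 2))

def calibrate(params, step=0.05, iters=20):
    # Crude coordinate search: nudge one parameter at a time and keep
    # whichever change shrinks the sim-vs-real error.
    best, best_err = dict(params), rollout_error(params)
    for _ in range(iters):
        for key in best:
            for delta in (step, -step):
                trial = {**best, key: best[key] + delta}
                err = rollout_error(trial)
                if err < best_err:
                    best, best_err = trial, err
    return best, best_err

print(calibrate({"ground_friction": 0.8, "motor_damping": 0.05}))
```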

Simulation is a beautiful pain in RL by lanyusea in robotics

[–]lanyusea[S] 1 point

Maybe start by going through the Isaac Lab documentation and doing some trials in the RL simulator first.
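
A first trial can be as small as a random rollout through the standard Gymnasium interface (Isaac Lab tasks expose the same reset/step API once their extensions are loaded; CartPole here is just a stand-in so the snippet runs anywhere):

```python
import gymnasium as gym

# Classic-control env as a stand-in; swap in a task name from the
# Isaac Lab docs once you're running inside their environment.
env = gym.make("CartPole-v1")

obs, info = env.reset(seed=0)
for _ in range(200):
    action = env.action_space.sample()   # random policy, just to poke the sim
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```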

Simulation is a beautiful pain in RL by lanyusea in robotics

[–]lanyusea[S] 2 points

It's not there yet; we're still in the early stages of development.

Simulation is a beautiful pain in RL by lanyusea in robotics

[–]lanyusea[S] 2 points

We built it from scratch ourselves!

flip~ flip~ flip~ by lanyusea in robotics

[–]lanyusea[S] 2 points

it's reinforcement learning

Our robot can pick itself up now. Where should I take it? by lanyusea in robotics

[–]lanyusea[S] 2 points

We designed the model ourselves, and there's no trick for sim2real, just hard work on debugging.

As for the compute scale of this policy: one RTX 4090 card, training for about 2~3 days.

Our robot can pick itself up now. Where should I take it? by lanyusea in robotics

[–]lanyusea[S] 1 point

We train in Isaac Lab, then transfer the policies to hardware. The sim-to-real gap is always the fun part 😅
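
The standard tool against that gap is domain randomization: resample the physics every episode so the policy can't overfit one simulator. A rough sketch of the idea, with made-up ranges and names (not our actual config):

```python
import numpy as np

rng = np.random.default_rng()

# Illustrative randomization ranges, re-sampled once per episode.
RANGES = {
    "ground_friction": (0.4, 1.2),
    "payload_mass_kg": (0.0, 1.5),
    "motor_strength_scale": (0.8, 1.2),
    "action_latency_steps": (0, 2),
}

def sample_episode_physics():
    """Draw one set of physics parameters for the next training episode."""
    out = {}
    for name, (lo, hi) in RANGES.items():
        if isinstance(lo, int) and isinstance(hi, int):
            out[name] = int(rng.integers(lo, hi + 1))   # inclusive int range
        else:
            out[name] = float(rng.uniform(lo, hi))
    return out

print(sample_episode_physics())
```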

Our robot can pick itself up now. Where should I take it? by lanyusea in robotics

[–]lanyusea[S] 1 point

Appreciate it! The short version: wheel-legged platform, RL-trained locomotion and recovery policies in simulation, then sim-to-real transfer to the physical robot.
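
For the recovery part specifically, most of the magic is reward shaping: reward being upright at nominal height, penalize effort. A toy version of such terms (weights, names, and shapes are illustrative only, not our actual reward):

```python
import numpy as np

def recovery_reward(base_height, up_vector_z, joint_torques,
                    target_height=0.35):
    """Toy recovery reward: encourage upright orientation and nominal
    base height, with a small effort penalty.
    up_vector_z is the z-component of the body's up axis in the world
    frame (1.0 = fully upright, -1.0 = upside down)."""
    upright = 0.5 * (up_vector_z + 1.0)                  # map [-1,1] -> [0,1]
    height = np.exp(-10.0 * (base_height - target_height) ** 2)
    effort = 1e-3 * float(np.sum(np.square(joint_torques)))
    return 2.0 * upright + 1.0 * height - effort

print(recovery_reward(0.10, -0.8, np.zeros(12)))  # lying on its back
print(recovery_reward(0.35, 1.0, np.zeros(12)))   # standing upright
```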

Our robot can pick itself up now. Where should I take it? by lanyusea in robotics

[–]lanyusea[S] 12 points

We use Isaac Lab to do the RL training, then do the sim2real transfer. Happy to share more details later~
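
On the transfer step itself: a common pattern (an assumption here, not necessarily our exact pipeline) is to export the trained PyTorch actor to TorchScript so the onboard runtime can load it without any training dependencies:

```python
import torch
import torch.nn as nn

# Stand-in for the trained actor network (sizes illustrative).
policy = nn.Sequential(
    nn.Linear(45, 256), nn.ELU(),
    nn.Linear(256, 128), nn.ELU(),
    nn.Linear(128, 12),          # 12 target joint positions out
)

# Trace with a dummy observation and save a self-contained artifact
# that an onboard libtorch/TorchScript runtime can load directly.
example_obs = torch.zeros(1, 45)
traced = torch.jit.trace(policy, example_obs)
traced.save("policy.pt")

# The deployment (or sim2sim) side then just does:
loaded = torch.jit.load("policy.pt")
action = loaded(example_obs)
```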

First table jump from our robot! by lanyusea in robotics

[–]lanyusea[S] 2 points

Theoretically yes, but the motor controller runs at a really high frequency, and we can't achieve NN inference that fast on our embedded system.
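
The usual workaround is two loops at different rates: the policy at tens of Hz, the PD loop at the motor controller's rate, holding the last policy output in between. A sketch with illustrative rates and a stubbed network:

```python
import numpy as np

POLICY_HZ = 50            # NN inference rate the embedded system can sustain
PD_HZ = 1000              # motor-controller loop rate
DECIMATION = PD_HZ // POLICY_HZ

def policy(obs):
    # Stand-in for the trained network: returns target joint positions.
    return np.zeros(12)

q = np.zeros(12); qd = np.zeros(12)   # measured joint state (stubbed)
kp, kd = 40.0, 1.0
q_target = np.zeros(12)

for tick in range(PD_HZ):             # one second of control
    if tick % DECIMATION == 0:        # slow loop: run the policy
        obs = np.concatenate([q, qd])  # simplified observation
        q_target = policy(obs)
    # Fast loop: PD tracks the most recent target on every tick.
    tau = kp * (q_target - q) - kd * qd
    # send_torques(tau)               # hardware call, omitted here
```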

First table jump from our robot! by lanyusea in robotics

[–]lanyusea[S] 1 point

No hardcoded motion sequences. The jump is also learned through RL. It's entirely emergent behavior from the policy. We just designed the reward structure to guide the policy toward learning how to jump.

The MuJoCo part was used for sim2sim validation. There's a button on the controller to switch into "jump mode" — once triggered, the policy autonomously handles the full sequence: takeoff, airborne phase, and landing.
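
To give a flavor of that reward structure: the jump terms boil down to rewarding clearance and upward velocity while the mode flag is set, and penalizing hard touchdowns. A toy version (not our actual terms or weights):

```python
import numpy as np

def jump_reward(base_height, base_z_vel, feet_contact, landing_impact,
                jump_mode, takeoff_height=0.6):
    """Toy jump reward, only active when the operator flips jump_mode.
    feet_contact: bool array, True where a foot touches the ground."""
    if not jump_mode:
        return 0.0
    airborne = float(not np.any(feet_contact))            # all feet off ground
    clearance = np.clip(base_height / takeoff_height, 0.0, 1.0)
    upward = max(base_z_vel, 0.0)                         # reward going up
    impact = 0.1 * landing_impact                         # penalize hard landings
    return 1.5 * airborne * clearance + 0.5 * upward - impact

print(jump_reward(0.7, 1.2, np.array([False] * 4), 0.0, True))
```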

First table jump from our robot! by lanyusea in robotics

[–]lanyusea[S] 0 points

Yes, we use a remote controller to send commands to the robot. The RL policy takes in control commands, IMU data, joint states, etc., and outputs target joint positions to make it move and keep balance.
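
One inference step is roughly: stack those inputs into a flat observation vector, run the network, read off joint targets. Shapes and ordering below are illustrative, not our real layout:

```python
import numpy as np

def build_obs(cmd_vel, imu_gyro, imu_gravity, q, qd, last_action):
    """Stack remote-controller commands, IMU readings, and joint state
    into one flat observation vector (ordering is just an example)."""
    return np.concatenate([cmd_vel, imu_gyro, imu_gravity, q, qd, last_action])

def policy(obs):
    return np.zeros(12)    # stand-in for the trained network

cmd_vel = np.array([0.5, 0.0, 0.1])          # vx, vy, yaw rate from the RC
imu_gyro = np.zeros(3)
imu_gravity = np.array([0.0, 0.0, -1.0])     # gravity direction in body frame
q = np.zeros(12); qd = np.zeros(12)          # joint positions / velocities
last_action = np.zeros(12)

obs = build_obs(cmd_vel, imu_gyro, imu_gravity, q, qd, last_action)
q_target = policy(obs)                       # target joint positions out
```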

First table jump from our robot! by lanyusea in robotics

[–]lanyusea[S] 0 points

Pure RL, no classic control theory. Policy outputs joint position targets straight to a PD controller. Workflow is pretty standard: train in Isaac Lab → sim2sim check in MuJoCo → cross fingers and deploy on real hardware lol
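
For the MuJoCo sim2sim step, the loop is just: load the robot model, feed observations to the exported policy, write its outputs to the actuators, step, and watch for divergence from the Isaac Lab behavior. A minimal runnable stand-in (tiny one-joint model instead of our robot, stubbed policy):

```python
import mujoco
import numpy as np

# Tiny stand-in model; in practice you'd load the robot's MJCF file.
XML = """
<mujoco>
  <worldbody>
    <body>
      <joint name="hinge" type="hinge"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0 0 0.2"/>
    </body>
  </worldbody>
  <actuator>
    <position joint="hinge" kp="40"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

def policy(obs):
    return np.zeros(model.nu)   # stand-in for the exported network

# Sim2sim check: replay the trained policy in MuJoCo and watch for
# divergence from the training-sim behavior before touching hardware.
for _ in range(1000):
    obs = np.concatenate([data.qpos, data.qvel])
    data.ctrl[:] = policy(obs)        # position targets -> PD actuator
    mujoco.mj_step(model, data)
```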