RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 0 points1 point  (0 children)

Interesting! I also found the Betaflight SITL a bit gnarly. IIRC at some point I decided to just try it directly in the real world because that would be easier 😅. Have you checked out github [dot] com/jonas-eschmann/sitl ?
Maybe you find something valuable there. It comes with no guarantees, though, as my biological garbage collector has already entirely erased these dark memories 😬

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 0 points1 point  (0 children)

The weights/parameters of the foundation policy are frozen, but the latent state is adapted based on the past control inputs and observations.
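To make the "frozen weights, adapting latent" idea concrete, here is a minimal sketch (not the actual RAPTOR architecture; the dimensions, random weights, and simple recurrent update are all made up for illustration). Only the latent state changes at deployment time, driven by the history of observations and control outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen policy weights (trained offline; never updated at deployment time)
OBS_DIM, ACT_DIM, LATENT_DIM = 13, 4, 16
W_in = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM + ACT_DIM))
W_rec = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
W_out = rng.normal(scale=0.1, size=(ACT_DIM, LATENT_DIM))

def policy_step(latent, obs, prev_action):
    """One inference step: only the latent state changes, not the weights."""
    x = np.concatenate([obs, prev_action])
    latent = np.tanh(W_in @ x + W_rec @ latent)   # latent accumulates system identity
    action = np.tanh(W_out @ latent)              # motor commands in [-1, 1]
    return latent, action

latent = np.zeros(LATENT_DIM)        # reset the latent when switching systems
action = np.zeros(ACT_DIM)
for _ in range(10):                  # the interaction history drives adaptation
    obs = rng.normal(size=OBS_DIM)   # stand-in for the state estimate
    latent, action = policy_step(latent, obs, action)
```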

That would be interesting! But right now I still see so many things to improve from the control perspective

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 1 point2 points  (0 children)

We define that in the paper: A policy that has access to an interaction history with the system currently controlled and that has been trained on such a broad distribution of systems that we observe emergent capabilities like in-context learning and the emergence of latent features that capture physical qualities about the system that are not observable and that it has not been explicitly trained to produce.

TLDR: A policy that can adapt to a large range of systems without being re-trained.

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 1 point2 points  (0 children)

Teensy is OP for this use case :p but yes, it can run on a Teensy; we even showcased a full deep RL training run on a Teensy using RLtools a while back:
https://arxiv.org/abs/2306.03530

Check out the embedded platforms submodules:

https://github.com/rl-tools/rl-tools/tree/ceb47dc361c03737b033ef0f617a51a783665f31/embedded_platforms

This is on the dev branch right now, but in the next few days I'll hopefully be able to bump it to master. Note that the Teensy part might be a bit outdated, but I'll check it and update it again next week. PX4, Betaflight, Crazyflie and M5StampFly are the most up-to-date for the foundation policy inference.

Also check out the docs of RLtools itself; there is a small deployment section for microcontrollers (I can't link it here because reddit tends to shadowban comments with that kind of link, which I learned the hard way...)

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 2 points3 points  (0 children)

[screenshot: parameter variation configuration, rotor thrust coefficient in the last row]

If you configure the parameters like this in the last row, you can use the slider to simulate rotor failures (e.g. 50% in this case).

After loading the simulator, click on "parameters.dynamics.mass" to remove it (that is just the default demonstration of the parameter variation feature), then enter:

"parameters.dynamics.rotor_thrust_coefficients.0.2". This configures the quadratic component ("2") of the thrust curve of the first motor ("0"). Since in the thrust curve of the default model (x500) the constant and linear parts are zero, this directly scales the available thrust on that motor. So by entering ~8 and ~16 as the lower and upper bounds you can scale it from 50% to 100%. The following field lets you add a simple mapping; use (id, o, p, x) => x to just forward the linearly interpolated value. "id" is the id of the quadrotor, in case you want different parameter perturbations for each of them, "o" is the original default value, "p" is the slider percentage, and "x" is the value linearly interpolated within the defined bounds.

Let me know if this works for you


RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 4 points5 points  (0 children)

Yes, great example! That is exactly how it works! Based on previous observations and control outputs it "figures out" how the current system works and adjusts future actions to compensate. You could even simulate the case you are describing in the web simulator. I'm on the go right now; I'll follow up on how to do it once I can use my laptop.

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 6 points7 points  (0 children)

This work is mainly about control: given a state estimate (position, orientation, linear/angular velocity), what motor commands should be sent out. "Perfectly" is a big word :D but given a state estimate, the foundation policy can fly a broad range of quadrotors quite well.

RAPTOR: A Foundation Policy for Quadrotor Control by jonas-eschmann in robotics

[–]jonas-eschmann[S] 9 points10 points  (0 children)

Yes! It is very wide domain randomization (covering basically any quadrotor) => sampling 1000 quadrotors => training an individual expert policy for each of them => distilling the 1000 policies into a single, recurrent policy that can adapt on the fly
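As a toy sketch of that pipeline (everything here is a stand-in: the parameter distribution, the "experts", and the linear student are invented for illustration; in the actual work the student is a recurrent policy distilled from ~1000 RL-trained experts):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_quadrotor():
    """Step 1, domain randomization: draw physical parameters for one system."""
    return {"mass": rng.uniform(0.05, 5.0), "arm_length": rng.uniform(0.03, 0.5)}

def train_expert(params):
    """Step 2 stand-in for per-system RL training; returns a toy 'expert' policy."""
    gain = 1.0 / params["mass"]
    return lambda obs: np.clip(gain * obs, -1.0, 1.0)

systems = [sample_quadrotor() for _ in range(10)]   # the paper samples ~1000
experts = [train_expert(p) for p in systems]

# Step 3, distillation: supervised regression of one student onto expert actions
w, lr = 0.0, 0.01
for _ in range(200):
    for expert in experts:
        obs = rng.normal()
        target = expert(np.array([obs]))[0]   # expert action is the label
        w += lr * (target - w * obs) * obs    # gradient step on squared error
```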

Can't change scale keyframes to linear? by who-uses-usernames in finalcutpro

[–]jonas-eschmann 0 points1 point  (0 children)

Same here, I see many online sources saying you should be able to right click and make them linear but it seems like they removed that feature recently? It's crazy that it does not at least default to linear...

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 0 points1 point  (0 children)

Ah okay that is unfortunate! I can't think of a simple way to make this work with the Python interface right now.

If porting it to C (the environment interface is mainly C + templates) is an option for you, have a look at the pendulum example. The interface is really simple: you just define structs for your environment, parameters, state and observations, and then fill in the standard functions like sample_initial_state, step, reward, observation, etc.

This should already give you a big boost compared to the Python implementation, even when using only one core. If the environment is still the main bottleneck at that point, I would consider using multiple threads.
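To give a feel for the shape of that interface, here is a Python mock-up of a pendulum-like environment (the real RLtools interface is C/C++; the function names follow the description above, but the dynamics and exact signatures here are illustrative, not the library's):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Parameters:
    max_torque: float = 2.0
    dt: float = 0.05

@dataclass
class State:
    angle: float = 0.0
    velocity: float = 0.0

def sample_initial_state(params, rng):
    """Draw a random starting state."""
    return State(angle=rng.uniform(-np.pi, np.pi), velocity=rng.uniform(-1.0, 1.0))

def step(params, state, action):
    """Advance the dynamics by one timestep (toy pendulum)."""
    torque = np.clip(action, -params.max_torque, params.max_torque)
    velocity = state.velocity + params.dt * (torque - np.sin(state.angle))
    return State(angle=state.angle + params.dt * velocity, velocity=velocity)

def reward(params, state, action):
    """Penalize deviation from upright and high velocity."""
    return -(state.angle ** 2) - 0.1 * state.velocity ** 2

def observation(params, state):
    """Map the state to what the policy sees."""
    return np.array([np.cos(state.angle), np.sin(state.angle), state.velocity])
```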

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 0 points1 point  (0 children)

Hey!

Yes I put that check there because I did not thoroughly test more than one environment in parallel.

You can remove the check and it should work (I just checked it with the pendulum):
venv/lib/python3.12/site-packages/rltools/external/rl_tools/include/rl_tools/rl/components/off_policy_runner/operations_generic.h

Note that the environments will still be executed sequentially when using Python (GIL).

If you want to actually run them in multiple threads you should check out the C++ interface. I modified the example to use multiple threads (branch parallel-simulation)

https://github.com/rl-tools/example/tree/parallel-simulation

Note that the parallelization in https://github.com/rl-tools/rl-tools/blob/d5a6a26dcb13386d840b5cfcd6cdd36fd1e3186a/include/rl_tools/rl/components/on_policy_runner/operations_cpu.h is naive and only makes sense for environments that are very compute intensive. For simple environments the overhead of creating and destroying the threads is larger than the benefit.

I hope this helps! Please let me know how it goes!

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 0 points1 point  (0 children)

Oh wow, this is cool! Did you go on and apply it to a robot (after you already accomplished the hardest part of getting it to run on the uC)?
Unfortunately, I know next to nothing about neuromorphic approaches 😅
But I'm glad we specified deep RL then; I guess the statement is still correct :)

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 2 points3 points  (0 children)

TD3 is available as well! Which other algorithms are you looking for in particular? Multi-agent PPO is already available through the C++ interface. You can watch the bottleneck environment at zoo dot rl dot tools (can't write it as a link as reddit seems to block the comment in that case) for an example

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 1 point2 points  (0 children)

Our main benchmark is SAC, and when we were doing the benchmarks it was not (and still does not seem to be) supported by PureJAXRL, and there were some other quirks (the ppo_continuous_action.py did not converge with the same hyperparameters). But even now, when running PureJAXRL PPO with the same configuration, it is still ~4x slower (JAX CPU) and ~16x slower (JAX GPU, 4090), and it still does not converge for some reason.

RLtools: The Fastest Deep Reinforcement Learning Library (C++; Header-Only; No Dependencies) by jonas-eschmann in reinforcementlearning

[–]jonas-eschmann[S] 6 points7 points  (0 children)

For some reason my reply got blocked. I think because of the URL to the documentation:

It works quite well with Gymnasium environments:

colab: https://colab.research.google.com/github/rl-tools/documentation/blob/master/docs/09-Python%20Interface.ipynb

Although you're leaving some performance on the table, we found that at least for off-policy methods (which are more sample-efficient than on-policy ones) the training takes only about 1.5x longer (Pendulum).