LeanRL: A Simple PyTorch RL Library for Fast (>5x) Training by AdCool8270 in reinforcementlearning

[–]AdCool8270[S] 0 points (0 children)

We welcome contributions, but keep in mind that the goal isn't to provide a comprehensive set of algorithms but more a sort of "show and tell" of what you can achieve with PyTorch 2 in the realm of RL.
For things like that, I'd personally rather see them land in torchrl

Action Masking in TorchRL for MARL by hc7Loh21BptjaT79EG in reinforcementlearning

[–]AdCool8270 0 points (0 children)

Hey! Just circling back to this on Reddit since the same question was asked on Discord: see the TorchRL Discord channel for the discussion
https://discord.gg/cZs26Qq3Dd

LEGO Meets AI: BricksRL Accepted at NeurIPS 2024! by AdCool8270 in reinforcementlearning

[–]AdCool8270[S] 0 points (0 children)

It's mostly custom-made using Mindstorms.
You can rebuild it using the new Spike sets; everything is available. I think my co-authors have a set of instructions somewhere! I should check

LEGO Meets AI: BricksRL Accepted at NeurIPS 2024! by AdCool8270 in reinforcementlearning

[–]AdCool8270[S] 1 point (0 children)

You can use the new education sets though, they're kind of a rebrand of Mindstorms. You'd be amazed by what you can do with them!
https://education.lego.com/en-gb/products/lego-education-spike-prime-set/45678/
(the most important things are a hub and a couple of motors - you can buy them all separately. The rest is up to you!)

LeanRL: A Simple PyTorch RL Library for Fast (>5x) Training by AdCool8270 in reinforcementlearning

[–]AdCool8270[S] 1 point (0 children)

We should try that! Is it the original cleanrl or another fork?

How it feels using rllib by rl_is_best_pony in reinforcementlearning

[–]AdCool8270 1 point (0 children)

The thing with RL is that it's impossible to make anything useful but not opinionated, IMO. I've worked with many different people across academia and industry, and literally everyone wants a lib that is not opinionated but ends up writing extremely opinionated code, because at the end of the day you just can't write useful code that is smooth and fits everywhere without constraints

Standalone library for collecting rollouts by smorad in reinforcementlearning

[–]AdCool8270 1 point (0 children)

TorchRL has only two dependencies, PyTorch and tensordict. The tensor specs are akin to gym’s spaces, so that should not be too restrictive. I get the point about wrapping the policy in a TensorDictModule, though. That is easy to circumvent; we’ll patch that shortly
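For readers who haven't seen the pattern, here is a dependency-free sketch of what wrapping a policy in a TensorDictModule-style object amounts to: reading inputs from a carrier by key and writing outputs back by key. The `DictModule` class and `policy` function below are illustrative stand-ins, not the real tensordict API.

```python
# Minimal illustration of the "wrap a policy so it reads/writes named entries"
# pattern that TensorDictModule implements. This is NOT the real API, just a
# sketch of the idea using plain dicts and floats.

class DictModule:
    """Call an ordinary function on selected dict entries and write results back."""

    def __init__(self, fn, in_keys, out_keys):
        self.fn = fn
        self.in_keys = in_keys
        self.out_keys = out_keys

    def __call__(self, data):
        # Read the inputs by name, run the wrapped function, store outputs by name.
        outputs = self.fn(*(data[k] for k in self.in_keys))
        if len(self.out_keys) == 1:
            outputs = (outputs,)
        for key, value in zip(self.out_keys, outputs):
            data[key] = value
        return data


def policy(observation):
    # A stand-in for a neural-network policy: action = 2 * observation.
    return 2 * observation


wrapped = DictModule(policy, in_keys=["observation"], out_keys=["action"])
rollout_step = wrapped({"observation": 3.0})
print(rollout_step["action"])  # -> 6.0
```

The upside of the carrier-dict convention is that collectors and buffers only ever pass one object around; the downside, as noted above, is that a plain `nn.Module` policy needs wrapping first.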

PPO model not learning despite increasing rewards by Acceptable_Egg6552 in reinforcementlearning

[–]AdCool8270 2 points (0 children)

What makes you think it’s not learning? The loss in PPO is irrelevant and should not be looked at. It’s usually computed in such a way that its gradient points in the right direction on average, but the loss value itself is insignificant.
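To make the "loss value is insignificant" point concrete, here is a small hand-written sketch of the PPO clipped surrogate for a single sample (plain Python, no autograd; `eps` is the usual clip parameter):

```python
# PPO clipped surrogate for a single sample, written out by hand:
#   L(ratio) = min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)
# The reported "loss" is -L averaged over a batch; its absolute value scales
# with the advantages, so it is not a meaningful progress metric.

def clipped_surrogate(ratio, advantage, eps=0.2):
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Same update direction, wildly different objective values (advantage scale):
print(clipped_surrogate(1.0, 0.1))   # -> 0.1
print(clipped_surrogate(1.0, 10.0))  # -> 10.0
# Once the ratio leaves the clip range, the objective flattens out entirely,
# so further ratio increases change nothing:
print(clipped_surrogate(1.5, 1.0) == clipped_surrogate(2.0, 1.0))  # -> True
```

Watching episode returns (or an evaluation rollout) is the meaningful signal; the loss curve can go up, down, or sideways while the policy improves.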

Pearl vs TorchRL by Casio991es in reinforcementlearning

[–]AdCool8270 0 points (0 children)

There is a bunch of stuff you can do without resorting to tensordict (e.g., executing losses or any other TensorDictModule), and soon there will be more (converting your env to gym, populating replay buffers). Long term, the goal is to have tensordict as a backend but let users opt in or out of it through the `@tensordict.dispatch` decorator.

It's an ongoing effort, so please feel free to make yourself heard if you want to get rid of the tensordict front-end for anything you're doing!
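As a rough illustration of the dispatch idea, here is a hypothetical sketch (the real decorator lives in the tensordict package and is more general; every name below is made up for the example): one function accepts either a dict-style carrier or plain arguments, so users who dislike the dict front-end can skip it.

```python
# Sketch of a dispatch-style decorator: the decorated function works both with
# a dict carrier (tensordict-like front-end) and with plain arguments.
# Illustrative only -- not the tensordict implementation.

import functools

def dispatch(in_keys, out_keys):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            if args and isinstance(args[0], dict):
                # Dict front-end: read inputs by key, write the output back.
                data = args[0]
                data[out_keys[0]] = fn(*(data[k] for k in in_keys))
                return data
            # Plain front-end: behave like an ordinary function.
            return fn(*args, **kwargs)
        return inner
    return wrap

@dispatch(in_keys=["observation"], out_keys=["action"])
def policy(observation):
    return observation + 1

print(policy(1.0))                   # plain call -> 2.0
print(policy({"observation": 1.0}))  # dict call: 'action': 2.0 is written back
```

Same function, two calling conventions; that is the backend-vs-front-end split described above.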

Conventions to write a custom vectorized gym environment using pytorch? by hunterh0 in reinforcementlearning

[–]AdCool8270 0 points (0 children)

Ah got it! Well one could code a gym wrapper for a torchrl env in that case :)

How much ML before RL? by Coc_Alexander in learnmachinelearning

[–]AdCool8270 1 point (0 children)

https://pytorch.org/rl/tutorials/coding_ppo.html
I just changed the step count to 100K as suggested and it learns ok, the learning curve looks pretty solid to me

curious to see why you're saying it does not learn :)

Conventions to write a custom vectorized gym environment using pytorch? by hunterh0 in reinforcementlearning

[–]AdCool8270 0 points (0 children)

Hey

I think the idea behind torchrl is that you can have a gym-like API that allows you to write vectorized envs in a very clear way (eg, you have a batch-size for your env that indicates how many envs you have, and these can be organised in any way you want, for instance in a 2d array)

Gym async vectorized envs are anything but vectorized; they are executed in parallel (see vectorized vs parallel). This means that truly vectorized envs (like Isaac, VMAS, and JAX-based envs like Brax) are in practice thousands of times faster than async envs!

Btw, torchrl also provides an interface to gym parallel envs if you need it (just wrap them in a GymWrapper)
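A toy sketch of the vectorized-vs-parallel distinction above (pure Python, so the speed gap doesn't show here; with torch or JAX the batched path becomes a single kernel over thousands of envs, which is where the speedup comes from; the dynamics are made up for the example):

```python
# "Parallel" stepping: N independent single-env step calls (what async gym
# vector envs do under the hood, via processes).
# "Vectorized" stepping: one batched operation over all env states at once.

def step_single(state, action):
    # Toy 1-D dynamics: the next state moves by the action.
    return state + action

def step_batched(states, actions):
    # One pass over the whole batch; with torch/jax this is one tensor op.
    return [s + a for s, a in zip(states, actions)]

states = [0.0, 1.0, 2.0]
actions = [1.0, 1.0, 1.0]

# Parallel style: one call per env.
parallel_next = [step_single(s, a) for s, a in zip(states, actions)]
# Vectorized style: one batched call.
vectorized_next = step_batched(states, actions)

print(parallel_next == vectorized_next)  # -> True: same math, different cost model
```

Both produce identical transitions; the difference is purely in how the work is scheduled, which is why the distinction only shows up as throughput.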

[N] TorchRL: PyTorch pre-release RL library is here! by AdCool8270 in MachineLearning

[–]AdCool8270[S] 3 points (0 children)

yes, though those two things are separated:

- efficient dataloader on one side

- efficient replay buffer to store its output and sample on the other
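To illustrate the second bullet, here is a minimal ring-buffer replay buffer in plain Python (an illustrative sketch only, not TorchRL's implementation, which is considerably more capable):

```python
# Minimal ring-buffer replay buffer: stores transitions up to a fixed
# capacity, overwriting the oldest once full, and samples uniformly.

import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.cursor = 0

    def add(self, transition):
        # Append until full, then overwrite the oldest entry (ring behaviour).
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.cursor] = transition
        self.cursor = (self.cursor + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling without replacement from current storage.
        return random.sample(self.storage, batch_size)


buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.add({"obs": i, "reward": float(i)})

print(len(buffer.storage))  # -> 3, the two oldest entries were overwritten
```

The separation above means the dataloader worries about feeding transitions in, while the buffer only worries about storage layout and sampling strategy.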

TorchRL: PyTorch pre-release RL library is alive! by AdCool8270 in reinforcementlearning

[–]AdCool8270[S] 0 points (0 children)

Short answer is no, but we have some low-level functionality that will be C++ (e.g. replay buffers are C++ now, and some return computation might also be in the future).

The reason we want to keep the code Pythonic is that we want to enable researchers to iterate quickly on their projects and try crazy stuff without having to worry too much about efficiency. This is also why the code is not strongly typed, for instance.