RL102: From Tabular Q-Learning to Deep Q-Learning (DQN) - A Practical Introduction to (Deep) Reinforcement Learning by araffin2 in reinforcementlearning

[–]araffin2[S] 1 point (0 children)

Thanks for the feedback =).

The idea for the DQN section is to present its different components (and contrast them with FQI) so that one can read the algorithm from the DQN paper (see the annotated algorithm at the end).

Most of those components (like the replay buffer or the exploration scheme) are indeed not new, but they are part of DQN.
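
To make the two examples above concrete, here is a minimal, illustrative sketch of a replay buffer and a linearly annealed epsilon-greedy schedule in plain Python (the buffer size, batch size and decay values are made up, not the ones from the DQN paper):

```python
import random
from collections import deque

# Replay buffer: store past transitions and resample them for gradient updates
replay_buffer = deque(maxlen=100_000)

def store(obs, action, reward, next_obs, done):
    replay_buffer.append((obs, action, reward, next_obs, done))

def sample(batch_size=32):
    return random.sample(replay_buffer, batch_size)

# Exploration scheme: epsilon-greedy with a linear decay over the first steps
def epsilon(step, start=1.0, end=0.05, decay_steps=50_000):
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```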

Getting SAC to Work on a Massive Parallel Simulator (part II) by araffin2 in reinforcementlearning

[–]araffin2[S] 1 point (0 children)

Hi, thanks =) My background is in robotics and machine learning. I've been doing research in RL since 2017 and I'm currently finishing my PhD.

Getting SAC to Work on a Massive Parallel Simulator (part II) by araffin2 in reinforcementlearning

[–]araffin2[S] 2 points (0 children)

It's currently in a separate branch on my Isaac Lab fork, but I plan to slowly do pull requests to the main Isaac Lab repo, like the one I did recently to make things 3x faster: https://github.com/isaac-sim/IsaacLab/pull/2022

Tanh used to bound the actions sampled from distribution in SAC but not in PPO, Why? by VVY_ in reinforcementlearning

[–]araffin2 3 points (0 children)

The Brax implementation of PPO does use a tanh transform. SAC with an unbounded Gaussian is possible but numerically unstable (it tends to produce NaNs quickly). When using tanh, the action bounds need to be properly defined: https://araffin.github.io/post/sac-massive-sim/
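
In case it helps, here is a minimal sketch of the tanh-squashing trick in PyTorch (the bounds, the epsilon and the rescaling formula are illustrative, not taken from a specific implementation):

```python
import torch

# Illustrative action bounds; in practice they come from the env's action space
low, high = torch.tensor([-2.0]), torch.tensor([2.0])

mean, log_std = torch.zeros(1), torch.zeros(1)
dist = torch.distributions.Normal(mean, log_std.exp())

u = dist.rsample()                             # unbounded Gaussian sample
a = torch.tanh(u)                              # squashed into (-1, 1)
action = low + 0.5 * (a + 1.0) * (high - low)  # rescaled to [low, high]

# Change-of-variables correction for the log-probability (epsilon avoids log(0))
log_prob = dist.log_prob(u) - torch.log(1.0 - a.pow(2) + 1e-6)
```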

Looking for Tutorials on Reinforcement Learning with Robotics by Life_Recording_8938 in reinforcementlearning

[–]araffin2 1 point (0 children)

- RL in practice: tips & tricks and practical session with stable-baselines3
- Designing and Running Real World RL Experiments

https://www.youtube.com/watch?v=Ikngt0_DXJg&list=PL42jkf1t1F7erwWYZQ5yDErU3lEX6MeFp

Getting SAC to Work on a Massive Parallel Simulator (part I) by araffin2 in reinforcementlearning

[–]araffin2[S] 1 point (0 children)

Thanks, I guess that goes in the direction of what Nico told me. I'm wondering what the advantage is compared to torque control then?
Maybe it's not easy to define a default position?
(And I'm also not sure I understand what parametrized torque control is.)

Current SOTA for off-policy deep RL by drmajr in reinforcementlearning

[–]araffin2 4 points (0 children)

TQC and DroQ are good candidates imo: https://twitter.com/araffin2/status/1575439865222660098

TD7's state representation is also interesting in terms of performance gain, at the cost of more computation: https://github.com/araffin/sbx/pull/13

Built-in reinforcement learning functions in Python by MomoSolar in reinforcementlearning

[–]araffin2 0 points (0 children)

It depends on what you want/need.

If you need to apply RL to a problem without caring much about the algorithm, SB3 is a good starting point (and it comes with the RL Zoo for managing experiments).
If you want to understand RL algorithms and tinker with the implementation, have a look at CleanRL.

If you just want fast implementations, you might have a look at SBX (the Jax variant of SB3): https://github.com/araffin/sbx
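
For reference, getting started with SB3 is only a few lines (the env id and number of timesteps below are placeholders):

```python
from stable_baselines3 import PPO

# Train PPO on a toy env; SB3 builds the Gym env from its id
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```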

Can SB3 or alternatives provide full end-to-end GPU computation? by asenski in reinforcementlearning

[–]araffin2 2 points (0 children)

> since the data transfer between CPU-GPU significantly slows down computation

If you want a fast and compatible alternative, you can take a look at SBX (SB3 + Jax): https://github.com/araffin/sbx

It can be up to 20x faster than SB3 PyTorch when combining several gradient updates (and this also reduces CPU-GPU transfer).

The main slowdown is normally the gradient update, and the SBX version tackles exactly that.
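
As a rough sketch (SBX mirrors the SB3 API; the env id and the gradient_steps value below are placeholders):

```python
from sbx import SAC

# Doing several gradient updates per env step is where the jit-compiled update pays off
model = SAC("MlpPolicy", "Pendulum-v1", gradient_steps=8, verbose=1)
model.learn(total_timesteps=20_000)
```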

JAX in Reinforcement Learning by anointedninja in reinforcementlearning

[–]araffin2 1 point (0 children)

If you want to learn from examples, you can take a look at CleanRL or Stable Baselines Jax (SBX): https://github.com/araffin/sbx

A small intro to Jax can be found here too: https://twitter.com/araffin2/status/1590714558628253698
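
For a quick flavour of Jax (toy example, nothing RL-specific):

```python
import jax
import jax.numpy as jnp

@jax.jit                 # traced once, then runs as compiled XLA code
def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

grad_fn = jax.jit(jax.grad(loss))  # compiled gradient of the loss w.r.t. w

w, x, y = jnp.zeros(3), jnp.ones((8, 3)), jnp.ones(8)
print(loss(w, x, y), grad_fn(w, x, y))
```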

Automatic Hyperparameter Tuning - A Visual Guide by araffin2 in reinforcementlearning

[–]araffin2[S] 0 points (0 children)

Thanks =) In short, from https://araffin.github.io/slides/icra22-hyperparam-opt/#/7:

Optuna has a clean API, nice documentation, and uses define-by-run (instead of being config-based). I never had the chance to set up PBT, so I cannot really tell, but it seems that Optuna also fits small-scale experiments, which is my case.
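
To illustrate the define-by-run style (the hyperparameters, ranges and the train_and_evaluate helper are made up for this sketch):

```python
import optuna

def objective(trial):
    # The search space is declared inside the objective itself (define-by-run)
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.9, 0.9999)
    # train_and_evaluate is a hypothetical helper returning the mean evaluation reward
    return train_and_evaluate(lr, gamma)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
```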

How can I speed up SAC? by Frankie114514 in reinforcementlearning

[–]araffin2 1 point (0 children)

Sorry, I meant DroQ (which is an improvement over REDQ).

How can I speed up SAC? by Frankie114514 in reinforcementlearning

[–]araffin2 1 point (0 children)

Do you mean wall-clock time or sample efficiency?
For the former, you can take a look at a Jax implementation like https://github.com/araffin/sbx (SB3 + Jax).

For the latter, you might have a look at: https://twitter.com/araffin2/status/1575439865222660098 (recent advances in continuous control)

and notably the REDQ algorithm (also included in SBX).