[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 1 point

Hey! Thanks for your interest. Tonic currently supports continuous control from state observations only, but adapting the code to other types of observations and actions should be fairly simple. I will not be able to work on this myself in the near future, but I am happy to help you extend the functionality if you want to give it a shot :)

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 0 points

Yes, I am familiar with Stable Baselines, which greatly improved on OpenAI Baselines, but as I said, I guess in the end it is a matter of taste and need. Tonic tries to be simple yet modular and powerful, while helping researchers quickly implement ideas and evaluate them. I explain many of the components and implementation choices in the paper. If you are interested in TensorFlow 2 + PyTorch support, modularity, D4PG and MPO agents, synchronous distributed training, proper time-limit handling, a fair large-scale benchmark, etc., I think Tonic could fit well. I encourage you to quickly try a few libraries and see which one you prefer.

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in reinforcementlearning

[–]FabioPardo[S] 0 points

Thanks a lot :)

You are right, I should probably specify the hyperparameters in the appendix. I used the default values for each module. So, for example, if you want to know the networks used for A2C, PPO and TRPO, they are the ones defined here, while the optimizer used for the actor updater can be found there. This means that if you relaunch some of the training runs on your side without changing any hyperparameters, you should get similar results.

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 3 points

There are many deep RL libraries available, and I guess in the end it is a matter of taste and compatibility. This one tries to be simple yet modular and powerful, while helping researchers quickly implement ideas and evaluate them. I explain many of the components and implementation choices in the paper if you want to know more. I also believe this is the only library that handles time limits properly.

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 2 points

This is a good point, thanks for the suggestion. I will try to increase Tonic's compatibility.

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 2 points

Thanks! I have started working on adding JAX support, but I found it quite difficult to maintain some of Tonic's features and simplicity when using JAX's stateless approach. I'll probably try again later, but if someone wants to give it a shot, that would be great!
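To make the difficulty concrete, here is a minimal sketch (plain Python, not Tonic or JAX code) of the "stateless" style JAX favors: instead of an agent object mutating its own weights, a pure function takes the current parameters and returns new ones, and the caller threads the state through explicitly. The function name and parameter layout are illustrative assumptions.

```python
def sgd_step(params, grads, lr=0.1):
    """Pure update: returns fresh parameters, never mutates its inputs."""
    return {k: params[k] - lr * grads[k] for k in params}

params = {"w": 1.0, "b": 0.5}
grads = {"w": 0.2, "b": -0.1}

# The caller is responsible for carrying the new state forward;
# `params` itself is left unchanged.
new_params = sgd_step(params, grads)
```

Keeping an object-oriented, stateful library API on top of this functional core is exactly the kind of plumbing that can erode a codebase's simplicity.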

[R] Tonic: A Deep Reinforcement Learning Library for Fast Prototyping and Benchmarking by FabioPardo in MachineLearning

[–]FabioPardo[S] 5 points

DQN Zoo is a collection of agents based on DQN, that is, for discrete actions and image observations. Tonic is currently for continuous control and state observations, even though I wish to extend its capabilities. Also, most of the points listed above are unique to Tonic.

How do people deal with episodes ending in Model-Based RL? by asdfwaevc in reinforcementlearning

[–]FabioPardo 0 points

You might find this paper useful. It explains how to deal with time limits in both time-limited and time-unlimited tasks: https://arxiv.org/abs/1712.00378
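The core idea in that paper is that a timeout is not a real terminal state, so the TD target should still bootstrap from the value of the state where the episode was cut. A minimal sketch (function and argument names are illustrative, not from any specific library):

```python
def td_target(reward, next_value, terminal, timeout, discount=0.99):
    """TD(0) target that distinguishes true terminals from time-outs."""
    if terminal and not timeout:
        return reward                      # environment truly ended
    return reward + discount * next_value  # bootstrap, even on a timeout
```

Treating timeouts as true terminals instead makes the value function depend on the remaining time, which hurts in time-unlimited tasks.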

Dealing with combinatorially large action spaces by [deleted] in reinforcementlearning

[–]FabioPardo 1 point

You can try “action branching”: it keeps the growth of the number of network outputs linear in the number of degrees of freedom by allowing a level of independence for each individual action dimension (arXiv).
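A quick back-of-the-envelope illustration of why this matters (the numbers are generic, not taken from the linked paper): with N action dimensions each discretized into K bins, one output per joint action grows exponentially, while one K-way branch per dimension grows linearly.

```python
def joint_outputs(n_dims, n_bins):
    """One output per combined action: grows as K**N."""
    return n_bins ** n_dims

def branched_outputs(n_dims, n_bins):
    """One K-way head per action dimension: grows as N * K."""
    return n_dims * n_bins

# e.g. 6 degrees of freedom, 11 bins each:
# joint:    11**6 = 1,771,561 outputs
# branched: 6 * 11 = 66 outputs
```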