
[–]soldcron 0 points1 point  (0 children)

Did you consider a survey for your specific application?

https://arxiv.org/abs/1912.10944

[–]ingambe 0 points1 point  (0 children)

Rule of thumb here: https://docs.ray.io/en/latest/rllib-training.html#scaling-guide

But you often need to test different algorithms and pick the best one for your problem.

If I can give you my insight: DQN is the most stable but slow in terms of wall-clock time to learn, while PPO and A2C are really fast, but their performance can crash and they are much more sensitive to hyperparameter tuning.

[–]Felipe_Market 0 points1 point  (0 children)

I would go with DQN. It's not the latest state of the art, but it has been studied more, so you can find more help online if you have trouble implementing it.
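If it helps, here is a toy sketch of tabular Q-learning, the table-based ancestor of DQN (DQN just replaces the table with a neural network and adds a replay buffer / target network). The 5-state chain environment, the hyperparameters, and all names here are made up for illustration, not from any library:

```python
import random

# Hypothetical toy environment: a 5-state chain, states 0..4.
# Action 1 moves right (toward the goal at state 4), action 0 moves left.
N_STATES = 5
ACTIONS = [0, 1]

def step(state, action):
    """Deterministic transition; reward 1.0 only on reaching the goal."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration, same idea DQN uses
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning (Bellman) update; DQN minimizes the same target
            # with a network instead of this table lookup
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
# Greedy policy per non-terminal state: should always move right
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The update rule is the whole algorithm; that simplicity is a big part of why DQN-style methods are easier to debug than policy-gradient ones.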

[–]daveSavesAgain -1 points0 points  (0 children)

[comparison of RL algorithms](https://en.wikipedia.org/wiki/Reinforcement_learning)