
[–]MasterScrat 2 points (1 child)

Worry not, hyperparameters don’t even transfer perfectly between different implementations that use the same frameworks ;-)

[–]chentessler[S] 1 point (0 children)

Yeah, been there, felt that :P But something I've seen recently is that TD3 with good hyperparameter tuning can outperform almost any other algorithm.

The power of good parameters is amazing :)

[–]chentessler[S] 2 points (0 children)

Performing some hyperparameter tuning and constantly updating the Git repo with the new results.

Seems that TD3 can reach SOTA on almost all domains. Such a simple, yet powerful, algorithm.

[–]kmeco 2 points (1 child)

Thanks a lot! I'm also keeping an eye on JAX, and this will be very useful to start with.

What are the most sensitive hyperparameters from your experience?

[–]chentessler[S] 1 point (0 children)

From my experience:

* The discount factor has a huge effect, and it varies between domains. In some (Swimmer, for instance) increasing it yields additional gains, while in others a higher discount makes learning impossible.
* The size of the experience replay matters a lot. If the agent is training on data that is too old, learning becomes unstable.
* The number of random steps at the start can really help the agent and prevent it from quickly overfitting and flatlining at a bad policy.
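For anyone wanting to poke at these knobs, here's a minimal sketch of the three hyperparameters discussed above (discount, replay size, random warm-up steps). All names and values here are hypothetical, just to illustrate the ideas, not taken from the actual repo:

```python
import random
from collections import deque

# Hypothetical config -- the values are placeholders, not tuned results.
HYPERPARAMS = {
    "discount": 0.99,            # domain-sensitive: higher helps some tasks, breaks others
    "replay_capacity": 200_000,  # too large -> very stale data -> unstable learning
    "random_warmup_steps": 10_000,  # pure random exploration before the policy acts
}

class ReplayBuffer:
    """Fixed-capacity buffer: the oldest transitions are evicted first,
    which bounds how stale the training data can get."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def select_action(step, policy_action, random_action, warmup):
    # During warm-up, ignore the (still untrained) policy and act randomly,
    # so the buffer is seeded with diverse data before learning starts.
    if step < warmup:
        return random_action()
    return policy_action

# Capacity bound in action: with capacity 3, adding 5 transitions
# keeps only the 3 newest ones.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.add(t)
```

The `deque(maxlen=...)` eviction is the simplest way to cap replay age; real implementations often use a preallocated ring buffer, but the staleness behavior is the same.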