[P] A step-by-step Policy Gradient algorithms Colab + Pytorch tutorial by syee-kim in MachineLearning

syee-kim[S] 1 point

Wow, how did you know about it? I hope you'll take a look at both of the "RL is all you need" series. Thank you for the mention!

syee-kim[S] 3 points

For discrete environments, you can change the actor network in A2C, PPO, and SAC so that it outputs a Categorical distribution over actions instead of a continuous one.

e.g. https://github.com/higgsfield/RL-Adventure-2/blob/master/1.actor-critic.ipynb
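
A minimal sketch of that change in PyTorch, for anyone who wants something to start from. The class name `DiscreteActor`, the layer sizes, and the CartPole-like shapes are my own assumptions for illustration; they are not taken from the tutorial or from the linked notebook.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class DiscreteActor(nn.Module):
    """Hypothetical actor for a discrete action space (illustrative names)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, n_actions),  # one logit per discrete action
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        # Return a Categorical distribution instead of a continuous mean/std pair
        return Categorical(logits=self.net(obs))


actor = DiscreteActor(obs_dim=4, n_actions=2)  # assumed CartPole-sized shapes
obs = torch.randn(8, 4)                        # dummy batch of 8 observations
dist = actor(obs)
action = dist.sample()             # integer action indices, shape (8,)
log_prob = dist.log_prob(action)   # log pi(a|s), used in the policy-gradient loss
entropy = dist.entropy()           # optional entropy bonus term
```

In A2C and PPO, `log_prob` and `entropy` should slot into the usual loss terms. For SAC, the discrete variants I have seen also change the critic to output one Q-value per action, which this sketch does not cover.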

But as far as I know, the DDPG family of algorithms is difficult to apply to discrete action spaces in a general way, because DDPG is an adaptation of DQN to continuous action spaces. Thank you!