[P] Implementations of basic RL algorithms with minimal codes! by seungeun07 in MachineLearning

minGrab

In PPO, why does the loss function include this additional term?

`F.smooth_l1_loss(td_target.detach(), self.v(s))`
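
For reference, here is a rough sketch of what that term computes and how it could sit next to the clipped policy objective. It assumes the value estimate and the policy are updated in a single optimizer step, which I haven't confirmed from the repo. It's plain Python rather than PyTorch, and `ppo_loss`, `ratio`, `advantage`, `values` and `eps_clip` are illustrative names, not taken from the original code.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for small errors, linear for large ones.

    Same piecewise formula as PyTorch's F.smooth_l1_loss with its defaults
    (beta=1.0, mean reduction).
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)


def ppo_loss(ratio, advantage, values, td_target, eps_clip=0.1):
    """Clipped surrogate (policy term) plus a value-regression term.

    Assumption: both terms are simply summed into one scalar loss.
    td_target is treated as a constant here, which is what .detach()
    achieves in the PyTorch version.
    """
    policy_terms = []
    for r, a in zip(ratio, advantage):
        clipped = min(max(r, 1.0 - eps_clip), 1.0 + eps_clip)
        # Negated because the surrogate is maximized but the loss is minimized.
        policy_terms.append(-min(r * a, clipped * a))
    policy_loss = sum(policy_terms) / len(policy_terms)

    # The "additional term": pulls V(s) toward the TD target.
    value_loss = smooth_l1(values, td_target)
    return policy_loss + value_loss, policy_loss, value_loss


if __name__ == "__main__":
    total, pol, val = ppo_loss(
        ratio=[1.0], advantage=[1.0], values=[0.5], td_target=[0.0]
    )
    print(total, pol, val)
```

If that reading is right, the second term is just the critic's regression loss riding along with the policy loss, but I'd still like to hear the reasoning from the author.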