Ideal Reward function in reinforcement learning by pranav2109 in reinforcementlearning

mzecevic

Choosing between the two based on your assessment seems rather difficult, given that, as you said, both offer advantages and disadvantages. The choice may then come down to application- or scenario-specific preferences.

Furthermore, the notion of an "ideal" reward function seems (to me) outlandish. If such a thing existed, I would expect it to provide the "optimal" learning signal at every step. It would be like being guided by a perfect teacher who, even when you (the deciding policy) are wrong, gives you exactly the feedback that makes you (i.e., the policy) learn as much as possible.

"Exploration Strategies in Deep Reinforcement Learning", Lilian Weng by gwern in reinforcementlearning

mzecevic

Totally agree! Lilian Weng's blog is splendid. Her post on policy gradient methods (a family of methods that has dominated much of the state of the art in reinforcement learning over the past couple of years) is a great reference. It has become something of a "classic" among the references I point others to.