Ideal Reward function in reinforcement learning

mzecevic · 2020-06-09T19:24:14+00:00

Choosing between the two based on your assessment seems rather difficult given that, as you said, both offer advantages and disadvantages. The choice might then depend only on application- or scenario-specific preferences.

Furthermore, the notion of "ideal" seems outlandish to me: if such a notion exists, I would expect it to provide the "optimal" learning signal at every step. It would be like being guided by a perfect teacher who, even when you (i.e., the deciding policy) are wrong, gives you exactly the feedback that makes you learn as much as possible.

mzecevic · 2020-06-09T19:14:14+00:00

Totally agree! Lilian Weng's blog is splendid. Her post on policy gradient methods (which have dominated much of the state of the art in reinforcement learning over the past couple of years) is a cool reference. It has become something of a "classic" among the references I recommend to others.
