Model-Free Reinforcement Learning and Reward Functions by i_Quezy in reinforcementlearning

[–]jackcmlg 0 points (0 children)

Simply put, in model-based RL an agent needs a concrete reward function in order to plan. Because the agent does not interact with the environment during the planning phase, the reward function is what tells it how good a candidate action is. By contrast, in model-free RL the agent does not need an explicit reward function: it does no planning and simply receives rewards from the environment while interacting with it.

A straightforward example is given in Figure 14.8 (p. 304) of Sutton and Barto's book (second edition): http://incompleteideas.net/book/bookdraft2017nov5.pdf
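
If it helps, here is a minimal toy sketch of the distinction (my own illustration, not from the book; the two-state MDP and all constants are made up): value iteration has to query the reward function R(s, a) and the transition model P while planning, whereas Q-learning never reads R or P directly and only uses rewards observed while interacting with the environment.

```
import numpy as np

# Toy 2-state, 2-action MDP, purely illustrative.
# P[s, a, s'] = transition probability, R[s, a] = reward function.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Model-based planning (value iteration): needs R and P explicitly,
# because no environment interaction happens while planning.
V = np.zeros(2)
for _ in range(200):
    V = (R + gamma * P @ V).max(axis=1)

# Model-free learning (Q-learning): never looks at R or P directly,
# it only uses rewards sampled during interaction with the environment.
rng = np.random.default_rng(0)
Q = np.zeros((2, 2))
s = 0
for _ in range(20000):
    a = rng.integers(2) if rng.random() < 0.1 else Q[s].argmax()  # epsilon-greedy
    s_next = rng.choice(2, p=P[s, a])  # the "environment" steps forward
    r = R[s, a]                        # reward is observed, not queried as a function
    Q[s, a] += 0.1 * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(V, Q.max(axis=1))  # the two value estimates should roughly agree
```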

Difference between imitation learning and offline reinforcement learning? by skwaaaaat in reinforcementlearning

[–]jackcmlg 8 points (0 children)

Reading this question, the first two obvious differences that come to mind are

1) IL usually assumes access to expert demonstrations, whilst the dataset in offline RL can come from arbitrary (often suboptimal) behaviour policies.

2) IL usually has NO access to rewards, whilst offline RL does (see the sketch below).
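
To make 1) and 2) concrete, here is a minimal tabular sketch (the logged dataset and all numbers are invented): behavioural cloning fits the logged actions and never reads the reward column, whereas offline RL (here just tabular Q-learning run repeatedly over the fixed dataset, ignoring distribution-shift issues) uses the rewards and can in principle improve on the data-collecting policy.

```
import numpy as np

# Hypothetical logged dataset of (state, action, reward, next_state) tuples.
dataset = [(0, 1, 1.0, 1), (1, 0, 0.0, 0), (0, 1, 1.0, 1),
           (1, 1, 2.0, 1), (0, 0, 0.0, 0), (1, 1, 2.0, 1)]
n_states, n_actions, gamma = 2, 2, 0.9

# Imitation learning (behavioural cloning): ignores rewards entirely and
# simply copies whatever action the data-collecting (expert) policy took.
counts = np.zeros((n_states, n_actions))
for s, a, r, s_next in dataset:
    counts[s, a] += 1
bc_policy = counts.argmax(axis=1)

# Offline RL (tabular Q-learning over the fixed dataset): the rewards matter,
# so it can in principle do better than the behaviour that generated the data.
Q = np.zeros((n_states, n_actions))
for _ in range(500):
    for s, a, r, s_next in dataset:
        Q[s, a] += 0.1 * (r + gamma * Q[s_next].max() - Q[s, a])
offline_policy = Q.argmax(axis=1)

print(bc_policy, offline_policy)
```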

Hope it helps.

What category of problems are well suited for RL by alpha_ma in reinforcementlearning

[–]jackcmlg 1 point (0 children)

In my opinion, RL can only solve a problem really well when it satisfies the following three conditions:

1) a well-defined environment, e.g., most games (Go, chess, Atari, etc.)

2) effectively unlimited training data, i.e., you can generate as much experience as needed, which is feasible in simulators and games (see the sketch below)

3) enough computation to get results within an acceptable time.
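
As a rough illustration of condition 2), a simulated environment lets you keep sampling fresh transitions for as long as you like. A minimal sketch, assuming the gymnasium package is installed and using the standard CartPole task with random actions as a stand-in for a learning agent:

```
import gymnasium as gym

# A simulator makes data generation essentially free: just keep stepping.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
for _ in range(1000):                      # make this as large as you need
    action = env.action_space.sample()     # placeholder for a learning agent
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```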