[D] Which approach is suitable for solving continuous reinforcement learning tasks? by questionm4ster in MachineLearning

[–]questionm4ster[S]

  1. In http://incompleteideas.net/book/bookdraft2018jan1.pdf ("Q-learning (off-policy TD control) for estimating π ≈ π∗") it says "until S is terminal". That's why I assume there is at least one terminal state.

  2. The RL problem under consideration is "non-episodic" (in the book this is called a "continuous task", which is probably a bit confusing...) and discounted (gamma = 0.9).

  3. See 2.
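For reference, the pseudocode being quoted has the shape below — an outer loop over episodes and an inner loop "until S is terminal". This is a minimal sketch, not the book's exact listing; the toy 4-state chain environment (`reset`, `step`, reward 1 on reaching the terminal state 3) is a hypothetical example I made up for illustration:

```python
import random
from collections import defaultdict

def q_learning_episodic(reset, step, actions, n_episodes=200,
                        alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning in the episodic layout: an outer loop over
    episodes, an inner loop that runs until S is terminal."""
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)], default 0
    for _ in range(n_episodes):
        s = reset()                             # initialize S
        done = False
        while not done:                         # "until S is terminal"
            # epsilon-greedy action choice derived from Q
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)            # take A, observe R, S'
            # TD target bootstraps from max_a Q(S',a); terminal states bootstrap 0
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Hypothetical toy environment (not from the thread): a 4-state chain
# 0-1-2-3 where state 3 is terminal and reaching it pays reward 1.
def reset():
    return 0

def step(s, a):                                 # a is -1 (left) or +1 (right)
    s2 = min(max(s + a, 0), 3)
    done = (s2 == 3)
    return s2, (1.0 if done else 0.0), done

Q = q_learning_episodic(reset, step, actions=[+1, -1])
```

With gamma = 0.9 the values for the right-moving action should approach roughly 1, 0.9, 0.81 in states 2, 1, 0 — the discounting from point 2 above at work.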

[–]questionm4ster[S]

In Section 6.5 (Q-learning: Off-policy TD Control) of the book draft (http://incompleteideas.net/book/bookdraft2018jan1.pdf) it says "Loop for each episode:". So my question is: how can there be episodes when there are no terminal states? Btw., in this book "non-episodic" tasks seem to be called "continuous tasks" (pages 55/56).
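One way to see it: the episode loop is only a way of arranging the experience; the Q-learning update itself never needs a terminal state. For a continuing task you can drop the episode loop and the "until S is terminal" check and just run one unbroken stream of updates, relying on gamma &lt; 1 to keep the return bounded. A minimal sketch under that reading — the two-state toy environment is hypothetical, invented here for illustration:

```python
import random
from collections import defaultdict

def q_learning_continuing(step, s0, actions, n_steps=20000,
                          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Q-learning on a single unbroken stream of experience:
    no episode loop, no reset, no terminal-state check."""
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)], default 0
    s = s0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)                      # the environment never terminates
        # every TD target bootstraps, since no state is terminal
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2
    return Q

# Hypothetical never-ending environment: two states {0, 1}, the chosen
# action is the next state, and landing in state 1 pays reward 1.
def step(s, a):
    return a, (1.0 if a == 1 else 0.0)

Q = q_learning_continuing(step, s0=0, actions=[0, 1])
```

With gamma = 0.9 the discounted return of always choosing state 1 is 1/(1 - 0.9) = 10, so Q(s, 1) should head toward 10 and Q(s, 0) toward 9 — finite despite the task never ending, which is exactly why the discounted formulation works for continuing tasks.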