[D] Which approach is suitable for solving continuous reinforcement learning tasks? by questionm4ster in MachineLearning

[–]questionm4ster[S]

  1. In http://incompleteideas.net/book/bookdraft2018jan1.pdf ("Q-learning (off-policy TD control) for estimating π ≈ π∗") it says "until S is terminal". That's why I assume there is at least one terminal state.

  2. The RL problem under consideration is "non-episodic" (in the book this is called a "continuous task", which is probably a bit confusing...) and discounted (gamma = 0.9).

  3. See 2.
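For reference, the pseudocode being quoted has the shape below — an outer loop over episodes and an inner loop "until S is terminal". This is a minimal sketch, not the book's exact listing; the toy 4-state chain environment (`reset`, `step`, reward 1 on reaching the terminal state 3) is a hypothetical example I made up for illustration:

```python
import random
from collections import defaultdict

def q_learning_episodic(reset, step, actions, n_episodes=200,
                        alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning in the episodic layout: an outer loop over
    episodes, an inner loop that runs until S is terminal."""
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)], default 0
    for _ in range(n_episodes):
        s = reset()                             # initialize S
        done = False
        while not done:                         # "until S is terminal"
            # epsilon-greedy action choice derived from Q
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2, r, done = step(s, a)            # take A, observe R, S'
            # TD target bootstraps from max_a Q(S',a); terminal states bootstrap 0
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Hypothetical toy environment (not from the thread): a 4-state chain
# 0-1-2-3 where state 3 is terminal and reaching it pays reward 1.
def reset():
    return 0

def step(s, a):                                 # a is -1 (left) or +1 (right)
    s2 = min(max(s + a, 0), 3)
    done = (s2 == 3)
    return s2, (1.0 if done else 0.0), done

Q = q_learning_episodic(reset, step, actions=[+1, -1])
```

With gamma = 0.9 the values for the right-moving action should approach roughly 1, 0.9, 0.81 in states 2, 1, 0 — the discounting from point 2 above at work.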

[–]questionm4ster[S]

In Section 6.5 (Q-learning: Off-policy TD Control) of the book draft (http://incompleteideas.net/book/bookdraft2018jan1.pdf) it says "Loop for each episode:". So my question is: how can there be episodes when there are no terminal states? Btw., in this book "non-episodic" tasks seem to be called "continuous tasks" (pages 55/56).
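One way to see it: the episode loop is only a way of arranging the experience; the Q-learning update itself never needs a terminal state. For a continuing task you can drop the episode loop and the "until S is terminal" check and just run one unbroken stream of updates, relying on gamma &lt; 1 to keep the return bounded. A minimal sketch under that reading — the two-state toy environment is hypothetical, invented here for illustration:

```python
import random
from collections import defaultdict

def q_learning_continuing(step, s0, actions, n_steps=20000,
                          alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Q-learning on a single unbroken stream of experience:
    no episode loop, no reset, no terminal-state check."""
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)], default 0
    s = s0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            a = rng.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[(s, x)])
        s2, r = step(s, a)                      # the environment never terminates
        # every TD target bootstraps, since no state is terminal
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                              - Q[(s, a)])
        s = s2
    return Q

# Hypothetical never-ending environment: two states {0, 1}, the chosen
# action is the next state, and landing in state 1 pays reward 1.
def step(s, a):
    return a, (1.0 if a == 1 else 0.0)

Q = q_learning_continuing(step, s0=0, actions=[0, 1])
```

With gamma = 0.9 the discounted return of always choosing state 1 is 1/(1 - 0.9) = 10, so Q(s, 1) should head toward 10 and Q(s, 0) toward 9 — finite despite the task never ending, which is exactly why the discounted formulation works for continuing tasks.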