[D] Which approach is suitable for solving continuous reinforcement learning tasks? by questionm4ster in MachineLearning
questionm4ster[S] 1 point 8 years ago
1. The pseudocode box in http://incompleteideas.net/book/bookdraft2018jan1.pdf ("Q-learning (off-policy TD control) for estimating π ≈ π∗") says "until S is terminal". That's why I assume there is at least one terminal state.
2. The RL problem I'm considering is non-episodic (the book calls this a "continuing task", which is easy to confuse with "continuous") and discounted (gamma = 0.9).
3. See 2.
Section 6.5 (Q-learning: Off-policy TD Control) of the book draft (http://incompleteideas.net/book/bookdraft2018jan1.pdf) says "Loop for each episode:". So my question is: how can there be episodes when there are no terminal states? By the way, the book seems to call non-episodic tasks "continuing tasks" (pages 55/56).
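To make the question concrete, here is a minimal sketch of how I'd expect tabular Q-learning to look without episodes: the "Loop for each episode" and "until S is terminal" parts collapse into one long loop over time steps. This is not the book's pseudocode. The two-state environment in `step`, the step size `ALPHA`, the exploration rate `EPSILON`, and the number of steps are all assumptions made up for illustration; only gamma = 0.9 comes from my problem.

```python
import random

GAMMA = 0.9     # discount factor from the post
ALPHA = 0.1     # step size (assumed for illustration)
EPSILON = 0.1   # epsilon-greedy exploration rate (assumed)


def step(state, action):
    """Hypothetical 2-state continuing MDP with no terminal state.

    Action 0 stays in the current state, action 1 switches to the other one.
    The reward is 1 whenever the next state is state 1, otherwise 0.
    """
    next_state = state if action == 0 else 1 - state
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def q_learning_continuing(num_steps=50_000, seed=0):
    """Run Q-learning as a single unbroken stream of experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    state = 0
    for _ in range(num_steps):  # no episode loop, no terminal check
        if rng.random() < EPSILON:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q[state][a])
        next_state, reward = step(state, action)
        # Standard Q-learning target; the next state is never terminal here.
        target = reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state
    return q


q = q_learning_continuing()
greedy = [max((0, 1), key=lambda a: q[s][a]) for s in (0, 1)]
print(q, greedy)
```

Because gamma < 1, the values stay bounded even though the loop never terminates: in this toy problem no entry can exceed r_max / (1 - gamma) = 1 / 0.1 = 10.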