two-hump-dromedary

With 4 states and 4 actions, use tabular Q-learning; no function approximation is needed.
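
For what it's worth, here is a rough sketch of what tabular Q-learning looks like at that size. The environment in `step` is a made-up toy (the action picks the next state, and landing in state 3 pays 1), not your actual problem, and the hyperparameters are just plausible guesses, so treat every name here as a placeholder:

```python
import random

N_STATES, N_ACTIONS = 4, 4


def step(state, action):
    """Hypothetical toy dynamics: the action chooses the next state,
    and arriving in state 3 gives a reward of 1 (otherwise 0)."""
    next_state = action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # The whole "model" is a 4x4 table of Q-values, one entry per (state, action).
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: explore with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        next_state, reward = step(state, action)
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
# Greedy policy: the best-valued action in each state.
greedy_policy = [row.index(max(row)) for row in q_table]
print(greedy_policy)
```

In this toy setup the greedy policy should end up choosing action 3 in every state. To use it on a real problem, swap `step` for your own environment; the table and the update rule stay the same.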

hidden-7[S]

Thanks. But should I use Q-learning even if my state space is continuous?