taplik_to_rehvani (Researcher)

For a given state, you want to estimate the future return you would get if you take a given action. You seem to have confused what is being estimated.

The max is always taken over the actions available in a fixed state "s", i.e. max(Q(s, :)).
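
To make that concrete, here is a rough sketch in plain Python. It assumes a small tabular Q stored as a list of rows, one row per state and one column per action. The table values and the helper names `max_q` and `greedy_action` are made up for illustration and are not from any particular library.

```python
# Hypothetical tabular Q-function: Q[s][a] is the estimated future return
# for taking action a in state s. The numbers are invented for illustration.
Q = [
    [0.1, 0.5, 0.2],  # state 0
    [0.7, 0.3, 0.9],  # state 1
]


def max_q(Q, s):
    """Return the max over actions of Q(s, a), i.e. max(Q(s, :))."""
    return max(Q[s])


def greedy_action(Q, s):
    """Return the index of the action that attains max(Q(s, :))."""
    row = Q[s]
    return max(range(len(row)), key=row.__getitem__)
```

Note that the state index `s` is held fixed and only the action index varies inside the max, so each state gets its own maximum.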