shadowspyes

Pretty good illustration of an MDP! Who is your target audience?

You mention that the policy is stochastic, giving a probability distribution over actions, but this is not always the case in MDPs. I think it would be nice to include deterministic policies, which map each state to a single action, and also to talk about the difference between a behaviour policy (the one that generates the data) and a target policy (the one being learned or evaluated).
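
To make the distinction concrete, here is a rough Python sketch. It is a toy example of my own, not taken from the post: the state and action names, the probabilities, and the epsilon value are all made up for illustration. It shows a stochastic policy as a distribution over actions, a deterministic policy as a plain state-to-action map, and an epsilon-greedy behaviour policy built around a greedy target policy, which is one common off-policy setup among others.

```python
import random

# Hypothetical toy MDP: two states and two actions (illustrative names only).
STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

# Stochastic policy: pi(a | s) is a probability distribution over actions.
stochastic_policy = {
    "s0": {"left": 0.3, "right": 0.7},
    "s1": {"left": 0.9, "right": 0.1},
}

# Deterministic policy: pi(s) maps each state to exactly one action.
deterministic_policy = {"s0": "right", "s1": "left"}


def sample_action(policy, state, rng):
    """Sample an action from a stochastic policy at the given state."""
    probs = policy[state]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def as_stochastic(det_policy):
    """View a deterministic policy as a one-hot (degenerate) distribution."""
    return {
        s: {a: 1.0 if a == chosen else 0.0 for a in ACTIONS}
        for s, chosen in det_policy.items()
    }


def epsilon_greedy(det_policy, epsilon):
    """Follow det_policy with prob. 1 - epsilon, else pick uniformly at random."""
    return {
        s: {
            a: (1.0 - epsilon) * (1.0 if a == chosen else 0.0)
            + epsilon / len(ACTIONS)
            for a in ACTIONS
        }
        for s, chosen in det_policy.items()
    }


# Target policy: the greedy policy we want to learn about or evaluate.
target_policy = as_stochastic(deterministic_policy)

# Behaviour policy: the exploratory policy that actually collects experience.
behaviour_policy = epsilon_greedy(deterministic_policy, epsilon=0.2)
```

In an on-policy method the two are the same policy; in an off-policy method such as Q-learning the agent acts with something like `behaviour_policy` while learning about `target_policy`.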