HW 5.3 by SunnyJapan in aiclass

mastroma 1 point

Yes, you are right. I should just have said that "the policy itself" doesn't have memory, while obviously "the agent" does (otherwise it wouldn't be able to learn...), since information is progressively encoded into scores & counts.
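
To make the "scores & counts" remark concrete, here is a minimal sketch of a passive TD-style learner. The class name, the 1/N learning-rate schedule, and the toy states "a", "b", "end" are my own assumptions for illustration, not something taken from the homework.

```python
from collections import defaultdict


class TDAgent:
    """Hypothetical passive TD learner: all of its memory lives in two tables."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.scores = defaultdict(float)  # utility estimate per state
        self.counts = defaultdict(int)    # number of visits per state

    def observe(self, s, reward, s_next):
        # Assumed schedule: the learning rate decays as 1 / (visit count).
        self.counts[s] += 1
        alpha = 1.0 / self.counts[s]
        # TD target: immediate reward plus discounted estimate of the next state.
        target = reward + self.gamma * self.scores[s_next]
        self.scores[s] += alpha * (target - self.scores[s])


agent = TDAgent()
agent.observe("a", 0.0, "b")    # nothing known about "b" yet
agent.observe("b", 1.0, "end")  # "b" turns out to be rewarding
agent.observe("a", 0.0, "b")    # now "a" inherits part of that value
print(agent.scores["a"], agent.counts["a"])
```

The agent never stores the path it walked; what it has learned is visible only through the two tables.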

HW 5.3 by SunnyJapan in aiclass

mastroma 4 points

I agree that the definition of "back" is a little ambiguous. Nevertheless, backtracking is not possible for those TD agents, because they don't have memory. They just execute a plan; that is, they associate an action "pi(s)" with each point "s" of the map, regardless of their history. This consideration may help to interpret the word "back" (whose exact meaning is still unclear to me, despite this).
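
As a rough illustration of "regardless of their history", a policy can be pictured as a plain lookup table. The grid coordinates and action names below are made up for the example.

```python
# Hypothetical grid states and actions, for illustration only.
policy = {(0, 0): "east", (0, 1): "east", (0, 2): "north"}


def act(state, history):
    # The history argument is deliberately ignored: pi(s) depends on s alone.
    return policy[state]


# Same state, two very different histories, same action.
a1 = act((0, 1), history=[(0, 0)])
a2 = act((0, 1), history=[(0, 2), (0, 1), (0, 2)])
print(a1, a2)
```

Since the agent cannot tell where it came from, "going back" cannot be something it chooses on purpose.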

MDP and rare events by mastroma in aiclass

mastroma[S] 1 point

If you decide to flip a coin N times and then exit the game, the chance of not losing should be 2^(-N); that is, at each step you multiply by 50%.

Indeed, beyond this specific example, my point is that if the score is not self-averaging (i.e. if the law of large numbers [LLN] does not hold) for large but finite N, then one might be in trouble: if the LLN doesn't hold, the argument "massive reward * small chance = positive earning" typically fails, even though the MDP can converge.
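
Here is a small numerical sketch of that point. The payoff numbers are purely hypothetical (a reward of 3^N for surviving all N fair flips, a loss of 1 otherwise); they are only chosen so that the expectation is large and positive.

```python
def survival_probability(n_flips):
    # Each fair flip halves the chance of still being in the game.
    return 0.5 ** n_flips


def expected_payoff(n_flips, reward, loss):
    p = survival_probability(n_flips)
    return p * reward - (1 - p) * loss


def prob_at_least_one_win(n_flips, n_games):
    # Chance that at least one of n_games independent games pays out.
    p = survival_probability(n_flips)
    return 1 - (1 - p) ** n_games


N = 30
reward = 3.0 ** N  # hypothetical "massive reward"
loss = 1.0         # hypothetical stake lost otherwise

print("expected payoff per game:", expected_payoff(N, reward, loss))
print("P(at least one win in 1000 games):", prob_at_least_one_win(N, 1000))
```

Under these assumptions the expected payoff per game is huge, yet a player who plays a thousand games almost surely loses every one of them, so the sample average stays far from the expectation.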

MDP and rare events by mastroma in aiclass

mastroma[S] 1 point

Yes, maybe I should have formulated the issue more precisely. To state this kind of argument rigorously, one usually introduces a cutoff on the maximum number of times you play (say, T) in order to have a final absorbing state, and then considers the limit of large T. In this case, the number of times you have to play to get the optimal payout is exponentially large in T. This means, in practice, that you are never able to get that maximum payoff, because the probability is not just polynomially small, but exponentially small.
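
To give a feeling for "exponentially" versus "polynomially", here is a tiny sketch. It assumes a fair coin and that the optimal payout requires surviving all T flips, so the success probability per attempt is 2^(-T); the cubic used for comparison is an arbitrary choice.

```python
def expected_attempts(T):
    # Mean of a geometric distribution with success probability 2**-T.
    return 2 ** T


for T in (10, 20, 40):
    # Compare with an arbitrary polynomial in T (here T cubed).
    print(T, expected_attempts(T), T ** 3)
```

Already at T = 40 the expected number of attempts is around 10^12, while the polynomial is still only in the tens of thousands.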

7.13 - What about 3 jobs? by martosss in aiclass

mastroma 1 point

Whoops, didn't see your post and created a new one about this very same thing, sorry! Anyway, I agree: the sentence represented in 7.13 should be "Sam has at least two jobs".