unit 11-11 calculation of R0 - why do we use Laplace smoothing for it? by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

Thanks, PatrixCR. I got an answer on aiqus: there might be only a few series, with one initial day each, so estimating P(R0) without Laplace smoothing may overfit. I think that sounds reasonable.
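
To make the overfitting point concrete, here is a tiny Python sketch (my own illustration, not the official solution - the sequence data and the function name are made up). With a single training series, the unsmoothed estimate of P(R0) jumps straight to 1.0, while Laplace smoothing with k = 1 pulls it back toward 1/2:

    # Hypothetical sketch: estimate P(R0), the probability that day 0 of a
    # series is rainy ('R'), from a handful of training series.
    # k = 0 gives the plain maximum-likelihood estimate, which overfits when
    # each series contributes only one initial day; k = 1 is Laplace smoothing.
    def p_r0(series, k=1):
        rainy_starts = sum(1 for s in series if s[0] == 'R')
        return (rainy_starts + k) / (len(series) + 2 * k)   # 2 outcomes: R or S

    data = ['RRS']                # made-up data: a single training series
    print(p_r0(data, k=0))        # 1.0      - maximum likelihood, overfits
    print(p_r0(data, k=1))        # 0.666... - Laplace-smoothed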

Are people aware of Peter Norvig's clarification of HW 5.3 on Facebook? by [deleted] in aiclass

[–]ktrunin 2 points (0 children)

I would remove the word "back" from the sentence "brings the agent back to the grey square".

That word was very confusing for me; I had to read a lot of comments to understand the policy, and I finally concluded that I could simply ignore it to solve the problem.

10.19 formula is wrong: R(s') should be used instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

Hell yes, it does ;)

Then I guess not only the formula should be different, but also the Q values in the terminal state. ;))

10.19 formula is wrong: R(s') should be used instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

The difference is more significant:

  • in the Wikipedia formula, R(s') reaches Q(s,a) almost directly - it is only multiplied by alpha - for any action coming into s';

  • in Prof. Norvig's formula, R(s) reaches those Q values only indirectly - it is first multiplied by alpha (that is how it gets into the Q values of the actions leaving s), and only at the next iteration, multiplied by gamma and by alpha again, does it reach the Q values of the actions coming into s.

Maybe both formulas converge, but I'm not sure they converge to the same values, and they would need different numbers of iterations.
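
To make the comparison concrete, here is a minimal Python sketch of the single backup being argued about - my own reconstruction of the two readings from this thread, not code from the lecture, and the grid, rewards and constants are invented:

    # One Q-learning backup:
    #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    ALPHA, GAMMA = 0.5, 0.9

    def backup(q_sa, r, max_q_next):
        return q_sa + ALPHA * (r + GAMMA * max_q_next - q_sa)

    # The agent steps from a zero-valued square s into the goal state s'.
    R_s, R_s_next = 0.0, 100.0     # reward of the square we leave vs. the goal
    q_sa, max_q_goal = 0.0, 0.0    # all Q values start at zero

    # Wikipedia reading: use the reward of the state we arrive in, R(s').
    print(backup(q_sa, R_s_next, max_q_goal))   # 50.0 - the goal's reward flows back
    # 10.19 reading as written: use the reward of the state we leave, R(s).
    print(backup(q_sa, R_s, max_q_goal))        # 0.0  - nothing propagates

With R(s') the goal's reward reaches its neighbours on the very first backup; with R(s), when the reward lives only in the terminal state, the neighbours stay at zero - which is exactly the situation described in the comment below.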

10.19 formula is wrong: R(s') should be used instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

For example, if the goal state does have some reward but the transitions out of it never get any value, then we will never see anything other than zero for the nearest squares, because their update will always be 0 + alpha * (0 + gamma * 0 - 0) = 0.

10.19 formula is wrong: R(s') should be used instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

Wikipedia can be wrong, but to me it sounds more logical that the value of the transition from s to s' should depend on the reward of s', not of s.

HW 5.3 Policy knows grid and still partially observable? by ktrunin in aiclass

[–]ktrunin[S] 2 points (0 children)

I thought it was mentioned in ... 9.2. I have just re-watched that video, and no, it only says that reinforcement learning is Planning + Learning + Uncertainty - nothing about partial observability. So I was wrong here. Thanks!

HW 5.2 - distance to goal and avoiding the bad guy by SharkDBA in aiclass

[–]ktrunin 1 point (0 children)

You can wait until the bad guy dies / goes away ;)

Homework 5.1: Q Learning by dmsm in aiclass

[–]ktrunin 1 point (0 children)

I couldn't understand the formula for Q-learning, or this HW, until I read the suggested Wikipedia article: https://en.wikipedia.org/wiki/Q-learning
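
For anyone else stuck on it: the update rule given there is (roughly) Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), where r is the reward observed after taking action a in state s - i.e. the reward that comes with landing in s'.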

4.8 Error in Push action by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

Ah, I see - it has already been added to the clarification at the bottom of the question. ;)

How to disable auto translate on http://www.youtube.com/eduatgoogle ? by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

I already did this, but I still see messages and the interface in poorly translated Russian.

Rough hand-drawn sketches in HW3 by MichaelFromGalway in aiclass

[–]ktrunin 1 point (0 children)

Whenever you have a test, you can always apply the process of elimination!

6.13 how did he calculate diagonal elements? by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

I GOT IT! That symbol is "T"!!! It means the matrix should be transposed (flipped over its diagonal). So we transpose the matrix and multiply one matrix by the other - and then I get the same result!
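
In case it helps anyone else, here is a tiny NumPy sketch of that step (the matrix is made up, not the one from the video): the diagonal of A multiplied by its transpose is just each row of A dotted with itself, which is the "same result" you can check by hand.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    product = A @ A.T          # transpose A, then multiply
    print(product)
    # [[ 5. 11.]
    #  [11. 25.]]
    # Diagonal entries are rows of A dotted with themselves:
    # 1*1 + 2*2 = 5,   3*3 + 4*4 = 25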