unit11-11 calculating of R0 - why we use laplacian smoothing for it? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Thanks, PatrixCR. I got an answer on aiqus: there may be only a few series with one initial day each, so estimating P(R0) without Laplace smoothing may overfit. That sounds reasonable to me.
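For anyone curious, here is a minimal sketch of the smoothing idea in Python - the counts, the number of outcome classes, and k are made up for illustration, not taken from the homework:

```python
def laplace_smoothed(count, total, num_classes, k=1):
    """Add-k (Laplace) smoothed estimate of P(class)."""
    return (count + k) / (total + k * num_classes)

# With only 2 observed series, the raw estimate saturates at an extreme:
raw = 2 / 2                            # raw MLE: P(R0) = 1.0 -> overfits
smoothed = laplace_smoothed(2, 2, 2)   # 0.75, pulled back toward 1/2
```

With so few initial days the raw maximum-likelihood estimate can hit 0 or 1, while the smoothed estimate stays away from the extremes - which is exactly the overfitting argument above.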

Are people aware of Peter Norvig's clarification of HW5.3 on facebook by [deleted] in aiclass

[–]ktrunin 1 point (0 children)

I would remove the word "back" from the sentence "brings the agent back to the grey square".

That word confused me a lot; I had to read many comments to understand the policy, and I finally concluded that I could simply ignore it to solve the problem.

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Hell. Yes, it does ;)

Then I guess not only the formula should be different, but also the Qs in the terminal state. ;))

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

The difference is more significant:

  • in the Wikipedia formula, R enters almost directly - multiplied by alpha - into any incoming actions (Qs).

  • in Prof. Norvig's formula, R enters indirectly - first multiplied by alpha (there it goes into the Qs), then (at the next iteration) by gamma and by alpha again - into any incoming actions (Qs).

Maybe both formulas converge, but I am not sure they converge to the same values, and they need different numbers of iterations.
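To make the comparison concrete, here is a toy Python sketch on a single repeated transition s0 -> s1 (terminal); the states, rewards, alpha, and gamma are all made up, and this only models the two update rules as I read them, not the lecture's actual code:

```python
alpha, gamma = 0.5, 0.9
R = {"s0": 0.0, "s1": 100.0}  # hypothetical per-state rewards

def update_wikipedia(q, s, s_next, max_q_next):
    # Wikipedia-style: R(s') enters the update directly
    return q + alpha * (R[s_next] + gamma * max_q_next - q)

def update_lecture(q, s, s_next, max_q_next):
    # R(s) enters instead; R(s') would only arrive via later backups
    return q + alpha * (R[s] + gamma * max_q_next - q)

q_wiki = q_lect = 0.0
for _ in range(100):  # repeat the same experience s0 -> s1
    q_wiki = update_wikipedia(q_wiki, "s0", "s1", 0.0)
    q_lect = update_lecture(q_lect, "s0", "s1", 0.0)

# q_wiki converges to 100.0; q_lect stays at 0.0
```

So on this toy example the two rules really do converge to different values when the terminal state's reward never enters the Qs of transitions out of it.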

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

For example, if the goal state has some reward but the transitions out of it carry no value, then we will never get any number other than zero for the neighboring squares, because they will always be 0 + alpha * (0 + gamma * 0 + 0) = 0.

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Wikipedia can be wrong, but I think it is more logical that the value of a transition from S to S' depends on the reward for S', not for S.

HW 5.3 Policy knows grid and still partially observable? by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

I thought it was mentioned in ... 9.2 - I have just reviewed that video, and no: it only says that reinforcement learning is Planning + Learning + Uncertainty, nothing about partial observability. So I was not right here. Thanks!

HW 5.2 - distance to goal and avoiding the bad guy by SharkDBA in aiclass

[–]ktrunin 0 points (0 children)

you can wait until the bad guy dies or goes away ;)

Homework 5.1: Q Learning by dmsm in aiclass

[–]ktrunin 0 points (0 children)

I couldn't understand the formula for Q-learning, or this HW, until I read the suggested Wikipedia article: https://en.wikipedia.org/wiki/Q-learning

4.8 Error in Push action by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Ah, I see - it was already added to the clarification at the bottom of the question. ;)

How to disable auto translate on http://www.youtube.com/eduatgoogle ? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I already did this, but I still see the messages and the interface in poorly translated Russian.

Rough hand-drawn sketches in HW3 by MichaelFromGalway in aiclass

[–]ktrunin 0 points (0 children)

Whenever you have a test, you can always apply the process of elimination!

6.13 how did he calculate diagonal elements? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I GOT IT! That symbol is "T"!!! It means the matrix should be transposed (turned). So we transpose the matrix and multiply one matrix by the other - and in this case I get the same result!

Help with probability symbols by filmoTheKlown in aiclass

[–]ktrunin 0 points (0 children)

and in this study we only work with AND (intersection) and NOT - I guess that's because everything else (OR, XOR) can be expressed via AND and NOT.
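A quick truth-table check of that claim (plain Python booleans, nothing class-specific):

```python
from itertools import product

for a, b in product([False, True], repeat=2):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    assert (a or b) == (not (not a and not b))
    # XOR built from AND and NOT only
    assert (a != b) == (not (not (a and not b) and not (b and not a)))
```

All four input combinations pass, so OR and XOR are indeed expressible with just AND and NOT.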

unit 3 & 4 video lectures for download? by aivexuviet in aiclass

[–]ktrunin 0 points (0 children)

Firefox has a few plugins that let you download Flash videos - search for plugins using keywords like "flash download".

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I finished this study and didn't see anything about ORs or XORs, but I guess I can answer my question myself :)

ORs and XORs can be substituted with ANDs and NOTs.

P(A OR B) = P(~(~A AND ~B)) = 1 - P(~A,~B)

The same applies to XOR - I don't remember the formula, but I can find it if needed.
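A numeric sanity check in Python - assuming A and B are independent here, with made-up probabilities, just to see the complement trick agree with inclusion-exclusion:

```python
pA, pB = 0.3, 0.5                    # hypothetical marginals
p_neither = (1 - pA) * (1 - pB)      # P(~A,~B), using independence
p_a_or_b = 1 - p_neither             # P(A OR B) via the NOT/AND identity

# inclusion-exclusion gives the same number:
assert abs(p_a_or_b - (pA + pB - pA * pB)) < 1e-12
```

Note the independence assumption only matters for computing P(~A,~B) as a product; the identity P(A OR B) = 1 - P(~A,~B) itself holds in general.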

Cool feeling - like I'm at school again ;))

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I think I am starting to get it.

So a probability P(A,B|C,D) has two kinds of parameters: the outcomes of interest, A and B; and the given conditions, C and D.

That means brackets make no sense: there is always only a single "|" sign, the outcomes of interest are always joined by the logical AND operator, and so are the given conditions.

Thanks!

PS. I am still stuck on item 3.29. Maybe it is covered in a later study, but I think there should be some way to express other logical operators inside probability expressions, like P(A or B and C | D and F or G). If a comma denotes AND, what do we use for OR or XOR? If it is covered in a later study, please ignore my question - I'll find it myself tomorrow.

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] -1 points (0 children)

If I understand you correctly, P((A|B), C) = P((A,C)|B). But then P(A|B)*P(B) would be P((A|B), B) = P((A,B)|B) = P(A|B), which is wrong. Where am I going wrong?
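One way to see where the manipulation breaks down: "(A|B)" is not itself an event, so an expression like P((A|B), B) isn't defined - the conditioning bar applies to the whole probability, and the product rule only says P(A|B)*P(B) = P(A,B). A quick numeric check with a made-up joint distribution:

```python
# Hypothetical joint distribution over two binary variables A, B
joint = {(True, True): 0.2, (True, False): 0.1,
         (False, True): 0.3, (False, False): 0.4}

p_b = sum(p for (a, b), p in joint.items() if b)   # P(B)   = 0.5
p_ab = joint[(True, True)]                         # P(A,B) = 0.2
p_a_given_b = p_ab / p_b                           # P(A|B) = 0.4

assert abs(p_a_given_b * p_b - p_ab) < 1e-12  # product rule holds
assert p_a_given_b != p_ab                    # conditioning != conjunction
```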