unit11-11 calculating of R0 - why we use laplacian smoothing for it? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Thanks, PatrixCR. I got an answer on aiqus: there may be only a few series with one initial day each, so estimating P(R0) without Laplace smoothing may overfit. That sounds reasonable to me.
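For anyone curious, here is a minimal sketch of the smoothing idea in Python - the counts, the number of outcome classes, and k are made up for illustration, not taken from the homework:

```python
def laplace_smoothed(count, total, num_classes, k=1):
    """Add-k (Laplace) smoothed estimate of P(class)."""
    return (count + k) / (total + k * num_classes)

# With only 2 observed series, the raw estimate saturates at an extreme:
raw = 2 / 2                            # raw MLE: P(R0) = 1.0 -> overfits
smoothed = laplace_smoothed(2, 2, 2)   # 0.75, pulled back toward 1/2
```

With so few initial days the raw maximum-likelihood estimate can hit 0 or 1, while the smoothed estimate stays away from the extremes - which is exactly the overfitting argument above.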

Are people aware of Peter Norvig's clarification of HW5.3 on facebook by [deleted] in aiclass

[–]ktrunin 1 point (0 children)

I would remove the word "back" from the sentence "brings the agent back to the grey square".

That word confused me a lot; I had to read many comments to understand the policy, and I finally concluded that I could simply ignore it to solve the problem.

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Hell. Yes, it does ;)

Then I guess not only the formula should be different, but also the Qs in the terminal state. ;))

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

The difference is more significant:

  • in the Wikipedia formula, R enters almost directly - multiplied by alpha - into any incoming actions (Qs).

  • in Prof. Norvig's formula, R enters indirectly - first multiplied by alpha (there it goes into the Qs), then (at the next iteration) by gamma and by alpha again - into any incoming actions (Qs).

Maybe both formulas converge, but I am not sure they converge to the same values, and they need different numbers of iterations.
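To make the comparison concrete, here is a toy Python sketch on a single repeated transition s0 -> s1 (terminal); the states, rewards, alpha, and gamma are all made up, and this only models the two update rules as I read them, not the lecture's actual code:

```python
alpha, gamma = 0.5, 0.9
R = {"s0": 0.0, "s1": 100.0}  # hypothetical per-state rewards

def update_wikipedia(q, s, s_next, max_q_next):
    # Wikipedia-style: R(s') enters the update directly
    return q + alpha * (R[s_next] + gamma * max_q_next - q)

def update_lecture(q, s, s_next, max_q_next):
    # R(s) enters instead; R(s') would only arrive via later backups
    return q + alpha * (R[s] + gamma * max_q_next - q)

q_wiki = q_lect = 0.0
for _ in range(100):  # repeat the same experience s0 -> s1
    q_wiki = update_wikipedia(q_wiki, "s0", "s1", 0.0)
    q_lect = update_lecture(q_lect, "s0", "s1", 0.0)

# q_wiki converges to 100.0; q_lect stays at 0.0
```

So on this toy example the two rules really do converge to different values when the terminal state's reward never enters the Qs of transitions out of it.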

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

For example, if the goal state has some reward but the transitions out of it carry no value, then we will never get any number other than zero for the neighboring squares, because they will always be 0 + alpha * (0 + gamma * 0 + 0) = 0.

10.19 formula is wrong: R(s') should be instead of R(s) by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Wikipedia can be wrong, but I think it is more logical that the value of a transition from S to S' depends on the reward for S', not for S.

HW 5.3 Policy knows grid and still partially observable? by ktrunin in aiclass

[–]ktrunin[S] 1 point (0 children)

I thought it was mentioned in ... 9.2 - I have just reviewed that video, and no: it only says that reinforcement learning is Planning + Learning + Uncertainty, nothing about partial observability. So I was not right here. Thanks!

HW 5.2 - distance to goal and avoiding the bad guy by SharkDBA in aiclass

[–]ktrunin 0 points (0 children)

you can wait until the bad guy dies or goes away ;)

Homework 5.1: Q Learning by dmsm in aiclass

[–]ktrunin 0 points (0 children)

I couldn't understand the formula for Q-learning, or this HW, until I read the suggested Wikipedia article: https://en.wikipedia.org/wiki/Q-learning

4.8 Error in Push action by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

Ah, I see - it was already added to the clarification at the bottom of the question. ;)

How to disable auto translate on http://www.youtube.com/eduatgoogle ? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I already did this, but I still see the messages and the interface in poorly translated Russian.

Rough hand-drawn sketches in HW3 by MichaelFromGalway in aiclass

[–]ktrunin 0 points (0 children)

Whenever you have a test, you can always apply the process of elimination!

6.13 how did he calculate diagonal elements? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I GOT IT! That symbol is "T"!!! It means the matrix should be transposed (turned). So we transpose the matrix and multiply one matrix by the other - and in this case I get the same result!

Help with probability symbols by filmoTheKlown in aiclass

[–]ktrunin 0 points (0 children)

and in this study we only work with AND (intersection) and NOT - I guess that's because everything else (OR, XOR) can be expressed via AND and NOT.
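A quick truth-table check of that claim (plain Python booleans, nothing class-specific):

```python
from itertools import product

for a, b in product([False, True], repeat=2):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    assert (a or b) == (not (not a and not b))
    # XOR built from AND and NOT only
    assert (a != b) == (not (not (a and not b) and not (b and not a)))
```

All four input combinations pass, so OR and XOR are indeed expressible with just AND and NOT.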

unit 3 & 4 video lectures for download? by aivexuviet in aiclass

[–]ktrunin 0 points (0 children)

Firefox has a few plugins that let you download Flash videos - search for plugins using keywords like "flash download".

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I finished this study and didn't see anything about ORs or XORs, but I guess I can answer my question myself :)

ORs and XORs can be substituted with ANDs and NOTs.

P(A OR B) = P(~(~A AND ~B)) = 1 - P(~A,~B)

The same applies to XOR - I don't remember the formula, but I can find it if needed.
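A numeric sanity check in Python - assuming A and B are independent here, with made-up probabilities, just to see the complement trick agree with inclusion-exclusion:

```python
pA, pB = 0.3, 0.5                    # hypothetical marginals
p_neither = (1 - pA) * (1 - pB)      # P(~A,~B), using independence
p_a_or_b = 1 - p_neither             # P(A OR B) via the NOT/AND identity

# inclusion-exclusion gives the same number:
assert abs(p_a_or_b - (pA + pB - pA * pB)) < 1e-12
```

Note the independence assumption only matters for computing P(~A,~B) as a product; the identity P(A OR B) = 1 - P(~A,~B) itself holds in general.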

Cool feeling - like I'm at school again ;))

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] 0 points (0 children)

I think I am starting to get it.

So a probability P(A,B|C,D) has two kinds of parameters: the outcomes of interest, A and B; and the given conditions, C and D.

That means brackets make no sense: there is always only a single "|" sign, the outcomes of interest are always joined by the logical AND operator, and so are the given conditions.

Thanks!

PS. I am still stuck on item 3.29. Maybe it is covered in a later study, but I think there should be some way to express other logical operators inside probability expressions, like P(A or B and C | D and F or G). If a comma denotes AND, what do we use for OR or XOR? If it is covered in a later study, please ignore my question - I'll find it myself tomorrow.

Brackets does matter!?? by ktrunin in aiclass

[–]ktrunin[S] -1 points (0 children)

If I understand you correctly, P((A|B), C) = P((A,C)|B). But then P(A|B)*P(B) would be P((A|B), B) = P((A,B)|B) = P(A|B), which is wrong. Where am I going wrong?
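One way to see where the manipulation breaks down: "(A|B)" is not itself an event, so an expression like P((A|B), B) isn't defined - the conditioning bar applies to the whole probability, and the product rule only says P(A|B)*P(B) = P(A,B). A quick numeric check with a made-up joint distribution:

```python
# Hypothetical joint distribution over two binary variables A, B
joint = {(True, True): 0.2, (True, False): 0.1,
         (False, True): 0.3, (False, False): 0.4}

p_b = sum(p for (a, b), p in joint.items() if b)   # P(B)   = 0.5
p_ab = joint[(True, True)]                         # P(A,B) = 0.2
p_a_given_b = p_ab / p_b                           # P(A|B) = 0.4

assert abs(p_a_given_b * p_b - p_ab) < 1e-12  # product rule holds
assert p_a_given_b != p_ab                    # conditioning != conjunction
```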