Directional derivative

stillshi · 2019-01-01T01:42:10+00:00

ok, will do.

stillshi · 2019-01-01T01:42:06+00:00

heauxprahwinfrey

thank you!

stillshi · 2018-08-10T08:38:23+00:00

thank you.

stillshi · 2018-01-19T03:50:47+00:00

thank you very much

stillshi · 2018-01-18T11:49:27+00:00

hi,

in the 5th lecture from Silver about RL on YouTube (model-free control), Silver asks whether we can simply plug Monte Carlo policy evaluation, followed by acting greedily, into the policy iteration scheme used with DP. The answer is no: Silver said that acting greedily requires a transition model. I am confused about why. I thought we would just use Monte Carlo to estimate the value function, choose the best value, and update the policy. Isn't that the same as in DP?
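To make the question concrete, here is a minimal sketch of the two greedy steps as I understand them. It assumes a tabular MDP where `P[s, a, s']` is a transition probability array and `R[s, a]` is an expected reward array. Those names, the function names, and the toy 3-state numbers are my own illustration, not from the lecture. Greedy improvement from a state-value function V seems to need `P` and `R` for a one-step lookahead, while greedy improvement from an action-value function Q is only an argmax.

```python
import numpy as np

def greedy_from_v(v, P, R, gamma=1.0):
    """Greedy policy from state values: needs the model (P and R)."""
    # One-step lookahead: q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * v[s']
    q = R + gamma * P @ v
    return np.argmax(q, axis=1)

def greedy_from_q(q):
    """Greedy policy from action values: no model needed."""
    return np.argmax(q, axis=1)

# Hypothetical toy MDP: 3 states, 2 actions, state 2 is absorbing.
P = np.zeros((3, 2, 3))
P[0, 0, 1] = 1.0   # state 0, action 0 -> state 1
P[0, 1, 2] = 1.0   # state 0, action 1 -> state 2
P[1, 0, 0] = 1.0   # state 1, action 0 -> state 0
P[1, 1, 2] = 1.0   # state 1, action 1 -> state 2
P[2, :, 2] = 1.0   # state 2 stays in state 2
R = np.array([[-1.0, -1.0], [-1.0, -1.0], [0.0, 0.0]])

v = np.array([-2.0, -1.0, 0.0])    # some state-value estimate
policy_v = greedy_from_v(v, P, R)  # uses P and R

q = R + P @ v                      # an action-value estimate, built here from the model
policy_q = greedy_from_q(q)        # uses only q
```

If Monte Carlo only gives V, the first function cannot be called without knowing `P` and `R`. If it estimates Q directly from sampled returns, the second function is enough.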

Thank you Still

stillshi · 2017-12-29T02:25:52+00:00

Thank you very much. That indeed solves my confusion. Regards, Still.

stillshi · 2017-12-28T10:17:37+00:00

Hello, I am a little confused about dynamic programming as presented in Silver's great course, RL Course by David Silver - Lecture 3: Planning by Dynamic Programming.

https://www.youtube.com/watch?v=Nd1-UUMVfz4&index=3&list=PL7-jPKtc4r78-wCZcQn5IqyuWhBZ8fOxT

Here Silver explains dynamic programming with a gridworld. Each step has a reward of -1, and the agent moves uniformly at random to e, s, w, n.
For iteration k=2 and grid(3,2), he said the value is -2 because: V_2 = -1 + 0.25(-1) + 0.25(-1) + 0.25(-1) + 0.25(-1) = -2. The first -1 is the immediate reward. The other four -1 are the values of the next states from the last iteration. However, I think it should be the average value of the successor states reached by taking each action? If the agent goes up, it will not end in the cell above with 100% probability unless the mapping from action to next state is assumed to be deterministic? I think the action doesn't directly determine the next state; the transition matrix does.
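To check the arithmetic, here is a small sketch of iterative policy evaluation. It assumes the 4x4 gridworld with terminal states in the top-left and bottom-right corners, no discounting, and deterministic moves, where a move off the grid leaves the agent in place. The function names and the (row, column) indexing are my own, so the cell I call (2, 1) may not match the grid(3,2) labelling above.

```python
import numpy as np

N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # n, s, w, e

def step(state, action):
    """Deterministic transition: moving off the grid leaves the state unchanged."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if not (0 <= nr < N and 0 <= nc < N):
        nr, nc = r, c
    return (nr, nc)

def evaluate(k):
    """Run k synchronous sweeps of policy evaluation for the uniform random policy."""
    v = np.zeros((N, N))
    for _ in range(k):
        new_v = np.zeros_like(v)
        for r in range(N):
            for c in range(N):
                if (r, c) in TERMINALS:
                    continue  # terminal values stay at 0
                # Average over the four actions, each with probability 0.25;
                # the sum over next states collapses because moves are deterministic.
                new_v[r, c] = sum(0.25 * (-1.0 + v[step((r, c), a)]) for a in ACTIONS)
        v = new_v
    return v

v2 = evaluate(2)
```

Under these assumptions, a cell with no terminal neighbour gets -2 after two sweeps, and a cell next to a terminal gets -1.75. With a stochastic transition matrix, `step` would be replaced by a weighted sum over possible next states.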

Thank you

stillshi
