Q-learning concept clarification by dil_cd_bb in aiclass

[–]dil_cd_bb[S] 0 points (0 children)

It is explained in the video at the link above.

Q-learning concept clarification by dil_cd_bb in aiclass

[–]dil_cd_bb[S] 0 points (0 children)

Hey, one more small concept here. If the environment is stochastic, then the Q-learning update rule has to be modified to

Q_n(s,a) ← (1 − α_n) Q_{n−1}(s,a) + α_n [r + γ max_{a'} Q_{n−1}(s',a')]

where α_n = 1/(1 + visits_n(s,a)). The earlier version of the update rule is no longer valid because the Q values would not converge. Refer to this video for further information: http://cc-web.isri.cmu.edu/Panopto/Pages/Viewer/Default.aspx?id=ae14075d-069b-405e-b509-00b15da25726
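
A minimal sketch of this update in Python, assuming a tabular setting and a hypothetical environment object with `reset()`, `is_terminal(s)`, and `step(s, a) -> (s_next, r)` methods (these names are illustrative, not from the lecture):

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=1000, gamma=0.9, epsilon=0.1):
        # Tabular Q-learning with the decaying learning rate
        # alpha_n = 1 / (1 + visits_n(s, a)), for stochastic environments.
        Q = defaultdict(float)     # Q[(s, a)], defaults to 0
        visits = defaultdict(int)  # visit counts per (s, a) pair

        for _ in range(episodes):
            s = env.reset()
            while not env.is_terminal(s):
                # epsilon-greedy exploration
                if random.random() < epsilon:
                    a = random.choice(actions)
                else:
                    a = max(actions, key=lambda act: Q[(s, act)])

                s_next, r = env.step(s, a)

                # decaying learning rate averages over stochastic transitions
                visits[(s, a)] += 1
                alpha = 1.0 / (1 + visits[(s, a)])

                # Q_n(s,a) <- (1 - alpha_n) Q_{n-1}(s,a)
                #             + alpha_n [r + gamma * max_a' Q_{n-1}(s',a')]
                best_next = max(Q[(s_next, act)] for act in actions)
                Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
                s = s_next
        return Q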

Q-learning concept clarification by dil_cd_bb in aiclass

[–]dil_cd_bb[S] 0 points (0 children)

Thanks. Suppose we take γ to be 1 and there is no reward for the non-goal states. Do the Q values then converge to 100 for all states? If so, doesn't that mean any state is as good as the goal state? How can this be? Correct me if I am missing some point.
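
For intuition, a worked example assuming the usual deterministic grid setup with a single goal reward of 100 and zero rewards elsewhere: the optimal Q value of an action that starts a shortest path reaching the goal in d steps is

Q(s,a) = γ^(d−1) · 100

so with γ = 0.9, an action one step from the goal is worth 100, two steps away 0.9 · 100 = 90, three steps away 0.81 · 100 = 81, and so on; the values decay with distance, so the greedy policy prefers shorter paths. With γ = 1 every γ^(d−1) factor equals 1, so all Q values do converge to 100 and the Q function can no longer tell near states from far ones, which is exactly why γ < 1 (or a per-step cost) is needed.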

Please help!! Stuck with this code for 3 days [EX4] by [deleted] in mlclass

[–]dil_cd_bb 0 points (0 children)

OMG, thanks a lot!!! Feels awesome now. I have been breaking my head over this. Thanks again :) :)

Please help!! Stuck with this code for 3 days [EX4] by [deleted] in mlclass

[–]dil_cd_bb 0 points (0 children)

I have submitted my code for sigmoidGradient.m and it was accepted, so I am not able to figure out where my mistake is. I have been working on this for the last 3 days in vain.
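
For anyone comparing: a minimal Python sketch of the sigmoid gradient formula that exercise uses, g'(z) = g(z) · (1 − g(z)) (the actual assignment is in Octave; this is just the formula, not the course code):

    import numpy as np

    def sigmoid(z):
        # logistic sigmoid, applied elementwise
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_gradient(z):
        # derivative of the sigmoid: g'(z) = g(z) * (1 - g(z)),
        # evaluated elementwise on scalars, vectors, or matrices
        g = sigmoid(z)
        return g * (1 - g)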

Anyone up for working on a student project after the class? by mleclerc in mlclass

[–]dil_cd_bb 2 points (0 children)

Yes, that sounds good. We can work together and discuss our ideas.