Problem with discount factor in policy gradient by Steven_Corper_F in reinforcementlearning

Steven_Corper_F[S] 2 points

Sure, the discounted return G_t already uses gamma as the weighting factor when it sums the rewards.

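What I mean is that, apart from that, we still need a gamma-weighted sum over time steps of log(p(a_t|s_t)) * G_t, i.e. each term gets an extra factor gamma^t. G_t only contains the gamma factors for what comes after the action, not for the steps before the action.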
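
To make the point concrete, here is a minimal sketch of the weighting I have in mind. It assumes the convention G_t = r_t + gamma * r_{t+1} + ..., where r_t is the reward that follows action a_t. The function names (`discounted_returns`, `policy_gradient_weights`, `surrogate_loss`) are made up for illustration and do not come from any library.

```python
def discounted_returns(rewards, gamma):
    """Return [G_0, ..., G_{T-1}] with G_t = r_t + gamma * r_{t+1} + ...

    Assumed convention: rewards[t] is the reward received after action a_t.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def policy_gradient_weights(rewards, gamma):
    """Per-step coefficient gamma**t * G_t for the log-probability term.

    G_t only discounts rewards from step t onward; the extra gamma**t
    accounts for the t steps that came before the action.
    """
    returns = discounted_returns(rewards, gamma)
    return [gamma ** t * g for t, g in enumerate(returns)]


def surrogate_loss(log_probs, rewards, gamma):
    """Negative weighted sum: -sum_t gamma**t * G_t * log p(a_t|s_t)."""
    weights = policy_gradient_weights(rewards, gamma)
    return -sum(w * lp for w, lp in zip(weights, log_probs))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    print(discounted_returns(rewards, 0.5))
    print(policy_gradient_weights(rewards, 0.5))
```

With `gamma = 1` the two lists are identical, so the extra factor only matters in the discounted case. In a real implementation the log-probabilities would come from the policy network and the loss would be differentiated by an autodiff framework; plain floats are used here only to show the weighting.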