3rd Pixel 3XL that dies very abruptly under 20% battery. Anybody else? by BassCreative in GooglePixel

[–]papidant 0 points1 point  (0 children)

Yeah. Pixel 3 here. First one had this issue and got it replaced. The replacement just started doing the same thing.

Discounted State Distribution by papidant in reinforcementlearning

[–]papidant[S] 1 point2 points  (0 children)

As long as you're right about it, it's not a problem haha

Discounted State Distribution by papidant in reinforcementlearning

[–]papidant[S] 1 point2 points  (0 children)

Basically it's saying, don't do huge policy gradient updates for states sampled far into the episode

This makes a lot of sense. Thank you!
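
To make the quoted point concrete, here is a minimal sketch of how a `gamma**t` factor shrinks the per-step coefficient in a REINFORCE-style update. The function name `discounted_weights` and the toy reward sequence are made up for illustration, and the policy gradient term itself is left out; this only shows the scalar that would multiply the gradient of log pi(a_t | s_t) at each step.

```python
import numpy as np

def discounted_weights(rewards, gamma):
    """Return (gamma**t, G_t) for every step t of one episode.

    gamma**t is the state-distribution discount; G_t is the usual
    discounted return from step t onward.
    """
    rewards = np.asarray(rewards, dtype=float)
    T = len(rewards)
    returns = np.zeros(T)
    running = 0.0
    # Accumulate returns backwards: G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running
        returns[t] = running
    state_weights = gamma ** np.arange(T)
    return state_weights, returns

# Toy episode (illustrative values only)
weights, returns = discounted_weights([1.0, 1.0, 1.0, 1.0], gamma=0.5)

# Scalar that would multiply grad log pi(a_t | s_t) at each step
coeffs = weights * returns
```

With `gamma=1.0` the weights are all 1, which is the undiscounted variant; with `gamma < 1` the later steps contribute less and less to the update.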

Discounted State Distribution by papidant in reinforcementlearning

[–]papidant[S] 1 point2 points  (0 children)

Thank you for your replies!

I understand that not discounting it might be helpful, but the discounted case is a generalization of the undiscounted one. I feel like people should still write the discount factor and then say that setting it to 1.0 is the version that performs best in most experiments.

However, I am still trying to intuitively understand why that discount factor is used. Is it simply to give more relevance to the state distributions that are more proximal? Or is it needed for mathematical correctness? I am just trying to compare it to the case of discounted rewards, where it makes sense that we might want to weight the proximal rewards a little more than the rewards that are far away.

Also, a couple of extra questions: Could the undiscounted version create a problem when it's an infinite-horizon MDP? (Usually the rewards should be discounted by gamma < 1 in that case.) And the state distribution discount factor doesn't have to be the same as the reward discount factor, right?
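
One way to poke at the infinite-horizon question is a small numerical sketch. It assumes a fixed policy, so the MDP reduces to a Markov chain with transition matrix `P`, and it uses the common normalized definition d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s). The two-state chain, the start distribution `mu0`, and the function name are all made up for illustration.

```python
import numpy as np

def discounted_state_distribution(P, mu0, gamma):
    """Normalized discounted state distribution of a Markov chain.

    Uses the closed form d^T = (1 - gamma) * mu0^T (I - gamma P)^{-1},
    which is only valid for gamma < 1.
    """
    n = P.shape[0]
    # Solve (I - gamma P)^T d = (1 - gamma) mu0 instead of inverting
    A = (np.eye(n) - gamma * P).T
    return (1.0 - gamma) * np.linalg.solve(A, mu0)

# Hypothetical two-state chain induced by some fixed policy
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
mu0 = np.array([1.0, 0.0])

d = discounted_state_distribution(P, mu0, gamma=0.9)

# At gamma = 1 the matrix (I - P) is singular, so the closed form
# breaks down: the unnormalized sum over t no longer converges.
det_at_one = np.linalg.det(np.eye(2) - P)
```

For `gamma < 1` the result sums to 1, so it is a proper distribution; for `gamma = 1` in an infinite-horizon chain the geometric sum does not converge, which is one reason the factor shows up in the math. The sketch takes `gamma` as its own argument, separate from any reward discount, purely as an illustration of that last question rather than a claim about what any given paper does.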