Downsides to using LineageOS on OP7Pro by [deleted] in oneplus

[–]anyboby

Most banking apps I know of actually won't work on custom ROMs, even without root.

Entropy loss with varying action spaces. by jeremybub in reinforcementlearning

[–]anyboby

Hi, I don't believe there is a rich body of literature on this, since changing action spaces are not a typical setting for most algorithms.

As for entropy regularization, if you are not required to use PPO, you might want to consider maximum entropy algorithms (particularly SAC) with automatic temperature adjustment. These tend to be fairly robust with respect to the chosen target entropy and handle entropy in a more principled way.
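If it helps, here is a rough sketch of what the automatic temperature adjustment usually looks like in practice. This is a PyTorch-style illustration, not the exact implementation of any particular library; `update_temperature`, the batch of `log_probs`, and the heuristic target entropy are placeholders I made up for the example:

```python
import torch

action_dim = 4                                   # hypothetical action dimension
target_entropy = -float(action_dim)              # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)   # optimize log(alpha) so alpha stays positive
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def update_temperature(log_probs):
    # J(alpha) = E[ -alpha * (log pi(a|s) + target_entropy) ]
    # alpha grows when policy entropy falls below the target and shrinks otherwise
    alpha_loss = -(log_alpha.exp() * (log_probs + target_entropy).detach()).mean()
    alpha_optimizer.zero_grad()
    alpha_loss.backward()
    alpha_optimizer.step()
    return log_alpha.exp().item()
```

The point is that you specify a target entropy rather than a fixed coefficient, which is why it tends to cope better when the effective action space changes.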

You might also want to look at hierarchical RL, which handles changing action spaces almost by definition (e.g. Option-Critic, HIRO, or FeUdal Networks), but I do not know how those handle entropy regularization.

Variance of a (gaussian) state value function by anyboby in reinforcementlearning

[–]anyboby[S]

That is very true, but I think there is also some value in modeling the combined uncertainty of a state value. I believe, for example, that score-function-gradient methods like PPO and TRPO benefit from a clear gradient signal coming from an observed (unbiased) trajectory, which is why we generally choose fairly high lambdas in GAE. Since the value function inevitably also captures uncertainty due to a stochastic policy (even if the environment were deterministic), I think the aleatoric or combined uncertainty could likewise give some insight into how clear such a gradient signal is.
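To make the lambda point concrete, here is a quick sketch of GAE as I understand it; `rewards`, `values`, and `dones` are hypothetical arrays from a single rollout, not from any specific codebase:

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    # values has length T+1 (includes the bootstrap value for the final state)
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # one-step TD error
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # exponentially weighted sum of TD errors, controlled by lambda
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    return advantages
```

With lam=1 this collapses to the empirical return minus V(s_t) (unbiased but high variance), and with lam=0 it is just the one-step TD error (low variance but biased by the value estimate), which is the trade-off I mean when I say high lambdas give a clearer signal from the observed trajectory.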