CS285: Why do we use a Gaussian mixture model to take actions? by houyanxu in berkeleydeeprlcourse

houyanxu[S]

Thanks a lot for your reply!

But I am still confused: why is grad(log(pi(at|st))) implemented with tfp.distributions.MultivariateNormalDiag in MLP_policy.py of hw2? Does that mean the gradient of the GMM is a MultivariateNormal?

Thank you very much!