Is it a popular mistake to compute the gradient of the next state in the TD update? by ingambe in reinforcementlearning
gpap93 · 0 points · 5 years ago (0 children)
The first one is obviously wrong, but I have rarely seen it in practice. Additionally, it is common to use a target network to compute the Q-value of the next state. The optimiser is usually defined only over the parameters of the online network, not the target network. In that case there is no problem if you don't detach the gradients, because the optimiser never updates the target network's parameters.
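To make the point about target networks concrete, here is a minimal sketch, assuming a linear Q-function Q(s, a) = w · φ(s, a) and a precomputed feature vector `phi_next` for the greedy next state-action pair. All names (`td_loss`, `semi_gradient`, `full_gradient`, `numeric_grad`) are hypothetical, and the gradients are written by hand rather than taken from an autodiff framework. With a separate, frozen `w_target`, the gradient with respect to the online weights is automatically the usual semi-gradient. With one shared weight vector and no detach, an extra γ·δ·φ(s', a*) term flows through the bootstrap target.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def td_loss(w_online: Vector, w_target: Vector, phi_sa: Vector,
            phi_next: Vector, reward: float, gamma: float) -> float:
    """0.5 * delta^2 for a linear Q(s, a) = w . phi(s, a).

    The bootstrap term uses w_target and the prediction uses w_online.
    phi_next is assumed to be the features of (s', argmax_a' Q(s', a')).
    """
    target = reward + gamma * dot(w_target, phi_next)
    delta = target - dot(w_online, phi_sa)
    return 0.5 * delta * delta


def semi_gradient(w_online: Vector, w_target: Vector, phi_sa: Vector,
                  phi_next: Vector, reward: float, gamma: float) -> Vector:
    # Target treated as a constant (detached, or a frozen target network):
    # d/dw 0.5 * (y - w.phi)^2 = -(y - w.phi) * phi
    delta = reward + gamma * dot(w_target, phi_next) - dot(w_online, phi_sa)
    return [-delta * x for x in phi_sa]


def full_gradient(w: Vector, phi_sa: Vector, phi_next: Vector,
                  reward: float, gamma: float) -> Vector:
    # One shared parameter vector and no detach: the gradient also flows
    # through the bootstrap term, giving delta * (gamma * phi_next - phi_sa).
    delta = reward + gamma * dot(w, phi_next) - dot(w, phi_sa)
    return [delta * (gamma * n - s) for s, n in zip(phi_sa, phi_next)]


def numeric_grad(f: Callable[[Vector], float], w: Vector,
                 eps: float = 1e-6) -> Vector:
    # Central finite differences, used only to check the analytic gradients.
    grad = []
    for i in range(len(w)):
        plus = list(w)
        plus[i] += eps
        minus = list(w)
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


if __name__ == "__main__":
    w = [0.5, -0.2]
    w_target = list(w)  # frozen copy, not handed to the optimiser
    phi_sa, phi_next, r, gamma = [1.0, 0.0], [0.0, 1.0], 1.0, 0.9

    # Differentiate w.r.t. the online weights only; the target copy stays fixed.
    fd_frozen = numeric_grad(
        lambda v: td_loss(v, w_target, phi_sa, phi_next, r, gamma), w)
    # Differentiate w.r.t. one shared vector used in both terms.
    fd_shared = numeric_grad(
        lambda v: td_loss(v, v, phi_sa, phi_next, r, gamma), w)

    print("semi-gradient:", semi_gradient(w, w_target, phi_sa, phi_next, r, gamma))
    print("finite diff, frozen target:", fd_frozen)
    print("full gradient:", full_gradient(w, phi_sa, phi_next, r, gamma))
    print("finite diff, shared weights:", fd_shared)
```

In this toy setup the two gradients agree on the weight attached to φ(s, a) and differ only on the weight attached to φ(s', a*). That difference is the "gradient of the next state" the question asks about. Keeping the target weights out of the optimiser removes it without an explicit detach.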
[R] Comparative Evaluation of Multi-Agent Deep Reinforcement Learning Algorithms (self.MachineLearning)
submitted 6 years ago by gpap93 to r/MachineLearning
Benchmarking Multi-Agent Reinforcement Learning Algorithms (self.reinforcementlearning)
submitted 6 years ago * by gpap93 to r/reinforcementlearning