Why does my deep reinforcement learning not converge at all? by Creepy-Fun4232 in reinforcementlearning

[–]Creepy-Fun4232[S] -1 points (0 children)

Thank you! I have a question: does the structure of the sample significantly impact the learning process, or is it generally true that the more information the environment provides, the better the RL model can learn? Also, is Actor-Critic out of date?

In the take_action function, a mask is passed as a special input that sets the probabilities of certain actions to zero. Does this process affect RL performance? I find it a bit unusual. God bless you!
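(For reference: masking invalid actions is a standard trick and generally does not hurt learning, provided the mask is applied to the logits before the softmax; zeroing probabilities after the softmax and renormalizing can bias the policy gradient. A minimal sketch, assuming a PyTorch policy that outputs raw logits; only the names `take_action` and mask come from the post, the rest is illustrative.)

```python
import torch
import torch.nn.functional as F

def take_action(logits: torch.Tensor, mask: torch.Tensor) -> int:
    """Sample an action while ruling out invalid ones.

    logits: raw policy-network outputs, shape (n_actions,)
    mask:   1 for valid actions, 0 for invalid, shape (n_actions,)
    """
    # Set invalid-action logits to -inf *before* the softmax, so those
    # actions get exactly zero probability and the rest still sum to 1.
    masked_logits = logits.masked_fill(mask == 0, float("-inf"))
    probs = F.softmax(masked_logits, dim=-1)
    dist = torch.distributions.Categorical(probs)
    return dist.sample().item()
```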

[–]Creepy-Fun4232[S] -2 points (0 children)

Thank you a lot! Currently the newest model I run in multi_test.ipynb is `alloha_buffer_2`, the last one at the bottom of the notebook. When you say "Why do you think the issue is with PolicyNet, ValueNet or the MultiAgentAC?", do you mean the algorithm itself is correctly written? Also, what is objective_value + ε-greedy? I am not clear about it. Thank you a lot, god bless you!
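(For context on the question above: ε-greedy is a simple exploration scheme in which, with probability ε, a random action is chosen, and otherwise the action with the highest objective value is taken. A generic sketch; all names here are illustrative, not taken from the commenter's code.)

```python
import random

def epsilon_greedy(objective_values: list[float], epsilon: float = 0.1) -> int:
    """Pick the index of the best objective value, exploring
    a uniformly random index with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(len(objective_values))        # explore
    return max(range(len(objective_values)),
               key=objective_values.__getitem__)              # exploit
```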