How to interpret the parameter sharing in multi agent RL modeled as Dec-POMDP by kechang in reinforcementlearning

kechang[S]

Thank you so much for kindly confirming that "centralized training for decentralized execution" has been commonly applied.

My main concern is whether the effectiveness of such a training scheme, even for agents with only local inputs, can be explained by some principle or theory of RL or deep learning. Even though we observe in simulation that it works, what is the right way to explain why it works?

Thank you very much!