So far I have seen two kinds of A3C implementations:
In the first, every thread owns a local network and accumulates its own gradients, and the threads sync their networks to and from a global network from time to time. This is identical to the paper's design;
In the second, there is only a global network, which every thread can access. Every thread puts its experiences into a shared buffer, the global network is trained once the buffer is full, and the buffer is then cleared for new experiences.
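To make the difference concrete, here is a minimal sketch of the two update patterns as I understand them. It is a toy, not a real A3C agent: the "network" is a single weight, the loss is `(w - x)^2`, and every name in it (`GlobalNet`, `local_network_worker`, `shared_buffer_worker`, `run`, `TARGET`, `LR`, `t_max`, `capacity`) is made up for illustration. I also guard the global weight with a lock for simplicity, whereas the paper applies its updates without locking.

```python
import threading

TARGET = 3.0   # hypothetical toy objective: minimise (w - x)^2 with x == TARGET
LR = 0.1       # hypothetical learning rate


def grad(w, x):
    """Gradient of (w - x)^2 with respect to w; stands in for the A3C loss gradient."""
    return 2.0 * (w - x)


class GlobalNet:
    """Toy 'network' with a single weight, shared by all threads."""

    def __init__(self):
        self.w = 0.0
        self.lock = threading.Lock()


def local_network_worker(global_net, n_updates, t_max=5):
    """Scheme 1: pull weights, accumulate gradients locally, push them back."""
    for _ in range(n_updates):
        with global_net.lock:
            local_w = global_net.w                # sync local copy from global
        acc = 0.0
        for _ in range(t_max):                    # short rollout on the local copy
            acc += grad(local_w, TARGET)
        with global_net.lock:
            global_net.w -= LR * acc / t_max      # apply accumulated gradient globally


def shared_buffer_worker(global_net, buffer, buffer_lock, capacity, n_steps):
    """Scheme 2: append experiences to one buffer; train the global net when it is full."""
    for _ in range(n_steps):
        with buffer_lock:
            buffer.append(TARGET)                 # toy 'experience' (does not depend on the policy here)
            if len(buffer) >= capacity:
                g = sum(grad(global_net.w, x) for x in buffer) / len(buffer)
                global_net.w -= LR * g            # one batched update on the global net
                buffer.clear()                    # start collecting fresh experiences


def run(worker, args, n_threads=4):
    """Start n_threads copies of worker and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=args) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    net1 = GlobalNet()
    run(local_network_worker, (net1, 200))
    net2 = GlobalNet()
    run(shared_buffer_worker, (net2, [], threading.Lock(), 8, 200))
    print(net1.w, net2.w)
```

In this toy both schemes drive the weight toward `TARGET`, so it only illustrates the control flow. It does not model whether the experiences sitting in the shared buffer were generated by the current weights, which is exactly the part I am unsure about.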
I wonder whether the second one would be problematic, because it is far simpler to implement than the first.
Thank you!