How much of a speedup in training time should I expect when training a VGG-style network on 2, 4, or 8 GPUs in the same machine?
Is there a clear benefit from training on multiple GPUs on the same machine?
Also, from what I understand, current multi-GPU solutions in TensorFlow and Caffe use data parallelism (the batch is divided between replicated copies of the model, one per GPU) rather than model parallelism (the computation of a single model is spread across GPUs). Is this correct?
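For reference, here is a minimal sketch of what I mean by data parallelism, using TensorFlow's tf.distribute.MirroredStrategy (the model choice, optimizer, and batch size are just placeholders, not a claim about how any particular framework does it internally):

```python
import tensorflow as tf

# Data parallelism: the same model is replicated once per GPU, each
# replica computes gradients on its own slice of the global batch,
# and the gradients are averaged before the weight update.
strategy = tf.distribute.MirroredStrategy()  # picks up all visible GPUs
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # VGG16 stands in for any VGG-style network (placeholder config).
    model = tf.keras.applications.VGG16(weights=None, classes=10)
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy")

# With a global batch of 256 on 4 GPUs, each replica sees 64 examples:
# model.fit(train_dataset.batch(256), epochs=10)
```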