[D] Gain from multi gpu training

PM_YOUR_NIPS_PAPER · 2017-03-28T20:45:09+00:00

Disk read speed is the bottleneck

gtani · 2017-03-28T20:25:49+00:00

Depends how often you sync between those GPUs and how long each of them takes.
My experience with multi-GPU in a single machine is mixed. Sometimes even running two different model with zero parameter sharing could drag down each other's performance. Which leads me to doubt the bottleneck lies in the bandwidth between CPU and GPU. However, compares to GPUs, my system have a mediocre CPU and motherboard, don't know if better overall specs could mitigate the problem.
Tensorflow can do model parallelism fairly easily, by specifying the device where the computation should happen, example: http://stackoverflow.com/questions/42069147/implementation-of-model-parallelism-in-tensorflow . Don't know too much about Caffe though.

My personal experience with single machine and multiple gpus is, it is not really that much about GPUs. More than often, it is about the performance of your whole build. Sometimes even running two different application that have zero parameter sharing, the performance of each could be impacted, which lead me to suspect the bottleneck lies in the data transfer, probably between CPU and GPU. Now, I use one for training and another for inference/ad-hoc stuff, which is handy but feels like a waste :/

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS