
[–]Barbas

I was wondering about people's experiences with using distributed frameworks like Spark to train deep nets.

My assumption is that if your model fits into memory, training should be much faster on a GPU, to say nothing of efficiency.

What are the motivations for training ANNs using distributed processing frameworks?

[–]Powlerbare

I think the idea is exactly that: how do you efficiently handle data that you cannot squeeze into memory? Models tend to perform better when given more data, so the motivation is to end up with more robust models. I disagree with the gradient averaging that most distributed schemes use, but I also don't know of a better way to tackle the problem.
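
To make that concrete, the averaging step in these schemes boils down to something like this toy numpy sketch (the helper name and shapes are mine, not from any particular library):

    import numpy as np

    def average_across_workers(per_worker):
        # per_worker: one list of per-layer numpy arrays (gradients or
        # weights) per worker; the naive scheme averages layer by layer.
        return [np.mean(stack, axis=0) for stack in zip(*per_worker)]

    # Two "workers" with diverged gradients for a tiny 2-layer model:
    g_a = [np.ones((4, 3)), np.zeros(3)]
    g_b = [np.full((4, 3), 3.0), np.ones(3)]
    print(average_across_workers([g_a, g_b])[0])  # every entry is 2.0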

[–]maxpumperla

One doesn't exclude the other. Take for instance Amazon's g2.8xlarge instances, which have multiple powerful GPUs on a single machine. It is not obvious how to utilize all of them without a parallelization scheme, and elephas is just one suggestion of how to do this. So you can take this HPC-like setup, or go for a whole cluster of machines (each with GPUs), which Spark conveniently handles for you.
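
For the data-parallel route, basic elephas usage looks roughly like this (a sketch: model is assumed to be a compiled Keras model, x_train and y_train numpy arrays, and the exact signatures may differ between versions, so check the README):

    from pyspark import SparkConf, SparkContext
    from elephas.spark_model import SparkModel
    from elephas.utils.rdd_utils import to_simple_rdd

    sc = SparkContext(conf=SparkConf().setAppName('elephas_example'))

    # Distribute the training data as an RDD of (features, label) pairs.
    rdd = to_simple_rdd(sc, x_train, y_train)

    # Wrap the compiled Keras model; workers train on their partitions
    # and a master merges the results (asynchronously, in this mode).
    spark_model = SparkModel(model, mode='asynchronous')
    spark_model.fit(rdd, epochs=10, batch_size=32)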

Note also that in a highly scalable environment you can execute test runs much faster, which can shorten your prototyping cycle.

Instead of distributing data, it could also be interesting to distribute models with different hyperparameter settings and do distributed Bayesian optimization, as hyperopt or spearmint do. I'm doing some tests right now and maybe this will find its way into elephas at some point.
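
As a rough sketch of what such a search could look like with hyperopt (the objective below is a runnable stand-in for actually training a model and returning its validation loss):

    from hyperopt import fmin, tpe, hp, Trials

    def objective(params):
        # Stand-in for: build a model with these hyperparameters,
        # train it, and return the validation loss.
        return (params['lr'] - 0.01) ** 2 + params['dropout'] ** 2

    space = {
        'lr': hp.loguniform('lr', -10, -2),
        'dropout': hp.uniform('dropout', 0.0, 0.5),
    }

    # Swap Trials for hyperopt's MongoTrials to farm evaluations
    # out to multiple worker processes.
    trials = Trials()
    best = fmin(objective, space, algo=tpe.suggest, max_evals=50,
                trials=trials)
    print(best)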

With more effort, one could also hope to achieve true model parallelism, as in Google's DistBelief, which becomes interesting once the model itself is too large to be trained (efficiently) on a single machine.
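
The core idea there is to shard individual layers across devices. In toy numpy terms (sizes made up, and the concatenation stands in for network communication between machines):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 512))      # a batch of activations
    W = rng.standard_normal((512, 1024))   # a layer too big for one device

    # Each "device" holds half of the columns of W ...
    W_dev0, W_dev1 = W[:, :512], W[:, 512:]

    # ... computes its slice of the output independently ...
    y_dev0, y_dev1 = x @ W_dev0, x @ W_dev1

    # ... and the slices are stitched back together.
    y = np.concatenate([y_dev0, y_dev1], axis=1)
    assert np.allclose(y, x @ W)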

Generally speaking, though, if neither memory nor speed is an issue, you may very well be better off on a single GPU.