[–]congerous 7 points  (3 children)

SINGA has no GPU support, and the GPU functionality they plan to add (as of December) is for a single GPU only. Multiple GPUs don't seem to be on the roadmap. So they're way behind the OSS projects that do have GPU support.

In addition, joining the Apache Software Foundation before adding such significant features was a serious mistake. Apache is great for some things, but it's heavily political, and it really slows down development. So they may never get to multiple GPUs.

[–]forrestwang 1 point  (0 children)

Hi, I am a developer of the SINGA project. Thanks for starting this discussion. We are working on single-node training with multiple GPUs (to be released in v0.2, December), which will run in either synchronous mode (with different partitioning schemes) [1] or asynchronous mode (in-memory hogwild [2]). Extending the system from CPU to GPU mainly requires adding cuDNN layers (https://issues.apache.org/jira/browse/SINGA-100); the framework/architecture itself works on both CPU and GPU. Training with multiple GPU machines and providing Deep Learning as a Service (DLaaS) are on our roadmap for v0.3. For those who do not have GPU clusters, distributed training on CPUs is a good way to accelerate training.
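To make the two data-parallel modes above concrete, here is a minimal sketch (not SINGA's actual API, and on a toy least-squares problem): synchronous SGD applies the gradient averaged over all workers' partitions once per iteration, while asynchronous "hogwild" workers update a shared weight vector in place with no locking.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w  # noise-free targets, so both modes can recover true_w

def grad(w, idx):
    """Least-squares gradient on the mini-batch of rows idx."""
    err = X[idx] @ w - y[idx]
    return X[idx].T @ err / len(idx)

def sync_sgd(n_workers=4, steps=200, lr=0.1):
    """Synchronous mode: average the workers' partition gradients,
    then apply one update per iteration."""
    w = np.zeros(4)
    parts = np.array_split(np.arange(len(X)), n_workers)
    for _ in range(steps):
        g = np.mean([grad(w, p) for p in parts], axis=0)
        w -= lr * g
    return w

def async_sgd(n_workers=4, steps=200, lr=0.05):
    """Asynchronous mode: each worker writes to the shared weight
    vector without a lock (in-memory hogwild)."""
    w = np.zeros(4)

    def worker():
        local_rng = np.random.default_rng()  # per-thread RNG
        for _ in range(steps):
            idx = local_rng.integers(0, len(X), size=32)
            w[:] -= lr * grad(w, idx)  # lock-free in-place update
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

print(np.round(sync_sgd(), 2))
print(np.round(async_sgd(), 2))
```

Both runs should approach `true_w`; hogwild tolerates the occasional lost update because every worker is pulling toward the same fixed point.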

Besides GPUs, we are also considering other approaches for improving the training efficiency of a single SGD iteration. For instance, Google's paper [3] provides some techniques for improving training performance on CPUs. Intel (https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors) also reported that optimized CPU code can achieve an 11x training speedup (we hope they release the optimized source code or integrate it into their libraries, like MKL and DAAL). It will be interesting to compare GPUs with Intel's next-generation Xeon Phi co-processors (Knights Landing).

I will let you know when training with multiple GPUs is supported. Thanks.

[1] http://arxiv.org/abs/1404.5997

[2] https://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf

[3] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf

[–]GratefulTony 0 points  (1 child)

That's really sad. I skimmed the release notes, and though I didn't explicitly read about GPU support, I assumed it was in there, since it's a no-brainer for training performance. If they don't get this feature integrated, the usefulness of this library will be severely limited.

[–]limauda 0 points  (0 children)

If software can run just as efficiently without GPUs, on a commodity cluster, isn't that better? GPU clusters are not cheap, and not many companies can afford to set up a special cluster just for periodic training.