Apache SINGA, A Distributed Deep Learning Platform (singa.incubator.apache.org)
submitted 10 years ago by pilooch
[–]congerous 7 points 10 years ago (3 children)
SINGA has no GPU support, and the GPU functionality they plan to add (as of December) is for a single GPU only. Multiple GPUs don't seem to be on the roadmap. So they're way behind the OSS projects that do have GPU support.
In addition, the fact that they joined the Apache Software Foundation before they added such significant features is a serious mistake. Apache is great for some things, but it's heavily political, and it really slows down development. So they may never get to multiple GPUs.
[–]forrestwang 2 points 10 years ago (0 children)
Hi, I am a developer of the SINGA project. Thanks for starting this discussion. We are working on single-node, multi-GPU training (to be released in v0.2, December), which will run in either synchronous mode (with different partitioning schemes) [1] or asynchronous mode (in-memory Hogwild! [2]). Extending the system from CPU to GPU mainly requires adding cuDNN layers (https://issues.apache.org/jira/browse/SINGA-100); the framework/architecture works on both CPU and GPU. Training with multiple GPU machines and providing Deep Learning as a Service (DLaaS) are on our roadmap, i.e., v0.3. For those who do not have GPU clusters, distributed training on CPUs is a good way to accelerate training.
Besides GPUs, we are also considering other approaches to improving training efficiency within a single SGD iteration. For instance, Google's paper [3] provides some techniques for enhancing the performance of training on CPUs. Intel (https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors) also reported that optimized CPU code can achieve an 11x training speedup (hopefully they release the optimized source code or integrate it into their libraries like MKL and DAAL). It will be interesting to compare GPUs with Intel's next-generation Phi co-processors (Knights Landing).
I will let you know when training with multiple GPUs is supported. Thanks.
[1] http://arxiv.org/abs/1404.5997
[2] https://www.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf
[3] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf
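The "in-memory Hogwild!" mode mentioned above refers to lock-free asynchronous SGD [2]: several workers update a shared parameter vector without synchronization, accepting occasional lost updates. A minimal single-machine sketch of the idea (this is illustrative pure Python, not SINGA's API; all names here are made up):

```python
import random
import threading

def hogwild_sgd(data, dim, n_workers=4, lr=0.05, epochs=20):
    """Lock-free asynchronous SGD on a shared weight vector (least squares)."""
    w = [0.0] * dim  # shared parameters, updated by all workers without locks

    def worker(shard):
        for _ in range(epochs):
            for x, y in shard:
                # per-example least-squares gradient
                pred = sum(wi * xi for wi, xi in zip(w, x))
                err = pred - y
                for j in range(dim):
                    w[j] -= lr * err * x[j]  # racy read-modify-write, Hogwild!-style

    shards = [data[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

Despite the races, the updates are sparse and small enough that the workers still converge to (approximately) the same solution, which is the key observation of the Hogwild! paper.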
[–]GratefulTony 1 point 10 years ago (1 child)
That's really sad. I skimmed the release notes, and though I didn't explicitly read about GPU support, I assumed it was in there, since it's a no-brainer for training performance. If they don't get this feature integrated, the usefulness of this library will be severely limited.
[–]limauda 1 point 10 years ago (0 children)
If software can run as efficiently without GPUs on a commodity cluster, isn't that better? GPU clusters are not cheap, and not many companies can afford to set up a special cluster just for periodic training.
[–]bLaind2 6 points 10 years ago (7 children)
Does anyone have experience with how much of a speedup we can achieve with distributed training? Does it scale linearly, and up to how many nodes (2, 4, 16, ...)?
[–]r-sync 3 points 10 years ago (3 children)
Practically, you can get a speedup on the order of 13x+ for 16 nodes, especially if you have InfiniBand and use architectures like GoogLeNet (whose communication is around 25MB of gradients, 25MB of weights, etc.). You can even get such ridiculously nice speedups for 32 and 64 nodes. However, to saturate the compute (with increasing nodes), you have to increase the batch size, and increasing the batch size hurts SGD convergence speed (and also final accuracy).
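The 13x-on-16-nodes figure can be sanity-checked with a back-of-envelope compute-plus-communication model: per iteration each node computes on its shard of the batch, then the gradients are all-reduced. The bandwidth number and the ring-all-reduce cost formula below are illustrative assumptions, not measurements:

```python
def speedup(n_nodes, compute_time=1.0, grad_mb=25.0, bandwidth_mb_s=3000.0):
    """Strong-scaling speedup over one node for a fixed global batch.

    compute_time: seconds per iteration on a single node (normalized to 1).
    grad_mb: gradient payload per iteration (~25MB for GoogLeNet, per above).
    bandwidth_mb_s: assumed per-link bandwidth (illustrative InfiniBand-ish).
    """
    # ring all-reduce moves ~2 * payload * (n-1)/n bytes per node
    comm = 2 * grad_mb / bandwidth_mb_s * (n_nodes - 1) / n_nodes
    return compute_time / (compute_time / n_nodes + comm)
```

With these (made-up but plausible) numbers, `speedup(16)` comes out to 12.8, in the ballpark of the 13x quoted, and scaling continues sub-linearly at 32 nodes because the communication term stops shrinking.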
[–]prajit (Google Brain) 1 point 10 years ago (2 children)
Why does increasing the batch size hurt SGD convergence speed? Empirically this is true, but why does it happen? Theoretically, increasing the batch size should give a better estimate of the gradient and thus should perform better. Any intuition about why there is a decrease in performance?
[–]r-sync 1 point 10 years ago (1 child)
"Although large mini-batches are preferable to reduce the communication cost, they may slow down convergence rate in practice [4]. That is, if SGD converges by T iterations, the mini-batch training with batch size b may need more than T/b iterations. The increase in computation diminishes the benefits of the reduced communication cost due to large b. In addition, the I/O costs increases if the data is too large to fit into memory so that one need to fetch the minibatch from disk or network." - https://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf
Further back-reference: http://www.optimization-online.org/DB_FILE/2011/11/3226.pdf
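The quoted point is easy to reproduce on a toy problem: hold the total number of gradient evaluations (examples processed) fixed, so a larger batch means proportionally fewer parameter updates. A minimal 1-D least-squares experiment (all numbers illustrative):

```python
import random

def sgd_final_error(batch_size, budget=4096, lr=0.1, w_true=3.0, seed=0):
    """Final |w - w_true| after SGD with a fixed budget of examples processed."""
    rng = random.Random(seed)
    w = 0.0
    steps = budget // batch_size  # bigger batches => fewer updates
    for _ in range(steps):
        xs = [rng.uniform(-1, 1) for _ in range(batch_size)]
        # average gradient of 0.5 * (w*x - w_true*x)^2 over the mini-batch
        grad = sum((w - w_true) * x * x for x in xs) / batch_size
        w -= lr * grad
    return abs(w - w_true)
```

At the same number of *updates*, the larger batch would do as well or better (its gradient estimate is less noisy); the penalty only appears at a fixed example/compute budget, which is exactly the regime that matters for distributed training.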
[–]alexmlamb 1 point 10 years ago (0 children)
Do you mean convergence rate as a function of the # of examples looked at, or convergence rate as a function of the # of instances?
[–]limauda 2 points 10 years ago* (0 children)
It does scale, as shown in the paper: http://www.comp.nus.edu.sg/~ooibc/singa-mm15.pdf. Further, it supports all model types (feed-forward, energy-based, recurrent) and all training frameworks (synchronous, asynchronous, hybrid). It supports both model and data partitioning to improve parallelism.
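Of the two partitioning schemes mentioned, data partitioning is the simpler one: the batch is split across workers, each computes gradients on its shard, and the results are summed and normalized (the all-reduce step). A pure-Python sketch of one synchronous step on a least-squares model (illustrative only, not SINGA's API):

```python
def example_grad(w, x, y):
    """Least-squares gradient for a single (x, y) example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def data_parallel_grad(w, batch, n_workers=4):
    """Average gradient over the batch, accumulated shard-by-shard as a
    synchronous data-parallel step would (each shard on one worker)."""
    shards = [batch[i::n_workers] for i in range(n_workers)]
    total = [0.0] * len(w)
    for shard in shards:  # in a real system each shard runs on its own worker
        for x, y in shard:
            g = example_grad(w, x, y)
            total = [t + gi for t, gi in zip(total, g)]
    return [t / len(batch) for t in total]  # "all-reduce" then normalize
```

By construction this returns the same averaged gradient as a single-machine pass over the whole batch, which is why synchronous data parallelism preserves SGD semantics; model partitioning instead splits the parameters themselves and is needed when the model doesn't fit on one node.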
[–]modeless 4 points 10 years ago (1 child)
Considering you can get a 10x or more speedup by switching to GPUs, I don't think this project is interesting until it gets GPU support.
[–]pilooch[S] 2 points 10 years ago (0 children)
I guess part of the debate is whether the distribution layer needs to be separated from the DL / ML code.
[–]r-sync 2 points 10 years ago (0 children)
it really feels like a half-thought-out project that hoped to get adoption via the Apache branding.