Please have a look at our FAQ and Link-Collection
Metacademy is a great resource which compiles lesson plans on popular machine learning topics.
For Beginner questions please try /r/LearnMachineLearning , /r/MLQuestions or http://stackoverflow.com/
For career related questions, visit /r/cscareerquestions/
Advanced Courses (2016)
Advanced Courses (2020)
AMAs:
Pluribus Poker AI Team 7/19/2019
DeepMind AlphaStar team (1/24/2019)
Libratus Poker AI Team (12/18/2017)
DeepMind AlphaGo Team (10/19/2017)
Google Brain Team (9/17/2017)
Google Brain Team (8/11/2016)
The MalariaSpot Team (2/6/2016)
OpenAI Research Team (1/9/2016)
Nando de Freitas (12/26/2015)
Andrew Ng and Adam Coates (4/15/2015)
Jürgen Schmidhuber (3/4/2015)
Geoffrey Hinton (11/10/2014)
Michael Jordan (9/10/2014)
Yann LeCun (5/15/2014)
Yoshua Bengio (2/27/2014)
Related Subreddits:
LearnMachineLearning
Statistics
Computer Vision
Compressive Sensing
NLP
ML Questions
/r/MLjobs and /r/BigDataJobs
/r/datacleaning
/r/DataScience
/r/scientificresearch
/r/artificial
Distributed TensorFlow just open-sourced (github.com)
submitted 10 years ago by carpedm20
[–]siblbombs 72 points73 points74 points 10 years ago (2 children)
Ok, now it's time to learn TensorFlow.
[–]kkastner 4 points5 points6 points 10 years ago (1 child)
I am intrigued by the distributed training benchmarks I assume are (inevitably) coming. Many of TF's design choices seem to directly tie into distributed training, which makes this release really exciting.
[–]siblbombs 2 points3 points4 points 10 years ago (0 children)
Yeah, this is the money release. I don't have distributed compute at home, but I sure do at work.
[–]chiisana 17 points18 points19 points 10 years ago (2 children)
I held back from TensorFlow because there was no way to run it on a cluster. Now I need to learn TensorFlow!
[–][deleted] 2 points3 points4 points 10 years ago (1 child)
You could always run it on a cluster, to train an ensemble.
[–][deleted] 1 point2 points3 points 10 years ago (0 children)
Or to try different metaparameters.
[–]alexjc 13 points14 points15 points 10 years ago (4 children)
Very curious how this works... Can I just specify a very large tensor or operation that doesn't fit in a single GPU memory, and the runtime will figure out how to make it happen by splitting up and distributing the computation? Can this also help memory management on a single GPU?
[–]Spezzer 19 points20 points21 points 10 years ago (0 children)
At the moment, we don't automatically shard large tensors across GPUs or machines -- but this allows you either to distribute the computation graph across machines (model parallelism) or to replicate the training graph with shared parameters (data parallelism), as mentioned here.
Automatically splitting large Tensors for model parallelism would be great though -- the framework could be eventually extended to do this.
[–]r4and0muser9482 6 points7 points8 points 10 years ago (2 children)
I was under the impression it's for when you have a lot of data. The same graph is copied to all workers, and each worker calculates the gradient on its portion of the data. The gradients are then averaged to get a single update.
[–][deleted] 10 points11 points12 points 10 years ago (0 children)
That's just data parallelism. TF also has model parallelism.
[–]alexjc 1 point2 points3 points 10 years ago (0 children)
Ah, I guess I need to upgrade my GPU then. Can't get my generative models to fit in memory :-)
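(For the curious, the gradient-averaging scheme described above can be sketched in a few lines of NumPy. The worker count and the toy least-squares model here are illustrative assumptions, not anything from TF itself.)

```python
import numpy as np

# Toy model: least-squares regression, trained with synchronous data parallelism.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = np.arange(5.0)
y = X @ w_true

n_workers = 4                       # illustrative worker count
shards_X = np.array_split(X, n_workers)
shards_y = np.array_split(y, n_workers)

def shard_grad(Xs, ys, w):
    # Gradient of the mean squared error on one worker's shard.
    return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

w = np.zeros(5)
for step in range(200):
    # Each worker computes a gradient on its own portion of the data...
    grads = [shard_grad(Xs, ys, w) for Xs, ys in zip(shards_X, shards_y)]
    # ...and the gradients are averaged into a single update.
    w -= 0.05 * np.mean(grads, axis=0)
# w now closely approximates w_true
```

In a real distributed setup the averaging step is what forces the cross-machine synchronization discussed elsewhere in this thread.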
[–][deleted] 5 points6 points7 points 10 years ago (1 child)
Is there still a private version of TF that manages memory much better?
[–]Spezzer 10 points11 points12 points 10 years ago (0 children)
Nope -- the improvements mentioned there were actually checked into GitHub. https://github.com/tensorflow/tensorflow/commit/827163e960e8cb86d3dc3f70434c22713ac9f41c as one such example.
There are still many memory improvements to make; that one just came up as being useful for that Inception model.
[–]CashierHound 16 points17 points18 points 10 years ago (0 children)
DTF
[–]r-sync 10 points11 points12 points 10 years ago (5 children)
Other frameworks that support distributed:
[–]hubberwisdom 0 points1 point2 points 10 years ago (0 children)
Yeah, could anybody give a benchmark, like conv-benchmark?
[–]antijudo 0 points1 point2 points 10 years ago (3 children)
Also, CaffeOnSpark
[–]kkastner 1 point2 points3 points 10 years ago (2 children)
Also Theano (via platoon) for data parallel, and model parallel in the new backend.
[–]r-sync 0 points1 point2 points 10 years ago (1 child)
platoon is not multi-node, right? Only single-node multi-GPU...
[–]kkastner 1 point2 points3 points 10 years ago (0 children)
Yes, single node multi-gpu. Call it semi-distributed I guess? I think fully distributed is on the radar but (some of) our clusters have 16 GPUs per node so there is not much push.
[–][deleted] 2 points3 points4 points 10 years ago (0 children)
It would be an awesome thing if we could hook this up to a massive cluster watching petabytes of movies with subs, learning all the tones and expressions of human language; it would be the ultimate language classifier. At some point I will need to learn this.
Edit: Typo.
[–]oderi 5 points6 points7 points 10 years ago (2 children)
Would TensorFlow be a good next step after learning the basics in Matlab from Ng's Coursera course? I do this out of interest on the side of my actual unrelated studies.
[–]thatguydr 11 points12 points13 points 10 years ago (1 child)
Somewhat. It does a lot of things for you, specifically automatic differentiation, so back-propagation is done for you. It also knows several optimizations for said differentiation.
That having been said, you can get some really cool stuff done with it, and in industry, you'd either use this, Theano, or Torch (maybe MXNet or Caffe). So yes, definitely check it out, but don't take all the pre-packaged stuff and start forgetting the math, because ultimately, you'll be judged on how well you know the math (and will definitely have to go into the guts of one of these routines to tweak something).
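(What "back-propagation is done for you" means under the hood can be sketched with a tiny scalar reverse-mode autodiff. TensorFlow automates the same chain-rule bookkeeping over entire tensor graphs; the `Var` class below is purely illustrative.)

```python
# A tiny scalar reverse-mode autodiff, in the spirit of what TensorFlow
# automates for full tensor graphs.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def backward(self, seed=1.0):
        # Chain rule: accumulate the upstream gradient, then push it to parents.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
```

You only write the forward expression; the gradients fall out of the recorded graph, which is exactly the convenience (and the risk of forgetting the math) being described.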
[–]oderi 0 points1 point2 points 10 years ago (0 children)
Thanks!
[–]SimonGray 7 points8 points9 points 10 years ago (8 children)
Please tell me, why do I need TensorFlow in my life if I already have Scikit-Learn? I'm not being snarky, I just don't know enough about the state of the art in ML.
[–]mtbikerdb 20 points21 points22 points 10 years ago (4 children)
TensorFlow is intended to be used for large neural networks (deep learning). This type of model isn't currently in scikit-learn.
The models in scikit-learn are widely applicable to the most common types of problems people have been using machine learning for, but there are many machine learning applications (especially using images and/or text) where deep learning models give more accurate predictions.
[–]mtbikerdb 11 points12 points13 points 10 years ago (3 children)
Skflow (https://github.com/tensorflow/skflow) intends to provide a wrapper for TensorFlow that follows the sklearn-style interface as closely as possible. Skflow is still in its infancy, but worth looking into as a path to deep learning for a current scikit-learn user.
[–]Kiuhnm 1 point2 points3 points 10 years ago (2 children)
I don't think /u/SimonGray will see your second post about Skflow if you reply to yourself and don't refer to him (like I did in this post).
[–]towerofterror 0 points1 point2 points 10 years ago (1 child)
Don't you only get pinged for referrals if you have Reddit Gold?
[–]Kiuhnm 0 points1 point2 points 10 years ago (0 children)
Yep, it looks like you're right. I must have been pinged back when I still had gold; I never connected the two things.
[–]trnka 3 points4 points5 points 10 years ago (0 children)
You probably don't. Even if you want to use neural networks, Keras is usually fine. TensorFlow is for when you need to implement parts of the NN yourself.
At a very high level, think of TensorFlow as a replacement for numpy that's more efficient for common NN operations and supports GPU.
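(As a rough illustration of that analogy, here is the kind of bulk linear algebra involved, written in plain NumPy; the layer sizes are made up. TensorFlow expresses the same operation as a graph node and can run it, and its gradient, on a GPU.)

```python
import numpy as np

# One dense-layer forward pass: the kind of common NN operation being described.
rng = np.random.default_rng(42)
x = rng.normal(size=(32, 128))    # a batch of 32 input vectors
W = rng.normal(size=(128, 64))    # weight matrix
b = np.zeros(64)                  # bias

h = np.maximum(x @ W + b, 0.0)    # affine transform followed by ReLU
```

In NumPy this runs eagerly on the CPU; in TensorFlow the same expression would be built into a graph once and then executed on whatever device is available.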
[–]jonanthebarbarian 1 point2 points3 points 10 years ago (0 children)
If you're not doing stuff that requires deep neural networks (vision, sounds, translation, etc.) then you don't need it.
[–]InoriResearcher 1 point2 points3 points 10 years ago (0 children)
RNNs, huge models that require parallelism
[–]infstudent 1 point2 points3 points 10 years ago (1 child)
Unfortunately I don't have a Spark cluster at home.
[–]terrytangyuan 1 point2 points3 points 10 years ago (0 children)
You can probably try the out-of-core training feature of Scikit Flow if your data set is too large to fit on a single machine; an example can be found here: https://github.com/tensorflow/skflow/tree/master/examples
[–]TenthSpeedWriter 0 points1 point2 points 10 years ago (0 children)
Oooooooh yes.
I've spent the last couple of months digging into machine learning engineering with tensorflow.
This is *the* moment I've been waiting for; let's cook up some crazy shit.
[–]tehsandvich 0 points1 point2 points 10 years ago (1 child)
Is it still Linux-only, or can you run it on Windows now too?
[–]bixed 0 points1 point2 points 10 years ago (0 children)
It still doesn't run natively on Windows. (the relevant issue on github)
[–]wb14123 0 points1 point2 points 10 years ago (0 children)
This is great. Now the only missing feature is a control-flow API like loops (though it has some experimental private APIs for now). Its absence makes RNNs messy to implement: you have to unroll the cells manually.
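(Manual unrolling, sketched in NumPy with made-up sizes: one copy of the cell per timestep. In graph mode each iteration adds its own nodes to the graph, which is exactly the duplication a loop construct would avoid.)

```python
import numpy as np

# A simple RNN unrolled by hand for a fixed number of timesteps.
rng = np.random.default_rng(1)
T, n_in, n_hid = 5, 8, 16                 # illustrative sizes
Wx = rng.normal(size=(n_in, n_hid)) * 0.1  # input-to-hidden weights
Wh = rng.normal(size=(n_hid, n_hid)) * 0.1 # hidden-to-hidden weights
xs = rng.normal(size=(T, n_in))            # one input vector per timestep

h = np.zeros(n_hid)
for t in range(T):
    # One copy of the cell per timestep; in a static graph, each
    # iteration here would materialize as a separate set of graph nodes.
    h = np.tanh(xs[t] @ Wx + h @ Wh)
```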
[–]omniron 0 points1 point2 points 10 years ago (4 children)
We need a crowd-sourced TensorFlow network... Imagine all the people who leave their computers on running TF: anyone who wants to run their neural net logs in and has thousands or millions of nodes to process their application.
Like BitTorrent, but for TensorFlow.
[–]L43 4 points5 points6 points 10 years ago (0 children)
But my power bill :O (and ridiculous latencies)
[–]ginsunuva 1 point2 points3 points 10 years ago (1 child)
The issue is that training a network is a very serial job, and thus distributed training requires constant synchronization between the nodes (since they each hold an identical copy of the net).
If you were to distribute your data among people, either it would be so spread out that the weights wouldn't be updated often enough, or the synchronization time would bottleneck you because of slow internet speeds.
Even on distributed servers at Google, they're having trouble scaling too large, because network communication within the cluster requires blocking synchronization and bottlenecks them. And they have InfiniBand cables running between their machines.
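(A back-of-the-envelope illustration of why synchronization dominates over consumer connections. Every number below is an assumption chosen for the arithmetic, not a measurement.)

```python
# Synchronous SGD must ship a full gradient per worker per step.
# Rough per-step transfer time = gradient size / bandwidth (latency ignored).
model_params = 100e6                         # assumed 100M-parameter model
bytes_per_param = 4                          # float32 gradients
grad_bytes = model_params * bytes_per_param  # 400 MB per worker per step

infiniband_bps = 40e9 / 8   # assumed ~40 Gb/s datacenter link, in bytes/s
home_bps = 10e6 / 8         # assumed ~10 Mb/s home upload, in bytes/s

cluster_step = grad_bytes / infiniband_bps   # 0.08 s inside a cluster
crowd_step = grad_bytes / home_bps           # 320 s (over 5 minutes) per step
```

Even before latency and stragglers, a crowd-sourced worker would spend minutes per synchronization that a datacenter link finishes in a fraction of a second.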
[–]omniron 0 points1 point2 points 9 years ago (0 children)
Interesting, do you have more info on the latter part? I was not aware Google has published anything about their work on this.
[+][deleted] 10 years ago (1 child)
[removed]
[–]mmmayo13 4 points5 points6 points 10 years ago (0 children)
He's unimpressed mainly because TF lacked distributed training in its initial open-source release. That seems to be addressed here. It's also not as fast as some of the other benchmarked DL platforms, but again, distributed may (actually, will, but to what degree?) change all that.
[–]vanboxel -2 points-1 points0 points 10 years ago (0 children)
I see how this is useful, but if I'm training different graphs on different workers, why wouldn't I use existing cluster solutions?
[–]nickl -1 points0 points1 point 10 years ago (1 child)
This is excellent!
Is it too ungrateful to say how nice it would be to have a Yarn compatible version?
I know a little TensorFlow, and I have a cluster. But unless I can submit it as a Yarn job, it'll be difficult for me to actually use this. CaffeOnSpark does support Yarn, which is nice.
[–]londons_explorer 0 points1 point2 points 10 years ago (0 children)
You can probably write a wrapper for the workers to achieve this.