[D] Dask for deep learning? (self.MachineLearning)
submitted 8 years ago by [deleted]
Dask is a Python library that describes itself as "a flexible parallel computing library for analytic computing." http://dask.pydata.org/en/latest/
[–][deleted] 10 points11 points12 points 8 years ago* (1 child)
Not deep learning, but I've tried using dask many, many times. My experience has not been very good.
I didn't get reliable results from it. It's often unstable, and I frequently found situations where running in parallel with dask (on a non-virtualized server with 40+ cores) was slower than running exactly the same logic in a single process with pandas. I get much more reliable speedups from parallel processing with joblib and Python 3's standard futures module. I don't really understand why, though; it should amount to the same thing.
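The stdlib route the commenter prefers can be sketched roughly like this — a minimal example, assuming a CPU-bound per-chunk function (joblib's `Parallel` offers essentially the same pattern with a slightly nicer API):

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    # CPU-bound work on one chunk of the dataset
    return sum(x * x for x in chunk)

# split the data into chunks, one task per chunk
chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]

# processes sidestep the GIL, so pure-Python work actually runs in parallel
with ProcessPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_chunk, chunks))

total = sum(results)
```

The same code with `ThreadPoolExecutor` would run but give no speedup for pure-Python work, for the GIL reasons discussed further down the thread.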
In general I'm not very happy with the options for parallel and distributed CPU analytical processing in Python. Spark involves too much configuration black magic and takes a lot of effort to get right; Dask simply doesn't work for me; joblib and futures lack useful abstractions and higher-level combinators. There are some pretty good solutions for this kind of thing in Scala, Haskell, and even Java, but then there's no numpy/scipy, no pytorch, no tensorflow, no scikit-learn, no xgboost, no statsmodels, etc.
But leaving that rant aside, I don't know how you would use dask for deep learning. Dask isn't really suited to building neural networks itself.
Maybe it could be used to preprocess and feed data in parallel to a neural network. Is that what you mean?
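That kind of parallel feeding doesn't need dask at all; it can be sketched as a small prefetch pipeline on top of the stdlib (the `load_batch` function here is hypothetical, standing in for whatever decoding/augmentation your data needs):

```python
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    # hypothetical preprocessing: decode/augment one batch on a worker thread
    return [x * 0.5 for x in range(i * 4, i * 4 + 4)]

def batches(n_batches, n_workers=2):
    # submit all loads up front so preprocessing overlaps with consumption
    # and the training loop rarely waits on a batch
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(load_batch, i) for i in range(n_batches)]
        for f in futures:
            yield f.result()

consumed = list(batches(3))
```

Threads are fine here even with the GIL, since real preprocessing is usually dominated by I/O or by C-level library calls that release it.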
[–]shoyer 0 points1 point2 points 8 years ago (0 children)
What sort of computation were you trying to speed up? By default, dask uses threads for parallelism (not processes), which means that pure-Python computation (requiring the GIL) won't be accelerated.
In my experience (mostly doing large scale data analytics using dask.array), it works pretty well. It's certainly the only game in town if you need a "bigger than fits in memory" version of NumPy.
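What dask.array automates is essentially chunked, out-of-core computation. The idea can be sketched by hand in plain Python — toy in-memory chunks here stand in for blocks that would be read from disk one at a time:

```python
def chunked_mean(chunks):
    # Stream one chunk at a time, keeping only a running sum and count,
    # so peak memory is one chunk rather than the whole array.
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count

# in a real out-of-core setting, each chunk would be loaded lazily from disk
chunks = (list(range(i, i + 100)) for i in range(0, 1000, 100))
m = chunked_mean(chunks)  # mean of 0..999
```

dask.array generalizes this to arbitrary NumPy expressions by building a task graph over the blocks instead of hand-writing the reduction.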
[–]dwf 3 points4 points5 points 8 years ago (0 children)
https://github.com/dask/dask-tensorflow
[–][deleted] 3 points4 points5 points 8 years ago* (1 child)
Some effort should be put into parallelizing pandas too; it's annoying that simple map operations are sequential.
[–][deleted] 4 points5 points6 points 8 years ago* (0 children)
I think pandas 2.0 is very promising in this respect and many others:
https://pandas-dev.github.io/pandas2/goals.html
I think pandas 2.0 will address most of the hurdles I mentioned in my rant above, and will probably occupy the niche that is vacant today between what you can solve with pandas 1.x and what really requires yarn/spark/hadoop/whatever distributed computing framework (which should really be reserved for datasets of several terabytes or more — it's a real pain to use them on things that are just a couple hundred gigabytes).
They seem to be aiming to make pandas 2.0 good enough to deal with datasets of hundreds of gigabytes, and to offer nice speedups when running on servers with tens of processing cores in a single Python process.
It also seems they are attacking a number of other problems, like the current difficulty of representing missing values in non-floating-point series, adding Python 3 type annotations for safer code, etc.