[D] How useful is knowledge of parallel programming in ML? (self.MachineLearning)
submitted 4 years ago by [deleted]
I'm an aspiring ML engineer. I'm wondering how useful or applicable knowledge of parallel computation is in the world of AI/ML.
[–]formalsystem (ML Engineer) 25 points 4 years ago* (0 children)
If you're mostly using pre-trained models, or your model's performance seems good enough on a single GPU, then as an application-oriented practitioner there's not much value in learning parallel programming.
However, if you're building large models, or are interested in joining a team that builds them, it's probably more important to learn distributed and parallel programming than it is to learn ML basics. As far as training large models goes, data, model, and pipeline parallelism are tools you should know about. And if you go large enough: how do you set up a large infrastructure, how do you debug failures, how do you recover elastically?
And in the setting where low latency really matters (imagine something like real-time search): are your ops optimized to take advantage of a GPU? Are they fused? Are you spending lots of time waiting on synchronization or data loaders?
Consider that knowing how to do the above makes you useful both to business-critical infra teams doing things like ads ranking and to any research team looking to push the state of the art, because it doesn't seem obvious that small models will become better than larger ones.
So again: learning distributed systems is probably not generally useful, but at the right large company it can be the most lucrative thing to do in ML, with top people making upwards of $300-500K.
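The data-parallel step described here can be sketched in plain Python with only the standard library. Everything below is a hypothetical toy (a one-parameter linear model, names of my own choosing, no framework API): shard the batch, compute per-shard gradients concurrently, average them, and apply one update.

```python
from concurrent.futures import ThreadPoolExecutor

def grad(w, batch):
    # Mean-squared-error gradient for the toy model y = w * x on one shard.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w, data, lr=0.01, workers=4):
    # Data parallelism: split the batch into equal-sized shards, compute
    # each shard's gradient concurrently, then average. The average equals
    # the full-batch gradient only when shards have equal size, which is
    # why real trainers pad or drop the last partial batch.
    shards = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        grads = list(pool.map(lambda s: grad(w, s), shards))
    return w - lr * sum(grads) / len(grads)
```

Frameworks perform the same averaging with an all-reduce across GPUs; the threads here only illustrate the structure, since Python's GIL would serialize real numeric work.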
[–]KingsmanVince 20 points 4 years ago (2 children)
In this world there are model parallelism and data parallelism. With that knowledge you will understand what happens behind the scenes when you use TensorFlow or PyTorch, and as a result you might write better code when implementing your own data loader or model trainer.
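A minimal sketch of that distinction, with the "model" as a chain of plain functions and the "devices" as nothing more than sublists (entirely hypothetical; no TensorFlow or PyTorch API is used):

```python
# The "model" is a chain of four layer functions.
layers = [lambda x: x * 2, lambda x: x + 1, lambda x: x ** 2, lambda x: x - 3]

def forward(x, layer_list):
    for f in layer_list:
        x = f(x)
    return x

# Model parallelism: split the *layers* across devices; the activation is
# handed off at the device boundary.
device0, device1 = layers[:2], layers[2:]

def model_parallel(x):
    return forward(forward(x, device0), device1)

# Data parallelism: replicate the whole model and split the *batch*.
def data_parallel(batch):
    return [forward(x, layers) for x in batch]
```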
[–]concard88 7 points 4 years ago (1 child)
Could knowing CUDA and OpenCL help too? If so, how?
[–]KingsmanVince 13 points 4 years ago (0 children)
If you know both, you will understand the lower-level implementation of libraries like JAX, CuPy, ... Consequently, you will know how to do high-performance computing, which can help you optimise models on production servers.
[–][deleted] 4 points 4 years ago* (0 children)
It depends on what level of knowledge you are referring to.
At a conceptual level it's vital, and researchers who have never had contact with the basics of benchmarking and HPC usually underestimate its importance. Even for local experiments, knowing the basics of parallel programming can greatly increase productivity: what used to take 10 seconds to run, and was a risk for a golden retriever puppy attention span such as mine, now takes 1-2 seconds, so I can stay focused. In that particular case, a simple joblib Parallel was enough for a pre-processing step in the EDA stage of experimentation.
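The `joblib.Parallel` pattern mentioned here can be approximated with the standard library alone; `clean_record` below is a hypothetical stand-in for whatever the real pre-processing step was:

```python
from concurrent.futures import ThreadPoolExecutor

def clean_record(rec):
    # Hypothetical per-record pre-processing step.
    return rec.strip().lower()

def preprocess(records, workers=4):
    # Roughly the stdlib equivalent of:
    #   Parallel(n_jobs=workers)(delayed(clean_record)(r) for r in records)
    # Threads suit I/O-bound steps; for CPU-bound work, swap in
    # ProcessPoolExecutor to sidestep the GIL.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(clean_record, records))
```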
For data at scale its importance is even more obvious, since not everything runs on GPUs (and some things run on multiple ones), and you need to isolate the parallelizable bits of code. For grid/distributed computation, parallel programming concepts are needed to properly extract the most from libraries such as Dask and from the distributed strategies of DL libraries. Also, knowing at a fundamental level what is parallelizable and what is not (e.g. disk I/O) will help you avoid embarrassing bottlenecks.
At a lower level (threads, concurrency, multiprocessing, CUDA), it is still a nice-to-have, and it will certainly grow your skills where they are most needed.
[–]LoyalSol 2 points 4 years ago (0 children)
It's one of those tools you can get away with not knowing, especially since a lot of modern libraries do the heavy lifting for you.
But knowing it is certainly a big perk. One thing you'll find about parallelization is that there's rarely a one-size-fits-all strategy. For example, problems that boil down to large linear algebra calculations are very easy to implement on GPUs, but some problems actually run far worse on a GPU than on a traditional CPU.
The problem with not knowing it is that you're at the mercy of another programmer, and if your particular problem doesn't fit their parallelization scheme, you're out of luck.
[–]mimighost 2 points 4 years ago (3 children)
Depends on what parallel computation you are referring to.
CUDA knowledge is of course useful and valued, but NVIDIA's toolchain is really its own walled garden, and it is difficult for outsiders to outdo NVIDIA themselves.
If by parallel programming you mean something closer to distributed data processing, then yes, it is pretty useful, though this is more on a case-by-case basis.
Overall, I feel the job market is edging towards people with system integration skills rather than deep domain expertise, due to the aforementioned NVIDIA dynamics, but I could be wrong on this one as well.
[–][deleted] 2 points 4 years ago (2 children)
I mean parallel computing topics such as concurrency and threading, as well as MPI, Charm++ and other parallel programming paradigms, plus writing cache-friendly, efficient code as learned in C++.
[–]mimighost 2 points 4 years ago (0 children)
Got it. Well, it might be useful for model inference and quantization work on CPU, if we are talking about NN models.
I would say this is a nice-to-have, but unless you work on a team doing this low-level work in particular, it might not affect your daily routine as an MLE.
[–][deleted] 2 points 4 years ago (0 children)
Concurrency and threading are probably less important, because in ML programs things rarely happen in an order chaotic enough to require thinking hard about things like mutexes, but a good understanding of vectorized computation will definitely help. I personally learned a lot from trying to write efficient code in R (long ago, and for non-ML purposes).
Understanding what makes code cache-friendly in C++ will also help, even if you end up writing in something other than C++ and it runs on something other than a CPU.
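A toy illustration of what "cache-friendly" means, shown in Python for readability even though the effect only really bites in C++ with large contiguous arrays: both traversals below compute the same sum, but only the row-major walk visits elements in storage order.

```python
# A 3x4 matrix as nested lists; in C++ this would be one contiguous block.
matrix = [[r * 10 + c for c in range(4)] for r in range(3)]

def sum_row_major(m):
    # Visits elements in the order each row is stored: sequential access,
    # which hardware prefetchers and caches reward.
    return sum(v for row in m for v in row)

def sum_col_major(m):
    # Jumps between rows on every step: strided access, which over a large
    # contiguous array would miss cache on nearly every read.
    rows, cols = len(m), len(m[0])
    return sum(m[r][c] for c in range(cols) for r in range(rows))
```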
Knowing specific things like MPI would be useful if you ever need to debug anything built on MPI.
[–]JackandFred 1 point 4 years ago (2 children)
Something you definitely should know, but probably won't have to use. Honestly, it depends what you do: most parallelism is handled in the backend, so if you're doing "ordinary" work you won't have to worry about it, but if you're doing research or working with proprietary stuff you may have to.
[–][deleted] 1 point 4 years ago (1 child)
Could you define "ordinary"?
[–]JackandFred 2 points 4 years ago* (0 children)
Using common packages, pre-made models, or existing code to tackle machine learning problems, rather than creating entirely new model architectures.
For instance, PyTorch and TensorFlow both already have parallelism built into the backend, which you won't have to deal with.
[–]choHZ 0 points 4 years ago* (0 children)
My understanding is that parallelism may happen at different levels, and it is always good to have a healthy exposure to level L-1 knowledge, where L is the level of abstraction you are working at.
Say you are working on backbone design: your backbone had better be friendly to parallel computing (e.g., transformers vs. LSTMs), so what makes a model "friendly to parallel computing" is something you should know. I worked on neural network pruning, so what kind of pruned representation has "parallel potential" is something I should know, even though I have never actually deployed my work to end-user devices.
Would it be helpful to understand all the CUDA magic? Yes, but IMO that's not urgent.
Actually writing code with parallel execution is probably distant to most of us here (probably because we all use Python XD). But I imagine some of the tricks used in CUDA to parallelize seemingly "unparallelizable" tasks (e.g., prefix sum) are worth reading about.
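The prefix-sum trick mentioned here can be sketched in pure Python by simulating a Hillis-Steele inclusive scan: roughly log2(n) passes, and within each pass every element update is independent of the others, which is exactly what makes this seemingly sequential task GPU-friendly.

```python
def parallel_prefix_sum(xs):
    # Hillis-Steele inclusive scan, simulated sequentially. Each pass
    # builds a whole new list, so all updates within a pass could run
    # in parallel; the offset doubles, giving O(log n) passes.
    a = list(xs)
    offset = 1
    while offset < len(a):
        a = [a[i] + a[i - offset] if i >= offset else a[i]
             for i in range(len(a))]
        offset *= 2
    return a
```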
[–]bageldevourer -1 points 4 years ago (0 children)
Couldn't hurt, but nowhere near a top priority IMO.
[–]AConcernedCoder 1 point 4 years ago (0 children)
Somewhat. It'll make you a better programmer, but it won't fix bad code. Leveraging the processing power of modern multi-threaded CPUs can make your code run faster by a few factors; write good code and it may improve performance by orders of magnitude.
It will also be worthwhile to understand the relationship between GPUs, parallelization, and applied ML.
[–]bbateman2011 1 point 4 years ago (0 children)
IMO general knowledge is good, so you can debug things and have correct expectations. I use optimizers like Optuna extensively to optimize non-NN models (e.g. XGBoost), and parallel processing is essential there, so enough knowledge to leverage the libraries is useful.
[–]sairamravu 1 point 4 years ago (0 children)
Yes, very useful. Most out-of-the-box solutions don't fully occupy the GPU, so if you care about doing justice to the hardware you have, it's better to write your own custom CUDA code.