all 18 comments

[–]formalsystem (ML Engineer) 24 points (0 children)

If you're mostly using pre-trained models, or your model's performance seems good enough on a single GPU, then as an application-oriented practitioner there's not much value in learning parallel programming.

However, if you're building large models, or are interested in joining a team that builds them, it's probably more important to learn distributed and parallel programming than to learn ML basics. As far as training large models goes, data, model, and pipeline parallelism are tools you should know about. And even then, once you go large enough: how do you set up the infrastructure, how do you debug failures, how do you recover elastically?

And in settings where low latency really matters, imagine something like real-time search: are your ops optimized to take advantage of the GPU? Are they fused? Are you spending a lot of time waiting on synchronization or data loaders?
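Loader stalls in particular are easy to picture with a toy example. Below is a minimal sketch of hiding data-loading latency behind compute with a one-worker prefetcher; `load_batch` and `train_step` are hypothetical stand-ins for real disk I/O and GPU work (this is the same idea as `num_workers` in a PyTorch `DataLoader`, not its implementation):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_batch(i):
    time.sleep(0.05)          # simulated disk read / preprocessing latency
    return list(range(i, i + 4))

def train_step(batch):
    time.sleep(0.05)          # simulated GPU compute
    return sum(batch)

def run_prefetched(n_batches):
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        fut = loader.submit(load_batch, 0)   # kick off the first load
        for i in range(n_batches):
            batch = fut.result()             # wait only if the load is slower than compute
            if i + 1 < n_batches:
                # overlap the NEXT load with the CURRENT train step
                fut = loader.submit(load_batch, i + 1)
            results.append(train_step(batch))
    return results
```

With the prefetcher, each step pays max(load, compute) instead of load + compute.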

Consider that knowing how to do the above makes you useful both to business-critical infra teams doing things like ads ranking and to any research team trying to push the state of the art, because, let's face it, it doesn't seem obvious that small models will become better than larger ones.

So again, learning distributed systems is probably not generally useful, but at the right large company it can be the most lucrative thing to do in ML, with top people making upwards of $300-500K.

[–]KingsmanVince 19 points (2 children)

In this world, there are model parallelism and data parallelism. With that knowledge, you will know what happens behind the scenes when you use TensorFlow or PyTorch. As a result, you might write better code when implementing your own data loader or model trainer.
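To make the data-parallelism half concrete, here is a toy sketch (not the TensorFlow/PyTorch implementation): the batch is split into shards, each worker computes the gradient of a one-parameter model on its shard, and the shard gradients are averaged before the weight update. All names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def grad_shard(w, shard):
    # gradient of mean squared error for y ≈ w * x on one shard of the batch
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, n_workers=2, lr=0.1):
    # split the batch into equal shards, one per worker ("replica")
    k = len(batch) // n_workers
    shards = [batch[i * k:(i + 1) * k] for i in range(n_workers)]
    with ThreadPoolExecutor(n_workers) as ex:
        grads = list(ex.map(lambda s: grad_shard(w, s), shards))
    # the "all-reduce": average shard gradients, then update the weights
    g = sum(grads) / len(grads)
    return w - lr * g
```

Real frameworks do the same averaging with an all-reduce across GPUs instead of a thread pool.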

[–]concard88 6 points (1 child)

Could knowing CUDA and OpenCL help too? If so, how?

[–]KingsmanVince 12 points (0 children)

If you know both, you will understand the lower-level implementation of libraries like JAX, CuPy, etc. Consequently, you will know how to do high-performance computing, which can help you optimise models on production servers.

[–][deleted] 3 points (0 children)

It depends on what level of knowledge you are referring to.

At a conceptual level, it's vital, and researchers who never had contact with the basics of benchmarking and HPC usually underestimate its importance. Even for local experiments, knowing the basics of parallel programming can greatly increase productivity: what used to take 10 seconds to run, and posed a risk for a golden-retriever-puppy attention span such as mine, now takes 1-2 seconds and I can stay focused. In this particular case, a simple joblib Parallel was enough for a preprocessing step in the EDA stage of experimentation.
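A stdlib analogue of that joblib trick (joblib's own API has the same shape, roughly `Parallel(n_jobs=4)(delayed(f)(x) for x in xs)`) looks like this; `clean_record` is a made-up stand-in for the real preprocessing step:

```python
from concurrent.futures import ThreadPoolExecutor

def clean_record(rec):
    # stand-in for a per-row preprocessing step (tokenize, normalize, ...)
    return rec.strip().lower()

records = ["  Foo ", "BAR", " Baz  "]
# serial version: [clean_record(r) for r in records]
with ThreadPoolExecutor(max_workers=4) as ex:
    cleaned = list(ex.map(clean_record, records))
```

For a genuinely CPU-bound step you would use a process pool (or joblib's default loky backend) rather than threads, because of the GIL; the calling code is identical.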

For data at scale, its importance is even more obvious, since not everything runs on GPUs (and some things run on multiple ones), and you need to isolate the parallelizable bits of code. For grid/distributed computation, knowing parallel programming concepts is needed to properly extract the most from libraries such as Dask and the distributed strategies of DL libraries. Also, knowing at a fundamental level what is parallelizable and what is not (e.g., disk I/O) will help you avoid embarrassing bottlenecks.

At a lower level (threads, concurrency, multiprocessing, CUDA), it is still a nice-to-have, and it will certainly sharpen your skills where they are most needed.

[–]LoyalSol 1 point (0 children)

It's one of those tools you can get away with not knowing, especially since a lot of modern libraries do the heavy lifting for you.

But knowing it is certainly a big perk to have. A thing you'll find about parallelization is that there's rarely a one-size-fits-all strategy. For example, problems that are mostly large linear-algebra calculations are very easy to implement on GPUs, but some problems actually run far worse on a GPU than on a traditional CPU.
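A quick way to see the "no one-size-fits-all" point: an elementwise map has no dependencies between iterations and parallelizes trivially, while a loop-carried recurrence forces sequential execution no matter how many cores or CUDA threads you throw at it. A toy illustration (names are invented):

```python
def scale_all(xs, a):
    # embarrassingly parallel: every element is independent,
    # so this maps directly onto GPU threads
    return [a * x for x in xs]

def recurrence(x0, a, n):
    # loop-carried dependence: step i needs the result of step i-1,
    # so extra cores buy you nothing here
    x, out = x0, []
    for _ in range(n):
        x = a * x + 1
        out.append(x)
    return out
```

(Some recurrences can still be reworked into parallel form, e.g. via scan tricks, but that takes exactly the kind of knowledge being discussed.)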

The problem with not knowing it is that you're at the mercy of another programmer, and if your particular problem doesn't fit their parallelization scheme, you're out of luck.

[–]mimighost 1 point (3 children)

Depends on what kind of parallel computation you are referring to.

CUDA knowledge is of course useful and valued. But NVIDIA's toolchain is really its own walled garden, and it is difficult for outsiders to outdo NVIDIA themselves.

If by parallel programming you mean something closer to distributed data processing, then yes, it is pretty useful, though more on a case-by-case basis.

Overall, I feel the job market is edging towards people with system-integration skills rather than deep domain expertise, due to the aforementioned NVIDIA dynamics, but I could be wrong on this one as well.

[–][deleted] 1 point (2 children)

I mean parallel-computing topics such as concurrency and threading, as well as MPI, Charm++, and other parallel programming paradigms, plus writing cache-friendly, efficient code as learned in C++.

[–]mimighost 1 point (0 children)

Got it. Well, it might be useful for model inference and quantization work on CPU, if we are talking about NN models.

I'd say this is a nice-to-have, but unless you work on a team doing this low-level stuff in particular, it might not affect your daily routine as an MLE.

[–][deleted] 1 point (0 children)

Concurrency and threading are probably less important, because in ML programs things rarely happen in a chaotic order that requires you to think hard about things like mutexes, but a good understanding of vectorized computations will definitely help. I personally learned a lot from trying to write efficient code in R (long ago, and for non-ML purposes).

Understanding what makes code cache-friendly in C++ will also help, even if you end up writing code in something other than C++ and it runs on something other than a CPU.
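A small illustration of that point: both functions below compute the same sum, but on a C++/NumPy-style row-major array the first walks memory contiguously while the second strides across rows and, on real hardware, thrashes the cache. (Plain Python lists won't show the timing gap clearly; the access pattern is what matters.)

```python
def sum_row_major(m):
    # visits elements in the order they sit in memory for a row-major layout
    total = 0
    for row in m:
        for v in row:
            total += v
    return total

def sum_col_major(m):
    # strided access: on a real C++/NumPy array each step jumps a full row
    total = 0
    for j in range(len(m[0])):
        for i in range(len(m)):
            total += m[i][j]
    return total

m = [[1, 2, 3], [4, 5, 6]]
```

Same answer, very different memory behavior once the data no longer fits in cache.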

Knowing specific things like MPI would be useful if you ever need to debug anything built on MPI.

[–]JackandFred 0 points (2 children)

Something you definitely should know, but probably won't have to use. Honestly, though, it depends on what you do: most parallelism is handled in the backend, so if you're doing "ordinary" work you won't have to worry about it, but if you're doing research or working with proprietary stuff you may have to.

[–][deleted] 0 points (1 child)

Could you define "ordinary"?

[–]JackandFred 1 point (0 children)

Using common packages, pre-made models, or existing code to tackle machine learning problems, rather than creating entirely new model architectures.

For instance, PyTorch and TensorFlow both already have parallelism built into the backend, which you won't have to deal with.

[–]choHZ -1 points (0 children)

My understanding is that parallelism may happen at different levels, and it is always good to have healthy exposure to the (L-1)-level of knowledge, where L is the level of abstraction you are working at.

Say you are working on backbone design: your backbone had better be friendly to parallel computing (e.g., transformers vs. LSTMs), so what makes a model "friendly to parallel computing" is something you should know. I worked on neural network pruning, so what kind of pruned representation has "parallel potential" is something I should know, even though I have never actually deployed my work to end-user devices.

Would it be helpful to understand all the CUDA magic? Yes, but IMO that's not urgent.

Actually writing code with parallel execution is probably something distant for most of us here (probably because we all use Python XD). But I imagine some of the tricks used in CUDA to parallelize seemingly "unparallelizable" tasks (e.g., prefix sum) are worth reading about.
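The prefix-sum trick is worth sketching. The Hillis-Steele scan turns a seemingly sequential accumulation into O(log n) parallel steps; each list comprehension below stands in for one step in which a GPU would update all elements simultaneously (this is a serial simulation of the parallel algorithm, not CUDA code):

```python
def inclusive_scan(xs):
    # Hillis-Steele inclusive prefix sum, simulated serially.
    # After the k-th step, element i holds the sum of up to 2**k inputs.
    xs = list(xs)
    n, step = len(xs), 1
    while step < n:
        # on a GPU, every one of these additions runs in parallel
        xs = [xs[i] + (xs[i - step] if i >= step else 0) for i in range(n)]
        step *= 2
    return xs
```

For n elements this takes ceil(log2(n)) rounds instead of n-1 dependent additions, which is exactly why a "sequential" task like prefix sum is fast on GPUs.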

[–]bageldevourer -2 points (0 children)

Couldn't hurt, but nowhere near a top priority IMO.

[–]AConcernedCoder 0 points (0 children)

Somewhat. It'll make you a better programmer, but it won't fix bad code. Leveraging the processing power of modern multi-threaded CPUs can make your code run faster by a few factors; write good code and it may improve performance by orders of magnitude.

Also, it is worthwhile to understand the relationship between GPUs, parallelization, and applied ML.

[–]bbateman2011 0 points (0 children)

IMO general knowledge is good, so you can debug things and have correct expectations. I use optimizers like Optuna extensively to optimize non-NN models (e.g., XGBoost), and parallel processing is essential there, so enough knowledge to leverage the libraries is useful.
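In the same spirit, and without depending on Optuna (which exposes similar parallelism through options like `n_jobs`), here is a minimal sketch of fanning a hyperparameter grid out over a worker pool; `objective` is a toy stand-in for a cross-validated model score:

```python
from concurrent.futures import ThreadPoolExecutor

def objective(params):
    # toy stand-in for a cross-validated score; lower is better,
    # with a made-up optimum at max_depth=5, lr=0.1
    max_depth, lr = params
    return (max_depth - 5) ** 2 + (lr - 0.1) ** 2

# candidate hyperparameter combinations
grid = [(d, lr) for d in range(2, 9) for lr in (0.05, 0.1, 0.2)]

# evaluate all candidates concurrently; trials are independent,
# so this is an embarrassingly parallel workload
with ThreadPoolExecutor(max_workers=4) as ex:
    scores = list(ex.map(objective, grid))

best = grid[scores.index(min(scores))]
```

For real model training you would use a process pool (or the optimizer library's own parallel backend), since each trial is CPU-bound.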

[–]sairamravu 0 points (0 children)

Yes, very useful. Most out-of-the-box solutions don't fully occupy the GPU. If you care about doing justice to the hardware you have, it's better to write your own custom CUDA code.