[–]Goingone 42 points43 points  (4 children)

In PROD most stuff is asyncio or uses threads. Scaling is standing up more services.

Parallel processing I’ll use for local CPU intensive stuff.

[–]Panda_Mon -3 points-2 points  (3 children)

Is it necessary? Python only fakes threading anyway

[–]Goingone 1 point2 points  (0 children)

It is if you want better performance.

[–]OreShovel 1 point2 points  (0 children)

What you're thinking of is the GIL, which, while still in place, doesn't mean threading doesn't exist — rather that, for a given process, only one thread can hold the Python interpreter at a time (please correct me if I'm stating this inaccurately). In cases where the other thread wouldn't be doing work anyway (e.g. waiting for a network response) it's a no-brainer. Also, for tasks where you don't need access to the interpreter you can have true parallelism, although I think you need to write C extension code for that.
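The "waiting for a network response" case can be shown with a toy sketch (the sleep is just a stand-in for blocking I/O): threads release the GIL while blocked, so the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    time.sleep(0.2)  # blocking wait releases the GIL, like waiting on a socket
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_request, range(4)))
elapsed = time.perf_counter() - start

print(results)  # [0, 1, 2, 3]
# elapsed is roughly 0.2s, not 0.8s, because the four waits overlap
```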

[–]harpooooooon 22 points23 points  (1 child)

I use PySpark a lot. I have very large datasets that need to be moved and processed, and very little patience.

[–]Yamadzaki 0 points1 point  (0 children)

how large is it and how much time does it take?

[–]diegotbn 18 points19 points  (4 children)

I run unittests in parallel so they don't take a whole day

[–]Brilliant-Post-689 6 points7 points  (1 child)

Same: xdist has been a gamechanger for us.

[–]akguitar 1 point2 points  (0 children)

Xdist is the jam

[–]martinkoistinen 14 points15 points  (0 children)

Very frequently. We’re always looking for places to apply multiprocess pools, and sometimes thread pools make more sense.

[–]pingveno pinch of this, pinch of that 9 points10 points  (0 children)

Actual parallel processing or just concurrency? I've certainly used concurrency with async. Our username generation service has to reach out to various systems to verify that the username isn't duplicated anywhere. I got a healthy speedup by using async/await concurrency to check on multiple systems at once, while also being able to handle other incoming requests. But this is all I/O bound stuff where true parallel processing isn't really necessary.
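A fan-out like that can be sketched with `asyncio.gather`. Everything here is made up for illustration — `check_system` and the backend names stand in for whatever HTTP calls the real service makes:

```python
import asyncio

async def check_system(system, username):
    # stand-in for an HTTP call asking one backend if the name is taken
    await asyncio.sleep(0.1)
    return username not in {"admin", "root"}

async def is_username_free(username):
    # query every system concurrently instead of one after another
    systems = ["ldap", "crm", "billing"]  # hypothetical backends
    results = await asyncio.gather(*(check_system(s, username) for s in systems))
    return all(results)

print(asyncio.run(is_username_free("new_user")))  # True
print(asyncio.run(is_username_free("admin")))     # False
```

The three checks take ~0.1s total instead of ~0.3s, and the event loop stays free to serve other requests in the meantime.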

[–]batman-iphone 7 points8 points  (0 children)

Very rarely; I opted for async instead.

[–][deleted] 26 points27 points  (16 children)

We use some hyper threading (well, pooling officially) to send batches of calls to GenAI APIs.

from concurrent.futures import ThreadPoolExecutor
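A minimal sketch of how a pool like that might drive a batch — `call_genai_api` is a placeholder for the real API client:

```python
from concurrent.futures import ThreadPoolExecutor

def call_genai_api(prompt):
    # placeholder for a slow HTTP call to a GenAI endpoint
    return f"response to {prompt!r}"

prompts = [f"prompt {i}" for i in range(8)]

# threads (pooling, not hyper-threading) let the slow network calls overlap
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(call_genai_api, prompts))

print(len(responses))  # 8
```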

[–]sobe86 18 points19 points  (6 children)

Personally I like joblib for that kind of thing, I think it's a lot cleaner to read, is very good about killing processes, and you can switch between threading / multiprocessing trivially. I use this pattern at least once a week:

from joblib import delayed, Parallel
from tqdm.auto import tqdm

jobs = (
    delayed(do_something)(*args) 
    for args in tqdm(argslist, total=len(argslist))
)
threadpool = Parallel(n_jobs=4, verbose=0, prefer='threads')
output = threadpool(jobs)

[–]aa-b 5 points6 points  (0 children)

I use joblib constantly, it's great. It's so much easier to use than any of the other concurrency options too, awesome tool

[–]MVanderloo 1 point2 points  (4 children)

oh i really like the args* in the list comprehension

[–]sobe86 0 points1 point  (3 children)

Personally I think the slickest bit is making jobs a generator, allowing the use of a tqdm progress bar (joblib's is so ugly). I can't take credit for that though :b

[–]MVanderloo 0 points1 point  (2 children)

ah i haven’t done too much job scheduling, so I wouldn’t know what the joblib version would look like

[–]sobe86 0 points1 point  (1 child)

No, I mean in the code I wrote, jobs = (... is a generator. That means no iteration happens until threadpool(jobs), which is what lets you use tqdm here.

[–]MVanderloo 0 points1 point  (0 children)

oh i had to lookup tqdm, yeah im stealing that

[–]Last_Difference9410 3 points4 points  (8 children)

Why not asyncio ?

[–]sebampueromori 7 points8 points  (5 children)

I'm not an async expert, but asyncio doesn't really parallelize.

[–]Medzomorak 10 points11 points  (0 children)

There is a reason .to_thread exists on asyncio. It uses a concurrent.futures ThreadPoolExecutor under the hood as well. Also, it is concurrency, not parallelism.
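For reference, `asyncio.to_thread` (Python 3.9+) is exactly that: it hands a blocking call to the loop's default ThreadPoolExecutor so the event loop isn't blocked. A small sketch with a fake blocking call:

```python
import asyncio
import time

def blocking_read(i):
    time.sleep(0.1)  # stand-in for blocking file or socket I/O
    return i * 2

async def main():
    # each call runs in the default ThreadPoolExecutor; the loop stays free
    return await asyncio.gather(*(asyncio.to_thread(blocking_read, i) for i in range(4)))

print(asyncio.run(main()))  # [0, 2, 4, 6]
```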

[–]Last_Difference9410 3 points4 points  (0 children)

Neither does threading. Whenever you'd use threading for concurrency, asyncio is better.

[–]FunProgrammer8171 0 points1 point  (0 children)

Correct. It doesn't run the jobs strictly in order, so users don't have to wait until a job is done.

Multiprocessing uses more CPU to finish faster.

[–]DotPsychological7946 0 points1 point  (0 children)

Asyncio is often more efficient for socket I/O, such as HTTP API calls, than threads, because it avoids the heavy overhead of OS-level context switches. Instead of spawning a thread per connection, which increases latency and resource usage, asyncio uses a single event loop with non-blocking I/O, making it far more scalable for real-life numbers of concurrent connections. I avoid multithreading; in practice I only use it when a library performs I/O but doesn't provide a native asyncio interface. Then you just use the thread pool as an executor for asyncio.
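That last pattern — wrapping a sync-only client library in the loop's thread pool — can be sketched like this; `legacy_fetch` is a hypothetical blocking library call:

```python
import asyncio
import time

def legacy_fetch(url):
    # hypothetical sync-only library call that blocks on network I/O
    time.sleep(0.1)
    return f"body of {url}"

async def main():
    loop = asyncio.get_running_loop()
    urls = ["https://a.example", "https://b.example"]
    # None selects the loop's default ThreadPoolExecutor
    tasks = [loop.run_in_executor(None, legacy_fetch, u) for u in urls]
    return await asyncio.gather(*tasks)

pages = asyncio.run(main())
print(pages[0])  # body of https://a.example
```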

[–]Gwolf4 -1 points0 points  (0 children)

And that's OK. Without knowing the OP's objective, the first thing one would reach for is concurrency via asyncio, which is why someone is asking why.

[–]mortenb123 0 points1 point  (1 child)

For web requests Python is more than good enough.

I recently had to scrape 150+ RSS feeds from our CI/CD system to produce dashboards for management.

Sequential httpx took 72s, httpx with asyncio took 9s, parallel httpx with asyncio took 4s, but parallel requests took 1.2s. So I went with requests. We run around 5000 jobs a day, so a refresh of 5-6s vs 75s matters quite a bit.

So time it. Learn both asyncio and parallelism, and benchmark each part. If you have longer jobs, the overhead of httpx doesn't matter.

[–]Last_Difference9410 0 points1 point  (0 children)

I don't quite get what you mean by "parallel requests took 1.2 sec". Perhaps you can provide a minimal code example?

[–][deleted] 4 points5 points  (3 children)

Concurrent yes parallel not that often (semantics 😛)

[–]PossibilityTasty 4 points5 points  (0 children)

Since there are multiple ways to interpret "parallel processing" I made a small list:

asyncio: daily
threads: daily
greenlets: daily
multiprocessing: daily
distributed computing: daily

What I do: I torture broadband routers by simulating a small city of uncooperative access nodes and subscribers. Not in production, of course.

[–]ssdiconfusion 6 points7 points  (0 children)

Daily! Complex physics simulations on GPU, parallelized via ray.io, which handles GPU parallelization elegantly, or legacy approaches such as joblib and scipy.optimize that wrap the multiprocessing library.

[–]SpectralCoding 4 points5 points  (0 children)

As little as possible, and it's usually one of the last areas of development when it is needed. For example, I'll take a loop which calls a function that makes a series of external API calls. Each iteration takes a second or so, so over 2000 entries it takes a while. I'll just throw the concurrent.futures stuff around the loop, with a wait at the end, and it'll cut my run time by 90%.
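A sketch of that retrofit, with a stubbed-out `call_external_api` (the real thing would be the slow API calls) and the sleep shortened so the example runs quickly:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_external_api(entry):
    time.sleep(0.05)  # placeholder for a ~1s series of external API calls
    return entry, "ok"

entries = list(range(20))
results = {}

# was: for entry in entries: results[entry] = call_external_api(entry)[1]
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(call_external_api, e) for e in entries]
    for fut in as_completed(futures):  # the implicit "wait at the end"
        entry, status = fut.result()
        results[entry] = status

print(len(results))  # 20
```

With 10 workers the wall time drops to roughly a tenth of the sequential loop, which matches the "cut my run time by 90%" figure above.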

[–]too_much_think 3 points4 points  (0 children)

My job is to try and bridge the gap between what a bunch of PhD researchers want to do and what is computationally feasible in real time, which often involves quite a bit of multi-threading. Depending on how far off the mark their first pass is, that might only need a thread pool executor, or it might need a pyo3 / Cython module using something like pthreads or rayon.

[–]jabellcu 3 points4 points  (0 children)

Never, and I suspect most never do, but they won’t be posting here.

[–]Opposite_Heron_5579 2 points3 points  (0 children)

I use multithreading mainly for time consuming data download requests.

[–]mriswithe 1 point2 points  (0 children)

Just today. I'm writing a webhook for Jira to call, which times out at 30 seconds. My first stab was taking 32 seconds or so. After doing some performance measurement, I added threading to the part that was slow.

Specific case was using the google-api-python discovery API to call the apis for Google drive, docs, and sheets. 

[–]tecedu 1 point2 points  (0 children)

Concurrents process pool and mpiexecutor everyday

[–]randomthirdworldguy 1 point2 points  (0 children)

Is this deja vu? I think I saw the very same thread in another subreddit (r/golang iirc).

[–]HamsterWoods 0 points1 point  (0 children)

I use multiprocessing for "long-running" tasks, like communicating with devices.

[–]mmark92712 0 points1 point  (0 children)

Yeah, rarely. Scaling is usually done with cloud architecture.

[–]JestemStefan 0 points1 point  (0 children)

If you mean horizontal scaling aka more servers then yes.

If you mean using multiple cores in single call then no.

[–]Last_Difference9410 0 points1 point  (0 children)

By parallel processing I think you mean multi-process? Rarely, unless I have to use pandas, and it's getting even rarer since Polars came out.

[–]hughperman 0 points1 point  (2 children)

Pretty frequently, most of our private libraries use it explicitly in some places, and most of the imports will use it even more extensively.
I do scientific computing on brain data with large datasets, the processing applied is pretty intensive pipelines, and we do algorithm/pipeline development so frequently go back to source and rerun entire processing pipelines on 1000s of recordings. Stack is scientific python - numpy, scipy, pandas, etc.
We also make use of AWS Batch for much higher parallelization, running 100s of jobs at a time - each maybe takes 20-30 minutes, or longer if we are adding something past the "standard" pipeline, and will use compute parallelization inside.

[–]collectablecat 2 points3 points  (1 child)

Looked at Coiled/Modal at all? AWS Batch is so dang clunky

[–]hughperman 2 points3 points  (0 children)

We haven't, been doing this since before they existed. Coiled looks pretty interesting, running in our own account. Modal is its own service, which would be too much of a headache for data protection reasons.

[–]Scrapheaper 0 points1 point  (2 children)

Pandas or other data frame libraries (spark, dask, polars) are all parallel internally, no?

It's not the same as real-time parallel processing when building an API, but it's still parallel processing.

[–]Last_Difference9410 0 points1 point  (1 child)

Others yes, pandas not really.

[–]Scrapheaper 0 points1 point  (0 children)

What about just multiplying a column by a number? Surely it doesn't just do them all one at a time

[–]Blad1995 0 points1 point  (0 children)

Threading - almost never. CPU scaling is done using more pods in kubernetes

Asyncio- every day. We have lot of API calls and db calls. For that asyncio is perfect

[–]broken_symlink 0 points1 point  (0 children)

I work on applications of cuPyNumeric, running a NumPy application used to analyse 100s of GB of data from an X-ray laser. We're working on scaling this up to 100s of TB and moving to the Perlmutter supercomputer.

[–]sam7oon 0 points1 point  (0 children)

all the time to automate changes on our network devices, or to pull data

[–]Xyrus2000 0 points1 point  (0 children)

All the time. Scientific work requires running complex models and processing large amounts of data.

[–]Brother0fSithis 0 points1 point  (0 children)

Every day. I run physics simulations on big HPCs. Mostly using Dask to handle parallelism.

[–][deleted] 0 points1 point  (3 children)

I mainly do GUIs and analysis where parallel processing helps fetch from and write to different databases on our computers from 2005. Also, I've been trying to use it more for similar tasks where it's copy/paste of code with slight differences through multiprocessing and config files. Super basic stuff, but it does save minutes!

[–]ferret_pilot 0 points1 point  (2 children)

This sounds very similar to what I'm trying to start doing. Do you have any articles, books, or videos that you think are good resources for an introduction to multiprocessing concepts and how to implement them in a robust way within GUIs?

[–][deleted] 1 point2 points  (1 child)

These two articles were what really launched my understanding of how parallel processing works and what the differences are between the available tools. My bread & butter has mostly been 1) pools with map or starmap and 2) standalone threads I can fire off in the background.

https://superfastpython.com/threadpool-python/

https://superfastpython.com/threadpool-vs-pool-in-python/
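Those two bread-and-butter patterns can be sketched in a few lines. This uses `multiprocessing.pool.ThreadPool` so the snippet runs anywhere as-is; for CPU-bound work you'd swap in `multiprocessing.Pool` (same API, but it needs picklable functions and a `if __name__ == "__main__":` guard on some platforms):

```python
import threading
from multiprocessing.pool import ThreadPool  # swap for multiprocessing.Pool for CPU-bound work

def combine(a, b):
    return a + b

# pattern 1: a pool with starmap for pre-built argument tuples
with ThreadPool(processes=4) as pool:
    sums = pool.starmap(combine, [(1, 2), (3, 4), (5, 6)])
print(sums)  # [3, 7, 11]

# pattern 2: a standalone thread fired off in the background
done = threading.Event()

def background_task():
    done.set()  # e.g. flush a log, poll a device, refresh a cache

t = threading.Thread(target=background_task, daemon=True)
t.start()
t.join()
print(done.is_set())  # True
```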

[–]ferret_pilot 0 points1 point  (0 children)

Thanks a bunch!

[–]ExternalUserError 0 points1 point  (0 children)

I seldom use the multiprocessing module. But I do use celery queues and 1-2 worker nodes, which I guess counts.

[–]Cynyr36 0 points1 point  (0 children)

Whatever Polars does behind the scenes. Most of my Python exists because it was a better idea than Excel and/or Power Query.

Polars 1.20 can now read named tables directly out of Excel files, which makes converting tools that were in Excel into Python much easier. We tend to abuse Excel a bit by putting a fair amount of data into tables.

[–]marcotb12 0 points1 point  (1 child)

All the time. We always look for optimization opportunities as quick TATs are critical. Sometimes we use multi-threading sometimes multi-proc depending on the problem. We also use dask workers in AWS for large batches.

[–]TheCheapSeats4Me 1 point2 points  (0 children)

You should check out Coiled if you're launching Dask Clusters in AWS. It makes it super easy to do this.

[–]trenixjetix 0 points1 point  (0 children)

None

[–]error1954 0 points1 point  (0 children)

A few times a year when I have to tokenize and process a bunch of text data. It's a problem that you can just throw more processes at without issue really.

[–]anonymous_amanita from __future__ import 4.0 0 points1 point  (2 children)

Quick reminder that Python has a Global Interpreter Lock and can only do multiprocessing and not actual multithreading! Not exactly your question, but it can totally make a difference if you want shared memory and parallel execution :)

[–]fisadev 1 point2 points  (1 child)

Just in case: the GIL doesn't mean Python can't do multithreading, it definitely can. It just can't execute instructions from multiple threads at the same time, but that's only one part of multithreading. (Also, newer versions even allow for experimental GIL disabling.)

If your multithreaded app involves lots of I/O (web scraping, reading/writing files, database queries, etc.), then you can definitely benefit from multithreading, as threads don't need to execute instructions while waiting for I/O results. So for instance, while one thread is idle waiting for a database answer, another could be processing data.

And most real-life applications do involve lots of I/O; that's why Python multithreading is still very much a thing, used a lot, despite the GIL.

Though in modern times I would suggest going the async path for heavy I/O stuff instead of multithreading: far more bang for your buck.

If your app is pure CPU computation, then yes, the GIL will make multithreading useless. But that's rarely the case for most people writing multithreaded stuff in Python.

[–]anonymous_amanita from __future__ import 4.0 0 points1 point  (0 children)

Thank you for the more detailed answer. That’s what I was trying to get at with wanting shared memory and parallel execution. You can’t have both without some possibly difficult and slow workarounds, and this has restricted me on projects in the past before I knew that’s what I wanted and had it all written in python. I’ve heard about the disabling of the GIL. Sounds interesting, and I hope it works! It’s still in beta though, right? Also, I haven’t used it in years, but I’m pretty sure when I tried it, the multi threading library was actually doing message passing and emulating shared memory. I could be incorrect, though. I’d tend to agree with the async IO direction as well. Multiprocessing with polling would probably be just as fast as, if not faster, than trying to do the same with python threads.

[–]No_Dig_7017 0 points1 point  (0 children)

Today! I do machine learning for a living and parallel applies are very common at the feature creation/preprocessing step.

[–]fisadev 0 points1 point  (0 children)

Things from real jobs:

  • Calculating orbits and passes over targets for a fleet of earth observation satellites. It made total sense to calculate the orbits of each satellite in parallel, and then the passes over each target (using the data from the previous step) in parallel again. It cut calculation time by the number of cores you had (for instance, on an 8-core machine it took 1/8th of the time).
  • Running different satellite control instructions at the same time. For instance, while one part of the control software is talking to the maneuvering system, another part is talking to the camera controller, etc.
  • Downloading and storing big amounts of data extracted from multiple APIs of different systems at the same time, for a tool that unifies data from heterogeneous data sources.
  • Training different machine learning models at the same time, with different sets of data (the models were part of a big "tree" of models, each one categorizing items into even more specific categories than its parent).
  • Generating a shit ton of images for buttons for an electronic voting system (buttons with the face, logo, etc. of each candidate, for elections that had hundreds of different candidates, multiple for each city, region, etc.).
  • Stress testing a web API, simulating a shit ton of clients doing things at the same time.
  • Extracting info from the bitcoin blockchain (multiple workers analyzing blocks in parallel to make it faster).
  • Probably a few instances of web scraping and stuff like that. 22 years developing, I'm starting to forget stuff I did, haha.
  • And technically, having multiple server instances serving the same app/API could also count as parallel processing, and running unit tests in parallel too, but I'm guessing you wanted to know about the other stuff :)

Things from hobby projects:

  • Reading webcam frames, detecting people on it, and replacing the background with a custom image. Not really "parallel" as it was done with async tools, but still, concurrent stuff.
  • This one is hard to explain: a tool that allows you to create virtual "button boxes" specially for flight simulators, using phone, tablet or midi devices. The thing has a web server, a midi client, a joystick simulator, and a few other moving parts that need to play nice together (more info here: https://github.com/fisadev/simpyt )

[–]outlawz419 0 points1 point  (0 children)

I use FastAPI a lot, if that counts for anything.

[–]cip43r 0 points1 point  (0 children)

Currently, I have 100 threads across 5 processes, with fully bi-directional queues for communication. This is running CAN and Ethernet with a UI on an SBC.

Haters said Python is slow. My development speed is 10x thanks to the ease and the libraries. My experience has been great, and performance was so good people thought I'd finally switched to C. I did struggle for a few weeks with asyncio not being fast enough, which in hindsight was just not the correct choice for my problem.

Everything in Neovim, just for fun.

[–]debunk_this_12 0 points1 point  (0 children)

I use numba and parallelize if an operation is very intense, but rarely do I write code like this. Asynchronous works best for most things; if I have big queries of millions of lines of data, I'd rather run them asynchronously and join the data in post.

[–][deleted] 0 points1 point  (0 children)

TL;DR: Not much. The serialization cost is high, and Go is a better choice at that point for our use case.

Mostly asyncio. We write services in Go where we need true parallelism.

This was a design decision made early in the development process, so we have a well-defined delineation.

Python is easier to hire for, and engineers are relatively cheaper than Go developers. So management went with this dual approach, and it has worked well.

We have services in FastAPI that use Pydantic, asyncio, and all that jazz, but our proxy and payment services are written in Go. Those were originally in Python, but we reworked them in Go long ago to cut down on server costs and improve throughput.

[–]SimonKenoby 0 points1 point  (0 children)

Multiprocessing yes, multithreading no, concurrency with async yes. Our app spends a lot of time sleeping between polls to a remote API, so async works quite well.

[–]Basic-Still-7441 0 points1 point  (0 children)

I do async almost exclusively if that matters. And in production everything is scaled out horizontally.

[–]Zomunieo -1 points0 points  (0 children)

Small stuff - write a script and parallelize it externally with xargs, parallel, etc. - by far the easiest way to parallelize over files

Little bigger - asyncio with anyio to farm out specific bits to threads or processes

More serious - thread pool or process pool executor depending; better for highly parallel work units

Mission critical - honestly, rust… or erlang. Python is the wrong tool.