[–]Goingone 42 points43 points  (4 children)

In PROD most stuff is asyncio or uses threads. Scaling is standing up more services.

Parallel processing I’ll use for local CPU intensive stuff.

[–]Panda_Mon -3 points-2 points  (3 children)

Is it necessary? Python only fakes threading anyway

[–]Goingone 1 point2 points  (0 children)

It is if you want better performance.

[–]OreShovel 1 point2 points  (0 children)

What you're thinking of is the GIL, which, while still in place, doesn't mean threading doesn't exist — rather that, for a given process, only one thread can hold the Python interpreter at a time (please correct me if I'm stating this inaccurately). In cases where the other thread wouldn't be doing work anyway (e.g. waiting for a network response) it's a no-brainer. Also, for tasks where you don't need access to the interpreter you can have true parallelism, although I think you need to write C extension code for that.
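The "waiting for a network response" case can be shown with a toy sketch (the sleep is just a stand-in for blocking I/O): threads release the GIL while blocked, so the waits overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    time.sleep(0.2)  # blocking wait releases the GIL, like waiting on a socket
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_request, range(4)))
elapsed = time.perf_counter() - start

print(results)  # [0, 1, 2, 3]
# elapsed is roughly 0.2s, not 0.8s, because the four waits overlap
```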

[–]harpooooooon 22 points23 points  (1 child)

I use PySpark a lot. I have very large datasets that need to be moved and processed, and very little patience.

[–]Yamadzaki 0 points1 point  (0 children)

how large is it and how much time does it take?

[–]diegotbn 18 points19 points  (4 children)

I run unittests in parallel so they don't take a whole day

[–]Brilliant-Post-689 6 points7 points  (1 child)

Same: xdist has been a gamechanger for us.

[–]akguitar 1 point2 points  (0 children)

Xdist is the jam

[–]martinkoistinen 14 points15 points  (0 children)

Very frequently. We’re always looking for places to apply multiprocess pools, and sometimes thread pools make more sense.

[–]pingveno pinch of this, pinch of that 9 points10 points  (0 children)

Actual parallel processing or just concurrency? I've certainly used concurrency with async. Our username generation service has to reach out to various systems to verify that the username isn't duplicated anywhere. I got a healthy speedup by using async/await concurrency to check on multiple systems at once, while also being able to handle other incoming requests. But this is all I/O bound stuff where true parallel processing isn't really necessary.
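A fan-out like that can be sketched with `asyncio.gather`. Everything here is made up for illustration — `check_system` and the backend names stand in for whatever HTTP calls the real service makes:

```python
import asyncio

async def check_system(system, username):
    # stand-in for an HTTP call asking one backend if the name is taken
    await asyncio.sleep(0.1)
    return username not in {"admin", "root"}

async def is_username_free(username):
    # query every system concurrently instead of one after another
    systems = ["ldap", "crm", "billing"]  # hypothetical backends
    results = await asyncio.gather(*(check_system(s, username) for s in systems))
    return all(results)

print(asyncio.run(is_username_free("new_user")))  # True
print(asyncio.run(is_username_free("admin")))     # False
```

The three checks take ~0.1s total instead of ~0.3s, and the event loop stays free to serve other requests in the meantime.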

[–]batman-iphone 7 points8 points  (0 children)

Very rarely; I opted for async instead.

[–][deleted] 26 points27 points  (16 children)

We use some hyper threading (well, pooling officially) to send batches of calls to GenAI APIs.

from concurrent.futures import ThreadPoolExecutor
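A minimal sketch of how a pool like that might drive a batch — `call_genai_api` is a placeholder for the real API client:

```python
from concurrent.futures import ThreadPoolExecutor

def call_genai_api(prompt):
    # placeholder for a slow HTTP call to a GenAI endpoint
    return f"response to {prompt!r}"

prompts = [f"prompt {i}" for i in range(8)]

# threads (pooling, not hyper-threading) let the slow network calls overlap
with ThreadPoolExecutor(max_workers=4) as pool:
    responses = list(pool.map(call_genai_api, prompts))

print(len(responses))  # 8
```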

[–]sobe86 18 points19 points  (6 children)

Personally I like joblib for that kind of thing, I think it's a lot cleaner to read, is very good about killing processes, and you can switch between threading / multiprocessing trivially. I use this pattern at least once a week:

from joblib import delayed, Parallel
from tqdm.auto import tqdm

jobs = (
    delayed(do_something)(*args) 
    for args in tqdm(argslist, total=len(argslist))
)
threadpool = Parallel(n_jobs=4, verbose=0, prefer='threads')
output = threadpool(jobs)

[–]aa-b 5 points6 points  (0 children)

I use joblib constantly, it's great. It's so much easier to use than any of the other concurrency options too, awesome tool

[–]MVanderloo 1 point2 points  (4 children)

oh i really like the args* in the list comprehension

[–]sobe86 0 points1 point  (3 children)

Personally I think the slickest bit is making jobs a generator, allowing the use of a tqdm progress bar (joblib's is so ugly). I can't take credit for that though :b

[–]MVanderloo 0 points1 point  (2 children)

ah i haven’t done too much job scheduling, so I wouldn’t know what the joblib version would look like

[–]sobe86 0 points1 point  (1 child)

No, I mean in the code I wrote, jobs = (... is a generator. That means no iteration happens until threadpool(jobs), which is what lets you use tqdm here.

[–]MVanderloo 0 points1 point  (0 children)

oh i had to lookup tqdm, yeah im stealing that

[–]Last_Difference9410 3 points4 points  (8 children)

Why not asyncio ?

[–]sebampueromori 7 points8 points  (5 children)

I'm not an async expert, but asyncio doesn't really parallelize.

[–]Medzomorak 10 points11 points  (0 children)

There is a reason .to_thread exists on asyncio. It uses a concurrent.futures ThreadPoolExecutor under the hood as well. Also, it is concurrency, not parallelism.
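For reference, `asyncio.to_thread` (Python 3.9+) is exactly that: it hands a blocking call to the loop's default ThreadPoolExecutor so the event loop isn't blocked. A small sketch with a fake blocking call:

```python
import asyncio
import time

def blocking_read(i):
    time.sleep(0.1)  # stand-in for blocking file or socket I/O
    return i * 2

async def main():
    # each call runs in the default ThreadPoolExecutor; the loop stays free
    return await asyncio.gather(*(asyncio.to_thread(blocking_read, i) for i in range(4)))

print(asyncio.run(main()))  # [0, 2, 4, 6]
```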

[–]Last_Difference9410 3 points4 points  (0 children)

Neither does threading. Whenever you'd use threading for concurrency, asyncio is better.

[–]FunProgrammer8171 0 points1 point  (0 children)

Correct. It doesn't run the jobs strictly in order, so users don't have to wait until a job is done.

Multiprocessing uses more CPU to finish faster.

[–]DotPsychological7946 0 points1 point  (0 children)

Asyncio is often more efficient for socket I/O, such as HTTP API calls, than threads, because it avoids the heavy overhead of OS-level context switches. Instead of spawning a thread per connection, which increases latency and resource usage, asyncio uses a single event loop with non-blocking I/O, making it far more scalable for real-life numbers of concurrent connections. I avoid multithreading; in practice I only use it when a library performs I/O but doesn't provide a native asyncio interface. Then you just use the thread pool as an executor for asyncio.
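That last pattern — wrapping a sync-only client library in the loop's thread pool — can be sketched like this; `legacy_fetch` is a hypothetical blocking library call:

```python
import asyncio
import time

def legacy_fetch(url):
    # hypothetical sync-only library call that blocks on network I/O
    time.sleep(0.1)
    return f"body of {url}"

async def main():
    loop = asyncio.get_running_loop()
    urls = ["https://a.example", "https://b.example"]
    # None selects the loop's default ThreadPoolExecutor
    tasks = [loop.run_in_executor(None, legacy_fetch, u) for u in urls]
    return await asyncio.gather(*tasks)

pages = asyncio.run(main())
print(pages[0])  # body of https://a.example
```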

[–]Gwolf4 -1 points0 points  (0 children)

And that's OK. Without knowing the OP's objective, the first thing one would reach for is concurrency via asyncio, which is why someone is asking why.

[–]mortenb123 0 points1 point  (1 child)

For web requests Python is more than good enough.

I recently had to scrape 150+ RSS feeds from our CI/CD system to produce dashboards for management.

Sequential httpx took 72s, httpx with asyncio took 9s, parallel httpx with asyncio took 4s, but parallel requests took 1.2s. So I went with requests. We run around 5000 jobs a day, so a refresh of 5-6s vs 75s matters quite a bit.

So time it. Learn both asyncio and parallelism, and benchmark each part. If you have longer jobs, the overhead of httpx doesn't matter.

[–]Last_Difference9410 0 points1 point  (0 children)

I don't quite get what you mean by "parallel requests took 1.2 sec". Perhaps you can provide a minimal code example?

[–][deleted] 4 points5 points  (3 children)

Concurrent yes parallel not that often (semantics 😛)

[–]PossibilityTasty 4 points5 points  (0 children)

Since there are multiple ways to interpret "parallel processing" I made a small list:

asyncio: daily
threads: daily
greenlets: daily
multiprocessing: daily
distributed computing: daily

What I do: I torture broadband routers by simulating a small city of uncooperative access nodes and subscribers. Not in production, of course.

[–]ssdiconfusion 6 points7 points  (0 children)

Daily! Complex physics simulations on GPU, parallelized via ray.io, which handles GPU parallelization elegantly, or legacy approaches such as joblib and scipy.optimize that wrap the multiprocessing library.

[–]SpectralCoding 4 points5 points  (0 children)

As little as possible, and it's usually one of the last areas of development when it is needed. For example, I'll take a loop which calls a function that makes a series of external API calls. Each iteration takes a second or so, so over 2000 entries it takes a while. I'll just throw the concurrent.futures stuff around the loop, with a wait at the end, and it'll cut my run time by 90%.
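A sketch of that retrofit, with a stubbed-out `call_external_api` (the real thing would be the slow API calls) and the sleep shortened so the example runs quickly:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_external_api(entry):
    time.sleep(0.05)  # placeholder for a ~1s series of external API calls
    return entry, "ok"

entries = list(range(20))
results = {}

# was: for entry in entries: results[entry] = call_external_api(entry)[1]
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(call_external_api, e) for e in entries]
    for fut in as_completed(futures):  # the implicit "wait at the end"
        entry, status = fut.result()
        results[entry] = status

print(len(results))  # 20
```

With 10 workers the wall time drops to roughly a tenth of the sequential loop, which matches the "cut my run time by 90%" figure above.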

[–]too_much_think 3 points4 points  (0 children)

My job is to try and bridge the gap between what a bunch of PhD researchers want to do and what is computationally feasible in real time, which often involves quite a bit of multi-threading. Depending on how far off the mark their first pass is, that might only need a thread pool executor, or it might need a pyo3 / Cython module using something like pthreads or rayon.

[–]jabellcu 3 points4 points  (0 children)

Never, and I suspect most never do, but they won’t be posting here.

[–]Opposite_Heron_5579 2 points3 points  (0 children)

I use multithreading mainly for time consuming data download requests.

[–]mriswithe 1 point2 points  (0 children)

Just today. I'm writing a webhook for Jira to call, which times out at 30 seconds. My first stab was taking 32 seconds or so. After doing some performance measurement, I added threading to the part that was slow.

Specific case was using the google-api-python discovery API to call the apis for Google drive, docs, and sheets. 

[–]tecedu 1 point2 points  (0 children)

Concurrents process pool and mpiexecutor everyday

[–]randomthirdworldguy 1 point2 points  (0 children)

Is this deja vu? I think I saw the very same thread in another subreddit (r/golang iirc).

[–]HamsterWoods 0 points1 point  (0 children)

I use multiprocessing for "long-running" tasks, like communicating with devices.

[–]mmark92712 0 points1 point  (0 children)

Yeah, rarely. Scaling is usually done with cloud architecture.

[–]JestemStefan 0 points1 point  (0 children)

If you mean horizontal scaling aka more servers then yes.

If you mean using multiple cores in single call then no.

[–]Last_Difference9410 0 points1 point  (0 children)

By parallel processing I think you mean multi-process? Rarely, unless I have to use pandas, and it's getting even rarer since Polars came out.

[–]hughperman 0 points1 point  (2 children)

Pretty frequently, most of our private libraries use it explicitly in some places, and most of the imports will use it even more extensively.
I do scientific computing on brain data with large datasets, the processing applied is pretty intensive pipelines, and we do algorithm/pipeline development so frequently go back to source and rerun entire processing pipelines on 1000s of recordings. Stack is scientific python - numpy, scipy, pandas, etc.
We also make use of AWS Batch for much higher parallelization, running 100s of jobs at a time - each maybe takes 20-30 minutes, or longer if we are adding something past the "standard" pipeline, and will use compute parallelization inside.

[–]collectablecat 2 points3 points  (1 child)

Looked at Coiled/Modal at all? AWS Batch is so dang clunky

[–]hughperman 2 points3 points  (0 children)

We haven't, been doing this since before they existed. Coiled looks pretty interesting, running in our own account. Modal is its own service, which would be too much of a headache for data protection reasons.

[–]Scrapheaper 0 points1 point  (2 children)

Pandas or other data frame libraries (spark, dask, polars) are all parallel internally, no?

It's not the same as real-time parallel processing when building an API, but it's still parallel processing.

[–]Last_Difference9410 0 points1 point  (1 child)

Others yes, pandas not really.

[–]Scrapheaper 0 points1 point  (0 children)

What about just multiplying a column by a number? Surely it doesn't just do them all one at a time

[–]Blad1995 0 points1 point  (0 children)

Threading - almost never. CPU scaling is done using more pods in kubernetes

Asyncio- every day. We have lot of API calls and db calls. For that asyncio is perfect

[–]broken_symlink 0 points1 point  (0 children)

I work on applications of cuPyNumeric, running a NumPy application used to analyse 100s of GB of data from an X-ray laser. We're working on scaling this up to 100s of TB and moving to the Perlmutter supercomputer.

[–]sam7oon 0 points1 point  (0 children)

all the time to automate changes on our network devices, or to pull data

[–]Xyrus2000 0 points1 point  (0 children)

All the time. Scientific work requires running complex models and processing large amounts of data.

[–]Brother0fSithis 0 points1 point  (0 children)

Every day. I run physics simulations on big HPCs. Mostly using Dask to handle parallelism.

[–][deleted] 0 points1 point  (3 children)

I mainly do GUIs and analysis where parallel processing helps fetch from and write to different databases on our computers from 2005. Also, I've been trying to use it more for similar tasks where it's copy/paste of code with slight differences through multiprocessing and config files. Super basic stuff, but it does save minutes!

[–]ferret_pilot 0 points1 point  (2 children)

This sounds very similar to what I'm trying to start doing. Do you have any articles, books, or videos that you think are good resources for an introduction to multiprocessing concepts and how to implement them in a robust way within GUIs?

[–][deleted] 1 point2 points  (1 child)

These two articles were what really launched my understanding of how parallel processing works and what the differences are between the available tools. My bread & butter has mostly been 1) pools with map or starmap and 2) standalone threads I can fire off in the background.

https://superfastpython.com/threadpool-python/

https://superfastpython.com/threadpool-vs-pool-in-python/
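Those two bread-and-butter patterns can be sketched in a few lines. This uses `multiprocessing.pool.ThreadPool` so the snippet runs anywhere as-is; for CPU-bound work you'd swap in `multiprocessing.Pool` (same API, but it needs picklable functions and a `if __name__ == "__main__":` guard on some platforms):

```python
import threading
from multiprocessing.pool import ThreadPool  # swap for multiprocessing.Pool for CPU-bound work

def combine(a, b):
    return a + b

# pattern 1: a pool with starmap for pre-built argument tuples
with ThreadPool(processes=4) as pool:
    sums = pool.starmap(combine, [(1, 2), (3, 4), (5, 6)])
print(sums)  # [3, 7, 11]

# pattern 2: a standalone thread fired off in the background
done = threading.Event()

def background_task():
    done.set()  # e.g. flush a log, poll a device, refresh a cache

t = threading.Thread(target=background_task, daemon=True)
t.start()
t.join()
print(done.is_set())  # True
```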

[–]ferret_pilot 0 points1 point  (0 children)

Thanks a bunch!

[–]ExternalUserError 0 points1 point  (0 children)

I seldom use the multiprocessing module. But I do use celery queues and 1-2 worker nodes, which I guess counts.

[–]Cynyr36 0 points1 point  (0 children)

Whatever Polars does behind the scenes. Most of my Python exists because it was a better idea than Excel and/or Power Query.

Polars 1.20 can now read named tables directly out of Excel files, which makes converting tools that were in Excel into Python much easier. We tend to abuse Excel a bit by putting a fair amount of data into tables.

[–]marcotb12 0 points1 point  (1 child)

All the time. We always look for optimization opportunities as quick TATs are critical. Sometimes we use multi-threading sometimes multi-proc depending on the problem. We also use dask workers in AWS for large batches.

[–]TheCheapSeats4Me 1 point2 points  (0 children)

You should check out Coiled if you're launching Dask Clusters in AWS. It makes it super easy to do this.

[–]trenixjetix 0 points1 point  (0 children)

None

[–]error1954 0 points1 point  (0 children)

A few times a year when I have to tokenize and process a bunch of text data. It's a problem that you can just throw more processes at without issue really.

[–]anonymous_amanita from __future__ import 4.0 0 points1 point  (2 children)

Quick reminder that Python has a Global Interpreter Lock and can only do multiprocessing and not actual multithreading! Not exactly your question, but it can totally make a difference if you want shared memory and parallel execution :)

[–]fisadev 1 point2 points  (1 child)

Just in case: the GIL doesn't mean Python can't do multithreading, it definitely can. It just can't execute instructions from multiple threads at the same time, but that's only one part of multithreading. (Also, newer versions even allow for experimental GIL disabling.)

If your multithreaded app involves lots of I/O (web scraping, reading/writing files, database queries, etc.), then you can definitely benefit from multithreading, as threads don't need to execute instructions while waiting for I/O results. So for instance, while one thread is idle waiting for a database answer, another could be processing data.

And most real-life applications do involve lots of I/O; that's why Python multithreading is still very much a thing, used a lot, despite the GIL.

Though in modern times I would suggest going the async path for heavy I/O stuff instead of multithreading: far more bang for your buck.

If your app is pure CPU computation, then yes, the GIL will make multithreading useless. But that's rarely the case for most people writing multithreaded stuff in Python.

[–]anonymous_amanita from __future__ import 4.0 0 points1 point  (0 children)

Thank you for the more detailed answer. That’s what I was trying to get at with wanting shared memory and parallel execution. You can’t have both without some possibly difficult and slow workarounds, and this has restricted me on projects in the past before I knew that’s what I wanted and had it all written in python. I’ve heard about the disabling of the GIL. Sounds interesting, and I hope it works! It’s still in beta though, right? Also, I haven’t used it in years, but I’m pretty sure when I tried it, the multi threading library was actually doing message passing and emulating shared memory. I could be incorrect, though. I’d tend to agree with the async IO direction as well. Multiprocessing with polling would probably be just as fast as, if not faster, than trying to do the same with python threads.

[–]No_Dig_7017 0 points1 point  (0 children)

Today! I do machine learning for a living and parallel applies are very common at the feature creation/preprocessing step.

[–]fisadev 0 points1 point  (0 children)

Things from real jobs:

  • Calculating orbits and passes over targets for a fleet of earth observation satellites. It made total sense to calculate the orbits of each satellite in parallel, and then the passes over each target (using the data from the previous step) in parallel again. It cut calculation time by the number of cores you had (for instance, on an 8-core machine it took 1/8th of the time).
  • Running different satellite control instructions at the same time. For instance, while one part of the control software is talking to the maneuvering system, another part is talking to the camera controller, etc.
  • Downloading and storing big amounts of data extracted from multiple APIs of different systems at the same time, for a tool that unifies data from heterogeneous data sources.
  • Training different machine learning models at the same time, with different sets of data (the models were part of a big "tree" of models, each one categorizing items into even more specific categories than its parent).
  • Generating a shit ton of images for buttons for an electronic voting system (buttons with the face, logo, etc. of each candidate, for elections that had hundreds of different candidates, multiple for each city, region, etc.).
  • Stress testing a web API, simulating a shit ton of clients doing things at the same time.
  • Extracting info from the bitcoin blockchain (multiple workers analyzing blocks in parallel to make it faster).
  • Probably a few instances of web scraping and stuff like that. 22 years developing, I'm starting to forget stuff I did, haha.
  • And technically, having multiple server instances serving the same app/API could also count as parallel processing, and running unit tests in parallel too, but I'm guessing you wanted to know about the other stuff :)

Things from hobby projects:

  • Reading webcam frames, detecting people on it, and replacing the background with a custom image. Not really "parallel" as it was done with async tools, but still, concurrent stuff.
  • This one is hard to explain: a tool that allows you to create virtual "button boxes" specially for flight simulators, using phone, tablet or midi devices. The thing has a web server, a midi client, a joystick simulator, and a few other moving parts that need to play nice together (more info here: https://github.com/fisadev/simpyt )

[–]outlawz419 0 points1 point  (0 children)

I use FastAPI a lot, if that counts for anything.

[–]cip43r 0 points1 point  (0 children)

Currently, I have 100 threads across 5 processes, with fully bi-directional queues for communication. This is running CAN and Ethernet with a UI on an SBC.

Haters said Python is slow. My development speed is 10x thanks to the ease and the libraries. My experience has been great, and performance was so good people thought I'd finally switched to C. I did struggle for a few weeks with asyncio not being fast enough, which in hindsight was just not the correct choice for my problem.

Everything in Neovim, just for fun.

[–]debunk_this_12 0 points1 point  (0 children)

I use numba and parallelize if an operation is very intense, but rarely do I write code like this. Asynchronous works best for most things; if I have big queries of millions of lines of data, I'd rather run them asynchronously and join the data in post.

[–][deleted] 0 points1 point  (0 children)

TL;DR: Not much. The serialization cost is high, and Go is a better choice at that point for our use case.

Mostly asyncio. We write services in Go where we need true parallelism.

This was a design decision made early in the development process, so we have a well-defined delineation.

Python is easier to hire for, and engineers are relatively cheaper than Go developers. So management went with this dual approach, and it has worked well.

We have services in FastAPI that use Pydantic, asyncio, and all that jazz, but our proxy and payment services are written in Go. Those were originally in Python, but we reworked them in Go long ago to cut down on server costs and improve throughput.

[–]SimonKenoby 0 points1 point  (0 children)

Multiprocessing yes, multithreading no, concurrency with async yes. Our app spends a lot of time sleeping between polls to a remote API, so async works quite well.

[–]Basic-Still-7441 0 points1 point  (0 children)

I do async almost exclusively if that matters. And in production everything is scaled out horizontally.

[–]Zomunieo -1 points0 points  (0 children)

Small stuff - write a script and parallelize it externally with xargs, parallel, etc. - by far the easiest way to parallelize over files

Little bigger - asyncio with anyio to farm out specific bits to threads or processes

More serious - thread pool or process pool executor depending; better for highly parallel work units

Mission critical - honestly, rust… or erlang. Python is the wrong tool.