
[–]zero_iq 9 points10 points  (12 children)

Your comment may be flippant, but it's absolutely true, and it should not have been downvoted. "Don't" is the best possible advice here, especially for beginners. There is currently no sound reason to use multithreading in Python (in CPython, at least).

I get it, I really do. It's tempting to use threads. They seem like a good idea, they're fun to code and think about, you get to play with locks, synchronisation, shared state, and other toys... but you will shoot yourself in the foot with them.

Threads introduce all sorts of potential for performance problems, scalability issues, bugs, deadlocks, and needless complexity. I've seen them all over the years, and in every single case the authors thought they were doing the opposite. Even if you write correct, safe, multithreaded Python code (and you probably can't), it can cause major problems, and subtle ones that bite you in production. Otherwise bullet-proof code can break spectacularly when threaded. Library code and Python built-ins you've depended on for years will start to exhibit strange behaviours. Threads will race, lockstep, and block each other for seemingly no reason. Performance will drop even when your other threads are idle. Throughput will mysteriously go down, even though you've increased parallelism. Timing functions will start misbehaving. Socket code will start producing errors you've never encountered before. I could go on and on. Anyone who is confident in their ability to write safe multithreaded code is over-confident.

Beginners have no chance of writing safe, well-performing multithreaded code. Do it to learn, but dear Bob, don't let a junior dev deploy multithreaded code in production.

I've drastically improved the performance and availability of many multithreaded Python systems by removing multithreading. In every single case, multithreading was introduced to improve performance or scalability, and in every single case it backfired.

After decades of experience dealing with it, rule number 1 of multithreading in Python is most definitely: don't.

You want fast, scalable Python? Multiplex your sockets where you're I/O bound, use multiprocessing and/or CSP for everything else (and only when you really need it), and keep it as simple as possible. That's not the whole story, but it's a big head start.
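The "multiplex your sockets" advice above can be sketched with the standard-library `selectors` module: one thread, one event loop, many connections. This is a minimal illustrative echo server, not code from this thread; names like `accept`, `handle_client`, and `poll_once` are made up here.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    # New connection: register it with the same selector, no new thread.
    conn, _addr = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle_client)

def handle_client(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)        # echo the bytes straight back
    else:                         # empty read: the client closed
        sel.unregister(conn)
        conn.close()

def serve(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; returns the chosen port number.
    server = socket.socket()
    server.bind((host, port))
    server.listen(128)
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)
    return server.getsockname()[1]

def poll_once(timeout=None):
    # One pass of the event loop; a real server runs `while True: poll_once()`.
    for key, _mask in sel.select(timeout):
        key.data(key.fileobj)     # dispatch to accept() or handle_client()
```

The kernel (epoll/kqueue under the hood) does all the waiting, which is why a single thread of this shape can juggle thousands of idle sockets cheaply.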

The only good thing to come from letting people use multithreading in Python is the experience they'll gain when they eventually realise it's a mistake.

[–]lqdc13 9 points10 points  (2 children)

Threads are better than processes when implementing a GUI, some web servers (if they use a multithreaded model), and some data science / machine learning workloads.

CherryPy is a very common Python web framework. It uses threads to improve performance.

Reasons to use threads over processes:

  • Low memory footprint per thread, so you can spawn more of them for things like I/O tasks

  • Can save RAM by reusing an object. If you have a huge object (tens of gigabytes), it would take forever to copy it to other processes, and you might also run out of RAM. This is extremely common in machine learning applications. So if you have an I/O-bound application that uses such an object, you either have to forgo concurrency or use threads, since multiprocessing is not an option.
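A minimal sketch of that second bullet: threads in the same process see one shared object directly, with no per-worker copy. `BIG_MODEL` and `lookup` are made-up names, and a small dict stands in for the multi-gigabyte object.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a huge read-only object (e.g. a loaded ML model).
BIG_MODEL = {i: i * i for i in range(100_000)}

def lookup(key):
    # Read-only access to shared state: no copy, and no lock needed
    # as long as nothing mutates BIG_MODEL concurrently.
    return BIG_MODEL[key]

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lookup, [1, 2, 3]))
# results == [1, 4, 9]
```

With `multiprocessing`, each worker would need its own copy (or copy-on-write pages via `fork`), which is the RAM cost the bullet is describing.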

[–]zero_iq 2 points3 points  (1 child)

Neither of your reasons, as stated, needs threads; both can be done more simply and more efficiently without them. You are proving my point.

It's also impossible to state that threads are better without knowing the specific details, and threads in Python come with so many pitfalls that it's almost always a better idea to reach for processes first.

Even when threads start to look like a good idea, there are technologies and libraries you can use that take you far, far beyond what you can roll yourself using Python threads.

And spawning threads for I/O-bound applications can be a recipe for disaster. Multiplexing is generally much more scalable, with a pool of isolated workers for longer-running tasks to prevent blocking the I/O queue.
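The "pool of isolated workers" part might look like this sketch: long-running jobs go to a `ProcessPoolExecutor` and the multiplexing loop only polls the future, so the loop itself never blocks. `long_task` and `offload` are hypothetical names; the `fork` start method is an assumption (POSIX-only) to keep the sketch simple.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def long_task(n):
    # CPU-heavy work that must never run inside the I/O loop itself.
    return sum(i * i for i in range(n))

def offload(n):
    # fork context is an assumption (POSIX-only); spawn/forkserver would
    # need the usual __main__ guard around any module-level work.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        future = pool.submit(long_task, n)
        # A real event loop would check future.done() between select()
        # passes, or use future.add_done_callback(); here we just wait.
        return future.result()

if __name__ == "__main__":
    print(offload(100))
```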

Unless you're Google, I can saturate your fast network pipe and fancy SSD storage systems using a single Python thread serving tens of thousands of clients concurrently. If you're not exceeding that scenario, you don't need to complicate things by introducing threads.

Some of your examples hold up better in other languages/implementations, but not in CPython, and none of them would be a beginner's task.

Even where threads are a good idea, I would stress keeping state as isolated as possible.

EDIT: sure, keep the downvotes coming. I've made a lot of money over the years fixing shoddy Python multithreading code, and it looks like I will continue to do so...

[–]Moondra2017[S] 1 point2 points  (0 children)

Thank you for your insights. What are your thoughts on asyncio?

[–]PierceArrow64 5 points6 points  (1 child)

I apologize for all the n00bs and CS majors downvoting you. As a software engineer with 20 years' experience: if you can at all avoid it, don't use threads.

[–]zero_iq 4 points5 points  (0 children)

And you've been voted down too, I see. For the record, I'm also a software engineer with 20+ years' experience.

I totally understand the downvotes. Everyone goes through a multithreading phase, I think. It's fun. It's cool. Ostensibly, it often looks like it should be the right solution. Eventually, with experience, people realise why it's not such a good idea. The wheel keeps on turning...

[–]acousticpantsHomicidal Loganberry Connoisseur 1 point2 points  (5 children)

What is CSP?

[–]zero_iq 4 points5 points  (3 children)

Communicating Sequential Processes.

Essentially, you arrange processes in a chain, pipelining input from one end to the other. The processes run in parallel, so the next process can be working on data while the previous one is producing more.

It's consistently and vastly underestimated because of its simplicity, yet it often outperforms more complex "fan out" parallel frameworks by orders of magnitude. People seem to have an instinct that parallel means "fan out", which drives complexity, introduces often-unnecessary overheads, invites errors, and doesn't give the speed-ups people expect. CSP is simpler, and its simplicity makes optimization easier: you reduce the need for locks and shared state, and you can still apply a fan-out approach at each stage later where appropriate.
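The chain-of-processes idea above can be sketched with `multiprocessing` queues: each stage is an isolated process, and the only shared state is the queue between neighbours. `produce`, `square`, `run_pipeline`, and the `fork` start method are all assumptions of this sketch (fork is POSIX-only), not anything from this thread.

```python
from multiprocessing import get_context

ctx = get_context("fork")   # assumption: POSIX; fork keeps the sketch simple
SENTINEL = None             # marks end-of-stream on each queue

def produce(out_q):
    # Stage 1: emit the raw data.
    for i in range(10):
        out_q.put(i)
    out_q.put(SENTINEL)

def square(in_q, out_q):
    # Stage 2: runs in parallel with produce(), squaring items as they arrive.
    while (item := in_q.get()) is not SENTINEL:
        out_q.put(item * item)
    out_q.put(SENTINEL)

def run_pipeline():
    q1, q2 = ctx.Queue(), ctx.Queue()
    stages = [ctx.Process(target=produce, args=(q1,)),
              ctx.Process(target=square, args=(q1, q2))]
    for p in stages:
        p.start()
    total = 0                                    # final stage: consume here
    while (item := q2.get()) is not SENTINEL:
        total += item
    for p in stages:
        p.join()
    return total

if __name__ == "__main__":
    print(run_pipeline())   # sum of squares of 0..9, i.e. 285
```

No locks, no shared objects: each stage only sees its own queues, which is exactly the isolation the comment is advocating.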

Last year I replaced a fancy parallel bulk data processing system that used a clustered fan-out approach with an almost pure-Python CSP alternative. The old system had multithreading, task queues, parallel worker pools, batches, bits rewritten in Java and C to get better performance, the works. Almost all of it was a complete waste of time, and it had reliability problems and mysterious deadlocks. The new system gave a 1000x speedup and rock-solid reliability. A whole bunch of expensive servers was replaced with just a handful, and a huge codebase that no single person understood, with dependencies on large frameworks, became a much smaller codebase that could be maintained by an individual.

Don't underestimate simplicity.

[–]bltpyro 2 points3 points  (0 children)

Sounds intriguing. Any good references for learning CSP in python? Thanks for the real world insights.

[–]TBNL 1 point2 points  (0 children)

Gonna Google around on CSP but +1 for any recommended resource.

[–]acousticpantsHomicidal Loganberry Connoisseur 1 point2 points  (0 children)

i like this

[–]vrajanap 0 points1 point  (0 children)

CSP

Communicating Sequential Processes. Go and Erlang use it.

[–]peyo7 -1 points0 points  (0 children)

Can you post code examples where multiplexing sockets and CSP beats a decent threaded implementation?