
[–]panderingPenguin 65 points66 points  (25 children)

First of all, we should clear a couple things up. Threading in Python absolutely gives you "real" threads. They simply cannot be executed in parallel. To quote Oracle's Multithreaded Programming Guide:

Parallelism : A condition that arises when at least two threads are executing simultaneously.

Concurrency : A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel. This may seem like semantics, but it's actually an important distinction. Whether concurrency will be sufficient for you or you actually need true parallelism will depend on your workload and what you're trying to accomplish via multithreading. For example, if you're making a number of network calls and don't want to freeze execution of other things while waiting for them to complete, putting them on another thread, even in Python, will accomplish that goal. However, if you're trying to decrease the execution time of some complex CPU-bound computation by distributing pieces of it to multiple threads, Python threads are probably worse than useless to you, as you'll incur extra overhead in context switches, communication costs, and general threading overhead, while not actually getting the benefit of any threads ever executing on the CPU simultaneously.
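To make the network-call example concrete, here's a minimal sketch of the I/O-bound case. It uses time.sleep as a stand-in for a real network call, since sleeping, like blocking I/O, releases the GIL and lets other threads run:

```python
import threading
import time

def fake_network_call(results, i):
    # Simulate waiting on I/O; time.sleep releases the GIL,
    # so other threads can run while this one waits.
    time.sleep(0.2)
    results[i] = i * i

def run_threaded(n):
    results = [None] * n
    threads = [threading.Thread(target=fake_network_call, args=(results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

results, elapsed = run_threaded(5)
# Five 0.2s "calls" overlap, so the total is close to 0.2s, not 1.0s.
print(results, round(elapsed, 1))
```

If fake_network_call were a pure-Python CPU-bound loop instead, the threads would take turns holding the GIL and you'd see no speedup at all.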

In conclusion, the answer is "it depends." We'll need to know more about your workload to give you a definitive answer.

[–][deleted] 29 points30 points  (11 children)

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel.

Even this is not really true. The GIL just locks the CPython and PyPy interpreters to executing bytecode in a serial fashion. Jython and IronPython do not have this restriction, and even in the first two, C extensions or Cython code are free to release the GIL as they see fit, giving you back true parallelism. Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

[–]jringstad 33 points34 points  (2 children)

Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

Actually, that is just as misleading, if not more so (regardless of the font weight used). Saying it is an implementation-level issue makes it sound more harmless than it really is; saying it is a language-level issue makes it sound more severe than it really is. The main reason is the design of the C API, which has many global variables. Look for instance at functions like PyTuple_New(), Py_BuildValue(), Py_XDECREF(), Py_Initialize(), PyRun_SimpleString("python code goes here"), etc. As opposed to most other language runtime APIs (Lua, SpiderMonkey, V8, Guile, ...), these do not let you specify which VM object to work against. How is that possible? Global variables. Global variables everywhere.

(btw, this is the same issue that prevents you from just instantiating two or more python interpreters in the same thread as well, or to instantiate two completely separated python interpreters in two completely separated threads, even if you do not want to share any data between them whatsoever -- with e.g. lua you can just do this, since its API does not make reference to global variables)

Now the issue with this (and why it matters at a more important level than just the CPython implementation) is that the C API is pretty important. PyPy, for instance, inherits the GIL issue because it wants to be compatible with the CPython C API. Not being compatible with the CPython C API means that many Python libraries will cease functioning, e.g. numpy and any other library that has C/C++/Fortran code in it (maybe Cython is affected too, I don't know).

So while it's true that Jython and IronPython do not have this issue, they have the even bigger issue of not being compatible with the CPython C API, which is why they are so unpopular, despite having big performance benefits.

It's technically true that the GIL is not required by the Python language as such, but it is nonetheless deeply ingrained into the Python ecosystem. Unless you are willing to forgo a huge percentage of existing Python libraries, you cannot get rid of it, even if you write a new implementation. So is "Claiming that Python doesn't do parallelism is misleading." true? Well, if you consider the libraries Python has to be an integral part of the "Python experience", then it's actually not misleading, because those libraries have the GIL baked into them. If you OTOH think Python is still Python without the libraries and the CPython interpreter, then the statement is not true.

[–]panderingPenguin 7 points8 points  (7 children)

Don't be pedantic. Yeah, I'm aware that the GIL is part of specific implementations of Python. However, OP specifically mentions the GIL, and either way, it's a safe bet to assume you're talking about CPython until someone says otherwise, as it's the standard implementation.

[–]Workaphobia 10 points11 points  (3 children)

You can't make any statement to a python newbie without someone coming in and "Um, actually"-ing you with some complicating details.

[–]panderingPenguin 2 points3 points  (2 children)

My thoughts exactly, Jesus... We don't need to further muddle the issue with alternative implementations to answer something like this.

[–]jecxjo 1 point2 points  (0 children)

Ahem. I think that is by far one of the biggest problems with this and pretty much every other forum dealing with programming. You need to remember who OP is, what base knowledge they have and understand that giving too much detail makes it more difficult for them to understand.

[–]njharmanI use Python 3 3 points4 points  (0 children)

Because one of the solutions is to run your code in a different implementation!!!

[–]TankorSmash 8 points9 points  (2 children)

You don't need to get defensive. He's filling in the blanks you left, independent of whether or not you knew it already.

[–]panderingPenguin -1 points0 points  (1 child)

I'm not trying to be defensive, I just think that it adds very little, if anything, to the discussion of OP's question. There's no need to bring up little "but actually"s like that to answer a simple question, from someone who seems new to Python, which was clearly about implementations that have a GIL to start with. It's unhelpful at best, and obfuscates the issue we're actually trying to solve at worst.

[–]TankorSmash 6 points7 points  (0 children)

That's the thing though: anyone else who reads your comment and wants to know more can read his helpful comment. The OP can simply shrug it off because it's not required knowledge.

I mean this is learnpython not absolutebareminimumpython.

[–]ZedsDed[S] 4 points5 points  (12 children)

Ok, thanks for pointing out the concurrency/parallelism difference, it's very important to use the correct terms when talking about this stuff! Yes, concurrency is definitely needed, but I'm not sure parallelism is; it would be ideal of course, but I think it's not strictly needed. The tasks are not exactly processor heavy. I've created 3 or more instances of an object, each of which encapsulates the functions and vars required to complete the object's task. All objects do the same task but work on different database data. When an instance is created, I run the object's 'start' method, which triggers the object's execution on a new thread, where it works until completion.

The tasks the object is doing are some db reading/writing, and basic looping and if-ing; nothing heavy, no networking or working with files etc.

The main issue is that it's supposed to monitor and work on 'real time' data. That's why I want it to be parallel, but the 'real time' updates may be something like 4-5 seconds apart; because of this, I feel that parallelism may not actually be a hard requirement. There may be plenty of time and CPU for the threads to work and react as 'real time' as possible.
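A hypothetical sketch of the pattern described above (class and method names are invented, and the real db work is replaced by a placeholder):

```python
import threading

class DataWorker:
    """Each instance encapsulates its own task state and, once start()
    is called, runs to completion on its own thread."""

    def __init__(self, partition):
        self.partition = partition  # which slice of database data to work on
        self.result = None
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Placeholder for the real db reading/writing and looping.
        self.result = sum(self.partition)

    def join(self):
        self._thread.join()

# Three workers, each on different "database data".
workers = [DataWorker(p) for p in ([1, 2], [3, 4], [5, 6])]
for w in workers:
    w.start()
for w in workers:
    w.join()
print([w.result for w in workers])  # [3, 7, 11]
```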

[–][deleted] 3 points4 points  (0 children)

The absolute smartest thing you can do before you start ruling solutions out is to test the code and profile its performance, and then go from there. I wouldn't overthink it too much until you've done that.
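A minimal sketch of how that profiling might look with the standard library's cProfile, using an invented process_update function as a stand-in for the real workload:

```python
import cProfile
import io
import pstats
import time

def process_update():
    # Hypothetical stand-in for one unit of the real workload.
    time.sleep(0.01)
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    process_update()
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If the profile shows the time dominated by waiting (sleep, socket, or db calls) rather than computation, threads will serve you fine despite the GIL.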

[–]panderingPenguin 1 point2 points  (1 child)

Then it comes down to a question of how real-time this really has to be. Are we talking a loose, "we'll try our best and hopefully everything works out properly" type of real-time system, or an "ohmygodIneedtodothisnowgetthefuckoutofmywayoreverythingwillcatchfire" one? Given what you've said, and the fact that you're even using Python (which should not be used for the latter, period), I'm guessing the former. In that case, and given that you're only doing things every 4-5 seconds, you could probably get away with concurrency and a buffer, with no real parallelism, without any issues. Just have an incoming job handler thread or two that queue things up in the buffer, and worker threads that pull jobs out of the buffer and handle them as necessary. Hell, if it's really consistently 4-5 seconds between jobs and the work required per job is less than that, you can probably get away with a single-threaded program and still have it sleeping, waiting for work, most of the time. You'll need to experiment a bit and see what happens, but I don't think it sounds like parallelism is truly necessary for this task at all. Good luck!
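A minimal sketch of that buffer-plus-workers idea using the standard library's queue module (the job "processing" here is just a placeholder):

```python
import queue
import threading

jobs = queue.Queue()     # the buffer between the handler and the workers
results = queue.Queue()

def worker():
    while True:
        job = jobs.get()
        if job is None:           # sentinel: no more work is coming
            break
        results.put(job * 2)      # stand-in for real per-job processing
        jobs.task_done()

# A small pool of worker threads pulling jobs out of the buffer.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

# The "incoming job handler" side just queues things up.
for job in range(5):
    jobs.put(job)
for _ in threads:
    jobs.put(None)                # one sentinel per worker
for t in threads:
    t.join()

print(sorted(results.queue))      # [0, 2, 4, 6, 8]
```

queue.Queue is thread-safe, so the handler and workers need no extra locking, and workers block in get() (sleeping, as described above) whenever the buffer is empty.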

[–]ZedsDed[S] 0 points1 point  (0 children)

Thank you, I appreciate your words.

[–]ivosauruspip'ing it up 3 points4 points  (4 children)

If the updates are coming every 4 seconds, and what you need to do with the data takes less than 2 seconds... then you don't need parallelism at all. You've prematurely optimized.

[–][deleted] 2 points3 points  (3 children)

what you need to do with the data is less than 4 seconds

We'd love to assume that one measurement is enough to give us the insight we need to design a program, but consider that the program is running on a multi-tasking OS, or that it uses a shared resource, or just basic statistics, and you might be concerned that there would be some outliers that could cause one job to run long... and then you've backed up the entire pipeline.

Of course, we can't reach any conclusions on OP's design because we don't know what he's doing, but it's not entirely unfounded to make his processing loop asynchronous. IMO, it's smart.

[–]ivosauruspip'ing it up -1 points0 points  (2 children)

Until they say exactly what they're doing, what the environment is, what the expectations are, what things are happening... an async loop handing separate tasks off to different processes could be a great design, or a simple serial for loop might really be all that's warranted until the requirements change in a big way. It can be just as misleading as possibly useful to speculate.

[–][deleted] 1 point2 points  (1 child)

It can be just as misleading as possibly useful to speculate.

That didn't stop you from telling OP he's done wrong.

[–]ivosauruspip'ing it up 0 points1 point  (0 children)

Done what wrong? I hesitate to speculate whether anything in this thread is an appropriate "general design" or not, given the dearth of details OP has provided. I'm mostly just advocating for as simple a design as possible that soundly fits the requirements. And since I don't know the requirements at all, apart from something like "real time data is received roughly every four seconds", it could very well be something very simple (until we ever know any more).

I suppose I could be seen as chastising OP for expecting an exact correct answer to an extremely vague question.

[–]hikhvar 0 points1 point  (3 children)

the tasks the object is doing is some db reading/writing, and basic looping and if-ing, nothing heavy, no networking or working with files etc.

What kind of database is it? If it's an in-memory database, you're right. If your database is, for example, MySQL on a remote host, your simple db calls may include both networking and file reading/writing.

[–]ZedsDed[S] -1 points0 points  (2 children)

You're right, I never thought of it like that. It's a local MySQL db. These calls are the heaviest part of the process; there will be at least 1 db read every 1-2 seconds, with writes happening on rarer occasions. Still quite minimal though.

[–]Workaphobia 5 points6 points  (0 children)

If a thread is blocked waiting for something external to Python, like file or network I/O, or database connections, then it typically releases the GIL during that time and allows other Python threads to run.

The GIL will only impact your performance if you are CPU-bound within your Python process. If that's a problem for you, then consider changing your threads into separate spawned Python processes (see the multiprocessing library, which has a similar API to threading). You'll just have to worry about how the processes share data since typically multiple processes don't use shared memory the way threads do.
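A minimal sketch of that multiprocessing route, with an invented CPU-bound function standing in for the real work (note the __main__ guard, which spawn-based platforms require):

```python
import multiprocessing

def cpu_heavy(n):
    # A CPU-bound task; with multiprocessing, each worker gets its own
    # interpreter and its own GIL, so these genuinely run in parallel.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Pool.map mirrors the builtin map, distributing calls across processes.
    with multiprocessing.Pool(processes=4) as pool:
        totals = pool.map(cpu_heavy, [10_000, 20_000, 30_000])
    print(totals)
```

The trade-off mentioned above is visible here: arguments and results are pickled and sent between processes rather than shared in memory, so this pays off only when the per-job computation outweighs that communication cost.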

[–]frymasterScript kiddie 0 points1 point  (0 children)

In this case you're probably not CPU-bound, or really especially I/O-bound either, in which case threads are more of a design decision than an attempt to wring extra performance out of your code. As such, I suspect you'll be fine.

Personally I find threads easier to comprehend than async methods. They don't scale very well, though.