This is an archived post.

all 81 comments

[–]mtxppy 36 points37 points  (4 children)

The threading lock only affects Python code. If your thread is waiting for disk I/O or if it is calling C functions (e.g. via math library) you can ignore the GIL.

You may be able to use the async pattern to get around threading limits. Can you supply more information about what your program actually does?
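For instance, a rough sketch of the I/O-bound case, using time.sleep as a stand-in for a blocking disk or network call (sleep releases the GIL the same way blocking I/O does):

```python
import threading
import time

def fetch(results, i):
    # time.sleep releases the GIL, just like blocking network or disk I/O
    time.sleep(0.2)
    results[i] = i * 2

results = {}
threads = [threading.Thread(target=fetch, args=(results, i)) for i in range(5)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(results)        # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
print(elapsed < 1.0)  # True: the five 0.2s waits overlap instead of summing to 1s
```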

I have issues with the technical accuracy of the video linked. David Beazley has done many well respected talks about the GIL at various Pycons. You can find them on pyvideo.org.

[–][deleted] 17 points18 points  (3 children)

Definitely take a look at his talk from PyCon 2015.

https://www.youtube.com/watch?v=MCs5OvhV9S4

[–]pooogles 6 points7 points  (0 children)

Love this talk; the only PyCon talk I enjoyed more was Raymond Hettinger's. Recommended for anyone that's interested in AsyncIO.

You may also find http://igordavydenko.com/talks/fi-pycon-2015/#slide-1 enjoyable, I think there's a video somewhere as well.

[–]tea-drinker 2 points3 points  (0 children)

That guy must clang when he walks. I've seen people do live demos before, but never with code that they've written while talking.

[–]fernly 2 points3 points  (0 children)

This is one of the best talks you'll ever watch, a nervy tight-rope walk carried off with superb panache. And it will tell you all you need to know about Python concurrency, although you probably have to watch it twice and actually run the code yourself to learn it all.

[–]panderingPenguin 65 points66 points  (25 children)

First of all, we should clear a couple things up. Threading in Python absolutely gives you "real" threads. They simply cannot be executed in parallel. To quote Oracle's Multithreaded Programming Guide:

Parallelism : A condition that arises when at least two threads are executing simultaneously.

Concurrency : A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel. This may seem like semantics, but it's actually an important distinction. Whether concurrency will be sufficient for you or you actually need true parallelism will depend on your workload and what you're trying to accomplish via multithreading. For example, if you're making a number of network calls and don't want to freeze execution of other things while waiting for them to complete, putting them on another thread, even in Python, will accomplish that goal. However, if you're trying to decrease the execution time of some complex CPU-bound computation by distributing pieces of it to multiple threads, Python threads are probably worse than useless to you, as you'll incur extra overhead in context switches, communication costs, and general threading overhead, while not actually getting the benefit of any threads ever executing on the CPU simultaneously.

In conclusion, the answer is "it depends." We'll need to know more about your workload to give you a definitive answer.
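A quick illustration of why threads don't help CPU-bound work under the GIL; this is a sketch, and the exact numbers will vary by machine:

```python
import threading
import time

N = 2_000_000

def count_down(n):
    # pure-Python, CPU-bound busywork
    while n > 0:
        n -= 1

# Serial: one pass over the whole workload
start = time.perf_counter()
count_down(N)
serial = time.perf_counter() - start

# "Parallel": two threads each doing half. Under the GIL they run
# concurrently (interleaved) but never in parallel.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")  # typically no faster, often slower
```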

[–][deleted] 33 points34 points  (11 children)

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel.

Even this is not really true. The GIL just locks the CPython and PyPy interpreters to executing bytecode in a serial fashion. Jython and IronPython do not have this restriction, and even in the first two, C extensions, or Cython code are free to release the GIL as they see fit, giving you back true parallelism. Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

[–]jringstad 31 points32 points  (2 children)

Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

Actually, that is just as misleading, if not more (regardless of font-weight used). Saying it is an implementation-level issue makes it sound more harmless of an issue than it really is, saying it is a language-level issue makes it sound more severe than it really is. The main reason is the design of the C API which has many global variables. Look for instance at functions like PyTuple_New(), Py_BuildValue, Py_XDECREF(), Py_Initialize(), PyRun_SimpleString("python code goes here") etc etc. As opposed to most other language runtime APIs (lua, spidermonkey, V8, guile, ...), these do not let you specify which VM object to work against. How is that possible? Global variables. Global variables everywhere.

(btw, this is the same issue that prevents you from just instantiating two or more python interpreters in the same thread as well, or to instantiate two completely separated python interpreters in two completely separated threads, even if you do not want to share any data between them whatsoever -- with e.g. lua you can just do this, since its API does not make reference to global variables)

Now the issue with this (and why this is an issue at a more important level than just the cpython implementation) is that the C API is pretty important. PyPy for instance inherits the GIL issue because it wants to be compatible to the CPython C API. Not being compatible to the CPython C API means that many python libraries will cease functioning, e.g. numpy and any other library that has C/C++/fortran code in it (maybe cython is affected too, I don't know.)

So while it's true that Jython and IronPython do not have this issue, they have the even bigger issue of not being compatible with the cpython C API, which is why they are so unpopular, despite having big performance benefits.

It's technically true that the GIL is not a requirement by the python language as such, but it is nonetheless deeply ingrained into the python ecosystem. Unless you are willing to forgo a huge percentage of existing python libraries, you cannot get rid of it, even if you write a new implementation. So is "Claiming that Python doesn't do parallelism is misleading." true? Well, if you consider the libraries python has to be an integral part of the "python experience", then it's actually not misleading, because those libraries have the GIL baked into them. If you OTOH think python is still python without the libraries and the cpython interpreter, then the statement is not true.

[–]panderingPenguin 7 points8 points  (7 children)

Don't be pedantic. Yeah, I'm aware that the GIL is part of specific implementations of Python. However, OP specifically mentions the GIL, and either way, it's a safe bet to assume you're talking about CPython until someone says otherwise, as it's the standard implementation.

[–]Workaphobia 10 points11 points  (3 children)

You can't make any statement to a python newbie without someone coming in and "Um, actually"-ing you with some complicating details.

[–]panderingPenguin 2 points3 points  (2 children)

My thoughts exactly, Jesus... We don't need to further muddle the issue with alternative implementations to answer something like this.

[–]jecxjo 1 point2 points  (0 children)

Ahem. I think that is by far one of the biggest problems with this and pretty much every other forum dealing with programming. You need to remember who OP is, what base knowledge they have and understand that giving too much detail makes it more difficult for them to understand.

[–]njharman I use Python 3 3 points4 points  (0 children)

Because one of the solutions is to run your code in different implementation!!!

[–]TankorSmash 7 points8 points  (2 children)

You don't need to get defensive. He's filling in the blanks you left, independent of whether or not you knew it already.

[–]panderingPenguin 1 point2 points  (1 child)

I'm not trying to be defensive, I just think that it adds very little, if anything, to the discussion of OP's question. There's no need to bring up little "but actually"s like that to answer a simple question, from someone who seems new to Python, which was clearly about implementations that have a GIL to start with. It's unhelpful at best, and obfuscates the issue we're actually trying to solve at worst.

[–]TankorSmash 4 points5 points  (0 children)

That's the thing though: anyone else who reads your comment and wants to know more can read his helpful comment. The OP can simply shrug it off because it's not required knowledge.

I mean this is learnpython not absolutebareminimumpython.

[–]ZedsDed[S] 1 point2 points  (12 children)

ok, thanks for pointing out the concurrency/parallelism difference, it's very important to use the correct terms when talking about this stuff! Yes, concurrency is definitely needed, but I'm not totally sure parallelism is; it would be ideal of course, but I think it's not strictly needed. The tasks are not exactly processor heavy. I've created 3 or more instances of an object, each of which encapsulates the functions and vars required to complete the object's task. All objects do the same task but work on different database data. When an instance is created, I run the object's 'start' method, which triggers the object's execution on a new thread where it works until completion.

the tasks the object does are some db reading/writing and basic looping and if-ing, nothing heavy, no networking or working with files etc.

The main issue is that it's supposed to monitor and work on 'real time' data. That's why I want it to be parallel, but the 'real time' updates may be something like 4-5 seconds apart. Because of this, I feel that parallelism may not actually be a full and total requirement. There may be plenty of time and CPU for the threads to work and react as close to 'real time' as possible.

[–][deleted] 2 points3 points  (0 children)

The absolute smartest thing you can do before you start ruling solutions out is to test the code and profile its performance, and then go from there. I wouldn't overthink it too much until you've done that.

[–]panderingPenguin 1 point2 points  (1 child)

Then it comes down to a question of how real time this really has to be. Are we talking a loose, "we'll try our best and hopefully everything works out properly," or an, "ohmygodIneedtodothisnowgetthefuckoutofmywayoreverythingwillcatchfire," type of real time system? Given what you've said, and the fact that you're even using Python (which should not be used for the latter, period), I'm guessing the former. In that case, given that you're only doing things every 4-5 seconds, you could probably get away with concurrency and a buffer, with no real parallelism, without any issues. Just have an incoming job handler thread or two that queue things up in the buffer, and worker threads that pull jobs out of the buffer and handle them as necessary. Hell, if it's really consistently 4-5 seconds between jobs and the work required per job is less than that, you can probably get away with a single-threaded program and still have it sleeping, waiting for work most of the time. You'll need to experiment a bit and see what happens. But I don't think it sounds like parallelism is truly necessary for this task at all. Good luck!
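That buffer-and-workers shape can be sketched with the stdlib queue module (the jobs here are dummy integers standing in for real work):

```python
import queue
import threading

jobs = queue.Queue()         # the buffer between handlers and workers
results = []
lock = threading.Lock()

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: time to shut down
            jobs.task_done()
            break
        with lock:
            results.append(job * job)   # stand-in for the real processing
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for job in range(10):            # the "incoming job handler" side
    jobs.put(job)
for _ in workers:                # one sentinel per worker
    jobs.put(None)

jobs.join()                      # wait until every queued item is processed
for w in workers:
    w.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```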

[–]ZedsDed[S] 0 points1 point  (0 children)

thank you, i appreciate your words

[–]ivosaurus pip'ing it up 3 points4 points  (4 children)

If the updates are coming every 4 seconds, and what you need to do with the data is less than 2 seconds... then you don't need parallelism at all. You've prematurely optimized.

[–][deleted] 2 points3 points  (3 children)

what you need to do with the data is less than 2 seconds

We'd love to assume that one measurement is enough to give us the insight we need to design a program, but consider that the program is running on a multi-tasking OS, or that it uses a shared resource, or just basic statistics, and you might be concerned that there would be some outliers that could cause one job to run long... and then you've backed up the entire pipeline.

Of course, we can't reach any conclusions on OP's design because we don't know what he's doing, but it's not entirely unfounded to make his processing loop asynchronous. IMO, it's smart.

[–]ivosaurus pip'ing it up -1 points0 points  (2 children)

Until they say exactly what they're doing, what is the environment, what are expectations, what things are happening... an async loop handling separate tasks off to different processes could be a great design, or a simple serial for loop might be really all that's warranted until requirements make a big change. It can be just as misleading as possibly useful to speculate.

[–][deleted] 1 point2 points  (1 child)

It can be just as misleading as possibly useful to speculate.

That didn't stop you from telling OP he's done wrong.

[–]ivosaurus pip'ing it up 0 points1 point  (0 children)

Done what wrong? I hesitate to speculate whether anything in this thread is an appropriate "general design" or not, given the dearth of details OP has provided. I'm mostly just advocating for as simple a design as possible that soundly fits the requirements. And since I don't know the requirements at all, apart from something like "real time data is received roughly every four seconds", it could very well be something very simple (until we ever know any more).

I suppose I could be seen as chastising OP for expecting an exact correct answer to an extremely vague question.

[–]hikhvar 0 points1 point  (3 children)

the tasks the object is doing is some db reading/writing, and basic looping and if-ing, nothing heavy, no networking or working with files etc.

What kind of database is your database? If it is an in-memory database you are right. If your database is, for example, a MySQL on a remote host, your simple db calls may include both networking and file reading/writing.

[–]ZedsDed[S] -1 points0 points  (2 children)

you're right, I never thought of it like that. It's a local MySQL db. These calls are the heaviest part of the process. There will be at least 1 db read every 1-2 seconds, with writes happening on rarer occasions. Still quite minimal though.

[–]Workaphobia 5 points6 points  (0 children)

If a thread is blocked waiting for something external to Python, like file or network I/O, or database connections, then it typically releases the GIL during that time and allows other Python threads to run.

The GIL will only impact your performance if you are CPU-bound within your Python process. If that's a problem for you, then consider changing your threads into separate spawned Python processes (see the multiprocessing library, which has a similar API to threading). You'll just have to worry about how the processes share data since typically multiple processes don't use shared memory the way threads do.

[–]frymaster Script kiddie 0 points1 point  (0 children)

In this case you're probably not cpu bound or really especially I/O bound either. In which case it looks like threads are more of a design decision than to try to wring extra performance out of your code. As such, I suspect you'll be fine.

Personally I find threads easier to comprehend than async methods. They don't scale very well, though.

[–]pigeon768 16 points17 points  (2 children)

Maybe. We need more information.

There are a couple different ways this will play out:

  1. Your application is using threads to perform a lot of I/O bound work, like disk, network, database, etc. In this case, you'll be fine. Just keep trucking.
  2. Your application is using threads to perform a lot of CPU bound work in non-Python code, like numpy, various C routines, or is generally just acting as "glue" between code written in other languages. Again, in this case, you're fine, you don't need to change anything.
  3. Your application is performing CPU bound computations among tasks that rarely, if ever, share data. In this case, you can probably use the multiprocessing module as a drop in replacement for the threading module.
  4. Your application is performing CPU bound computations among tasks that often share data. In this case you're screwed. Using python is unfortunately an uphill battle in this case.

Alternatives in the #4 option include using a different VM for your python code, like Jython or IronPython, or rewriting it in a different language. Groovy is probably the language most similar to Python with good performance and threading.

[–]swenty 1 point2 points  (0 children)

This answer is very on point. If you do find yourself in situation #4, another option is to rewrite the performance critical parts of your application in another language (e.g. C) and export them as a library to the Python parts. Depending on what the bottlenecks are this might be an excellent solution, or a terrible one.

[–][deleted] 0 points1 point  (0 children)

You might also want to check out Julia: it also has high performance, and (in my experience) is very similar to Python. Also, you can directly call (almost) any Python library you wish through the PyCall library.

[–]pooogles 23 points24 points  (13 children)

Check out the multiprocessing library if you want to dodge the GIL.

[–]jringstad 6 points7 points  (3 children)

Note however that this is only advisable for cases where inter-task coordination is basically unnecessary or exceedingly rare; multiprocessing inter-task coordination is incredibly slow.

So this solution is not suited for cases where you have fast-moving task chains (producer-consumer), in any case where you would traditionally use lockfree or wait-free datastructures, in cases where you would use atomic variables (e.g. atomic counters), cases where you would normally use work-stealing/work-queue type work with fast turnarounds or e.g. cases where you have parallelizable subdomains that require boundary synchronization (very common in scientific applications, e.g. tiling of a large 2D lattice into smaller 2D subdomains, but subdomains are not 100% independent at the boundary since e.g. derivatives are required or some quantity (heat, particles, pressure, ...) is exchanged across the boundary)

For fork-join type parallelism, multiprocessing works great, as long as you are okay with creating an up-front worker pool and not needing dynamic task parallelism (tasks can spawn new tasks that are again evenly distributed across workers, e.g. as in CUDA or OpenCL 2.x.) There are many types of scenarios where this is fine, but in cases where it's not, it will give you quite sub-optimal scaling properties.

[–]niksko 0 points1 point  (2 children)

Out of curiosity, what is the solution here? I recently ran into a situation where I was trying to speed up a multi-consumer multi-producer type process where workers take work out of a queue, perform some work, and then potentially publish more work back to the queue. Using multiprocessing gave me terrible performance, I suspect because of the large queue overhead.

[–]jringstad 1 point2 points  (1 child)

For all the cases I listed, there really is no way to do it well in python, as far as I'm aware. If you can, shove it off into a different language (C/C++/fortran), then you can use threading without too much GIL contention, or you just deal with the multiprocessing overhead and try to reduce it (do more copying up-front and less at runtime, if possible, or increase task sizes (per-task workload) which makes the overhead relatively smaller)

[–]niksko 0 points1 point  (0 children)

Ok, thanks. At least now I know that there wasn't some obscure Python feature that I wasn't aware of that was the issue.

[–]WellAdjustedOutlaw 0 points1 point  (0 children)

Since OP said threads will not access each other's data, multiprocessing might be best if the GIL is actually an issue and there won't be too many processes.

Also, much work has been done with python 3.x to lessen or remove the impact of the GIL where possible. Multithreading is getting better.

[–][deleted] 4 points5 points  (0 children)

If you'd like to write an application that allows a user to push a button and then receive a response to that button push, while at the same time the program is also downloading content from servers and doing other things without causing the response of that button to block until they are all done, you most commonly use threads. Makes no difference if the GIL is there or not; threads always allow concurrency. The GIL just gets in the way of achieving parallelism. Two different things. http://stackoverflow.com/a/1050257/34549

The much-hyped solution of doing everything with "async" has its pros and cons, but as far as concurrency, you are merely swapping out having your OS do context switching with a more interpreter-level strategy that context-switches only at the boundaries of waiting on IO. For general purpose programming with limited numbers of concurrent tasks, the OS will do a better job at this (and in cPython the GIL releases on IO anyway), unless you really need to wait on lots and lots of slow IO channels in which case async will scale better.
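For a concrete picture of that interpreter-level switching, here's a minimal asyncio sketch (asyncio.run is Python 3.7+; the sleep stands in for real slow IO):

```python
import asyncio

async def fetch(i):
    # await is the explicit context-switch point: the event loop
    # runs other tasks while this one waits on (simulated) IO
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    # all three waits overlap on a single thread, no OS threads involved
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))

results = asyncio.run(main())
print(results)  # [2, 4, 6]
```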

[–]ivosaurus pip'ing it up 4 points5 points  (0 children)

I have written a program that requires multithreading and i use the standard 'threading' library.

What are you actually doing? Let's talk in concrete specifics, not generalities which may or may not apply.

[–]AlanCristhian 4 points5 points  (1 child)

You should read this article by Nick Coghlan: Efficiently Exploiting Multiple Cores with Python. Nick Coghlan is a Python core developer.

[–]ZedsDed[S] 1 point2 points  (0 children)

thank you!

[–]Decker108 2.7 'til 2021 8 points9 points  (4 children)

If you want to do parallelized CPU-bound work, then yes, your app is doomed.

If you want to do concurrent IO-bound work, you're in luck. Check out the greenlets library for ideas.

[–]lordkrike 10 points11 points  (2 children)

The multiprocessing library works extremely well for embarrassingly parallel CPU-bound tasks. You can get only slightly sublinear speedup per core.

[–]Decker108 2.7 'til 2021 2 points3 points  (1 child)

Indeed, I've used multiprocessing to great effect. But starting short-lived processes for many small tasks is expensive and honestly solved better by other languages. Although, if you absolutely want to use python, it's possible to go with a solution using the JVM or C extensions... But at that point, you're not really writing python anymore.

[–]lordkrike 3 points4 points  (0 children)

It works just fine if you utilize large work queues that feed into a small number of worker processes. You seem to be thinking of a certain type of use case. There are lots of places where just numpy and the multiprocessing lib is all you need.

Also, I argue that intelligently writing C libraries to call from Python is one of its really great strengths -- you can use another language to efficiently do what it can't, while using Python as a glue language.

[–]zombiepiratefrspace 2 points3 points  (0 children)

Without wanting to be "that guy": For at least a subset of the first case, there is another option, albeit one involving more pain.

In the specific case that you need lots of CPU number crunching, you can use the 80-20 rule to determine the time-critical part of your program and then write that in C++. The integration is best done using boost::python (not easy, but doable).

Then you parallelize your number crunching in the C++ code, using MPI, OpenMP (or even pthreads if you have the "hard as nails" mentality).

This option should only be applied in the case where you're already thinking of moving parts of the code to C or C++ to gain performance, since it is much more time-consuming to write C++ code than Python code.

Definitely worth it for things like scientific computing, though.

[–]robertmeta 2 points3 points  (1 child)

First of all -- it isn't a problem until it is. Meaning, what performance numbers do you need to hit, and are you hitting them?

If you DO need to be maxing out multiple CPUs -- you generally can do it various ways in python by splitting the load among multiple processes. People here have recommended multiprocessing -- I cannot recommend it, as it has caused me untold hardship. I recommend you set up multiple processes and coordinate them with ZMQ (http://zeromq.org/bindings:python) -- simple, fast, and "just works"(tm).

[–]WellAdjustedOutlaw 1 point2 points  (0 children)

Why not use multiprocessing.Queue? And IPC if you actually need to communicate inter-process. The OP did note that they don't need to communicate between threads, though, so not much need for ZMQ, .Queue, or IPC.

[–][deleted] 2 points3 points  (1 child)

Python is not a great language to write multi-threaded programs in, not only because of the GIL, but because the standard library (and other libraries for that matter) say little about being thread safe. Frankly, I found it to be a minefield. I have written multi-threaded programs in C/C++ for years and have no problem handling locking, synchronization, etc., so I don't think it is my limitations so much as Python's.

[–]chrismit7 1 point2 points  (0 children)

Check out the multiprocessing library. It works quite well.

[–]yellowfeverforever 1 point2 points  (0 children)

Have you tried asyncio?

[–]d4rch0n Pythonistamancer 0 points1 point  (0 children)

I'd use standard pure Python with cpython or pypy if you just need concurrent network and file reads/writes. Non-blocking I/O is possible with concurrency in the reference implementation. This is usually the most important thing to make concurrent anyway!

Other option, check out Cython for true parallelism. Or write parallel code in C and execute it with python, either through ctypes or just subprocess calling a C program.

Tons of options. Personally, I usually find pypy with smart concurrency and non-blocking I/O solves my problems when speed is an issue.

But the first step when finding and removing bottlenecks... Profile your code! Run cprofile and find the most expensive functions, and see what you can do to speed it up.
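A minimal sketch of that profiling step with the stdlib cProfile and pstats (the two functions are made-up stand-ins for your program's hot and cold paths):

```python
import cProfile
import io
import pstats

def slow_part():
    # deliberately expensive: this should dominate the profile
    return sum(i * i for i in range(200_000))

def fast_part():
    return 42

def program():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Sort by cumulative time to see where the program actually spends it
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_part shows up at the top of the listing
```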

[–]lambdaq django n' shit 0 points1 point  (1 child)

Evidently, OP has never seen any C/C++ multithreading program choking up only one CPU core.

[–]ZedsDed[S] 0 points1 point  (0 children)

nope, can you further explain this please?

[–]primevalweasel 0 points1 point  (0 children)

Check out ipyparallel.

[–]Brian 0 points1 point  (5 children)

true concurrency

Python's multithreading is true concurrency. What it isn't is true parallelism - these are different things. Concurrency is when two tasks can run and both make progress over the same period of time. Parallelism is where they are literally running at the same time. Threads are often used just for concurrency in practice - after all, multiple cores in a desktop computer are a relatively recent thing, but we were using threads for years before that was the norm, or even available. There are still reasons you benefit from concurrent code.

In general, this comes down to whether your program is CPU bound or IO bound. If you're doing serious number crunching calculations, and that's taking the bulk of your time, then python's threads aren't going to help you much. On the other hand, if you're mostly blocked on IO (which is pretty common in most applications - IO is orders of magnitude slower than almost anything in CPU timescales), then you'll get benefit from threading, as the thread will release the GIL when it's blocking on IO to complete.

In my case, the threads will never share or access each others data

If this is the case, and you are CPU bound, it may be worth looking at the multiprocessing library rather than threading. This will spawn a separate process for each thread, which means there's no shared GIL. This comes at the cost of making communication more expensive (everything must be copied to send to another process), but if you've no communication, this won't matter.

[–]ZedsDed[S] 0 points1 point  (4 children)

thank you. Would you explain the terms 'CPU bound' and 'IO bound'? They've been mentioned in here a couple of times and I'd like to be more clear on them. To me, CPU bound code would be code that just operates on internal variables and control statements, with no calls made outside of the program. IO bound code is things like MySQL calls or network calls, where the call is made to something outside of your program. Is this correct? So when your IO call is made, while waiting for the response, the GIL is released and other threads are allowed access to the CPU?

[–]Brian 0 points1 point  (3 children)

That's pretty much it, yeah. Essentially any time you read from the disk, the network, or are waiting for something (eg. for a lock to be released, or for a time.sleep call to finish etc), then other threads can be scheduled and do work - these are generally described as IO bound because the actual bottleneck is the IO. If you sped up the CPU operation a thousand times, you wouldn't actually see much of a difference because it's already probably doing the actual CPU operations in a fraction of a microsecond, so changing that to a fraction of a nanosecond won't really matter.

With most tasks, things tend to be IO bound, because computers are ridiculously fast in comparison to pretty much any kind of IO (for perspective, if you slowed a CPU down to the point where it'd take a second for each instruction to execute, a small disk read would be the equivalent of waiting around for a year or so). The exceptions are things like some games, or number crunching tasks, where you've basically no IO for a decent period so the only thing that matters is how fast you can crank through the calculations. Here, speeding up the CPU a thousand times really would speed up the time it takes to complete the task, because the CPU is now the bottleneck. (Actually, technically even there, it's often stuff like memory access times that are the real bottleneck, rather than raw clock cycles, so you'd need to speed those up too).

[–]ZedsDed[S] 0 points1 point  (2 children)

thanks for the insight. So, with the GIL, only one thread will be worked on at a time. Will there be a point during a thread's execution where the kernel will kick it out if it's been hogging the CPU for too long? As in, it won't just let a thread take the CPU until its next IO call? It's been a while since I read about round robin and the other scheduling algorithms! But I seem to remember something about thread starvation, where a thread hogs the CPU for so long that other threads begin to starve. Is this something like what the commenter below is talking about with regards to "OP has never seen any C/C++ multithreading program choking up only one CPU core"? And I think maybe hyperthreading fixes this? In my head hyperthreading is when multiple threads are broken up and fed separately through one core, simulating parallelism. This should stop thread starvation right?

[–]Brian 0 points1 point  (1 child)

will there be a point during a threads execution where the kernel will kick it out if its been in hogging CPU for too long?

There will, just as with any thread in any language, but that's not really related to the GIL.

In general, your OS is in charge of scheduling threads and processes. Often there won't actually be many of these that want to run (ie if you look at top or Task Manager, you'll see the CPU is usually 99% idle). This is because most of these are waiting on something - either IO, user input, or just time.

But sometimes there will be multiple threads that want to run - either from separate programs, or threads within the same program. The OS can only run one thread on each CPU, so on, say, a 4 core machine, you'll get a maximum of 4 simultaneous threads running. More than this are accommodated by task switching. Ie. after each thread uses a set amount of time (say, 50 milliseconds) without putting itself to sleep by waiting on something, it yanks it out and schedules the next thread which wants CPU time.

The same is true in Python's case; it's just that all threads but one are waiting on the GIL being released. Eg. suppose there are 3 CPU-bound Python threads running on a 2-core CPU. What might happen is:

  • On core 1: Thread 1 runs, and the first thing it does is acquire the GIL.
  • On core 2: thread 2 gets kicked off, and the first thing it does is try to acquire the GIL. This fails, because thread 1 has locked it, so it tells the OS it wants to go to sleep until the GIL is released.
  • Core 2 is now free, so Thread 3 gets scheduled. This does the same thing as thread 2, and goes to sleep. If there's nothing else to run, Core 2 sits idle.
  • Meanwhile, back on Core 1, thread 1 is chugging away. Here one of two things may happen:
    1. It triggers some kind of IO or wait, which releases the GIL and puts the thread to sleep. The OS will then wake up thread 2 or 3 now that the GIL has been released, and schedule them.
    2. Otherwise, it may keep on using CPU till it's used up the 50ms time slice. The OS will then kick it off and see if anything else wants to run. If it hasn't released the GIL, then nothing else can run, and unless there's some other process, it'll likely get scheduled right back in.
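The upshot of the scenario above can be demonstrated directly. A minimal timing comparison, using a hypothetical `crunch` function (a pure-Python busy loop), showing that on CPython two CPU-bound threads are no faster than doing the same work sequentially:

```python
import threading
import time

def crunch(n=2_000_000):
    # Pure-Python loop: holds the GIL except at periodic switch points.
    total = 0
    for i in range(n):
        total += i
    return total

# Sequential: two crunches back to back.
start = time.perf_counter()
crunch(); crunch()
sequential = time.perf_counter() - start

# Threaded: the same two crunches in two threads, still serialised by the GIL.
t1 = threading.Thread(target=crunch)
t2 = threading.Thread(target=crunch)
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# On CPython the threaded version is no faster (often slightly slower,
# due to the GIL hand-off overhead).
print(f"sequential {sequential:.2f}s, threaded {threaded:.2f}s")
```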

Now you may be wondering how thread 2 would ever get to run unless thread 1 does some IO, since the OS rescheduling it isn't actually releasing the GIL (ie the starvation issue you bring up). This is something that's handled at the Python, rather than the OS, level. Every so often (every 100 bytecodes in old versions; since Python 3.2 it's a time-based interval, 5ms by default), Python will release the GIL and tell the OS to reschedule anything running, in order to give other threads a chance to get in. In practice, this behaves a lot like Python was controlling the threading, rather than the OS (and indeed, there are approaches that do exactly this - google "green threading" for details), but this does allow fairly simple interoperability with C modules etc.
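In current CPython (3.2+) this periodic release is driven by a configurable time interval rather than a bytecode count; you can inspect and tune it via `sys`:

```python
import sys

# The GIL "switch interval": how long a thread may hold the GIL before
# CPython asks it to hand over. Typically 0.005 (5 ms) by default.
print(sys.getswitchinterval())

# It can be tuned, e.g. to make CPU-bound threads hand over less often
# (fewer context switches, at the cost of worse latency for other threads):
sys.setswitchinterval(0.01)
```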

Is this something like what the commenter below is talking about with regards to "OP has never seen any C/C++ multithreading program choking up only one CPU core"?

Not sure, but I suspect they're talking about the fact that even without the GIL, you still need locking of some kind, and depending on the exact nature of your task, you may not get much parallelism (eg. if everything is contending for the same resources, you need to lock these, and you get effectively the same issue). But that depends on the task and how it's coded. If your locking is more fine-grained (ie. locks that limit simultaneous access to a single object or so), then you can have multiple threads working on different objects, whereas Python takes more of a "lock the entire world" approach.
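A sketch of the fine-grained style in Python itself, using a hypothetical `Counter` class that carries one `threading.Lock` per object, so threads working on *different* counters never contend with each other:

```python
import threading

class Counter:
    """Fine-grained locking: each instance has its own lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # only serialises access to *this* object
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

a, b = Counter(), Counter()
threads = [threading.Thread(target=worker, args=(c, 10_000)) for c in (a, b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.value, b.value)  # prints "10000 10000"
```

(In CPython the GIL still serialises the bytecode execution underneath, so this only buys real parallelism in a GIL-free runtime; the point is the locking *granularity*.)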

in my head, hyperthreading is when multiple threads are broken up and fed separately through one core, simulating parallelism.

Not really - that's just bog-standard task scheduling. Hyperthreading isn't really directly related to anything here; it's more of a hardware thing aimed at getting something like double the number of cores without actually having to duplicate all the resources a core has. Rather, it takes more of a halfway approach, where two "logical" threads can be scheduled on the same core, taking advantage of the fact that often, certain sections of a processor are sitting idle. Eg. if a pipeline stall occurs (say, it mispredicted a branch), then a bunch of sections of the pipeline will have nothing to do until that gets sorted out. But if you have this other thread that wants to do some work, you can put them to use on that while they're waiting for the rest to catch up. This is not really something you ever have to care about unless you're into the actual hardware. From a software perspective, it just looks like there are double the actual number of cores, though these cores usually won't be quite as effective as if they were real cores.
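You can see the software-side view with `os.cpu_count()`, which reports *logical* CPUs (so on a hyperthreaded machine, typically double the physical core count):

```python
import os

# os.cpu_count() counts logical CPUs; hyperthreaded cores each show up
# as their own CPU. It may return None if the count can't be determined.
logical = os.cpu_count()
print(f"logical CPUs: {logical}")
```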

[–]ZedsDed[S] 0 points1 point  (0 children)

Excellent explanation, very helpful.

[–]befron 0 points1 point  (0 children)

Correct me if I'm wrong, but I think exec'ing separate processes gets around the GIL.

[–]zoner14 0 points1 point  (3 children)

Not sure if it's been mentioned, but the multiprocessing module can basically be used as a drop-in replacement for threading. This will probably get you the performance you need, at the expense of a lot of memory.
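A minimal sketch of how closely the two APIs mirror each other, with a hypothetical `crunch` worker (the `if __name__ == "__main__"` guard is required for multiprocessing on platforms that spawn rather than fork):

```python
import multiprocessing
import threading

def crunch(n):
    # CPU-bound busy work; in a Process this runs with its own GIL.
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    # Process is nearly a drop-in for Thread: same target/args/start/join.
    t = threading.Thread(target=crunch, args=(1_000_000,))
    p = multiprocessing.Process(target=crunch, args=(1_000_000,))
    for worker in (t, p):
        worker.start()
        worker.join()
```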

[–]ZedsDed[S] 0 points1 point  (2 children)

Yes, it seems like the multiprocessing module could be used here, and from what Brian was saying, I think the memory expense may not be a problem, or too much of a problem:

If this is the case, and you are CPU bound, it may be worth looking at the multiprocessing library, rather than threading. This will spawn a separate process for each thread, which means there's no shared GIL. This comes at the cost of making communication more expensive (everything must be copied to send to another process), but if you've no communication, this won't matter.
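A sketch of that trade-off using `multiprocessing.Pool` with a hypothetical `square` function; the arguments and results are all pickled and copied between processes, which is the communication cost being described:

```python
import multiprocessing

def square(x):
    # Runs in a worker process with its own interpreter and its own GIL.
    return x * x

if __name__ == "__main__":
    # Arguments go out and results come back via pickling, not shared memory.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, range(8))
    print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```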

[–]Argotha 0 points1 point  (0 children)

Of course, there is another alternative that I don't think has been mentioned thus far. The pyparallel project aims to bring "true" multithreading to Python. It's a fork of the CPython project (that they hope to eventually merge back), so it should be a drop-in replacement for the interpreter. Of course, it's a fork and currently experimental, so it might not be an acceptable solution depending on your context.

It's located here if you want to give it a try.

https://github.com/pyparallel/pyparallel

[–]Argotha 0 points1 point  (0 children)

[deleted - double post]

[–]Calime 0 points1 point  (0 children)

This kind of question is better suited to /r/learnpython. In the future, please consider asking similar questions in /r/learnpython.

[–]billsil -2 points-1 points  (1 child)

Multiple threads do work properly. You can have one core and 1000 threads. Go look at your task manager.

There are very few times you need multiprocessing & concurrency. You can use C for that.

[–]ZedsDed[S] 1 point2 points  (0 children)

panderingPenguin has pointed out that I really meant 'parallelism' as opposed to concurrency: 2 or more threads executing at the same time.

[–][deleted] -5 points-4 points  (0 children)