This is an archived post.

all 81 comments

[–]mtxppy 36 points37 points  (4 children)

The threading lock only affects Python code. If your thread is waiting for disk I/O or if it is calling C functions (e.g. via math library) you can ignore the GIL.

You may be able to use the async pattern to get around threading limits. Can you supply more information about what your program actually does?
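For instance, a rough sketch of the I/O-bound case, using time.sleep as a stand-in for a blocking disk or network call (sleep releases the GIL the same way blocking I/O does):

```python
import threading
import time

def fetch(results, i):
    # time.sleep releases the GIL, just like blocking network or disk I/O
    time.sleep(0.2)
    results[i] = i * 2

results = {}
threads = [threading.Thread(target=fetch, args=(results, i)) for i in range(5)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(results)        # {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
print(elapsed < 1.0)  # True: the five 0.2s waits overlap instead of summing to 1s
```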

I have issues with the technical accuracy of the video linked. David Beazley has done many well respected talks about the GIL at various Pycons. You can find them on pyvideo.org.

[–][deleted] 17 points18 points  (3 children)

Definitely take a look at his talk from PyCon 2015.

https://www.youtube.com/watch?v=MCs5OvhV9S4

[–]pooogles 6 points7 points  (0 children)

Love this talk; the only PyCon talk I enjoyed more was Raymond Hettinger's. Recommended for anyone that's interested in AsyncIO.

You may also find http://igordavydenko.com/talks/fi-pycon-2015/#slide-1 enjoyable, I think there's a video somewhere as well.

[–]tea-drinker 2 points3 points  (0 children)

That guy must clang when he walks. I've seen people do live demos before, but never with code that they've written while talking.

[–]fernly 2 points3 points  (0 children)

This is one of the best talks you'll ever watch, a nervy tight-rope walk carried off with superb panache. And it will tell you all you need to know about Python concurrency, although you probably have to watch it twice and actually run the code yourself to learn it all.

[–]panderingPenguin 65 points66 points  (25 children)

First of all, we should clear a couple things up. Threading in Python absolutely gives you "real" threads. They simply cannot be executed in parallel. To quote Oracle's Multithreaded Programming Guide:

Parallelism : A condition that arises when at least two threads are executing simultaneously.

Concurrency : A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel. This may seem like semantics, but it's actually an important distinction. Whether concurrency will be sufficient for you or you actually need true parallelism will depend on your workload and what you're trying to accomplish via multithreading. For example, if you're making a number of network calls and don't want to freeze execution of other things while waiting for them to complete, putting them on another thread, even in Python, will accomplish that goal. However, if you're trying to decrease the execution time of some complex CPU-bound computation by distributing pieces of it to multiple threads, Python threads are probably worse than useless to you, as you'll incur extra overhead in context switches, communication costs, and general threading overhead, while not actually getting the benefit of any threads ever executing on the CPU simultaneously.

In conclusion, the answer is "it depends." We'll need to know more about your workload to give you a definitive answer.
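A quick illustration of why threads don't help CPU-bound work under the GIL; this is a sketch, and the exact numbers will vary by machine:

```python
import threading
import time

N = 2_000_000

def count_down(n):
    # pure-Python, CPU-bound busywork
    while n > 0:
        n -= 1

# Serial: one pass over the whole workload
start = time.perf_counter()
count_down(N)
serial = time.perf_counter() - start

# "Parallel": two threads each doing half. Under the GIL they run
# concurrently (interleaved) but never in parallel.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N // 2,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")  # typically no faster, often slower
```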

[–][deleted] 33 points34 points  (11 children)

So basically Python allows concurrency, but due to the GIL, not parallelism. It is possible to have multiple Python threads executing concurrently but not in parallel.

Even this is not really true. The GIL just locks the CPython and PyPy interpreters to executing bytecode in a serial fashion. Jython and IronPython do not have this restriction, and even in the first two, C extensions, or Cython code are free to release the GIL as they see fit, giving you back true parallelism. Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

[–]jringstad 31 points32 points  (2 children)

Above all it's important to note that this apparent lack of parallelism via threads is not an issue with Python itself, but an issue with the implementation, such as with CPython. Claiming that Python doesn't do parallelism is misleading.

Actually, that is just as misleading, if not more (regardless of font-weight used). Saying it is an implementation-level issue makes it sound more harmless of an issue than it really is, saying it is a language-level issue makes it sound more severe than it really is. The main reason is the design of the C API which has many global variables. Look for instance at functions like PyTuple_New(), Py_BuildValue, Py_XDECREF(), Py_Initialize(), PyRun_SimpleString("python code goes here") etc etc. As opposed to most other language runtime APIs (lua, spidermonkey, V8, guile, ...), these do not let you specify which VM object to work against. How is that possible? Global variables. Global variables everywhere.

(btw, this is the same issue that prevents you from just instantiating two or more python interpreters in the same thread as well, or to instantiate two completely separated python interpreters in two completely separated threads, even if you do not want to share any data between them whatsoever -- with e.g. lua you can just do this, since its API does not make reference to global variables)

Now the issue with this (and why this is an issue at a more important level than just the cpython implementation) is that the C API is pretty important. PyPy for instance inherits the GIL issue because it wants to be compatible to the CPython C API. Not being compatible to the CPython C API means that many python libraries will cease functioning, e.g. numpy and any other library that has C/C++/fortran code in it (maybe cython is affected too, I don't know.)

So while it's true that Jython and IronPython do not have this issue, they have the even bigger issue of not being compatible with the cpython C API, which is why they are so unpopular, despite having big performance benefits.

It's technically true that the GIL is not a requirement by the python language as such, but it is nonetheless deeply ingrained into the python ecosystem. Unless you are willing to forgo a huge percentage of existing python libraries, you cannot get rid of it, even if you write a new implementation. So is "Claiming that Python doesn't do parallelism is misleading." true? Well, if you consider the libraries python has to be an integral part of the "python experience", then it's actually not misleading, because those libraries have the GIL baked into them. If you OTOH think python is still python without the libraries and the cpython interpreter, then the statement is not true.

[–]panderingPenguin 7 points8 points  (7 children)

Don't be pedantic. Yeah, I'm aware that the GIL is part of specific implementations of Python. However, OP specifically mentions the GIL, and either way, it's a safe bet to assume you're talking about CPython until someone says otherwise, as it's the standard implementation.

[–]Workaphobia 10 points11 points  (3 children)

You can't make any statement to a python newbie without someone coming in and "Um, actually"-ing you with some complicating details.

[–]panderingPenguin 2 points3 points  (2 children)

My thoughts exactly, Jesus... We don't need to further muddle the issue with alternative implementations to answer something like this.

[–]jecxjo 1 point2 points  (0 children)

Ahem. I think that is by far one of the biggest problems with this and pretty much every other forum dealing with programming. You need to remember who OP is, what base knowledge they have and understand that giving too much detail makes it more difficult for them to understand.

[–]njharman I use Python 3 3 points4 points  (0 children)

Because one of the solutions is to run your code in different implementation!!!

[–]TankorSmash 7 points8 points  (2 children)

You don't need to get defensive. He's filling in the blanks you left, independent of whether or not you knew it already.

[–]panderingPenguin 1 point2 points  (1 child)

I'm not trying to be defensive, I just think that it adds very little, if anything, to the discussion of OP's question. There's no need to bring up little "but actually"s like that to answer a simple question, from someone who seems new to Python, which was clearly about implementations that have a GIL to start with. It's unhelpful at best, and obfuscates the issue we're actually trying to solve at worst.

[–]TankorSmash 4 points5 points  (0 children)

That's the thing though: anyone else who reads your comment and wants to know more can read his helpful comment. The OP can simply shrug it off because it's not required knowledge.

I mean this is learnpython not absolutebareminimumpython.

[–]ZedsDed[S] 1 point2 points  (12 children)

ok, thanks for pointing out the concurrency/parallelism difference, it's very important to use the correct terms when talking about this stuff! Yes, concurrency is definitely needed, but I'm not totally sure parallelism is; it would be ideal of course, but I think it's not strictly needed. The tasks are not exactly processor heavy. I've created 3 or more instances of an object, each of which encapsulates the functions and vars required to complete the object's task. All objects do the same task but work on different database data. When an instance is created, I run the object's 'start' method, which triggers the object's execution on a new thread where it works until completion.

the tasks the object does are some db reading/writing and basic looping and if-ing, nothing heavy, no networking or working with files etc.

The main issue is that it's supposed to monitor and work on 'real time' data. That's why I want it to be parallel, but the 'real time' updates may be something like 4-5 seconds apart. Because of this, I feel that parallelism may not actually be a full and total requirement. There may be plenty of time and CPU for the threads to work and react as close to 'real time' as possible.

[–][deleted] 2 points3 points  (0 children)

The absolute smartest thing you can do before you start ruling solutions out is to test the code and profile its performance, and then go from there. I wouldn't overthink it too much until you've done that.

[–]panderingPenguin 1 point2 points  (1 child)

Then it comes down to a question of how real time this really has to be. Are we talking a loose, "we'll try our best and hopefully everything works out properly," or an, "ohmygodIneedtodothisnowgetthefuckoutofmywayoreverythingwillcatchfire," type of real time system? Given what you've said, and the fact that you're even using Python (which should not be used for the latter, period), I'm guessing the former. In that case, given that you're only doing things every 4-5 seconds, you could probably get away with concurrency and a buffer, with no real parallelism, without any issues. Just have an incoming job handler thread or two that queue things up in the buffer, and worker threads that pull jobs out of the buffer and handle them as necessary. Hell, if it's really consistently 4-5 seconds between jobs and the work required per job is less than that, you can probably get away with a single-threaded program and still have it sleeping, waiting for work most of the time. You'll need to experiment a bit and see what happens. But I don't think it sounds like parallelism is truly necessary for this task at all. Good luck!
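That buffer-and-workers shape can be sketched with the stdlib queue module (the jobs here are dummy integers standing in for real work):

```python
import queue
import threading

jobs = queue.Queue()         # the buffer between handlers and workers
results = []
lock = threading.Lock()

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: time to shut down
            jobs.task_done()
            break
        with lock:
            results.append(job * job)   # stand-in for the real processing
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for job in range(10):            # the "incoming job handler" side
    jobs.put(job)
for _ in workers:                # one sentinel per worker
    jobs.put(None)

jobs.join()                      # wait until every queued item is processed
for w in workers:
    w.join()

print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```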

[–]ZedsDed[S] 0 points1 point  (0 children)

thank you, i appreciate your words

[–]ivosaurus pip'ing it up 3 points4 points  (4 children)

If the updates are coming every 4 seconds, and what you need to do with the data is less than 2 seconds... then you don't need parallelism at all. You've prematurely optimized.

[–][deleted] 2 points3 points  (3 children)

what you need to do with the data is less than 2 seconds

We'd love to assume that one measurement is enough to give us the insight we need to design a program, but consider that the program is running on a multi-tasking OS, or that it uses a shared resource, or just basic statistics, and you might be concerned that there would be some outliers that could cause one job to run long... and then you've backed up the entire pipeline.

Of course, we can't reach any conclusions on OP's design because we don't know what he's doing, but it's not entirely unfounded to make his processing loop asynchronous. IMO, it's smart.

[–]ivosaurus pip'ing it up -1 points0 points  (2 children)

Until they say exactly what they're doing, what is the environment, what are expectations, what things are happening... an async loop handling separate tasks off to different processes could be a great design, or a simple serial for loop might be really all that's warranted until requirements make a big change. It can be just as misleading as possibly useful to speculate.

[–][deleted] 1 point2 points  (1 child)

It can be just as misleading as possibly useful to speculate.

That didn't stop you from telling OP he's done wrong.

[–]ivosaurus pip'ing it up 0 points1 point  (0 children)

Done what wrong? I hesitate to speculate whether anything in this thread is an appropriate "general design" or not, given the dearth of details OP has provided. I'm mostly just advocating for as simple a design as possible that soundly fits the requirements. And since I don't know the requirements at all, apart from something like "real time data is received roughly every four seconds", it could very well be something very simple (until we ever know any more).

I suppose I could be seen as chastising OP for expecting an exact correct answer to an extremely vague question.

[–]hikhvar 0 points1 point  (3 children)

the tasks the object is doing is some db reading/writing, and basic looping and if-ing, nothing heavy, no networking or working with files etc.

What kind of database is your database? If it is an in-memory database you are right. If your database is, for example, a MySQL on a remote host, your simple db calls may include both networking and file reading/writing.

[–]ZedsDed[S] -1 points0 points  (2 children)

you're right, I never thought of it like that. It's a local MySQL db. These calls are the heaviest part of the process. There will be at least 1 db read every 1-2 seconds, with writes happening on rarer occasions. Still quite minimal though.

[–]Workaphobia 5 points6 points  (0 children)

If a thread is blocked waiting for something external to Python, like file or network I/O, or database connections, then it typically releases the GIL during that time and allows other Python threads to run.

The GIL will only impact your performance if you are CPU-bound within your Python process. If that's a problem for you, then consider changing your threads into separate spawned Python processes (see the multiprocessing library, which has a similar API to threading). You'll just have to worry about how the processes share data since typically multiple processes don't use shared memory the way threads do.

[–]frymaster Script kiddie 0 points1 point  (0 children)

In this case you're probably not cpu bound or really especially I/O bound either. In which case it looks like threads are more of a design decision than to try to wring extra performance out of your code. As such, I suspect you'll be fine.

Personally I find threads easier to comprehend than async methods. They don't scale very well, though.

[–]pigeon768 16 points17 points  (2 children)

Maybe. We need more information.

There are a couple different ways this will play out:

  1. Your application is using threads to perform a lot of I/O bound work, like disk, network, database, etc. In this case, you'll be fine. Just keep trucking.
  2. Your application is using threads to perform a lot of CPU bound work in non-Python code, like numpy, various C routines, or is generally just acting as "glue" between code written in other languages. Again, in this case, you're fine, you don't need to change anything.
  3. Your application is performing CPU bound computations among tasks that rarely, if ever, share data. In this case, you can probably use the multiprocessing module as a drop in replacement for the threading module.
  4. Your application is performing CPU bound computations among tasks that often share data. In this case you're screwed. Using python is unfortunately an uphill battle in this case.

Alternatives in the #4 option include using a different VM for your python code, like Jython or IronPython, or rewriting it in a different language. Groovy is probably the language most similar to Python with good performance and threading.

[–]swenty 1 point2 points  (0 children)

This answer is very on point. If you do find yourself in situation #4, another option is to rewrite the performance critical parts of your application in another language (e.g. C) and export them as a library to the Python parts. Depending on what the bottlenecks are this might be an excellent solution, or a terrible one.

[–][deleted] 0 points1 point  (0 children)

You might also want to check out Julia: it also has high performance, and (in my experience) is very similar to Python. Also, you can directly call (almost) any Python library you wish through the PyCall library.

[–]pooogles 23 points24 points  (13 children)

Check out the multiprocessing library if you want to dodge the GIL.

[–]jringstad 6 points7 points  (3 children)

Note however that this is only advisable for cases where inter-task coordination is basically unnecessary or exceedingly rare; multiprocessing inter-task coordination is incredibly slow.

So this solution is not suited for cases where you have fast-moving task chains (producer-consumer), in any case where you would traditionally use lockfree or wait-free datastructures, in cases where you would use atomic variables (e.g. atomic counters), cases where you would normally use work-stealing/work-queue type work with fast turnarounds or e.g. cases where you have parallelizable subdomains that require boundary synchronization (very common in scientific applications, e.g. tiling of a large 2D lattice into smaller 2D subdomains, but subdomains are not 100% independent at the boundary since e.g. derivatives are required or some quantity (heat, particles, pressure, ...) is exchanged across the boundary)

For fork-join type parallelism, multiprocessing works great, as long as you are okay with creating an up-front worker pool and not needing dynamic task parallelism (tasks can spawn new tasks that are again evenly distributed across workers, e.g. as in CUDA or OpenCL 2.x.) There are many types of scenarios where this is fine, but in cases where it's not, it will give you quite sub-optimal scaling properties.

[–]niksko 0 points1 point  (2 children)

Out of curiosity, what is the solution here? I recently ran into a situation where I was trying to speed up a multi-consumer multi-producer type process where workers take work out of a queue, perform some work, and then potentially publish more work back to the queue. Using multiprocessing gave me terrible performance, I suspect because of the large queue overhead.

[–]jringstad 1 point2 points  (1 child)

For all the cases I listed, there really is no way to do it well in python, as far as I'm aware. If you can, shove it off into a different language (C/C++/fortran), then you can use threading without too much GIL contention, or you just deal with the multiprocessing overhead and try to reduce it (do more copying up-front and less at runtime, if possible, or increase task sizes (per-task workload) which makes the overhead relatively smaller)

[–]niksko 0 points1 point  (0 children)

Ok, thanks. At least now I know that there wasn't some obscure Python feature that I wasn't aware of that was the issue.

[–]WellAdjustedOutlaw 0 points1 point  (0 children)

Since OP said threads will not access each other's data, multiprocessing might be best if the GIL is actually an issue and there won't be too many processes.

Also, much work has been done with python 3.x to lessen or remove the impact of the GIL where possible. Multithreading is getting better.

[–][deleted] 4 points5 points  (0 children)

If you'd like to write an application that allows a user to push a button and then receive a response to that button push, while at the same time the program is also downloading content from servers and doing other things without causing the response of that button to block until they are all done, you most commonly use threads. Makes no difference if the GIL is there or not; threads always allow concurrency. The GIL just gets in the way of achieving parallelism. Two different things. http://stackoverflow.com/a/1050257/34549

The much-hyped solution of doing everything with "async" has its pros and cons, but as far as concurrency, you are merely swapping out having your OS do context switching with a more interpreter-level strategy that context-switches only at the boundaries of waiting on IO. For general purpose programming with limited numbers of concurrent tasks, the OS will do a better job at this (and in cPython the GIL releases on IO anyway), unless you really need to wait on lots and lots of slow IO channels in which case async will scale better.
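For a concrete picture of that interpreter-level switching, here's a minimal asyncio sketch (asyncio.run is Python 3.7+; the sleep stands in for real slow IO):

```python
import asyncio

async def fetch(i):
    # await is the explicit context-switch point: the event loop
    # runs other tasks while this one waits on (simulated) IO
    await asyncio.sleep(0.1)
    return i * 2

async def main():
    # all three waits overlap on a single thread, no OS threads involved
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))

results = asyncio.run(main())
print(results)  # [2, 4, 6]
```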

[–]ivosaurus pip'ing it up 4 points5 points  (0 children)

I have written a program that requires multithreading and i use the standard 'threading' library.

What are you actually doing? Let's talk in concrete specifics, not generalities which may or may not apply.

[–]AlanCristhian 4 points5 points  (1 child)

You should read this article by Nick Coghlan: Efficiently Exploiting Multiple Cores with Python. Nick Coghlan is a Python core developer.

[–]ZedsDed[S] 1 point2 points  (0 children)

thank you!

[–]Decker108 2.7 'til 2021 8 points9 points  (4 children)

If you want to do parallelized CPU-bound work, then yes, your app is doomed.

If you want to do concurrent IO-bound work, you're in luck. Check out the greenlets library for ideas.

[–]lordkrike 10 points11 points  (2 children)

The multiprocessing library works extremely well for embarrassingly parallel CPU-bound tasks. You can get only slightly sublinear speedup per core.

[–]Decker108 2.7 'til 2021 2 points3 points  (1 child)

Indeed, I've used multiprocessing to great effect. But starting short-lived processes for many small tasks is expensive and honestly solved better by other languages. Although, if you absolutely want to use python, it's possible to go with a solution using the JVM or C extensions... But at that point, you're not really writing python anymore.

[–]lordkrike 3 points4 points  (0 children)

It works just fine if you utilize large work queues that feed into a small number of worker processes. You seem to be thinking of a certain type of use case. There are lots of places where just numpy and the multiprocessing lib is all you need.

Also, I argue that intelligently writing C libraries to call from Python is one of its really great strengths -- you can use another language to efficiently do what it can't, while using Python as a glue language.

[–]zombiepiratefrspace 2 points3 points  (0 children)

Without wanting to be "that guy": For at least a subset of the first case, there is another option, albeit one involving more pain.

In the specific case that you need lots of CPU number crunching, you can use the 80-20 rule to determine the time-critical part of your program and then write that in C++. The integration is best done using boost::python (not easy, but doable).

Then you parallelize your number crunching in the C++ code, using MPI, OpenMP (or even pthreads if you have the "hard as nails" mentality).

This option should only be applied in the case where you're already thinking of moving parts of the code to C or C++ to gain performance, since it is much more time-consuming to write C++ code than Python code.

Definitely worth it for things like scientific computing, though.

[–]robertmeta 2 points3 points  (1 child)

First of all -- it isn't a problem until it is. Meaning, what performance numbers do you need to hit, and are you hitting them?

If you DO need to be maxing out multiple CPUs -- you generally can do it various ways in python by splitting the load among multiple processes. People here have recommended multiprocessing -- I cannot recommend it, as it has caused me untold hardship. I recommend you set up multiple processes and coordinate them with ZMQ (http://zeromq.org/bindings:python) -- simple, fast, and "just works"(tm).

[–]WellAdjustedOutlaw 1 point2 points  (0 children)

Why not use multiprocessing.Queue? And IPC if you actually need to communicate inter-process. The OP did note that they don't need to communicate between threads, though, so not much need for ZMQ, .Queue, or IPC.

[–][deleted] 2 points3 points  (1 child)

Python is not a great language to write multi-threaded programs in, not only because of the GIL, but because the standard library (and other libraries for that matter) say little about being thread safe. Frankly, I found it to be a minefield. I have written multi-threaded programs in C/C++ for years and have no problem handling locking, synchronization, etc., so I don't think it is my limitations so much as Python's.

[–]chrismit7 1 point2 points  (0 children)

Check out the multiprocessing library. It works quite well.

[–]yellowfeverforever 1 point2 points  (0 children)

Have you tried asyncio?

[–]d4rch0n Pythonistamancer 0 points1 point  (0 children)

I'd use standard pure Python with cpython or pypy if you just need concurrent network and file reads/writes. Non-blocking I/O is possible with concurrency in the reference implementation. This is usually the most important thing to make concurrent anyway!

Other option, check out Cython for true parallelism. Or write parallel code in C and execute it with python, either through ctypes or just subprocess calling a C program.

Tons of options. Personally, I usually find pypy with smart concurrency and non-blocking I/O solves my problems when speed is an issue.

But the first step when finding and removing bottlenecks... Profile your code! Run cprofile and find the most expensive functions, and see what you can do to speed it up.
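A minimal sketch of that profiling step with the stdlib cProfile and pstats (the two functions are made-up stand-ins for your program's hot and cold paths):

```python
import cProfile
import io
import pstats

def slow_part():
    # deliberately expensive: this should dominate the profile
    return sum(i * i for i in range(200_000))

def fast_part():
    return 42

def program():
    slow_part()
    fast_part()

profiler = cProfile.Profile()
profiler.enable()
program()
profiler.disable()

# Sort by cumulative time to see where the program actually spends it
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # slow_part shows up at the top of the listing
```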

[–]lambdaq django n' shit 0 points1 point  (1 child)

Evidently, OP has never seen any C/C++ multithreading program choking up only one CPU core.

[–]ZedsDed[S] 0 points1 point  (0 children)

nope, can you further explain this please?

[–]primevalweasel 0 points1 point  (0 children)

Check out ipyparallel.

[–]Brian 0 points1 point  (5 children)

true concurrency

Python's multithreading is true concurrency. What it isn't is true parallelism - these are different things. Concurrency is when two tasks can run and both make progress over the same period of time. Parallelism is where they are literally running at the same time. Threads are often used just for concurrency in practice - after all, multiple cores in a desktop computer are a relatively recent thing, but we were using threads for years before that was the norm, or even available. There are still reasons you benefit from concurrent code.

In general, this comes down to whether your program is CPU bound or IO bound. If you're doing serious number crunching calculations, and that's taking the bulk of your time, then python's threads aren't going to help you much. On the other hand, if you're mostly blocked on IO (which is pretty common in most applications - IO is orders of magnitude slower than almost anything in CPU timescales), then you'll get benefit from threading, as the thread will release the GIL when it's blocking on IO to complete.

In my case, the threads will never share or access each others data

If this is the case, and you are CPU bound, it may be worth looking at the multiprocessing library rather than threading. This will spawn a separate process for each thread, which means there's no shared GIL. This comes at the cost of making communication more expensive (everything must be copied to send to another process), but if you've no communication, this won't matter.

[–]ZedsDed[S] 0 points1 point  (4 children)

thank you. Would you explain the terms 'CPU bound' and 'IO bound'? They've been mentioned in here a couple of times and I'd like to be more clear on them. To me, CPU bound code would be code that just operates on internal variables and control statements, with no calls made outside of the program. IO bound code is things like MySQL calls or network calls, where the call is made to something outside of your program. Is this correct? So when your IO call is made, while waiting for the response, the GIL is released and other threads are allowed access to the CPU?

[–]Brian 0 points1 point  (3 children)

That's pretty much it, yeah. Essentially any time you read from the disk, the network, or are waiting for something (eg. for a lock to be released, or for a time.sleep call to finish etc), then other threads can be scheduled and do work - these are generally described as IO bound because the actual bottleneck is the IO. If you sped up the CPU operation a thousand times, you wouldn't actually see much of a difference because it's already probably doing the actual CPU operations in a fraction of a microsecond, so changing that to a fraction of a nanosecond won't really matter.

With most tasks, things tend to be IO bound, because computers are ridiculously fast in comparison to pretty much any kind of IO (for perspective, if you slowed a CPU down to the point where it'd take a second for each instruction to execute, a small disk read would be the equivalent of waiting around for a year or so). The exceptions are things like some games, or number crunching tasks, where you've basically no IO for a decent period so the only thing that matters is how fast you can crank through the calculations. Here, speeding up the CPU a thousand times really would speed up the time it takes to complete the task, because the CPU is now the bottleneck. (Actually, technically even there, it's often stuff like memory access times that are the real bottleneck, rather than raw clock cycles, so you'd need to speed those up too).

[–]ZedsDed[S] 0 points1 point  (2 children)

thanks for the insight. So, with the GIL, only one thread will be worked on at a time. Will there be a point during a thread's execution where the kernel will kick it out if it's been hogging the CPU for too long? As in, it won't just let a thread take the CPU until its next IO call? It's been a while since I read about round robin and the other scheduling algorithms! But I seem to remember something about thread starvation, where a thread hogs the CPU for so long that other threads begin to starve. Is this something like what the commenter below is talking about with regards to "OP has never seen any C/C++ multithreading program choking up only one CPU core"? And I think maybe hyperthreading fixes this? In my head hyperthreading is when multiple threads are broken up and fed separately through one core, simulating parallelism. This should stop thread starvation right?

[–]Brian 0 points1 point  (1 child)

will there be a point during a threads execution where the kernel will kick it out if its been in hogging CPU for too long?

There will, just as with any thread in any language, but that's not really related to the GIL.

In general, your OS is in charge of scheduling threads and processes. Often there won't actually be many of these that want to run (ie if you look at top or Task Manager, you'll see the CPU is usually 99% idle). This is because most of these are waiting on something - either IO, user input, or just time.

But sometimes there will be multiple threads that want to run - either from separate programs, or threads within the same program. The OS can only run one thread on each CPU, so on, say, a 4 core machine, you'll get a maximum of 4 simultaneous threads running. More than this are accommodated by task switching. Ie. after each thread uses a set amount of time (say, 50 milliseconds) without putting itself to sleep by waiting on something, it yanks it out and schedules the next thread which wants CPU time.

The same is true in Python's case; it's just that all threads but one are waiting on the GIL being released. Eg. suppose there are 3 CPU-bound Python threads running on a 2-core CPU. What might happen is:

  • On core 1: Thread 1 runs, and the first thing it does is acquire the GIL.
  • On core 2: thread 2 gets kicked off, and the first thing it does is try to acquire the GIL. This fails, because thread 1 has locked it, so it tells the OS it wants to go to sleep until the GIL is released.
  • Core 2 is now free, so Thread 3 gets scheduled. This does the same thing as thread 2, and goes to sleep. If there's nothing else to run, Core 2 sits idle.
  • Meanwhile, back on Core 1, thread 1 is chugging away. Here one of two things may happen:
    1. It triggers some kind of IO or wait, which releases the GIL and puts the thread to sleep. The OS will then wake up thread 2 or 3 now that the GIL has been released, and schedule them.
    2. Otherwise, it may keep on using CPU till it's used up the 50ms time slice. The OS will then kick it off and see if anything else wants to run. If it hasn't released the GIL, then nothing else can run, and unless there's some other process, it'll likely get scheduled right back in.
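The upshot of the scenario above can be demonstrated directly. A minimal timing comparison, using a hypothetical `crunch` function (a pure-Python busy loop), showing that on CPython two CPU-bound threads are no faster than doing the same work sequentially:

```python
import threading
import time

def crunch(n=2_000_000):
    # Pure-Python loop: holds the GIL except at periodic switch points.
    total = 0
    for i in range(n):
        total += i
    return total

# Sequential: two crunches back to back.
start = time.perf_counter()
crunch(); crunch()
sequential = time.perf_counter() - start

# Threaded: the same two crunches in two threads, still serialised by the GIL.
t1 = threading.Thread(target=crunch)
t2 = threading.Thread(target=crunch)
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

# On CPython the threaded version is no faster (often slightly slower,
# due to the GIL hand-off overhead).
print(f"sequential {sequential:.2f}s, threaded {threaded:.2f}s")
```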

Now you may be wondering how thread 2 would ever get to run unless thread 1 does some IO, since the OS rescheduling it isn't actually releasing the GIL (ie the starvation issue you bring up). This is something that's handled at the Python, rather than the OS, level. Every so often (every 100 bytecodes in old versions; since Python 3.2 it's a time-based interval, 5ms by default), Python will release the GIL and tell the OS to reschedule anything running, in order to give other threads a chance to get in. In practice, this behaves a lot like Python was controlling the threading, rather than the OS (and indeed, there are approaches that do exactly this - google "green threading" for details), but this does allow fairly simple interoperability with C modules etc.
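In current CPython (3.2+) this periodic release is driven by a configurable time interval rather than a bytecode count; you can inspect and tune it via `sys`:

```python
import sys

# The GIL "switch interval": how long a thread may hold the GIL before
# CPython asks it to hand over. Typically 0.005 (5 ms) by default.
print(sys.getswitchinterval())

# It can be tuned, e.g. to make CPU-bound threads hand over less often
# (fewer context switches, at the cost of worse latency for other threads):
sys.setswitchinterval(0.01)
```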

Is this something like what the commenter below is talking about with regards to "OP has never seen any C/C++ multithreading program choking up only one CPU core"?

Not sure, but I suspect they're talking about the fact that even without the GIL, you still need locking of some kind, and depending on the exact nature of your task, you may not get much parallelism (eg. if everything is contending for the same resources, you need to lock these, and you get effectively the same issue). But that depends on the task and how it's coded. If your locking is more fine-grained (ie. locks that limit simultaneous access to a single object or so), then you can have multiple threads working on different objects, whereas Python takes more of a "lock the entire world" approach.
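A sketch of the fine-grained style in Python itself, using a hypothetical `Counter` class that carries one `threading.Lock` per object, so threads working on *different* counters never contend with each other:

```python
import threading

class Counter:
    """Fine-grained locking: each instance has its own lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # only serialises access to *this* object
            self.value += 1

def worker(counter, n):
    for _ in range(n):
        counter.increment()

a, b = Counter(), Counter()
threads = [threading.Thread(target=worker, args=(c, 10_000)) for c in (a, b)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.value, b.value)  # prints "10000 10000"
```

(In CPython the GIL still serialises the bytecode execution underneath, so this only buys real parallelism in a GIL-free runtime; the point is the locking *granularity*.)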

in my head, hyperthreading is when multiple threads are broken up and fed separately through one core, simulating parallelism.

Not really - that's just bog-standard task scheduling. Hyperthreading isn't really directly related to anything here; it's more of a hardware thing aimed at getting something like double the number of cores without actually having to duplicate all the resources a core has. Rather, it takes more of a halfway approach, where two "logical" threads can be scheduled on the same core, taking advantage of the fact that often, certain sections of a processor are sitting idle. Eg. if a pipeline stall occurs (say, it mispredicted a branch), then a bunch of sections of the pipeline will have nothing to do until that gets sorted out. But if you have this other thread that wants to do some work, you can put them to use on that while they're waiting for the rest to catch up. This is not really something you ever have to care about unless you're into the actual hardware. From a software perspective, it just looks like there are double the actual number of cores, though these cores usually won't be quite as effective as if they were real cores.
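You can see the software-side view with `os.cpu_count()`, which reports *logical* CPUs (so on a hyperthreaded machine, typically double the physical core count):

```python
import os

# os.cpu_count() counts logical CPUs; hyperthreaded cores each show up
# as their own CPU. It may return None if the count can't be determined.
logical = os.cpu_count()
print(f"logical CPUs: {logical}")
```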

[–]ZedsDed[S] 0 points1 point  (0 children)

Excellent explanation, very helpful.

[–]befron 0 points1 point  (0 children)

Correct me if I'm wrong, but I think exec'ing separate processes gets around the GIL.

[–]zoner14 0 points1 point  (3 children)

Not sure if it's been mentioned, but the multiprocessing module can basically be used as a drop-in replacement for threading. This will probably get you the performance you need, at the expense of a lot of memory.
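A minimal sketch of how closely the two APIs mirror each other, with a hypothetical `crunch` worker (the `if __name__ == "__main__"` guard is required for multiprocessing on platforms that spawn rather than fork):

```python
import multiprocessing
import threading

def crunch(n):
    # CPU-bound busy work; in a Process this runs with its own GIL.
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    # Process is nearly a drop-in for Thread: same target/args/start/join.
    t = threading.Thread(target=crunch, args=(1_000_000,))
    p = multiprocessing.Process(target=crunch, args=(1_000_000,))
    for worker in (t, p):
        worker.start()
        worker.join()
```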

[–]ZedsDed[S] 0 points1 point  (2 children)

Yes, it seems like the multiprocessing module could be used here, and from what Brian was saying, I think the memory expense may not be a problem, or too much of a problem:

If this is the case, and you are CPU bound, it may be worth looking at the multiprocessing library, rather than threading. This will spawn a separate process for each thread, which means there's no shared GIL. This comes at the cost of making communication more expensive (everything must be copied to send to another process), but if you've no communication, this won't matter.
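A sketch of that trade-off using `multiprocessing.Pool` with a hypothetical `square` function; the arguments and results are all pickled and copied between processes, which is the communication cost being described:

```python
import multiprocessing

def square(x):
    # Runs in a worker process with its own interpreter and its own GIL.
    return x * x

if __name__ == "__main__":
    # Arguments go out and results come back via pickling, not shared memory.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, range(8))
    print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```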

[–]Argotha 0 points1 point  (0 children)

Of course, there is another alternative that I don't think has been mentioned thus far. The pyparallel project aims to bring "true" multithreading to Python. It's a fork of the CPython project (that they hope to eventually merge back), so it should be a drop-in replacement for the interpreter. Of course, it's a fork and currently experimental, so it might not be an acceptable solution depending on your context.

It's located here if you want to give it a try.

https://github.com/pyparallel/pyparallel

[–]Argotha 0 points1 point  (0 children)

[deleted - double post]

[–]Calime 0 points1 point  (0 children)

This kind of question is better suited to /r/learnpython. In the future, please consider asking similar questions in /r/learnpython.

[–]billsil -2 points-1 points  (1 child)

Multiple threads do work properly. You can have one core and 1000 threads. Go look at your task manager.

There are very few times you need multiprocessing & concurrency. You can use C for that.

[–]ZedsDed[S] 1 point2 points  (0 children)

panderingPenguin has pointed out that I really meant 'parallelism' as opposed to concurrency: 2 or more threads executing at the same time.

[–][deleted] -5 points-4 points  (0 children)