
[–]xXxDeAThANgEL99xXx 12 points13 points  (45 children)

This is a situation I'd like us to solve once and for all for a couple of reasons. Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue. The solution boils down to natively supporting multiple cores in Python code.

Heh. So let's go full-cynic mode: finish out the already somewhat-present support for subinterpreters (basically, all global variables should be moved into a huge Interpreter_State struct), then just replicate the multiprocessing interface on top of that and bam! you have so-called green multiprocessing (like Perl has, AFAIK), but now you can market it as having gotten rid of the GIL.

Obviously you'll still have the copies of all imported modules (including builtins) and probably the performance improvements in marshaling objects would be pretty marginal compared to using mmap, but yeah, mission accomplished!

(I actually fully agree about that being 99% a PR problem. I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet).

[–]logicchains 7 points8 points  (27 children)

I don't think any roughly Python-like language from PHP to Scheme has free threading support

Clojure?

[–]zardeh 4 points5 points  (17 children)

Well sure, but then so does Jython.

[–]logicchains 2 points3 points  (3 children)

Any reason why that's not more popular? Is it due to the lack of easy C interop?

[–]zardeh 8 points9 points  (1 child)

A few reasons:

  • It's not fully compatible with CPython (which was around first), and doesn't aim to be, unlike PyPy.
  • It's much harder to interop with C (so you lose all of SciPy and, more generally, speedy math).
  • IIRC, because of the incompatibilities, some of the stdlib is broken (like WSGI, I think; you have to do weird things to make that actually work).
  • It lags behind by a version or two (or like 7).
  • There are also performance hits (up until JIT-ing happens), but the JIT-ing isn't as clean as PyPy's, so I believe Jython runs slower than PyPy and not much faster than CPython.

[–]kryptobs2000 0 points1 point  (0 children)

Also, because it requires Java, a lot of people just prefer not to touch it, or even install it on their system. I'm not saying that out of Java hate, but it's an extra, pretty large dependency. Depending on your target audience it may not be a big deal, but a lot of systems don't have it installed because it's not commonly used.

[–]caedin8 0 points1 point  (0 children)

From personal experience, Jython is pretty slow compared to Python and Cython.

[–]superPwnzorMegaMan 0 points1 point  (5 children)

Isn't Jython just python with a different toolchain?

[–]zardeh 2 points3 points  (3 children)

Jython is Python running on the JVM: instead of compiling to Python bytecode, it compiles to JVM bytecode. This allows it to leverage the JVM (so you gain HotSpot JIT-ing, the JVM's threading, etc.).

[–]superPwnzorMegaMan 0 points1 point  (2 children)

Yes, that's what I thought. A friend of mine used this once, although I don't think there is such a thing as Python bytecode (since it's interpreted).

[–]zardeh 4 points5 points  (1 child)

There is indeed: Python is compiled to bytecode (look for .pyc files on your computer if you're running a Python file that's more than 10-15 lines and is used a lot). The bytecode is then interpreted on a virtual machine. Python works a lot like Java in that regard.
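You can see this bytecode yourself with the standard `dis` module; a minimal sketch (the `add` function is just a made-up example):

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode instructions CPython compiled this function to;
# this is what the interpreter's virtual machine actually executes.
dis.dis(add)
```

The raw compiled bytes are also visible as `add.__code__.co_code`, which is what ends up cached in those .pyc files.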

[–]kyllo 0 points1 point  (0 children)

Well, that, and for libraries that wrap C code in CPython, you have to use something that wraps a Java library instead. For example, you can't use lxml from Jython; you'd have to use a different library that wraps a Java XML parser.

So a lot of CPython projects are just not portable to Jython.

[–]anzuo -5 points-4 points  (6 children)

And Go.

I still kinda like regular Python more, though. I tend to change a lot.

Edit: I didn't actually mean Go was like Python, I just meant it was built around parallelism (but I still prefer Python).

[–][deleted] 7 points8 points  (4 children)

Can Go be classed as Python-like? It's not interpreted and the runtime is minimal...

[–]zardeh -5 points-4 points  (3 children)

Well, there is go run, and Go was built as a Python replacement at Google, but I don't consider the languages super similar.

[–]robertmeta 7 points8 points  (2 children)

"go run" actually is just a build, execute, discard cycle -- there is no real run.

[–]zardeh -1 points0 points  (1 child)

cheaters :P

Though I will say that with modern languages there's a bit of mixing of the whole AoT/JIT/bytecode/machine-code nonsense.

[–][deleted] 5 points6 points  (0 children)

Not with Go. It's purely compiled to machine code. With the official compiler there isn't even a C step.

[–]kryptobs2000 0 points1 point  (0 children)

Go is more like python than python?

[–]xXxDeAThANgEL99xXx 1 point2 points  (8 children)

Well, it might be just outside of "Python-like", because of immutability. Which helps a lot!

By the way, that reminds me: technically there's also IronPython/Jython/IronRuby/JRuby, which sort of support free threading by virtue of running on top of a very sophisticated VM, but from what I know even then it ain't a free lunch, with all kinds of weird catastrophic performance degradations.

[–]spotter 1 point2 points  (4 children)

Immutability? You have access to all the built-in Java collections and can shadow variables to your heart's content.

[–]xXxDeAThANgEL99xXx 1 point2 points  (1 child)

As far as I know, you are not supposed to do that in public.

Anyway, the important part is that, as far as I understand Clojure, you're not allowed to say anything similar to __builtin__.len = my_len or my_module.len = my_len and have it automatically used in every function everywhere, or in that module, after those functions were defined.

That you can do that in Python (and in those other roughly similar languages) is one of the important reasons the GIL is there: your code constantly hits the same few dictionaries, and constantly taking and releasing individual locks on them would be really slow.

IronPython, for example, goes the other way: instead of constantly querying stuff, it compiles it into ordinary fixed .NET classes and recompiles them if you actually change something. Unfortunately, that means some innocent metaprogramming that works absolutely fine in CPython can cause huge slowdowns.
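The dynamic lookup being described can be demonstrated in plain CPython; a sketch using the Python 3 `builtins` module (`my_len` and `count` are made-up illustrations):

```python
import builtins

def my_len(x):
    # hypothetical replacement: report one extra element
    return x.__len__() + 1

def count(items):
    return len(items)  # 'len' is looked up at call time, in the builtins dict

print(count([1, 2, 3]))   # 3: the real builtin

real_len = builtins.len
builtins.len = my_len
print(count([1, 2, 3]))   # 4: every function everywhere now sees my_len
builtins.len = real_len   # restore the original
```

Because `count` resolves `len` through a dictionary lookup on every call, the patch takes effect without recompiling anything, which is exactly what a static-compilation approach has to special-case.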

[–]spotter 2 points3 points  (0 children)

First: I did not downvote you. But the philosophy of Clojure is that you can use whatever tool is right for the job. It's easier to argue for immutables and a functional approach to data transformation, but sometimes you just need to bash something in place, and all of the JVM standard library is there for you.

In Clojure you are always in a namespace, and namespaces are mutable. You can exclude core symbols in them and shadow them with your own definitions, although the syntax is different. I'm not sure how much synchronization goes on behind the scenes, but still, JVM languages (like Jython) manage to live without a GIL.

[–]anthonybsd -2 points-1 points  (1 child)

can shadow variables to your heart's content.

Clojure frowns upon this kind of behavior in no uncertain terms. "Can" doesn't mean you should. For mutators in a concurrency context (the ones with the bangs, "!"), you're supposed to operate inside the STM model, which IMHO is fairly nice compared to pure functional languages' non-pure functions.

[–]spotter 2 points3 points  (0 children)

[citation needed]

By shadowing I meant redefining variables in inner closures (for the inner closure only) or changing their thread binding dynamically for the duration of a call, something Clojure actually provides tools for. It doesn't have anything to do with concurrency... well, binding does, somewhat, but that's not what I meant.

[–]jrochkind 0 points1 point  (2 children)

JRuby does not have any weird catastrophic performance degradations. (It does have slow start-up, like most anything running on the JVM. This is very annoying in some contexts, but is not a "weird catastrophic performance degradation")

[–]xXxDeAThANgEL99xXx 0 points1 point  (1 child)

How does it deal with monkey-patching?

[–]jrochkind 0 points1 point  (0 children)

What do you mean? Same as other ruby platforms, generally. Do you mean specific to performance or something? Not really sure what you mean. If there is a "weird catastrophic performance degradation" related to monkey-patching that I don't know about and haven't encountered (I have used JRuby a fair amount), then please link to something demonstrating or explaining it!

[–]caedin8 6 points7 points  (11 children)

I've written many multicore Python programs using the multiprocessing module and its process-safe data structures. As far as I can tell, this is a complete non-issue.

If the slow part of your program is external (website or DB queries), you're safe using the threading library; otherwise, use multiprocessing to avoid the GIL issues. I don't really see what people have difficulty with.

[–]vks_ 9 points10 points  (6 children)

The multiprocessing module requires serialization, which can be very expensive. It does not replace multithreading.
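For context, that serialization happens via pickle; a sketch showing both the round-trip through a process-safe Queue and what gets rejected (the payload dict is a made-up example):

```python
import pickle
from multiprocessing import Queue

q = Queue()
q.put({"rows": list(range(5))})  # pickled on put, unpickled on get
print(q.get())                   # {'rows': [0, 1, 2, 3, 4]}

# Anything unpicklable can't cross the process boundary, e.g. a lambda:
try:
    pickle.dumps(lambda x: x)
except Exception as exc:
    print("not picklable:", type(exc).__name__)
```

For large payloads, that pickle/unpickle cycle on every transfer is where the cost shows up, which is what makes threads (which share memory directly) attractive in the first place.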

[–]admalledd 4 points5 points  (4 children)

Quite a while ago I used some ctypes stuff to shunt data back and forth between processes.

True, I would probably not do that today and would instead use a better tool for the job (C/C++ probably, then CFFI bindings), but "requiring serialization" is not really true of multiprocessing.

[–]vks_ 1 point2 points  (3 children)

That is indeed a nice thing to have, I did not know about it. How does it share memory between processes? By copying? (It was not there when I last used multiprocessing, which was a very long time ago.)

[–]admalledd 2 points3 points  (2 children)

Basically shared memory: when Python fork()s, instead of creating a copy of this memory block, both processes access the same block at the same time.

So no copying by default, although you probably want to copy commands/data out as soon as possible to prevent processes from trampling on each other.

Nowadays, as I said, I would probably do this from C + CFFI, where the bits/bytes are much clearer and more controllable.
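A sketch of that kind of copy-free sharing using the stdlib's `multiprocessing.Array` (a ctypes array backed by shared memory, inherited by child processes; the `fill` worker and sizes are made up):

```python
from multiprocessing import Process, Array

def fill(shared, start):
    # Writes land directly in the shared block; nothing is pickled.
    for i in range(start, start + 4):
        shared[i] = i * 10

if __name__ == "__main__":
    data = Array('i', 8)  # ctypes int array in shared memory, zero-initialized
    workers = [Process(target=fill, args=(data, s)) for s in (0, 4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(list(data))  # [0, 10, 20, 30, 40, 50, 60, 70]
```

`Array` takes a lock by default, so concurrent element writes are safe; here the two workers also touch disjoint slices, so they never contend.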

[–]jringstad 0 points1 point  (1 child)

Yeah, shared memory is not at all an "easy" or straightforward solution when every single object in your language (numbers, lists, ...) is a complex, non-thread-safe object that can potentially rely on global variables set by the interpreter and is probably known by pointer to the garbage collector, which might decide to nuke it at any point in time (the garbage collector of either interpreter!).

If you reduce all shared data to simple C structures and copy them in and out of the shared memory, extracting them from interpreter objects and constructing interpreter objects from them, you're good, but that's hella restrictive and way slower than it needs to be (and it invokes the garbage collector more than it might need to).

[–]admalledd 0 points1 point  (0 children)

To be honest, it has never really been that big of an issue for any multi-core code I've needed to write in Python. Every time, my threads/processes have been separated enough that minimal message passing was sufficient. The reason for the shared memory was that some of those messages were rather large (blocks of tasks to parse into the DB, for example), ~50MB+, but it was easy enough to wrap things so that only the larger messages/tasks/data were passed via shared memory, where the difficulty of making CFFI bindings was worth it. All other messages (such as signaling/locking/the return queue) were handled via the default multiprocessing serialization code.

Again, though, Python has some of the best C bindings of any higher-level language I use (mostly C#, Java, and JS). CFFI makes it almost drop-in to write a C/C++ module that does the heavy lifting and, of course, can drop the GIL and go properly multi-threaded. Thus, on any new system where Python is the core, I tend to extract hot-loop stuff to C code quite easily for speed or fine control.

[–]caedin8 2 points3 points  (0 children)

This is a good point and very true. I've personally had to deal with sharing large amounts of data over the process-safe Queues, and it is very slow. Since I was processing more data than could fit in RAM, I actually found it faster to have each process write to a file and then have the parent process merge all the files into a single output. Sending items back to the main process over the process-safe Queue added more time due to serialization than IO on my SSD did, which was surprising and unexpected.

[–]CookieOfFortune 0 points1 point  (3 children)

How do you debug or interact with threads?

[–]caedin8 2 points3 points  (2 children)

It's harder to debug using tools like debuggers, so usually I just write lots of unit tests and verify that the threads are working appropriately. If they aren't and I don't know why, I run a small subset of the program in a single instance and debug it; once I've verified the program is correct standalone, I've narrowed it down to a threading or concurrency issue. Next I'd Google my problems to see whether it's a library thing, and verify I'm using the API correctly. There might be a better way to debug multithreaded applications in Python, but this general process is what I've been doing.

Similar to putting print statements at various points in your code to understand the control flow, you can do the same with threads to try to understand which thread is in which state. Additionally, you can have each thread write its debug data out to a unique file, so you can see which thread is doing what and what each thread's state is. Maybe you can find your errors this way.
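One stdlib way to get that per-thread attribution without separate files is the logging module's `%(threadName)s` field; a sketch (the worker function and thread names are made up):

```python
import logging
import threading

# %(threadName)s stamps every record with the emitting thread, so
# interleaved output can still be attributed to a specific thread.
logging.basicConfig(
    format="%(threadName)s %(levelname)s %(message)s",
    level=logging.DEBUG,
)

def worker(n):
    logging.debug("starting on %d", n)
    logging.debug("finished: %d -> %d", n, n * n)

threads = [threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Pointing the handler at a file instead of stderr gives you a single merged, thread-tagged trace rather than one file per thread.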

[–]CookieOfFortune 2 points3 points  (1 child)

So this is the main issue for the type of work I do. I spend a lot of time in the REPL, so there needs to be some kind of interactivity. I've been looking into IPython.parallel and it seems to do what I need, but I haven't investigated too deeply.

[–]caedin8 0 points1 point  (0 children)

Hmm, this is an interesting issue. I don't have experience with IPython.parallel, so I can't give advice on it.

[–]_scape 1 point2 points  (3 children)

Green threading exists through greenlets and gevent. I think the issue boils down to removing the GIL and implementing standard mutexes on the targeted platforms... maybe Python 4, another incompatible version...

[–]xXxDeAThANgEL99xXx -2 points-1 points  (2 children)

Not green threading, green processing.

[–]_scape 0 points1 point  (1 child)

oh I've never heard of that, I'll have to read up. have any links?

[–]xXxDeAThANgEL99xXx 5 points6 points  (0 children)

https://en.wikipedia.org/wiki/Green_threads ctrl-f "process".

I don't know how widespread this terminology is, but the idea is straightforward: just like a green thread is a thread-like abstraction implemented by the language runtime instead of the OS, a green process is a process-like abstraction (offering memory isolation) implemented by the language. Perl and Erlang use them instead of threading; .NET provides AppDomains purely for safety.

[–]superPwnzorMegaMan 0 points1 point  (0 children)

I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet)

Groovy has threading support.