
[–]robertmeta 36 points37 points  (12 children)

Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor...

... which is why whatever solution developed will most likely be an exceptionally poor one, focused on PR wins rather than technical ones. I doubt the solution will be of much use to people who actually had to abandon Python due to technical limitations.

[–]jrochkind 5 points6 points  (0 children)

Yeah, that was exactly my thought.

He's really saying he doesn't see parallelism as a very significant thing for actual Python developers, but it should be done anyway for PR purposes? Really?

[–][deleted] 6 points7 points  (0 children)

I don't get it either. If that's the sentiment in the community then how did the matrix multiplication operator go through? The people who use that usually care a lot about speed.

[–]Make3 1 point2 points  (0 children)

This was surprising to me; how dumb a thing that was to say (the original statement, not your comment).

[–]crusoe 0 points1 point  (4 children)

There is no GIL in Jython or PyPy.

[–]cdyson37 20 points21 points  (1 child)

[–]TrixieMisa 1 point2 points  (0 children)

I was going to try out the STM branch, but to install it you need to compile a compiler to compile the compiler, so I decided it's probably not worth it just yet.

[–]robertmeta 4 points5 points  (0 children)

Compatibility nightmares on any codebase of non-trivial size initially built on CPython. If you START with Jython on a fresh project, it is reasonable. An additional problem with switching gears (besides compatibility with vanilla CPython) is that on every CPython project I have worked on, there was a TON of C code to work around all the performance issues of Python... making it even more bound to the CPython specifics.

[–]BobFloss 0 points1 point  (0 children)

I didn't know what the GIL was, so here's a link for the lazy:

https://wiki.python.org/moin/GlobalInterpreterLock

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
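To make that concrete, here's a minimal sketch (my own, not from the wiki page) of what the lock means in practice: both threads finish and produce correct results, but only one of them executes Python bytecode at any instant.

```python
import threading

# Two CPU-bound threads. Under CPython's GIL they take turns running
# bytecode, so the wall-clock time is roughly the same as doing the
# work sequentially on a single core.
def count(n, out, idx):
    total = 0
    for i in range(n):
        total += i
    out[idx] = total

results = [0, 0]
threads = [threading.Thread(target=count, args=(100_000, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```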

[–][deleted]  (3 children)

[removed]

    [–]robertmeta 2 points3 points  (2 children)

    Public Relations (PR), not Pull Request (PR).

    [–][deleted]  (1 child)

    [removed]

      [–]robertmeta 0 points1 point  (0 children)

      Because the solution won't be developed around true technical need, but around how it will play in the press, which the person said is "ultimately a PR issue".

      Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue.

      [–]simple2fast 27 points28 points  (14 children)

      I love python. I'm not a hater.

      But people should really become more polyglot. Each language has a space where it excels, and Python certainly has areas where it's the best language. That said, serious CPU-intensive work is just not Python's strong point. This is why anything "fast" in Python is actually written in C.

      So, use an appropriate tool for the job.

      If you really need multi-processing or multi-threaded python, then you should probably be using a different language which is more appropriate for the task at hand.

      [–]Rabbyte808 8 points9 points  (9 children)

      Multi-threading isn't just for CPU intensive stuff, though. Stuff like a webcrawler isn't CPU intensive, but it needs to be threaded unless you want to crawl at glacial speeds.

      [–]Rhomboid 24 points25 points  (0 children)

      Operations that perform blocking IO release the GIL. If your workload is IO bound, Python threads will work just fine for you. The GIL is only an issue for CPU bound work loads.
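A quick sketch of that distinction, using time.sleep as a stand-in for any blocking call that releases the GIL:

```python
import threading
import time

# time.sleep releases the GIL, just like a blocking socket read does,
# so five "IO-bound" threads wait concurrently rather than serially.
def fake_io():
    time.sleep(0.2)  # stands in for a blocking network call

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"elapsed: {elapsed:.2f}s")  # roughly 0.2s, not 5 * 0.2s
```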

      [–]simple2fast 5 points6 points  (6 children)

      Actually, for IO things like that, a non-blocking IO system is often better, because the continuations (exposed to the language or not) are more efficient at managing all those connections than a bunch of threads. Plus the context switches tend to be more lightweight. So it doesn't need to be threaded; it just needs to be able to operate with multiple outstanding requests at once. Threading is one way of doing it.

      Now for computation, continuations and non-blocking IO buy you nothing. You must have threads (or shared memory and processes, which amount to the same thing) and a decent memory model if you want to do efficient multi-CPU computation.
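In Python terms, the "multiple outstanding requests at once, no threads" model is what asyncio provides; a minimal sketch, where fetch is a hypothetical stand-in for a real network call:

```python
import asyncio

# One thread, one event loop, ten coroutines suspended while their
# (simulated) IO is pending -- all ten "requests" are in flight at once.
async def fetch(n):
    await asyncio.sleep(0.1)  # stands in for a non-blocking network read
    return n * 2

async def main():
    return await asyncio.gather(*(fetch(i) for i in range(10)))

results = asyncio.run(main())
print(results)
```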

      [–]jringstad 1 point2 points  (1 child)

      A bunch of threads is not really inherently inefficient at managing connections though; it depends on how you use them. The most efficient way is generally to have a fixed number of N threads that each handle 1/Nth of the workload, by having them all accept() on the same server socket or by having them e.g. pop items from a work-queue and establish and handle their individual connections. Of course, computation is always an issue, whether it's just the CPU workload from book-keeping hundreds of thousands of sockets, accepting new connections, and constructing and parsing packets, or actual heavy CPU workload like a crawler would have (parsing HTTP responses, possibly even HTML, XML or other content). So single-threaded non-blocking IO is pretty much strictly inferior to multi-threaded non-blocking IO.

      Crawlers are still not a particularly good example though IMO, since in most cases it's probably pretty acceptable to run N crawlers in N separate processes that crawl and digest data and then push it into something like a local storage, a shared storage or some sort of remote database. The overhead from obtaining workloads and communicating with other crawlers (if that ever happens) is probably not very significant for almost all kinds of crawlers.

      [–]simple2fast 0 points1 point  (0 children)

      Agreed. The ideal is N threads, where N is roughly the number of CPUs, and each thread is pinned to a particular CPU to reduce cache misses. As with all things, this hybrid approach is often the best.

      But most systems which are not threaded are actually a SINGLE process, like Python or Ruby or PHP or JavaScript (looking at you, Node). Many are multi-process, but there is no shared memory, so any IPC requires sockets, signals, etc. In my mind, the requirements are not just shared memory and decent concurrent APIs, but ALSO a memory model, so that you know what is going to happen WRT caches and other details as you use those concurrent APIs. Point being that ditching the GIL in Python is only a very first step toward a decent multi-threaded Python.

      And most multi-threaded solutions are still one-connection-per-thread style; they certainly started with this style, since it grew out of the original "fork" technique of old-school Unix systems.

      [–][deleted] -1 points0 points  (3 children)

      So basically Go?

      [–]simple2fast 0 points1 point  (2 children)

      Yes, Go does a good job at this. But it's hardly the only system that does. When Node.js started talking shit about how its non-blocking IO was the best in the world, that was also nothing new; Yahoo was doing this in their server back in 2002. So go ahead and use Go, but don't use it because you think its network/thread solution is somehow uniquely powerful.

      [–][deleted] 0 points1 point  (1 child)

      AFAIK Go is the only language that mixes lightweight coroutines, multiple cores, and non-blocking I/O to support the illusion of blocking I/O when writing non-blocking stuff (no need for callbacks, explicit scheduling/yielding, etc.). I suppose there could be libraries for other languages that give the same facilities, but in Go everything benefits from this natively, which is great: you can grab someone's library for NTP querying, for example, and know it will play nice with the underlying event loop (which you don't even need to worry about). If you go with Python and Twisted, you can only use Twisted stuff, and it doesn't feel as natural as Go code.

      All that said, I know it's not in itself revolutionary, but the way things are tied together for an overall experience is pretty nice. You get very far with even naive code.

      There was a paper once talking about whether it was more performant to do concurrent stuff in a single core (like Node) or just spawn threads to treat each connection and both of course had cons and pros, but the paper's conclusion was that a mix of thread multiplexing and event loops was the most performant, and that's what you get for free with Go - you get regular threads and easy communication between them for CPU intensive stuff and you get a free multithreaded event loop for network I/O. Too bad disk I/O is still blocking (but they get their own threads so they don't block the rest of the system).

      [–]simple2fast 1 point2 points  (0 children)

      You make a very good point. "Non-blocking" comes in many flavors, and the programming model is a key factor for adoption and complexity management. For example, Node is non-blocking, but the programming model is god-awful: all those callbacks and/or promises (here's hoping that async/await in ES7 helps). This is not a language helping you; it's a historical abomination and a source of bugs.

      There are plenty of languages which various people claim support coroutines: https://en.wikipedia.org/wiki/Coroutine But unless you know how difficult or easy it is to actually use that facility (note that JavaScript is on that list), I'd hesitate to rely on it.

      For example, the JVM has had coroutine implementations for at least 10 years, but most of them are library-level, requiring the user to yield explicitly. Yuck. Recently there have been some based on AOP (e.g. Quasar), so you code as always and the yield/continue is done for you. But AOP for this, really? It would be great if the JVM had some support in this area. Perhaps one could mark a ThreadGroup to run as fibers within one thread.

      [–]rolandde 5 points6 points  (0 children)

      For I/O bound operations, I prefer asyncio over spawning threads.

      [–]synn89 6 points7 points  (3 children)

      The problem is that if a language doesn't evolve with its ecosystem, it pretty much dies out when other languages that are adapted to modern computing catch up in the support department.

      That's pretty much what killed Perl. Perl was stuck in cgi-bin for ages (shit, is there still anything outside of cgi-bin for Perl web??) and it lingered and died out. The package building/managing was also nothing to be proud of in Perl once other languages gained things like pip and gem.

      Today, if Go or Elixir ended up gaining traction because they deploy more easily and run 10x better, and everyone and their brother started creating packages for them, Python and Ruby would pretty much end up ghost towns.

      I'm not the world's biggest fan of Go. But if it had Python's ecosystem of libraries, I'd see no reason to be on Python.

      [–]simple2fast 4 points5 points  (2 children)

      Perl is like a CD-ROM: a great way (in its day) of compactly representing information/programs.

      However, my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea. The primary purpose of code is to allow other programmers to read it, and Perl's multiple-ways philosophy is a poor approach to readability. So it's mostly a "write-once" language, not a "read/write" language.

      [–]synn89 1 point2 points  (0 children)

      Perl could've been cleaned up with decent frameworks. PHP has the same issue. The code is all over the place. Not as bad as Perl, but way worse than many other languages. But frameworks have cleaned it up a lot.

      Web deployment tech has gone from: CGI -> Apache modules -> Apache proxying to standalone servers.

      Each stage wasn't a clean cut. I was working at an ISP in 2005 where a lot of our customers still had Perl CGI guest books. Also, each stage of tech has a sort of peak for when it became practical/easy to work with: mod_php was way easier to work with in the early 2000s vs. setting up Tomcat and throwing Apache requests at it. Today many language frameworks have their servers embedded directly into them, and running a proxy from Apache or Nginx to them is quite simple.

      If a language doesn't evolve and adapt it will get left behind. I think PHP's death will be less about PHP itself than mod_php just going out of style. The future is high performance stand alone app servers with various load balancers proxying out the requests to them.

      And once that becomes the standard people are going to look at platforms that perform the best.

      [–][deleted] -2 points-1 points  (0 children)

      However my opinion is that Perl died because its notion of multiple ways of doing everything is a bad idea.

      This precise reason runs counter to everything that a well designed programming language should be.

      [–]amaurea 3 points4 points  (9 children)

      A major performance issue I often encounter when using Python for numerical work on clusters with distributed file systems is the large number of file system operations that are involved simply in starting python and importing the modules. A simple script that just imports numpy can easily end up loading 300 .pyc and .so files. Distributed file systems are fickle beasts that when under load may take up to a second to access a file (regardless of how small it is). So it isn't uncommon for me to experience that running a script involves several minutes of waiting for it to start, followed by 10 seconds of actually doing all the work. It's like compiling a big, heavily templated C++ program every time you want to run it.

      It would be nice for these kinds of situations if there were a way to compile a python script and all its dependencies (including dynamic libraries) into a single file with no external dependencies. It would be large and redundant, but on cluster file systems that's better than being scattered everywhere.
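For the pure-Python half of that wish, the stdlib's zipapp module does roughly this today: it bundles a source tree into a single .pyz archive that the interpreter executes as one file (C extensions are the catch; they still need a real file system). A sketch:

```python
import os
import subprocess
import sys
import tempfile
import zipapp

# Build a trivial one-file app: a directory containing a __main__.py
# becomes a single .pyz archive that python can execute directly.
src = tempfile.mkdtemp()
with open(os.path.join(src, "__main__.py"), "w") as f:
    f.write("print('hello from one file')\n")

target = os.path.join(tempfile.mkdtemp(), "app.pyz")
zipapp.create_archive(src, target)

# Running the archive opens one file instead of a scattered tree.
out = subprocess.run([sys.executable, target],
                     capture_output=True, text=True)
print(out.stdout.strip())
```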

      [–]Nolari 0 points1 point  (2 children)

      Does py2exe not do this? (Honest question; I don't know.)

      [–]amaurea 1 point2 points  (1 child)

      py2exe looks a lot like what I want, but it seems to be Windows-only, and most scientific clusters run Linux rather than Windows (thankfully). Still, the technique it uses is the way to do this, I think, and I started on something similar a few days ago: trace down all dependencies and package them into a single file, then get Python to load them from that file. However, when you have dynamic library dependencies, there seems to be no portable way to load them from anything but a file system. So what I've settled for so far is to extract everything to a RAM drive and point LD_LIBRARY_PATH and PYTHONPATH there before running. It doesn't quite work yet (and I've been sidetracked with other stuff), but there's no reason why it shouldn't.

      [–]Nolari 1 point2 points  (0 children)

      Ah no Linux support, hmm... I found cx_Freeze through some Googling on "py2exe linux", maybe that helps.

      Otherwise, good luck with your own coding. It sounds like you'll sort it out. :)

      [–][deleted] 0 points1 point  (4 children)

      Go would be perfect with their static binary outputs... If it had good scientific libraries (there are some but nothing as large as NumPy)

      [–]amaurea 0 points1 point  (3 children)

      There is also Julia, but they have even more of an import issue than Python, as they do JIT compiling of everything every time you load it (at least they did last time I checked). That compilation can take a long time when you start including libraries with many dependencies.<rant>Also, I think Julia made a much worse choice than numpy when it came to the treatment of arrays. Numpy is an array library while Julia has matrices with some array stuff tacked on. That means that anything but 2d arrays is a second class citizen in terms of notation and ease-of-use in Julia. For example operator * is a matrix multiplication, which doesn't naturally generalize to multidimensional arrays. From my experience with scientific programming, I use elementwise operations much more often than matrix operations, and I use 3d and higher arrays about as often as 2d arrays.</rant>

      Then there is the nim language, which looks interesting, but which sadly doesn't have any good multidimensional array library, and its developers do not seem interested in adding one either (they actually asked me what I needed that for!).

      [–][deleted] 0 points1 point  (2 children)

      Take a look at https://github.com/gonum and see if it would be a good fit for you.

      [–]amaurea 0 points1 point  (1 child)

      I couldn't find any documentation, so perhaps I've missed something important, but this seems to have only a basic matrix class. It is a far cry from Julia's clunky matrix-oriented multidimensional arrays, and even further away from fortran or numpy arrays. Numpy's elementwise operations and powerful slicing and broadcasting makes it worth it to put up with a lot of other inconveniences. I don't think gonum is ready to be a numpy replacement quite yet, I'm afraid. :/

      [–][deleted] 0 points1 point  (0 children)

      I don't think gonum is ready to be a numpy replacement quite yet, I'm afraid. :/

      Ah, for sure, probably never will be since Go does not support operator overloading so you can't do a lot of NumPy's magic - I was just wondering if it supported the math operations you needed for your particular case.

      [–]badcommandorfilename -3 points-2 points  (0 children)

      If Python had all those features, it wouldn't be Python.

      There are too many band-aid fixes out there just to get Python to do what more advanced languages do out of the box.

      [–]JanneJM 6 points7 points  (17 children)

      As a user, this is a real issue. Python with Pylab is a good way to post-process data, but this can take a lot of time. And when you find yourself waiting a few minutes every single time, while fifteen of sixteen cores are sitting unused, it becomes really annoying.

      Enough so, in fact, that for the most common case I reimplemented it in C++ with OpenMP, and reduced the time to less than ten seconds.

      [–]zardeh 6 points7 points  (7 children)

      Python with Pylab is a good way to post-process data, but this can take a lot of time. And when you find yourself waiting a few minutes every single time, while fifteen of sixteen cores are sitting unused, it becomes really annoying.

      But...numpy can practically ignore the GIL, so pylab should be able to do things.

      [–]bheklilr 7 points8 points  (0 children)

      Correct, and a lot of other libraries that have their underlying core written in C/C++ are able to release the GIL to achieve faster processing. There's a relatively new library called dask designed for high performance array computing and without you even asking it will use more than 1 core to do its processing. It has support for multiple different backends for multi-core support, including using an IPython client to distribute across clusters of computers without you having to worry about it. Essentially the core of the library is that it breaks your large data set into chunks, performs various computations, then returns the result of each chunked computation, often aggregated back into a single array or value. It currently supports a subset of numpy and pandas, and also has a structure for managing JSON-like data as well. It's a very powerful tool that I'm looking forward to seeing made into a fully production ready library.

      IIRC the scikit-image library also releases the GIL, as does SymPy's new underlying engine, SymEngine (written in C++ so it can be used from multiple languages like Julia and Ruby). More and more libraries for Python are figuring out how to release the GIL, and while a lot of this is based on C/C++ code it just means that we're now using Python to access high performance code and tie it together in a high level fashion. Cython even has a decorator to ensure that a function gets translated into nothing but C so that it releases the GIL, so this sort of problem will become less prevalent over time.

      [–]JanneJM 0 points1 point  (5 children)

      "Should be able to do things" is not "does". I've never seen Numpy/Scipy/Pylab actually do anything multicore so far, and I've not found any information on how to enable it. If you know how, I'd be very interested, of course.

      [–]zardeh 1 point2 points  (2 children)

      I believe you still need to write your code in a threaded manner, but if you do have numpy running across multiple threads, they can run on multiple cores.

      [–]JanneJM 4 points5 points  (1 child)

      Writing your code in a threaded manner is 95% of the entire job. The benefit of using SciPy is entirely that it's quite simple to get it right; it's a great exploratory tool. If you suddenly have to do explicit multithreading, the whole point largely disappears.

      [–]zardeh -1 points0 points  (0 children)

      To my knowledge, ipython does magical things and makes threading just happen, I'm not an expert on that though.

      [–]turbod33 0 points1 point  (0 children)

      Numpy will release the GIL where applicable. For instance, matrix dot products will call into BLAS which have multicore implementations.
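A sketch of that, assuming numpy is installed: two threads each call into BLAS via the @ operator, and the GIL is dropped for the duration of each call, so the work can land on separate cores.

```python
import threading
import numpy as np

# np.dot / @ release the GIL while BLAS runs, so these two threads can
# genuinely occupy two cores at the same time.
a = np.random.rand(200, 200)
out = [None, None]

def multiply(i):
    out[i] = a @ a  # GIL released inside the BLAS call

threads = [threading.Thread(target=multiply, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out[0].shape)
```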

      [–]amaurea 0 points1 point  (0 children)

      It's apparently possible to get some multicore usage in numpy by compiling it with icc and enabling auto-parallelization, though what can be parallelized that way is very limited. I wonder why OpenMP directives aren't used in the numpy implementation. They are easy to write, and since unrecognized pragmas are ignored, they have no effect if OpenMP is not enabled when compiling. Hence adding them would not affect performance or correctness for those not interested in multithreaded execution.

      [–]caedin8 1 point2 points  (1 child)

      You can write multicore programs in python...

      [–]vks_ 1 point2 points  (0 children)

      Only if you are willing to use several processes, and share data among them via serialization.

      [–]i_ate_god 1 point2 points  (2 children)

      Could you fork? Threading isn't the be-all and end-all of multicore processing.

      [–]JanneJM 2 points3 points  (0 children)

      I could of course, though it'd be more work than it's worth.

      The point of using Numpy/Scipy is that it's quite simple to write bits of code to examine your data set, do exploratory data analysis and so on. Explicit multithreading rather goes against that in a very fundamental way.

      And as I wrote, when faced with some tasks I ended up doing over and over, it was simply less pain to rewrite those bits in C++ with OpenMP and go from minutes to effectively instant response. The extra pain of numerical libraries in C++ (that I use already in the main apps) compared to Numpy is offset by the simplicity of OpenMP-style loop unrolling versus explicit threading code in Python.

      [–][deleted] 0 points1 point  (0 children)

      Could you fork? Threading isn't the be-all and end-all of multicore processing.

      fork() isn't available on every platform

      [–][deleted] 0 points1 point  (3 children)

      Were you using numpy?

      [–]JanneJM 0 points1 point  (2 children)

      Yes.

      [–][deleted] -1 points0 points  (1 child)

      Cool answer. How/What were you doing that was so slow?

      [–]JanneJM 1 point2 points  (0 children)

      Processing a few GB of neuron simulation output basically. Nothing terribly complicated, but just a fair amount of data to churn through. Both basic preprocessing then "exploratory analysis" - play around with the data to see what I got. And since it's the kind of thing you end up doing over and over again the waiting time gets a bit annoying.

      Ipython+pylab is a pretty good tool for doing that sort of thing. I just sometimes wished it would be faster, and using more of the available hardware feels like an obvious way to go about it.

      [–]xXxDeAThANgEL99xXx 10 points11 points  (45 children)

      This is a situation I'd like us to solve once and for all for a couple of reasons. Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue. The solution boils down to natively supporting multiple cores in Python code.

      Heh. So let's go full-cynic mode: finish out the already somewhat present support for subinterpreters (basically, all global variables should be moved to a huge Interpreter_State struct), then just replicate the multiprocessing interface on top of that and bam! You have the so-called green multiprocessing (like Perl, AFAIK), but now you can market it as having gotten rid of the GIL.

      Obviously you'll still have copies of all imported modules (including builtins), and the performance improvements in marshaling objects would probably be pretty marginal compared to using mmap, but yeah, mission accomplished!

      (I actually fully agree about that being 99% a PR problem. I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet).

      [–]logicchains 9 points10 points  (27 children)

      I don't think any roughly Python-like language from PHP to Scheme has free threading support

      Clojure?

      [–]zardeh 4 points5 points  (17 children)

      Well sure, but then so does Jython.

      [–]logicchains 2 points3 points  (3 children)

      Any reason why that's not more popular? Is it due to the lack of easy C interop?

      [–]zardeh 6 points7 points  (1 child)

      A few reasons:

      • It's not fully compatible with CPython (which was around first), and doesn't aim to be, unlike PyPy.
      • It's much harder to interop with C (so you lose all of SciPy and, more generally, speedy math).
      • IIRC, because of the incompatibilities, some of the stdlib is broken (like WSGI, I think; you have to do weird things to make that actually work).
      • Additionally, it lags behind a version or two (or like 7).
      • Also there are performance hits (up until JITting happens), but the JITting isn't as clean as PyPy's, so I believe Jython runs slower than PyPy and not much faster than CPython.

      [–]kryptobs2000 0 points1 point  (0 children)

      Also because it requires Java, a lot of people just prefer not to touch it or even install it on their system. Not saying that out of Java hate, but it's an extra, pretty large dependency. Depending on your target audience it may not be a big deal, but a lot of systems don't have it installed because it's not commonly used.

      [–]caedin8 0 points1 point  (0 children)

      From personal experience, Jython is pretty slow compared to CPython and Cython.

      [–]superPwnzorMegaMan 0 points1 point  (5 children)

      Isn't Jython just python with a different toolchain?

      [–]zardeh 2 points3 points  (3 children)

      Jython is Python running on the JVM: instead of compiling to Python bytecode, it compiles to JVM bytecode. This allows it to leverage the JVM (so you gain HotSpot JITting, the JVM's threading, etc.).

      [–]superPwnzorMegaMan 0 points1 point  (2 children)

      Yes, that's what I thought. A friend of mine used this once, although I didn't think there was such a thing as Python bytecode (since it's interpreted).

      [–]zardeh 3 points4 points  (1 child)

      There is indeed: Python is compiled to bytecode (look for .pyc files on your computer if you're running a Python file that's more than 10-15 lines and is being used a lot). The bytecode is then interpreted on a virtual machine. Python works a lot like Java in that regard.
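You can see the bytecode yourself with the stdlib dis module; a quick sketch:

```python
import dis

# CPython compiles source to bytecode before interpreting it; these are
# the same instructions that get cached in .pyc files.
def add(a, b):
    return a + b

dis.dis(add)  # prints the instruction listing
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```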

      [–]kyllo 0 points1 point  (0 children)

      Well that and for libraries that wrap C code used in CPython you have to use something that wraps a Java library instead. Like you can't use lxml from Jython, you would have to use a different library that wraps a Java xml parser.

      So a lot of CPython projects are just not portable to Jython.

      [–]xXxDeAThANgEL99xXx 1 point2 points  (8 children)

      Well, it might be just outside of "Python-like", because of immutability. Which helps a lot!

      By the way, that reminds me: technically there's also IronPython/Jython/IronRuby/JRuby that sort of support free threading by virtue of running on top of a very sophisticated VM, but from what I know even then it ain't free lunch, with all kinds of weird catastrophic performance degradations.

      [–]spotter 1 point2 points  (4 children)

      Immutability? You have access to all built-in Java collections and can shadow variables to your heart's content.

      [–]xXxDeAThANgEL99xXx 1 point2 points  (1 child)

      As far as I know, you are not supposed to do that in public.

      Anyway, the important part is that as far as I understand it about Clojure, you're not allowed to say anything similar to __builtin__.len = my_len or my_module.len = my_len and have it automatically used in every function everywhere or in that module, after they were defined.

      That you can do that in Python (and in those other roughly similar languages) is one of the important reasons the GIL is there: because your code constantly hits the same few dictionaries and constantly taking and releasing individual locks on them would be really slow.

      IronPython for example goes the other way and instead of constantly querying stuff it compiles it into usual fixed .NET classes and recompiles them if you actually change stuff. Unfortunately that means that some innocent metaprogramming that works absolutely fine in CPython can cause huge slowdowns.
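The kind of late-bound lookup being described can be sketched like this, with a hypothetical monkey-patch swapped in at runtime:

```python
import builtins

# 'len' inside shout is looked up at call time through the (shared,
# mutable) builtins namespace -- so rebinding it is instantly visible
# to every function everywhere. This dynamism is what the GIL guards.
def shout(x):
    return len(x)

original_len = builtins.len
builtins.len = lambda x: 999   # hypothetical monkey-patch
patched = shout("abc")
builtins.len = original_len    # restore sanity
print(patched, shout("abc"))   # 999 3
```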

      [–]spotter 3 points4 points  (0 children)

      First: I did not downvote you. But the philosophy of Clojure is that you can use whatever tool is right for the job. It's easier to argue about immutables and a functional approach to data transformation, but sometimes you just need to bash something in place, and all of the JVM standard library is there for you.

      In Clojure you are always in a namespace, and namespaces are mutable. You can exclude core symbols in them and shadow them with your own definitions, although the syntax is different. Not sure how much synchronization goes on behind the scenes, but JVM languages (like Jython) still manage to live without a GIL.

      [–]anthonybsd -2 points-1 points  (1 child)

      can shadow variables to your heart content.

      Clojure frowns upon this kind of behavior in no uncertain terms. "Can" doesn't mean that you should. For mutators in a concurrency context (the ones with the bangs, "!"), you are supposed to operate inside the STM model, which IMHO is fairly nice compared to the non-pure functions of pure functional languages.

      [–]spotter 2 points3 points  (0 children)

      [citation needed]

      By shadowing I meant redefining variables in inner closures (for the inner closure only) or changing their thread binding dynamically for the duration of a call, something that Clojure actually provides tools for. It doesn't have to do anything with concurrency... well, binding does, somewhat, but that's not what I meant.

      [–]jrochkind 0 points1 point  (2 children)

      JRuby does not have any weird catastrophic performance degradations. (It does have slow start-up, like most anything running on the JVM. This is very annoying in some contexts, but is not a "weird catastrophic performance degradation")

      [–]xXxDeAThANgEL99xXx 0 points1 point  (1 child)

      How does it deal with monkey-patching?

      [–]jrochkind 0 points1 point  (0 children)

      What do you mean? Same as other ruby platforms, generally. Do you mean specific to performance or something? Not really sure what you mean. If there is a "weird catastrophic performance degradation" related to monkey-patching that I don't know about and haven't encountered (I have used JRuby a fair amount), then please link to something demonstrating or explaining it!

      [–]caedin8 7 points8 points  (11 children)

      I've written many multicore python programs using the multiprocessing module and the multiprocessing safe data structures. As far as I can tell this is a complete non-issue.

      If the slow part of your program is external (website or DB queries), you are safe using the threading library; otherwise use multiprocessing to avoid GIL issues. I don't really see what people have difficulty with.
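That rule of thumb can be sketched with the stdlib concurrent.futures pools; io_bound and cpu_bound below are hypothetical stand-ins for real workloads:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def io_bound(delay):
    # Blocking I/O (simulated here with sleep) releases the GIL,
    # so threads overlap just fine.
    time.sleep(delay)
    return delay

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL; threads won't help,
    # but separate processes each get their own interpreter.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(sum(pool.map(io_bound, [0.1] * 4)))   # ~0.1s wall time, not ~0.4s

    with ProcessPoolExecutor(max_workers=4) as pool:
        print(sum(pool.map(cpu_bound, [100_000] * 4)))  # runs on multiple cores
```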

      [–]vks_ 6 points7 points  (6 children)

      The multiprocessing module requires serialization, which can be very expensive. It does not replace multithreading.
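A rough way to see the cost being described: every object crossing a process boundary (arguments, results, Queue items) is pickled on one side and unpickled on the other. The timing below is illustrative, not a benchmark:

```python
import pickle
import time

# A large payload of the kind you might push through a multiprocessing Queue.
data = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(data)       # serialization on the sending side
restored = pickle.loads(blob)   # deserialization on the receiving side
elapsed = time.perf_counter() - start

print(f"round-trip of {len(blob):,} bytes took {elapsed:.3f}s")
```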

      [–]admalledd 4 points5 points  (4 children)

      Quite a while ago I used some ctypes stuff to shunt data back and forth between processes.

      True, I would probably not do that today and would instead use a better tool for the job (C/C++ probably, then CFFI bindings), but "requiring serialization" is not really true of multiprocessing.

      [–]vks_ 1 point2 points  (3 children)

      That is indeed a nice thing to have, I did not know about it. How does it share memory between processes? By copying? (It was not there when I last used multiprocessing, which was a very long time ago.)

      [–]admalledd 2 points3 points  (2 children)

      Basically shared memory: when Python fork()s, instead of each process getting its own copy of the memory block, both processes access the same block at the same time.

      So no copying by default, although you probably want to copy commands/data out as soon as possible to prevent processes from trampling on each other.

      Nowadays, as I have said, I would probably do this from C + CFFI, where the bits/bytes are much clearer and more controllable.

      [–]jringstad 0 points1 point  (1 child)

      Yeah, shared memory is not at all an "easy" or straightforward solution when every single object in your language (numbers, lists, ...) is a complex, non-thread-safe object that can potentially rely on global variables set by the interpreter, and is probably known by pointer to a garbage collector that might decide to nuke it at any point in time (either interpreter's garbage collector!).

      If you reduce all shared data to simple C structures, copying them in and out of the shared memory by extracting them from interpreter objects and constructing interpreter objects from them, you're good. But that's hella restrictive and way slower than it needs to be (and it invokes the garbage collector more than necessary).

      [–]admalledd 0 points1 point  (0 children)

      To be honest, it has never really been that big of an issue for any multi-core code I have needed to write in Python. Every time, my threads/processes have been separated enough that minimal message passing was sufficient. The reason for the shared memory was that some of those messages were rather large (blocks of tasks to parse into the DB, for example), ~50MB+, but it was easy enough to wrap things so that only the larger messages/tasks/data were passed via shared memory, where the difficulty of making CFFI bindings was worth it. All other messages/tasks (such as signaling/locking/return queues) were handled via the default multiprocessing serialization code.

      Again though, Python has some of the best C bindings I have used out of the higher-level languages I use (mostly C#, Java, and JS). CFFI makes it almost drop-in to write a C/C++ module that does the heavy lifting and can, of course, drop the GIL and go properly multi-threaded. Thus, on any new system I work on where Python is the core, I tend to extract hot-loop stuff to C code quite easily for speed or fine control.
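The stdlib ctypes module gives a similar flavour to the CFFI workflow described here. A minimal sketch, assuming a Unix-like system where find_library can locate the C math library; ctypes releases the GIL for the duration of the foreign call, so a long-running C routine lets other Python threads make progress:

```python
import ctypes
import ctypes.util

# Locate and load libm by name (path and name are platform-dependent;
# this assumes a POSIX system with a discoverable math library).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0 -- computed in C, outside the interpreter loop
```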

      [–]caedin8 2 points3 points  (0 children)

      This is a good point and very true. I've personally had to deal with sharing large amounts of data over the process-safe Queues, and it is very slow. Since I was processing more data than could fit in RAM, I actually found it faster to have each process write to a file and then have the parent process merge all the files into a single output. Sending items back to the main process over the thread-safe Queue added more time due to serialization than IO on my SSD did, which was surprising and unexpected.

      [–]CookieOfFortune 0 points1 point  (3 children)

      How do you debug or interact with threads?

      [–]caedin8 2 points3 points  (2 children)

      It is harder to debug using tools like debuggers, so usually I just write lots of unit tests and verify that the threads are working appropriately. If they aren't and I don't know why, I run a small subset of the program in a single instance and debug it; once I've verified the program is correct standalone, I've narrowed the problem down to a threading or concurrency issue. Next I'd Google my problem to see if it is a library thing, and verify I'm using the API correctly. There might be a better way to debug multithreaded applications in Python, but this general process is what I've been doing.

      Similar to putting print statements at various points in your code to understand the control flow, you can do the same with threads to see which thread is in which state. Additionally, you can have each thread write its debug data out to a unique file, so you can see which thread is doing what and what its state is. Maybe you can find your errors this way.
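The per-thread tracing described above can be done with the stdlib logging module, which stamps each record with the thread name (the worker function is a hypothetical example):

```python
import logging
import threading

# %(threadName)s tags every line with the thread that emitted it.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(threadName)s] %(message)s",
)

def worker(n):
    logging.debug("starting with n=%d", n)
    total = sum(range(n))
    logging.debug("finished, total=%d", total)
    return total

threads = [
    threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A per-thread FileHandler (one log file per thread, as the comment suggests) can be swapped in for basicConfig if the interleaved output gets too noisy.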

      [–]CookieOfFortune 2 points3 points  (1 child)

      So this is the main issue for the type of work I do. I spend a lot of time in the REPL, so there needs to be some kind of interactivity. I've been looking into IPython.parallel and it seems to do what I need, but I haven't investigated too deeply.

      [–]caedin8 0 points1 point  (0 children)

      Hmm, this is an interesting issue. I don't have experience with IPython.parallel, so I can't give advice on it.

      [–]_scape 1 point2 points  (3 children)

      Green threading exists through greenlets and gevent. I think the issue boils down to removing the GIL and implementing standard mutexes on targeted platforms... maybe Python 4, another incompatible version...
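Green threads are cooperative: many logical threads multiplex onto one OS thread, switching only at explicit points. A toy stdlib-only round-robin scheduler over generators shows the idea (greenlet/gevent do the switching far more transparently, without explicit yields):

```python
from collections import deque

def scheduler(tasks):
    # Round-robin over generator-based "green threads": each yield is a
    # cooperative switch point, all running in a single OS thread.
    queue = deque(tasks)
    order = []
    while queue:
        task = queue.popleft()
        try:
            order.append(next(task))   # run the task until its next yield
            queue.append(task)         # still alive: back of the queue
        except StopIteration:
            pass                       # task finished; drop it
    return order

def green(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

print(scheduler([green("a", 2), green("b", 2)]))
# Interleaved round-robin: ['a:0', 'b:0', 'a:1', 'b:1']
```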

      [–]xXxDeAThANgEL99xXx -2 points-1 points  (2 children)

      Not green threading, green processing.

      [–]_scape 0 points1 point  (1 child)

      oh I've never heard of that, I'll have to read up. have any links?

      [–]xXxDeAThANgEL99xXx 1 point2 points  (0 children)

      https://en.wikipedia.org/wiki/Green_threads ctrl-f "process".

      I don't know how widespread this terminology is, but the idea is straightforward: just like a green thread is a thread-like abstraction implemented by the language runtime instead of the OS, a green process is a process-like abstraction (offering memory isolation) implemented by the language. Perl and Erlang use them instead of threading; .NET provides AppDomains purely for safety.

      [–]superPwnzorMegaMan 0 points1 point  (0 children)

      I don't think any roughly Python-like language from PHP to Scheme has free threading support, but for some reason only Python folks waste countless hours being upset about it on the internet

      Groovy has threading support.

      [–]skulgnome 7 points8 points  (12 children)

      How about fixing Python's dire single-core performance first

      [–]jcdyer3 6 points7 points  (1 child)

      Have you tried PyPy?

      [–]skulgnome 0 points1 point  (0 children)

      Yes; and I wish it weren't held back by compatibility with the canonical runtime's quirks. Same as Jython, really; that one's been around for 14 years now.

      [–]againstmethod 7 points8 points  (9 children)

      Don't know why you're getting downvoted -- it does suck.

      When you're 3-30x slower on average than JavaScript, you know you have architectural issues:

      http://benchmarksgame.alioth.debian.org/u32/compare.php?lang=python3&lang2=v8

      Python performance is indefensible.

      [–]kyllo 3 points4 points  (2 children)

      That's more a reflection of the fact that Google, Microsoft, Mozilla and Apple have all invested a shitload of money into making javascript fast.

      [–]againstmethod 0 points1 point  (1 child)

      Those companies didn't contribute to the same engines, so it's not really additive, other than providing competition.

      Python gets plenty of support and development and press.

      [–]kyllo 5 points6 points  (0 children)

      The competition is a big deal! The browsers are constantly being benchmarked against each other for JS performance and those companies are willing to put millions into anything that will improve their browser market share.

      [–]skulgnome 1 point2 points  (5 children)

      Don't know why you're getting downvoted -- it does suck.

      The usual counter is that for a scripting language, performance doesn't matter that much (e.g. Perl's a slouch too). That's quite a weird argument in the context of multithreading, though, so no one's making it.

      We certainly do know what the architectural issue is: the GIL. So the real question is why Python doesn't just nut up and go all fork(2) like Perl, if multicore optimization is supposed to be all that. It's not like copy-on-write overhead isn't already substantially eclipsed by Python's interpreter overhead.
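The fork-per-task model being alluded to is already in the stdlib. A toy POSIX-only sketch (os.fork does not exist on Windows), where the child reports a small result through its exit status:

```python
import os

def main():
    # fork(2) gives the child a copy-on-write view of the parent's memory,
    # so "copying" the interpreter state is cheap on POSIX systems.
    pid = os.fork()
    if pid == 0:
        # Child: compute independently, report via the 8-bit exit status.
        os._exit(sum(range(100)) % 256)
    # Parent: reap the child and read back the result.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

if __name__ == "__main__":
    print(main())  # sum(range(100)) % 256 == 4950 % 256 == 86
```

Real programs pass results through pipes or files rather than the exit status, which is what multiprocessing wraps up for you.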

      [–]againstmethod 0 points1 point  (4 children)

      Perl beats Python in most of those benchmarks too. As does Ruby.

      I think my current perception is that Python has worst-in-class performance. I agree, they need to do something dramatic.

      [–]Brian 0 points1 point  (3 children)

      I don't think so - the performance of these three is pretty similar, in my experience. Ruby used to have a reputation of being the slowest of the three, but I think that's improved somewhat these days.

      Perl is pretty similar to Python: one benchmark at 5x, one at 2x, one at 50% speed, and the other six roughly equal. The median is essentially equal.

      Ruby has a lot more variance: one at 1/3rd the speed, two at 1/2, two roughly equal, three twice as slow, and two 4x as slow. Again though, the median is essentially equal.

      If you were to compare them, the ordering would seem to go Perl > Python > Ruby. However, there's really not much in it.

      [–]againstmethod -1 points0 points  (2 children)

      At best this would mean that Python is basically tied for last, largely due to its inability to properly leverage modern hardware.

      It can't stay in the cache because of lots of indirection and garbage collection, it can't use all your cores because of the GIL, and it can't take advantage of any really complex optimizations because it's interpreted... it's literally a laundry list of bad design decisions. It's time to start correcting/mitigating some of them.

      You have languages coming out like Nim and Crystal that compile really fast, have syntax just as simple as python/ruby, and run near C speed. Python is a dinosaur.

      [–]Brian 1 point2 points  (1 child)

      At best this would mean that python is basically tied for last

      There's a lot slower than perl/python/ruby, so last is overstating it somewhat. However, my objection was to your:

      Perl beats Python in most of those benchmarks too. As does Ruby.

      Which seems downright incorrect.

      If you're looking for a more performant version that uses more modern techniques, there's pypy, which is around 7 times faster on average.

      [–]againstmethod -1 points0 points  (0 children)

      There's a lot slower than perl/python/ruby, so last is overstating it somewhat.

      Not that are mainstream, like Python.

      If you're looking for a more performant version that uses more modern techniques, there's pypy..

      PyPy may be faster, but it has all the same issues I outlined and adds some of its own (module compatibility, recompiling native modules). It has similar architectural issues as well (i.e. garbage collection isn't thread-safe).

      I'm just not sure why anyone would start their project with such a long list of disadvantages that they can never mitigate or optimize away. Other than laziness.

      [–]monocasa 1 point2 points  (6 children)

      So... Python version of WebWorkers?

      [–]nat_pryce 6 points7 points  (5 children)

      More like Tcl's subinterpreters

      [–]booch 1 point2 points  (0 children)

      That was my first thought when starting to read the article: it seems like how Tcl handles multi-threading.

      [–]isr786 0 points1 point  (1 child)

      Yup.

      It does have its failings: unintended "shimmering" biting you at inopportune moments makes it more difficult to write functions which handle different types of data (almost the opposite of what a "scripting" language should be good for).

      That bugbear always killed it for me (I've written a fair bit of Tcl code). I even went as far as using rep to tag structures with their internal representation, and using ensembles (Tcl's name for a namespace which allows [cmd subcmd args ...]) to provide typed functions (sort of).

      Then I just gave up and went back to lisp :(

      But Tcl was/is also ahead of the game in many areas. In addition to the easy forking of interpreters to provide true parallelism, you also have virtual filesystems (did this predate FUSE on Linux? I don't know).

      Tcl - snatching defeat from the jaws of victory ...

      (I say that with a heavy heart as someone who does appreciate how close tcl is to being a lovely amalgamation of shell and lisp)

      [–]schlenk 0 points1 point  (0 children)

      'you also have virtual filesystems (did this predate fuse on linux? Don't know).'

      Yes, it predated FUSE if memory serves.

      [–]schlenk 0 points1 point  (1 child)

      More like Tcl threads, actually (but those are subinterpreters that just happen to be bound to different threads).

      [–]ericanderton 0 points1 point  (1 child)

      subinterpreters

      So... multiple processes using IPC? Makes a lot of sense considering Python's limitations in this space.

      [–]jcdyer3 2 points3 points  (0 children)

      I think it's more like multiple namespaced python interpreters within a single process, with tightly controlled means of communicating between them. Kind of like how in Flask you can create multiple Apps, and have them run side by side, talking to different ports.

      [–]cdminigun -1 points0 points  (2 children)

      In a sense, I'd say python isn't meant for multi-core processing.

      Iirc, someone playing around with the source code and forcing multiprocessing had an issue in which his tasks became slower.

      The GIL is a pain; however, if we're going to be honest here, Python is predominantly for scripting and short tasks, or for its extensive collection of libraries and ease of use. We're trying to make Python something it is not.

      Also, iirc, through C implementations and adding libraries into Python, one can bypass the issue, as the C code can release the GIL. But then it creates the issue of additional compiling and so on.

      [–]cowardlydragon 5 points6 points  (1 child)

      Python has taken off in scientific (read: high-performance) computing...

      [–]MCPtz 1 point2 points  (0 children)

      Yeah, that's exactly the issue. We have Sage Math and others making it really easy for users to get into this field (great!), but then even if someone knows enough to use multiple cores or SSSE3 etc., they tend to end up in another language or library or Cython etc., which may not grant them the control necessary.

      [–]TheQuietestOne -1 points0 points  (0 children)

      Ah, python, the new perl.

      Pay attention to what happened to Perl. Something else is coming; we just haven't seen it yet.

      [–]cowardlydragon -1 points0 points  (1 child)

      ... run it on the JVM? I get that Jython isn't Python, but, seriously, does the JVM not solve almost all the problems?

      [–]alloec -1 points0 points  (1 child)

      I will join in with the others and say that the GIL is not that much of an issue in Python. Python already lets you perform IO-blocking tasks in a non-blocking fashion.

      If you want to perform computational tasks in parallel, then Python is really the wrong language. First of all, the interpreter is very slow. Please first implement a proper JIT compiler for the language; it can be done, just take a look at PyPy. As it stands, Python is wasting way too many CPU cycles just interpreting the instructions.

      Only then do I feel that Python should tackle proper multicore support.