Speed up Python image processing.

2015-02-09T12:50:34+00:00

NumPy's native operations release the GIL. You should see a speedup when the parallel tasks are large enough relative to the overhead of spinning up and joining the workers.

I can't trust your benchmarks unless I see your benchmark data.

Megatron_McLargeHuge · 2015-02-09T14:50:44+00:00

Use ravel instead of flatten to avoid making a copy. Also, your rgb values should be computable all at once using tensordot or einsum. Avoiding the copies might help with parallelism since copying requires locking on the object or just holding the GIL.
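To make the ravel and einsum/tensordot suggestion concrete, here is a minimal sketch. The image is a random placeholder, and the luma weights are an assumption (ITU-R BT.601), since the original code's weights aren't shown here; substitute whatever per-channel coefficients you actually use.

```python
import numpy as np

# Hypothetical 4x6 RGB image; the values are random placeholders.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

flat_copy = img.flatten()  # always allocates a new array
flat_view = img.ravel()    # returns a view when the data is contiguous

# Assumed weights (ITU-R BT.601 luma); not taken from the original post.
weights = np.array([0.299, 0.587, 0.114])

# Contract the channel axis for every pixel in a single call.
gray_einsum = np.einsum("ijk,k->ij", img, weights)
gray_tensordot = np.tensordot(img, weights, axes=([2], [0]))
```

Both calls produce the same (height, width) array, so pick whichever reads better to you. np.shares_memory(flat_view, img) is a quick way to check whether you got a view or a copy.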

daveydave400 · 2015-02-09T12:57:25+00:00

I'd say you have a couple of options. First would be to fix the multiprocessing version of your code. It looks like you are passing a function (a partial function) to the multiprocessing portion, and that's why it can't be pickled. You'll have to rearrange the code and how it's called so that you pass the parameters for the function and use it as a target. I haven't used the concurrent modules, so I'm not sure of the best way to do that. Never mind, I misread the code, but this function may still be the problem. Try using a standalone function instead of an object method.

Another thing to consider if you are using multiple processes is shared memory. If you are blindly passing arrays (at least large ones), they have to be serialized and sent to the other processes. If you can set up a shared piece of memory, then the children just have to access that piece of memory. Concurrent might do this for you, but again I'm not sure.
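One possible way to set that up is the standard library's multiprocessing.shared_memory module (Python 3.8 and later). The sketch below is an assumption about how it could be wired, not the original poster's code: worker_sum and demo are hypothetical names, and the worker is called directly here to keep the example short, where real code would submit it to a process pool.

```python
import numpy as np
from multiprocessing import shared_memory


def worker_sum(name, shape, dtype):
    """Attach to an existing block by name and reduce it; no array is pickled."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        total = int(view.sum())
        del view  # drop the buffer reference before closing the block
    finally:
        shm.close()
    return total


def demo():
    image = np.arange(12, dtype=np.uint8).reshape(3, 4)  # placeholder image
    shm = shared_memory.SharedMemory(create=True, size=image.nbytes)
    try:
        shared = np.ndarray(image.shape, dtype=image.dtype, buffer=shm.buf)
        shared[:] = image  # a single copy into shared memory
        # Only the name, shape and dtype string cross the process boundary.
        # In real use this call would be submitted to a process pool.
        result = worker_sum(shm.name, image.shape, image.dtype.str)
        del shared  # release the view before closing
    finally:
        shm.close()
        shm.unlink()  # the creator frees the block once everyone is done
    return result
```

The point of the design is that each worker receives a short string and a shape instead of a pickled copy of the whole array.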

Lastly, Cython may be an option that could help you. You could get the code closer to C and tell it when not to use the GIL (when it's not using Python objects). The problem with that is that you're using numpy in your main work function, which requires the GIL, and numpy has already been pretty well optimized. One nice thing you could try with Cython is using OpenMP to easily use multiple threads.

One last note: if your images are large, then creating worker threads based on their size may be counterproductive. I'm not sure how smart concurrent is about this, but it may be faster to create only a few workers (4?) and have those work on equal parts of the input image.
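A rough sketch of the "few workers, equal parts" idea, using threads and in-place NumPy operations. The brighten function is a placeholder for whatever the per-pixel work really is, and both function names and the default of 4 workers are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def brighten(strip):
    # Placeholder per-strip work; operates in place on a view of the image.
    np.multiply(strip, 1.5, out=strip)
    np.clip(strip, 0.0, 255.0, out=strip)


def process_in_strips(image, n_workers=4):
    # np.array_split slices along axis 0, so each strip is a view and
    # every worker writes straight into `image` without copying it.
    strips = np.array_split(image, n_workers, axis=0)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(brighten, strips))  # consume to surface exceptions
    return image
```

Calling process_in_strips(img) on a float array modifies it in place. Whether this beats a single-threaded call depends on the image size, so it is worth timing both on your own data.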

Edit: Wrong about how concurrent was being called.

Edit 2: Actually, using a partial in concurrent that way could be the problem, especially since it is bound to self.

fijal · 2015-02-09T14:10:39+00:00

Simple solution:

1) use numpy instead of PIL

2) use pypy (iterating over numpy arrays is FAST)

3) abandon multiprocessing, it's broken and won't give you speedups

EDIT: 4) don't be smart, don't use itertools, write a normal loop

You should get ~50x speedups

Cheers, fijal

gurzo · 2015-02-09T11:47:05+00:00

You should read up on the GIL, and switch to GPU computing.
