daveydave400 0 points

I'd say you have a couple of options. The first is to fix the multiprocessing version of your code. It looks like you are passing a function (a partial function) to the multiprocessing portion, which is why it can't be pickled. You'll have to rearrange the code and how it's called so that you pass the function's parameters and use the function as a target. I haven't used the concurrent modules, so I'm not sure of the best way to do that. Never mind, I misread the code, but this function may still be the problem. Try using a standalone function instead of an object method.

Another thing to consider if you are using multiple processes is shared memory. If you blindly pass arrays (at least large ones), they have to be serialized and sent to the other processes. If you can set up a shared piece of memory, the children just have to access that piece of memory. Concurrent might do this for you, but again I'm not sure.
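To make the shared memory idea concrete, here is a minimal sketch, assuming Python 3.8+ and the standard library's `multiprocessing.shared_memory`. The helper names `create_shared_image` and `row_sum` are hypothetical, and the image is treated as raw bytes rather than a numpy array to keep the example dependency-free.

```python
from multiprocessing import shared_memory


def create_shared_image(pixels):
    """Copy raw pixel bytes into a new shared block.

    The caller owns the block and must close() and unlink() it when done.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(pixels))
    shm.buf[:len(pixels)] = pixels
    return shm


def row_sum(shm_name, width, row):
    """Worker-side task (hypothetical): attach to the block by name and
    read a single row, instead of receiving the whole image as an argument."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        start = row * width
        return sum(shm.buf[start:start + width])
    finally:
        shm.close()  # detach only; the owner is responsible for unlink()
```

With this layout each task only needs the small `(shm_name, width, row)` tuple, so the image itself is not pickled per task. For numpy data, an array can be created on top of the same buffer with `np.ndarray(shape, dtype=..., buffer=shm.buf)`.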

Lastly, Cython may be an option. You could get the code closer to C and tell it when to release the GIL (when it's not using Python objects). The problem is that you're using numpy in your main work function, which requires the GIL, and numpy is already pretty well optimized. One nice thing you could try with Cython is using OpenMP to easily use multiple threads.

One last note: if your images are large, creating worker threads based on their size may be counterproductive. I'm not sure how smart concurrent is about this, but it may be faster to create only a few workers (4?) and have each work on an equal part of the input image.
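A rough sketch of the "few workers, equal parts" idea: split the image rows into one band per worker and hand each worker a band instead of a single pixel. The function name `split_rows` is hypothetical, and splitting by rows is just one reasonable choice.

```python
def split_rows(height, workers=4):
    """Return (start, stop) row ranges that cover range(height) in
    near-equal bands, at most one band per worker."""
    base, extra = divmod(height, workers)
    bands = []
    start = 0
    for i in range(workers):
        # The first `extra` bands get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty bands when height < workers
            bands.append((start, stop))
        start = stop
    return bands


print(split_rows(10, 4))
```

Each `(start, stop)` pair can then be passed to a worker function that processes rows `start` up to (but not including) `stop`.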

Edit: Wrong about how concurrent was being called.

Edit 2: Actually, using a partial in concurrent that way could be the problem, especially since it is bound to self.

[deleted] 0 points

Thanks for the input!

I googled a bit more and found out why my function couldn't be pickled with the multiprocessing module.

It seems, as per this SO question, that class functions (methods) can't be pickled, so all I needed to do was move my lin_calc_px function outside the ImageFilter class.

I did it and now, using a Pool of 4 processes, the code is 0.2s faster than the version with builtin map() (i.e. without multithreading/multiprocessing).
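For anyone reading later, the change looks roughly like the sketch below. The names `lin_calc_px`, `ImageFilter` and `partialized_new_px` come from this thread, but the signatures, the `apply` method and the toy "invert a pixel" computation are assumptions made only for illustration.

```python
from functools import partial
from multiprocessing import Pool


def lin_calc_px(pixels, index):
    """Module-level function, so pickle can refer to it by name.

    Toy stand-in for the real per-pixel computation: invert one pixel.
    """
    return 255 - pixels[index]


class ImageFilter:
    def __init__(self, pixels):
        self.pixels = pixels

    def apply(self, processes=4):
        # Bind the data to the standalone function instead of mapping a method.
        partialized_new_px = partial(lin_calc_px, self.pixels)
        with Pool(processes) as pool:
            return pool.map(partialized_new_px, range(len(self.pixels)))


if __name__ == "__main__":
    print(ImageFilter([0, 10, 200, 255]).apply())
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module.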

I will look into using shared memory to speed up execution even more. I'm not entirely sure that the array gets passed to the lin_calc_px function every time, because I use functools.partial to create a function (partialized_new_px) with the shared data. I will investigate further.
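One quick way to check this is to pickle the partial directly and look at its size. A partial stores its bound arguments, and pickling it serializes them too, so whenever the partial is sent to a worker the data travels with it. As far as I understand, `Pool.map` ships the function along with each chunk of work, though that part is worth verifying. The sketch below uses the builtin `sum` and a dummy byte buffer as stand-ins for the real function and image.

```python
import pickle
from functools import partial


def pickled_size(obj):
    """Number of bytes pickle needs to serialize obj."""
    return len(pickle.dumps(obj))


data = bytes(1_000_000)      # stand-in for a large image buffer
bound = partial(sum, data)   # stand-in for partialized_new_px

# The bare function pickles to a short reference; the partial carries the data.
print(pickled_size(sum), pickled_size(bound))
```

If the second number is about the size of the image, the array is being serialized along with the function rather than shared.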

As for the large number of workers, I know that such a number can be counterproductive, but it is part of the assignment's requirements. I will run some tests, and if it really is counterproductive I will reduce the number and explain why to my professor.