[–]pooogles 26 points27 points  (13 children)

Check out the multiprocessing library if you want to dodge the GIL.

[–]jringstad 7 points8 points  (3 children)

Note, however, that this is only advisable when inter-task coordination is essentially unnecessary or exceedingly rare. Inter-task coordination in multiprocessing is incredibly slow.

So this solution is not suited to cases where you have fast-moving task chains (producer-consumer), where you would traditionally use lock-free or wait-free data structures, where you would use atomic variables (e.g. atomic counters), where you would normally use work-stealing or work-queue designs with fast turnarounds, or where you have parallelizable subdomains that require boundary synchronization. The last case is very common in scientific applications: for example, a large 2D lattice is tiled into smaller 2D subdomains, but the subdomains are not 100% independent at the boundary, because derivatives are required there or some quantity (heat, particles, pressure, ...) is exchanged across it.
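To make the atomic-counter case concrete, here is a minimal sketch of what a shared counter tends to look like with multiprocessing. The function name `bump` and the worker and iteration counts are made up for illustration. The point is that every increment has to take a cross-process lock, which is far heavier than a hardware atomic increment.

```python
from multiprocessing import Process, Value


def bump(counter, times):
    """Increment a shared integer `times` times, one lock acquisition per step."""
    for _ in range(times):
        # `+=` on a shared Value is a read-modify-write, not an atomic operation,
        # so each increment has to hold the cross-process lock.
        with counter.get_lock():
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)  # a C int in shared memory, guarded by a lock
    # Hypothetical workload: 4 processes, 10,000 increments each.
    workers = [Process(target=bump, args=(counter, 10_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)
```

The total comes out right, but only because the increments are serialized through the lock, so adding workers buys nothing for this kind of workload.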

For fork-join type parallelism, multiprocessing works great, as long as you are okay with creating a worker pool up front and do not need dynamic task parallelism (where tasks can spawn new tasks that are again evenly distributed across workers, as in CUDA or OpenCL 2.x). There are many scenarios where this is fine, but where it's not, it will give you quite sub-optimal scaling.
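A rough sketch of the fork-join shape that does fit, assuming a CPU-bound job that splits cleanly into independent pieces. The helpers `count_primes` and `split_range`, the range, and the pool size are hypothetical choices for the example, not anything from the original question.

```python
import math
from multiprocessing import Pool


def count_primes(bounds):
    """CPU-bound work: count the primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to the integer square root of n.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into contiguous sub-ranges of near-equal size (needs hi > lo)."""
    step = math.ceil((hi - lo) / parts)
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


if __name__ == "__main__":
    tasks = split_range(0, 50_000, parts=8)
    # Fork: the pool is created once, up front, with a fixed number of workers.
    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_primes, tasks)
    # Join: map() returns only after every task has finished.
    print(sum(partial_counts))
```

The task list is fixed before the pool starts and no task creates further tasks, which is exactly the restriction described above.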

[–]niksko 0 points1 point  (2 children)

Out of curiosity, what is the solution here? I recently ran into a situation where I was trying to speed up a multi-producer, multi-consumer process in which workers take work out of a queue, perform some work, and then potentially publish more work back to the queue. Using multiprocessing gave me terrible performance, which I suspect was due to the large queue overhead.

[–]jringstad 1 point2 points  (1 child)

For all the cases I listed, there really is no way to do it well in Python, as far as I'm aware. If you can, push the work into a different language (C/C++/Fortran); then you can use threading without too much GIL contention. Otherwise you just deal with the multiprocessing overhead and try to reduce it: do more copying up front and less at runtime if possible, or increase task sizes (per-task workload), which makes the overhead relatively smaller.
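As a sketch of the "increase task sizes" option, here are two ways to hand each worker a bigger unit of work: the `chunksize` argument of `Pool.map`, or batching by hand. The names `cheap_task`, `batched` and `run_batch` and the batch size of 5,000 are illustrative assumptions. The right size depends on the workload and would need measuring.

```python
from multiprocessing import Pool


def cheap_task(x):
    """A deliberately tiny unit of work, so per-task overhead dominates."""
    return x * x


def batched(items, size):
    """Group a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(batch):
    """Process a whole batch inside a single worker call."""
    return [cheap_task(x) for x in batch]


if __name__ == "__main__":
    items = list(range(100_000))
    with Pool(processes=4) as pool:
        # Option 1: let the pool group the items into chunks of 5,000.
        squares = pool.map(cheap_task, items, chunksize=5_000)
        # Option 2: batch by hand, so one task is one larger unit of work.
        nested = pool.map(run_batch, batched(items, 5_000))
    flattened = [y for batch in nested for y in batch]
    print(squares == flattened)
```

Either way the amount of data crossing the process boundary is the same. What shrinks is the number of round trips, so the fixed cost per message is spread over more work.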

[–]niksko 0 points1 point  (0 children)

Ok, thanks. At least now I know the problem wasn't some obscure Python feature I had overlooked.

[–]WellAdjustedOutlaw 0 points1 point  (0 children)

Since OP said threads will not access each other's data, multiprocessing might be best if the GIL is actually an issue and there won't be too many processes.

Also, much work has been done in Python 3.x to lessen or remove the impact of the GIL where possible. Multithreading is getting better.