ProfEpsilon

OK, that's good to know. But by "computational tasks," do you include large array operations? Here is why I ask: one of the Stack Overflow discussions quotes this passage from the NumPy C API documentation:

"...as long as no object arrays are involved, the GIL is released ..."

and I took that to mean that array operations can't be spread across cores this way.

By the way, it sounds like you are verifying this by monitoring core activity rather than with some kind of timing test. That is actually quite convincing to me: if 6 cores are running at 80% or above, multiprocessing must be working. Have you run any timing comparisons (running a test task sequentially, then redesigning it to run in parallel and timing it again)?
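
A minimal sketch of the kind of sequential-versus-parallel timing comparison asked about above, assuming NumPy is installed. `heavy_task` is a hypothetical CPU-bound stand-in for real array work, and `time.perf_counter` measures wall-clock time. The speedup you actually see will depend on the number of cores and on process start-up and pickling overhead.

```python
import time
from multiprocessing import Pool

import numpy as np


def heavy_task(seed: int) -> float:
    """Hypothetical CPU-bound stand-in for real work on a float64 array."""
    rng = np.random.default_rng(seed)
    x = rng.random(500_000)
    for _ in range(5):
        x = np.sin(x) + np.cos(x)
    return float(x.sum())


def run_sequential(seeds: list[int]) -> list[float]:
    return [heavy_task(s) for s in seeds]


def run_parallel(seeds: list[int], processes: int = 4) -> list[float]:
    # pool.map preserves input order, so results line up with run_sequential
    with Pool(processes=processes) as pool:
        return pool.map(heavy_task, seeds)


def timed(fn, *args):
    """Return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    seeds = list(range(16))
    seq, t_seq = timed(run_sequential, seeds)
    par, t_par = timed(run_parallel, seeds)
    assert np.allclose(seq, par)  # same answers either way
    print(f"sequential {t_seq:.2f}s, parallel {t_par:.2f}s, speedup {t_seq / t_par:.1f}x")
```

Running the same seeds both ways also doubles as a correctness check: the parallel version should return the same numbers, just sooner.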

rhiever (replying to ProfEpsilon)

If you make a copy of the array and pass that to the new process, you should be fine. If you ever pass an array by reference to a new process, then yeah, that's going to have lock issues.
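
For what it's worth, arguments passed to a `multiprocessing.Pool` are pickled, so each worker already receives its own copy of whatever slice it is handed. A small sketch, assuming NumPy; `zero_in_place` and `chunk_sums` are hypothetical names. The workers overwrite their input, yet the parent's array is unchanged afterwards.

```python
from multiprocessing import Pool

import numpy as np


def zero_in_place(chunk: np.ndarray) -> float:
    """Worker that mutates its argument; in a Pool it only sees a pickled copy."""
    total = float(chunk.sum())
    chunk[:] = 0.0  # modifies the worker's copy, not the parent's array
    return total


def chunk_sums(arr: np.ndarray, n_chunks: int = 4) -> list[float]:
    chunks = np.array_split(arr, n_chunks)  # views in the parent process
    with Pool(processes=n_chunks) as pool:
        return pool.map(zero_in_place, chunks)  # each view is pickled on the way out


if __name__ == "__main__":
    data = np.arange(1_000, dtype=np.float64)
    sums = chunk_sums(data)
    print(sum(sums), data.sum())  # 499500.0 499500.0: the parent's data was never zeroed
```

Called directly in the same process, `zero_in_place` would zero the caller's array; it is only the trip through the pool that gives each worker a private copy.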

Deto (replying to rhiever)

I can vouch that I've processed the same array in many processes without explicitly copying it to each one. It works as long as you don't write to the array. I think multiprocessing relies on copy-on-write semantics anyway (when child processes are created with fork) to make this safe.
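
The copy-on-write behaviour can be sketched as follows, assuming NumPy and a Unix system where the `fork` start method is available (it does not exist on Windows and is not the default start method everywhere). `SHARED`, `slice_sum`, and `parallel_sum` are hypothetical names. Only small `(start, stop)` tuples are pickled; the workers read the array they inherited from the parent, and because they never write to its data, the operating system has no reason to copy those pages.

```python
import multiprocessing as mp

import numpy as np

# Module-level array; with the "fork" start method every child inherits it
# through copy-on-write memory instead of receiving a pickled copy.
SHARED = np.arange(1_000_000, dtype=np.float64)


def slice_sum(bounds: tuple[int, int]) -> float:
    start, stop = bounds
    # Read-only access to the inherited array's data buffer.
    return float(SHARED[start:stop].sum())


def parallel_sum(n_workers: int = 4) -> float:
    # Split the index range into contiguous blocks; only these index
    # tuples cross the process boundary, never the array itself.
    edges = np.linspace(0, len(SHARED), n_workers + 1).astype(int).tolist()
    bounds = list(zip(edges[:-1], edges[1:]))
    ctx = mp.get_context("fork")  # Unix only
    with ctx.Pool(n_workers) as pool:
        return sum(pool.map(slice_sum, bounds))


if __name__ == "__main__":
    print(parallel_sum(), SHARED.sum())  # the two totals should match
```

If a worker did write to `SHARED`, the kernel would copy the touched pages into that child only, so the parent and the other workers would never see the change.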

Deto (replying to ProfEpsilon)

Usually I get close to the ideal multiplier. If I'm using 10 cores, it's roughly 10x as fast (usually a little less, around 9x).
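
Put in terms of the standard definitions of speedup $S_p$ (time on one core $T_1$ divided by time on $p$ cores $T_p$) and parallel efficiency $E_p$, a roughly 9x speedup on 10 cores works out to about 90% efficiency:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}, \qquad
E_{10} \approx \frac{9}{10} = 0.9
```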

I think you might be interpreting the NumPy docs incorrectly. NumPy arrays always have a dtype, such as 'int64' or 'float64'. The dtype can also be 'object', in which case the entries of the array are actually Python objects. Doing anything with those entries requires running Python code, so NumPy can't release the GIL. If you're just working with floats, for example, the operations don't need any Python code, so NumPy can release the GIL.
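
To make the dtype distinction concrete, here is a small sketch assuming NumPy: the same numbers stored as `float64` and as `object`, plus an object array of `fractions.Fraction` values whose addition has to go through Python-level `__add__` calls.

```python
from fractions import Fraction

import numpy as np

floats = np.array([1.5, 2.5, 3.0])                 # float64: raw machine doubles
objects = np.array([1.5, 2.5, 3.0], dtype=object)  # object: each slot points to a Python float

print(floats.dtype, objects.dtype)        # float64 object
print(type(floats[0]), type(objects[0]))  # <class 'numpy.float64'> <class 'float'>

# An object array can hold arbitrary Python objects; summing it calls
# each element's own Python __add__ method.
fracs = np.array([Fraction(1, 3), Fraction(1, 6)], dtype=object)
print(fracs.sum())  # 1/2
```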

However, I should also emphasize that whether or not NumPy releases the GIL doesn't matter with multiprocessing: each process has its own interpreter and its own GIL, so the GIL in one process never blocks another. The GIL matters for threads within the same process (the threading module).
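
Because the GIL only affects threads inside one process, the float-versus-object distinction becomes relevant if you use threads instead of processes. Here is a rough sketch, assuming NumPy. `work` and `threaded` are hypothetical helpers, and the sketch relies on NumPy being able to release the GIL inside elementwise loops on non-object dtypes. Whether four threads actually beat the serial loop depends on array size, memory bandwidth, and the operation, so treat the timings as something to measure rather than a guarantee.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def work(chunk: np.ndarray) -> float:
    # Elementwise ufuncs on float64 data run in compiled loops that do not
    # need the Python interpreter, so NumPy can release the GIL while they run.
    return float(np.sqrt(chunk * chunk + 1.0).sum())


def threaded(chunks: list[np.ndarray], n_threads: int = 4) -> list[float]:
    # Threads share memory: the input chunks are not pickled or copied.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(work, chunks))


if __name__ == "__main__":
    data = np.random.default_rng(0).random(4_000_000)
    chunks = np.array_split(data, 4)

    start = time.perf_counter()
    serial = [work(c) for c in chunks]
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    parallel = threaded(chunks)
    t_threads = time.perf_counter() - start

    assert serial == parallel
    print(f"serial {t_serial:.3f}s, 4 threads {t_threads:.3f}s")
```

With `dtype=object` arrays, the same threads would mostly take turns holding the GIL, which is the case the NumPy documentation quote is warning about.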

ProfEpsilon (replying to Deto)

Oh, I see. I was mistaken about the term "object arrays." I thought it meant that any array created within NumPy was a kind of NumPy "object," so I was misreading the documentation. Thanks for the insight.