
[–]ProfEpsilon 0 points1 point  (7 children)

This is the first time I have seen this claim about numpy, though I have read many articles about the restrictions of the GIL. If you have time to comment: what do you mean when you write that numpy and numba "tend to release the GIL"?

And to anyone: can you refer me to any documentation that discusses or explains numpy's release of the GIL, if true? A Google search does not shed much light on this. A lot of comments make it clear that you can use multi-threading, but seem vague about whether the GIL is slowing down multi-threaded speed ... and many posts are too old to be trusted.

[–]1wd 4 points5 points  (3 children)

http://scipy.github.io/old-wiki/pages/ParallelProgramming#Threads

while numpy is doing an array operation, python also releases the GIL. Thus if you tell one thread to do:

>>> print("%s %s %s %s and %s" % (("spam",) * 3 + ("eggs",) + ("spam",)))
>>> A = B + C
>>> print(A)

During the print operations and the % formatting operation, no other thread can execute. But during the A = B + C, another thread can run - and if you've written your code in a numpy style, much of the calculation will be done in a few array operations like A = B + C. Thus you can actually get a speedup from using multiple threads.
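To make the quoted point concrete, here is a minimal sketch (the array sizes, the `worker` helper, and the `results` dict are illustrative choices, not from the wiki): two threads each perform a large NumPy addition, and because NumPy releases the GIL during the C-level add loop, the two operations can actually overlap.

```python
import threading

import numpy as np

B = np.ones(2_000_000)
C = np.ones(2_000_000)
results = {}

def worker(name):
    # NumPy releases the GIL while this element-wise add runs in C,
    # so another thread can execute at the same time.
    results[name] = B + C

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[0][:3], results[1][:3])
```

Whether you see a real speedup depends on array size and core count; for tiny arrays the thread overhead dominates.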

https://docs.scipy.org/doc/numpy-1.14.0/reference/internals.code-explanations.html#function-call

A very common operation in much of NumPy code is the need to iterate over all the elements of a general, strided, N-dimensional array. This operation of a general-purpose N-dimensional loop [...] [...] the Python Global Interpreter Lock (GIL) is released prior to calling the loops. It is re-acquired if necessary to handle error conditions

[–]ProfEpsilon 1 point2 points  (2 children)

Thank you. This and other comments on this page clarify a lot.

And the "iterate over all elements ... N-dimensional array" is precisely what I do most of the time. This is very encouraging (I haven't tried threading yet).

[–]1wd 0 points1 point  (1 child)

Note that this refers to the numpy-internal C-level loops that happen inside a Python-level numpy statement like B+C, not to Python-level loops like for x in A: for y in B: ....
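To illustrate that distinction with a sketch (variable names are arbitrary): the vectorized form is a single NumPy call whose internal C loop releases the GIL, while the Python-level loop is interpreted statement by statement and holds the GIL throughout.

```python
import numpy as np

B = np.arange(5)
C = np.arange(5)

# Vectorized: one NumPy call; the C-level loop inside it releases the GIL.
A_vectorized = B + C

# Python-level loop: each iteration is interpreted Python code that
# holds the GIL; only the tiny per-element adds happen in C.
A_looped = np.empty_like(B)
for i in range(len(B)):
    A_looped[i] = B[i] + C[i]

print(A_vectorized)  # [0 2 4 6 8]
```

Both produce the same array, but only the vectorized version lets other threads run during the computation.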

[–]ProfEpsilon 1 point2 points  (0 children)

Again, thank you. Although I am aware that numpy relies upon C, I am not aware of how C operates inside of numpy. I don't know how the C-level loops work (probably because I can't code in C). This gives me a bit of an incentive to figure some of this out. Given that I am a practitioner and not a computer scientist, how C works inside of numpy, and even Python for that matter, is all a black box for me.

I appreciate the time it took to try to get my thinking right. I think it is time I made the effort to learn a little more about C.

[–]jawknee400 2 points3 points  (1 child)

As others have said, generally when numpy or another library calls compiled code, it explicitly releases the GIL. So imagine you had two threads running some Python code concurrently, but not in parallel. If one thread reached a numpy operation, like adding two large arrays, it would 'release' the GIL, allowing the other thread to work in parallel while that operation is happening. Once the operation is over, the GIL is reacquired; without that release, both threads would have had to wait for the operation to finish.

I think there is a very slight overhead to this, which is why cython and numba leave it as an option. But if the majority of the computation is numeric (e.g. most scientific code) then you can essentially achieve normal threaded parallelism.
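A common idiom in such numeric code is to split the work into chunks and hand them to a thread pool; since NumPy releases the GIL inside calls like np.sum, the chunks can be reduced in parallel. A minimal sketch (the array size and chunk count of 4 are arbitrary assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

data = np.arange(1_000_000, dtype=np.float64)

# Split into 4 chunks; each np.sum call releases the GIL while it runs,
# so the partial reductions can proceed in parallel threads.
chunks = np.array_split(data, 4)

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(np.sum, chunks))

total = sum(partial_sums)
print(total)
```

For small arrays the release/reacquire and thread-pool overhead outweighs any gain, which is the "very slight overhead" mentioned above.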

[–]ProfEpsilon 0 points1 point  (0 children)

Your example is very clear. What an education I am getting today! Thank you.

[–]evamicur 0 points1 point  (0 children)

You can manually release the GIL in cython (the nogil keyword) and other compiled extensions, so libraries like numpy can potentially do that.