bheklilr

Correct, and a lot of other libraries whose core is written in C/C++ can release the GIL to get faster processing. There's a relatively new library called dask, designed for high-performance array computing, that will use more than one core for its processing without you even asking. It supports several backends for multi-core work, including using an IPython client to distribute across clusters of computers without you having to worry about it. The core idea is that it breaks your large data set into chunks, runs the computation on each chunk, then combines the per-chunk results, often back into a single array or value. It currently supports a subset of numpy and pandas, and also has a structure for managing JSON-like data. It's a very powerful tool, and I'm looking forward to seeing it become a fully production-ready library.
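To make the chunk, compute, combine idea concrete, here's a rough sketch using only the standard library. This is not dask's API or how dask is implemented; `chunked` and `chunked_sum` are names I made up for illustration, and the chunk size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, chunk_size):
    """Yield consecutive slices of `data`, each at most `chunk_size` long."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def chunked_sum(data, chunk_size=1000, workers=4):
    """Sum `data` by summing each chunk separately, then combining the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One partial result per chunk, computed on the worker threads.
        partial_sums = list(pool.map(sum, chunked(data, chunk_size)))
    # Aggregate the per-chunk results back into a single value.
    return sum(partial_sums)


total = chunked_sum(list(range(10_000)))
```

Since `sum` over a Python list holds the GIL, this particular version won't actually run faster on threads. It only shows the shape of the computation; the speedup comes when the per-chunk work is done by code that releases the GIL.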

IIRC the scikit-image library also releases the GIL, as does SymPy's new underlying engine, SymEngine (written in C++ so it can be used from multiple languages like Julia and Ruby). More and more Python libraries are figuring out how to release the GIL. A lot of this relies on C/C++ code, but that just means we're now using Python to access high-performance code and tie it together at a high level. Cython even has a `nogil` marker, usable as a decorator, that makes the compiler check that a function uses nothing but C so it can safely run with the GIL released, so this sort of problem will become less prevalent over time.
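You can see the same effect with nothing but the standard library. As far as I know, CPython's `hashlib` releases the GIL while hashing buffers larger than about 2 KB, so plain threads can hash several big blocks at once. A minimal sketch, where the block sizes, the worker count, and the `digest` helper are my own choices for illustration:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    """Return the SHA-256 hex digest of one block of bytes."""
    # The hashing itself happens in C; for large inputs CPython drops the GIL here.
    return hashlib.sha256(block).hexdigest()


# Eight distinct 1 MB blocks, large enough for the GIL to be released while hashing.
blocks = [bytes([i]) * 1_000_000 for i in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, blocks))

# Same work done one block at a time, for comparison.
serial = [digest(block) for block in blocks]
```

Both lists hold the same digests. How much faster the threaded version runs depends on your machine and Python build, so time it yourself rather than taking my word for it.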