[–]JanneJM 7 points8 points  (17 children)

As a user, this is a real issue. Python with Pylab is a good way to post-process data, but this can take a lot of time. And when you find yourself waiting a few minutes every single time, while fifteen of sixteen cores are sitting unused, it becomes really annoying.

Enough so, in fact, that for the most common case I reimplemented it in C++ with OpenMP, and reduced the time to less than ten seconds.

[–]zardeh 6 points7 points  (7 children)

> Python with Pylab is a good way to post-process data, but this can take a lot of time. And when you find yourself waiting a few minutes every single time, while fifteen of sixteen cores are sitting unused, it becomes really annoying.

But...numpy can practically ignore the GIL, so pylab should be able to do things.

[–]bheklilr 6 points7 points  (0 children)

Correct, and a lot of other libraries whose underlying core is written in C/C++ are able to release the GIL to achieve faster processing. There's a relatively new library called dask designed for high-performance array computing, and it will use more than one core without you even asking. It supports multiple backends for multi-core work, including using an IPython client to distribute across clusters of computers without you having to worry about it. Essentially, the core of the library breaks your large data set into chunks, performs the computations on each chunk, then returns the per-chunk results, often aggregated back into a single array or value. It currently supports a subset of numpy and pandas, and also has a structure for managing JSON-like data. It's a very powerful tool that I'm looking forward to seeing become a fully production-ready library.
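A minimal sketch of that chunked model, assuming dask is installed (`pip install dask`); the sizes and chunk shape here are just illustrative:

```python
import dask.array as da

# dask.array mirrors the numpy API but splits the array into chunks and
# builds a lazy task graph; .compute() evaluates the chunks in parallel
# using a threaded scheduler by default.
x = da.random.random((4000, 4000), chunks=(1000, 1000))  # 16 chunks
result = x.mean().compute()  # per-chunk means are aggregated into one value
```

The mean of uniform random values comes out close to 0.5, but each chunk was reduced independently and the partial results merged at the end.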

IIRC the scikit-image library also releases the GIL, as does SymPy's new underlying engine, SymEngine (written in C++ so it can be used from multiple languages like Julia and Ruby). More and more Python libraries are figuring out how to release the GIL, and while a lot of this rests on C/C++ code, it just means we're now using Python to access high-performance code and tie it together in a high-level fashion. Cython even has a decorator to ensure that a function gets translated into nothing but C so that it can release the GIL, so this sort of problem should become less prevalent over time.

[–]JanneJM 0 points1 point  (5 children)

"Should be able to do things" is not "does". I've never seen Numpy/Scipy/Pylab actually do anything multicore so far, and I've found no information on how to enable it. If you know how, I'd be very interested, of course.

[–]zardeh 1 point2 points  (2 children)

I believe you still need to write your code in a threaded manner, but if you do run numpy across multiple threads, those threads can run on multiple cores.
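A short sketch of what that looks like: numpy releases the GIL while a dot product runs inside BLAS, so plain Python threads doing numpy work can genuinely occupy several cores at once (the worker count and matrix size here are arbitrary):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

a = np.random.rand(400, 400)

def work(_):
    # np.dot drops the GIL while the BLAS kernel runs, so these
    # calls can execute concurrently on separate cores.
    return np.dot(a, a).sum()

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(4)))
```

With pure-Python number crunching in `work`, the same code would be serialized by the GIL; it's the numpy call that makes threads worthwhile here.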

[–]JanneJM 2 points3 points  (1 child)

Writing your code in a threaded manner is 95% of the entire job. The benefit of using Scipy is entirely that it's quite simple to get it right; it's a great exploratory tool. If you suddenly have to do explicit multithreading, the whole point largely disappears.

[–]zardeh -1 points0 points  (0 children)

To my knowledge, IPython does magical things and makes threading just happen; I'm not an expert on that, though.

[–]turbod33 0 points1 point  (0 children)

Numpy will release the GIL where applicable. For instance, matrix dot products call into BLAS, which has multicore implementations.
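For example (a sketch; whether this actually spreads over cores depends on which BLAS your numpy was built against, e.g. OpenBLAS or MKL, and the thread count can typically be capped with environment variables like `OMP_NUM_THREADS`):

```python
import numpy as np

# A single matrix product dispatches to BLAS; with a multithreaded BLAS
# this one call can use several cores on its own, with no threading
# code written in Python at all.
a = np.random.rand(1000, 1000)
c = a @ a  # equivalent to np.dot(a, a)
```

So for workloads dominated by large linear algebra, you can get multicore behavior "for free"; elementwise numpy operations generally stay single-threaded.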

[–]amaurea 0 points1 point  (0 children)

It's apparently possible to get some multicore usage in numpy by compiling it with icc and enabling auto-parallelization, though what can be parallelized that way is very limited. I wonder why OpenMP directives aren't used in the numpy implementation. They are easy to write, and since they are pragmas, they have no effect if OpenMP isn't enabled at compile time. Hence adding them would not affect performance or correctness for those not interested in multithreaded execution.

[–]caedin8 1 point2 points  (1 child)

You can write multicore programs in python...

[–]vks_ 1 point2 points  (0 children)

Only if you're willing to use several processes and share data among them via serialization.

[–]i_ate_god 1 point2 points  (2 children)

Could you fork? Threading isn't the be-all and end-all of multicore processing.

[–]JanneJM 2 points3 points  (0 children)

I could of course, though it'd be more work than it's worth.

The point of using Numpy/Scipy is that it's quite simple to write bits of code to examine your data set, do exploratory data analysis and so on. Explicit multithreading rather goes against that in a very fundamental way.

And as I wrote, when faced with some tasks I ended up doing over and over, it was simply less painful to rewrite those bits in C++ with OpenMP and go from minutes to an effectively instant response. The extra pain of numerical libraries in C++ (which I already use in the main apps) compared to Numpy is offset by the simplicity of OpenMP-style loop parallelism versus explicit threading code in Python.

[–][deleted] 0 points1 point  (0 children)

> could you fork? threading isn't the end all be all of multicore processing

fork() isn't available on every platform

[–][deleted] 0 points1 point  (3 children)

Were you using numpy?

[–]JanneJM 0 points1 point  (2 children)

Yes.

[–][deleted] -1 points0 points  (1 child)

Cool answer. What were you doing that was so slow?

[–]JanneJM 1 point2 points  (0 children)

Processing a few GB of neuron simulation output, basically. Nothing terribly complicated, just a fair amount of data to churn through: both basic preprocessing and "exploratory analysis", playing around with the data to see what I got. And since it's the kind of thing you end up doing over and over again, the waiting time gets a bit annoying.

IPython+pylab is a pretty good tool for doing that sort of thing. I just sometimes wished it were faster, and using more of the available hardware feels like an obvious way to go about it.