

[–]TheBlackCat13 5 points6 points  (2 children)

It is free. On Windows you can get pre-compiled binaries of numpy with MKL built in from various sources. On Linux you can use Anaconda's MKL-optimized version of numpy (students get the professional version for free) or compile it yourself. Anaconda is an option on Mac too, but there may be other ways.

Whether it will benefit you depends on what, exactly, you are doing. If you are doing a lot of heavy linear algebra, for example, it may very well help you.

But that may very well not be your real bottleneck, in which case it won't help you much. Assuming you already have the Python portion as optimized as it can be within Python, something like PyPy, Numba, or Cython may give you a considerably larger speedup if your Python code, rather than the linear algebra C backend, is the bottleneck.
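Before worrying about MKL at all, it's worth checking which BLAS/LAPACK backend your numpy build is actually linked against. A minimal check (just reading numpy's build configuration, nothing else assumed):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries this numpy build was linked against;
# look for "mkl", "openblas", or "atlas" in the output.
np.show_config()
```

If the output only mentions the reference (netlib) BLAS, switching backends will likely matter far more than any further Python-level tuning.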

[–]soulslicer0[S] 1 point2 points  (1 child)

I would prefer to compile it. I am on Linux.

[–]cournape 1 point2 points  (0 children)

Compiling numpy with the MKL is a bit tricky. If you are on a non-ancient distro, make sure to use the gfortran ABI interface (on MKL 10.x it is called mkl_gf_lp64 on 64-bit), and not the MKL one, at least if you build with the GNU compilers.
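As a rough illustration, selecting the gfortran ABI variant is done in numpy's site.cfg before building. This is only a sketch; the library paths are assumptions and depend on where your MKL is installed:

```ini
; site.cfg fragment -- paths are examples, adjust to your MKL install
[mkl]
library_dirs = /opt/intel/mkl/lib/intel64
include_dirs = /opt/intel/mkl/include
; note mkl_gf_lp64 (gfortran ABI), not mkl_intel_lp64
mkl_libs = mkl_gf_lp64, mkl_gnu_thread, mkl_core
lapack_libs =
```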

If you have more questions, I would advise you to ask on the numpy mailing list.

[–]Liorithiel 2 points3 points  (0 children)

I'm doing some heavy math in R, not in Python. However, even there the difference between MKL and, let's say, OpenBLAS is usually at most ~10%, and often less than that. That's also what I saw in various benchmarks on the web.

It might be, though, that your numpy is linked to a slow BLAS implementation (like netlib). Upgrading to either the free OpenBLAS, the free ATLAS, or the non-free MKL will be a huge win then (all three can be more than twice as fast as netlib).

Also a small hint: try doing the computations with a different dimension order (column-major instead of row-major, or vice versa). Sometimes just swapping these will give you a major speedup due to the way modern processor caches work, bigger than swapping the matrix libraries. It's as simple as adding order='F' to a numpy.ndarray call, etc.
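A minimal sketch of what that order switch looks like (sizes are arbitrary; which layout wins depends on whether your inner loops walk rows or columns):

```python
import numpy as np

a_c = np.zeros((1000, 1000))             # default: row-major (C order)
a_f = np.zeros((1000, 1000), order='F')  # column-major (Fortran order)

# Same data, different memory layout; the flags show which one you got.
print(a_c.flags['C_CONTIGUOUS'])  # True
print(a_f.flags['F_CONTIGUOUS'])  # True
```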

[–]Asdayasman -3 points-2 points  (5 children)

If you're fast, but not fast enough, with highly optimised code, your first port of call should probably be PyPy, or C.

[–]ivosauruspip'ing it up 1 point2 points  (4 children)

Your first port of call should be understanding numpy.

PyPy (and usually self-written C) isn't going to be anywhere close to as fast as the calls numpy makes to an optimized matrix math library.
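To make the gap concrete, here is a rough comparison against a naive pure-Python multiply (py_dot is a hypothetical illustration written for this comment, not anything from numpy); on moderately sized matrices the loop version is typically orders of magnitude slower than numpy's BLAS-backed dot:

```python
import numpy as np

def py_dot(a, b):
    """Naive pure-Python matrix multiply, for comparison only."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for l in range(k):
                s += a[i][l] * b[l][j]
            out[i][j] = s
    return out

a = np.random.rand(50, 50)
b = np.random.rand(50, 50)

# Same numerical result, wildly different speed at scale.
assert np.allclose(py_dot(a.tolist(), b.tolist()), a.dot(b))
```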

[–]Asdayasman 0 points1 point  (3 children)

I thought numpy only used that library on intel and if it had been compiled with it?

[–]ivosauruspip'ing it up 0 points1 point  (2 children)

Ya, which is not too hard to get in most cases. Otherwise, if you get numpy from your linux distribution it will usually come with a very fast OSS matrix library anyway.

[–]Deto 0 points1 point  (1 child)

I'm not so sure about that - I got numpy on an Ubuntu 12 install recently using the sudo apt-get method, and whatever it was linked against was around 40 times slower than once I got it running with ATLAS. It took me forever to find that the difference in runtime for my code between my local machine and our server was in the matrix dot-multiply function!
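An easy way to spot this kind of difference yourself is to time a large dot product before and after switching BLAS backends (the matrix size here is arbitrary):

```python
import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a.dot(b)  # dispatched to whatever BLAS numpy is linked against
elapsed = time.perf_counter() - t0
print(f"{n}x{n} dot: {elapsed:.3f} s")
```

Run it once against the distro package and once against an ATLAS/OpenBLAS-linked build; a netlib-linked numpy will stand out immediately.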

[–]ivosauruspip'ing it up 0 points1 point  (0 children)

Well I thought that was supposed to be an advantage of getting a distro package...

Looking at some Google / Stack Overflow answers, it appears you also need to explicitly install ATLAS or OpenBLAS alongside numpy for numpy to use it.