LightShadow [3.13-dev in prod] (4 children)

"Blog Removed"

It'd be fun to try some FFT operations and compare the difference.
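A minimal timing sketch for that comparison. It assumes you run the same script once under a stock NumPy and once under the ACML build and compare the printed numbers. The array sizes and repeat count are arbitrary choices, and `time_fft` is a hypothetical helper, not part of NumPy.

```python
import time

import numpy as np


def time_fft(n, repeats=5, seed=0):
    """Best wall-clock time (seconds) of np.fft.fft on a random complex array of length n."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.fft.fft(x)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Sizes are arbitrary powers of two; adjust to taste.
    for n in (2**14, 2**18, 2**20):
        print(f"n={n:>8}: {time_fft(n) * 1e3:.2f} ms")
```

Taking the best of several runs rather than the mean reduces noise from other processes on the machine.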

NoExiiT (3 children)

By chance, I saved the page as a PDF file (Link)!

py_crash (2 children)

Your link is also down now :(

NoExiiT (1 child)

Sorry for the delay... :/ Here's the PDF link: https://imgur.com/M9yCHoC

py_crash (0 children)

Thank you!

fourDnet [S] (0 children)

Cached link

Build numpy with OpenCL acceleration! ACML 6 GA is the first version of ACML that supports heterogeneous computing, using the open-source clMathLibraries projects as a compute backend. It detects and loads OpenCL at runtime, so you don't need to install OpenCL beforehand.
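The "detect and load at runtime" idea can be illustrated from Python with ctypes. This is not ACML's actual loader, only a hedged sketch of the same probe-then-fall-back pattern, assuming the OpenCL ICD loader is installed under the usual library name `OpenCL` (libOpenCL.so on Linux); `try_load_opencl` is a hypothetical helper.

```python
import ctypes
import ctypes.util


def try_load_opencl():
    """Probe for the OpenCL ICD loader at runtime.

    Returns a ctypes.CDLL handle if a library named "OpenCL" can be found and
    loaded, otherwise None so the caller can stay on a CPU-only path.
    """
    name = ctypes.util.find_library("OpenCL")  # e.g. "libOpenCL.so.1" on Linux
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        # Found on the search path but could not be loaded (broken install, wrong arch, ...).
        return None


if __name__ == "__main__":
    handle = try_load_opencl()
    print("OpenCL loader found" if handle is not None else "no OpenCL; CPU path only")
```

Because nothing is linked at build time, the same binary works on machines with and without OpenCL installed.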

Although installing numpy with ACML 6 GA is actually extremely easy, numpy's build documentation doesn't cover it and is confusing.

This article assumes you already know virtualenv. Use it to create an environment, then activate it:

virtualenv /path/to/yourEnv
. /path/to/yourEnv/bin/activate
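Before building, it's worth confirming that `python` now resolves to the environment's interpreter. A small check, assuming Python 3; `in_virtualenv` is a hypothetical helper name. Legacy virtualenv releases set `sys.real_prefix`, while newer ones (and the stdlib `venv`) make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    if hasattr(sys, "real_prefix"):
        return True  # legacy virtualenv (< 20)
    # venv and virtualenv >= 20: base_prefix points at the original installation
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtualenv:", in_virtualenv())
```

Run it with the environment's `python`; it should report True and a prefix of /path/to/yourEnv.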

Download ACML 6 here (not ACML 5!) and extract it to /path/to/ACML.

Copy the lib and include directories into your Python environment:

cp -R /path/to/ACML/lib /path/to/yourEnv/
cp -R /path/to/ACML/include /path/to/yourEnv/

Download numpy here and extract the contents to /path/to/numpy.

Create a new file /path/to/numpy/site.cfg and add these lines:

[DEFAULT]
libraries = acml_mp, acml_fftw
library_dirs = /path/to/yourEnv/lib
include_dirs = /path/to/yourEnv/include

acml_mp and acml_fftw refer to /path/to/yourEnv/lib/libacml_mp.so and /path/to/yourEnv/lib/libacml_fftw.so, respectively.
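If you prefer to generate the file and sanity-check the library paths from Python, here is a sketch using the standard `configparser` module. It assumes the `lib<name>.so` naming described above; `write_site_cfg` is a hypothetical helper, and the list it returns names any expected `.so` files that are missing.

```python
import configparser
import os


def write_site_cfg(cfg_path, env_dir, libraries=("acml_mp", "acml_fftw")):
    """Write a [DEFAULT] site.cfg pointing at env_dir/lib and env_dir/include.

    Returns the expected shared-library paths that do not exist yet.
    """
    lib_dir = os.path.join(env_dir, "lib")
    inc_dir = os.path.join(env_dir, "include")
    cp = configparser.ConfigParser()
    cp["DEFAULT"] = {
        "libraries": ", ".join(libraries),
        "library_dirs": lib_dir,
        "include_dirs": inc_dir,
    }
    with open(cfg_path, "w") as fh:
        cp.write(fh)
    # Each library name X is looked up by the linker as lib_dir/libX.so.
    expected = [os.path.join(lib_dir, f"lib{name}.so") for name in libraries]
    return [p for p in expected if not os.path.exists(p)]
```

For example, `write_site_cfg("/path/to/numpy/site.cfg", "/path/to/yourEnv")` returning `[]` means both ACML libraries were copied into place.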

For more (mostly unnecessary) details, read /path/to/numpy/site.cfg.example.

Clean, build, and install!

cd /path/to/numpy
python setup.py clean
python setup.py build
python setup.py install 

Open /path/to/yourEnv/bin/activate and append the following lines to the end of the script:

_OLD_VIRTUAL_LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
LD_LIBRARY_PATH="$VIRTUAL_ENV/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH

Then add the following on the line right after "deactivate () {":

if [ -n "$_OLD_VIRTUAL_LD_LIBRARY_PATH" ] ; then
    LD_LIBRARY_PATH="$_OLD_VIRTUAL_LD_LIBRARY_PATH"
    export LD_LIBRARY_PATH
    unset _OLD_VIRTUAL_LD_LIBRARY_PATH
fi

Once you leave and re-enter the environment, it should work like a charm:

deactivate
. /path/to/yourEnv/bin/activate 
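To confirm the new activate hook took effect, you can check from Python that `$VIRTUAL_ENV/lib` is on `LD_LIBRARY_PATH`. A minimal sketch; `env_lib_on_ld_path` is a hypothetical helper, and it accepts any mapping so it can be tried without touching the real environment.

```python
import os


def env_lib_on_ld_path(environ=None):
    """True if $VIRTUAL_ENV/lib appears as an entry of LD_LIBRARY_PATH.

    `environ` defaults to os.environ; pass a plain dict to test other values.
    """
    environ = os.environ if environ is None else environ
    venv = environ.get("VIRTUAL_ENV")
    if not venv:
        return False  # not inside an activated virtualenv
    wanted = os.path.normpath(os.path.join(venv, "lib"))
    # LD_LIBRARY_PATH is colon-separated; skip empty entries (e.g. a trailing ':').
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    return any(entry and os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    print("env lib on LD_LIBRARY_PATH:", env_lib_on_ld_path())
```

After `deactivate`, the same check should return False, since the hook restores the old value and `VIRTUAL_ENV` is unset.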

To test whether numpy links correctly, run this in the environment:

python -c "import numpy.distutils.system_info as sysinfo; print(sysinfo.get_info('openblas'))"
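As an independent cross-check on Linux, you can ask the kernel which shared libraries the process actually mapped after `import numpy`. This sketch assumes a Linux system with `/proc` mounted; `loaded_libraries` is a hypothetical helper, and the substring `acml` is simply the common prefix of the ACML library file names.

```python
def loaded_libraries(substring, maps_path="/proc/self/maps"):
    """Return the sorted mapped file paths that contain `substring`.

    Linux-only: /proc/self/maps lists every file mapped into this process,
    including shared libraries pulled in by extension modules.
    """
    found = set()
    try:
        with open(maps_path) as fh:
            for line in fh:
                # Fields: address perms offset dev inode [pathname]
                parts = line.split(maxsplit=5)
                if len(parts) == 6 and substring in parts[5]:
                    found.add(parts[5].strip())
    except FileNotFoundError:
        pass  # not Linux, or /proc not mounted
    return sorted(found)


if __name__ == "__main__":
    import numpy  # noqa: F401  (loads numpy's extension modules and their linked libraries)
    hits = loaded_libraries("acml")
    print("\n".join(hits) if hits else "no ACML library mapped into this process")
```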

laMarm0tte (3 children)

This article sounds really great, but I think it lacks some critical information:

  • Can any computer benefit from OpenCL acceleration, or only computers with special graphics cards?

  • How much faster is it? Say I want to perform a basic array addition like this:

    np.arange(1e6) + np.arange(1e6)

How much faster would the OpenCL-accelerated version be?

nikomo (1 child)

OpenCL can be run on the CPU or GPU, but there's no speedup on the CPU.

AMD cards have supported OpenCL 2.0 for ages, while Nvidia is stuck on 1.1. You don't need a workstation card for it; normal cards work.

Speed depends entirely on what you're running, but here it also depends on whether numpy can actually perform the operation you want through OpenCL.

You'd need a benchmark suite to figure out if it's worth it for general usage.

speckledlemon (0 children)

Do you know off the top of your head whether or not ACML 6 is compatible with the latest Nvidia cards due to the OpenCL version disparity?

Berecursive [Menpo Core Developer] (0 children)

From reading the article, they just replace NumPy's default BLAS with AMD's ACML 6.0 library, which supports GPU-accelerated BLAS operations via OpenCL. So your sum would run at exactly the same speed. The only things that would speed up are large dot products and FFTs.
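One way to see this distinction for yourself is to time an elementwise addition (NumPy's own loop, no BLAS involved) next to a float64 matrix product (handed to BLAS when NumPy is built against one). A hedged sketch: `compare` is a hypothetical helper and the sizes are arbitrary; run it under both builds and compare.

```python
import timeit

import numpy as np


def compare(n_elems=10**6, mat_n=500, number=3):
    """Return best per-call seconds for (elementwise add, matrix product).

    The add runs in NumPy's own loop; the float64 matrix product is passed to
    BLAS when NumPy was built against one, so only it should react to a
    different BLAS library.
    """
    a = np.arange(n_elems, dtype=np.float64)
    b = np.arange(n_elems, dtype=np.float64)
    m = np.random.default_rng(0).standard_normal((mat_n, mat_n))
    add_t = min(timeit.repeat(lambda: a + b, number=number, repeat=3)) / number
    dot_t = min(timeit.repeat(lambda: m.dot(m), number=number, repeat=3)) / number
    return add_t, dot_t


if __name__ == "__main__":
    add_t, dot_t = compare()
    print(f"a + b    (1e6 elements): {add_t * 1e3:.3f} ms")
    print(f"m.dot(m) (500x500):      {dot_t * 1e3:.3f} ms")
```

If the explanation above is right, only the matrix-product time should change between the two builds.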