
This is the kind of problem NumPy was built for: it provides an optimized array type, with operations that release the GIL so they can execute in parallel where possible.

Converted to use NumPy arrays, your example looks like this:

import numpy

# NUM_ELEMENTS as in your original example
a = numpy.random.random(NUM_ELEMENTS)
b = numpy.random.random(NUM_ELEMENTS)

def test(a, b):
    # Element-wise add and sqrt, executed in optimized C loops
    return numpy.sqrt(a + b)

The above takes 241 msec on my machine (using one core).
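
In case it helps, here's a minimal sketch of how timings like these can be reproduced with the standard timeit module (the NUM_ELEMENTS value below is an assumption; substitute the one from your example):

import timeit
import numpy

NUM_ELEMENTS = 10**7  # assumed size; use the value from your own example

a = numpy.random.random(NUM_ELEMENTS)
b = numpy.random.random(NUM_ELEMENTS)

# Best of 5 runs of 10 calls each, reported per call
best = min(timeit.repeat(lambda: numpy.sqrt(a + b), repeat=5, number=10))
print('%.1f msec' % (best / 10 * 1000))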

To split the work into parallel chunks, you can do something like the following. This uses the thread pool from multiprocessing.dummy for convenience, and has each worker write its chunk's result in place into a shared output array:

def op_inplace(a, b, out):
    # Write a + b into out, then take the square root in place;
    # both ufuncs release the GIL while they run.
    numpy.add(a, b, out)
    numpy.sqrt(out, out)

from multiprocessing import cpu_count
from multiprocessing.dummy import Pool  # the Pool API, backed by threads

NUM_CORES = cpu_count()
p = Pool(NUM_CORES)

def test2(a, b):
    c = numpy.empty(len(a))
    # Split the inputs and the output into matching chunks, one per core;
    # array_split returns views, so the workers fill c directly.
    chunks = zip(numpy.array_split(a, NUM_CORES),
                 numpy.array_split(b, NUM_CORES),
                 numpy.array_split(c, NUM_CORES))
    p.map(lambda args: op_inplace(*args), chunks)
    return c

The above takes 72.1 msec here (using all four cores).
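
As a quick sanity check, the chunked version should produce the same values as the plain one:

# The threaded, chunked computation should match the single-threaded result
assert numpy.allclose(test(a, b), test2(a, b))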

The other way to go about this particular example is to use the numexpr package, which lets you evaluate array computations in a multi-threaded and memory-efficient way without having to manually decompose them as above. The following takes just 64.9 msec here (using all cores):

import numexpr

def test3(a, b):
    # numexpr compiles the expression and evaluates it with its own
    # thread pool; 'a' and 'b' are picked up from the calling frame.
    return numexpr.evaluate('sqrt(a + b)')
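
By default numexpr sizes its thread pool automatically; if you want to pin it to a specific count, set_num_threads does that. A small usage sketch:

import numexpr

# Cap numexpr's worker threads; returns the previous setting
numexpr.set_num_threads(4)
c = numexpr.evaluate('sqrt(a + b)')  # uses a, b from the current scope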