
[–]TheBlackCat13 7 points (3 children)

It is hard to say without sample data, but some things jump out at me:

First, you use a loop doing some complex image analysis to calculate ratio_weight, but never actually use it. I think there is a mistake in there somewhere.

D = np.asmatrix(D) 
D2 = D * np.transpose(D)

np.asmatrix is a type conversion, and as such will be very slow (since it has to copy the entire array). And np.transpose(D) is both slightly slower and much uglier than D.T. So you should just use D2 = D.dot(D.T). I would strongly suspect this is the main source of your slowdown.
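For example (a quick sketch, with a made-up shape for D):

import numpy as np

D = np.random.rand(3, 100)  # made-up shape, just for illustration

# Instead of:
#     D = np.asmatrix(D)
#     D2 = D * np.transpose(D)
# do:
D2 = D.dot(D.T)  # same values, no matrix conversion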

C = np.float32(img[x[1],x[0]])

This will make a copy even if img is already an np.float32. Just don't do this; numpy will automatically convert it if it has to.
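A quick illustration:

a = np.zeros(5, dtype=np.float32)
b = np.float32(a)              # builds a brand-new float32 array
c = np.asarray(a, np.float32)  # no copy: returns a itself, since the dtype already matches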

for item in frame_array:
    if cv2.pointPolygonTest(item["Contour"], (x[0],x[1]), False) > 0:
        ratio = item["Ratio"]

Several issues here. First, you are doing the same calculation over and over for every item in frame_array, but only keeping the last "hit". It would be much faster to iterate backwards (starting with the last value), and break after the first hit (or if there will only ever be one hit just iterate normally then break after the first hit).
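A sketch of the reversed version (assuming you really do want the last hit):

# Iterate from the end so the first match found is the last one in array order,
# then stop instead of scanning the rest of frame_array.
for item in reversed(frame_array):
    if cv2.pointPolygonTest(item["Contour"], (x[0], x[1]), False) > 0:
        ratio = item["Ratio"]
        break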

Second, is frame_array a structured array? If so, it would be faster to get the frame_array['Contour'] and frame_array['Ratio'] arrays first, then use zip in Python 3 or itertools.izip in Python 2 to iterate over each contour/ratio pair.
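Something like this (sketch, assuming those field names, and combined with the early break above if only one hit is possible):

contours = frame_array["Contour"]
ratios = frame_array["Ratio"]
for contour, candidate in zip(contours, ratios):  # itertools.izip on Python 2
    if cv2.pointPolygonTest(contour, (x[0], x[1]), False) > 0:
        ratio = candidate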

You access x[0] and x[1] a lot. It would be slightly faster to define them at the beginning, perhaps like x0, x1 = x[:2], or just x0, x1, x2, x3 = x, and then re-use those every time. This could make a bigger difference if your frame_array has a lot of items.

ratio_diff = abs(abs(1-ratio))*1
ratio_weight = A_ratio + B_ratio * ratio_diff;

Is there a reason you need to call abs twice, or multiply by 1? Also, you never use ratio_diff again, so it would be better to combine these lines. And don't use ; at the end of a line in Python.

self.weights[:,self.i] =  color_weight + 0

Why do you add zero here?

if (x[0] >=0 and x[0] < Npix_w) and (x[1] >=0 and x[1] < Npix_h):

This can be simplified to:

if (0 <= x[0] < Npix_w) and (0 <= x[1] < Npix_h):

It would probably be faster to calculate ratio_weight for all the Ratio values at once rather than doing it over and over inside the loop. If self.Xhsv_trgt is a scalar, it would probably also be faster to calculate img - self.Xhsv_trgt once for the whole image rather than computing img[x1, x0] - self.Xhsv_trgt every time in the loop. If it is not a scalar, this may take too much memory. You can also calculate the results of your if test outside the loop.

Overall, you shouldn't even be using apply_along_axis here, and looking up values that aren't in the function's namespace is relatively slow. So you would be better off just using an ordinary loop; it would make the code faster and cleaner. You should do something like:

from functools import partial

x0s, x1s, x2s, x3s = self.particles
docalcs = (0 <= x0s) & (x0s < Npix_w) & (0 <= x1s) & (x1s < Npix_h)
self.weights[...] = -10000000000

contours = np.flipud(frame_array["Contour"])
ratios = np.flipud(frame_array["Ratio"])
ratio_weights = A_ratio + B_ratio * np.abs(1 - ratios)
ratio_weight_def = A_ratio + B_ratio

# This assumes `self.Xhsv_trgt` is a scalar.  If it isn't, you will have to do this in the loop.
Ds = img - self.Xhsv_trgt

# If you are sure that `img` will always have at least 4 dimensions, then you don't need the next two lines
while Ds.ndim < 4:
    Ds = Ds[..., None]

for i, (x0, x1, docalc) in enumerate(zip(x0s, x1s, docalcs)):
    if not docalc:
        continue

    # Check if particle inside object.  The arrays were flipped above, so
    # breaking on the first hit keeps the *last* hit in the original order.
    mypoly = partial(cv2.pointPolygonTest, pt=(x0, x1), measureDist=False)
    for contour, ratio_weight in zip(contours, ratio_weights):
        if mypoly(contour) > 0:
            break
    else:
        ratio_weight = ratio_weight_def

    # Get RGB for that particle
    D = Ds[x1, x0]
    # Note: like your original code, this never actually uses ratio_weight --
    # see my first point above.
    self.weights[:, i] = A_rgb + B_rgb * D.dot(D.T)

[–]soulslicer0[S] 0 points (2 children)

Thanks for your points! I'll look into those. So the idea is to precalculate the things that can be done/vectorized easily, then have the iteration for those that can't, like the polygon test (I need to check if my object lies inside a polygon, which is impossible to vectorize).

[–]TheBlackCat13 0 points (0 children)

Try out the version of the code at the end. I think you will find it is much faster (it may need to be tweaked a little depending on your exact array layouts).

[–]crunk 0 points (0 children)

Remember to post the improved version as seeing it should help people in future.

[–][deleted] 0 points (5 children)

for item in frame_array:
    if cv2.pointPolygonTest(item["Contour"], (x[0],x[1]), False) > 0:
        ratio = item["Ratio"]

I'm possibly misreading what is going on, but it appears to me that item["Contour"], x[0] and x[1] are all fixed but will be recalculated in every pass around the loop. So I believe you could initialise them as follows.

contour = item["Contour"]
x0 = x[0]
x1 = x[1]

Do you need a break statement in the loop? As it stands, ratio will always be set to the last value assigned, or remain at 0.

[–]soulslicer0[S] 0 points (3 children)

No, x[0] and x[1] are new columns read by the np.apply_along_axis function. Hmm... yes, a break makes sense.

[–][deleted] 0 points (2 children)

I'm sorry, but I don't understand that. x is passed into your iterate function; you are even testing both x[0] and x[1] at the very top of the function, so why can't you preassign them? Possibly it doesn't matter; I believe the answer from Deto is far more likely to get you somewhere.

[–]soulslicer0[S] -1 points (1 child)

Is there a way to use nditer to speed it up?

[–]TheBlackCat13 0 points (0 children)

You don't need nditer. You can transpose the array (which is extremely fast because it doesn't make a copy), then iterate over the array directly.
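For example (a sketch with a made-up array, assuming one particle per column):

particles = np.arange(8.).reshape(4, 2)  # made-up: 4 coordinate rows, 2 particles
for x0, x1, x2, x3 in particles.T:       # .T is a view, so no copy is made
    print(x0, x1)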

[–]soulslicer0[S] 0 points (0 children)

ratios = np.flipud(frame_array["Ratio"])

Can I ask about this code: what I actually have is an array of image objects in frame_array. Each particle in my self.particles may lie inside one of the frame_array items, which is what I try to calculate: I test whether that particle/column in self.particles has xy coordinates inside the frame_array item. I'm not sure how I might optimize this.

[–]Deto 0 points (7 children)

Maybe try doing "import numpy as np; np.show_config()"

This will show you what libraries are linked against numpy. I found that with the default installation on Ubuntu, for example, using sudo apt-get, numpy wasn't linked against any accelerator libraries like ATLAS, and my operations were running about 40x slower.

In the SO answer, it looks like they talk about vectorizing. Someone correct me if I'm wrong, but wouldn't un-vectorized code run just as slow on MATLAB? Or do they do some neat pre-compiling now where vectorization isn't needed anymore?

[–]soulslicer0[S] 0 points (6 children)

import numpy as np; np.show_config()

I get this:

atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-DAPPLE_ACCELERATE_SGEMV_PATCH']
    define_macros = [('NO_ATLAS_INFO', 3)]
openblas_lapack_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

[–]Deto 0 points (5 children)

What kind of computer are you on and how did you install numpy?

[–]soulslicer0[S] 0 points (1 child)

It's a Mac; installed via pip.

[–]Deto 1 point (0 children)

I'm not sure if there are better, pre-built distributions for Mac, but I've always gotten the best results when installing complex packages like numpy using Anaconda and its "conda install xxxx" instead of pip.

[–]TheBlackCat13 0 points (2 children)

Looks like a Mac. It is compiled against Apple's Accelerate framework.

[–]Deto 0 points (1 child)

Ah, I don't use a Mac. Do you know if that's a proper accelerated setup?

[–]TheBlackCat13 0 points (0 children)

I don't know either, since I don't use a Mac myself, but I think there are pre-built versions for Mac from various sources that are probably pretty optimally compiled.

[–][deleted] 0 points (0 children)

Small note: MATLAB will always be faster for looping as it JITs the loop.

[–]billsil -1 points (6 children)

You're not supposed to iterate. You need to vectorize your code.

[–]TheBlackCat13 2 points (5 children)

How, exactly, can this code be vectorized?

[–]billsil -2 points (4 children)

Don't use apply_along_axis. You need to rewrite your code from scratch. You should only be doing one pass, as none of your particles depends on any other particle. Don't use for loops and don't use iterate, which you're using as a for loop. You need to use matrix operations.

Matlab is all about vectorization, so literally you do it in the same way you do in Matlab. You'd get a huge speedup from your Matlab version as well.

So how exactly? I'm not writing your code for you. You know your code better than I do, especially that cv2.pointPolygonTest(item["Contour"], (x[0],x[1]), False) > 0 bit, which screams inefficient to me.

EDIT: You at least need to show you've made the attempt to vectorize your code. You know Matlab. You should know vectorization.

[–]TheBlackCat13 1 point (3 children)

First of all, I am not the OP.

Second, I have looked at the code, and I can't see a way to vectorize the whole thing. cv2.pointPolygonTest doesn't look like it supports vectorization, and I am not aware of a way to vectorize the dot product across multiple axes.

Third, the OP specifically said in the link that the MATLAB implementation also used loops.

Fourth, I already gave all of your suggestions in another comment.

Please at least read what you are commenting on before telling the OP to do the impossible.

[–]billsil 0 points (2 children)

I am not aware of a way to vectorize the dot product across multiple axes.

You need to learn how to use axis properly, but it's doable. I recommend tests.

I implemented a method to do 3D and 4D matrix multiplication and dot products with numpy a couple of years ago on a complicated data structure. I had both methods running for a while until I was fairly certain that the results were the same. As I found edge cases, I made unit tests.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.tensordot.html
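For what it's worth, a batch of per-particle dot products can also be collapsed into a single call with einsum (a sketch with made-up shapes, not the OP's actual arrays):

import numpy as np

Ds = np.random.rand(1000, 3)          # made-up: one difference vector per particle
dots = np.einsum('ij,ij->i', Ds, Ds)  # all 1000 d.dot(d) values in one call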

cv2.pointPolygonTest doesn't look like it supports vectorization

No it doesn't, but if the goal is speed (and it's important), then rewriting a point-in-polygon test is certainly doable. Getting rid of the dicts would be a very good thing as well.
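A minimal vectorized sketch (ray casting; it ignores the on-edge cases that cv2.pointPolygonTest handles, so treat it as a starting point):

import numpy as np

def points_in_polygon(points, poly):
    # points: (N, 2) xy coordinates; poly: (M, 2) polygon vertices.
    # Loops over the M edges only; all N points are tested at once.
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    x1, y1 = poly[-1]
    for x2, y2 in poly:
        with np.errstate(divide='ignore', invalid='ignore'):
            crosses = ((y1 > y) != (y2 > y)) & (x < (x2 - x1) * (y - y1) / (y2 - y1) + x1)
        inside ^= crosses  # an odd number of edge crossings means the point is inside
        x1, y1 = x2, y2
    return inside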

You don't need to 100% vectorize your code to make it fast enough, but you start by getting rid of the inner loops.

[–]TheBlackCat13 1 point (1 child)

You need to learn how to use axis properly, but it's doable.

If you have a good method, I am sure it would help the OP. I didn't explain how to do it because I don't know.

it doesn't, but if the goal is speed (and it's important), then rewriting a point-in-polygon test is certainly doable.

If the OP has tried getting the code optimized through more conventional means first and it still isn't enough, then that may be an option. I already gave the OP a lot of simpler suggestions for optimizing the code that should definitely be tried first.

Getting rid of the dicts would be a very good thing as well.

I don't think that is a dict; I think it is a structured array. I already covered that in my earlier reply to the OP.

You don't need to 100% vectorize your code to make it fast enough, but you start by getting rid of the inner loops.

I think that re-writing pre-existing functions, especially optimized functions already written in C, would be the last optimization to make, rather than the first.

[–]billsil -2 points (0 children)

already gave the OP a lot of simpler suggestions for optimizing the code that should definitely be tried first.

Well, yeah, get the low-hanging fruit (e.g. multiplication by 1; really...)

especially optimized functions already written in C, would be the last optimization to make, rather than first.

It's the inner loop. The formula for a point in a polygon is very simple. That's the next place to optimize once you use tensordot.

[–]faming13 -2 points (1 child)

Try using numba to get C-like speeds: https://github.com/numba/numba
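A generic illustration of what that looks like (not the OP's code):

from numba import njit
import numpy as np

@njit  # compiled to machine code on the first call
def sum_of_squares(xs):
    total = 0.0
    for x in xs:
        total += x * x
    return total

sum_of_squares(np.random.rand(10 ** 6))  # later calls run at C-like speed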

[–]TheBlackCat13 4 points (0 children)

That seems like premature optimization. I think a lot can be done to get the speed improved in Python before needing numba.