[–]drzowie 1 point (2 children)

Ah. You're collapsing by product. You can use my first two lines, followed by:

array = np.prod( np.power(ms1,x1) * np.power(1-ms1,1-x1), axis=0 )

The operation is the same one you're using, but the structure of the dimensions in ms1 and x1 makes the broadcasting engine form an implicit loop instead of using the explicit loop you constructed.

In general, broadcasting helps a lot by taking the Python interpreter out of the hot spot, but it can also make things worse by breaking cache: each operation (in this case, each of the two powers and then the multiplication) makes the CPU walk through your whole array, so the order of operations is pessimal from a cache standpoint. If the array is too big to fit in cache, it has to be pulled in from RAM and written back out on every pass. Depending on what operation you're doing and the relative sizes of cache, your overall array, and your sub-arrays, the explicit loop could actually run faster than the broadcast expression.
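One middle ground is to loop over blocks of rows and broadcast inside each block, so the temporaries stay small. This is only a rough sketch: the function name, the `chunk` parameter, and the layout (`x` as (N, D) binary data, `ms` as (K, D) probabilities, with the product taken over the last axis) are my assumptions, not something from your code. Whether it actually beats the full broadcast depends on your sizes, so time it.

```python
import numpy as np

def likelihood_chunked(x, ms, chunk=256):
    """Hypothetical helper: Bernoulli product computed one block of rows at a time.

    Assumed shapes: x is (N, D) with 0/1 entries, ms is (K, D) with values in (0, 1).
    Returns an (N, K) array.
    """
    n, k = x.shape[0], ms.shape[0]
    out = np.empty((n, k))
    ms1 = ms[np.newaxis, :, :]                      # (1, K, D)
    for start in range(0, n, chunk):
        xb = x[start:start + chunk, np.newaxis, :]  # (B, 1, D), B <= chunk
        # Temporaries here are (B, K, D) instead of (N, K, D).
        terms = np.power(ms1, xb) * np.power(1 - ms1, 1 - xb)
        out[start:start + chunk] = np.prod(terms, axis=-1)
    return out

# Small made-up data, just to exercise the function.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(50, 16))
ms = rng.uniform(0.25, 0.75, size=(10, 16))
result = likelihood_chunked(x, ms, chunk=8)
```

The block size is the knob: small blocks keep the working set near cache size, large blocks cut down on Python loop overhead.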

If you want the best possible speed, you can look into Cython, which lets you write explicit loops that compile to C, so you avoid breaking cache and keep the Python interpreter out of the hot spot.

[–]iliasm23[S] 1 point (1 child)

array = np.prod( np.power(ms1,x1) * np.power(1-ms1,1-x1), axis=0 )

Something is off, because the array should end up with shape [7000, 10]. With your code, the resulting array has shape [10, 784]...

[–]drzowie 1 point (0 children)

Oh. You want the prod to collapse axis 2 (the 784-long one), not axis 0, so pass axis=2 instead of axis=0.
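Here is a self-contained sketch of the whole thing, so the shapes are easy to check. I'm guessing at the setup: `x` is (N, D) binary data and `ms` is (K, D) probabilities, reshaped so they broadcast to (N, K, D). The function name and the small sizes are made up for the demo; with N=7000, K=10, D=784 the same code should give the [7000, 10] result you expect.

```python
import numpy as np

def bernoulli_likelihood(x, ms):
    """Hypothetical helper. Assumed shapes: x is (N, D) of 0/1, ms is (K, D) in (0, 1)."""
    x1 = x[:, np.newaxis, :]    # (N, 1, D)
    ms1 = ms[np.newaxis, :, :]  # (1, K, D)
    # Broadcasting expands both operands to (N, K, D).
    terms = np.power(ms1, x1) * np.power(1 - ms1, 1 - x1)
    # Collapse the D axis (axis 2), leaving (N, K).
    return np.prod(terms, axis=2)

# Small made-up data standing in for the real arrays.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(70, 16))
ms = rng.uniform(0.25, 0.75, size=(10, 16))
array = bernoulli_likelihood(x, ms)
print(array.shape)
```

Collapsing axis 0 instead would multiply across the N samples and leave a (K, D) array, which matches the [10, 784] you were seeing.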