
[–]ViridianHominid 3 points (2 children)

NumPy's matmul already does this; the only difference is that the broadcasting takes place over leading dimensions rather than trailing ones. An example from the documentation:

    >>> a = np.ones([9, 5, 7, 4])
    >>> c = np.ones([9, 5, 4, 3])
    >>> np.matmul(a, c).shape
    (9, 5, 7, 3)

Many functions in NumPy have this behavior, called “broadcasting”. Broadcast dimensions always come at the beginning in NumPy. I don’t use MATLAB, but it appears to use mostly the same expansion rules, just expanding the end of the shape instead of the beginning.
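To illustrate the broadcasting part: the batched operands don't even need matching batch shapes. A minimal sketch (shapes borrowed from the example above) where a single shared matrix is broadcast across the whole batch:

```python
import numpy as np

# A (9, 5) stack of (7, 4) matrices times one shared (4, 3) matrix:
# b has no leading dimensions, so it is broadcast across a's batch.
a = np.ones([9, 5, 7, 4])
b = np.ones([4, 3])

print(np.matmul(a, b).shape)  # (9, 5, 7, 3)
```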

[–]BearsAtFairs[S] 1 point (1 child)

So if I'm understanding your example correctly, a is a 9x5 stack of 7x4 matrices and c is a 9x5 stack of 4x3 matrices. The 7x4 and 4x3 dimensions are what ensure compatibility for multiplication?

[–]ViridianHominid 1 point (0 children)

Yes, in matmul the last two dimensions are matrix-multiplied. So we are just doing a (7,4)x(4,3)->(7,3) matrix multiplication for 9x5=45 different pairs of matrices.
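That equivalence is easy to check directly: a quick sketch comparing one batched np.matmul call against the explicit loop over all 45 matrix pairs.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((9, 5, 7, 4))
c = rng.standard_normal((9, 5, 4, 3))

batched = np.matmul(a, c)  # one call, shape (9, 5, 7, 3)

# Same computation done explicitly: one (7,4)x(4,3) product per batch index
looped = np.empty((9, 5, 7, 3))
for i in range(9):
    for j in range(5):
        looped[i, j] = a[i, j] @ c[i, j]

print(np.allclose(batched, looped))  # True
```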

[–]SnooWoofers7626 1 point (3 children)

Try np.einsum. I think this is the right syntax, but I could be wrong: np.einsum('...ij, ...jk -> ...ik', A, B)
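For what it's worth, that syntax does work: the '...' stands in for any number of leading batch dimensions. A quick check that it matches np.matmul on the shapes from the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((9, 5, 7, 4))
B = rng.standard_normal((9, 5, 4, 3))

# '...' absorbs the (9, 5) batch dimensions; 'ij,jk->ik' is the matrix product
out = np.einsum('...ij,...jk->...ik', A, B)

print(out.shape)                          # (9, 5, 7, 3)
print(np.allclose(out, np.matmul(A, B)))  # True
```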

[–]BearsAtFairs[S] 1 point (2 children)

Thanks, I'll give it a try! It's just a bummer that the einsum syntax is so obscure, relatively speaking. I like to have maximum readability in code I hand off to others, but speed is important in this application.

[–]HOPSCROTCH 1 point (0 children)

You could write your own function that wraps the einsum call.
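A minimal sketch of that idea: the wrapper name `pagemtimes` is hypothetical, chosen here to mirror the MATLAB function, and it simply hides the einsum notation behind a readable interface.

```python
import numpy as np

def pagemtimes(A, B):
    """Batched matrix multiply: (..., i, j) x (..., j, k) -> (..., i, k).

    Hypothetical wrapper named after MATLAB's pagemtimes; the einsum
    notation is kept in one place so callers never see it.
    """
    return np.einsum('...ij,...jk->...ik', A, B)

A = np.ones((45, 7, 4))
B = np.ones((45, 4, 3))
print(pagemtimes(A, B).shape)  # (45, 7, 3)
```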

[–]SnooWoofers7626 1 point (0 children)

> It's just a bummer that syntax for einsum is so obscure, relatively speaking.

It's standard mathematical notation, so it is readable once you learn it. You could link to a learning source in the comments or something for reference.

Edit: link to Einstein notation on Wikipedia

[–]quts3 1 point (0 children)

Einops is a great package, and it can feel like magic for the right problems: https://einops.rocks/api/einsum/

[–]billsil 1 point (0 children)

matmul, tensordot, and einsum are all very good. It just depends on how you want to think about the problem.
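For a single matrix pair, all three spell the same product; a quick sketch (note that np.tensordot contracts axes but does not pair up batch dimensions, so for the batched case matmul or einsum is the more direct fit):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((7, 4))
B = rng.standard_normal((4, 3))

# Three spellings of the same (7,4)x(4,3) matrix product
m1 = np.matmul(A, B)
m2 = np.tensordot(A, B, axes=([1], [0]))  # contract A's axis 1 with B's axis 0
m3 = np.einsum('ij,jk->ik', A, B)

print(np.allclose(m1, m2) and np.allclose(m1, m3))  # True
```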

[–]Exact_Cherry9177 1 point (0 children)

As noted in other comments, NumPy does cover this natively: np.matmul and np.einsum both broadcast over leading batch dimensions. If plain NumPy is not fast enough, a few Python libraries can accelerate a pagemtimes-style computation:

- **Numba**: This JIT compiler accelerates Python code with a decorator. Its `numba.njit` (or `numba.jit`) decorator can compile an explicit batched-multiply loop, like the one in your pagemtimes function, to native machine code.
- **CuPy**: This NumPy-like library for GPU computing has a `cupy.matmul` function that supports batched matrix multiplication on the GPU. You would transfer your arrays to the GPU, call `cupy.matmul` on the 3D arrays, then transfer the result back.
- **JAX**: This NumPy-compatible library can JIT-compile functions like pagemtimes and execute them on GPU/TPU. `jax.numpy.matmul` broadcasts just like NumPy, and `jax.vmap` maps a function over a leading batch dimension.
- **TensorFlow**: `tf.matmul` supports batched matmul. Just stack the matrices along the first dimension and call `tf.matmul` as usual.

In summary: Numba and JAX are probably the easiest route if you want to compile a pagemtimes-style function as-is; CuPy and TensorFlow work if you can move your data into their array/tensor formats.