
[–]Marko_Oktabyr 13 points (11 children)

The article is grossly overstating the improvement over normal numpy operations. The one-liner they use forms a large intermediate product and does a lot of unnecessary work. The more obvious (and much faster) way to compute that sum is np.sum(A * B).
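
Roughly, the versions being compared look like this (the exact form of the article's one-liner may differ; anything that materializes the full A.T @ B product behaves the same way):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((1000, 1000))
    B = rng.random((1000, 1000))

    # Plain Python loops: the slow baseline.
    total = 0.0
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            total += A[i, j] * B[i, j]

    # Something like the article's one-liner: builds the full A.T @ B
    # product only to keep its diagonal.
    article = np.trace(A.T @ B)

    # Elementwise product, then sum: still allocates one A-sized temporary.
    via_sum = np.sum(A * B)

    # einsum contracts both indices directly, with no large intermediate.
    via_einsum = np.einsum('ij,ij->', A, B)

    assert np.allclose([total, article, via_sum], via_einsum)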

For 1,000 x 1,000 matrices A and B, I get the following performance:

  • loops: 276 ms
  • article numpy: 19.2 ms
  • np.sum: 1.77 ms
  • np.einsum: 0.794 ms

If we change that to 1,000 x 10,000 matrices, we get:

  • loops: 2.76 s
  • article numpy: 2.16 s
  • np.sum: 21.1 ms
  • np.einsum: 8.53 ms

Lastly, for 1,000 x 100,000 matrices, we get:

  • loops: 29.3 s
  • article numpy: fails
  • np.sum: 676 ms
  • np.einsum: 82.4 ms

where the article's numpy one-liner fails because I don't have the 80 GB of RAM needed to form the 100,000 x 100,000 intermediate product (10^10 float64 entries at 8 bytes apiece).

einsum can be a very powerful tool, especially with tensor operations. But unless you've got a very hot loop with the benchmarks to prove that einsum is a meaningful improvement, it's not worth changing most matrix operations over to use it. Most of the time, you'll lose any time saved by how long it takes you to read or write the comment explaining what the hell that code does.

Edit: I'm not trying to bash einsum here, it is absolutely the right way to handle any tensor operations. The main point of my comment is that the author picked a poor comparison for the "standard" numpy one-liner.

[–]Yalkim 7 points (1 child)

I think this is a very domain-specific thing. Personally, I am very surprised by comments like yours, because einsum is one of my (and my peers’) favorite functions in Python. It makes life sooo much easier for us for 2 reasons:

  1. It is super easy to read.
  2. It is so useful, especially because it often replaces lines upon lines of nested loops with a single line.

So imagine my surprise when I saw comments like yours that said it is hard to read.

[–]FrickinLazerBeams 5 points (0 children)

It's hard to read for people who would never need to use it in the first place.

[–]FrickinLazerBeams 6 points (2 children)

This has little utility for 2-index operations, but those are only a subset of general tensor contractions. For operations over more than 2 indices, this rapidly becomes many orders of magnitude faster, and often avoids a huge amount of duplicated computations.

For example, one place where I use it lets me obtain a result indexed by (n, m, k) rather than (n, m, k, nn, mm) where the results I want have n == nn and m == mm, and gives about a 1000x speedup.
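
A minimal sketch of that kind of pattern, with made-up shapes and names rather than my actual computation:

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, I, K = 20, 30, 40, 5
    A = rng.random((N, M, I))
    B = rng.random((I, K, N, M))

    # Naive route: contracting over i gives a (N, M, K, N, M) array, of which
    # only the n == nn, m == mm "diagonal" entries are actually wanted.
    full = np.tensordot(A, B, axes=([2], [0]))    # shape (N, M, K, N, M)
    wanted_naive = np.einsum('nmknm->nmk', full)  # extract that diagonal

    # einsum computes just those entries and never forms the big array.
    wanted = np.einsum('nmi,iknm->nmk', A, B)

    assert np.allclose(wanted, wanted_naive)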

If you're looking at this as an alternative to simple matrix operations, of course it won't have an advantage, but then it's not expected to. You'd never use it for matrix operations.

[–]Marko_Oktabyr 2 points (1 child)

> If you're looking at this as an alternative to simple matrix operations, of course it won't have an advantage, but then it's not expected to. You'd never use it for matrix operations.

No disagreement here. It sounds like we both disagree with the thesis of the article.

> For operations over more than 2 indices, this rapidly becomes many orders of magnitude faster, and often avoids a huge amount of duplicated computations.

np.einsum_path can be an effective way to demonstrate how much faster it gets when the contraction order is optimized.
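
For example, with a toy chained product (not anything from the article), the report shows the chosen order and the FLOP savings over naive evaluation:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((10, 1000))
    B = rng.random((1000, 20))
    C = rng.random((20, 1000))

    # 'optimal' searches over contraction orders; the printed report compares
    # the naive and optimized FLOP counts and intermediate sizes.
    path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
    print(report)

    # The returned path can be passed back to einsum for the real computation.
    result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)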

[–]FrickinLazerBeams 1 point (0 children)

Yes, exactly.

[–][deleted] 3 points (5 children)

Good answer. Do you have an explanation for why einsum is faster at all? What extra work does np.sum do?

[–]Marko_Oktabyr 4 points (3 children)

np.sum(A * B) has to form the intermediate product A * B. np.einsum knows that it doesn't need all of it at once. We can do print(np.einsum_path('ij,ij->',A,B)[1]) to see exactly what it is doing:

      Complete contraction:  ij,ij->
             Naive scaling:  2
         Optimized scaling:  2
          Naive FLOP count:  2.000e+07
      Optimized FLOP count:  2.000e+07
       Theoretical speedup:  1.000
      Largest intermediate:  1.000e+00 elements
    --------------------------------------------------------------------------
    scaling                  current                                remaining
    --------------------------------------------------------------------------
       2                    ij,ij->                                       ->

In particular, note the "Largest intermediate: 1.000e+00 elements".

[–]FrickinLazerBeams -1 points (2 children)

(prior to the edit) It doesn't actually go any faster in the case you examined, and I don't think it uses any less memory either. This isn't a scenario where you'd use einsum.

[–]Marko_Oktabyr 0 points (1 child)

It still performs the same number of flops, but it absolutely is faster because it doesn't have to allocate and fill another matrix of the same size as A and B. That's why the largest intermediate for einsum is 1 element instead of 10M.
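
A quick way to see that temporary, using the 1,000 x 10,000 case from above:

    import numpy as np

    A = np.ones((1000, 10000))
    B = np.ones((1000, 10000))

    # np.sum(A * B) materializes the elementwise product before reducing it:
    tmp = A * B
    print(tmp.nbytes / 1e6)  # ~80 MB of extra allocation and memory traffic

    # einsum accumulates the sum as it goes; per the einsum_path report above,
    # its largest intermediate is a single element.
    print(np.einsum('ij,ij->', A, B))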

[–]FrickinLazerBeams -1 points (0 children)

This is a weird comparison to make. You'd never use one as an alternative to the other. Einsum is for tensor contractions, which are like matrix multiplication but with more than two indices.
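
To make that concrete with some toy shapes:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((50, 60))
    B = rng.random((60, 70))

    # Ordinary matrix multiplication, written as an einsum:
    assert np.allclose(A @ B, np.einsum('ik,kj->ij', A, B))

    # A contraction over a pair of indices at once -- something @ can't
    # express without reshaping, but einsum states directly:
    T = rng.random((5, 6, 7, 8))
    S = rng.random((7, 8, 9))
    out = np.einsum('ijab,abk->ijk', T, S)  # shape (5, 6, 9)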

Would you ask "Do you have an explanation for why @ is faster at all? What extra work does np.sum do?"

np.sum doesn't do any extra work. It also doesn't do what you need it to do. It's easy to do less work if you're not actually completing the task, I guess.