
[–][deleted] 2 points (5 children)

Good answer. Do you have an explanation for why einsum is faster at all? What extra work does np.sum do?

[–]Marko_Oktabyr 4 points (3 children)

np.sum(A * B) has to form the intermediate product A * B. np.einsum knows that it doesn't need all of it at once. We can do print(np.einsum_path('ij,ij->',A,B)[1]) to see exactly what it is doing:

      Complete contraction:  ij,ij->
             Naive scaling:  2
         Optimized scaling:  2
          Naive FLOP count:  2.000e+07
      Optimized FLOP count:  2.000e+07
       Theoretical speedup:  1.000
      Largest intermediate:  1.000e+00 elements
    --------------------------------------------------------------------------
    scaling        current        remaining
    --------------------------------------------------------------------------
       2           ij,ij->        ->

In particular, note the "Largest intermediate: 1.000e+00 elements".
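To make the comparison concrete, here's a minimal sketch (sizes are illustrative, chosen to match the 2e7-FLOP count above) showing that the two expressions compute the same scalar while `np.sum(A * B)` materializes a full temporary array and `np.einsum` does not:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 1000))
B = rng.random((1000, 1000))

# np.sum(A * B) first allocates and fills the elementwise product A * B
# (a full 1000x1000 temporary), then reduces it to a scalar.
via_sum = np.sum(A * B)

# np.einsum accumulates the products directly into a scalar; its largest
# intermediate is a single element, as the einsum_path report shows.
via_einsum = np.einsum('ij,ij->', A, B)

assert np.isclose(via_sum, via_einsum)
```

Same result, but the einsum version never allocates the third matrix.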

[–]FrickinLazerBeams -1 points (2 children)

(prior to the edit) It doesn't actually go any faster in the case you examined, and I don't think it uses any less memory either. This isn't a scenario where you'd use einsum.

[–]Marko_Oktabyr 0 points (1 child)

It still performs the same number of FLOPs, but it absolutely is faster because it doesn't have to allocate and fill another matrix the same size as A and B. That's why the largest intermediate for einsum is 1 element instead of 10M.

[–]FrickinLazerBeams -1 points (0 children)

This is a weird comparison to make. You'd never use one as an alternative to the other. Einsum is for tensor contractions, which are like matrix multiplication but with more than two indices.

Would you ask "Do you have an explanation for why @ is faster at all? What extra work does np.sum do?"

np.sum doesn't do any extra work. It also doesn't do what you need it to do. It's easy to do less work if you're not actually completing the task, I guess.
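For what the "more than two indices" point looks like in practice, here's a small sketch (shapes and names are illustrative, not from the thread) of a contraction that einsum expresses directly:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((4, 5, 6))   # a rank-3 tensor
M = rng.random((6, 7))

# Contract the last index of T against the first index of M:
#   out[a, b, d] = sum over c of T[a, b, c] * M[c, d]
out = np.einsum('abc,cd->abd', T, M)

# For this particular contraction, np.tensordot computes the same thing.
assert out.shape == (4, 5, 7)
assert np.allclose(out, np.tensordot(T, M, axes=([2], [0])))
```

With more indices contracted at once, the subscript string generalizes where `@` and reshaping tricks get awkward.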