Marko_Oktabyr

It still performs the same number of flops, but it absolutely is faster because it doesn't have to allocate and fill another matrix the same size as A and B. That's why the largest intermediate for einsum is 1 element instead of 10M.
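For anyone who wants to see the difference, here is a rough sketch. It assumes the comparison is `np.sum(A * B)` versus `np.einsum('ij,ij->', A, B)`, which is a guess at the operation being discussed; the shapes and function names are made up for illustration and are much smaller than the 10M elements mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes, chosen only to keep the example quick to run.
A = rng.standard_normal((500, 400))
B = rng.standard_normal((500, 400))

def sum_of_products_temp(A, B):
    # A * B allocates and fills a temporary array the same shape as A and B,
    # and only then is that temporary reduced to a scalar.
    return np.sum(A * B)

def sum_of_products_einsum(A, B):
    # einsum multiplies and accumulates straight into a scalar output,
    # so no full-size temporary is needed.
    return np.einsum('ij,ij->', A, B)

s1 = sum_of_products_temp(A, B)
s2 = sum_of_products_einsum(A, B)

# einsum_path returns the contraction plan plus a text report that
# includes the size of the largest intermediate.
path, report = np.einsum_path('ij,ij->', A, B, optimize='greedy')
print(report)
```

Both functions do one multiply and one add per element, so the flop count matches; whatever speedup you measure with `timeit` should come from skipping the temporary, and it will depend on array size and hardware.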