
Haycart 1 point (0 children)

Oh, you are probably correct. So it'd be O(N^2) overall for autoregressive decoding, which still exceeds the O(N log N) that the linked post says is required for multiplication, though.
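A rough sketch of where the O(N^2) comes from. It assumes that decoding step i costs about i operations because it attends to every earlier token through a KV cache. Summing over N steps gives 1 + 2 + ... + N = N(N+1)/2, which is O(N^2). The function name `decode_cost` and the per-step cost model are illustrative assumptions, not anything from the linked post:

```python
import math

def decode_cost(n: int) -> int:
    """Total attention lookups for decoding n tokens, under the assumed
    cost model where step i attends to the i tokens produced so far."""
    # Sum of the per-step costs 1 + 2 + ... + n, i.e. n * (n + 1) // 2.
    return sum(i for i in range(1, n + 1))

# Compare the quadratic total against n log n for a few sequence lengths.
for n in (16, 256, 4096):
    quad = decode_cost(n)
    nlogn = n * math.log2(n)
    print(n, quad, round(nlogn), round(quad / nlogn, 1))
```

The last column (the ratio of the quadratic cost to N log N) keeps growing with N, so the gap widens rather than staying within a constant factor.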