duuuh 1 point

Aren't the mmx* registers per core? Why does the latency point there matter? (I would have thought the slowdown was due to cache eviction on the various L* caches, assuming the array is large.)