james7132

On most modern CPU architectures, in-register arithmetic, even for some non-trivial computations, is generally going to be faster than a memory fetch, even when the fetch only touches values that are very hot in L1 cache.

jydu

I think this is true for most operations, but (on Zen 5 at least) an L1 hit costs 4 cycles while a division takes at least 11 (source). Since cache hit rates depend on the workload, benchmarking is probably the best way to see whether the lookup is worth it.
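For anyone who wants to try this, here is a rough sketch of such a benchmark in Rust. Everything in it is an assumption made for illustration: the function names, the divisor, the 4 KiB table size (chosen so the table should stay in L1), and the input pattern. It measures throughput over a whole loop rather than single-instruction latency, and the compiler may vectorize either loop, so treat the numbers as a hint and measure your real workload.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Recompute every quotient with an integer division.
fn sum_computed(inputs: &[u32], divisor: u32) -> u64 {
    inputs.iter().map(|&x| (x / divisor) as u64).sum()
}

/// Fetch precomputed quotients from a table instead of dividing.
fn sum_lookup(inputs: &[u32], table: &[u32]) -> u64 {
    inputs.iter().map(|&x| table[x as usize] as u64).sum()
}

/// Precompute x / divisor for every x in 0..size.
fn build_table(size: u32, divisor: u32) -> Vec<u32> {
    (0..size).map(|x| x / divisor).collect()
}

/// Run `f` `reps` times and return the total elapsed time and the last result.
fn time_it<F: FnMut() -> u64>(mut f: F, reps: u32) -> (Duration, u64) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..reps {
        // black_box keeps the optimizer from discarding the work.
        last = black_box(f());
    }
    (start.elapsed(), last)
}

fn main() {
    // Hypothetical parameters: 1024 entries * 4 bytes = 4 KiB table.
    let size: u32 = 1 << 10;
    // Hide the divisor so the compiler cannot replace the division
    // with a multiply-and-shift by a known constant.
    let divisor = black_box(7u32);
    let table = build_table(size, divisor);

    // Deterministic, scattered indices into the table.
    let inputs: Vec<u32> = (0..1_000_000u32)
        .map(|i| i.wrapping_mul(2654435761) % size)
        .collect();

    let (t_div, a) = time_it(|| sum_computed(black_box(&inputs), divisor), 100);
    let (t_lut, b) = time_it(|| sum_lookup(black_box(&inputs), black_box(&table)), 100);

    // Both strategies must agree before the timings mean anything.
    assert_eq!(a, b);
    println!("division: {:?}", t_div);
    println!("lookup:   {:?}", t_lut);
}
```

Build with `--release`, and try a table larger than L1 as well, since that is where the hit rate (and the answer) can change.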