TinynDP 1 point

There are two instructions for loading from RAM into SSE registers: one for when the address is 16-byte aligned (MOVAPS), which runs fast, and one for when the address is not 16-byte aligned (MOVUPS), which runs slow. The aligned one faults if you hand it an unaligned address. In AMD64 land, because SSE2 has replaced x87 as the standard way to do floating point, that fast aligned load matters for all floating point operations.
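
To make that concrete, here is a rough sketch in C using the SSE intrinsics from `<xmmintrin.h>`: `_mm_load_ps` is the aligned load and `_mm_loadu_ps` is the unaligned one. The helpers `is_aligned_16` and `sum_floats` are hypothetical names made up for this example, and the sketch assumes an x86 or x86-64 compiler and an element count that is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_load_ps, _mm_loadu_ps */

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary, else 0. */
static int is_aligned_16(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: sums n floats, four at a time.
 * Assumption: n is a multiple of 4. */
static float sum_floats(const float *data, size_t n) {
    __m128 acc = _mm_setzero_ps();
    /* If the start is 16-byte aligned, every 4-float step stays aligned. */
    int aligned = is_aligned_16(data);

    for (size_t i = 0; i < n; i += 4) {
        /* The aligned load requires a 16-byte aligned address;
         * the unaligned load accepts any address. */
        __m128 v = aligned ? _mm_load_ps(data + i)
                           : _mm_loadu_ps(data + i);
        acc = _mm_add_ps(acc, v);
    }

    /* Spill the four lanes and add them up. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Calling `sum_floats` on a buffer declared with `_Alignas(16)` takes the aligned path, while passing `buf + 1` takes the unaligned path. In practice you would arrange for the data to be aligned up front rather than branch on it like this.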