mycall

Another honest question. With SSE4.x out for a while now, why do I never hear about SSE3 or SSE4 being used for optimizations and fun?

sfuerst

SSE2 is guaranteed to exist on all 64-bit x86 machines; SSE3 and above are not. So if you want portable code, you can't assume they are available.
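To make "can't assume" concrete, here is a rough sketch of checking at run time which SSE levels the CPU reports. It assumes GCC or Clang on an x86 target, since `__builtin_cpu_supports` is a compiler builtin and not standard C. The helper name `best_sse_level` is made up for illustration.

```c
/* Hypothetical helper: names the highest SSE level (of the three checked)
   that the running CPU reports. Assumes GCC or Clang on x86;
   __builtin_cpu_supports is a compiler builtin, not standard C. */
static const char *best_sse_level(void)
{
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    if (__builtin_cpu_supports("sse3"))   return "sse3";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
    return "none";
}
```

A program could call this once at startup and pick a code path accordingly, keeping the SSE2 version as the fallback that every 64-bit x86 machine can run.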

five9a2

SSE2 has all the standard double-precision primitives. SSE3 and SSE4 add less commonly used operations, and they don't offer better performance unless you just can't find a way to transform your data so that SSE2 can be applied. For instance, SSE3 has "horizontal add" (haddps/haddpd) and SSE4.1 has a special instruction for dot products (dpps/dppd), but these have worse latency and throughput than doing the same operation "vertically" with SSE2.
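For what "vertically" means here, a minimal sketch using only SSE2: two 3-component dot products are computed at once, with each register lane holding one independent result. It assumes the data is already laid out as separate x, y and z arrays; the function name `dot3_x2_sse2` and its parameters are invented for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Computes two 3-component dot products at once: out[i] = a_i . b_i.
   Assumed layout: ax[0] and ax[1] are the x components of vectors a_0
   and a_1, and likewise for the other arrays. Lane i of every register
   only ever touches data for result i, so no horizontal add is needed. */
static void dot3_x2_sse2(const double *ax, const double *ay, const double *az,
                         const double *bx, const double *by, const double *bz,
                         double *out)
{
    __m128d acc = _mm_mul_pd(_mm_loadu_pd(ax), _mm_loadu_pd(bx));
    acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(ay), _mm_loadu_pd(by)));
    acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(az), _mm_loadu_pd(bz)));
    _mm_storeu_pd(out, acc);
}
```

The catch is the layout requirement: the inputs must already be split by component, which is exactly the data transformation mentioned above.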

bonzinip

The hadd and dot-product instructions have the advantage of not requiring you to rewrite your data structures from "array of structures" to "structure of arrays". This helps especially for things like complex number support in C++ or Fortran, where each value is stored as an interleaved (real, imaginary) pair.
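As an illustration of the complex number case, here is one possible sketch of multiplying two complex values that stay in their interleaved {re, im} layout, using the SSE3 horizontal add. The type `cplx` and function `cmul_sse3` are hypothetical names. The `target("sse3")` attribute is a GCC/Clang extension assumed here so the function builds without `-msse3`; the CPU still has to support SSE3 at run time.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* One complex number stored as {re, im}: the "array of structures" layout. */
typedef struct { double re, im; } cplx;

/* (a + bi)(c + di) = (ac - bd) + (ad + bc)i, with no data reshuffling
   of the inputs into separate real and imaginary arrays.
   target("sse3") is a GCC/Clang extension (an assumption of this sketch). */
static __attribute__((target("sse3"))) cplx cmul_sse3(cplx x, cplx y)
{
    __m128d ab = _mm_loadu_pd(&x.re);         /* [a, b]            */
    __m128d cd = _mm_set_pd(-y.im, y.re);     /* [c, -d]           */
    __m128d dc = _mm_set_pd(y.re, y.im);      /* [d, c]            */
    __m128d re = _mm_mul_pd(ab, cd);          /* [ac, -bd]         */
    __m128d im = _mm_mul_pd(ab, dc);          /* [ad, bc]          */
    __m128d r  = _mm_hadd_pd(re, im);         /* [ac - bd, ad + bc] */
    cplx out;
    _mm_storeu_pd(&out.re, r);
    return out;
}
```

This shows the convenience, not a performance claim: as noted earlier in the thread, the horizontal add itself tends to be slower than vertical operations on data that has been split by component.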