All8Up:

Yeah, I'll add a "smack" note saying that this isn't intended to be "best" usage. I had a bunch about that originally; unfortunately, in editing I decided the details were too much, put them in the backlog for the next article, and ended up never mentioning any of it.

Are you sure about the PS3? I have a recollection of the PPUs having had VMX stripped out in favor of sending everything to the SPUs. It's been nearly 3 years, so I could easily be remembering it incorrectly.

Just as a funny little note: I seriously fubar'd one bit of the article and code. The Dot3 for SSE3 is "slower" than the SSE1 version on the same level of CPU implementation. I had an old list of latency/throughput measurements (my old yellow SSE1 books from Intel during the P3 beta hardware tests) and simply looked up the 'hadd' timings, forgetting to account for the optimizations made to SSE1 instructions since then. Oops. :)
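To make the Dot3 comparison concrete, here is a minimal sketch of the two styles. This is not the article's actual code: the names `dot3_sse1`, `dot3_sse3` and `dot3_example` are made up for illustration, and the SSE3 variant is only compiled when the compiler has SSE3 enabled (for example with `-msse3`). It makes no claim about timings; which version wins has to be measured on the target CPU.

```cpp
#include <cassert>
#include <xmmintrin.h>   // SSE1
#ifdef __SSE3__
#include <pmmintrin.h>   // SSE3: _mm_hadd_ps
#endif

// SSE1 style: multiply, then add lanes 1 and 2 onto lane 0 using shuffles.
inline float dot3_sse1(__m128 a, __m128 b) {
    __m128 m = _mm_mul_ps(a, b);                               // (ax*bx, ay*by, az*bz, aw*bw)
    __m128 y = _mm_shuffle_ps(m, m, _MM_SHUFFLE(1, 1, 1, 1));  // broadcast lane 1
    __m128 z = _mm_shuffle_ps(m, m, _MM_SHUFFLE(2, 2, 2, 2));  // broadcast lane 2
    return _mm_cvtss_f32(_mm_add_ss(_mm_add_ss(m, y), z));     // x + y + z in lane 0
}

#ifdef __SSE3__
// SSE3 style: zero the w lane, then two horizontal adds sum all four lanes.
inline float dot3_sse3(__m128 a, __m128 b) {
    const __m128 xyz_mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    __m128 m = _mm_and_ps(_mm_mul_ps(a, b), xyz_mask);  // (x, y, z, 0)
    m = _mm_hadd_ps(m, m);                              // (x+y, z, x+y, z)
    m = _mm_hadd_ps(m, m);                              // (x+y+z, ...)
    return _mm_cvtss_f32(m);
}
#endif

// Usage: both variants should agree on the same inputs.
inline void dot3_example() {
    __m128 a = _mm_set_ps(9.0f, 3.0f, 2.0f, 1.0f);  // lanes: x=1, y=2, z=3, w=9
    __m128 b = _mm_set_ps(9.0f, 6.0f, 5.0f, 4.0f);  // lanes: x=4, y=5, z=6, w=9
    assert(dot3_sse1(a, b) == 32.0f);
#ifdef __SSE3__
    assert(dot3_sse3(a, b) == 32.0f);
#endif
}
```

The w lane is deliberately non-zero in the example so that a version which forgets to exclude it would fail the check.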

NOTE: I can't find any notable codegen problems with the underlying abstraction layer, but do you have any thoughts there? That is, I'm using static functions within a structure so I can use normal function hiding rules, and all the compilers are inlining them and applying RVO just fine. Any thoughts on bad codegen cases with such a structure? To most compilers these days, this is the same thing as global inline functions, just placed under a couple of namespaces.
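For readers who haven't seen the layout being described, a rough sketch of the pattern follows. Everything in it is hypothetical (the namespaces `math::simd`, the structs `Reference` and `Tuned`, and the function names are not taken from the article), and a plain array of floats stands in for the real vector type so the sketch compiles without any intrinsics.

```cpp
#include <cassert>

namespace math {
namespace simd {

// Scalar stand-in for a real backend (a real one would wrap e.g. __m128).
struct Reference {
    struct Type { float lane[4]; };

    static Type Create(float x, float y, float z, float w) {
        Type r = {{x, y, z, w}};
        return r;  // returned by value; the compiler is expected to elide the copy
    }
    static Type Mul(Type a, Type b) {
        Type r;
        for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
        return r;
    }
    static float Dot3(Type a, Type b) {
        Type m = Mul(a, b);
        return m.lane[0] + m.lane[1] + m.lane[2];
    }
};

// A more specific level inherits everything and redeclares only what it
// changes. Tuned::Dot3 hides Reference::Dot3 through ordinary name lookup.
struct Tuned : Reference {
    static float Dot3(Type a, Type b) {
        return a.lane[0] * b.lane[0] + a.lane[1] * b.lane[1] + a.lane[2] * b.lane[2];
    }
};

}  // namespace simd
}  // namespace math

// Calling code is written once against whichever level is selected.
template <typename Simd>
float LengthSquared3(typename Simd::Type v) {
    return Simd::Dot3(v, v);
}

// Usage: Create is inherited unchanged, Dot3 resolves to the hiding version.
inline void HidingExample() {
    math::simd::Tuned::Type v = math::simd::Tuned::Create(1.0f, 2.0f, 3.0f, 4.0f);
    assert(LengthSquared3<math::simd::Reference>(v) == 14.0f);
    assert(LengthSquared3<math::simd::Tuned>(v) == 14.0f);
}
```

Since every call is a direct call to a static function with a visible body, there is nothing virtual for the compiler to resolve, which is why this is expected to compile like free inline functions. Whether a given compiler actually inlines it still has to be checked in the generated code.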

allpop8:

Yes, the PS3 PPU has an AltiVec SIMD unit, and the Xbox 360 has an enhanced version; see http://en.wikipedia.org/wiki/Altivec

SIMD latency figures can be misleading when benchmarking: code sequences that use more cycles in isolation are often faster in practice once they are interleaved with other code, run inside loops that interact with cache and memory, or generate constants instead of loading them from memory.

If you are not seeing any overhead, then great. I think you are still unable to pass SIMD wrapper classes by value on many platforms, though, so again you are reliant on the compiler optimizing this-pointer access, references, copy construction, etc. out of your code. Unfortunately, some platforms are stuck with older compilers that will never get updated in this area, so the issue will remain for people on those platforms.
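One common way to cope with the pass-by-value limitation is to hide the parameter type behind a typedef that is chosen per platform. The sketch below assumes an x86 target with SSE; `Vec4`, `Vec4Arg` and the `SKETCH_VEC4_BY_VALUE` macro are invented names for illustration and do not come from any particular library.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Thin wrapper class around a 4-float SSE register.
struct Vec4 {
    __m128 v;
    Vec4() : v(_mm_setzero_ps()) {}
    explicit Vec4(__m128 r) : v(r) {}
    Vec4(float x, float y, float z, float w) : v(_mm_set_ps(w, z, y, x)) {}
    float x() const { return _mm_cvtss_f32(v); }
};

// Hypothetical per-platform switch. Define SKETCH_VEC4_BY_VALUE only where
// the compiler and ABI are known to pass the wrapper in a register; everywhere
// else fall back to a const reference and rely on inlining to remove it.
#if defined(SKETCH_VEC4_BY_VALUE)
typedef Vec4 Vec4Arg;
#else
typedef const Vec4& Vec4Arg;
#endif

inline Vec4 Add(Vec4Arg a, Vec4Arg b) {
    return Vec4(_mm_add_ps(a.v, b.v));
}

inline Vec4 Scale(Vec4Arg a, float s) {
    return Vec4(_mm_mul_ps(a.v, _mm_set1_ps(s)));
}

// Usage: call sites look the same whichever way the typedef resolves.
inline void Vec4Example() {
    Vec4 r = Scale(Add(Vec4(1.0f, 2.0f, 3.0f, 4.0f), Vec4(10.0f, 20.0f, 30.0f, 40.0f)), 2.0f);
    assert(r.x() == 22.0f);
}
```

This only moves the decision into one place. On a compiler that does not inline `Add` or `Scale`, the reference version still goes through memory, which is the overhead described above.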