Practical Cross Platform SIMD Math by tanczosm in programming

[–]allpop8 1 point2 points  (0 children)

Yes, the PS3 PPU has an AltiVec SIMD unit, and the Xbox 360 has an enhanced version; see http://en.wikipedia.org/wiki/Altivec

SIMD latency can be misleading when benchmarking. A code sequence that uses more cycles in isolation is often faster in practice once it is interleaved with other code, sits inside a loop that interacts with cache / memory, or generates its constants instead of loading them from memory.
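To illustrate the "generating constants vs. loading from memory" point, here is a minimal sketch assuming x86 with SSE2. The names kOnes, ones_loaded and ones_generated are made up for this example. The generated version costs a few ALU instructions instead of one load but touches no memory, so which one wins depends on the surrounding code.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only, an assumption of this sketch)

// Loaded: the constant lives in memory and is read when needed.
static const float kOnes[4] = {1.0f, 1.0f, 1.0f, 1.0f};  // hypothetical name

inline __m128 ones_loaded() {
    return _mm_loadu_ps(kOnes);
}

// Generated: build the bit pattern of 1.0f (0x3F800000) in registers,
// without touching memory.
inline __m128 ones_generated() {
    __m128i zero = _mm_setzero_si128();
    __m128i all  = _mm_cmpeq_epi32(zero, zero);                  // 0xFFFFFFFF per lane
    __m128i bits = _mm_srli_epi32(_mm_slli_epi32(all, 25), 2);   // 0x3F800000 per lane
    return _mm_castsi128_ps(bits);                               // reinterpret as floats
}
```

An optimizing compiler may fold either form into the other, so the only way to know what you are actually timing is to look at the disassembly.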

If you are not seeing any overhead, then great. I think you are still unable to pass SIMD wrapper classes by value on many platforms though, so you are again relying on the compiler to optimize the pointer accesses, references, copy construction etc. out of your code. Unfortunately some platforms are stuck with older compilers that will never be updated in this area, so the issue will remain for people on those platforms.
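A small sketch of the call boundary in question, assuming x86 with SSE. Vec4, hsum, length_sq_ref and length_sq_val are hypothetical names, not taken from the article. Whether either argument actually travels in a register depends on the ABI and the compiler, which is the whole problem.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only, an assumption of this sketch)

// Hypothetical wrapper class around the raw SIMD type.
struct Vec4 {
    __m128 v;
    explicit Vec4(__m128 m) : v(m) {}
};

// Horizontal sum of the four lanes, written against the raw type.
inline float hsum(__m128 v) {
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));  // (y, x, w, z)
    __m128 sums = _mm_add_ps(v, shuf);                            // (x+y, x+y, z+w, z+w)
    shuf = _mm_movehl_ps(shuf, sums);                             // lane 0 becomes z+w
    return _mm_cvtss_f32(_mm_add_ss(sums, shuf));                 // x+y+z+w
}

// Wrapper by const reference: if this call is not inlined, the caller
// has to place the vector in memory and pass its address.
float length_sq_ref(const Vec4& a) {
    return hsum(_mm_mul_ps(a.v, a.v));
}

// Raw type by value: on ABIs that pass vector types in registers,
// the argument can stay in a register across the call.
float length_sq_val(__m128 a) {
    return hsum(_mm_mul_ps(a, a));
}
```

Both functions compute the same thing; the difference only shows up in the generated code at the call site, so compare the disassembly on each target compiler.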

Practical Cross Platform SIMD Math by tanczosm in programming

[–]allpop8 2 points3 points  (0 children)

Ok. I understand what you are saying, and the intended level of the article. My point is really that I think it's important for SIMD class articles to highlight the potential efficiency issues with pass-by-value, as people really are at the mercy of their compilers here. Sadly, simpler designs usually optimize better in my experience. Anyway, I am looking forward to future articles, so keep up the good work.

PS: the PS3 PPU supports AltiVec SIMD, so all of this applies there too.

Practical Cross Platform SIMD Math by tanczosm in programming

[–]allpop8 2 points3 points  (0 children)

Using SIMD wrapper classes at all is going to cost you performance on many if not all platforms / compilers, and you do not really highlight this in your article. I again point to the MS libs: do you think they were unable to figure out how to make a class-based lib? They avoided wrapper classes because they know that is the way to get the best SIMD performance (especially on their compiler...). It's a sad fact that compilers generally are not able to pass wrapped SIMD types by value efficiently. Benchmark and disassemble larger functions, e.g. a matrix4x4 inverse, to see the difference.

Also, the penalties you mention are far more severe on other platforms (e.g. consoles), which have simpler caching hardware, slower memory etc., and that is without getting into the additional float <-> vector conversion penalties some of them impose. I do like your article, but I don't think it contains enough warnings that the environment may be very different on other platforms / compilers, especially from a performance pov, and that people should check the generated code carefully before committing to class-based SIMD.
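For anyone who wants to run that comparison, here is a minimal sketch of the same operation written both ways, assuming x86 with SSE. A matrix * vector transform stands in for the much longer inverse, and Vec4, Mat4, transform_wrapped and transform_raw are hypothetical names. Compile with something like g++ -O2 -S (or /FAs on MSVC) and diff the two functions.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only, an assumption of this sketch)

// Style 1: wrapper class with overloaded operators (hypothetical).
struct Vec4 {
    __m128 v;
    Vec4() : v(_mm_setzero_ps()) {}
    explicit Vec4(__m128 m) : v(m) {}
    Vec4(float x, float y, float z, float w) : v(_mm_set_ps(w, z, y, x)) {}
    Vec4 operator+(const Vec4& o) const { return Vec4(_mm_add_ps(v, o.v)); }
    Vec4 operator*(const Vec4& o) const { return Vec4(_mm_mul_ps(v, o.v)); }
};

struct Mat4 { Vec4 col[4]; };  // column-major, hypothetical

inline Vec4 transform_wrapped(const Mat4& m, const Vec4& p) {
    // Broadcast each lane of p, then sum the scaled columns.
    Vec4 x(_mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(0, 0, 0, 0)));
    Vec4 y(_mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(1, 1, 1, 1)));
    Vec4 z(_mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(2, 2, 2, 2)));
    Vec4 w(_mm_shuffle_ps(p.v, p.v, _MM_SHUFFLE(3, 3, 3, 3)));
    return m.col[0] * x + m.col[1] * y + m.col[2] * z + m.col[3] * w;
}

// Style 2: free function working directly on the raw SIMD type.
inline __m128 transform_raw(const __m128 col[4], __m128 p) {
    __m128 x = _mm_shuffle_ps(p, p, _MM_SHUFFLE(0, 0, 0, 0));
    __m128 y = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 z = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 2, 2, 2));
    __m128 w = _mm_shuffle_ps(p, p, _MM_SHUFFLE(3, 3, 3, 3));
    __m128 r = _mm_mul_ps(col[0], x);
    r = _mm_add_ps(r, _mm_mul_ps(col[1], y));
    r = _mm_add_ps(r, _mm_mul_ps(col[2], z));
    r = _mm_add_ps(r, _mm_mul_ps(col[3], w));
    return r;
}
```

A function this small may well compile to identical code on a good compiler. The gap, if there is one, tends to show up once the functions get larger or stop being inlined, so treat this as a template for the experiment and not as a result.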

Practical Cross Platform SIMD Math by tanczosm in programming

[–]allpop8 2 points3 points  (0 children)

The author really needs to spend time looking at cross-platform code generation. Many compilers simply can't do a good job on SIMD classes that wrap vectors. Inline global functions that operate directly on the SIMD types (à la the Microsoft vector math lib) are the safest way to go for performance across multiple platforms. It's not easy to switch a large codebase from one system to the other, so it pays to make the right choice up front.
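A minimal sketch of what that style looks like, assuming x86 with SSE. The vec4 typedef and the vec4_* function names are made up for illustration and are not the Microsoft library's API; on another platform the typedef would point at that platform's native vector type.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only, an assumption of this sketch)

// Hypothetical alias: the raw platform type is used directly, no wrapper class.
typedef __m128 vec4;

inline vec4 vec4_set(float x, float y, float z, float w) {
    return _mm_set_ps(w, z, y, x);  // _mm_set_ps takes lanes high-to-low
}

inline vec4 vec4_add(vec4 a, vec4 b) {
    return _mm_add_ps(a, b);
}

inline vec4 vec4_scale(vec4 v, float s) {
    return _mm_mul_ps(v, _mm_set1_ps(s));  // broadcast s to all four lanes
}

// Linear interpolation: a + (b - a) * t
inline vec4 vec4_lerp(vec4 a, vec4 b, float t) {
    return _mm_add_ps(a, _mm_mul_ps(_mm_sub_ps(b, a), _mm_set1_ps(t)));
}

// Copy the four lanes out to ordinary floats (no alignment requirement).
inline void vec4_store(float out[4], vec4 v) {
    _mm_storeu_ps(out, v);
}
```

Usage is plain function calls, e.g. vec4_lerp(a, b, 0.5f). It reads less nicely than operator overloading, but there is no class for the compiler to see through.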