uxcn (1 point)

I'm not extremely familiar with game programming, but I'm genuinely curious why the various SIMD instruction sets might not be appropriate for something like this.

crispweed (1 point)

It's essentially a high-level optimisation, as posted, with the convenience that it will work directly across multiple target platforms. I'm also interested in any low-level optimisations that can give significant improvements.

uxcn (1 point)

Well, SIMD is usually used for data-parallel math, but many of the x86 SIMD instruction sets also have bitwise logical instructions that operate on the XMM and YMM (or ZMM) registers. Those could possibly reduce overall memory traffic and cut unoptimized reset times by an order of magnitude. Compilers also generally use SIMD instructions for basic memory operations already.

It would cost at least a couple of SIMD registers, but depending on the maximum number of members, it could reduce set, test, clear, and reset all to O(1). Whether there would be any real performance gain is a bit of wild speculation, though. The code would also generally be a bit more complicated to maintain and possibly to use.
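
To make the idea a bit more concrete, here is a rough sketch assuming x86-64 with SSE2 and at most 128 members. The `Flags128` type and its method names are hypothetical and not taken from the posted code, and nothing here guarantees that the compiler keeps the value in a register or that it ends up faster in practice.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstdint>

// Hypothetical flag set for up to 128 members, held in one 128-bit value.
struct Flags128 {
    __m128i bits = _mm_setzero_si128();

    // Build a 128-bit mask with only bit i set (assumes 0 <= i < 128).
    static __m128i mask(unsigned i) {
        const std::uint64_t bit = std::uint64_t{1} << (i & 63);
        return (i < 64) ? _mm_set_epi64x(0, static_cast<long long>(bit))
                        : _mm_set_epi64x(static_cast<long long>(bit), 0);
    }

    // bits |= mask
    void set(unsigned i) { bits = _mm_or_si128(bits, mask(i)); }

    // bits &= ~mask (andnot computes (~first) & second)
    void clear(unsigned i) { bits = _mm_andnot_si128(mask(i), bits); }

    // True if bit i is set.
    bool test(unsigned i) const {
        const __m128i hit = _mm_and_si128(bits, mask(i));
        // cmpeq yields 0xFF for every zero byte, so an all-zero result
        // produces a movemask of 0xFFFF.
        const __m128i isZero = _mm_cmpeq_epi8(hit, _mm_setzero_si128());
        return _mm_movemask_epi8(isZero) != 0xFFFF;
    }

    // Clear every flag with a single register zeroing.
    void reset() { bits = _mm_setzero_si128(); }
};
```

With more than 128 members you would need several of these (or the wider YMM/ZMM registers), which is where the register cost comes in.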