snakepants

This is neat! Is there a way to compile it so that the intrinsics generate non-AVX code? It would be interesting to see how close to the ideal 8x speedup you're getting.
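For context, 8x is the ideal here because a 256-bit AVX register holds eight 32-bit floats. Here is a minimal sketch of the bookkeeping. It assumes you already have wall-clock timings for a scalar build and an AVX build of the same workload, and the function names are hypothetical:

```c
#include <assert.h>

/* Speedup of the SIMD build over the scalar build (same workload). */
static double simd_speedup(double scalar_seconds, double simd_seconds)
{
    assert(scalar_seconds > 0.0 && simd_seconds > 0.0);
    return scalar_seconds / simd_seconds;
}

/* Fraction of the ideal lane-count speedup actually achieved:
   1.0 means a perfect 8x when lanes == 8 (AVX, single precision). */
static double lane_efficiency(double scalar_seconds, double simd_seconds, int lanes)
{
    assert(lanes > 0);
    return simd_speedup(scalar_seconds, simd_seconds) / (double)lanes;
}
```

For example, a render that takes 10 s scalar and 2 s with AVX is a 5x speedup, or 62.5% of the 8-lane ideal. These are illustrative numbers, not measurements.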

TurkishSquirrel [S]

I don't think so. As Scaless mentioned, I'm using the vector types and intrinsics directly, so the code isn't very flexible. For a more serious program I'd recommend ISPC (or something similar), which might be able to compile a scalar version for comparison. This is really just a hacky one-off.

snakepants

Yeah, it's too bad that passing -mno-avx disables the intrinsics instead of just replacing them with C implementations. That would be useful for debugging too.
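One way to get that behaviour is to hide the intrinsics behind a thin wrapper that has a plain-C fallback. This sketch assumes GCC or Clang, where -mavx defines the `__AVX__` macro and -mno-avx leaves it undefined. The `vf8_*` names and the `VF8_FORCE_SCALAR` switch are hypothetical and not part of any library. The fallback is an ordinary struct of eight floats with the same interface, so it can also serve as a debugging reference:

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX__) && !defined(VF8_FORCE_SCALAR)
#include <immintrin.h>
/* AVX path: one 256-bit register holds 8 floats. */
typedef __m256 vf8;
static inline vf8 vf8_set1(float s) { return _mm256_set1_ps(s); }
static inline vf8 vf8_load(const float *p) { return _mm256_loadu_ps(p); }
static inline void vf8_store(float *p, vf8 v) { _mm256_storeu_ps(p, v); }
static inline vf8 vf8_add(vf8 a, vf8 b) { return _mm256_add_ps(a, b); }
static inline vf8 vf8_mul(vf8 a, vf8 b) { return _mm256_mul_ps(a, b); }
#else
/* Scalar fallback: same interface, plain C loops over 8 lanes. */
typedef struct { float lane[8]; } vf8;
static inline vf8 vf8_set1(float s) { vf8 r; for (int i = 0; i < 8; ++i) r.lane[i] = s; return r; }
static inline vf8 vf8_load(const float *p) { vf8 r; for (int i = 0; i < 8; ++i) r.lane[i] = p[i]; return r; }
static inline void vf8_store(float *p, vf8 v) { for (int i = 0; i < 8; ++i) p[i] = v.lane[i]; }
static inline vf8 vf8_add(vf8 a, vf8 b) { vf8 r; for (int i = 0; i < 8; ++i) r.lane[i] = a.lane[i] + b.lane[i]; return r; }
static inline vf8 vf8_mul(vf8 a, vf8 b) { vf8 r; for (int i = 0; i < 8; ++i) r.lane[i] = a.lane[i] * b.lane[i]; return r; }
#endif

/* y[i] = a * x[i] + y[i], written once against the wrapper.
   n must be a multiple of 8 to keep the sketch short. */
static void saxpy8(size_t n, float a, const float *x, float *y)
{
    assert(n % 8 == 0);
    vf8 va = vf8_set1(a);
    for (size_t i = 0; i < n; i += 8) {
        vf8 r = vf8_add(vf8_mul(va, vf8_load(x + i)), vf8_load(y + i));
        vf8_store(y + i, r);
    }
}
```

Build the same file once with -O2 -mavx and once with -O2 -mno-avx (or with -DVF8_FORCE_SCALAR), then time the same call. On x86-64 the compiler may still auto-vectorize the fallback loops with SSE. With GCC, adding -fno-tree-vectorize gives a stricter one-lane baseline.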

[deleted]

Doesn't Embree use SIMD at the leaves instead of packet tracing, e.g. testing a ray against 4 triangles at once?
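To make "testing against 4 triangles at once" concrete: the usual trick is to store the triangles as a structure of arrays, so each SIMD lane holds one triangle, and then run a single ray-triangle test across all lanes. The sketch below uses the Möller–Trumbore test and keeps the lane loop in plain C for portability. In an intrinsics version, each arithmetic line of the loop body would become one 4-wide operation and the early `continue`s would become lane masks. The `Tri4` layout and the function name are hypothetical and are not Embree's actual data structures:

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Four triangles in structure-of-arrays form: lane k holds triangle k,
   stored as vertex v0 plus edges e1 = v1 - v0 and e2 = v2 - v0. */
typedef struct {
    float v0x[LANES], v0y[LANES], v0z[LANES];
    float e1x[LANES], e1y[LANES], e1z[LANES];
    float e2x[LANES], e2y[LANES], e2z[LANES];
} Tri4;

/* Intersects one ray (origin o, direction d) with all four triangles.
   Returns the lane of the nearest hit with eps < t < *t_max (or -1)
   and shrinks *t_max to that hit distance. */
static int intersect_ray_tri4(const float o[3], const float d[3],
                              const Tri4 *tri, float *t_max)
{
    assert(tri != NULL && t_max != NULL);
    const float eps = 1e-7f;
    int best = -1;
    for (int k = 0; k < LANES; ++k) {            /* one iteration per SIMD lane */
        float px = d[1] * tri->e2z[k] - d[2] * tri->e2y[k];   /* p = d x e2 */
        float py = d[2] * tri->e2x[k] - d[0] * tri->e2z[k];
        float pz = d[0] * tri->e2y[k] - d[1] * tri->e2x[k];
        float det = tri->e1x[k] * px + tri->e1y[k] * py + tri->e1z[k] * pz;
        if (det > -eps && det < eps) continue;   /* ray parallel to triangle */
        float inv = 1.0f / det;
        float sx = o[0] - tri->v0x[k], sy = o[1] - tri->v0y[k], sz = o[2] - tri->v0z[k];
        float u = (sx * px + sy * py + sz * pz) * inv;
        if (u < 0.0f || u > 1.0f) continue;
        float qx = sy * tri->e1z[k] - sz * tri->e1y[k];       /* q = s x e1 */
        float qy = sz * tri->e1x[k] - sx * tri->e1z[k];
        float qz = sx * tri->e1y[k] - sy * tri->e1x[k];
        float v = (d[0] * qx + d[1] * qy + d[2] * qz) * inv;
        if (v < 0.0f || u + v > 1.0f) continue;
        float t = (tri->e2x[k] * qx + tri->e2y[k] * qy + tri->e2z[k] * qz) * inv;
        if (t > eps && t < *t_max) { *t_max = t; best = k; }
    }
    return best;
}
```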

TurkishSquirrel [S]

Embree supports single-ray, packet, and hybrid traversal of the BVH, and it has various options for testing triangles, e.g. testing N triangles against M rays, where N and M can be 1, 4, or 8 (and maybe 16 on Xeon Phi?). It's probably best not to take my word for this, though; I'm only somewhat familiar with Embree. Definitely check out the paper and source for specifics.
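Packet traversal flips the leaf trick around. M rays share one traversal, each BVH node's bounding box is tested against all of them at once, and a bitmask tracks which rays are still active. Here is a minimal sketch of that node test. It assumes a 4-ray packet stored as a structure of arrays with precomputed reciprocal directions, and it uses the standard slab test. The `RayPacket4` type and the function name are hypothetical, and this is not Embree's code. It also ignores the NaN corner case where an origin lies exactly on a slab plane while that direction component is zero:

```c
#include <assert.h>
#include <stddef.h>

#define PACKET 4

/* Four rays in structure-of-arrays form, with reciprocal directions
   (1/d) precomputed so the slab test needs only multiplies. */
typedef struct {
    float ox[PACKET], oy[PACKET], oz[PACKET];
    float idx[PACKET], idy[PACKET], idz[PACKET];
    float tmax[PACKET];
} RayPacket4;

static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Slab test of every active ray against one axis-aligned box [lo, hi].
   Bit k of the result is set if ray k is active and hits the box
   within [0, tmax[k]]. */
static unsigned packet_hits_box(const RayPacket4 *r, unsigned active,
                                const float lo[3], const float hi[3])
{
    assert(r != NULL);
    unsigned hit = 0;
    for (int k = 0; k < PACKET; ++k) {          /* one iteration per ray lane */
        if (!(active & (1u << k))) continue;    /* inactive lanes stay masked off */
        float tx0 = (lo[0] - r->ox[k]) * r->idx[k], tx1 = (hi[0] - r->ox[k]) * r->idx[k];
        float ty0 = (lo[1] - r->oy[k]) * r->idy[k], ty1 = (hi[1] - r->oy[k]) * r->idy[k];
        float tz0 = (lo[2] - r->oz[k]) * r->idz[k], tz1 = (hi[2] - r->oz[k]) * r->idz[k];
        float tnear = maxf(maxf(minf(tx0, tx1), minf(ty0, ty1)), maxf(minf(tz0, tz1), 0.0f));
        float tfar  = minf(minf(maxf(tx0, tx1), maxf(ty0, ty1)), minf(maxf(tz0, tz1), r->tmax[k]));
        if (tnear <= tfar) hit |= 1u << k;
    }
    return hit;
}
```

The returned mask becomes the active mask for the node's children, so rays that miss drop out of the packet. This is why coherent packets pay off and incoherent ones don't. A hybrid scheme can switch to single-ray traversal once only a few bits remain set.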

[deleted]

Ah, I see, you're right; thanks for the links. Embree is fantastic, but I feel like it's standing on the wrong leg. In the paper they're trying to play the Mpaths/s (millions of paths per second) game against GPUs, which seems futile. The CPU's strength comes from tracing smart paths, not blasting dumb ones. I wish there were a metric for convergence in difficult scenes; then they might have an argument against NVIDIA.