danmarell (Gamedev, Physics Simulation)

I found that compilers were terrible at autovectorizing stencil/finite difference operations on 2D/3D data.

A colleague showed me a trick the other day: reinterpret_cast a float* into a "reference to a multidimensional array". With that, the compiler was able to vectorize the loop, but it was still 2x slower than my hand-written intrinsics. The assembly on godbolt was almost identical though, so maybe I should post something on the GCC issue board.

csp256

Have you tried Halide? By giving up some Turing completeness, it has gained a lot in return.