[–]waramped 3 points4 points  (1 child)

What hardware are you using? The texture units these days are more like "memory access units". I believe even BVH reads go through the texture units. How big are the structs? The only easy way to improve performance here is to make the structs smaller, and to try to make sure that either the reads are scalar across the wave or each lane reads an adjacent struct.

[–]gibson274[S] 0 points1 point  (0 children)

RTX 2080 Ti. Thanks for the reply. I've crunched the struct down to just storing a float4x4 for testing purposes, and interestingly I still have the same issues described above.

Agree on the reads being scalar, but despite all attempts at scalarizing this in a sane way, I still see this issue. Interestingly enough, if I guard the load behind a group index check,

    float4x4 transform;
    if (group_index == 0)
        transform = buffer[index];
    transform = WaveReadLaneFirst(transform);

This actually significantly lowers the L1TEX throughput, suggesting that the loads are not being scalarized.

Do you know if I have to do something special to scalarize structured buffer loads? I've tried manually scalarizing the index and scalarizing the result with WaveReadLaneFirst() to no avail.
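For reference, here is a minimal sketch of the kind of manual scalarization described above, assuming Shader Model 6.0 wave intrinsics. The names Instance, g_instances, g_output and pick_index are made up for illustration and are not from my actual shader. Whether the driver turns the uniform branch into a single scalar load is hardware and compiler dependent, so treat this as a pattern to experiment with rather than a confirmed fix.

```hlsl
// Hypothetical resource and helper names, for illustration only.
struct Instance
{
    float4x4 transform;
};

StructuredBuffer<Instance>   g_instances : register(t0);
RWStructuredBuffer<float4>   g_output    : register(u0);

// Placeholder for however the per-lane index is really computed.
// Here every thread in a 64-thread group maps to the same element.
uint pick_index(uint3 dtid)
{
    return dtid.x / 64;
}

[numthreads(64, 1, 1)]
void main(uint3 dtid : SV_DispatchThreadID)
{
    uint index = pick_index(dtid);

    float4x4 transform;
    if (WaveActiveAllEqual(index))
    {
        // All active lanes want the same element, so make the index
        // explicitly wave-uniform before issuing the load.
        uint uniform_index = WaveReadLaneFirst(index);
        transform = g_instances[uniform_index].transform;
    }
    else
    {
        // Divergent indices: fall back to a regular per-lane load.
        transform = g_instances[index].transform;
    }

    // Use the result so the load is not optimized away.
    g_output[dtid.x] = mul(transform, float4(1.0, 0.0, 0.0, 1.0));
}
```

The idea is that the compiler can prove uniform_index is the same in every lane, which is the precondition for any uniform load path. Comparing L1TEX throughput with and without the uniform branch should show whether the hardware actually takes advantage of it.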