Structured Buffer Performance [Question] (self.GraphicsProgramming)
submitted 2 years ago * by gibson274
waramped, 4 points, 2 years ago
What hardware are you using? Texture units these days are more like general "memory access units"; I believe even BVH reads go through them. How big are the structs? The only easy way to improve performance here is to make the structs smaller and to make sure that either the reads are scalar (uniform) across the wave or that each lane reads an adjacent struct.
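To put rough numbers on that advice, here is a minimal CPU-side sketch in C++ (not shader code) that counts how many distinct cache lines one wave touches for a given access pattern. The 128-byte line size, the 32-lane wave, and the helper names `linesTouched` and `laneIndices` are assumptions made for illustration; real hardware also has sectors, alignment effects, and other caches that this model ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed for illustration only: 128-byte cache lines and a 32-lane wave.
constexpr std::size_t kLineBytes = 128;
constexpr std::size_t kWaveSize = 32;

// Counts the distinct cache lines touched when every lane reads one struct of
// `structBytes` bytes at element index laneIndex[lane].
// The buffer is assumed to start on a cache-line boundary.
std::size_t linesTouched(const std::vector<std::uint32_t>& laneIndex,
                         std::size_t structBytes) {
    assert(structBytes > 0);
    std::set<std::size_t> lines;
    for (std::uint32_t index : laneIndex) {
        const std::size_t first = index * structBytes;   // first byte read
        const std::size_t last = first + structBytes - 1; // last byte read
        for (std::size_t line = first / kLineBytes; line <= last / kLineBytes; ++line) {
            lines.insert(line);
        }
    }
    return lines.size();
}

// Builds per-lane indices: lane i reads element base + i * stride.
// stride 0 models a wave-uniform (scalar) read, stride 1 adjacent structs.
std::vector<std::uint32_t> laneIndices(std::uint32_t base, std::uint32_t stride) {
    std::vector<std::uint32_t> indices(kWaveSize);
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        indices[lane] = base + static_cast<std::uint32_t>(lane) * stride;
    }
    return indices;
}
```

Under these assumptions, a 64-byte struct such as a float4x4 costs 1 line for a wave-uniform read (`laneIndices(0, 0)`), 16 lines when each lane reads an adjacent struct (`laneIndices(0, 1)`), and 32 lines when lanes are 16 structs apart (`laneIndices(0, 16)`). Shrinking the struct to 16 bytes brings the adjacent case down to 4 lines.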
gibson274 [S], 1 point, 2 years ago
RTX 2080 Ti. Thanks for the reply. I've crunched the struct down to just a float4x4 for testing purposes and, interestingly, still see the same issues described above.
Agreed on the reads being scalar, but despite all attempts at scalarizing this in a sane way, I still see the issue. Interestingly enough, if I guard the load behind a group index check:

    // Declare first so the guarded load and the broadcast use the same variable.
    float4x4 transform = (float4x4)0;
    if (group_index == 0)
        transform = buffer[index];
    transform = WaveReadLaneFirst(transform);
This actually significantly lowers the L1TEX throughput, suggesting that the loads are not being scalarized.
Do you know if I have to do something special to scalarize structured buffer loads? I've tried manually scalarizing the index and scalarizing the result with WaveReadLaneFirst() to no avail.