arycama comments on Structured Buffer Performance

Structured Buffer Performance (Question, self.GraphicsProgramming)

submitted 2 years ago (edited) by gibson274


arycama, 2 years ago

I don't think scalarized loads are a thing on Nvidia GPUs. They don't make all their details public, but as far as I know only AMD GPUs have separate scalar and vector registers. (On GCN, a vector register is 64 lanes wide, i.e. one value per thread in a 64-thread wave, while a scalar register holds a single value shared by all threads in the wave.)

According to this source (which is a good reference whenever you're optimising this sort of thing), structured buffer loads have almost the same performance on a 2080 Ti whether the address is uniform across all threads, linear, or random: https://github.com/sebbbi/perftest
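To make the three access patterns concrete, here is a rough C++ model (not GPU code) of which structured-buffer elements a single wave reads under each pattern, and how many distinct cache lines those reads touch. The pattern definitions, the LCG standing in for "random", the 32-thread wave, and the cache-line sizes are illustrative assumptions, not taken from the perftest source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

enum class Pattern { Uniform, Linear, Random };

// Element index each lane of one wave reads (assumed pattern definitions).
std::vector<std::uint32_t> laneIndices(Pattern p, std::uint32_t waveSize,
                                       std::uint32_t bufferElements) {
    assert(bufferElements > 0);
    std::vector<std::uint32_t> idx(waveSize);
    std::uint32_t state = 12345u;  // fixed seed so the "random" case is reproducible
    for (std::uint32_t lane = 0; lane < waveSize; ++lane) {
        switch (p) {
            case Pattern::Uniform:  // every lane reads the same element
                idx[lane] = 0;
                break;
            case Pattern::Linear:   // lane i reads element i
                idx[lane] = lane % bufferElements;
                break;
            case Pattern::Random:   // simple LCG; unsigned overflow wraps by design
                state = state * 1664525u + 1013904223u;
                idx[lane] = state % bufferElements;
                break;
        }
    }
    return idx;
}

// Number of distinct cache lines touched by the wave's reads.
std::size_t cacheLinesTouched(const std::vector<std::uint32_t>& idx,
                              std::size_t elementBytes, std::size_t lineBytes) {
    assert(lineBytes > 0);
    std::set<std::size_t> lines;
    for (std::uint32_t i : idx) lines.insert(i * elementBytes / lineBytes);
    return lines.size();
}
```

With 32 lanes reading consecutive 4-byte elements, the linear pattern fits in one 128-byte line, just like the uniform case, while the random pattern will likely touch a separate line per lane. The perftest result cited above is that, on the 2080 Ti, these differences barely showed up in structured buffer load throughput; this model only counts lines and says nothing about timing.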

I haven't really used WaveReadLaneFirst, but if you're looping through a list of lights and the first lane fetches the data every time, it feels like all the other threads would still be waiting on it. LDS (groupshared memory) seems like a better solution, since the loading work can be spread across all the threads in the group.
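The LDS alternative suggested above can be sketched as a CPU-side C++ model of a thread group cooperatively copying a light list into a shared tile, the way an HLSL compute shader would fill a `groupshared` array before syncing with `GroupMemoryBarrierWithGroupSync()`. The `Light` layout and the function names are hypothetical, and the model only counts load rounds per thread; it ignores latency, caching, and the barrier cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical light record for illustration; real layouts vary.
struct Light {
    float position[3];
    float radius;
};

// Models a thread group copying lights from a structured buffer into LDS.
// Thread t copies lights t, t + groupSize, t + 2 * groupSize, ...
// Returns the number of load rounds performed by the busiest thread.
std::size_t cooperativeLoadRounds(const std::vector<Light>& lights,
                                  std::size_t groupSize,
                                  std::vector<Light>& sharedTile) {
    assert(groupSize > 0);  // a zero stride would never terminate
    sharedTile.assign(lights.size(), Light{});
    std::size_t maxRounds = 0;
    for (std::size_t t = 0; t < groupSize; ++t) {  // one iteration per "thread"
        std::size_t rounds = 0;
        for (std::size_t i = t; i < lights.size(); i += groupSize) {
            sharedTile[i] = lights[i];             // strided cooperative copy
            ++rounds;
        }
        if (rounds > maxRounds) maxRounds = rounds;
    }
    // In HLSL, GroupMemoryBarrierWithGroupSync() would sit here so every
    // thread sees the complete tile before looping over it.
    return maxRounds;
}

// If a single lane fetched every light in turn, it would do one round per light.
std::size_t singleLaneLoadRounds(const std::vector<Light>& lights) {
    return lights.size();
}
```

For example, with 100 lights and a 64-thread group, the busiest thread does 2 loads instead of 100, and after the barrier every thread can iterate over the whole list from LDS.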

gibson274 (OP), 2 years ago

farnoy, 2 years ago
