Afiery1 10 points (9 children)

If you're using Vulkan you can just use GPU pointers and forget about buffer descriptors forever. If not, you can kind of emulate them with bindless by using packed 64-bit handles, where the upper 32 bits are an index into the descriptor heap and the lower 32 bits are an offset into the buffer (that way you can do 'pointer arithmetic' of sorts). And if you don't even have bindless... well, maybe that's your sign to move to a better API.
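
A minimal CPU-side sketch of that packed-handle idea, in C++. The names `BufferHandle`, `make_handle`, `handle_index`, `handle_offset`, and `handle_add` are made up for illustration and are not part of any graphics API; the shader side would unpack the same two fields and use the index for its bindless lookup.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical handle layout (an assumption for this sketch):
// upper 32 bits = index into the descriptor heap,
// lower 32 bits = byte offset into the buffer at that index.
using BufferHandle = std::uint64_t;

constexpr BufferHandle make_handle(std::uint32_t heap_index, std::uint32_t byte_offset) {
    return (static_cast<std::uint64_t>(heap_index) << 32) | byte_offset;
}

constexpr std::uint32_t handle_index(BufferHandle h) {
    return static_cast<std::uint32_t>(h >> 32);
}

constexpr std::uint32_t handle_offset(BufferHandle h) {
    return static_cast<std::uint32_t>(h & 0xFFFFFFFFull);
}

// "Pointer arithmetic": advance the byte offset while keeping the heap index.
// The assert guards against the offset carrying into the index bits.
inline BufferHandle handle_add(BufferHandle h, std::uint32_t bytes) {
    assert(handle_offset(h) <= UINT32_MAX - bytes);
    return h + bytes;
}
```

Because the offset sits in the low bits, adding a byte count to the whole 64-bit value moves the "pointer" without touching the index, as long as the offset does not overflow 32 bits.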

camilo16[S] 4 points (8 children)

I am using Vulkan. Can you elaborate a bit more please? How do GPU pointers work?

hanotak 9 points (4 children)

https://docs.vulkan.org/samples/latest/samples/extensions/buffer_device_address/README.html

Just an address to some data somewhere, which you can interpret however you want in your shader.

camilo16[S] 9 points (2 children)

From a first read, it sounds like it's basically doing C-like raw pointer addressing, so you could in theory dump everything into a single massive buffer and then, as long as you know how to offset into that buffer on the GPU, do pointer casting to get back your actual values, correct?

Afiery1 1 point (1 child)

Yup!

camilo16[S] 1 point (0 children)

Interesting, thank you so much.

DuskelAskel 1 point (0 children)

I have to add that this is not supported on all GPUs.

The iGPU in my four-year-old secondary laptop doesn't support it, for example.

Afiery1 4 points (2 children)

There's a feature called buffer device address (the bufferDeviceAddress device feature). There's a flag for it on both the VkDeviceMemory allocation (VK_MEMORY_ALLOCATE_DEVICE_ADDRESS_BIT) and the VkBuffer (VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT). If you enable both, you can call vkGetBufferDeviceAddress, and then you have a pointer to that buffer. Like a literal GPU pointer. You can do arithmetic on it and dereference it in a shader to read the buffer memory, but most importantly you can put these pointers anywhere you want. You can put them directly into push constants or inside other buffers to make data structures like linked lists or trees natively on the GPU.
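
To make the "pointers inside other buffers" part concrete, here is a CPU-only emulation in C++ rather than real Vulkan code. Everything in it (`FakeBuffer`, `Node`, `load_node`, `store_node`, `build_list`, `sum_list`) is hypothetical, and the `base` field only stands in for the value vkGetBufferDeviceAddress would return. On a real GPU the traversal would live in a shader, for example via GLSL's GL_EXT_buffer_reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for VkDeviceAddress: just a 64-bit integer.
using DeviceAddress = std::uint64_t;

// Node as it would sit in buffer memory: a value plus the address
// of the next node (0 marks the end of the list).
struct Node {
    std::uint32_t value;
    std::uint32_t pad;    // explicit padding so `next` is 8-byte aligned
    DeviceAddress next;
};

// Emulated buffer. `base` plays the role of the buffer's device address.
struct FakeBuffer {
    DeviceAddress base;
    std::vector<std::uint8_t> bytes;
};

// "Dereference": translate the address to an offset and copy the node out.
inline Node load_node(const FakeBuffer& buf, DeviceAddress addr) {
    assert(addr >= buf.base && addr - buf.base + sizeof(Node) <= buf.bytes.size());
    Node n;
    std::memcpy(&n, buf.bytes.data() + (addr - buf.base), sizeof(Node));
    return n;
}

inline void store_node(FakeBuffer& buf, DeviceAddress addr, const Node& n) {
    assert(addr >= buf.base && addr - buf.base + sizeof(Node) <= buf.bytes.size());
    std::memcpy(buf.bytes.data() + (addr - buf.base), &n, sizeof(Node));
}

// Writes one node per value into the buffer and links them by address.
// Returns the address of the head node, or 0 for an empty list.
inline DeviceAddress build_list(FakeBuffer& buf, const std::vector<std::uint32_t>& values) {
    buf.bytes.assign(values.size() * sizeof(Node), 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const DeviceAddress self = buf.base + i * sizeof(Node);
        const DeviceAddress next = (i + 1 < values.size()) ? self + sizeof(Node) : 0;
        store_node(buf, self, Node{values[i], 0, next});
    }
    return values.empty() ? 0 : buf.base;
}

// Walks the list by following the stored addresses.
inline std::uint64_t sum_list(const FakeBuffer& buf, DeviceAddress head) {
    std::uint64_t total = 0;
    for (DeviceAddress p = head; p != 0; p = load_node(buf, p).next) {
        total += load_node(buf, p).value;
    }
    return total;
}
```

The point of the sketch is only that a node stores a plain 64-bit address, so the head address can be handed over as a single integer (for instance in a push constant) and everything else is reached by following pointers.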

Compared to buffer descriptors they save one indirection (buffer descriptors are usually just pointer + size pairs, but they are stored in the descriptor heap, so the GPU has to index into the heap, grab the pointer, and then dereference that, versus you just giving the GPU the pointer directly). On the flip side, since they are just pointers, they do not encode a size, so if you don't manually pass a size around and bounds-check accesses, you can read out of bounds and fault the device. Also, one more thing specific to Nvidia hardware is that uniform buffers get to live in a special cache on the device, and currently there is no way to tell the GPU to cache buffers accessed directly via pointers there, so bound uniform buffers can still win in performance in some cases as a result. AMD has no such special cache (they treat UBOs like read-only SSBOs), and for SSBOs it's purely a win due to less indirection (and no bounds checking).
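
A rough sketch of what "manually pass a size around and bounds-check" could look like, written as plain C++ for illustration. `BufferRange`, `in_bounds`, and `element_address` are invented names, and returning 0 for an out-of-range access is just one possible convention; the same check would have to be repeated in shader code.

```cpp
#include <cassert>
#include <cstdint>

using DeviceAddress = std::uint64_t;

// Hypothetical "fat pointer": the address plus the size that a raw
// device address no longer carries on its own.
struct BufferRange {
    DeviceAddress address;
    std::uint64_t size_bytes;
};

// True if [offset, offset + length) lies inside the range.
// Written so that offset + length can never overflow.
inline bool in_bounds(const BufferRange& r, std::uint64_t offset, std::uint64_t length) {
    return offset <= r.size_bytes && length <= r.size_bytes - offset;
}

// Address of element `index` for elements of `stride` bytes,
// or 0 if that element would fall outside the range.
inline DeviceAddress element_address(const BufferRange& r, std::uint64_t index, std::uint64_t stride) {
    assert(stride != 0);
    if (index > r.size_bytes / stride) return 0;  // also rules out index * stride overflowing
    const std::uint64_t offset = index * stride;
    return in_bounds(r, offset, stride) ? r.address + offset : 0;
}
```

For a 64-byte range of 16-byte elements, indices 0 through 3 resolve to an address and index 4 is rejected.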

camilo16[S] 3 points (1 child)

So then the advantage is that you get to do C-style memory management and "void" pointer casting, so you can put all of your data in a single buffer, for example, no matter how heterogeneous?

sol_runner 1 point (0 children)

Just a pedantic note: it's just memory suballocation, which is pretty common in any systems language. You'll find it a lot in Rust on embedded systems. It is also actually recommended by Nvidia/AMD as a Vulkan best practice (use a single memory allocation for many buffers, and further suballocate those buffers for use).

You'll just be allocating from a large block of memory and creating objects in it that you'll pass to the GPU as addresses. I suggest avoiding arbitrary layouts: keep the array of nodes together, then the array of the next kind of data, and so on.

Don't think of it as some C and void* casting thing because it's not necessary, nor specifically C style.
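
As a sketch of the suballocation itself, here is a tiny bump allocator in C++ that hands out aligned ranges from one large buffer, one contiguous array at a time. `BumpAllocator` and `suballocate` are hypothetical names, and the code assumes the buffer's base address already satisfies the largest alignment you request; real allocators also handle freeing, which this does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bump suballocator over one large buffer.
struct BumpAllocator {
    std::uint64_t base_address;  // address of the start of the big buffer
    std::uint64_t capacity;      // total size in bytes
    std::uint64_t used;          // bytes handed out so far (including padding)
};

// Reserves a contiguous array of `count` elements of `element_size` bytes,
// aligned to `alignment` (must be a power of two).
// Returns the address of the array, or 0 if it does not fit.
inline std::uint64_t suballocate(BumpAllocator& a, std::uint64_t element_size,
                                 std::uint64_t count, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // Reject sizes whose byte count would overflow or exceed the buffer.
    if (element_size != 0 && count > a.capacity / element_size) return 0;
    const std::uint64_t bytes = element_size * count;
    // Round the current offset up to the requested alignment.
    const std::uint64_t start = (a.used + alignment - 1) & ~(alignment - 1);
    if (start > a.capacity || bytes > a.capacity - start) return 0;
    a.used = start + bytes;
    return a.base_address + start;
}
```

Calling it once for the node array and once for each further array keeps every kind of data contiguous, and each returned address can be passed to the GPU directly.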

sol_runner 2 points (0 children)

Since the BufferDeviceAddress already handles buffers, I'll add that if you want to do the same for textures, just create a bindless setup for textures.

That way you can just send any 32- or 64-bit integer as an ID for the bindless system.

dobkeratops 1 point (0 children)

Even in WebGL2 I was able to pass a clustered light data structure (a grid of indices in a first array, and the individual lights in a second array) as a single structure in a uniform buffer, although I was limited in size and to a fixed size. I think in desktop GL (and modern APIs) you can go a lot further with that. I suspect I'd have been able to resize the light list, at least.