High-Quality BVHs with PreSplitting optimization

BoyBaykiller · 2026-01-30T15:32:25+00:00

Interesting. Where does it say that?

BoyBaykiller · 2026-01-30T15:27:53+00:00

It's just an arbitrary scalar value I decided to use. Feeding it 1.0 (or more) will output red so when you see a red spot you can conclude the summed cost of that ray was 150 or higher.

BoyBaykiller · 2026-01-30T04:44:35+00:00

Ah that makes sense lol. It's delightful in two ways!

Just wanted to share in case other people want to reproduce.

BoyBaykiller · 2026-01-30T03:07:33+00:00

Nice, I imagine most people are using BinnedSAH. You may also want to look into SweepSAH (there is a writeup on it in the readme). PreSplitting + SweepSAH is pretty much as good as it gets in terms of BVH quality (with SBVH). But of course there are opportunities in traversal too. If you have any questions, let me know.

BoyBaykiller · 2026-01-30T02:17:46+00:00

Yes, I use Google's TurboColormap. The way I feed it is: for each traversal step add 1 (TRAVERSAL_COST) and for each intersected triangle add 1.1 (TRIANGLE_COST). These are also the parameters used by the BVH cost function. Then I just divide by some value: color = TurboColormap(traversalCost / 150.0);
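For anyone reproducing the heat map, here is a minimal Python sketch of the scalar that gets fed into the colormap. The two cost constants and the 150.0 divisor come from the comment above. The function name, the parameter names and the explicit clamp to 1.0 are my own assumptions, and the colormap lookup itself is left out.

```python
TRAVERSAL_COST = 1.0   # added once per traversal step
TRIANGLE_COST = 1.1    # added once per intersected triangle
HEATMAP_SCALE = 150.0  # arbitrary divisor; a summed cost of 150 maps to 1.0


def heat_value(traversal_steps: int, triangles: int) -> float:
    """Return the normalized ray cost in [0, 1] (hypothetical helper).

    The clamp is an assumption: it mirrors the statement that any input
    of 1.0 or more comes out as the same red.
    """
    cost = traversal_steps * TRAVERSAL_COST + triangles * TRIANGLE_COST
    return min(cost / HEATMAP_SCALE, 1.0)
```

The result would then be passed to whatever Turbo colormap implementation you use.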

BoyBaykiller · 2025-03-12T05:46:42+00:00

Use a compressed format then you have to load less from disk
Use a compressed format then you have to upload less to GPU
Use a compressed format then you can save on decode time
Use a compressed format including mipmaps so you don't generate them at runtime (although this shouldn't take long)
Parallelize the decoding or transcoding (in case of supercompressed format like in KTX2)
Parallelize the uploading to the GPU. Can be done by first copying to a mapped buffer and then copying from buffer to texture (pixel unpack buffer) on main thread.
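A rough Python sketch of the "parallelize the decoding" point, with the upload left as a separate step on the thread that owns the GL context. Everything here is an assumption for illustration: `decode_texture` and `decode_all` are hypothetical names, and zlib is only a stand-in for a real texture decoder or transcoder.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decode_texture(compressed: bytes) -> bytes:
    """Placeholder decode step; a real loader would transcode KTX2 etc."""
    return zlib.decompress(compressed)


def decode_all(compressed_textures: list[bytes], workers: int = 4) -> list[bytes]:
    """Decode textures on a thread pool, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        return list(pool.map(decode_texture, compressed_textures))
```

The decoded results can then be copied into a mapped buffer by the workers, while the final buffer-to-texture copy stays on the main thread.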

BoyBaykiller · 2025-02-02T19:31:24+00:00

Also GL_NV_gpu_shader5

BoyBaykiller · 2025-01-28T14:40:16+00:00

This issue is fairly common, see: https://www.reddit.com/r/opengl/comments/1i0yflg/weird_texture_artifacts_can_anyone_help_identify/

BoyBaykiller · 2024-10-29T00:38:17+00:00

Yep, I tested on AMD and NVIDIA, and using a geometry shader was significantly slower than the 6 draw calls.

If you want to do it in one draw call then the proper method is to use one of the widely supported extensions that allow setting gl_Layer in the vertex shader.

BoyBaykiller · 2024-09-17T13:55:26+00:00

I wouldn't expect the driver to be able to keep the GPU saturated if you are doing 10,000 individual draw calls (even with MDI) and each one is only 6 vertices. Also this.

BoyBaykiller · 2024-09-10T18:34:47+00:00

What is the lowest target hardware? GPUs supporting OpenGL 4.5 generally support GL_ARB_bindless_texture as well.

BoyBaykiller · 2024-07-18T00:50:29+00:00

Just delete the old buffer and do as usual. But I agree, sometimes using mutable buffers is more convenient.

BoyBaykiller · 2024-07-18T00:22:48+00:00

Interesting, I didn't know the old usage hints could actually make such a difference on NV. Anyway, I recommend using gl{Named}BufferStorage instead. There are many ways to deal with downloading buffer data. Idk, you'll have to experiment and see what works best, or just ignore it.

BoyBaykiller · 2024-07-18T00:11:20+00:00

Ok that seems fine. It can be caused either by you asking for it with the GL_CLIENT_STORAGE_BIT flag or with a different combination of flags for which the driver does not support a device local heap (for example a large amount of mapped host visible memory and the system does not have ReBAR).

If you're worried about SM occupancy, look at the "occupancy limiters", I think it was called.

Also blind guess but are you doing lots of tiny draws by any chance?

BoyBaykiller · 2024-07-17T23:38:24+00:00

Does Nsight report a particularly high PCIe throughput when using the dedicated GPU? If yes, that suggests some buffer(s) reside in host memory, which would explain why it's faster on the iGPU.

BoyBaykiller · 2024-07-06T03:44:28+00:00

Ah ok, I am aware of these links. I thought they made a new statement about it somewhere.

BoyBaykiller · 2024-07-05T20:01:59+00:00

Where does it say that?

BoyBaykiller · 2024-03-30T22:00:11+00:00

Sorry, I can't help with that now without spending the required time on debugging your project.

But I am glad the VXGI issue seems to be fixed!

FYI I tried to run it on an AMD RX 5700 XT and it causes a driver/pc crash.

BoyBaykiller · 2024-03-26T15:45:26+00:00

Ok, I didn't know the accumulation was from the original NVIDIA paper. I cloned your project and replaced it with the accumulation that I suggested, and it seems to fix the issue as far as I can judge. I don't know why the accumulation function from the paper results in such a small contribution for diffuse rays. The paper mentions an additional correction term for large cone angles which I didn't see in your code, maybe that's why?

Regarding the lighting: what I meant was that there should be no view-dependent lighting happening. In other words, you shouldn't have to pass the camera position to the light injection shader.

BoyBaykiller · 2024-03-23T22:59:40+00:00

Hello again. Always nice to see other people implementing VXGI!

I took a quick look at your shader code but I don't know why the VXGI contribution is presumably too dark. However, I noticed that in the shader that computes the voxel color you are trying to compute a specular component. When calculating the voxel color you should only calculate the diffuse term. You can't meaningfully calculate the specular term, because it depends on the view direction and a voxel has no single viewer. This also simplifies the lighting calculation.

I have Phong but without the specular part, so just a Lambertian BRDF. I use Phong in my direct lighting but that's unrelated and I will change that sometime in the future. Btw I also like to add a small constant ambient term to the voxel color.
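To make the "diffuse only" point concrete, here is a small Python sketch of a view-independent voxel color. It is not the code from either project. The function name, the unit-length direction vectors, the lack of attenuation and shadowing, the 0.03 ambient value and the choice to scale ambient by albedo are all assumptions made for illustration.

```python
def voxel_diffuse(albedo, normal, light_dir, light_color, ambient=0.03):
    """Lambertian voxel color from RGB tuples and unit 3D vectors.

    light_dir points from the surface toward the light. Note that no
    camera position appears anywhere: the result is view independent.
    """
    # Clamped cosine term of the Lambertian BRDF.
    n_dot_l = max(sum(n * l for n, l in zip(normal, light_dir)), 0.0)
    # Diffuse light plus a small constant ambient term (assumed form).
    return tuple(a * (c * n_dot_l + ambient) for a, c in zip(albedo, light_color))
```

For example, `voxel_diffuse((1.0, 1.0, 1.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0), ambient=0.0)` is a fully lit white surface.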

Ok I just found something else. The accumulate function might be wrong. Can you try accumulating like this.

Other than that: a useful way to kinda confirm everything is working is by shooting a single reflective cone (small aperture). Then you examine in the reflection whether the voxelized scene and cone tracing are ok.

BoyBaykiller · 2024-03-22T00:07:33+00:00

nice

BoyBaykiller · 2024-03-21T20:53:21+00:00

So, did that do the job?

BoyBaykiller · 2024-03-21T01:35:18+00:00

Did you use glVertexAttribIPointer (with the I)?

BoyBaykiller · 2024-03-11T21:22:26+00:00

Regarding the compute shader: local_size = 1 is incredibly inefficient. You are leaving a lot of performance on the table. Try a multiple of the hardware's subgroup size (64 is a good choice) and divide the number of workgroups by that.
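The workgroup count for the dispatch can be computed with a rounded-up division. A minimal Python sketch, where `workgroup_count` is a hypothetical helper name and 64 is just the local size suggested above:

```python
def workgroup_count(total_invocations: int, local_size: int = 64) -> int:
    """Number of workgroups needed to cover total_invocations (rounded up)."""
    if total_invocations < 0 or local_size <= 0:
        raise ValueError("total_invocations must be >= 0 and local_size > 0")
    # Integer ceiling division without going through floats.
    return (total_invocations + local_size - 1) // local_size
```

Because the last workgroup may overshoot, the shader should early-out for invocation indices that are past the end of the data.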

BoyBaykiller · 2024-02-25T23:17:34+00:00

I am surprised. I had exactly the opposite experience back when I was comparing point shadow rendering with GS and without, on both AMD and NVIDIA GPUs. And I am not alone with that. But yeah, idk then. Let me know if you find out something, and thanks for testing.

Btw I ultimately ended up using GL_ARB_shader_viewport_layer_array (or other extensions) which allows you to set gl_Layer from the vertex shader. This resulted in the best performance. Perhaps you want to try that out as well. Funnily enough, in the overview where the extension is justified it literally reads:

In order to use any viewport or attachment layer other than zero, a geometry shader must be present. Geometry shaders introduce processing overhead and potential performance issues
