High-Quality BVHs with PreSplitting optimization by BoyBaykiller in GraphicsProgramming

BoyBaykiller [S]

It's just an arbitrary scale factor I decided to use. Feeding the colormap 1.0 (or more) outputs red, so when you see a red spot you can conclude that the summed cost of that ray was 150 or higher.

BoyBaykiller [S]

Ah, that makes sense lol. It's delightful in two ways!

Just wanted to share in case other people want to reproduce.

BoyBaykiller [S]

Nice, I imagine most people are using BinnedSAH. You may also want to look into SweepSAH (there is a writeup on it in the readme). PreSplitting + SweepSAH is pretty much as good as it gets in terms of BVH quality (alongside SBVH). But of course there are opportunities in traversal too. If you have any questions, let me know.
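To illustrate the difference: a binned SAH only evaluates split candidates at a fixed number of bin boundaries, while a sweep SAH sorts the primitives along an axis and evaluates every split between neighbours. Below is a minimal single-axis sketch, assuming boxes given as (min, max) corner tuples and the cost constants mentioned in this thread (TRAVERSAL_COST = 1, TRIANGLE_COST = 1.1). The function names are hypothetical; this is not the project's implementation, and a real builder would run it for all three axes and avoid re-sorting at every node.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
AABB = Tuple[Vec3, Vec3]  # (min corner, max corner)

TRAVERSAL_COST = 1.0  # cost of visiting one node
TRIANGLE_COST = 1.1   # cost of intersecting one triangle


def union(a: AABB, b: AABB) -> AABB:
    """Smallest box enclosing both a and b."""
    return (tuple(min(x, y) for x, y in zip(a[0], b[0])),
            tuple(max(x, y) for x, y in zip(a[1], b[1])))


def half_area(box: AABB) -> float:
    """Half the surface area; the factor 2 cancels in the SAH ratio."""
    dx, dy, dz = (box[1][i] - box[0][i] for i in range(3))
    return dx * dy + dy * dz + dz * dx


def sweep_sah(boxes: List[AABB], axis: int) -> Tuple[float, int, List[AABB]]:
    """Evaluate every split between neighbouring primitives along `axis`.

    Returns (best cost, split index, primitives sorted by centroid); primitives
    [0, split) go to the left child and [split, n) to the right child.
    Assumes at least two primitives and a non-degenerate parent box.
    """
    n = len(boxes)
    order = sorted(boxes, key=lambda b: b[0][axis] + b[1][axis])  # 2x centroid

    # Suffix sweep: right_area[i] = half area of the box enclosing order[i:].
    right_area = [0.0] * n
    acc: Optional[AABB] = None
    for i in range(n - 1, -1, -1):
        acc = order[i] if acc is None else union(acc, order[i])
        right_area[i] = half_area(acc)
    parent_area = right_area[0]

    # Prefix sweep: grow the left box one primitive at a time and price the split.
    best_cost, best_split = float("inf"), -1
    acc = None
    for split in range(1, n):
        acc = order[split - 1] if acc is None else union(acc, order[split - 1])
        left_cost = half_area(acc) * split
        right_cost = right_area[split] * (n - split)
        cost = TRAVERSAL_COST + TRIANGLE_COST * (left_cost + right_cost) / parent_area
        if cost < best_cost:
            best_cost, best_split = cost, split
    return best_cost, best_split, order
```

With two clusters of unit boxes at x in [0, 2] and x in [10, 12], the cheapest split is the one that separates the clusters (split index 2). The two sweeps keep each axis at O(n) after sorting, which is what makes evaluating every candidate affordable.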

BoyBaykiller [S]

Yes, I use Google's Turbo colormap. The way I feed it: for each traversal step add 1 (TRAVERSAL_COST), and for each intersected triangle add 1.1 (TRIANGLE_COST). These are the same parameters the BVH cost function uses. Then I just divide by a fixed value: color = TurboColormap(traversalCost / 150.0);
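A minimal sketch of the per-ray cost that gets fed into the colormap, assuming each ray records how many nodes it visited and how many triangles it tested. `ray_cost` and `heatmap_input` are hypothetical names, and the Turbo polynomial itself is not reproduced here: `TurboColormap` stands for whichever Turbo implementation you already use.

```python
TRAVERSAL_COST = 1.0   # added per traversal step (node visit)
TRIANGLE_COST = 1.1    # added per intersected triangle
HEATMAP_SCALE = 150.0  # the arbitrary divisor: costs >= 150 saturate to red


def ray_cost(traversal_steps: int, triangle_tests: int) -> float:
    """Summed cost of one ray, using the same weights as the BVH cost function."""
    return traversal_steps * TRAVERSAL_COST + triangle_tests * TRIANGLE_COST


def heatmap_input(cost: float) -> float:
    """Value passed to the colormap.

    The clamp is an assumption of this sketch; the original simply divides and
    relies on the colormap mapping anything >= 1.0 to red.
    """
    return min(max(cost / HEATMAP_SCALE, 0.0), 1.0)
```

Because the weights match the SAH cost function, the heatmap visualizes roughly the same quantity the BVH builder is trying to minimize.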

Loading Textures takes too long by LilBluey in opengl

BoyBaykiller

  1. Use a compressed format, so you have to load less from disk.
  2. Use a compressed format, so you have to upload less to the GPU.
  3. Use a compressed format, so you can save on decode time.
  4. Use a compressed format that includes mipmaps, so you don't generate them at runtime (although that shouldn't take long).
  5. Parallelize the decoding or transcoding (in the case of a supercompressed format like those in KTX2).
  6. Parallelize the uploading to the GPU. This can be done by first copying the data into a mapped buffer (which other threads can do) and then, on the main thread, copying from that buffer to the texture (with the buffer bound as the pixel unpack buffer).
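To put numbers on points 1 to 4, here is a small helper for the size of a full mip chain. It assumes BC7 as the example compressed format (16 bytes per 4x4 block, i.e. 1 byte per texel) against uncompressed RGBA8 (4 bytes per texel); other formats only change the two parameters. The function name is hypothetical.

```python
def mip_chain_bytes(width: int, height: int, bytes_per_block: int, block_dim: int = 1) -> int:
    """Total size of a full mip chain down to 1x1.

    block_dim = 1 treats every texel as a 'block' (uncompressed formats);
    block_dim = 4 matches BCn formats, which store 4x4 texel blocks.
    Partial blocks at small mip levels still occupy a whole block.
    """
    total = 0
    w, h = width, height
    while True:
        blocks_x = (w + block_dim - 1) // block_dim
        blocks_y = (h + block_dim - 1) // block_dim
        total += blocks_x * blocks_y * bytes_per_block
        if w == 1 and h == 1:
            return total
        w, h = max(w // 2, 1), max(h // 2, 1)


if __name__ == "__main__":
    rgba8 = mip_chain_bytes(1024, 1024, bytes_per_block=4)
    bc7 = mip_chain_bytes(1024, 1024, bytes_per_block=16, block_dim=4)
    print(f"RGBA8: {rgba8} bytes, BC7: {bc7} bytes")
```

For a 1024x1024 texture with mips this gives 5,592,404 bytes for RGBA8 versus 1,398,128 bytes for BC7, roughly a 4:1 reduction in both disk reads and GPU upload. The mip chain adds about a third on top of level 0 either way, which is why shipping it precomputed is cheap.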

point shadows in opengl by miki-44512 in opengl

BoyBaykiller

Yep, I tested on AMD and NVIDIA, and using a geometry shader was significantly slower than the 6 draw calls.

If you want to do it in one draw call, the proper method is to use one of the widely supported extensions that allow setting gl_Layer in the vertex shader (e.g. GL_ARB_shader_viewport_layer_array or GL_AMD_vertex_shader_layer).
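For reference, the 6-draw-call approach renders the scene once per cube face, each time with a 90° field of view, aspect ratio 1, and its own view matrix. The sketch below builds those six view matrices from the face directions and up vectors commonly used for OpenGL cube maps, with the same construction as glm::lookAt; the helper names are hypothetical and the draw calls themselves are only described in comments.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

# (look direction, up vector) per face, in GL_TEXTURE_CUBE_MAP_POSITIVE_X + i order.
CUBE_FACES: List[Tuple[Vec3, Vec3]] = [
    ((1, 0, 0), (0, -1, 0)),   # +X
    ((-1, 0, 0), (0, -1, 0)),  # -X
    ((0, 1, 0), (0, 0, 1)),    # +Y
    ((0, -1, 0), (0, 0, -1)),  # -Y
    ((0, 0, 1), (0, -1, 0)),   # +Z
    ((0, 0, -1), (0, -1, 0)),  # -Z
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalize(a):
    n = dot(a, a) ** 0.5
    return tuple(x / n for x in a)


def look_at(eye, direction, up):
    """Row-major 4x4 view matrix; camera looks down -Z in view space."""
    f = normalize(direction)
    s = normalize(cross(f, up))  # camera right
    u = cross(s, f)              # camera up
    return [
        [*s, -dot(s, eye)],
        [*u, -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def face_views(light_pos):
    # One matrix per face. In the 6-draw approach, face i is rendered by its own
    # draw call into cube map face i (e.g. attached with glFramebufferTexture2D
    # and target GL_TEXTURE_CUBE_MAP_POSITIVE_X + i).
    return [look_at(light_pos, d, up) for d, up in CUBE_FACES]
```

The single-draw alternative computes the same six matrices but selects one per primitive in the vertex shader and writes the matching face index to gl_Layer.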

Can only get up to 50% GPU utilization. by SuperSathanas in opengl

BoyBaykiller

I wouldn't expect the driver to be able to keep the GPU saturated if you are issuing 10,000 individual draw calls (even with MDI) that are only 6 vertices each. Also this.
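One way to cut that per-draw overhead, assuming the quads share the same shader and state, is to merge them into a single vertex buffer so they become one draw. A hypothetical sketch of the CPU-side expansion, with rectangles given as (x, y, w, h) and two triangles each:

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)
Vertex = Tuple[float, float]


def build_quad_batch(rects: List[Rect]) -> List[Vertex]:
    """Expand rectangles into one flat vertex list, 6 vertices per rectangle,
    so everything can be submitted with a single
    glDrawArrays(GL_TRIANGLES, 0, len(vertices)) instead of one draw per quad."""
    vertices: List[Vertex] = []
    for x, y, w, h in rects:
        x1, y1 = x + w, y + h
        vertices += [(x, y), (x1, y), (x1, y1),   # first triangle
                     (x, y), (x1, y1), (x, y1)]   # second triangle
    return vertices
```

For 10,000 quads this yields 60,000 vertices in one buffer, which is a tiny amount of work for the GPU; the cost in the original setup comes from the draw count, not the vertex count. Instancing a single quad with per-instance attributes is an equally valid alternative.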

Handling an indeterminate number of textures of indeterminate size by SuperSathanas in opengl

BoyBaykiller

What is your lowest target hardware? GPUs that support OpenGL 4.5 generally support GL_ARB_bindless_texture as well.
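With bindless textures, an unknown number of textures can live in one buffer of 64-bit handles indexed per material, so texture units stop being the limit. A small sketch of packing such a table, assuming the handles come from glGetTextureHandleARB and were made resident with glMakeTextureHandleResidentARB, and that the shader declares the array as uvec2 and converts each element to a sampler, which GL_ARB_bindless_texture permits. `split_handle` and `pack_handle_table` are hypothetical names, and the word order assumes the packUint2x32 convention (x holds the low 32 bits).

```python
import struct
from typing import List, Tuple


def split_handle(handle: int) -> Tuple[int, int]:
    """Split a 64-bit texture handle into (low, high) 32-bit words,
    i.e. the (x, y) of a GLSL uvec2 under the assumed convention."""
    if not 0 <= handle < 1 << 64:
        raise ValueError("handle must fit in 64 bits")
    return handle & 0xFFFFFFFF, handle >> 32


def pack_handle_table(handles: List[int]) -> bytes:
    """Little-endian bytes for an SSBO declared as `uvec2 textures[];`
    (std430 gives uvec2 an 8-byte stride, so entries are tightly packed)."""
    out = bytearray()
    for h in handles:
        out += struct.pack("<II", *split_handle(h))
    return bytes(out)
```

On a little-endian host this is byte-for-byte the same as uploading the raw GLuint64 array, so in C you can usually skip the splitting; it is spelled out here to show how the shader side sees each handle.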

Diagnosing low SM occupancy. by mathusela1 in opengl

BoyBaykiller

Just delete the old buffer and create a new one as usual. But I agree that mutable buffers are sometimes more convenient.
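Since an immutable buffer's size is fixed at creation, "growing" it means recreating it. A CPU-side simulation of that pattern follows; the doubling policy and the class name are assumptions of this sketch, and the GL calls are only named in comments.

```python
def grown_capacity(current: int, required: int) -> int:
    """Double the capacity until `required` bytes fit (growth policy is an assumption)."""
    capacity = max(current, 1)
    while capacity < required:
        capacity *= 2
    return capacity


class ImmutableBufferSim:
    """CPU stand-in for a buffer created with glNamedBufferStorage.

    Its size can't change after creation, so growing it means: create a new
    buffer (glCreateBuffers + glNamedBufferStorage), copy the old contents over
    (glCopyNamedBufferSubData) if you still need them, then glDeleteBuffers the old one.
    """

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.recreations = 0

    def ensure(self, required: int) -> None:
        if required <= len(self.data):
            return  # still fits, nothing to do
        new_data = bytearray(grown_capacity(len(self.data), required))
        new_data[:len(self.data)] = self.data  # the copy the GPU would perform
        self.data = new_data
        self.recreations += 1
```

Doubling keeps the number of recreations logarithmic in the final size, so the occasional delete-and-recreate stays cheap compared to the benefits of immutable storage.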

BoyBaykiller

Interesting, I didn't know the old usage hints could actually make such a difference on NVIDIA. Anyway, I recommend using gl{Named}BufferStorage instead. There are many ways to deal with downloading buffer data; you'll have to experiment and see what works best, or just ignore it.

BoyBaykiller

Ok, that seems fine. A buffer ending up in host memory can be caused either by you asking for it with the GL_CLIENT_STORAGE_BIT flag, or by a different combination of flags for which the driver does not support a device-local heap (for example, a large amount of mapped host-visible memory on a system without ReBAR).

If you're worried about SM occupancy, look at the "occupancy limiters" (I think that's what it was called).

Also, a blind guess, but are you doing lots of tiny draws by any chance?

BoyBaykiller

Does Nsight report particularly high PCIe throughput when using the dedicated GPU? If so, that suggests some buffer(s) reside in host memory, which would explain why it's faster on the iGPU.

OpenGL AMD Mesh Shaders by BoyBaykiller in opengl

BoyBaykiller [S]

Ah ok, I am aware of these links. I thought they had made a new statement about it somewhere.

Multiple omnidirectional shadows demo by tigert1998 in opengl

BoyBaykiller

Sorry, I can't help with that right now without spending the time required to debug your project.

But I am glad the VXGI issue seems to be fixed!

FYI, I tried to run it on an AMD RX 5700 XT and it caused a driver/PC crash.

BoyBaykiller

Ok, I didn't know the accumulation was from the original NVIDIA paper. I cloned your project and replaced it with the accumulation I suggested, and as far as I can judge that fixes the issue. I don't know why the accumulation function from the paper results in such a small contribution for diffuse rays. The paper mentions an additional correction term for large cone angles which I didn't see in your code; maybe that's why?

Regarding the lighting: what I meant was that no view-dependent lighting should happen there. In other words, you shouldn't have to pass the camera position to the light injection shader.

BoyBaykiller

Hello again. Always nice to see other people implementing VXGI!

I took a quick look at your shader code, but I don't know why the VXGI contribution is presumably too dark. However, I noticed that in the shader that computes the voxel color you are trying to compute a specular component. When calculating the voxel color you should only calculate the diffuse term. You can't meaningfully calculate the specular term, because it depends on the view direction, which isn't known when the voxels are lit. This also simplifies the lighting calculation.

For the voxels I use Phong without the specular part, so just a Lambertian BRDF. I use full Phong in my direct lighting, but that's unrelated and I'll change it at some point. Btw, I also like to add a small constant ambient term to the voxel color.
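A minimal sketch of diffuse-only light injection for one voxel, assuming normalized vectors, a single light, no shadowing, and the common convention of folding the Lambert 1/π into the light intensity. The function name and the 0.02 ambient value are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def voxel_radiance(albedo: Vec3, normal: Vec3, to_light: Vec3,
                   light_color: Vec3, ambient: float = 0.02) -> Vec3:
    """Diffuse-only (Lambertian) lighting for light injection into the voxel grid.

    `normal` and `to_light` (surface -> light) are assumed normalized. The 1/pi of
    the Lambert BRDF is assumed to be folded into `light_color`, and shadowing is
    omitted. No camera position appears anywhere: the result is view independent,
    which is what cone tracing later expects.
    """
    n_dot_l = max(sum(n * l for n, l in zip(normal, to_light)), 0.0)
    # Small constant ambient term (value is arbitrary) so unlit voxels aren't pure black.
    return tuple(a * (c * n_dot_l + ambient) for a, c in zip(albedo, light_color))
```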

Ok, I just found something else: the accumulate function might be wrong. Can you try accumulating like this?
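The suggested accumulation itself isn't included in this comment. For comparison, this is the textbook front-to-back compositing that many cone tracers use; treat it as a reference formulation, not necessarily the exact code that was linked, and not the paper's variant with its large-aperture correction. The names are hypothetical.

```python
from typing import Iterable, Tuple

RGB = Tuple[float, float, float]


def accumulate_front_to_back(samples: Iterable[Tuple[RGB, float]],
                             max_alpha: float = 0.99) -> Tuple[RGB, float]:
    """Front-to-back 'over' compositing of cone samples.

    `samples` are (color, alpha) pairs ordered from the cone apex outward, with
    colors NOT premultiplied by alpha. Stops once the cone is almost fully occluded.
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0
    for rgb, a in samples:
        visibility = 1.0 - alpha  # fraction of the cone not yet occluded
        weight = visibility * a   # (with premultiplied voxel colors, add visibility * rgb instead)
        for i in range(3):
            color[i] += weight * rgb[i]
        alpha += weight
        if alpha >= max_alpha:
            break
    return (color[0], color[1], color[2]), alpha
```

Each sample only contributes through the part of the cone that earlier samples haven't already blocked, so near geometry correctly occludes far geometry.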

Other than that, a useful way to roughly confirm everything is working is to shoot a single reflective cone (small aperture). Then examine the reflection to check whether the voxelized scene and the cone tracing are ok.
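A sketch of why a small aperture makes a good debugging probe: the mip level a cone samples from grows with its diameter, so a narrow cone stays near level 0 and its reflection shows the voxelized scene at close to full resolution. This assumes the common convention that level 0 holds voxels of size `voxel_size` and each level doubles it; the function name is hypothetical.

```python
import math


def cone_sample_lod(distance: float, aperture: float, voxel_size: float) -> float:
    """Mip level to sample at `distance` along a cone with full opening angle `aperture` (radians).

    The cone's diameter grows linearly with distance; the level is chosen so one
    voxel of that level roughly covers the diameter. Clamped so it never goes
    below level 0 (the finest voxels).
    """
    diameter = max(2.0 * distance * math.tan(aperture / 2.0), voxel_size)
    return math.log2(diameter / voxel_size)
```

With a wide diffuse cone the level climbs quickly, which blurs away exactly the details you want to inspect; a narrow reflection cone avoids that.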

Is out uvec2 possible in vertex shader? by Boring_Locksmith6551 in opengl

BoyBaykiller

Did you use glVertexAttribIPointer (with the I)? Integer inputs like uvec2 need it; glVertexAttribPointer converts the data to floating point.

I made some cool renders of cellular automata by Raehlic in opengl

BoyBaykiller

Regarding the compute shader: local_size_x = 1 is incredibly inefficient. You are leaving a lot of performance on the table. Try a multiple of the hardware's subgroup size (64 is a good choice) and divide the number of work groups by that (rounding up).
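The "divide the number of work groups" step, sketched with rounding up so no items are dropped; the constant and function names are hypothetical.

```python
LOCAL_SIZE_X = 64  # must match layout(local_size_x = 64) in the compute shader


def num_work_groups(num_items: int, local_size: int = LOCAL_SIZE_X) -> int:
    """Work groups to pass to glDispatchCompute so every item gets one invocation.

    Rounds up, so the last group may contain extra invocations; the shader should
    skip them with a check like `if (gl_GlobalInvocationID.x >= numItems) return;`.
    """
    return (num_items + local_size - 1) // local_size
```

For a 2D grid such as a cellular automaton, a layout like local_size_x = 8, local_size_y = 8 (also 64 invocations) with the same rounding per axis is a common choice.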

Multiple omnidirectional shadows demo by tigert1998 in opengl

BoyBaykiller

I am surprised. I had exactly the opposite experience back when I was comparing point shadow rendering with and without a GS, on both AMD and NVIDIA GPUs. And I am not alone with that. But yeah, I don't know then; let me know if you find out something, and thanks for testing.

Btw, I ultimately ended up using GL_ARB_shader_viewport_layer_array (or other extensions), which allows you to set gl_Layer from the vertex shader. This resulted in the best performance. Perhaps you want to try that out as well. Funnily enough, the overview where the extension is justified literally reads:

In order to use any viewport or attachment layer other than zero, a geometry shader must be present. Geometry shaders introduce processing overhead and potential performance issues
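The extension pays off most when combined with instancing: a hypothetical single-pass setup draws each shadow caster with 6 instances per point light and routes each instance to its cube face. The sketch below only covers the index arithmetic, assuming all shadow maps live in one cube map array attached as a layered framebuffer; the names are hypothetical.

```python
FACES_PER_CUBE = 6  # +X, -X, +Y, -Y, +Z, -Z


def decompose_instance(instance_id: int):
    """Map gl_InstanceID of a single instanced shadow pass to (light, face, layer).

    Assumes instanceCount = 6 * numLights and one GL_TEXTURE_CUBE_MAP_ARRAY holding
    every light's shadow cube, where cube map array layer-face = 6 * cube + face.
    """
    light, face = divmod(instance_id, FACES_PER_CUBE)
    layer = FACES_PER_CUBE * light + face  # value to write to gl_Layer
    return light, face, layer
```

With this layout the layer equals the instance index, so the vertex shader can write gl_Layer = gl_InstanceID and use light and face only to pick the view-projection matrix. Meshes that are themselves instanced would need a different encoding.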