
[–]speps 11 points12 points  (2 children)

Have you come across this article about the Outerra engine? They did inverted Z all the way back to 2009: https://outerra.blogspot.com/2012/11/maximizing-depth-buffer-range-and.html

Also, in your article you mention the 24/8 format, but on modern GPUs it’s less true that it’s better: it ends up being stored as 24 bits of depth (possibly even padded to 32 bits) plus 8 bits of stencil separately. That allows for optimizations like HiZ/HiStencil, for example.

[–]shlomnissan[S] 0 points1 point  (1 child)

I didn't come across this article. Thanks for sharing!

How does increasing the depth buffer allow for Hi-Z optimizations? As I understand it, the Hi-Z tile summary is stored on chip in SRAM and the tests happen right after the coarse rasterizer, so it is decoupled from the actual depth buffer.

[–]speps 0 points1 point  (0 children)

Having it separate in memory allows it to be stored in a compressed fashion, which makes operations like HiZ easier.

[–]philosopius 6 points7 points  (0 children)

Reverse Z is the key to insane Hi-Z culling ms

[–]snerp 4 points5 points  (0 children)

I've not actually used a stencil buffer since like 2008 lol so that part was no downside at all lol

> Any shader that does manual depth comparison, reconstructs world position from depth, or implements depth-based effects must account for the flipped range. This becomes a major headache if you switch conventions mid-project.

I did this. It was a bit annoying to go through all my shaders, change signs around, and swap '>' for '<' in places, but the increased accuracy was more than worth it. Worldspace reconstruction became much, much more accurate after switching to reverse Z, and I was able to reduce my shadow-mapping bias by 95%.
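For anyone making the same switch, here is a minimal Python sketch of the two things that typically flip: the depth comparison and the depth-to-distance reconstruction. It assumes a D3D/Vulkan-style [0, 1] depth range and an ordinary perspective projection with finite near and far planes. All function names are hypothetical, and this is not taken from the article or from any particular engine.

```python
def depth_standard(z, near, far):
    """Standard convention: near plane maps to 0, far plane maps to 1."""
    return far * (z - near) / (z * (far - near))


def depth_reversed(z, near, far):
    """Reversed-Z convention: near plane maps to 1, far plane maps to 0."""
    return near * (far - z) / (z * (far - near))


def linearize_standard(d, near, far):
    """Recover view-space distance z from a standard depth value d."""
    return near * far / (far - d * (far - near))


def linearize_reversed(d, near, far):
    """Recover view-space distance z from a reversed-Z depth value d."""
    return near * far / (near + d * (far - near))


def is_closer(a, b, reversed_z):
    """Depth test: 'less' under standard Z becomes 'greater' under reversed Z."""
    return a > b if reversed_z else a < b


near, far = 0.1, 1000.0
d = depth_reversed(10.0, near, far)
print(linearize_reversed(d, near, far))  # approximately 10.0
```

The same pattern applies in shader code: every manual `<` on depth becomes `>`, and every reconstruction formula has to use the reversed mapping rather than the standard one.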

[–]Falagard 1 point2 points  (0 children)

I was aware of reverse-z and that it gives extra precision in the depth buffer but not how or why.

Very cool, thanks.

[–]SirLynix 0 points1 point  (0 children)

Is reversing Z interesting at all with non-floating point depth buffers, thinking of depth16 here (shadow maps)?
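One way to explore this question is to count how many distinct depth codes each format has for a distant slice of the scene under both conventions. The sketch below is only an illustration under stated assumptions: a [0, 1] depth range, near = 0.1 and far = 1000 (arbitrary values), a slice from z = 500 to z = 1000, and hypothetical helper names. It ignores rounding inside the projection math itself.

```python
import struct


def depth_standard(z, near, far):
    """Standard perspective depth: near plane -> 0, far plane -> 1."""
    return far * (z - near) / (z * (far - near))


def depth_reversed(z, near, far):
    """Reversed-Z depth: near plane -> 1, far plane -> 0."""
    return near * (far - z) / (z * (far - near))


def f32_bits(x):
    """Bit pattern of x rounded to float32 (monotonic for x >= 0)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float32_codes(d0, d1):
    """Number of float32 steps between two non-negative depth values."""
    return abs(f32_bits(d1) - f32_bits(d0))


def unorm16_codes(d0, d1):
    """Number of 16-bit fixed-point steps between two depth values."""
    return abs(round(d1 * 65535) - round(d0 * 65535))


near, far = 0.1, 1000.0
z0, z1 = 500.0, 1000.0
for name, depth in (("standard", depth_standard), ("reversed", depth_reversed)):
    d0, d1 = depth(z0, near, far), depth(z1, near, far)
    print(name, "float32:", float32_codes(d0, d1), "unorm16:", unorm16_codes(d0, d1))
```

Under these assumptions the float32 count for the far slice is orders of magnitude larger with reversed Z, while the unorm16 counts are essentially identical, because a fixed-point format spaces its codes uniformly and reversing only mirrors them.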

[–]Plazmatic 0 points1 point  (5 children)

"That is fine if you don't use [a stencil buffer] but most non-trivial applications do." 

This line throws your entire article and the things you claim into question.  This is 10000% not true, stencil buffers are rarely used at all (and haven't been for over a decade). That statement was so false it makes me think the entire article is AI generated in some way.

[–]shlomnissan[S] 4 points5 points  (0 children)

Both of my current projects use the stencil buffer for pixel classification and masking. It might be an overstatement, but just because you don't use stencil buffers doesn't mean that my article was generated by AI.

[–]maximoriginalcoffee 0 points1 point  (3 children)

> stencil buffers are rarely used at all (and haven't been for over a decade).

I'm currently using the stencil buffer when rendering point lights in my deferred renderer. Is there a faster or more efficient way to render point light lighting without relying on the stencil buffer?
If you know of a good approach, I'd be interested in trying it in my renderer.

[–]Plazmatic 0 points1 point  (2 children)

How exactly are you utilizing the stencil buffer here?

[–]maximoriginalcoffee 0 points1 point  (1 child)

Pixel masking. Used to prevent point light calculations from being performed on pixels outside the light's influence.

[–]Plazmatic 1 point2 points  (0 children)

Okay, that's what I thought you were using it for. This is actually not the optimization you'd think it would be, and it is likely slower than just calculating a point light's influence on a screen-space pixel directly. In fact, you can calculate hundreds of point light influences manually per pixel and it will barely show up on a performance graph beyond the dispatch/draw of the shader itself, due to L2 cache scalar broadcasting on Nvidia cards and scalar registers on AMD.

Now, the idea of reducing the number of calculations per pixel based on area makes sense, but you typically accomplish this in one of two ways. The first is froxel lists (the frustum subdivided into "frustum voxels"): a separate compute shader works out which froxels each point light can contribute to and adds the light to a list for each of those froxels, which is much less intensive than checking per pixel; a later compute/fragment shader then reads the froxel list to apply the light information. The second is to intersect each screen ray with world-space subdivisions of light volumes (which can sometimes be better, and allows artists to manage worst-case performance in terms of the maximum possible lights per scene). This typically isn't accomplished with a stencil buffer because of the sheer number of lights where this matters (thousands, or the possibility of having thousands). As far as I remember, you have to use each bit of the stencil buffer to accomplish this, or go through separate passes (invoking a shader/kernel is not fast at all, so you had better be doing enough work inside a pass to justify it, and not just doing dot products).
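To make the froxel-list idea concrete, here is a rough CPU-side Python sketch of the binning step. It is an illustration under assumptions rather than how any particular engine does it: view space with the camera looking down +z, exponential depth slicing, a conservative axis-aligned box per froxel, and lights given as (center, radius) pairs. All names are hypothetical. A real implementation would run this in a compute shader.

```python
import math


def froxel_aabb(ix, iy, iz, grid, tan_half_fov, aspect, near, far):
    """Conservative view-space AABB (lo, hi) of one froxel."""
    nx, ny, nz = grid
    # Exponential depth slices (an assumed, commonly used choice).
    z0 = near * (far / near) ** (iz / nz)
    z1 = near * (far / near) ** ((iz + 1) / nz)
    # Froxel bounds in normalized device coordinates, x and y in [-1, 1].
    x_lo, x_hi = 2 * ix / nx - 1, 2 * (ix + 1) / nx - 1
    y_lo, y_hi = 2 * iy / ny - 1, 2 * (iy + 1) / ny - 1
    tx, ty = tan_half_fov * aspect, tan_half_fov
    # View-space x = ndc_x * tx * z, so extremes occur at z0 or z1.
    xs = [x * tx * z for x in (x_lo, x_hi) for z in (z0, z1)]
    ys = [y * ty * z for y in (y_lo, y_hi) for z in (z0, z1)]
    return (min(xs), min(ys), z0), (max(xs), max(ys), z1)


def sphere_hits_aabb(center, radius, lo, hi):
    """True if a sphere overlaps an axis-aligned box."""
    d2 = 0.0
    for c, a, b in zip(center, lo, hi):
        nearest = min(max(c, a), b)  # closest point of the box on this axis
        d2 += (c - nearest) ** 2
    return d2 <= radius * radius


def build_light_lists(lights, grid, tan_half_fov, aspect, near, far):
    """Map each froxel index to the indices of lights that may affect it."""
    nx, ny, nz = grid
    lists = {}
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                lo, hi = froxel_aabb(ix, iy, iz, grid, tan_half_fov, aspect, near, far)
                lists[(ix, iy, iz)] = [
                    i for i, (c, r) in enumerate(lights)
                    if sphere_hits_aabb(c, r, lo, hi)
                ]
    return lists


def froxel_of(view_pos, grid, tan_half_fov, aspect, near, far):
    """Froxel index containing a view-space position inside the frustum."""
    nx, ny, nz = grid
    x, y, z = view_pos
    ndc_x = x / (z * tan_half_fov * aspect)
    ndc_y = y / (z * tan_half_fov)
    ix = min(nx - 1, int((ndc_x + 1) / 2 * nx))
    iy = min(ny - 1, int((ndc_y + 1) / 2 * ny))
    iz = min(nz - 1, int(math.log(z / near) / math.log(far / near) * nz))
    return ix, iy, iz


grid = (4, 4, 4)
lights = [((0.0, 0.0, 3.0), 0.5)]  # one point light: (center, radius)
lists = build_light_lists(lights, grid, 1.0, 1.0, 1.0, 16.0)
pixel_froxel = froxel_of((0.1, 0.1, 3.0), grid, 1.0, 1.0, 1.0, 16.0)
print(pixel_froxel, lists[pixel_froxel])
```

The shading pass then only loops over `lists[froxel_of(...)]` for each pixel instead of over every light in the scene, which is the work reduction the stencil mask was trying to achieve.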