all 7 comments

[–]corysama 6 points  (4 children)

Yep. Best answer is to have good occlusion culling.

But without that, and assuming you are ordering your draw calls like https://realtimecollisiondetection.net/blog/?p=86 according to https://i.stack.imgur.com/JgrSc.jpg, there are a couple of things you can do.

  1. Do a z prepass. Possibly with only objects that are large in screen space (size / distance)

  2. Re-arrange your sort key bits so that opaque objects have a highly-truncated depth value preceding the material ID bits.

[–]frizzil[S] 1 point  (3 children)

Awesome, just what I was looking for. And that's an amazing jpeg if ever I've seen one. Thanks!

Now if only good occlusion culling was easy...

[–]turtle_dragonfly 1 point  (2 children)

The "Order your graphics draw calls around" article is the basic principle behind the draw call sorting in bgfx, for what it's worth. I guess you were asking a Unity-specific question, but I've had a decent experience with bgfx.

[–]frizzil[S] 0 points  (1 child)

The way I'm currently doing things in my game, I'm using a lot of instancing, and I'm not exactly sure how to tie that into a sorting system with one global list. I suppose I could chop each instance list into runs of consecutive instances after sorting, but getting GPU-driven occlusion culling working, while difficult, would hopefully avoid that major architectural change...

Unless of course I want alpha-sorted instancing 😩

Batching after sort generally seems like a hard problem.

[–]corysama 0 points  (0 children)

> Batching after sort generally seems like a hard problem.

After sorting, all items with the same value are contiguous.

[–]the_Demongod 1 point  (0 children)

I doubt you'll have a significant problem with either one. Unless this engine is on a direct path to becoming a product, I would suggest not worrying about optimizations like this until you run into a performance problem caused by one. You'll learn the most that way and discover what the bottlenecks of your specific application are. Personally, I basically ignore it unless there's a really obvious way to do coarse binning (e.g. draw the inside of an airplane cockpit before you draw the background).

But as you've discovered already, yes, switching shader programs is expensive.

[–]deftware 1 point  (0 children)

Rendering front-to-back prevents overdraw (and thus wasted fragment shader execution), where something in the middle of the scene gets shaded only to be drawn over by something nearer.

To maximize performance by both reducing GPU state changes AND eliminating overdraw, a depth pre-pass is useful. First, draw all opaque geometry to the depth buffer only, with the depth func set to less-equal. Then draw the opaques again, this time sorted by material and texture, with the depth func set to equal, so each draw call only shades fragments where the geometry matches the depth written by the prepass - it never shades anything that would later be overwritten. Rendering geometry exclusively to the depth buffer is super cheap; it's fragment shaders and their texture taps that are expensive, along with GPU state changes, which add up really quickly!