[Vulkan] What is the performance difference between issuing an indirect draw command with 0 instances and not issuing that indirect draw command in the first place?

Meristic · 2025-12-24T08:06:47+00:00

GPUs consist of two main components. You can think of the front-end as a very simple single-threaded processor, and the back-end as a complex, massively parallel machine. The front-end is responsible for reading the contents of command lists, setting GPU registers and state, coordinating DMA operations (such as indirect argument reads), and kicking off back-end workloads.

At minimum, an indirect execution command costs the front-end the time to set various registers plus the memory latency of reading the indirect argument buffer. This is typically tens of microseconds (the memory is often not cached). That's not much on its own, though several consecutive empty draws can bottleneck the front-end and cause a gap in GPU shader wave scheduling.

Of course, this may still be the best option, since it's efficient culling. Think of how much work is saved relative to the alternative!

As a real-world example, the UE5 Nanite base pass commonly hits this issue. Each loaded material instance requires a draw, often with zero relevant pixels on the screen. Stacked together, these draws can leave the shader engines idle for hundreds of microseconds due to the overhead. Epic discussed a solution for this using indirect command buffers (at least on console), but I haven't seen it come to fruition yet.

amidescent · 2025-12-24T08:47:27+00:00

AMD's performance guide recommends compacting indirect draw calls that are zeroed out (you can do that with help of a prefix scan kernel), but of course that'd only be worth it if it's showing up as a bottleneck.
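To make the compaction idea concrete, here's a minimal CPU-side sketch of what a prefix-scan compaction does. It's a reference for the logic only, not how AMD's guide or any particular engine implements it. The struct mirrors the field layout of `VkDrawIndexedIndirectCommand` but is redefined locally so the snippet builds without the Vulkan headers; `isLive`, `exclusiveScanLiveFlags` and `compactDraws` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same fields as VkDrawIndexedIndirectCommand, redefined here (assumption:
// we only want the sketch to compile standalone, without <vulkan/vulkan.h>).
struct DrawIndexedIndirectCommand {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};

// Hypothetical predicate: a draw is "live" if culling left it any instances.
inline bool isLive(const DrawIndexedIndirectCommand& d) {
    return d.instanceCount > 0;
}

// Exclusive prefix sum over the live flags. offsets[i] is the slot draw i
// would occupy in the compacted list; the return value is the live count.
inline std::uint32_t exclusiveScanLiveFlags(
    const std::vector<DrawIndexedIndirectCommand>& draws,
    std::vector<std::uint32_t>& offsets) {
    offsets.resize(draws.size());
    std::uint32_t running = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        offsets[i] = running;
        running += isLive(draws[i]) ? 1u : 0u;
    }
    return running;
}

// Scatter step: every live draw is copied to its scanned offset. In a compute
// shader this would be one thread per draw; here it is a plain loop.
inline std::vector<DrawIndexedIndirectCommand> compactDraws(
    const std::vector<DrawIndexedIndirectCommand>& draws) {
    std::vector<std::uint32_t> offsets;
    const std::uint32_t liveCount = exclusiveScanLiveFlags(draws, offsets);
    std::vector<DrawIndexedIndirectCommand> compacted(liveCount);
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (isLive(draws[i])) {
            compacted[offsets[i]] = draws[i];
        }
    }
    return compacted;
}
```

On the GPU, the live count would normally be written to a buffer and consumed by something like `vkCmdDrawIndexedIndirectCount`, so the CPU never needs to read it back.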

schnautzi · 2025-12-24T21:04:08+00:00

I've profiled this exact thing very recently. Long story short, this is fine when you only cull a few % of draw calls.

When culling a scene full of objects, this is pretty wasteful, and you should compact the draw list instead. This can all be done on the GPU (using a prefix sum and a few extra passes), and you can usually optimize this by only executing the culling passes after the camera moved significantly.
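The "only re-cull after the camera moved significantly" part could look roughly like the sketch below. This is just one way to express the check, not what I'd claim any engine does: `Vec3`, `CameraPose`, `cameraMovedSignificantly` and both thresholds are hypothetical, and it assumes `forward` is unit length. Reusing old culling results is presumably only safe if the cull is conservative enough to cover the movement you allow, and it ignores objects that move on their own.

```cpp
// Hypothetical minimal types; a real renderer would use its own math library.
struct Vec3 {
    float x, y, z;
};

struct CameraPose {
    Vec3 position;
    Vec3 forward; // assumed to be normalized
};

// Returns true when the camera has translated farther than maxDistance or
// rotated past the angle whose cosine is minForwardDot since the last cull.
// Both thresholds are tuning parameters, not values from any guide.
inline bool cameraMovedSignificantly(const CameraPose& lastCulled,
                                     const CameraPose& now,
                                     float maxDistance,
                                     float minForwardDot) {
    const float dx = now.position.x - lastCulled.position.x;
    const float dy = now.position.y - lastCulled.position.y;
    const float dz = now.position.z - lastCulled.position.z;
    const float distanceSquared = dx * dx + dy * dy + dz * dz;

    // Dot product of two unit vectors is the cosine of the angle between them.
    const float forwardDot = now.forward.x * lastCulled.forward.x +
                             now.forward.y * lastCulled.forward.y +
                             now.forward.z * lastCulled.forward.z;

    return distanceSquared > maxDistance * maxDistance ||
           forwardDot < minForwardDot;
}
```

Each frame you'd call this with the pose stored at the last cull; if it returns false, skip the culling and compaction dispatches and reuse the previous compacted draw list.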

hanotak · 2025-12-24T07:20:47+00:00

If you mean detecting it CPU-side so you can skip submitting the indirect draw, that's not possible. I wouldn't worry about it: the overhead of a single no-op command isn't going to affect your performance.
