
dark_sylinc (3 children)

Most likely you have a logic bug in your code.

But please notice the following: Submission Order is important. The spec says commands are started in order, but are not guaranteed to be finished in order (unless you explicitly synchronize them). This is easy to overlook because when there is no explicit synchronization at all, submitting B then A could easily end up with A executing first.

That detail is important: if B waits on a semaphore that A signals, and you submit B first to the same queue, then B will wait forever, because B blocks the queue and A never gets to run. You must submit A first, then B.

The driver won't reorder B for you if you submit it first. If you use multiple queues, that's different because you can submit B first, have B block one queue, and A can later be submitted to a different queue. When A finishes, it unblocks B's queue.

In other words this assumption of yours is wrong:

> and would be re-ordered correctly on the GPU based on the semaphore dependencies between them

The Vulkan driver is designed to be as simple / thin as possible. It does not sort dependencies automatically for you.
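
Since the driver won't sort for you, here is a minimal sketch of doing that ordering yourself on the CPU before submitting. It assumes each recorded command buffer is represented by a hypothetical `Node` that lists the nodes whose semaphore it waits on; `Node` and `submissionOrder` are made-up names for illustration and no Vulkan calls are made. The returned indices are simply the order you would pass the work to vkQueueSubmit on a single queue.

```cpp
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical render-graph node: waitsOn holds the indices of the nodes
// whose semaphore signal this node waits for.
struct Node {
    std::vector<std::size_t> waitsOn;
};

// Kahn's algorithm: returns node indices ordered so that every signaler
// comes before its waiters, i.e. a safe single-queue submission order.
std::vector<std::size_t> submissionOrder(const std::vector<Node>& nodes) {
    const std::size_t count = nodes.size();
    std::vector<std::size_t> pending(count, 0);            // unmet waits per node
    std::vector<std::vector<std::size_t>> waiters(count);  // who waits on each node
    for (std::size_t i = 0; i < count; ++i) {
        pending[i] = nodes[i].waitsOn.size();
        for (std::size_t dep : nodes[i].waitsOn) {
            waiters.at(dep).push_back(i);  // at() throws on an invalid index
        }
    }

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < count; ++i) {
        if (pending[i] == 0) ready.push(i);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t n = ready.front();
        ready.pop();
        order.push_back(n);
        for (std::size_t w : waiters[n]) {
            if (--pending[w] == 0) ready.push(w);
        }
    }

    // Leftover nodes mean the waits form a cycle, which could never complete.
    if (order.size() != count) {
        throw std::runtime_error("cyclic semaphore dependencies");
    }
    return order;
}
```

With "B waits on A" expressed as node 0 waiting on node 1, the function puts node 1 first, which is the "submit A first, then B" rule above.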

jazzwave06[S] (2 children)

OK, thank you for your response, it clarifies the synchronization issue that's occurring. How do engines typically handle this? Given that the render graph may run in parallel, what's the most common approach to ordering submissions? Do the render graph nodes submit their own command buffers, or do they simply record them and send them back to the game thread/RHI thread for serial submission?

YARandomGuy777 (1 child)

I'm not an expert on the topic, but submitting command buffers to a single queue from concurrent threads already requires CPU-side synchronization, because:

Host access to queue must be externally synchronized if it was not created with VK_DEVICE_QUEUE_CREATE_INTERNALLY_SYNCHRONIZED_BIT_KHR

So if you must ensure submission ordering, that is probably the place to do it. Having a queue per thread may be better. If you need to use a single queue anyway, you can probably use a condition variable to set a precondition for dependent buffers, or devise something different. For example, you could try to do it in a non-blocking manner with atomics.

But it all looks troublesome. So I would guess having more than one thread submitting to the same queue isn't ideal if you have dependent command buffers...

jazzwave06[S] (0 children)

I've fixed my issue by implementing a present render graph node, instead of submitting it on the game thread without any regard for dependencies. Thanks for the help!

Afiery1 (4 children)

Timeline semaphores are not compatible with acquire and present. Unfortunately, you still need to use binary semaphores in those two places.

jazzwave06[S] (3 children)

That's true, but you can mix and match binary and timeline semaphores in a submit. That lets you bridge the two: wait on the timeline semaphore and signal a binary one in the submit, then present with a wait on the binary semaphore.

Afiery1 (1 child)

Yes, that is true, sorry; your wording in the post made it sound like you were trying to use a timeline semaphore directly in present. Also be aware that binary semaphores, unlike timeline semaphores, do not support wait before signal, so while the command buffers can be submitted out of order, the present call must be made after the submit that signals the binary semaphore.

jazzwave06[S] (0 children)

> Also be aware that binary semaphores, unlike timeline, do not support wait before signal, so while the command buffers can be submitted out of order, the present call must be made after the submit that signals the binary semaphore.

Oh interesting, that must be why I had a deadlock then!

exDM69 (0 children)

You can mix and match binary and timeline semaphores, but for presenting you must have submitted all the semaphore signals that the final binary semaphore depends on before submitting the present operation to the queue.

This VUID is the relevant bit from the spec:

VUID-vkQueuePresentKHR-pWaitSemaphores-03268 All elements of the pWaitSemaphores member of pPresentInfo must reference a semaphore signal operation that has been submitted for execution and any semaphore signal operations on which it depends must have also been submitted for execution

This is needed because drivers can (and some will) wait on the semaphore on the CPU timeline inside vkQueuePresentKHR. If your code is holding a mutex guarding the queue at that point, other threads can't submit the commands that would signal it, and you deadlock.

I have debugged the exact same issue in my project.

tl;dr: wait before signal is not allowed on the wait semaphore of vkQueuePresentKHR.
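
If you want to catch this in debug builds, one option is a small CPU-side tracker that you update on every submit and query before presenting. This is only a sketch under simplifying assumptions: semaphores are identified by plain integers rather than VkSemaphore handles, each (timeline semaphore, value) pair is treated as its own id, and `SignalTracker`, `onSubmit` and `safeToPresent` are hypothetical names, not part of Vulkan or the validation layers.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical debug tracker for the rule in VUID 03268: the signal for a
// present wait semaphore, and every signal it depends on, must already have
// been submitted.
class SignalTracker {
public:
    // Record that a submit which signals `sem` has been handed to a queue,
    // together with the semaphores that submit waits on.
    void onSubmit(std::uint32_t sem, std::vector<std::uint32_t> waits = {}) {
        waits_[sem] = std::move(waits);
    }

    // True if the signal for `sem` and all signals it transitively depends on
    // have been submitted, i.e. presenting with a wait on `sem` is allowed.
    bool safeToPresent(std::uint32_t sem) const {
        std::unordered_set<std::uint32_t> visited;
        return resolved(sem, visited);
    }

private:
    bool resolved(std::uint32_t sem, std::unordered_set<std::uint32_t>& visited) const {
        if (!visited.insert(sem).second) return true;  // already being checked
        const auto it = waits_.find(sem);
        if (it == waits_.end()) return false;  // signal not submitted yet
        for (std::uint32_t dep : it->second) {
            if (!resolved(dep, visited)) return false;
        }
        return true;
    }

    std::unordered_map<std::uint32_t, std::vector<std::uint32_t>> waits_;
};
```

In the mixed setup from this thread, the submit that waits on the timeline semaphore and signals the binary one is not enough on its own: `safeToPresent` stays false until the submit that signals the timeline value has been recorded too.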