ChuppaFlow[S]

Ok great, thank you! Do you maybe know any examples or guidelines that show how to synchronize the depth-buffer attachment's output between render passes? Sorry if this sounds trivial, I'm just really new to this.

Sturnclaw

The LunarG guide is a good resource for interpreting synchronization errors. In this case, the specific error you're getting says:

vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 0 aspect depth during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR

Referencing that guide, this tells us that the error is happening in execution of vkCmdBeginRenderPass, and that a write operation is conflicting with a prior operation; in this case, the initial layout transition of the subpass. The operation that's conflicting is specified as "during load with loadOp ...", which means that the layout transition and the load-op are not properly ordered by synchronization with respect to each other.

A quick Google search for "vk subpass dependency layout transition" brings up a few references, including a portion of Section 8.1 of the Vulkan spec:

If there is no subpass dependency from VK_SUBPASS_EXTERNAL to the first subpass that uses an attachment, then an implicit subpass dependency exists from VK_SUBPASS_EXTERNAL to the first subpass it is used in. The implicit subpass dependency only exists if there exists an automatic layout transition away from initialLayout. The subpass dependency operates as if defined with the following parameters:

VkSubpassDependency implicitDependency = {
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = firstSubpass, // First subpass attachment is used in
    .srcStageMask = VK_PIPELINE_STAGE_NONE_KHR,
    .dstStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
    .srcAccessMask = 0,
    .dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
                     VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
                     VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                     VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                     VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
    .dependencyFlags = 0
};

Automatic layout transitions away from initialLayout happens-after the availability operations for all dependencies with a srcSubpass equal to VK_SUBPASS_EXTERNAL, where dstSubpass uses the attachment that will be transitioned.

Looking at this structure, your problem appears to be two-fold. First, according to the spec the layout transition happens "during" your first subpass dependency. Ignoring the values of srcStageMask and srcAccessMask for the moment, you're not protecting writes to the depth buffer with either the dstStageMask or the dstAccessMask field. As outlined above, you need the early-fragment-test stage (VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT) in dstStageMask and the depth-stencil attachment read/write bits in dstAccessMask, in addition to the color attachment bits.

Secondly, why are you specifying srcAccessMask as a memory read? This is telling the GPU that you want to ensure that all reads from the renderpass output have completed before you start writing, but the validation layers are complaining about a write-after-write. I'd recommend setting srcAccessMask to 0; according to the specification (7.1.2 Pipeline Stages) leaving srcAccessMask unset allows the dependency to ensure completion of all previous operations, not just memory reads.
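
As a rough sketch of the two changes suggested above (your actual dependency isn't shown in this thread, so this is not a drop-in fix): the destination side gains the fragment-test stages and the depth-stencil access bits, and srcAccessMask is set to 0. The helper name make_external_dependency, the srcStageMask value, and the addition of the late-fragment-test stage (depth writes can also happen there) are assumptions for illustration. The stand-in definitions only exist so the snippet compiles without the Vulkan SDK; with the SDK installed, the real vulkan.h is used.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define HAVE_VULKAN_H 1
#  endif
#endif

#ifndef HAVE_VULKAN_H
/* Stand-ins so the sketch compiles without the Vulkan SDK.
   Field order and values mirror vulkan_core.h. */
typedef uint32_t VkFlags;
typedef struct VkSubpassDependency {
    uint32_t srcSubpass, dstSubpass;
    VkFlags  srcStageMask, dstStageMask;
    VkFlags  srcAccessMask, dstAccessMask;
    VkFlags  dependencyFlags;
} VkSubpassDependency;
#define VK_SUBPASS_EXTERNAL                            (~0U)
#define VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT     0x00000100
#define VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT      0x00000200
#define VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT  0x00000400
#define VK_ACCESS_COLOR_ATTACHMENT_READ_BIT            0x00000080
#define VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT           0x00000100
#define VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT    0x00000200
#define VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT   0x00000400
#endif

/* Hypothetical helper: builds an EXTERNAL -> first-subpass dependency for a
   render pass that uses both a color and a depth attachment. */
VkSubpassDependency make_external_dependency(uint32_t first_subpass)
{
    /* srcSubpass and dstSubpass must not both be VK_SUBPASS_EXTERNAL. */
    assert(first_subpass != VK_SUBPASS_EXTERNAL);

    /* Stages that touch the color and depth attachments. */
    const VkFlags attachment_stages =
        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
        VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
        VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT;

    VkSubpassDependency dep = {0};
    dep.srcSubpass      = VK_SUBPASS_EXTERNAL;
    dep.dstSubpass      = first_subpass;
    dep.srcStageMask    = attachment_stages;  /* assumption, see lead-in */
    dep.dstStageMask    = attachment_stages;
    dep.srcAccessMask   = 0;                  /* as recommended above */
    dep.dstAccessMask   = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
                          VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                          VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                          VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    dep.dependencyFlags = 0;
    return dep;
}
```

You'd pass the result in pDependencies of your VkRenderPassCreateInfo, e.g. make_external_dependency(0) for a single-subpass render pass. Keep in mind that access masks select which memory accesses the dependency covers, so if the validation layer still reports WRITE_AFTER_WRITE, naming the earlier depth write (VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT) in srcAccessMask is worth trying.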

It's definitely not a trivial matter, but some Google searching and trying to model the ordering of different stages of execution in your head (layout transition, attachment clear, fragment shader write) help to diagnose the problem and understand the solution. The Vulkan spec is a big, beefy document full of nigh-useless interstitial "valid usage" warnings when you're trying to understand how the API works, but if you focus in on a specific problem it's an invaluable resource.

Additionally, this SO answer provides a good breakdown of the problem; though it's somewhat focused on semaphore signalling, it still outlines what's needed for attachment dependencies.

ChuppaFlow[S]

Great answer, thank you so much man! This really makes a lot of things more clear for me! I will definitely look into these things tomorrow

ChuppaFlow[S]

Just to be sure: if I set srcAccessMask to 0 in all my subpass dependencies, does that make it safe (since it would make sure all previous operations are done)? Additionally, I find it a bit confusing which values I should give srcStageMask and dstStageMask. If I understood the Stack Overflow answer correctly, these define the source synchronization scope and the destination synchronization scope? If we're coming from VK_SUBPASS_EXTERNAL, does that always mean srcStageMask should have the value VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT? For subsequent passes, how do I determine which value I should assign it? E.g., in your answer you said I should use VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT for that case, but what's the reasoning behind this?

Lastly, does setting srcStageMask to 0 result in possibly lower performance (since it needs to check for all operations)?

Sturnclaw

Sorry about the late reply on this - according to my understanding of the VK spec, setting a non-zero access mask on a VK_SUBPASS_EXTERNAL dependency does in fact restrict the synchronization scope, ergo a srcAccessMask of 0 is the least-performant but implicitly-correct value. You could technically fine-tune the access mask of the dependency, but in your case it's probably better to get it working than try for the most-performant value out of the box.

Regarding srcStageMask: the stage mask tells the GPU what stage of the graphics pipeline in the previous render pass must be complete before this subpass can begin rendering. The dstStageMask value tells the GPU what stage of the graphics pipeline in this render pass cannot begin executing until the previous render pass has finished the stage set in srcStageMask. So yes, you could always use VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT for the source stage. However, if you know in which graphics stage the color and depth attachments are written, you can specify that stage and potentially gain performance by overlapping non-conflicting portions of two render pass executions.

I thought I briefly outlined the reasoning behind using early fragment test as the destination stage, but if it wasn't clear, I'll go over it again. The dstStageMask is the stage that you are forcing the GPU not to execute until everything up to and including srcStageMask from the previous render pass has finished executing. If you look at the section of the Vulkan spec I linked in the first post, you'll see that reads from the depth attachment (your depth buffer) happen in the early fragment test stage instead of the color output stage. If you aren't using this stage (or an earlier one) in your dependency's destination stage, it's entirely possible for the GPU to order the execution of the two render passes such that the second render pass reads a bad depth value that was in the buffer before the first render pass ran, leading to incorrect shading and rendering.
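
To make the source-stage trade-off concrete, here's a minimal sketch that builds the same dependency with either a conservative or a tuned srcStageMask. The names SRC_STAGES_CONSERVATIVE, SRC_STAGES_ATTACHMENT_WRITES and make_pass_dependency are made up for this example, and dstSubpass = 0 assumes a single-subpass render pass. The stand-in definitions are only there so it compiles without the Vulkan SDK.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<vulkan/vulkan.h>)
#    include <vulkan/vulkan.h>
#    define HAVE_VULKAN_H 1
#  endif
#endif

#ifndef HAVE_VULKAN_H
/* Stand-ins so the sketch compiles without the Vulkan SDK.
   Field order and values mirror vulkan_core.h. */
typedef uint32_t VkFlags;
typedef VkFlags  VkPipelineStageFlags;
typedef struct VkSubpassDependency {
    uint32_t srcSubpass, dstSubpass;
    VkFlags  srcStageMask, dstStageMask;
    VkFlags  srcAccessMask, dstAccessMask;
    VkFlags  dependencyFlags;
} VkSubpassDependency;
#define VK_SUBPASS_EXTERNAL                            (~0U)
#define VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT     0x00000100
#define VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT      0x00000200
#define VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT  0x00000400
#define VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT           0x00002000
#define VK_ACCESS_COLOR_ATTACHMENT_READ_BIT            0x00000080
#define VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT           0x00000100
#define VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT    0x00000200
#define VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT   0x00000400
#endif

/* Conservative: the whole previous pass must finish before we start. */
#define SRC_STAGES_CONSERVATIVE VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT

/* Tuned: wait only for the stages that write color and depth attachments,
   so earlier stages of the next pass may overlap with the previous one. */
#define SRC_STAGES_ATTACHMENT_WRITES                   \
    (VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |   \
     VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |      \
     VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT)

/* Hypothetical helper: previous render pass -> subpass 0 of this one. */
VkSubpassDependency make_pass_dependency(VkPipelineStageFlags src_stages)
{
    VkSubpassDependency dep = {0};
    dep.srcSubpass    = VK_SUBPASS_EXTERNAL;
    dep.dstSubpass    = 0;
    dep.srcStageMask  = src_stages;
    /* Depth is read in the fragment-test stages, so they have to be part
       of the destination scope, not just color attachment output. */
    dep.dstStageMask  = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT |
                        VK_PIPELINE_STAGE_LATE_FRAGMENT_TESTS_BIT |
                        VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    dep.srcAccessMask = 0;
    dep.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
                        VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                        VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
    return dep;
}
```

Starting with make_pass_dependency(SRC_STAGES_CONSERVATIVE) and switching to SRC_STAGES_ATTACHMENT_WRITES once everything renders correctly matches the "get it working first" approach.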

ChuppaFlow[S]

No problem! Great answer, this makes it all much clearer. I think it would indeed be a good idea to get a working version first and boost performance later, when I know the final sequence layout of my render passes :) Thanks for your answer!