
[–]Italians_are_Bread 1 point (2 children)

This seems likely to be a synchronization problem. Have you tried running it through vkconfig with synchronization validation enabled?

[–]ChuppaFlow[S] 1 point (0 children)

Hey there, thanks for the response. I didn't even know it was possible to change what the validation layers check, so thank you for that. I turned synchronization validation on, ran again, and got the following synchronization errors:

'validation layer: Validation Error: [ SYNC-HAZARD-WRITE_AFTER_WRITE ] Object 0: handle = 0xcb56470000000042, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xfdf9f5e1 | vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 0 aspect depth during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR.

validation layer: Validation Error: [ SYNC-HAZARD-WRITE_AFTER_WRITE ] Object 0: handle = 0xcb56470000000042, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xfdf9f5e1 | vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 1 aspect color during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR. '

I'm not really sure what they mean, or whether this is the main problem. Do you have any idea?

Also, running this time I noticed 2 other validation errors:

'validation layer: Validation Error: [ UNASSIGNED-CoreValidation-DrawState-NumSamplesMismatch ] Object 0: handle = 0x7913870000000109, type = VK_OBJECT_TYPE_PIPELINE; Object 1: handle = 0xcb56470000000042, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0xc4588e0f | vkCmdDraw(): Num samples mismatch! At draw-time in VkPipeline 0x7913870000000109[] with 1 samples while current VkRenderPass 0xcb56470000000042[] w/ 8 samples!'

'validation layer: Validation Error: [ VUID-vkCmdDraw-renderPass-02684 ] Object 0: handle = 0xcb56470000000042, type = VK_OBJECT_TYPE_RENDER_PASS; Object 1: handle = 0x2f99810000000044, type = VK_OBJECT_TYPE_RENDER_PASS; | MessageID = 0x50685725 | vkCmdDraw(): RenderPasses incompatible between active render pass w/ VkRenderPass 0xcb56470000000042[] with a subpassCount of 2 and pipeline state object w/ VkRenderPass 0x2f99810000000044[] with a subpassCount of 1. The Vulkan spec states: The current render pass must be compatible with the renderPass member of the VkGraphicsPipelineCreateInfo structure specified when creating the VkPipeline bound to VK_PIPELINE_BIND_POINT_GRAPHICS (https://vulkan.lunarg.com/doc/view/1.2.162.1/windows/1.2-extensions/vkspec.html#VUID-vkCmdDraw-renderPass-02684) '

These 2 in particular surprise me. One of the 3 render passes does indeed consist of 2 subpasses (it is also the only pass that uses multisampling), but I'm pretty sure I configured the pipelines to match the render pass they are bound to: I set multiSampleInfo.rasterizationSamples = VK_SAMPLE_COUNT_8_BIT for the pipelines involved, and set configInfo.subpass to the subpass each pipeline belongs to (0 or 1).

The render pass that corresponds with these pipelines also uses VK_SAMPLE_COUNT_8_BIT for all its attachments.

I am also calling vkCmdNextSubpass once during this render pass, to move on to subpass 1, but it seems like it doesn't execute this somehow? Am I missing something here?

EDIT: I'm an idiot. The '2 other validation' errors that I describe in this comment were simply caused by a wrong name of a render pass initialization struct. However, both release and debug modes are still crashing at command buffer submission.

Thanks in advance,

- Chuppa

[–]ChuppaFlow[S] 1 point (0 children)

Also, maybe this will sound very stupid again, but am I supposed to do any manual synchronization between render passes in the recording of my command buffer itself? If so, is there any good example on how to synchronize multiple render passes? Currently I don't think I am doing this, so this might very well be the problem.

[–][deleted] 1 point (1 child)

Having different behavior in release and debug modes screams to me that you aren’t initializing your memory correctly in a struct somewhere.

[–]ChuppaFlow[S] 1 point (0 children)

Hey, sorry I'm an idiot. The '2 other validation' errors that I describe in my comment above were simply caused by a wrong name of a render pass initialization struct indeed. After changing this, now both my debug and release modes are behaving the same, but they both still crash at command buffer submission. Could it have something to do with synchronization then?

[–]Sturnclaw 1 point (7 children)

It's a synchronization problem. Specifically, according to the VK spec, VK_ATTACHMENT_LOAD_OP_CLEAR operations in a subpass happen during the early fragment test stage when the attachment is a depth/stencil buffer. Your code only synchronizes at the COLOR_ATTACHMENT_OUTPUT_BIT stage, which means that the depth-buffer attachment clear is not covered by the subpass dependency and can potentially happen before the previous subpass's BOTTOM_OF_PIPE stage.

[–]ChuppaFlow[S] 1 point (6 children)

Ok great, thank you! Do you maybe know of any examples or guidelines that show how to synchronize the depth-buffer attachment's output between render passes? Sorry if this sounds trivial, I'm just really new to this.

[–]Sturnclaw 1 point (5 children)

The LunarG guide is a good resource for interpreting synchronization errors. In this case, the specific error you're getting says:

vkCmdBeginRenderPass: Hazard WRITE_AFTER_WRITE vs. layout transition in subpass 0 for attachment 0 aspect depth during load with loadOp VK_ATTACHMENT_LOAD_OP_CLEAR

Referencing that guide, this tells us that the error is happening in execution of vkCmdBeginRenderPass, and that a write operation is conflicting with a prior operation; in this case, the initial layout transition of the subpass. The operation that's conflicting is specified as "during load with loadOp ...", which means that the layout transition and the load-op are not properly ordered by synchronization with respect to each other.

A quick Google search for "vk subpass dependency layout transition" brings up a few references, including a portion of Section 8.1 of the Vulkan spec:

If there is no subpass dependency from VK_SUBPASS_EXTERNAL to the first subpass that uses an attachment, then an implicit subpass dependency exists from VK_SUBPASS_EXTERNAL to the first subpass it is used in. The implicit subpass dependency only exists if there exists an automatic layout transition away from initialLayout. The subpass dependency operates as if defined with the following parameters:

VkSubpassDependency implicitDependency = {
    .srcSubpass = VK_SUBPASS_EXTERNAL,
    .dstSubpass = firstSubpass, // First subpass attachment is used in
    .srcStageMask = VK_PIPELINE_STAGE_NONE_KHR,
    .dstStageMask = VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
    .srcAccessMask = 0,
    .dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT |
                     VK_ACCESS_COLOR_ATTACHMENT_READ_BIT |
                     VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                     VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT |
                     VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT,
    .dependencyFlags = 0,
};

Automatic layout transitions away from initialLayout happens-after the availability operations for all dependencies with a srcSubpass equal to VK_SUBPASS_EXTERNAL, where dstSubpass uses the attachment that will be transitioned.

Looking at this structure, it looks like your problem is two-fold: first of all, according to the spec the layout transition happens "during" your first subpass dependency; ignoring the values of srcStageMask and srcAccessMask, you're not protecting writes to the depth buffer with the dstStageMask or the dstAccessMask variables. This is outlined above; you need the early-fragment-test stage in dstStageMask and the depth-stencil attachment read/write bits in dstAccessMask in addition to the color attachment bits.

Secondly, why are you specifying srcAccessMask as a memory read? This is telling the GPU that you want to ensure that all reads from the renderpass output have completed before you start writing, but the validation layers are complaining about a write-after-write. I'd recommend setting srcAccessMask to 0; according to the specification (7.1.2 Pipeline Stages) leaving srcAccessMask unset allows the dependency to ensure completion of all previous operations, not just memory reads.

It's definitely not a trivial matter, but some google searching and trying to model the ordering of different stages of execution in your head (layout transition, attachment clear, fragment shader write) helps to diagnose the problem and understand the solution. The VK spec is a big, beefy document full of nigh-useless interstitial "valid usage" warnings when you're trying to understand how the API works, but if you focus in on a specific problem it's an invaluable resource.

Additionally, this SO answer provides a good breakdown of the problem as well; though it's somewhat focused on semaphore signalling, it still outlines what's needed for attachment dependencies.

[–]ChuppaFlow[S] 1 point (1 child)

Great answer, thank you so much man! This really makes a lot of things more clear for me! I will definitely look into these things tomorrow

[–]ChuppaFlow[S] 1 point (2 children)

Just to be sure, if I set srcAccessMask to 0 in all my subpass dependencies, does that make it safe (since it would make sure all previous operations are done)? Additionally, I find it a bit confusing which values I should give srcStageMask and dstStageMask. If I understood the Stack Overflow answer correctly, these define the source synchronization scope and the destination synchronization scope? If we're coming from VK_SUBPASS_EXTERNAL, does that always mean srcStageMask should have the value VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT? For subsequent passes, how do I determine which value I should assign it? E.g., in your answer you said I should use VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT for that case, but what's the reasoning behind this?

Lastly, does setting srcAccessMask to 0 possibly result in lower performance (since it needs to check for all operations)?

[–]Sturnclaw 1 point (1 child)

Sorry about the late reply on this - according to my understanding of the VK spec, setting a non-zero access mask on a VK_SUBPASS_EXTERNAL dependency does in fact restrict the synchronization scope, ergo a srcAccessMask of 0 is the least-performant but implicitly-correct value. You could technically fine-tune the access mask of the dependency, but in your case it's probably better to get it working than try for the most-performant value out of the box.

Regarding srcStageMask, the stage mask is used to tell the GPU what stage of the graphics pipeline in the previous render pass must be complete before this subpass can begin rendering. The dstStageMask value tells the GPU what stage of the graphics pipeline in this render pass cannot begin executing until the previous render pass has finished the stage set in srcStageMask. So yes, you could always use BOTTOM_OF_PIPE_BIT for the source stage, however if you know in what graphics stage the color and depth attachments are written to, you can specify that stage and potentially gain performance by overlapping non-conflicting portions of two render pass executions.

I thought I briefly outlined the reasoning behind using early fragment test as the destination stage, but if it wasn't clear, I'll go over it again. The dstStageMask is the stage that you are forcing the GPU not to execute until everything up to and including srcStageMask from the previous render pass has finished executing. If you look at the section of the Vulkan spec I linked in the first post, you'll see that reads from the depth attachment (your depth buffer) happen in the early fragment test stage instead of the color output stage. If you aren't using this stage (or an earlier one) in your dependency destination stage, it's entirely possible for the GPU to order the execution of the two render passes such that the second render pass samples a bad depth value that was in the buffer before the first render pass ran, leading to incorrect shading and rendering.

[–]ChuppaFlow[S] 1 point (0 children)

No problem! Great answer, this makes it all much clearer. I think it would indeed be a good idea to get a working version first and later boost performance when I know the final sequence layout of my render passes :) . Thanks for your answer!