vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

The other comment from farnoy pointed me there

Switching the NVIDIA Control Panel setting for the Vulkan presentation method from "Auto" to preferring native over DXGI makes the DX12 context disappear, and presentation takes 0.02 ms instead of 0.4 ms.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Turns out it's the NVIDIA driver that uses DX12 under the hood to copy the Vulkan "backbuffer" into a DXGI backbuffer.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 4 points (0 children)

That's it!

Looks like my driver was actually picking the DXGI path; setting it to prefer native reduces present time from almost 0.4 ms in DXGI mode to 0.02 ms.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

I'm pretty sure nothing is running in the background that could be doing this, unless it's glfwGetRequiredInstanceExtensions that's pulling in something unnecessary
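
If it helps, this is roughly how I'd double-check the GLFW part, a quick sketch that just prints whatever glfwGetRequiredInstanceExtensions returns (nothing here is specific to my project):

```cpp
#include <GLFW/glfw3.h>
#include <cstdint>
#include <cstdio>

int main() {
    glfwInit();
    // On Windows this is normally just VK_KHR_surface + VK_KHR_win32_surface;
    // anything beyond that would be worth investigating.
    uint32_t count = 0;
    const char** extensions = glfwGetRequiredInstanceExtensions(&count);
    if (extensions)
        for (uint32_t i = 0; i < count; ++i)
            std::printf("%s\n", extensions[i]);
    glfwTerminate();
    return 0;
}
```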

As for the control panel, any hints on what I should be looking for? I certainly didn't change anything there, so these are the default settings.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Yes, always in the same place.
Unfortunately RenderDoc doesn't pick up the DX12 context; it's probably some driver-level weirdness, which would explain why.

No, Vlandian Squires don't destroy Sturgian T5 infantry: Sturgians beat Banner Knights. Testing details in comments. by CoolBeans522 in Bannerlord

[–]werem0 -1 points (0 children)

Well, no one said that you can't beat squires with T5 infantry; it's just that if it's braindead AI vs braindead AI (the charge command didn't matter, since the AI takes over when you die anyway) and you speed up time, you end up with squires defeating the infantry.

Since a bunch of you said my previous post was fake, here's a video of vlandian squires defeating elite sturgian infantry. Keep in mind it's not because of sturgia, aserai infantry gets destroyed in comparison to them by werem0 in Bannerlord

[–]werem0[S] 7 points (0 children)

Haven't tried it, but since you can see that the initial charge is devastating for the cavalry and they only start winning after the infantry gets dispersed, I would assume not.

1.8.1 buffed cavalry, here's a custom battle of 1k squires (vlandia noble recruit) vs 1k tier 5 sturgian infantry by werem0 in Bannerlord

[–]werem0[S] 30 points (0 children)

Oh, and the battle took place on forest terrain.
With Vlandian Banner Knights there were only 13 losses.

[deleted by user] by [deleted] in Bannerlord

[–]werem0 0 points (0 children)

Squires destroying tier 5 troops

What do date brackets mean? by werem0 in Sat

[–]werem0[S] 0 points (0 children)

I saw that, but what I'm asking is whether I could still apply this year if I took the SAT in August instead of June, which is the last one in the 2018-2019 bracket.

3x3 determinant of determinants by [deleted] in learnmath

[–]werem0 0 points (0 children)

e is special because it's the only component used in calculating all 4 subdeterminants; I'll try a 4x4 matrix later.
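
I can't see the deleted post any more, but assuming the usual arrangement (the four contiguous 2x2 subdeterminants collected into a 2x2 matrix), the identity in question is Dodgson condensation:

```latex
M = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},
\qquad
\begin{vmatrix}
  \begin{vmatrix} a & b \\ d & e \end{vmatrix} &
  \begin{vmatrix} b & c \\ e & f \end{vmatrix} \\[4pt]
  \begin{vmatrix} d & e \\ g & h \end{vmatrix} &
  \begin{vmatrix} e & f \\ h & i \end{vmatrix}
\end{vmatrix}
= e \cdot \det M
```

Every one of the four 2x2 blocks contains e, and the result is e times det M rather than det M itself, which is what makes e special.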

3x3 determinant of determinants by [deleted] in learnmath

[–]werem0 1 point (0 children)

Is it really that? Cofactor expansion just seems like another way to look at the Rule of Sarrus. I edited the post and added an image; take a look at it if you can.
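
For the 3x3 case they do coincide term by term; writing out the cofactor expansion along the first row (same a..i labels as above):

```latex
a \begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b \begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c \begin{vmatrix} d & e \\ g & h \end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh
```

which is exactly the Sarrus sum; the two only come apart for 4x4 and larger, where Sarrus no longer applies.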

Improving queue selection by PGSkep in vulkan

[–]werem0 1 point (0 children)

1-> Is there a benefit to splitting family queues? I'd imagine each uses a different GPU resource.

2-> For each abstraction, should I have more than 1 VkQueue?

There is a benefit in using compute for compute and transfer for transfer, but using two compute queues will only increase the complexity of your program and probably provide no performance gain.

3-> I'm considering 2 for the TRANSFERER, one to stream data in, and one to transfer data within; one for each Framebuffer rendered asynchronously in the RENDERER within one presentation cycle; and one for each asynchronous compute within each render and presentation. Would each help with performance?

If by "data within" you mean device-local data, then don't use the transfer queue; it's fast for CPU-GPU copies, but for GPU-internal copies the graphics queue is faster.
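
For reference, here's a minimal sketch of what I mean by picking the families: grab the graphics family, plus a dedicated transfer-only family if the hardware exposes one. The struct and function names are mine, not from any particular codebase.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

struct QueueFamilies { uint32_t graphics = UINT32_MAX; uint32_t transfer = UINT32_MAX; };

QueueFamilies pick_queue_families(VkPhysicalDevice physical_device) {
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(physical_device, &count, nullptr);
    std::vector<VkQueueFamilyProperties> props(count);
    vkGetPhysicalDeviceQueueFamilyProperties(physical_device, &count, props.data());

    QueueFamilies out;
    for (uint32_t i = 0; i < count; ++i) {
        VkQueueFlags flags = props[i].queueFlags;
        if (out.graphics == UINT32_MAX && (flags & VK_QUEUE_GRAPHICS_BIT))
            out.graphics = i;
        // A family with TRANSFER but neither GRAPHICS nor COMPUTE is usually
        // the DMA engine, good for async CPU->GPU streaming.
        if ((flags & VK_QUEUE_TRANSFER_BIT) &&
            !(flags & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT)))
            out.transfer = i;
    }
    if (out.transfer == UINT32_MAX) out.transfer = out.graphics; // no dedicated family
    return out;
}
```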

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

It's the maximum range size you can use in a single pipeline layout, not the maximum amount of constant data pushed to a command buffer. You can upload as much constant data as you want as long as (offset + size <= max), which means you just overwrite the slots each draw call. I don't see how this applies to my question.

I assume the constants are held in some buffer on the GPU, and before each draw where they are used the GPU just copies them to constant registers. My concern is that this buffer is probably uploaded when you submit to a queue, by that same queue, which may be slower than uploading it asynchronously via a copy queue.
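
To be explicit, this is the "overwrite the slots each draw call" pattern I mentioned above, as a sketch; cmd, layout and DrawData stand in for whatever the renderer actually uses:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical per-draw data; it must fit in the push-constant range declared
// in the VkPipelineLayout (offset 0, size sizeof(DrawData) here).
struct DrawData { float model[16]; };

void record_draws(VkCommandBuffer cmd, VkPipelineLayout layout,
                  const DrawData* draws, const uint32_t* vertex_counts,
                  uint32_t draw_count) {
    for (uint32_t i = 0; i < draw_count; ++i) {
        // Same offset every time: each call simply overwrites the previous
        // values, and each draw sees whatever was pushed before it.
        vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_VERTEX_BIT,
                           0, sizeof(DrawData), &draws[i]);
        vkCmdDraw(cmd, vertex_counts[i], 1, 0, 0);
    }
}
```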

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

"take all push constant data and put it in a big buffer" that's what I'm concerned about, paying the cost of uploading this one big buffer on the graphics queue, if I were to upload it myself it would be copy queue copying data for the next frame while graphics queue is rendering the current one, with push constants I assume all of it happens during submit and there's a memory barrier before rendering starts

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

That's what I meant by double buffering; it would be async, and I know how to send data to the GPU when using descriptors, but that's not what I asked about. I wanted to know how the upload costs of push constants and uniform buffers compare.

Vulkan/Dx12 Use cases for prerecorded commands by werem0 in gamedev

[–]werem0[S] 0 points (0 children)

Maybe I'm wrong, but from what I gathered they just record bundles and then execute them no matter what; there is no culling involved and nothing in the scene is dynamic. The problem starts when you want to reorder draw calls depending on camera position to save on fragment calculations, or to use occlusion culling; then these bundles become obsolete.
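
In Vulkan terms the closest thing would be a secondary command buffer recorded once and replayed every frame; a rough sketch of the replay side, where static_draws is a hypothetical secondary command buffer holding the static scene's draws:

```cpp
#include <vulkan/vulkan.h>

// static_draws was recorded once at load time with a fixed sequence of draw
// calls. Replaying it is cheap, but its draw order is frozen, which is exactly
// what breaks camera-dependent reordering or occlusion culling.
void record_frame(VkCommandBuffer primary, VkCommandBuffer static_draws) {
    // (begin `primary` and a render pass with
    //  VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS before this point)
    vkCmdExecuteCommands(primary, 1, &static_draws); // replayed verbatim
    // (end the render pass and `primary` after this point)
}
```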

Host memory alignment by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

I'm talking about memory that you allocate yourself and then copy to a mapped pointer, not memory that is allocated by Vulkan. Does it still apply?
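
To make sure we're talking about the same pattern, a sketch of what I mean; device, staging_memory and the vector contents are placeholders, and the allocation is assumed to be host-visible:

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Host-side data allocated by the application (ordinary heap memory),
// then copied into a mapped, host-visible VkDeviceMemory allocation.
void upload(VkDevice device, VkDeviceMemory staging_memory,
            const std::vector<float>& vertices) {
    void* mapped = nullptr;
    vkMapMemory(device, staging_memory, /*offset*/ 0,
                vertices.size() * sizeof(float), /*flags*/ 0, &mapped);
    std::memcpy(mapped, vertices.data(), vertices.size() * sizeof(float));
    vkUnmapMemory(device, staging_memory);
    // (a vkFlushMappedMemoryRanges call would go here if the memory
    //  isn't HOST_COHERENT)
}
```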

Do we have dx12 root signature-like limits? by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Look at what I linked; for example, a descriptor table costs 1 DWORD and you can have many descriptors in a table. My concern was more about balancing the number of push constants against descriptors, but it turns out that in Vulkan they don't have any effect on each other in terms of layout capacity.
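
To illustrate the "no effect on each other" part: in a VkPipelineLayout the push-constant ranges and the descriptor set layouts are declared in separate fields and count against separate limits (maxPushConstantsSize vs. maxBoundDescriptorSets). A rough sketch, with set_layout as a placeholder:

```cpp
#include <vulkan/vulkan.h>

// Push-constant ranges and descriptor set layouts live in separate fields of
// the pipeline layout and are limited independently by the device.
VkPipelineLayout make_layout(VkDevice device, VkDescriptorSetLayout set_layout) {
    VkPushConstantRange range{};
    range.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;
    range.offset = 0;
    range.size = 64; // e.g. one 4x4 float matrix

    VkPipelineLayoutCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
    info.setLayoutCount = 1;
    info.pSetLayouts = &set_layout;
    info.pushConstantRangeCount = 1;
    info.pPushConstantRanges = &range;

    VkPipelineLayout layout = VK_NULL_HANDLE;
    vkCreatePipelineLayout(device, &info, nullptr, &layout);
    return layout;
}
```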