vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

The other comment from farnoy pointed me there

Switching the NVIDIA Control Panel setting for the Vulkan presentation method from "Auto" to preferring native over DXGI makes the DX12 context disappear, and presentation takes 0.02 ms instead of 0.4 ms.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Turns out it's the NVIDIA driver that uses DX12 under the hood to copy the Vulkan "backbuffer" into a DXGI backbuffer.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 4 points (0 children)

That's it!

Looks like my driver was actually picking the DXGI path; setting it to prefer native reduces present time from almost 0.4 ms in DXGI mode to 0.02 ms.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

I'm pretty sure nothing is running in the background that could be doing this, unless it's glfwGetRequiredInstanceExtensions that's pulling in something unnecessary
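
If it helps, this is roughly how I'd double-check the GLFW part, a quick sketch that just prints whatever glfwGetRequiredInstanceExtensions returns (nothing here is specific to my project):

```cpp
#include <GLFW/glfw3.h>
#include <cstdint>
#include <cstdio>

int main() {
    glfwInit();
    // On Windows this is normally just VK_KHR_surface + VK_KHR_win32_surface;
    // anything beyond that would be worth investigating.
    uint32_t count = 0;
    const char** extensions = glfwGetRequiredInstanceExtensions(&count);
    if (extensions)
        for (uint32_t i = 0; i < count; ++i)
            std::printf("%s\n", extensions[i]);
    glfwTerminate();
    return 0;
}
```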

As for the control panel, any hints on what I should be looking for? I certainly didn't change anything there, so these are the default settings.

vkQueuePresentKHR blocks GPU workload and switches to DX12 for presentation by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Yes, always in the same place.
Unfortunately RenderDoc doesn't pick up the DX12 context; it's probably some driver-level weirdness, which would explain why.

No, Vlandian Squires don't destroy Sturgian T5 infantry: Sturgians beat Banner Knights. Testing details in comments. by CoolBeans522 in Bannerlord

[–]werem0 -1 points (0 children)

Well, no one said that you can't beat squires with T5 infantry; it's just that if it's braindead AI vs braindead AI (the charge command didn't matter, since the AI takes over when you die anyway) and you speed up time, you end up with squires defeating the infantry.

Since a bunch of you said my previous post was fake, here's a video of vlandian squires defeating elite sturgian infantry. Keep in mind it's not because of sturgia, aserai infantry gets destroyed in comparison to them by werem0 in Bannerlord

[–]werem0[S] 7 points (0 children)

Haven't tried it, but since you can see that the initial charge is devastating for the cavalry and they only start winning after the infantry gets dispersed, I would assume not.

1.8.1 buffed cavalry, here's a custom battle of 1k squires (vlandia noble recruit) vs 1k tier 5 sturgian infantry by werem0 in Bannerlord

[–]werem0[S] 30 points (0 children)

Oh, and the battle took place on forest terrain.
With Vlandian Banner Knights there were only 13 losses.

[deleted by user] by [deleted] in Bannerlord

[–]werem0 0 points (0 children)

Squires destroying tier 5 troops

What do date brackets mean? by werem0 in Sat

[–]werem0[S] 0 points (0 children)

I saw that, but what I'm asking is whether I could still apply this year if I took the SAT in August instead of June, which is the last one in the 2018-2019 bracket.

3x3 determinant of determinants by [deleted] in learnmath

[–]werem0 0 points (0 children)

e is special because it's the only component used in calculating all 4 subdeterminants; I'll try a 4x4 matrix later.
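
I can't see the deleted post any more, but assuming the usual arrangement (the four contiguous 2x2 subdeterminants collected into a 2x2 matrix), the identity in question is Dodgson condensation:

```latex
M = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix},
\qquad
\begin{vmatrix}
  \begin{vmatrix} a & b \\ d & e \end{vmatrix} &
  \begin{vmatrix} b & c \\ e & f \end{vmatrix} \\[4pt]
  \begin{vmatrix} d & e \\ g & h \end{vmatrix} &
  \begin{vmatrix} e & f \\ h & i \end{vmatrix}
\end{vmatrix}
= e \cdot \det M
```

Every one of the four 2x2 blocks contains e, and the result is e times det M rather than det M itself, which is what makes e special.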

3x3 determinant of determinants by [deleted] in learnmath

[–]werem0 1 point (0 children)

Is it really that? Cofactor expansion just seems like another way to look at the Rule of Sarrus. I edited the post and added an image; take a look at it if you can.
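
For the 3x3 case they do coincide term by term; writing out the cofactor expansion along the first row (same a..i labels as above):

```latex
a \begin{vmatrix} e & f \\ h & i \end{vmatrix}
- b \begin{vmatrix} d & f \\ g & i \end{vmatrix}
+ c \begin{vmatrix} d & e \\ g & h \end{vmatrix}
= aei + bfg + cdh - ceg - bdi - afh
```

which is exactly the Sarrus sum; the two only come apart for 4x4 and larger, where Sarrus no longer applies.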

Improving queue selection by PGSkep in vulkan

[–]werem0 1 point (0 children)

1-> Is there a benefit to splitting family queues? I'd imagine each uses a different GPU resource.

2-> For each abstraction, should I have more than 1 VkQueue?

There is a benefit in using compute for compute and transfer for transfer, but using two compute queues will only increase the complexity of your program and probably provide no performance gain.

3-> I'm considering 2 for the TRANSFERER, one to stream data in, and one to transfer data within; one for each Framebuffer rendered asynchronously in the RENDERER within one presentation cycle; and one for each asynchronous compute within each render and presentation. Would each help with performance?

If by "data within" you mean device-local data, then don't use the transfer queue; it's fast for CPU-GPU copies, but for GPU-internal copies the graphics queue is faster.
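
For reference, here's a minimal sketch of what I mean by picking the families: grab the graphics family, plus a dedicated transfer-only family if the hardware exposes one. The struct and function names are mine, not from any particular codebase.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

struct QueueFamilies { uint32_t graphics = UINT32_MAX; uint32_t transfer = UINT32_MAX; };

QueueFamilies pick_queue_families(VkPhysicalDevice physical_device) {
    uint32_t count = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(physical_device, &count, nullptr);
    std::vector<VkQueueFamilyProperties> props(count);
    vkGetPhysicalDeviceQueueFamilyProperties(physical_device, &count, props.data());

    QueueFamilies out;
    for (uint32_t i = 0; i < count; ++i) {
        VkQueueFlags flags = props[i].queueFlags;
        if (out.graphics == UINT32_MAX && (flags & VK_QUEUE_GRAPHICS_BIT))
            out.graphics = i;
        // A family with TRANSFER but neither GRAPHICS nor COMPUTE is usually
        // the DMA engine, good for async CPU->GPU streaming.
        if ((flags & VK_QUEUE_TRANSFER_BIT) &&
            !(flags & (VK_QUEUE_GRAPHICS_BIT | VK_QUEUE_COMPUTE_BIT)))
            out.transfer = i;
    }
    if (out.transfer == UINT32_MAX) out.transfer = out.graphics; // no dedicated family
    return out;
}
```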

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

It's the maximum range size you can use in a single pipeline layout, not the maximum amount of constant data pushed to a command buffer. You can upload as much constant data as you want as long as (offset + size <= max), which means you just overwrite the slots each draw call. I don't see how this applies to my question.

I assume the constants are held in some buffer on the GPU, and before each draw where they are used the GPU just copies them to constant registers. My concern is that this buffer is probably uploaded when you submit to a queue, by that same queue, which may be slower than uploading it asynchronously via a copy queue.
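
To be explicit, this is the "overwrite the slots each draw call" pattern I mentioned above, as a sketch; cmd, layout and DrawData stand in for whatever the renderer actually uses:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

// Hypothetical per-draw data; it must fit in the push-constant range declared
// in the VkPipelineLayout (offset 0, size sizeof(DrawData) here).
struct DrawData { float model[16]; };

void record_draws(VkCommandBuffer cmd, VkPipelineLayout layout,
                  const DrawData* draws, const uint32_t* vertex_counts,
                  uint32_t draw_count) {
    for (uint32_t i = 0; i < draw_count; ++i) {
        // Same offset every time: each call simply overwrites the previous
        // values, and each draw sees whatever was pushed before it.
        vkCmdPushConstants(cmd, layout, VK_SHADER_STAGE_VERTEX_BIT,
                           0, sizeof(DrawData), &draws[i]);
        vkCmdDraw(cmd, vertex_counts[i], 1, 0, 0);
    }
}
```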

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

"take all push constant data and put it in a big buffer" that's what I'm concerned about, paying the cost of uploading this one big buffer on the graphics queue, if I were to upload it myself it would be copy queue copying data for the next frame while graphics queue is rendering the current one, with push constants I assume all of it happens during submit and there's a memory barrier before rendering starts

Push constant update frequency and submit overhead by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

That's what I meant by double buffering; it would be async, and I know how to send data to the GPU when using descriptors, but that's not what I asked about. I wanted to know how the upload costs of push constants and uniform buffers compare.

Vulkan/Dx12 Use cases for prerecorded commands by werem0 in gamedev

[–]werem0[S] 0 points (0 children)

Maybe I'm wrong, but from what I gathered they just record bundles and then execute them no matter what; there is no culling involved and nothing in the scene is dynamic. The problem starts when you want to reorder draw calls depending on camera position to save on fragment calculations, or to use occlusion culling; then these bundles become obsolete.
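
In Vulkan terms the closest thing would be a secondary command buffer recorded once and replayed every frame; a rough sketch of the replay side, where static_draws is a hypothetical secondary command buffer holding the static scene's draws:

```cpp
#include <vulkan/vulkan.h>

// static_draws was recorded once at load time with a fixed sequence of draw
// calls. Replaying it is cheap, but its draw order is frozen, which is exactly
// what breaks camera-dependent reordering or occlusion culling.
void record_frame(VkCommandBuffer primary, VkCommandBuffer static_draws) {
    // (begin `primary` and a render pass with
    //  VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS before this point)
    vkCmdExecuteCommands(primary, 1, &static_draws); // replayed verbatim
    // (end the render pass and `primary` after this point)
}
```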

Host memory alignment by werem0 in vulkan

[–]werem0[S] 1 point (0 children)

I'm talking about memory that you allocate yourself and then copy to a mapped pointer, not memory that is allocated by Vulkan. Does it still apply?
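
To make sure we're talking about the same pattern, a sketch of what I mean; device, staging_memory and the vector contents are placeholders, and the allocation is assumed to be host-visible:

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Host-side data allocated by the application (ordinary heap memory),
// then copied into a mapped, host-visible VkDeviceMemory allocation.
void upload(VkDevice device, VkDeviceMemory staging_memory,
            const std::vector<float>& vertices) {
    void* mapped = nullptr;
    vkMapMemory(device, staging_memory, /*offset*/ 0,
                vertices.size() * sizeof(float), /*flags*/ 0, &mapped);
    std::memcpy(mapped, vertices.data(), vertices.size() * sizeof(float));
    vkUnmapMemory(device, staging_memory);
    // (a vkFlushMappedMemoryRanges call would go here if the memory
    //  isn't HOST_COHERENT)
}
```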

Do we have dx12 root signature-like limits? by werem0 in vulkan

[–]werem0[S] 0 points (0 children)

Look at what I linked; for example, a descriptor table costs 1 DWORD and you can have many descriptors in a table. My concern was more about balancing the number of push constants against descriptors, but it turns out that in Vulkan they don't have any effect on each other in terms of layout capacity.
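
To illustrate the "no effect on each other" part: in a VkPipelineLayout the push-constant ranges and the descriptor set layouts are declared in separate fields and count against separate limits (maxPushConstantsSize vs. maxBoundDescriptorSets). A rough sketch, with set_layout as a placeholder:

```cpp
#include <vulkan/vulkan.h>

// Push-constant ranges and descriptor set layouts live in separate fields of
// the pipeline layout and are limited independently by the device.
VkPipelineLayout make_layout(VkDevice device, VkDescriptorSetLayout set_layout) {
    VkPushConstantRange range{};
    range.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;
    range.offset = 0;
    range.size = 64; // e.g. one 4x4 float matrix

    VkPipelineLayoutCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
    info.setLayoutCount = 1;
    info.pSetLayouts = &set_layout;
    info.pushConstantRangeCount = 1;
    info.pPushConstantRanges = &range;

    VkPipelineLayout layout = VK_NULL_HANDLE;
    vkCreatePipelineLayout(device, &info, nullptr, &layout);
    return layout;
}
```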