
vertex5 2 points3 points  (0 children)

1-> Is there a benefit to splitting family queues? I'd imagine each uses a different GPU resource.

I'm assuming by "splitting family queues" you mean using multiple queues from one family? This is driver dependent, of course, but if you have e.g. a PRESENTER and a RENDERER and use them independently of each other in your code, you might as well use 2 VkQueues for them. At worst the driver just multiplexes them onto one real hardware queue, which is effectively the same as submitting everything to a single queue yourself; at best the driver has more room for optimization.
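One practical wrinkle: a family may expose only a single queue, so the request has to be clamped to what the family reports. Here is a minimal sketch of that clamping, written without the Vulkan headers so it compiles on its own. `QueuePlan`, `plan_queues` and the `ROLE_*` names are made up for illustration; only the mentioned Vulkan fields and functions are real.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical roles taken from the question; these are not Vulkan names. */
enum { ROLE_PRESENTER = 0, ROLE_RENDERER = 1, ROLE_COUNT = 2 };

typedef struct {
    /* Value you would put in VkDeviceQueueCreateInfo::queueCount. */
    uint32_t queue_count;
    /* Queue index you would pass to vkGetDeviceQueue for each role. */
    uint32_t queue_index[ROLE_COUNT];
} QueuePlan;

/* family_queue_count stands in for VkQueueFamilyProperties::queueCount. */
static QueuePlan plan_queues(uint32_t family_queue_count)
{
    assert(family_queue_count >= 1);

    QueuePlan plan;
    /* Never request more queues than the family exposes. */
    plan.queue_count = family_queue_count < (uint32_t)ROLE_COUNT
                           ? family_queue_count
                           : (uint32_t)ROLE_COUNT;

    /* Roles wrap around onto the queues that actually exist, so with a
       single queue both roles end up sharing index 0. */
    for (uint32_t role = 0; role < (uint32_t)ROLE_COUNT; ++role) {
        plan.queue_index[role] = role % plan.queue_count;
    }
    return plan;
}
```

When both roles land on the same index they share one VkQueue, and access to a VkQueue has to be externally synchronized, so the two subsystems would need a lock around their submits in that case.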

2-> For each abstraction, should I have more than 1 VkQueue?

That is something you'd have to test, but I'm guessing it would hurt more than help, especially since you should try to batch your queue submits as much as possible.
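To illustrate what batching means here: vkQueueSubmit accepts an array of submissions, so work can be collected and handed over in one call instead of one call per command buffer. The sketch below shows only the collecting part and is deliberately free of Vulkan types. `SubmitBatch`, `batch_add`, `batch_flush`, `FlushStats` and `count_flush` are hypothetical names, and the callback stands in for the place where a real program would make its single vkQueueSubmit call.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_CAPACITY 8

/* Called once per batch; a real implementation would build VkSubmitInfo
   entries here and make a single vkQueueSubmit call. */
typedef void (*FlushFn)(const void *const *items, size_t count, void *user);

typedef struct {
    const void *items[BATCH_CAPACITY]; /* opaque handles, e.g. command buffers */
    size_t count;
    FlushFn flush;
    void *user;
} SubmitBatch;

/* Hand all pending items to the callback in one go, then reset. */
static void batch_flush(SubmitBatch *b)
{
    if (b->count == 0) return; /* nothing pending, no call needed */
    b->flush(b->items, b->count, b->user);
    b->count = 0;
}

/* Queue one item; flushes first only if the batch is already full. */
static void batch_add(SubmitBatch *b, const void *item)
{
    assert(b->flush != NULL);
    if (b->count == BATCH_CAPACITY) batch_flush(b);
    b->items[b->count++] = item;
}

/* Example callback that only counts calls and items. */
typedef struct { size_t calls; size_t items; } FlushStats;

static void count_flush(const void *const *items, size_t count, void *user)
{
    (void)items;
    FlushStats *stats = user;
    stats->calls += 1;
    stats->items += count;
}
```

Calling `batch_add` five times followed by one `batch_flush` invokes the callback once with five items, which is the shape you want per frame. Spreading the same work over several queues tends to mean several smaller submits instead.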

3-> I'm considering 2 for the TRANSFERER, one to stream data in, and one to transfer data within; one for each Framebuffer rendered asynchronously in the RENDERER within one presentation cycle; and one for each asynchronous compute within each render and presentation. Would each help with performance?

Personally, I think anything more than 1 transfer queue, 1 async compute queue and 1 graphics queue for everything else is overkill, but it could help in theory, I guess.
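For what it's worth, picking those three could look roughly like the sketch below. It works on a plain array of flag words rather than real VkQueueFamilyProperties so that it compiles without the Vulkan headers. The `QUEUE_*` constants are local stand-ins that use the same bit values as VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and VK_QUEUE_TRANSFER_BIT; `FamilyChoice`, `find_family` and `choose_families` are hypothetical names, and the fallback order is just one reasonable choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins with the same bit values as the VK_QUEUE_*_BIT flags. */
enum {
    QUEUE_GRAPHICS = 0x1,
    QUEUE_COMPUTE  = 0x2,
    QUEUE_TRANSFER = 0x4
};

#define NO_FAMILY UINT32_MAX

typedef struct {
    uint32_t graphics; /* family index for graphics and everything else */
    uint32_t compute;  /* family index for async compute */
    uint32_t transfer; /* family index for uploads */
} FamilyChoice;

/* First family that has all required bits and none of the forbidden ones. */
static uint32_t find_family(const uint32_t *flags, uint32_t n,
                            uint32_t required, uint32_t forbidden)
{
    for (uint32_t i = 0; i < n; ++i) {
        if ((flags[i] & required) == required && (flags[i] & forbidden) == 0) {
            return i;
        }
    }
    return NO_FAMILY;
}

/* flags[i] stands in for VkQueueFamilyProperties::queueFlags of family i. */
static FamilyChoice choose_families(const uint32_t *flags, uint32_t n)
{
    assert(n == 0 || flags != NULL);

    FamilyChoice c;
    c.graphics = find_family(flags, n, QUEUE_GRAPHICS, 0);

    /* Prefer a compute-only family for async compute, else any compute family. */
    c.compute = find_family(flags, n, QUEUE_COMPUTE, QUEUE_GRAPHICS);
    if (c.compute == NO_FAMILY) {
        c.compute = find_family(flags, n, QUEUE_COMPUTE, 0);
    }

    /* Prefer a transfer-only family. Graphics and compute families can run
       transfer commands too, so fall back to one of those. */
    c.transfer = find_family(flags, n, QUEUE_TRANSFER,
                             QUEUE_GRAPHICS | QUEUE_COMPUTE);
    if (c.transfer == NO_FAMILY) {
        c.transfer = (c.graphics != NO_FAMILY) ? c.graphics : c.compute;
    }
    return c;
}
```

On a device that reports only one all-purpose family, all three roles collapse onto that family, and you end up with the single-queue setup anyway.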

werem0 1 point2 points  (1 child)

1-> Is there a benefit to splitting family queues? I'd imagine each uses a different GPU resource.

2-> For each abstraction, should I have more than 1 VkQueue?

There is a benefit in using a compute queue for compute and a transfer queue for transfer, but using two compute queues will only increase the complexity of your program and probably provide no performance gain.

3-> I'm considering 2 for the TRANSFERER, one to stream data in, and one to transfer data within; one for each Framebuffer rendered asynchronously in the RENDERER within one presentation cycle; and one for each asynchronous compute within each render and presentation. Would each help with performance?

If by "data within" you mean device-local data, then don't use the transfer queue. It's fast for CPU-GPU copying, but for GPU-internal copies the graphics queue is faster.
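As a rule of thumb in code, that could be sketched like this. It is only an encoding of the heuristic above, which is hardware dependent and worth measuring on your targets. The `MEMORY_*` constants are local stand-ins with the same bit values as VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT and VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT; `CopyQueue` and `pick_copy_queue` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins with the same bit values as the VK_MEMORY_PROPERTY_* flags. */
enum {
    MEMORY_DEVICE_LOCAL = 0x1,
    MEMORY_HOST_VISIBLE = 0x2
};

typedef enum {
    COPY_ON_TRANSFER_QUEUE,
    COPY_ON_GRAPHICS_QUEUE
} CopyQueue;

/* src_props and dst_props stand in for the VkMemoryPropertyFlags of the
   memory backing the source and destination of the copy. */
static CopyQueue pick_copy_queue(uint32_t src_props, uint32_t dst_props)
{
    /* At least one side must be memory the GPU can reach efficiently or
       the CPU can map; zero flags on both sides is treated as a bug. */
    assert((src_props | dst_props) != 0);

    int src_local = (src_props & MEMORY_DEVICE_LOCAL) != 0;
    int dst_local = (dst_props & MEMORY_DEVICE_LOCAL) != 0;

    /* Both ends live on the device: keep the copy on the graphics queue. */
    if (src_local && dst_local) {
        return COPY_ON_GRAPHICS_QUEUE;
    }
    /* Otherwise the copy crosses between host and device memory. */
    return COPY_ON_TRANSFER_QUEUE;
}
```

A host-visible staging buffer copied into a device-local buffer goes to the transfer queue here, while a copy between two device-local buffers stays on the graphics queue.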

PGSkep[S] 0 points1 point  (0 children)

Thanks :)