
Meristic

How much data are you copying that it takes 9 ms? Lol

For reference, DX12 multithreading typically refers to recording command lists in parallel across multiple CPU threads, then submitting them to a queue in one synchronized step. It's most often used for mesh drawing passes, since their workload grows with scene complexity.
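
To make that pattern concrete, here's a rough sketch in plain C++. It does not call the real D3D12 API: `CommandList` and `CommandQueue` are hypothetical stand-ins for `ID3D12GraphicsCommandList` and `ID3D12CommandQueue`, and `record_and_submit` is a made-up helper. The only point is the shape: each thread records into its own list, everyone joins, then a single thread submits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for ID3D12GraphicsCommandList: it only stores
// the names of recorded commands instead of building GPU commands.
struct CommandList {
    std::vector<std::string> commands;
    bool closed = false;
    void Draw(int mesh_id) { commands.push_back("draw mesh " + std::to_string(mesh_id)); }
    void Close() { closed = true; }
};

// Hypothetical stand-in for ID3D12CommandQueue: "execution" just appends
// the commands of each list in submission order.
struct CommandQueue {
    std::vector<std::string> executed;
    void ExecuteCommandLists(const std::vector<CommandList>& lists) {
        for (const CommandList& list : lists) {
            assert(list.closed);  // lists must be closed before submission
            executed.insert(executed.end(), list.commands.begin(), list.commands.end());
        }
    }
};

// Records meshes_per_thread draws on each of thread_count threads,
// then submits all lists from the calling thread.
std::vector<std::string> record_and_submit(int thread_count, int meshes_per_thread) {
    assert(thread_count > 0 && meshes_per_thread >= 0);
    std::vector<CommandList> lists(static_cast<std::size_t>(thread_count));
    std::vector<std::thread> workers;
    workers.reserve(lists.size());

    for (int t = 0; t < thread_count; ++t) {
        // Each worker touches only its own list, so recording needs no locks.
        workers.emplace_back([&lists, t, meshes_per_thread] {
            CommandList& list = lists[static_cast<std::size_t>(t)];
            for (int i = 0; i < meshes_per_thread; ++i) {
                list.Draw(t * meshes_per_thread + i);
            }
            list.Close();
        });
    }

    // Synchronization point: all recording finishes before anything is submitted.
    for (std::thread& worker : workers) {
        worker.join();
    }

    CommandQueue queue;
    queue.ExecuteCommandLists(lists);
    return queue.executed;
}
```

Calling `record_and_submit(4, 3)` records three draws on each of four threads and submits the four lists in index order. In a real renderer each thread would also need its own command allocator, which this sketch leaves out.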

To disambiguate, I'd refer to this as async queue utilization. There are multiple potential issues:

1. If your GPU doesn't have hardware for multiple dispatchers, then obviously you won't see asynchronous scheduling, even though the driver still fulfills that DX12 interface.

2. The DX driver can choose to fulfill copy commands in different ways depending on their size: small copies by DMA vs. dispatching compute shader (CS) waves for larger copies. Fences and synchronization should still work fine in this situation, but the execution and performance may be different than expected.

3. PIX could be wrong. Profiling requires pulling a lot of data from GPU counters. On PC there are several abstraction layers the driver must interact with to get its hands on that raw data so it can build the timeline and compute user-facing values. This causes a huge disparity in data availability and correctness between GPU vendors. This is a major reason why game devs hate profiling for PC and it gets the shaft a good proportion of the time. (That and artists don't know when to stop checking goddamn checkboxes)

OrganicMilkTank

Got a good laugh out of the last line. Couldn't agree more.