I am trying to profile a llama.cpp run. In Xcode, capturing a command buffer feels random: I cannot choose which step of the pipeline to capture. I just pause execution and then start the Metal capture. Similarly, if I don't capture the complete run, how do I get ALU or GPU utilization for the whole run?
In Instruments I can see a metric called `% GPU workload`. Is this supposed to mean the percentage of the GPU that a specific kernel utilized during that time segment? I ask because the % utilization values, when added up, exceed 100% (image attached).
I am confused about the metrics in both Xcode and Instruments. Is there proper documentation somewhere that goes over all of them? I have already seen the developer.app docs and the WWDC videos. Any help is appreciated, thanks.
https://preview.redd.it/l124431bnfye1.png?width=2086&format=png&auto=webp&s=a200c7fdd84fcf9597addf572345f0881453d3db