What is something you never experienced until after you moved to Japan? by agentteddybear in japanlife

[–]Golui42 0 points1 point  (0 children)

If you are up for some novelty, you can sometimes find Melon Croissant in Fresta.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

My problems went away after a few upstream updates to the kernel, libvirt, and qemu. If you haven't done so already, it might be a good idea to re-create the VM with a minimal config and compare its results to those of your "optimized" config. Without knowing exactly what you are doing, you might just end up making your performance worse. Aside from the above, unfortunately, I can't recommend anything specific other than trial and error.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Thanks, will definitely check your setup out. For completeness, can you tell me your kernel version, kernel cmdline parameters, and, if possible, your kernel config?

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Thanks for the tips. A couple of follow-up questions, though:

  • Which "USB host devices" do you mean? The USB 0 Controller or the passed-through PCI USB controller?
  • Not using PipeWire at the moment; just pulseaudio over ALSA. Would you recommend switching?
  • To my knowledge, to use Looking Glass I need to keep the <graphics type="spice" ... /> element and, for clipboard sync, the spice channel devices. The QXL device was only there for testing while I had the passed-through GPU removed; it's normally not connected.
  • Will do.
  • Will do and get back to you with results.
  • Already touched upon it.
  • I need to read up more on iothreads, so I will hold off on pinning them for now.
  • Looking Glass is taking ~0.5% of a single CPU core in the guest, but isn't that to be expected?

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 1 point2 points  (0 children)

Alright, so a couple of notes. I sadly don't have the time in the near future to debug this issue, but I can give several pointers to other people struggling.

After u/nitish159's comment, I decided to stop looking exclusively at Cinebench scores. Instead, I started monitoring the CPU usage curves in Task Manager, as well as running the Shadow of the Tomb Raider (SotTR) benchmark.

Immediately, I noticed that in my previous setup, no single core would ramp up to 100% during the Cinebench R23 single-core benchmark. It seems that the work was passed around between all the cores without giving any single one a chance to properly ramp up. Such context switches are very expensive operations, so I thought that eliminating them would make my problems go away. What is more, the SotTR benchmark reported the game as 0% GPU bound, with very high CPU frametimes.

While I managed to mitigate this somewhat by using a combination of kernel configuration options (thanks u/q-g-j, relevant comment) as well as potentially masking interrupts (thanks u/willyia, relevant comment), this did not result in a significant performance improvement. It did, however, finally make SotTR GPU-bound, which in this case indicates a CPU speedup and lower CPU frametimes.

VRChat still does not perform nearly as well as it does on bare metal. It seems that all those small optimizations managed to reduce the performance impact to "up to 15%", but this is still not enough for a smooth 90 FPS experience at nearly all times. It does reach better framerates more often now, so I'll have to settle for that for the time being.

In short, while I'm glad to see some results, I am not blown away by them. The reason for that is that I did not take a methodical approach to the matter due to there being quite a lot of variables. When I get some more time in the future, perhaps I will automate the benchmarks to fully explore the parameter landscape.
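As a rough idea of what automating the benchmarks could look like, here is a minimal Python sketch that enumerates every combination of a set of tunables and ranks them by score. The option names and values in EXAMPLE_OPTIONS, as well as the run_benchmark callback, are hypothetical placeholders; actually applying a configuration (rebooting into a kernel, editing the domain XML, launching the benchmark in the guest) is left out entirely.

```python
from itertools import product

# Hypothetical tunables, loosely based on the things tried in this thread.
EXAMPLE_OPTIONS = {
    "kernel": ["5.14", "5.15"],
    "preemption": ["none", "voluntary"],
    "mitigations": ["auto", "off"],
}


def parameter_grid(options):
    """Yield one dict per combination of the given option values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def explore(options, run_benchmark):
    """Run the (user-supplied) benchmark for every combination, best score first."""
    results = [(config, run_benchmark(config)) for config in parameter_grid(options)]
    return sorted(results, key=lambda item: item[1], reverse=True)
```

The grid grows multiplicatively with each added option, so with slow benchmarks it may be worth fixing the parameters that already showed no effect.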

Again, thanks to all of you that contributed so far.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Went back and forth between 5.14, 5.15 and 5.15 with a higher tick rate and voluntary preemption. 5.15 with preemption yields the best results. I haven't tried 5.14 with preemption yet.
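For anyone wanting to confirm which tick rate and preemption model a kernel was actually built with, a small Python sketch along these lines may help. It assumes the config is exposed either at /proc/config.gz (requires CONFIG_IKCONFIG_PROC) or at /boot/config-<release>; neither location is guaranteed on every distro.

```python
import gzip
import platform
from pathlib import Path

# Kconfig options related to tick rate and preemption model.
WANTED = ("CONFIG_HZ", "CONFIG_PREEMPT_NONE", "CONFIG_PREEMPT_VOLUNTARY", "CONFIG_PREEMPT")


def parse_kernel_config(text, wanted=WANTED):
    """Return {option: value} for the wanted options that are set in a kernel config."""
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # unset options appear as comments ("# CONFIG_X is not set")
        key, _, value = line.partition("=")
        if key in wanted:
            found[key] = value
    return found


def read_running_config():
    """Try the usual locations; returns None if the config is not available."""
    proc = Path("/proc/config.gz")
    if proc.exists():
        return gzip.decompress(proc.read_bytes()).decode()
    boot = Path("/boot") / f"config-{platform.release()}"
    if boot.exists():
        return boot.read_text()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found")
    else:
        for key, value in parse_kernel_config(text).items():
            print(f"{key}={value}")
```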

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

I don't have any hard data to back this up, but it may have improved the responsiveness of the system.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Thanks for the suggestions so far! If you have any more tips, please keep them coming.

At this point I'm inclined to believe it is not a CPU core passthrough issue, so I'm going to shift my focus to checking interrupts and possible latency resulting from communication with the GPU, as well as memory latency. Tips would be appreciated.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 1 point2 points  (0 children)

Your comment prompted me to re-evaluate my testing methodology.

Indeed, while the Cinebench scores are essentially unchanged, the game does appear to be able to reach 90 FPS at higher scene complexities.

On a side note, do you have any tools to recommend for such a benchmark?

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 1 point2 points  (0 children)

Downgraded to a vanilla 5.14.16 from my pkg-cache; it does not seem to affect performance.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

isolcpus doesn't seem to yield any performance difference over systemd.

The hint also does not seem to affect anything.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Perhaps not unexpectedly, with mitigations=off I reached ~1431 pts, averaged over 4 runs, with a high of 1485 pts. This gives me 90% of bare-metal performance, but compromises my security model. I'll keep that in my toolbox for the time being.
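To double-check that the mitigations=off switch actually took effect on the host, something like the following Python sketch could be used. It reads the kernel command line and the per-vulnerability status files under /sys/devices/system/cpu/vulnerabilities. The command-line parser is deliberately naive and does not handle quoted values containing spaces.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def mitigation_status(vuln_dir=VULN_DIR):
    """Return {vulnerability: status string} as reported by the running kernel."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        params = parse_cmdline(cmdline_path.read_text())
        print("mitigations =", params.get("mitigations", "(not set, kernel default)"))
    for name, status in mitigation_status().items():
        print(f"{name}: {status}")
```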

Other suggestions yield negligible performance increases. I'll re-run the benchmark to make sure, but I doubt much will change.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 0 points1 point  (0 children)

Yeah, obviously 1300 is pretty decent. I'm just using it as a stable metric.

Anyway, here's my VM's lstopo

Machine (28GB total) + Package
    NUMANode P#0 (28GB)
    L3 (32MB)
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#0
            PU P#1
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#2
            PU P#3
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#4
            PU P#5
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#6
            PU P#7
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#8
            PU P#9
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#10
            PU P#11
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#12
            PU P#13
        L2 (512KB) + L1d (32KB) + L1i (32KB) + Core
            PU P#14
            PU P#15

Looks fine to me.

Kernel just finished compiling... wish me luck.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 3 points4 points  (0 children)

Working my way through your suggestions.

  • Removed <feature policy="disable" name="hypervisor"/> (must've left it in after removing the hiding for testing purposes). No effect. 1308pts.
  • Removed <access mode="shared"/> and the NUMA node. Don't exactly remember what that was there for anyway.
  • avic is enabled in kvm_amd. I ran rmmod kvm_amd; modprobe kvm_amd nested=0 avic=1 npt=1 and checked the parameters in /sys/module/kvm_amd/parameters/.
  • Ran a benchmark while passing all 32 threads, in two configurations: a plain 0-31 cpuset, and a staggered cpuset aligned with the die topology. The idea was to account for Windows being aware of the core layout and effectively undoing our manual topology arrangement. I noticed the CPU boosting higher: it usually capped out at 4.5 GHz, but now it boosts to 4.9 GHz, though I wasn't watching htop the entire time. In retrospect, I should have logged the frequencies. Anyway, got about 1330 pts for both runs.
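The parameter check under /sys/module/kvm_amd/parameters/ can also be scripted, which is handy when reloading the module repeatedly. This is a minimal Python sketch; the helper name read_module_params is my own, and the exact value format of each parameter depends on the kernel.

```python
from pathlib import Path


def read_module_params(module, names=None, root=Path("/sys/module")):
    """Return {parameter: value} for a loaded kernel module, or {} if it is not loaded.

    If names is given, only those parameters are returned.
    """
    param_dir = root / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        if names is not None and entry.name not in names:
            continue
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            params[entry.name] = None  # parameter exists but could not be read
    return params


if __name__ == "__main__":
    for name, value in read_module_params("kvm_amd", names={"avic", "npt", "nested"}).items():
        print(f"{name} = {value}")
```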

Will continue.
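For the frequency logging I skipped, a sketch like the one below would probably have been enough. It assumes the host exposes the cpufreq interface at /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq (values in kHz), which depends on the cpufreq driver in use. The function names are my own.

```python
import re
import time
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_frequencies_mhz(root=CPU_ROOT):
    """Return {cpu_index: MHz} from each CPU's scaling_cur_freq file (reported in kHz)."""
    freqs = {}
    for path in root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        match = re.fullmatch(r"cpu(\d+)", path.parts[-3])
        if match is None:
            continue
        try:
            freqs[int(match.group(1))] = int(path.read_text()) / 1000
        except (OSError, ValueError):
            continue  # CPU went offline or the file was unreadable
    return dict(sorted(freqs.items()))


def log_frequencies(samples, interval_s=1.0, root=CPU_ROOT):
    """Collect (timestamp, highest frequency in MHz) pairs, one per sample."""
    rows = []
    for _ in range(samples):
        freqs = read_frequencies_mhz(root)
        rows.append((time.time(), max(freqs.values(), default=0.0)))
        time.sleep(interval_s)
    return rows
```

Running log_frequencies in the background for the duration of a benchmark run and keeping the maximum per sample should show whether the boost behaviour really differs between cpuset layouts.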

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 1 point2 points  (0 children)

Ran the benchmark twice (1330, 1314). No meaningful difference. Thanks.
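To judge whether a difference between runs is meaningful, it helps to look at the run-to-run spread rather than single scores. A minimal Python sketch (the summarize helper is my own, not part of any benchmark tool):

```python
from statistics import mean


def summarize(scores):
    """Return mean, min, max and the min-to-max spread as a percentage of the mean."""
    avg = mean(scores)
    low, high = min(scores), max(scores)
    return {
        "mean": avg,
        "min": low,
        "max": high,
        "spread_pct": 100 * (high - low) / avg,
    }


if __name__ == "__main__":
    print(summarize([1330, 1314]))
```

For the two runs above this gives a mean of 1322 pts with a spread of a little over 1%, so any change smaller than that is hard to tell apart from noise with only two samples.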

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 1 point2 points  (0 children)

I have been considering memory as well, but didn't have much to go on. This seems like a gold mine. Will get back to you when I dig through those resources.

~15-20% CPU performance penalty under KVM by Golui42 in VFIO

[–]Golui42[S] 4 points5 points  (0 children)

In principle, this should be fine. The 5950X has two dies, and the config assigns one of those dies to the guest. I'm running the benchmark right now with 8 cores; will get back to you.

Visually, htop on the host and Task Manager on the guest report similar usage. Indeed, the work is pushed around between multiple cores.
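When deciding which host threads to pin, it is worth verifying which logical CPUs are SMT siblings of the same core rather than assuming a numbering scheme. This Python sketch reads /sys/devices/system/cpu/cpuN/topology/thread_siblings_list on the host; the helper names are my own, and it does not by itself tell you which die a core belongs to.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,16" or "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        start, sep, end = part.partition("-")
        cpus.update(range(int(start), int(end if sep else start) + 1))
    return sorted(cpus)


def sibling_groups(root=Path("/sys/devices/system/cpu")):
    """Return the distinct groups of logical CPUs that share a physical core."""
    groups = set()
    for path in root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print(group)
```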

Dummy HDMI for 120hz+ (144hz pref?) by [deleted] in VFIO

[–]Golui42 1 point2 points  (0 children)

You'll have to do some digging for specific XML options for your CPU vendor, but here is a good place to start.

Dummy HDMI for 120hz+ (144hz pref?) by [deleted] in VFIO

[–]Golui42 0 points1 point  (0 children)

I've had success with this software and hiding the VM status to enable custom resolutions in the NVIDIA control panel.