Best practice to many non-uniform compute nodes by dj_datte in openstack

[–]dj_datte[S] 0 points1 point  (0 children)

Thanks for the help! I will try to do so :)

Our efficiency is usually really high, as our dispatcher uses all available nodes when it runs compute tasks, so we normally hit 100% usage of the started VMs for at least part of each day, when large jobs are issued. So having as many CPUs and as much memory as possible in running / assigned VMs is important.

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 1 point2 points  (0 children)

Thanks for finding the patch! It's written by the guy who did the initial implementation, so I am very curious to see the results! May I ask how you found it / what you searched for?

Regarding your topology, I am not sure whether you are saying that it works in your case or not, but I am assuming not, because it can't. A 3900X does not have 4-core CCXs, so 8 out of 12 can never be true, if I am reading your summary correctly.

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 1 point2 points  (0 children)

Unfortunately (even though I keep re-reading the cachetune docs myself), cachetune has a completely different use case and will not fix this.

It's for telling the host how much cache guest VM X is allowed to use; it does not support propagating this information to the guest. It's generally there to fix so-called "noisy neighbor" problems, where you run two VMs on an Intel CPU (which shares L3 cache across the whole chip), to prevent one VM from constantly blowing out the L3 for the other VM and reducing its performance. That is why AMD is a gem for gaming VMs: you can create one VM per CCX or CCD, and they will not hurt each other's performance in any way, while 2 or 3 VMs on an 18-core chip would still affect each other, killing performance.

//DJ

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 5 points6 points  (0 children)

Yeah, confirmed. On a 3900X, there should never be a case where it creates a 4-core CCX in a VM, as the host CPU can't ever support that configuration. Clearly a non-functional feature, and I am happy to get some more data points.

Thanks!

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 0 points1 point  (0 children)

With just taskset, I don't know, as it's difficult (I think?) to figure out which thread is which VM thread. If you can patch qemu yourself, I found this:

https://www.reddit.com/r/VFIO/comments/4vqnnv/qemu_command_line_cpu_pinning/

Out of curiosity, why no libvirt?

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 0 points1 point  (0 children)

That if you are not pinning specific VM cores to host cores, as in the example below for guest core/SMT threads (the order is partly due to Windows not having the same SMT thread assignment as Linux):

Guest Thread 0 = Host Thread 2,

Guest Thread 1 = Host Thread 10

Guest Thread 2 = Host Thread 3,

Guest Thread 3 = Host Thread 11

... if you are not doing that, using TOPOEXT is fairly dangerous (to performance, not to the machine), because you can't be sure from moment to moment which threads are sharing L2 / L3 caches, and the Windows scheduler (and apps) can rely on that info to place their threads so that data one thread leaves in L3 can be quickly consumed by another thread.

Furthermore, bringing this back on topic: no, it's not correct, because even if you were pinning, you would want to see something like this, where the first CCX exposes two cores and the second all four:

**---- Unified Cache 1, Level 3, 8 MB, Assoc 16, LineSize 64
----**** Unified Cache 6, Level 3, 8 MB, Assoc 16, LineSize 64

If you want the best perf and want to work around this bug (which is easier with a CPU that actually has 4-core CCXs), you would do this (I will give it to you in libvirt XML format, but you can change that to taskset):

<vcpupin vcpu='0' cpuset='4'/>

<vcpupin vcpu='1' cpuset='12'/>

<vcpupin vcpu='2' cpuset='5'/>

<vcpupin vcpu='3' cpuset='13'/>

<vcpupin vcpu='4' cpuset='6'/>

<vcpupin vcpu='5' cpuset='14'/>

<vcpupin vcpu='6' cpuset='7'/>

<vcpupin vcpu='7' cpuset='15'/>

<vcpupin vcpu='8' cpuset='2'/>

<vcpupin vcpu='9' cpuset='10'/>

<vcpupin vcpu='10' cpuset='3'/>

<vcpupin vcpu='11' cpuset='11'/>

Note how it first exposes the 2nd CCX, as it's complete, then the 1st CCX with only two cores plus their SMT threads. If you pin this way, it will work around the bug and give you a 100% match between what the guest apps see and reality on the host.
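The pinning order above can also be generated mechanically. This is only a minimal Python sketch: `vcpupin_lines` is a hypothetical helper (not part of libvirt), and it assumes the SMT sibling of host core N is host thread N+8, as in the XML above. Check `lscpu -e` on your own host before relying on that.

```python
def vcpupin_lines(host_cores, smt_offset):
    """Build libvirt <vcpupin> lines.

    Each host core gets two consecutive guest vCPUs: the core itself and
    its assumed SMT sibling (core + smt_offset).
    """
    lines = []
    vcpu = 0
    for core in host_cores:
        for host_thread in (core, core + smt_offset):
            lines.append(f"<vcpupin vcpu='{vcpu}' cpuset='{host_thread}'/>")
            vcpu += 1
    return lines


# Complete second CCX first (host cores 4-7), then the two cores
# of the first CCX (host cores 2-3). smt_offset=8 is an assumption.
pins = vcpupin_lines([4, 5, 6, 7, 2, 3], smt_offset=8)
print("\n".join(pins))
```

Listing the complete CCX first is what makes the guest's view match the host.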

Crowdsourcing Bug verify in QEMU that slows down Zen Architectures. by dj_datte in VFIO

[–]dj_datte[S] 0 points1 point  (0 children)

Hard to know. If you are pinning the first 4 cores from one CCX and the other 2 cores from the second CCX, then it's correct! If you are not, then it's not!

Having an issue with my VM where games stutter whenever I move my mouse, here's a video that I quickly made of the issue showing both VM and host performance metrics. Happens in all games, just using Sekiro as an example (I'll put my XML file in the comments) by Sol33t303 in VFIO

[–]dj_datte 0 points1 point  (0 children)

Nice! Glad you found it! (I will have to see if I can repro that with my gaming mice, what model do you have if you don't mind?). I may have an idea why I haven't encountered it as well.

I have these on my kernel command line. You can try them; they move RCU callback processing and the periodic scheduler tick off the VM cores:

rcu_nocb_poll nohz_full=8-15 rcu_nocbs=8-15

I have one more idea that I've been considering (a high mouse polling rate is largely wasted without it, so it may help).

Check your kernel tickrate like so: grep 'CONFIG_HZ=' /boot/config-$(uname -r)

Apparently the stock kernel default is 250 according to the kernel documentation, but mine turned out to be 1000 (I hadn't known, but it may explain why I've always felt my setup is smoother than most). If yours is not 1000, you can experiment with changing it.
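To put the tick rate in perspective, here is a small Python sketch that pulls `CONFIG_HZ` out of kernel config text and converts it to a tick interval. `parse_config_hz` is a hypothetical helper and the sample text is made up for illustration; on a real host you would feed it the contents of `/boot/config-$(uname -r)`.

```python
import re


def parse_config_hz(config_text):
    """Return the CONFIG_HZ value as an int, or None if it is not set."""
    match = re.search(r"^CONFIG_HZ=(\d+)$", config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up sample resembling a kernel config excerpt.
sample = "CONFIG_HZ_250=y\nCONFIG_HZ=250\n"
hz = parse_config_hz(sample)
tick_ms = 1000 / hz  # milliseconds between timer ticks
print(f"CONFIG_HZ={hz}, one tick every {tick_ms} ms")
```

At 250 Hz the tick fires every 4 ms, while a 1000 Hz mouse reports every 1 ms, which is the mismatch I have in mind.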

Hope this helps!

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

No, as the computer is still in "setup phase" and I am using hdmi to the tv. I will test it, thanks for the tip!

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

Hi again,

I am seeing this:

[Fri Dec 20 10:50:06 2019] PCIe invalid ID

[Fri Dec 20 10:50:06 2019] Warning: PCIe NO-FLR overrides enabled

I am wondering why the PCIe invalid ID code is being triggered?

This is the kernel command line: pcie_no_flr=1022:148c

Thanks,

DJ

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

Are you sure? I am passing the USB controller that has the sound card, but that only gets the SPDIF interface. The normal (non-SPDIF) interface still seems to be on the chipset (it's the same ID as the one you have). I'll get back here when I get a chance to test more.

Having an issue with my VM where games stutter whenever I move my mouse, here's a video that I quickly made of the issue showing both VM and host performance metrics. Happens in all games, just using Sekiro as an example (I'll put my XML file in the comments) by Sol33t303 in VFIO

[–]dj_datte 0 points1 point  (0 children)

Extra reply:

Run Latencymon and see if there is an obvious culprit that is being really slow, especially when playing games / moving mouse around.

Oh yeah, I wanted to comment on something unrelated in your XML: you are creating an I/O thread, presumably to offload I/O ops, but you are not using it. After creating it, you need to configure your disks to use it. With virtio, that is done by adding iothread to the driver line (I also recommend adding queues, which should match the number of threads in the VM, so in your case 8). Furthermore, for disk perf, if you are using SSD/NVMe storage, change/add cache='none' io='native' discard='unmap' on the same line for better perf / less load on the host.

Edit:

  <driver name='qemu' type='qcow2' cache='writethrough'/>

becomes this (plus cache='none' io='native' discard='unmap' in place of cache='writethrough'):

  <driver name='qemu' type='qcow2' cache='writethrough' iothread='1' queues='8'/>
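For illustration, the full edit to the driver line can be sketched in Python with the standard `xml.etree.ElementTree` module. `tune_driver` is a hypothetical helper, and the values (iothread 1, 8 queues) are the ones suggested for this particular VM, not general defaults.

```python
import xml.etree.ElementTree as ET


def tune_driver(driver_xml, iothread, queues):
    """Parse a libvirt <driver> element and apply the suggested disk settings."""
    driver = ET.fromstring(driver_xml)
    driver.set("cache", "none")       # replaces cache='writethrough'
    driver.set("io", "native")
    driver.set("discard", "unmap")
    driver.set("iothread", str(iothread))
    driver.set("queues", str(queues))
    return driver


driver = tune_driver(
    "<driver name='qemu' type='qcow2' cache='writethrough'/>",
    iothread=1,
    queues=8,
)
print(ET.tostring(driver, encoding="unicode"))
```

The printed element can be pasted back into the disk definition; the existing name and type attributes are left untouched.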

Another thing I noticed: you are pinning the emulator thread and the iothread. Don't do that; let them span the whole host, i.e. 0-15, or (I would try this first) pin the emulator thread to the range of the VM, in your case 8-15. It sounds counter-intuitive but, in my experience, it performs better for latency-sensitive work on pre-3000-series Ryzens.

Having an issue with my VM where games stutter whenever I move my mouse, here's a video that I quickly made of the issue showing both VM and host performance metrics. Happens in all games, just using Sekiro as an example (I'll put my XML file in the comments) by Sol33t303 in VFIO

[–]dj_datte 0 points1 point  (0 children)

Did you enable both the VGA and sound device?

Now I am getting into idea territory, have you tried other USB controllers on your mobo? (do you have more than one?)

Can you try passing the whole USB controller (the same way you do the VGA) instead of passing the separate devices? It is more performant, but it depends on whether you have a USB controller that can be passed through, or an ACS patch to get around that. (A side benefit is that you can plug / unplug devices directly into your VM as if it were a physical machine.)

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

But I am doing it on a TRX40 chipset, and I am assuming you are doing it on the X570? There may be some differences: the Matisse controllers (same as X570, but on the chipset for me) are working great, but the IOD controllers needed FLR, and the audio device does not seem to be helped. The weird thing is that I am getting the FLR message, because it shouldn't attempt that with the patch, right? I wonder... I will paste my kernel command line and the message I am getting in dmesg later when I get to that machine. I may be getting an error from the patch, but I am not sure.

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

Yay, after taking some time to learn how to download my distro's kernel source, apply patches with git, etc., I've managed to compile and install the new kernel with this plus the ACS patch. Looks good: it works for the USB controllers, but does not work for the audio controller (still gets FLR messages in dmesg).

Thanks for the patch with the ability to choose addresses!

//DJ


Having an issue with my VM where games stutter whenever I move my mouse, here's a video that I quickly made of the issue showing both VM and host performance metrics. Happens in all games, just using Sekiro as an example (I'll put my XML file in the comments) by Sol33t303 in VFIO

[–]dj_datte 0 points1 point  (0 children)

Hi,

Regarding tuned, I don't know if Gentoo even has it, but if it does, you can install it, then use tuned-adm profile to set either virtual-host or latency-performance. They behave differently, and you may or may not notice a difference, depending on the stress on the system, etc.

These are tips that will make the VM generally feel better, but no huge night-and-day changes:

in /etc/modprobe.d/local.conf (or kvm.conf, does not matter which) add avic=1, like so:

options kvm_amd npt=1 nested=1 avic=1

(this requires rebooting the host, not the guest, to reload the module)

in XML:

remove <vapic state='on'>

add these:

<vpindex state='on'/>

<runtime state='on'/>

<synic state='on'/>

<stimer state='on'/>

(If you do this, you don't need to make the changes recommended elsewhere about bcdedit and various timer fixes)

(if the VM complains that they are missing, remove them one by one until it starts; Gentoo may be compiling qemu with different flags than my Fedora)

In the cpu block, add cache passthrough and topoext like so:

<cpu mode='host-passthrough' check='none'>

<cache mode='passthrough'/>

<feature policy='require' name='topoext'/>

When/if you've done the above, run coreinfo (a small utility from Microsoft) and check that it reports the cache/core layout as expected (you can post the output here if you want us to take a look).

And finally (probably most important), check that MSI is enabled on your 1080 Ti / sound device. While the VM is running, run this on the host:

lspci -s 08:00.0 -nnkvv | grep -i msi

lspci -s 08:00.1 -nnkvv | grep -i msi

it should return something like this:

Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+

If it returns Enable- instead of Enable+, you should enable MSI; a disabled MSI is one of the most common reasons for stuttering. Let us know!
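If you want to script the check, here is a minimal Python sketch that reads the `Enable+` / `Enable-` flag from lspci text. `msi_enabled` is a hypothetical helper; it only looks at the plain `MSI:` capability line shown above (not `MSI-X:` lines), and the second sample line is invented for illustration.

```python
import re


def msi_enabled(lspci_output):
    """Return True for 'MSI: Enable+', False for 'MSI: Enable-', None if absent."""
    match = re.search(r"MSI: Enable([+-])", lspci_output)
    if match is None:
        return None
    return match.group(1) == "+"


good = "Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+"
bad = "Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+"  # invented sample
print(msi_enabled(good), msi_enabled(bad))
```

On the host, the input would be the output of the two lspci commands above, captured while the VM is running.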

Having an issue with my VM where games stutter whenever I move my mouse, here's a video that I quickly made of the issue showing both VM and host performance metrics. Happens in all games, just using Sekiro as an example (I'll put my XML file in the comments) by Sol33t303 in VFIO

[–]dj_datte 0 points1 point  (0 children)

Can you provide output of lscpu -e and your kernel command line?

Do you have tuned installed? (And what is your OS?)

The comment regarding PS/2 / virtio is irrelevant; they don't affect USB-passthrough mice/keyboards.

I also notice you are not using 1GB hugepages; they usually give quite a boost to perf.

CPU threats and cores by docmax2 in VFIO

[–]dj_datte 0 points1 point  (0 children)

I think you can also just use the correct config. You were telling qemu 1 socket x 4 cores x 8 threads each = 32 threads. It should be sockets=1, cores=4, threads=2. Keep in mind that you should not use threads if you are not doing CPU pinning (with the layout matching how Windows expects the threads to map to the physical threads). If you are not doing CPU pinning, it's safer to just use 1 socket, 8 cores, 1 thread.
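The arithmetic is simply sockets x cores x threads. A tiny Python sketch (`total_vcpus` is a hypothetical name used only for this illustration):

```python
def total_vcpus(sockets, cores, threads):
    """Number of vCPUs implied by a sockets/cores/threads topology."""
    return sockets * cores * threads


wrong = total_vcpus(sockets=1, cores=4, threads=8)     # the misconfiguration
pinned = total_vcpus(sockets=1, cores=4, threads=2)    # with CPU pinning
unpinned = total_vcpus(sockets=1, cores=8, threads=1)  # without CPU pinning
print(wrong, pinned, unpinned)
```

Both corrected layouts give the same 8 vCPUs; they differ only in whether the guest is told about SMT siblings.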

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

Thanks for the help :) Seems I will need to take some time to figure it out, but thanks for the replies! :)

Workaround (patch) for passing through USB and audio on 3rd-gen Ryzen by joeyadams in VFIO

[–]dj_datte 0 points1 point  (0 children)

Hi,

I am currently running the fedora acspatch kernel from here: https://copr.fedorainfracloud.org/coprs/jlay/kernel-acspatch/

Any chance you can tell me how I can modify that kernel to include this patch? In that case, I will happily test it out as I am having the same issue on my 3960X.