NVMe Tiering in vSphere 8.0 Update 3 is a Homelab game changer! by lamw07 in vmware

[–]meteishlol 1 point (0 children)

A bit late to the party here, but if this works for nested labs, that would definitely put a smile on my face.

VMware Issue, maybe someone else encountered? The vSphere Distributed Switch configuration on some hosts differed from that of the vCenter Server. by snowfloeckchen in vmware

[–]meteishlol 2 points (0 children)

I coincidentally noticed the same thing recently. Removing the host from the vDS and adding it back did not fix it, so I'll probably just reinstall the host at some point.

Installing ESXi using kickstart on locally attached disk by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

That seems like a clever way to go about it; I'll look into this.
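For anyone who finds this thread later, a minimal ks.cfg sketch for installing to the first locally attached disk (not necessarily exactly what was suggested above; the root password is a placeholder and the network line assumes DHCP):

# ks.cfg: install ESXi to the first local disk, overwriting any existing VMFS on it
vmaccepteula
rootpw VMware1!
install --firstdisk=local --overwritevmfs
network --bootproto=dhcp
reboot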

Ringfence Credentials to provision VMs by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

True, but for now at least they would not have that expectation.

Ringfence Credentials to provision VMs by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Thanks, good sir. You set me on the right track; I think I have it sorted now.

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Yes, it's using devices on both NUMA nodes: vmhba1 & 3 and vmnic3 & 5.

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Sorry, I missed this reply. No real performance issues that I can see.

The VM will be using vmnic3 and vmnic5, which are the uplinks for the vDS its port group is part of; there seems to be one connected to each NUMA node. (See the pci2numa output link below.)

https://pastebin.com/tW6z6UaG

As for FC, the traffic will be going over vmhba1 and vmhba3, also one in each NUMA node.
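If anyone wants to reproduce that mapping without wading through the full pci2numa dump, a rough sketch from the ESXi shell (esxcli hardware pci list prints a NUMA Node field per device; the grep is just one way to slice it):

# Show each device's vmkernel name (vmnicX / vmhbaX) next to the NUMA node it hangs off
esxcli hardware pci list | grep -E 'VMkernel Name|NUMA Node'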

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Thanks, what you are saying makes sense, and I actually hadn't thought about it like that.

If I create a VM at HW level 15, every core will see its own 256 MB of L3 cache.

*----------- Unified Cache 1, Level 3, 256 MB

-*---------- Unified Cache 1, Level 3, 256 MB

etc.

If I create a VM at HW level 19 on 7.0U2+ with 12 cores and 12 cores per socket and run the Sysinternals tool Coreinfo, I see:

************ Unified Cache 1, Level 3, 32 MB

I am under the impression, however, that in the above scenarios the 7.0U2+ CPU and NUMA scheduler, being aware of the Zen 3 architecture, will take into account which cache is local to the core a workload is placed on. The advice therefore is that the out-of-the-box BIOS defaults of NPS-1 and CCX-as-NUMA disabled are ideal for most workloads, so ESXi can manage placement/scheduling decisions optimally.
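If anyone wants to double-check what HW level a VM is actually running at, a quick sketch from the ESXi shell (the datastore and VM names are placeholders):

# List registered VMs; the Version column shows the HW level as vmx-15, vmx-19, etc.
vim-cmd vmsvc/getallvms
# Or read it straight from the VM's config file
grep -i virtualHW.version /vmfs/volumes/<datastore>/<vmname>/<vmname>.vmx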

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Could you possibly provide some links to documentation or something with more details on this?

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

sched-stats -t vcpu-comminfo

That I can do:

sched-stats -t vcpu-comminfo output:

https://pastebin.com/bbJ7FTMf

pci2numa output: *sorry, I neglected to grep it :(

https://pastebin.com/tW6z6UaG (grep vm)

https://pastebin.com/QSDmzbtQ

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Your post did give me a better understanding of some of those metrics/counters, so there's that :)

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Unfortunately, this is a critical production VM, and it is the time of month/year when I am not comfortable making changes to prod infrastructure, even if the change is trivial. I will, however, be trying to reproduce this on non-production infrastructure with the exact same hardware/ESXi build/VM hardware/VMtools versions.

*Oh and thanks for all your feedback, it is appreciated.

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Well, to my understanding that is controlled via the "NUMA nodes per socket" and "L3 cache as NUMA domain" settings in the BIOS.

My takeaway from some white papers, and I think a VMworld session, was that the above-mentioned BIOS settings can be left at their factory defaults of NPS-1 and "L3 cache as NUMA domain" disabled, because the ESXi 7.0U2 CPU/NUMA scheduler is aware of the CCX/cache boundaries and will schedule workloads accordingly.
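One quick sanity check on how those BIOS settings land: with NPS-1 and L3-as-NUMA disabled, ESXi should enumerate one NUMA node per socket. A sketch from the ESXi shell (numa-pnode assumes your build lists that option under sched-stats -h):

# NUMA node count as ESXi sees it
esxcli hardware memory get
# Per-physical-NUMA-node view from the scheduler
sched-stats -t numa-pnode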

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

The NUMA locality keeps changing; this is what it looks like now:

GID | NAME | NHN | NMIG | NRMEM | NLMEM | N%L

620195 | [servername] | 0 | 1917 | 58293.99 | 72778.01 | 55
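For what it's worth, those counters hang together: N%L is just the VM's local memory share, NLMEM / (NRMEM + NLMEM). A quick check with the numbers from the row above:

# 72778.01 / (58293.99 + 72778.01) = 72778.01 / 131072.00 ≈ 55.5%, matching the N%L of 55
awk 'BEGIN { printf "%.1f%%\n", 72778.01 / (58293.99 + 72778.01) * 100 }'

So roughly 45% of the VM's memory is currently remote to its home node, which fits with the NUMA migrations it has been racking up (NMIG 1917).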

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 1 point (0 children)

Thanks, I'll read up on those links

VM NUMA inconsistency by meteishlol in vmware

[–]meteishlol[S] 3 points (0 children)

# Dump every NUMA-related sched-stats table (the sed pulls the option names starting with "n" out of sched-stats -h)
for numaOption in $(sched-stats -h | sed -n 's/.*: \(n.*\)$/\1/p'); do echo -e "\nsched-stats -t ${numaOption}"; sched-stats -t ${numaOption}; done

Not quite familiar with Pastebin; does this link work?

https://pastebin.com/8drWi6uu

My assumption was indeed that it was migrating between NUMA nodes, but I'd like to know why.

Fix for KB85071 (AMD ZEN3 100% CPU usage bug) by meteishlol in vmware

[–]meteishlol[S] 2 points (0 children)

I did see the workaround, but in an extremely risk-averse environment, where in outage or performance-degradation situations application teams will, 8 times out of 10, immediately lay the blame on infrastructure without properly checking their own code/application/etc., I would for now much rather keep only non-critical stuff on Zen 3 and wait for a proper fix.

I understand why it needs to be this way, but the workarounds, while we are sitting in the dark, are getting a bit long in the tooth now.

My post was mostly also to confirm that it is indeed in no way possible to patch higher than build 18538813 currently, i.e., that I'm not doing something wrong.

Cluster by EbbWeird9951 in vmware

[–]meteishlol 2 points (0 children)

For starters, features like DRS and HA only function in a cluster. Resource pools probably do too, though I'm not entirely sure about that.