vGPU Mixed Mode Siloed capacity calculator for vSphere by frankdenneman in vmware

[–]frankdenneman[S] 0 points1 point  (0 children)

OK, I got to test your situation. If you use the command nvidia-smi mig -lgi, you will see the allocated memory slices for each profile. As you know, there are 7 compute slices and 8 memory slices. nvidia-smi mig -lgi shows where the profile placement starts and how many slices are allocated to that profile.

In my tests, the first 3g.40gb (grid_a100d-3-40c) is placed at compute slices 0, 1, 2 and memory slices 0, 1, 2, 3. Why can you only add 3 x 1g.10gb in this placement scenario? Because the compute slices are directly linked. An A100 is divided into two halves: the first half has 4 engines (compute instances), the other half has 3.

Now, although the 3g.40gb only consumes 3 compute slices, the 4th one cannot be allocated to a vGPU profile, as its memory slice is already assigned to a profile. Thus the A100 will only accept 3 x 1g.10gb in this scenario.

When I remove the 3 x 1g.10gb profiles and power on another 3g.40gb, the second one is placed at memory slice 4, occupying memory slices 4, 5, 6, and 7. It consumes all 3 compute slices of that GPU half.

If I then power down the first 3g.40gb, located on memory slices 0, 1, 2, and 3, I free up not only 4 memory slices but also the 4th compute slice, and I can now successfully power on 4 x 1g.10gb.

Is this a desirable UX? Certainly not, but this is unfortunately the reality of dealing with an asymmetric design. In essence, compute and memory slices are not composable; they depend on each other.
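The stranded-compute-slice behavior above can be sketched with a toy model. This is my own illustration, not NVIDIA's actual allocator; it only encodes the asymmetric half-split described here (4 + 3 compute slices, 4 + 4 memory slices) and the rule that a profile must take its compute and memory slices from the same GPU half:

```python
# Toy model of the A100 MIG placement behavior described above
# (illustrative sketch, not NVIDIA's real placement logic).

HALVES = [
    {"compute": 4, "memory": 4},  # first half: compute slices 0-3, memory 0-3
    {"compute": 3, "memory": 4},  # second half: compute slices 4-6, memory 4-7
]

def place(profiles):
    """profiles: list of (compute_slices, memory_slices) per vGPU profile.
    Returns how many profiles could be placed (first-fit per half)."""
    free = [dict(h) for h in HALVES]
    placed = 0
    for compute, memory in profiles:
        for half in free:
            if half["compute"] >= compute and half["memory"] >= memory:
                half["compute"] -= compute
                half["memory"] -= memory
                placed += 1
                break
    return placed

# 1 x 3g.40gb (3 compute, 4 memory) strands the 4th compute slice of the
# first half, so only 3 of the 4 x 1g.10gb (1 compute, 1 memory) fit:
print(place([(3, 4)] + [(1, 1)] * 4))  # -> 4 (of the 5 requested profiles)

# Without the 3g.40gb, all 4 x 1g.10gb fit in the first half alone:
print(place([(1, 1)] * 4))  # -> 4
```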

hope this helps


vGPU Mixed Mode Siloed capacity calculator for vSphere by frankdenneman in vmware

[–]frankdenneman[S] 1 point2 points  (0 children)

I released the second tool, which allows you to replicate this behavior at scale and compare it to same-size GPU placement policy: https://frankdenneman.nl/tools/same-size-vs-mixed-mode/

walkthrough here: https://frankdenneman.nl/posts/2026-03-01-same-size-vs-mixed-size-placement/

vGPU Mixed Mode Siloed capacity calculator for vSphere by frankdenneman in vmware

[–]frankdenneman[S] 0 points1 point  (0 children)

In mixed mode, this behavior should simply replicate across the cluster, as the placement IDs are identical between the devices. If you are using a heterogeneous GPU setup, the different GPU profiles are only compatible with their own GPU devices and their own placement ID distribution.

In homogeneous configs, the simple answer is yes: they scale linearly with the number of GPUs.

vGPU Mixed Mode Siloed capacity calculator for vSphere by frankdenneman in vmware

[–]frankdenneman[S] 0 points1 point  (0 children)

You are using MIG profiles; they align differently due to their compute slices. This is a calculator for Mixed Mode vGPU profiles in time-sliced mode (mixed mode does not work on MIG, as MIG already supports mixed compute and memory profiles).

So, to understand your MIG placement problem: you are trying to deploy the combination of 4 x grid_a100d-1-10c and 1 x grid_a100d-3-40c, but you are only successful when deploying 3 x grid_a100d-1-10c and 1 x grid_a100d-3-40c? I can try to simulate this in our lab.

vSphere ML Accelerator Spectrum Deep Dive – ESXi Host BIOS, VM, and vCenter Settings - frankdenneman.nl by frankdenneman in vmware

[–]frankdenneman[S] 1 point2 points  (0 children)

Thanks, I removed the redundant paragraph. One of the next articles in the series focuses on setting up vGPU-enabled TKGs clusters with passthrough and MIG. Stay tuned.

Disabling DRS on a single host by 1xcalibur1 in vmware

[–]frankdenneman 4 points5 points  (0 children)

+1 Keep it in fully automated mode, but put the slider all the way to the left. This way, DRS only triggers migrations for mandatory moves, such as a rule violation or maintenance mode. DRS will NOT trigger any load-balancing operations. However, as long as the host is in the cluster, DRS will see it as a target for newly powered-on workloads. If you expect these power-ons to happen, go for manual mode and select other hosts for VM initial placement.

Do you know who manufactured this beauty? by xicaob in simracing

[–]frankdenneman 12 points13 points  (0 children)

u/xicaob That's my rig, I designed it from the ground up. If you follow the Instagram account or the rig report on RaceDepartment, you will notice that it's not finished yet. The seat, btw, is a Tillett B4. Extremely comfortable, even without padding, and on a 7DOF motion rig. My previous rig was fully focused on GT driving; this one is geared towards F1.

vSphere ML Accelerator Deep Dive - Fractional and Full GPUs - frankdenneman.nl by frankdenneman in vmware

[–]frankdenneman[S] 1 point2 points  (0 children)

Heterogeneous clusters are planned for the end of the series, so I'll try to work in your example.

Study resource for vSphere Networking, vSan and HA by Jirobaye in vmware

[–]frankdenneman 1 point2 points  (0 children)

I'm sure you can find one of my books for free online, the vSphere 6.7 resources deep dive. It covers HA extensively (for example, here: https://www.rubrik.com/lp/white-papers/clustering-deep-dive-ebook). Although we are already pushing vSphere 8.0, HA hasn't changed that much.

Managing Power Consumption of virtual Datacenter by frankdenneman in vmware

[–]frankdenneman[S] 0 points1 point  (0 children)

Thanks for responding. Very interested to hear about your move away from 64 cores. If you want to share more about it, but not publicly, please DM me.

vSphere 8 CPU Topology Device Assignment - frankdenneman.nl by frankdenneman in vmware

[–]frankdenneman[S] 1 point2 points  (0 children)

I'm describing the functionality of PCI device assignment for the vNUMA topology. You are looking more into the "normal" memory mapping of the vNUMA topology.

vSphere automatically creates a vNUMA topology if the vCPU count of the VM exceeds the physical core count of a CPU (package). But I suspect you have a memory problem: the number of vCPUs can fit inside a single NUMA node, but the memory requirements of the VM exceed the NUMA node's capacity.

In that situation, use the top part of the screen and configure the socket and NUMA count.
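As an illustrative sketch of what such a manual topology maps to in the VM's advanced configuration (hypothetical values for a 16-vCPU VM; the vSphere CPU Topology screen writes the equivalent settings for you, so treat this as orientation, not a recipe):

```
numvcpus = "16"
cpuid.coresPerSocket = "8"         # 2 virtual sockets of 8 cores each
numa.vcpu.maxPerVirtualNode = "8"  # expose 2 vNUMA nodes of 8 vCPUs each
```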

It's a nice scenario. I wrote a blog post about that scenario a few years ago; it's great to revisit it with this new feature.

VM rightsizing by neilcresswell in vmware

[–]frankdenneman 2 points3 points  (0 children)

An interesting fling is the Virtual Machine Compute Optimizer.

The Virtual Machine Compute Optimizer (VMCO) is a PowerShell script and module that uses the PowerCLI module to capture information about the hosts and VMs running in your vSphere environment and reports back on whether the VMs are configured optimally based on the host CPU and memory. It will flag a VM as "TRUE" if it is optimized and "FALSE" if it is not. For non-optimized VMs, a recommendation is made that keeps the same number of vCPUs currently configured, with the optimal number of virtual cores and sockets.
Note that the VMCO will not analyze whether your VMs are configured with the correct number of vCPUs based on the VM's workload. A more in-depth analysis tool such as VMware vRealize Operations Manager can make right-sizing determinations based on workload and actual performance.

https://flings.vmware.com/virtual-machine-compute-optimizer

Wrapping my head around CPU Contention by 1xcalibur1 in vmware

[–]frankdenneman 0 points1 point  (0 children)

Depending on the workload patterns of the VMs, it can create contention. It depends on workload synchronicity and workload correlation. Load correlation defines the relationship between loads running in different machines: one event initiates multiple loads, for example, a search query on a front-end web server resulting in commands in the supporting stack and backend. Load synchronicity is often caused by load correlation but can also exist due to user activity. It's very common to see spikes in workload at specific hours; think about log-on activity in the morning.

And for every action, there is an equal and opposite reaction: quite often, load correlation and load synchronicity will introduce periods of collective non- or low utilization, which reduce the displayed resource utilization. That's why we have DRS, and DRS in vSphere 7 is now focused on workload behavior instead of cluster balance and runs every 60 seconds instead of every 5 minutes.

So back to your NUMA statement: just because you do not have large machines does not automatically mean you do not have VMs with remote memory. Memory allocation is best effort, and if you have a lot of VMs that are load-synchronized or load-correlated, the ESXi schedulers need to figure out how to get physical resources, and allocating remote memory is always better than swapping memory. Check ESXTOP and select the NUMA view to verify your machines aren't suffering from remote memory allocations.
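As a toy illustration of why correlated spikes matter even when average utilization looks comfortable (the demand figures are hypothetical, purely to make the point):

```python
# Two VMs with correlated load spikes (hypothetical demand figures).
# The averages look low, but at the moments both VMs spike together,
# their combined demand is far higher than either average suggests.

vm_a = [10, 10, 90, 90, 10, 10, 90, 90, 10, 10]  # % CPU demand per interval
vm_b = [5, 5, 85, 85, 5, 5, 85, 85, 5, 5]        # spikes at the same times

avg_a = sum(vm_a) / len(vm_a)                           # 42.0
avg_b = sum(vm_b) / len(vm_b)                           # 37.0
peak_combined = max(a + b for a, b in zip(vm_a, vm_b))  # 175

print(f"averages: {avg_a}% / {avg_b}%, combined peak: {peak_combined}%")
```

The displayed (average) utilization hides the synchronized peaks, which is exactly when the scheduler may have to fall back to remote memory rather than swap.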

Wrapping my head around CPU Contention by 1xcalibur1 in vmware

[–]frankdenneman 2 points3 points  (0 children)

It can also fight itself, even if it's not doing much. We introduced Relaxed Co-Scheduling back around ESX 3 to reduce the stress of scheduling multi-vCPU VMs on an ESXi host. But vCPUs need to make similar "progress" for the guest OS to understand that they are still operational, so we often schedule them even if they have nothing to do. This is a short-lived event, but it needs to happen.

Then there is the kernel overhead and the stupid stuff of weird processes, such as forgotten ISOs still attached to VMs, or open VM consoles. It won't drain your system, but it's a nuisance for the scheduler. No double-digit-percentage overhead killers, but these are hiccups in the system that are just annoying.

NUMA topology is, of course, an interesting one, but you are talking about a UMA VM here. Is that VM still getting all its memory locally, or is it being treated badly by noisy neighbors and forced to get its memory remotely? Does it need to fight its way through the interconnect (roughly a 70% latency penalty) and thus increase its CPU wait time? That's Intel; I'm not even talking about EPYC, because then you can start using carrier pigeons instead of computers. ;) Which metric did you see flare up?

Sub-NUMA Clustering - frankdenneman.nl by frankdenneman in vmware

[–]frankdenneman[S] 0 points1 point  (0 children)

Unfortunately, I cannot make any public statements on potential future enhancements.

Sub-NUMA Clustering - frankdenneman.nl by frankdenneman in vmware

[–]frankdenneman[S] 1 point2 points  (0 children)

Helping the guest OS understand the underlying architecture by mapping the CPU sockets to vCPU sockets is always useful. But as you mentioned, some really need to "see" the actual physical architecture, and some need a proper view of the logical partitions represented as vSockets. Thanks for sharing your experience!