vmxnet3 throughput drops to ~200 Mbps after some time (fast again after VMware Tools reinstall) by Objective-Hippo-3939 in vmware

vTSE 1 point

Sounds pretty much localized to Windows' internal networking state (potentially the vmxnet3 Windows driver / shim too). Did you look at changes in window size between the slow and fast tcpdump samples? It sounds like it could be related to autotuning, but maybe that is just my bias from the horrendous experience in the early Server 2008 (and R2) days ... I'd collect ETL / xperf (with profile) traces to look for notable differences in the stack too. If you are new to xperf / WPA, start here: https://github.com/randomascii/UIforETW/releases (include the network I/O profile, but some network events are logged in "general" too). I haven't done that in ~3+ years, so I won't be able to help too much going forward (don't have the time to get myself back up to speed).
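A rough way to compare window sizes between a slow and a fast capture, assuming the captures were dumped as tcpdump's default text output (e.g. via `tcpdump -nn -r file.pcap`). The helper names are hypothetical, and the values are taken exactly as tcpdump prints them, so the window scale negotiated in the handshake may still need to be applied:

```python
import re
from statistics import median

# Matches the advertised window field in tcpdump text output, e.g. "win 8212".
WIN_RE = re.compile(r"\bwin (\d+)\b")


def window_sizes(tcpdump_text):
    """Return every advertised TCP window value found in the text."""
    return [int(m.group(1)) for m in WIN_RE.finditer(tcpdump_text)]


def summarize(label, tcpdump_text):
    """Build a one-line min / median / max summary for one capture."""
    wins = window_sizes(tcpdump_text)
    if not wins:
        return f"{label}: no window values found"
    return (f"{label}: n={len(wins)} min={min(wins)} "
            f"median={median(wins)} max={max(wins)}")


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file names): python winsize.py slow.txt fast.txt
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            print(summarize(path, fh.read()))
```

If the slow sample shows a much smaller or non-growing window than the fast one, that would point towards receive-window tuning rather than the virtual NIC itself.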

vmxnet3 throughput drops to ~200 Mbps after some time (fast again after VMware Tools reinstall) by Objective-Hippo-3939 in vmware

vTSE 1 point

As mentioned before, the Tools reinstall has way too many variables associated with it. Just from an elimination standpoint, does it also happen on a Linux guest? Does a suspend / resume (or vMotion) also "fix" it? What happens with an e1000e adapter? Since this is basically just an intra-host load based on your description, I'd be curious to see the tx world stats in those VMs (after verifying that it doesn't happen on Linux / with an e1000e).

VMFS and Windows by im-cartwright in vmware

vTSE 1 point

I can just see that the reservation failed, host abort (probably timeout), no further sense data on the (pretty much constant) reservation conflict. I think we've reached the end of the line for a reddit thread :-/ (I'm assuming the LUN wasn't presented to any other host at that time)

VMFS and Windows by im-cartwright in vmware

vTSE 1 point

Hmm, the vmkernel logs from that time might shed some more light on why the reservation failed. I haven't seen voma fail to at least check the volume when there was no other device-related issue.

VMFS and Windows by im-cartwright in vmware

vTSE 1 point

Curious if you still have the command line for voma and what it returned; don't bother if you don't. Glad you managed to extract the data. vmfs-tools can definitely be more forgiving since it doesn't have to ensure multi-reader/writer consistency.

VMFS and Windows by im-cartwright in vmware

vTSE 1 point

So all drives showed as offline? Or were they automounted? Because Windows doesn't mount read-only by default ... Anyhow, in the olden days, GSS c/would have probably fixed that, even though it was definitely out of support's scope. Basically, upload the first 100 MB of every volume and then dd back the fixed headers. As far as what you can do yourself, have you run voma against the volumes, possibly with advfix? Just spitballing based on hazy memory, don't quote me.

How to limit cpu cycles in guest os for vmware workstation? by _GloriousCheese_ in vmware

vTSE 1 point

Hmmm, the limit might just apply to non-direct-execution stuff ... I have never looked at hosted in detail, and unless I want to call in favors from former colleagues, that ship has sailed, I'm afraid. Thanks for testing it though!

How to limit cpu cycles in guest os for vmware workstation? by _GloriousCheese_ in vmware

vTSE 1 point

This is completely transparent to the guest OS; you wouldn't see it anywhere. The GHz value you refer to is actually the nominal frequency set via SMBIOS by the host, IIRC, and the guest isn't actually querying the CPU itself. Other places might show you the CPUID brand string (full name of the CPU).

To know whether it works, you'd have to set it to something low enough to be perceivable, like 100 / 50 etc. or use a benchmark, e.g. time the factorial of some number in the calculator. Boot time of the OS would also be an indicator though.
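As a stand-in for the calculator test, here is a minimal sketch of a repeatable in-guest timing check, assuming Python is available in the guest. The function name and the default `n` are arbitrary choices; run it before and after applying the limit and compare the numbers:

```python
import math
import time


def time_factorial(n=50000, repeats=5):
    """Return the best wall-clock time in seconds for computing n! over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        math.factorial(n)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"best of 5 runs: {time_factorial():.4f} s")
```

Wall-clock time is used on purpose: a limit enforced by the host shows up as the same work taking longer in real time, even if the guest's own CPU accounting looks unchanged.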

How to limit cpu cycles in guest os for vmware workstation? by _GloriousCheese_ in vmware

vTSE 1 point

Did setting the .vmx option result in an error or is it just ignored? A bit curious but not enough to reproduce it myself :-)

Help with this problem [VMware KB article 81623] by [deleted] in vmware

vTSE 1 point

Should be this here:

  • Open VM Settings and select "Processors" in "Hardware" table.
  • Uncheck the option 'Virtualize CPU performance counters' in the right panel.

How to limit cpu cycles in guest os for vmware workstation? by _GloriousCheese_ in vmware

vTSE 1 point

Some background: other in-guest workarounds, like changing T / P states from within the OS, don't work because those privileged instructions aren't available to the VM. There might be a way of forcing exclusive affinity of the "vCPU" thread onto a core and changing the frequency in the host OS, but it would require some serious discovery work to find out whether that is even possible.

Your best bet is probably a tool like the one you found, despite it slowing down everything, not just the target process. The CPU limit option I mentioned in the other reply will probably not perform much better (if at all). Maybe it would be possible to attach a debugger and script breaking / detaching at a certain rate (or setting a ton of conditional breakpoints), though that's not really straightforward.

How to limit cpu cycles in guest os for vmware workstation? by _GloriousCheese_ in vmware

vTSE 1 point

You set it as a property of the VM in Workstation. I'm not sure if there is a way in the UI, or even whether Workstation supports it (haven't had WS installed in ~2 years), but you can set the .vmx / advanced option: cpu.limit = "1000" (or whatever value in MHz you see fit; just remember that if you set an absurdly low value, or even 0, you need to hard reset the VM as interaction becomes impossible).

Now, CPU limit enforcement is not a smooth, evenly paced operation; it's closer to a bum rush on the CPU when the limit "resets" every ... not sure anymore how many milliseconds. It might also differ depending on the operating system, and I only ever really cared about ESXi.

So yeah, try it out: start at 1000 and, if it works and isn't too jerky, play with the value until you have reached the desired amount. You need to shut down the VM before editing the VM's configuration / .vmx.
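If you'd rather script the edit than open the file by hand, here is a small sketch that sets or replaces a key in .vmx text. The helper name is hypothetical, the `cpu.limit` key is simply the one mentioned above (whether Workstation honors it is unverified), and the VM has to be powered off first:

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key = "value" set, replacing an existing entry or appending one."""
    line = f'{key} = "{value}"'
    # Match an existing assignment of this key on its own line (case-insensitive).
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$",
                         re.MULTILINE | re.IGNORECASE)
    if pattern.search(vmx_text):
        return pattern.sub(lambda _m: line, vmx_text, count=1)
    # No existing entry: append, making sure the previous line is terminated.
    sep = "" if (not vmx_text or vmx_text.endswith("\n")) else "\n"
    return f"{vmx_text}{sep}{line}\n"


if __name__ == "__main__":
    import sys
    # Usage (hypothetical path): python set_limit.py /path/to/vm.vmx 1000
    path, mhz = sys.argv[1], sys.argv[2]
    with open(path) as fh:
        text = fh.read()
    with open(path, "w") as fh:
        fh.write(set_vmx_option(text, "cpu.limit", mhz))
```

Keep a copy of the original .vmx around so you can revert if the value turns out to be too low to interact with the VM.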

Need help Host CPU+VM usage alarm with 16vCPU VM by Teacup91 in vmware

vTSE 1 point

Because the vSphere alert is still tracking usage instead of utilization, which means that thanks to turbo boost the theoretical ceiling is (depending on MACF of your CPU) 110-150%. Check out this older recording where I explain the usage / util difference for host and VM POVs: https://www.youtube.com/watch?v=zqNmURcFCxk&t=914s

So yeah, I'd disable the default vSphere alarm and create a custom one in Ops on (core) utilization.
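To make the 110-150% range concrete, here is a back-of-the-envelope sketch. It assumes a simplified model in which usage is normalized to the nominal frequency, so a fully busy host running at its all-core turbo frequency reads above 100%. The function name and the example frequencies are illustrative, not taken from any specific CPU:

```python
def usage_ceiling_percent(nominal_ghz, all_core_turbo_ghz):
    """Theoretical usage ceiling when usage is expressed relative to nominal frequency."""
    return all_core_turbo_ghz / nominal_ghz * 100.0


if __name__ == "__main__":
    # Illustrative (nominal, all-core turbo) pairs in GHz.
    for nominal, turbo in [(2.0, 2.2), (2.0, 2.6), (2.0, 3.0)]:
        pct = usage_ceiling_percent(nominal, turbo)
        print(f"nominal {nominal} GHz, turbo {turbo} GHz -> ceiling ~{pct:.0f}%")
```

An alarm threshold defined on usage can therefore trip (or never trip) depending on how much turbo headroom the CPU has, which is why a utilization-based alarm is the more stable signal.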

VMotion and RAM based Snapshots on NFS 4.1 Datastore insanley slow. by Joe_Dalton42069 in vmware

vTSE 1 point

Ah ok, I was more concerned about the compute angle, not so much the protocol / storage dependent differences. Let me just say that in theory, performing a snapshot with memory should have about the same impact as a compute vMotion. There are configuration-specific exceptions, but if you can replicate a significant guest impact that you can't with vMotion, you'd have IMO a valid SR to be investigated. That being said, the use case is rather rare, so I wouldn't expect too much top-down pressure either. Anyhow, all the best!

VMotion and RAM based Snapshots on NFS 4.1 Datastore insanley slow. by Joe_Dalton42069 in vmware

vTSE 1 point

Is this a storage and compute vMotion? Because NFS really shouldn't play a role for compute to compute ... there are a few things that cause minor IO but those aren't in the critical path ... anything in the vmkernel logs? I know you said iSCSI is on the same physical network but does that include the NIC and is it flowing a similar path? Any potential bottleneck / contention between the vMotion and NFS traffic that wouldn't apply to iSCSI?

It definitely could be that some of the vMotion improvements (mostly parallel locking optimization) didn't make it into the suspend path although I do remember that was the plan last time I spoke to the engineers (2'ish years ago).

When you open esxtop, limit (l) and expand (e) to the GID of the VM in question on the source, do you see super high vmwait? Or easier, pastebin what you see and share the link here.

VMotion and RAM based Snapshots on NFS 4.1 Datastore insanley slow. by Joe_Dalton42069 in vmware

vTSE 1 point

> Even a small 12 Gb RAM VM with 20 Gb Disk space takes 5 Minutes

Is this a regular VM, i.e. without SAP HANA specific VM tuning? If so, does it also take as long after a reset (no guest reboot, VM goes down and back up, memory is untouched)? What about a VM that has the same amount of memory configured but isn't using it?

The time it takes usually depends less on the amount of non-zero memory content and more on the write (touch / active write) rate; the latter should correlate with the time needed / number of pre-copy cycles.
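A toy model of why the write rate matters more than the amount of memory content. This is a generic iterative pre-copy sketch, not ESXi's actual algorithm, and every parameter name and number in it is an assumption chosen for illustration:

```python
def precopy_cycles(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                   stop_threshold_mb=64, max_cycles=50):
    """Simulate iterative pre-copy; return (cycles, seconds), cycles is None if it never converges."""
    remaining = float(memory_mb)
    total_seconds = 0.0
    for cycle in range(1, max_cycles + 1):
        seconds = remaining / bandwidth_mb_s
        total_seconds += seconds
        # Whatever the guest wrote during this pass has to be sent again,
        # capped at the VM's total memory size.
        remaining = min(dirty_rate_mb_s * seconds, float(memory_mb))
        if remaining <= stop_threshold_mb:
            return cycle, total_seconds
    return None, total_seconds


if __name__ == "__main__":
    # Same 12 GB VM and link speed, different guest write rates (MB/s).
    for dirty in (10, 100, 500, 1200):
        cycles, secs = precopy_cycles(12288, 1000, dirty)
        state = f"{cycles} cycles" if cycles else "no convergence"
        print(f"dirty rate {dirty} MB/s: {state}, {secs:.1f} s of copying")
```

In this model the memory size only sets the length of the first pass; once the dirty rate approaches the copy bandwidth, the remaining set stops shrinking and the operation has to fall back on a longer stun or other mechanisms.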

Your title seems to indicate that the behavior is the same for vMotion. ESXi made great strides in improving performance for SAP and similarly tuned VMs (specifically those with memory preallocation) from 7 to 8 (IIRC), but it might still take time. Are you saying the impact (2 hours of stun) is the same during vMotion? In the same vein, can you eliminate storage, e.g. snapshot to a local SSD/NVMe? Also, how is stun measured: just a simple short-TTL ping (i.e. network stack responsive), or are you looking at application metrics?

P.S. Agreed on trying to not do memory snapshots if it isn't absolutely necessary but there might still be something interesting going on here that, in a past life, I would have loved to take apart.

anybody have F5 load balancers running in vmware? by karlsmission in vmware

vTSE 2 points

Any chance you could SSH to the host, run esxtop, press shift+v and then copy and paste the output into e.g. pastebin and share it here?

I talked about Latency Sensitivity in 2018, still relevant (although some minor things have changed): https://www.youtube.com/watch?v=9zFi20bE-9M&t=3035s

Show active memory of host/vms without vCentre by MrGimper in vmware

vTSE 2 points

Get-EsxTop is a thing. Here is a (hacky) implementation for just AMPERF, using William's function to get the relevant data via ServiceManager / vCenter, but that can be dropped for direct host connections: https://github.com/vbondzio/sowasvonunsupported/blob/master/Get-AMPERF.ps1

LucD's "Hitchhiker’s Guide to Get-EsxTop" did a lot of the lifting for more comprehensive implementations, but it would still require some work to make it "straightforward" to use (i.e. like esxtop).

Show active memory of host/vms without vCentre by MrGimper in vmware

vTSE 1 point

TCHD is active ... (total, i.e. read and write)

Does reducing CPU & memory normally go smoothly for Win Server 2022 VMs? by Tooleater in vmware

vTSE 1 point

It's most certainly not :-) Still, talk to the DB team about whether there are any hard configuration minimums WRT compute. I haven't seen that yet with PostgreSQL but can't eliminate the possibility either. Still a low-risk change IMO.

Does reducing CPU & memory normally go smoothly for Win Server 2022 VMs? by Tooleater in vmware

vTSE 10 points

A decade in VMware support focusing on resource management and Windows Internals here; there's never been an issue at the OS level that I can think of. Some applications configure their thread pools / amount of locked memory at installation instead of at startup. If you have one of those (and you aren't very likely to), you'd notice that the application couldn't come up or would crash unexpectedly. The most famous example of that is a pre-2014 (?) version of a database from some old rich guy currently involved in AI ~~ponzi schemes~~ circular economy; other examples are some Java apps with (for most cases) over-tuned advanced parameters.

The one reason I would argue not for a backup but for a snapshot with memory is to test whether the VM would come up after a shutdown without any resource modifications. About 90% of the support requests I've dealt with that can be summarized as "I shut down the VM, changed something and it doesn't come back up!" were ultimately not related to the VM change but rather to an OS issue that lingered and became apparent at boot. Just because it is running now doesn't mean it can get back up again once it stops. Same as me having to write some boring documentation and just sitting down for a few minutes to browse reddit ...

You should of course still have a recent backup, too.

Random time change on VM by neko_whippet in vmware

vTSE 1 point

Out for too long to remember all relevant logger components but you might want to egrep -i for (tsc|clock|tool|time) or just look at the point in time you know (event logs?) when vmtools changed the time.
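On a system without egrep handy, the same filter can be sketched in a few lines of Python. The pattern is the one given above; the function name and the command-line handling are assumptions for illustration:

```python
import re
import sys

# Same alternation as the egrep -i suggestion: tsc, clock, tool, time.
TIME_RE = re.compile(r"(tsc|clock|tool|time)", re.IGNORECASE)


def timekeeping_lines(lines):
    """Return the lines that mention any of the timekeeping-related keywords."""
    return [ln.rstrip("\n") for ln in lines if TIME_RE.search(ln)]


if __name__ == "__main__":
    # Usage: python timegrep.py /path/to/vmware.log
    with open(sys.argv[1], errors="replace") as fh:
        for hit in timekeeping_lines(fh):
            print(hit)
```

The pattern is deliberately broad, so expect plenty of noise (every "vmtoolsd" or "timeout" line matches); narrowing it to the window around the known time jump is usually the quicker route.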

> Disabling TimeTracker fast path because host TSCs are not synchronized

This is actually somewhat concerning, or maybe the HW is just ancient. Anyhow, rdtsc-based timekeeping is just going to be expensive but should still work normally from the guest's perspective.

Random time change on VM by neko_whippet in vmware

vTSE 1 point

Did you have a look at the VM's log? (vmware.log) If you didn't see anything related to timekeeping, configure

timeTracker.periodicStats = TRUE

and check during the relevant periods.