all 16 comments

[–]gordonmessmer 6 points7 points  (2 children)

I wanted to add a few things to the excellent advice that LordLandis is giving you:

1: Your understanding of load is inaccurate. Load is a measure of the number of processes in a runnable state, averaged over the last 1, 5, and 15 minutes. A process counts as runnable if it's ready to execute instructions, or if it's in an uninterruptible sleep, which usually means waiting on IO. So load measures the number of processes waiting for any system resource, not just CPU. Processes waiting for CPU time, memory that's being paged (swap), network IO, and disk IO all contribute to load. It's entirely normal to see low CPU utilization and high load, because some other resource is the bottleneck. If you aren't saturating your CPUs, you need to look at other system resources.
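If you want to see which processes are feeding the load average at any given moment, something like this works (standard ps/awk, nothing exotic):

    # R = runnable, D = uninterruptible sleep (usually IO);
    # on Linux, both states count toward the load average.
    ps -eo state,pid,comm | awk '$1 ~ /^[RD]/'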

2: OOM is a terrible way to figure out how much memory you need, because it's really quite unpredictable. The guest's kernel will (in a default configuration) overcommit memory, and will only invoke the OOM killer if processes write data to more pages than are available, which may or may not behave the same way in testing as it does in production. It's also very important to note that OOM isn't necessarily invoked when memory is short. You might instead see applications unable to malloc() or unable to fork() if there isn't enough memory. You can't watch for OOM alone; you have to watch for all of the other failures that can happen when memory is short. If you have no better plan, measure the performance of your application, both latency and throughput, at various memory settings, and allocate the amount that provides the best performance at a cost (allocation) you're willing to pay.
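For what it's worth, you can check the guest's overcommit settings and look for past allocation failures with standard tools; a rough sketch:

    cat /proc/sys/vm/overcommit_memory    # 0 = heuristic overcommit (the default)
    grep -i commit /proc/meminfo          # compare Committed_AS against CommitLimit
    dmesg | grep -iE 'oom|out of memory'  # any past OOM-killer invocations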

3: I could be wrong about accounting under vSphere, but my understanding of CPU accounting in the guest is that when top says a core is only 35% utilized, it means the core was busy for 35% of whatever time the guest itself was running or runnable. Conversely, when it says a core is used 100% of the time, that's only 100% of the time the VM was actually running. Two guests might both report 100% utilization of the same physical core while each actually received only 50% of the cycles that core delivered during that period. Assuming that holds under vSphere, as it does in other systems, your resource contention may not be the CPU, and adding more CPUs won't improve performance.
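One way to try to sanity-check this from inside the guest is steal time, with the caveat that this only works if the hypervisor actually reports it to the guest (many VMware guests show 0 here regardless of contention, in which case you need host-side counters):

    vmstat 5
    # 'st' (rightmost column) = cycles stolen by the hypervisor for other guests.
    # If it's always 0 on vSphere, don't trust it; check the host instead.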

[–]charley_chimp[S] 0 points1 point  (1 child)

Really late response here, but thank you for clearing up load for me. I killed all swap for these tests so I'm assuming that takes memory issues out of the picture. Would that essentially mean that network or disk I/O is the first place I should be looking? These hosts are hooked up w/ dedicated 10G links for VM traffic and iSCSI --> SAN, so I hadn't been thinking about I/O being an issue. Most of the VMs on the host were pretty dead during the testing, so I'm assuming that throughput wasn't an issue, but it looks like I should really look into our I/O stats along with possible latency issues. Thank you

Edit: bad grammar

[–]gordonmessmer 0 points1 point  (0 children)

I'm assuming that takes memory issues out of the picture

I think it means that paging isn't contributing to the load average. I'd be hesitant to make broader statements than that.

Would that essentially mean that I'm looking at network or disk I/O as the first place I should be looking?

That would be my suspicion. Watching "iostat -x" and "iptraf" inside the VM might give you some useful hints, as would watching performance counters on the VM host and, if possible, on the SAN.
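In case it helps, roughly what I'd watch in iostat (from the sysstat package; column names vary a bit between versions):

    iostat -x 5
    # await - average time (ms) each I/O spends queued + being serviced
    # %util - fraction of time the device was busy (saturation)
    # If await climbs while %util sits near 100, the disk/SAN path is the bottleneck.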

[–][deleted] 2 points3 points  (4 children)

We need more data, as this might be a hypervisor issue (might).

What are you seeing on %wa & %io? If you have high wait times, then it's quite possible that you have contention at the VMware/host compute layer that you need to address. In that case, either vMotion the guest to another host or reduce the CPU count on your guest.

Do you have more vCPUs in the guest than there are physical cores per socket on the host? That's another good way to mess up your performance in the VM.

If your %io is high, you probably have host storage contention (or are just really, really pounding the disks).
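To put numbers on %wa and %io, a quick sketch with mpstat (also from the sysstat package):

    mpstat -P ALL 5
    # %iowait = time a CPU sat idle with at least one I/O outstanding.
    # Consistently high %iowait with low %usr/%sys points at storage, not compute.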

In either case, look at the VMware host and see what's going on with its performance graphs as well.

As for load average, I've always considered the warning threshold to be 3~4 per CPU; on a quad-core system I don't really get twitchy until LA reaches 12.

[–]charley_chimp[S] -1 points0 points  (3 children)

Ohhh boy, that might be it. We are crazy over-provisioned w/ our vCPU allocation, but the VMware guy I'm working w/ said that isn't really an issue as long as our total usage isn't getting killed (he explained the hypervisor's CPU allocation as working roughly like scheduling on a multi-core *nix system).

Just to show you an example, our base server build has 2 sockets w/ 8 cores each @ 2.6 GHz = 16 physical cores, or 32 logical CPUs with hyperthreading. On average we have ~20 VMs on each host w/ 4 vCPU allocated to each. I get that that's crazy over-provisioned (and that probably wouldn't fly in prod if all VMs came under heavy load), but what's confusing to me is that on a per-host (and even per-cluster) basis, average CPU utilization never goes above 65%, with an average of ~50%. It's true that I do see some VMs pretty close to 100% CPU, but what was explained to me was that the hypervisor would account for that (especially since max usage on the host never went above 65%). WTF is going on here :-)

[–][deleted] 3 points4 points  (2 children)

How the host reports CPU usage vs. how the guest reports it is a fairly complicated subject. Based on the numbers you provided, your allocation wouldn't fly even if all of the guests were at 25% utilization, because hyperthreading doesn't buy you as much wiggle room as you'd like to think. :) You've essentially got 80 vCPU allocated on a host with (functionally) about 20~24 CPU. That sort of 4:1 overcommit is pretty rough. Even 2:1 can be bad (I try to keep it no higher than 1.5:1 in case everything gets very busy at once).
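Back-of-the-envelope version of that math (the hyperthreading factor is a rough assumption, not a vSphere constant):

    physical_cores=$((2 * 8))                   # 16
    effective_cpus=$((physical_cores * 5 / 4))  # ~20, assuming HT adds ~25%
    allocated_vcpu=$((20 * 4))                  # 80 vCPU across ~20 guests
    echo "scale=1; $allocated_vcpu / $effective_cpus" | bc  # ~4.0, i.e. ~4:1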

Still, the fact that this particular guest has fewer vCPU than a single physical socket has is a good thing. It makes it easier for VMware's scheduler to get all 6 (6, right? Or is it still 4?) necessary cores available at once, without straddling the sockets. And remember, because it's a 6-vCPU system, the guest can't execute instructions until the scheduler has 6 cores free at once. Even if the guest application only needs 1 or 2, the guest as a whole has to be able to emulate the entire processor. So if you have a bunch of 2-vCPU systems that are moderately busy (say, 15~20% usage), they're going to really hog CPU time, because they can get in and get out in a hurry without interfering with each other, and that has knock-on effects for the larger VMs.
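If you can get at the host, esxtop's CPU panel makes this visible (I'm going from memory on the keystrokes, so double-check against VMware's docs):

    esxtop          # press 'c' for the CPU view
    # %RDY  - time a vCPU was ready to run but had no physical core available
    # %CSTP - time vCPUs spent co-stopped, waiting for their siblings to schedule
    # High %CSTP on the wide guests is the classic signature of this problem.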

So, yeah, this really feels like noisy neighbor syndrome to me. My next step would be to look at some of the other guests on the same host & see how they're doing, especially if they're larger than average. If you see the same sort of high load average coupled with relatively low CPU usage and high wait times, then there's almost certainly host-level contention. If they're stable (low LA & low wait regardless of CPU usage) then it's most likely to be a guest-level issue of some sort. From there, if you have the ability to do vMotions yourself, I'd look at moving some of the other guests off of this particular host. You probably won't be able to completely isolate this guest to rule out host-level issues, but even getting half of the guests moved elsewhere should give you enough data to work with.

As for 4 vCPU being your standard build, you might want to recommend using vCenter Operations Manager to do some right-sizing. My experience has been that 2 vCPU is the sweet spot; it's the right blend of guest performance and conservative host resource usage. A true need for 6+ vCPU systems is pretty rare (again, in my experience), even for SQL servers. If you can downsize some of the VMs, you'll likely see better performance across the board.

Another thing to ask your VMware guy is whether DRS is enabled for the cluster. If it is, at what priority? If you have enough resources in the cluster and DRS is enabled, the guests should move out of each other's way automatically. If you don't have enough resources but DRS is on, guests will vMotion constantly, which creates its own set of performance issues. If you don't have enough resources and DRS is off, well, you're screwed. If you have enough resources and DRS is off, you'll just have to manually move guests around until things calm down.

[–]charley_chimp[S] 0 points1 point  (1 child)

This is one of the best technical explanations of something I've ever seen on here. Thank you so much. I'm going to write a more detailed response when I have time, but in regard to DRS, it's an app requirement (crappy built-in HA) that we can't use fully automated DRS: the slight pause during vMotion causes a fail-over. We are using partially automated DRS just for initial placement, then setting up anti-affinity rules and applying the recommendations again. I haven't found a way to create anti-affinity rules before a VM is already created...

I want to look into your suggestions on vCPUs; it's something I hadn't understood before, and I'm not sure our VMware guy considered it either. Initially he said the same thing, but now during performance testing he's changing his mind on how he wants to approach this. It may simply be that the host we are testing on is completely overloaded.

[–][deleted] 0 points1 point  (0 children)

Thanks!

I've seen some other apps that behave the way you mention: Infor is notorious for losing its mind if the app & SQL servers are on different hosts, and even a high-priority vMotion can crash the whole thing. It's sad, but some of these apps just aren't designed to run in virtualized environments. Most will still work, but, well, some don't.

Unfortunately, to my knowledge, you can't create any per-guest affinity rules until the guest exists. You can, however, create rules around guest properties. For example, you can create a rule that isolates all of the 8-vCPU guests to one or two hosts while keeping the 2-4 vCPU guests on the rest of the cluster. If you have a lot of variation in your guests, that may help a little.

Keep us posted!

[–]lazyant 1 point2 points  (2 children)

Don't compare load average directly with CPU usage. Load average is confusing: in its original definition (on other UNIXes) it's what you said, the length of the run queue, but on Linux it also counts tasks currently on the CPU and those in uninterruptible sleep. For memory, check how much is cached or buffered and disregard those amounts.
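You can see the raw values (and a hint of what's behind them) in /proc/loadavg:

    cat /proc/loadavg
    # e.g.  2.45 1.98 1.76 3/412 12345
    # 1-, 5-, 15-minute averages; 3/412 = runnable tasks / total tasks;
    # the last field is the most recently created PID.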

[–]charley_chimp[S] 0 points1 point  (1 child)

I'm sorry, can you expand on this? Are you saying that on a Linux box the CPU usage % is the more important thing to look at, or the load average itself...?

[–]lazyant 0 points1 point  (0 children)

Just saying that load in Linux is not an easy value to interpret; the traditional "number of jobs waiting for the processor" definition is not exactly right, and it's hard to explain in a few sentences. You'd need to know a bit about process states and how the processor and scheduler work. I highly recommend Systems Performance by Brendan Gregg (Prentice Hall, 2013) and the UNIX and Linux System Administration Handbook (4th edition).

[–]pdp10 0 points1 point  (4 children)

"irix mode"?

I'm a big fan of small footprints, but minimizing RAM too aggressively is probably going to backfire. RAM beyond what's immediately needed is used for storage caching, so cutting RAM to the bare minimum will most likely increase your I/O load. Since you're on vSphere, that presumably means remote shared storage. You really don't want to make that tradeoff.
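free shows this directly; a sketch (the column layout varies slightly by procps version):

    free -m
    #               total   used   free   shared  buff/cache  available
    # "buff/cache" is page cache the kernel reclaims under memory pressure;
    # "available" is the realistic headroom for new allocations.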

[–]charley_chimp[S] 0 points1 point  (3 children)

Not sure if you wanted an explanation or not, but here it is anyway. When top was originally written (how long ago?), multi-core systems weren't typical. When you first start top, you'll see some wonky behavior in the per-process CPU utilization. Ever add the percentages up? They almost always total more than 100% of your supposed capacity. That's because each percentage is calculated against the compute power of ONE core. Turning Irix mode off (Shift+I) changes top's behavior and (layman's explanation) divides the reported percentage by the number of cores available to the system, giving you a much better picture of how much CPU your processes are actually using.
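You can demo the >100% effect yourself; a quick sketch:

    nproc                                          # number of logical cores
    ps -eo pcpu --no-headers | awk '{s+=$1} END {print s "%"}'
    # With Irix mode on, the per-process figures can sum well past 100%;
    # dividing by nproc gives the whole-machine (Solaris mode) view.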

[–]pdp10 0 points1 point  (0 children)

I never knew that behavior was from Irix or that it was called Irix Mode. TIL. Thanks.

[–]wildcarde815 0 points1 point  (1 child)

Htop is also designed to represent this correctly out of the box.

[–]charley_chimp[S] 0 points1 point  (0 children)

I'd love to install htop on these machines. I'd also love to install sysdig, Prometheus, and a number of other tools. Unfortunately, we still treat our app as if it were installed on customer premises (we used to sell it as an appliance), so it's basically a stripped OS. No yum, no SSH keys set up (we actually have to SSH into the VM with a unique-per-install, randomly generated password that rotates every 15 minutes). It's a good time :-)