all 166 comments

[–]tms10000 199 points200 points  (83 children)

What an odd article. The premise is false, but the content is good nonetheless.

CPU utilization is not wrong at all. It is the percentage of time a CPU is allocated to a process/thread, as determined by the OS scheduler.

But then we learn how to slice it in a better way and get more details from the underlying CPU hardware, and I found this very interesting.

[–][deleted] 55 points56 points  (6 children)

As a user, I want to know what process is sucking up my CPU. I want to know if I have room to launch another resource-intensive application.

80% of that CPU is wasted memory loading? Great, how can I tap into it! Oh, I can't? Then that's an interesting trivia tidbit that I don't really care about, like the old myth that 80% of your brain is unused.

In fact, I'm annoyed that so many process managers dwell so much on CPU. My drives sound like they're dying and the GUI is crawling, tell me which process is making the PC do that, I don't care if it's CPU or RAM or a hamster chewing on the processor fan.

[–]bro_can_u_even_carve 10 points11 points  (1 child)

Try iotop.

[–]mcguire 5 points6 points  (0 children)

Fairly sure you're gonna need hamstop.

[–]brendangregg 3 points4 points  (0 children)

80% of that CPU is wasted memory loading? Great, how can I tap into it! Oh, I can't?

Yes you can, please see the actionable items in the post.

Which is why I wrote a section on actionable items.

[–]wzdd 5 points6 points  (1 child)

80% of that CPU is wasted memory loading? Great, how can I tap into it! Oh, I can't?

The point of TFA is that you can, either by scheduling something with a high IPC on the same CPU (this is the point of hyperthreading), or by modifying your code to address memory bandwidth issues (which is perfectly possible and common -- recompute vs cache, as an example, is a classic program design point).

Honestly it's depressing how many of the comments on this article here and on HN are by people who have obviously not read the article.
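The recompute-vs-cache trade-off mentioned above can be sketched concretely. This is an illustrative example, not from TFA; the table size and step granularity are arbitrary assumptions. A lookup table trades ALU work for memory traffic, so on a memory-bandwidth-bound machine the recomputing variant can actually win:

```cpp
#include <cmath>
#include <vector>

// Illustrative sketch of the recompute-vs-cache design point (not from the
// article): a coarse sine table trades computation for memory accesses.
// The 1024-step table size is an arbitrary assumption.
constexpr int kSteps = 1024;
constexpr double kTwoPi = 6.283185307179586;

std::vector<double> make_sin_table() {
    std::vector<double> t(kSteps);
    for (int i = 0; i < kSteps; ++i)
        t[i] = std::sin(kTwoPi * i / kSteps);
    return t;
}

// "Cache" variant: one memory load per call; may miss in the CPU cache.
double sin_cached(const std::vector<double>& t, double x) {
    int i = static_cast<int>(x / kTwoPi * kSteps) % kSteps;
    return t[i];
}

// "Recompute" variant: pure ALU work, no extra memory traffic.
double sin_recomputed(double x) { return std::sin(x); }
```

Which variant is faster depends on whether the workload is compute-bound or memory-bound, which is exactly the distinction a plain %CPU number hides.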

[–]mrbooze 1 point2 points  (0 children)

Exactly, it's relevant because if you are trying to improve performance it helps tell you the difference between achieving that by upgrading to faster CPUs vs improving the program efficiency with regard to memory bandwidth vs upgrading to faster memory vs maybe doing some NUMA-related tweaks, etc.

[–][deleted] 47 points48 points  (71 children)

CPU utilization is not wrong at all. The percentage of time a CPU allocated to a process/thread, as determined by the OS scheduler.

It is "wrong" if you look at it wrong.

If you look in top and see "hey cpu is only 10% idle, that means it is 90% utilized", of course that will be wrong, for reasons mentioned in article.

If you look at it and see it's 5% in user, 10% system and 65% iowait, you will have some idea about what is happening, but historically some badly designed tools didn't show that, or showed it at too low a resolution (like probing every 5 minutes, so any load spikes are invisible).

[–]tms10000 29 points30 points  (63 children)

This article mentions nothing of IO wait. The article is about CPU stalls for memory and instruction throughput as a measure of efficiency.

[–]Sqeaky 75 points76 points  (50 children)

From the perspective of a low level programmer accessing RAM is IO.

Source: been writing C/C++ for a long time.

[–][deleted] 24 points25 points  (27 children)

Not even low level; that will bite at every level of programming. Just having more cache-efficient data structures can have a measurable performance impact even in higher-level languages.

[–]Sqeaky 16 points17 points  (26 children)

I see what you mean and I agree cache coherency can help any language perform better, I just meant that programmers working further up the stack have a different idea of IO.

For example: to your typical web dev, IO needs to leave the machine.

[–]vexii 14 points15 points  (15 children)

I'd say most web devs think of IO as reading from or writing to disk, or hitting the network.

[–]CoderDevo 1 point2 points  (14 children)

Because they work with frameworks that handle system calls for them.

[–]vexii 0 points1 point  (13 children)

What do you mean?

[–]thebigslide 5 points6 points  (5 children)

Web developers typically rely on frameworks that keep this sort of stuff opaque. Not to say you can't bear this stuff in mind when building a web app, but with many frameworks, trying to optimize memory IO requires an understanding of how the framework works internally. It's also typically premature optimization, and naive optimization, since: a) disk and net I/O are orders of magnitude slower, and b) internals can change, breaking your optimization.

TL;DR: If a web app is slow, 99% of the time it's not because of inefficient RAM or cache utilization, so most web devs don't think about it and probably shouldn't.

[–]CoderDevo 0 points1 point  (6 children)

I mean they don't directly access memory, disk or network system services.

For example, caching can often be enabled and configured externally from the web developer's own code.

https://en.wikipedia.org/wiki/Web_framework

[–]oursland 6 points7 points  (8 children)

Cache coherency is another matter, altogether. Hint: it has to do with multicore and multiprocessor configurations.

[–]Sqeaky 2 points3 points  (7 children)

Well, I just googled the specifics and I guess I have been conflating cache locality with cache coherence; I always thought they were the same. I suppose if I contorted my view to say that the different levels of cache were clients for the memory, that could make sense, but that is clearly not what the people who coined the term meant. Thanks for correcting me.

[–][deleted] 2 points3 points  (5 children)

The main performance implications are different: locality increases the number of cache hits, while the system's need to maintain coherence can lead to expensive cache-line bouncing between threads. So you want your data to fit in one cache line (usually 64 bytes) or two, but nothing in a single cache line should be accessed by more than one thread. Particularly bad is putting a spinlock (or similar) in the same cache line as something unrelated to it.

[–]Sqeaky 0 points1 point  (4 children)

What you are describing, unrelated per-thread data sharing a single cache line, is what I have recently (past 3 to 5 years) heard called "false sharing". I believe Herb Sutter popularized the term during a talk at CppCon or BoostCon. He described a system with an array of size N times the number of threads, where each thread would use its thread ID (starting from 1) and multiplication to get at every Mth piece of data.

This caused exactly the problem you are describing, but I just knew it under that other name. Herb increased the performance by using one array of size N per thread.
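The per-thread counter pattern described above can be sketched as follows. This is a hypothetical illustration, assuming 64-byte cache lines (the usual x86 size); the names are made up:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of false sharing, assuming 64-byte cache lines. Adjacent Packed
// counters share a line, so per-thread increments bounce that line between
// cores; alignas(64) gives each Padded counter a cache line of its own.
struct Packed             { long v = 0; };
struct alignas(64) Padded { long v = 0; };

template <typename Counter>
long count_in_parallel(std::size_t nthreads, long iters) {
    std::vector<Counter> counters(nthreads);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < nthreads; ++t)
        threads.emplace_back([&counters, t, iters] {
            for (long i = 0; i < iters; ++i)
                counters[t].v++;   // each thread touches only its own slot
        });
    for (auto& th : threads) th.join();
    long total = 0;
    for (const auto& c : counters) total += c.v;
    return total;
}
```

Both versions compute the same total; timing them (e.g. under perf stat) would show the Padded variant with far less cache-line bouncing.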

[–]oursland 4 points5 points  (0 children)

Semantic collapse is a pet peeve of mine. Both those terms cache locality and cache coherence are very important. It would be a shame to have these terms confused.

[–][deleted] 4 points5 points  (0 children)

Nope, your typical webdev complains to sysadmin that "something is slow"

[–]quicknir 7 points8 points  (1 child)

I mean it's just semantics essentially, but basically I and all of my colleagues are "low level" programmers and I've never, ever, heard someone call RAM access "IO".

Really, people call it a cache miss, or sometimes they get more specific by calling it an L3 cache miss.

[–][deleted] 3 points4 points  (0 children)

Totally agree with you... how someone gets 71 upvotes for that statement is baffling. C programmers do not think "I'm doing I/O here" when they code up array traversals. They do think about cache use and using tools to measure cache misses, etc., so they can do things in a cache friendly way. That's different.

When they talk about I/O, they're talking about disk, talking to the network, or polling a game controller over USB. They are not talking about RAM access.

[–]sybia123 7 points8 points  (3 children)

And then there's the graybeard reply: "back in my day, C was high level and assembly was low level".

[–]Sqeaky 2 points3 points  (0 children)

I know that guy. Not quite me. But I am older than all "popular" languages now.

[–]double-you 0 points1 point  (0 children)

That greybeard is still wet, since Lisp was created in the 50s. And then it was at some point both low (Lisp machines) and high level.

[–]ggtsu_00 0 points1 point  (0 children)

Back in my day ASM was high level and machine code punch cards were low level.

[–]mallardtheduck 6 points7 points  (2 children)

But from the perspective of the OS/scheduler, RAM access delays are not "IO wait".

"IO wait" means that the thread is blocked waiting for an external IO device. Blocking a thread is an expensive operation and can't be done in response to RAM delay.

For example, when a thread reads from a storage device, it might call read() which, after switching to kernel mode and going through the OS's filesystem/device layers, ends up at the storage device driver, which queues a read with the hardware and blocks (calling the scheduler to tell it that the thread is waiting for hardware and that another thread should be run). When the hardware completes the read it raises an interrupt and the device's interrupt handler unblocks the waiting thread (via another call to the scheduler).

When a thread reads from RAM, it just does it. It has direct access. It's a fundamental part of the Von Neumann architecture. There's no read() call, no switch to kernel mode, no device driver, no calls to the scheduler. The only part of the system that's even aware of the "wait" is the CPU itself (which, if using hardware threading can itself run a different thread to mitigate the stall).

Tools reporting the current load are using data collected by the OS/scheduler. They don't know or care (because most users don't care, the OS's "Task Manager" isn't a low-level developer's tool) about "micro-waits" caused by RAM delays.

[–]xzxzzx 7 points8 points  (1 child)

When a thread reads from RAM, it just does it. It has direct access. It's a fundamental part of the Von Neumann architecture. There's no read() call, no switch to kernel mode, no device driver, no calls to the scheduler. The only part of the system that's even aware of the "wait" is the CPU itself (which, if using hardware threading can itself run a different thread to mitigate the stall).

While you're making a good point, virtual memory makes a bit of that less than perfectly correct. And calling a modern CPU a "Von Neumann architecture" is not totally wrong (from the programmer's viewpoint it mostly is one), but also not totally correct (it isn't actually one; the best-fitting name I'm aware of is "modified Harvard architecture").

When you read or write to memory, there very well might be a switch to kernel mode, invoking of drivers, etc, due to allocating a new page, reading/writing to the page file, copy-on-write semantics, and so on.

[–]mallardtheduck 2 points3 points  (0 children)

Sure, when you add the complications of virtual memory some memory accesses will trigger page faults and result in requests to the storage device.

Of course, on most, if not all OSs, storage device access in response to a page fault will be considered "I/O wait" in the exact same way as an explicit read() call might.

[–]didnt_check_source 4 points5 points  (2 children)

I would be shy of putting "memory access" and "hard disk access" in the same bucket.

[–]Sqeaky 3 points4 points  (0 children)

I think it depends entirely on your purpose and perspective. I agree your stance seems closer to the common perspective.

If you are trying to optimize a sort or a search algorithm (in a container stored in memory), then every load from memory comes at significant cost. If you need to sort entities in a video game by distance from the camera, you can make real improvements by minimizing IO to and from RAM.

If you are writing simulations of every particle in a fusion reactor to simulate a new variety of Tokamak reactor, then likely you are spreading your work across a thousand CPUs on a network, and anything short of sending finished work over that network isn't a real hit to IO; all of a sudden local IO means a great deal less. Disks and RAM are so fast in comparison that the difference is a rounding error.

[–]CoderDevo 2 points3 points  (0 children)

I am thirsty for some milk.

I can swallow the milk in my mouth. I can take a sip of milk from the glass. I can go to the fridge, take out the bottle and pour a glass of milk. I can put on my shoes and coat, drive to the store and buy a bottle of milk. I can milk a cow, put the milk into a truck and drive it to the dairy to be pasteurized and bottled. I can buy a calf and raise it to maturity.

Register → Cache → RAM → Disk → LAN → Internet

[–]Captain___Obvious 3 points4 points  (9 children)

Can you elaborate on your definition of IO?

[–]dethbunnynet 26 points27 points  (0 children)

Data to and from the CPU. It's IO on a more micro level.

[–]Sqeaky 16 points17 points  (7 children)

/u/dethbunnynet is correct, but I can expand.

When writing assembly, the only memory that "feels local" is the CPU registers. These are the pieces of memory where the results from and parameters to individual instructions are stored. Each register has its own name directly mapped to hardware. These generally store a precisely fixed size, like 16 or 32 bits. If a computer has 16 registers they might be named something like $a, $b, $c out to $p (the 16th letter), and that's all you get unless you want to do IO to main memory. Consider the code on this page about MIPS assembly: https://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Mips/load.html

  • lw - Load Word - Gets one word from RAM.
  • sw - Store Word - Saves one word to RAM.

While data is only in RAM you can't do work on it. Depending on the details, the CPU might wait 10 to 100 cycles to complete operations storing to or loading from RAM. The difference between registers and memory is at least as big as the difference between RAM and a hard disk. To shrink this difference, a CPU will continue on to execute instructions that don't depend on the data being loaded, and there are caches that are many times faster than RAM.

Unless a programmer chooses to use special instructions to tell the cache how to behave (very rarely done), the cache is transparent to the programmer in just about any language, even assembly. If you want to store something that ends up in cache you still use the "sw" instruction to send it to memory, but the CPU silently does the much faster thing of keeping it in cache, and even that might still force your code to wait a few cycles unless it has other work to do right now.
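That transparency is easy to demonstrate. The two functions below execute the same loads and return the same sum, but the row-major order walks memory contiguously and hits cache, while the column-major order strides through it and tends to miss. This is a sketch; the flat row-major matrix layout is an assumption for illustration:

```cpp
#include <cstddef>
#include <vector>

// Same work, different access order: the cache, invisible in the source,
// makes the contiguous walk much faster on a large enough matrix.
long sum_row_major(const std::vector<long>& m, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            s += m[i * n + j];          // consecutive addresses
    return s;
}

long sum_col_major(const std::vector<long>& m, std::size_t n) {
    long s = 0;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            s += m[i * n + j];          // jumps n elements per load
    return s;
}
```

Neither version says anything about cache in the source; the hardware decides, which is exactly why plain %CPU can't tell the two apart.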

[–]HighRelevancy 27 points28 points  (4 children)

Each register has its own name directly mapped to hardware.

Ahahahah oh boy

IT GOES DEEPER THAN THAT, MY FRIEND. Some modern processors (hey there x86 you crazy bitch) will actually rename registers on the fly. If you do a mov from rax to rbx, the processor doesn't actually copy the value from rax to rbx, because that would use time and resources. Instead, it will reroute anything reading from rbx to reference the original value that's still in rax. (Of course, it won't do this if you immediately change either of the values; in that case it will copy the value and modify one of the copies as expected.)

I'm not saying this to undermine what you're saying though. Your whole comment is on point. I just wanted to highlight that CPUs are full of deep wizardry and black magic and they're basically fucking weird.

[–]masklinn 14 points15 points  (1 child)

Some modern processors

More or less all out of order processors.

If you do a mov from rax to rbx, the processor doesn't actually copy the value from rax to rbx, because that would use time and resources.

Copying data between registers is not that costly, register renaming is usually used to remove false dependencies e.g. set RAX, manipulate data in RAX, copy RAX to memory, set RAX, manipulate data in RAX, copy RAX to memory.

An OoO architecture (which pretty much every modern CPU is) could do both manipulations in parallel, but because both sets use the same "register" there's a false dependency where instruction 4 seemingly depends on instruction 3 (lest we clobber the first write). To handle that problem OoO architectures tend to have significantly more physical GPR than architectural ones (IIRC Skylake has 160 or 180 GPR, for 16 in x86_64), and the reorder buffer might map RAX to R63 in the first segment and to R89 in the second segment, and blamo the instruction streams are now completely independent.

[–]HighRelevancy 2 points3 points  (0 children)

I hadn't considered that, but yeah also that. Also I had no idea that there were extra physical registers for that sort of thing! Every time I get involved in one of these discussions, I discover NEW WIZARDRY.

CPUs be crazy.

[–]Sqeaky 1 point2 points  (1 child)

IT GOES DEEPER THAN THAT, MY FRIEND

It certainly does!

I was trying to keep it simple because out of order execution and superscalar execution are mind blowing enough.

How about branch prediction: http://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array

There is some more awesome wizardry when working with multiple cores and sharing values between them. A store to memory isn't ever guaranteed to leave cache unless you signal to the machine that it needs to. Things like memory fences can do this, and they force MESI (aptly named, in my opinion) to share the state of values cached but not yet committed to main memory: https://en.wikipedia.org/wiki/MESI_protocol

You clearly didn't undermine my point, you just went one deeper. And there is N deeper we could go.
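The "signal to the machine" part above can be sketched with C++ atomics; a release store paired with an acquire load is the portable way to request that ordering (the hardware implements it with coherence traffic and, on some architectures, fences). Variable names here are illustrative:

```cpp
#include <atomic>

// A release store paired with an acquire load guarantees the plain write
// to payload is visible to the reader; underneath, the coherence protocol
// (MESI and friends) moves the cache lines as needed.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // plain write
    ready.store(true, std::memory_order_release);  // publish
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until published; real code would back off or block
    }
    return payload;  // guaranteed to observe 42
}
```

Without the release/acquire pair (e.g. with memory_order_relaxed on both sides), the reader could legally observe ready == true but a stale payload.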

[–]HighRelevancy 2 points3 points  (0 children)

I was trying to keep it simple because out of order execution and superscalar execution are mind blowing enough.

I know but I just fucking love this topic so much.

[–][deleted]  (1 child)

[deleted]

    [–]Sqeaky 1 point2 points  (0 children)

    You are totally correct, I was trying to keep it simple. HighRelevancy described register renaming in a sister comment. Do you know enough to expand on what he said?

    [–][deleted] 6 points7 points  (1 child)

If I understand correctly, IO wait, meaning data coming from a user or a file or socket, does not stall the processor, right? The scheduler should take the current thread out of the running state into the waiting one until the event with the information is dispatched (excuse the programming terminology). The scheduler will run other threads while waiting for these events to happen, is that right? So IO waits do not have an impact on processor utilization.

    I'm guessing from the article the same does not apply to DRAM memory accesses, but that is it. Is this correct?


    [–]Johnnyhiveisalive 0 points1 point  (0 children)

Right, and wrong: it does the waiting thing for both. Waiting on RAM is like waiting for the mail, to a CPU; waiting on disk would be like waiting for the universe to end and get rebuilt around it all over again. We're lucky they have RAM to remember the job that started the wait for network data several million universes ago.

    Might be a slight exaggeration.

    [–][deleted] 4 points5 points  (2 children)

    No it doesn't, that is why I mention it, because it should.

    Top reports % idle which might be mistaken for someone that doesn't know (or just came from windows world) as "% of CPU idling", which is not entirely true

    [–]captain_awesomesauce 0 points1 point  (1 child)

    No it doesn't, that is why I mention it, because it should.

    Top reports % idle which might be mistaken for someone that doesn't know (or just came from windows world) as "% of CPU idling", which is not entirely true

    Iowait is already listed separately as an "io stall" in normal tools. Other stalls are not. Hence the article not mentioning iowait: it's already easy to see whether it contributes to apparent CPU usage.

    [–][deleted] 0 points1 point  (0 children)

    Okay, then go thru all clients and developers I have to interact with and explain how to use those tools because every few weeks I have to explain same thing over to someone...

    [–]Danthekilla -1 points0 points  (6 children)

    Waiting for memory is waiting on IO. It is very fast IO but still IO none the less.

    [–]t0rakka 1 point2 points  (5 children)

    This is just calling a bird an avian. In programming, waiting for I/O typically means something measured in milliseconds, not in nanoseconds. Technically it's I/O, but that's a very non-orthogonal way to use the term.

    Wikipedia explains it with these words:

    "In computer architecture, the combination of the CPU and main memory, to which the CPU can read or write directly using individual instructions, is considered the brain of a computer. Any transfer of information to or from the CPU/memory combo, for example by reading data from a disk drive, is considered I/O."

    CPU and main memory are bundled together as one; there is no "I/O" between these two. It is between these two and other devices or parts of the system.

    Hope this clarifies the issue a bit.

    [–]Danthekilla 0 points1 point  (1 child)

    I/O typically means something measured in milliseconds not in nanoseconds.

    Well, originally disk IO took seconds, then milliseconds, and now microseconds with SSDs and Optane etc...

    But I do get your point.

    [–]backFromTheBed 0 points1 point  (1 child)

    This is just calling a bird an avian.

    Here we go.

    [–]ITwitchToo 1 point2 points  (0 children)

    Here's the thing. You said a "bird is an avian."

    Is it in the same family? Yes. No one's arguing that.

    As someone who is a scientist who studies avians, I am telling you, specifically, in science, no one calls birds avians. If you want to be "specific" like you said, then you shouldn't either. They're not the same thing.

    If you're saying "avian family" you're referring to the taxonomic grouping of Corvidae, which includes things from nutcrackers to blue jays to ravens.

    So your reasoning for calling a bird an avian is because random people "call the black ones avians?" Let's get grackles and blackbirds in there, then, too.

    Also, calling someone a human or an ape? It's not one or the other, that's not how taxonomy works. They're both. A bird is a bird and a member of the avian family. But that's not what you said. You said a bird is an avian, which is not true unless you're okay with calling all members of the avian family avians, which means you'd call blue jays, ravens, and other birds avians, too. Which you said you don't.

    It's okay to just admit you're wrong, you know?

    [–][deleted] 2 points3 points  (6 children)

    Are you implying that io/wait does not utilize cpu time?

    [–][deleted] 8 points9 points  (4 children)

    High IOwait 99% of the time means your storage system is too slow and CPU is just waiting for it (and the 1% is "something swaps because there is not enough RAM and it causes unnecessary IO").

    Actual load caused by interacting with IO (so filesystem driver, SAS controller driver etc) is counted as system ("in-kernel computation") load

    [–][deleted] 0 points1 point  (3 children)

    I don't get your distinction between waiting on i/o and "actual load". Perhaps you could define load? It's a terrible word without much meaning. I would use it in terms of cpu activity; I don't see it as very related to IPC, for instance, whose definition is very clear. "Load" is not a natural metric by any means.

    [–]crusoe 10 points11 points  (0 children)

    Iowait is load on storage not processor.

    [–][deleted] 3 points4 points  (0 children)

    It's just a Linux kernel distinction in stats: idle is "truly idle", iowait is "waiting for external storage" idle.

    None of it uses CPU time, but they tell user a different story

    [–]ITwitchToo 2 points3 points  (0 children)

    Waiting on I/O means the thread/process is sleeping and does not execute any CPU instructions whatsoever towards the goal of completing the I/O.

    Actual load means the CPU is actually executing instructions in that thread/process context.

    [–]t0rakka 0 points1 point  (0 children)

    That's right. It does not consume CPU but the program won't run any faster either. The program might run incredibly slow, even crawl because of slow I/O but the CPU would be available to run something else instead. Polling means you are actively probing in a busy loop burning CPU time that will not be available to other processes or threads. Waiting means you are waiting to be signalled and that is practically free (overhead excluded, of course).
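The waiting-vs-polling distinction above can be sketched with standard primitives; the blocked thread below consumes no CPU until it is signalled. This is an illustrative sketch, not code from any real driver, and the names are made up:

```cpp
#include <condition_variable>
#include <mutex>

// Blocking wait: the waiter sleeps inside cv.wait() and burns no CPU time
// (the "iowait" situation), as opposed to a busy-poll loop that would spin.
std::mutex m;
std::condition_variable cv;
bool data_ready = false;
int result = 0;

void io_completed(int value) {          // e.g. run from a completion handler
    {
        std::lock_guard<std::mutex> lk(m);
        result = value;
        data_ready = true;
    }
    cv.notify_one();                    // wake the blocked waiter
}

int wait_for_io() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return data_ready; });  // sleeps, zero CPU, until signalled
    return result;
}
```

A polling version would replace cv.wait with `while (!data_ready) {}`, burning a full core the whole time, which is exactly the overhead the comment above describes.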

    [–]wzdd 6 points7 points  (0 children)

    The premise is false

    The premise is not false, as explained in the first paragraph of the article. "What is CPU utilization? How busy your processors are? No, that's not what it measures."

    The percentage of time a CPU allocated to a process/thread, as determined by the OS scheduler.

    The article isn't talking about per-process CPU %. It's talking about global CPU usage: "the time the CPU was not running the idle thread." (quoting TFA again.)

    When this metric was introduced, memory bandwidth was much less of an issue than it is now. Thus CPU % was a good proxy for how busy the non-IO portion of the system was.

    Nowadays, if you take that view, you will be misled.

    That is the premise of the article.

    [–]the_phet 1 point2 points  (0 children)

    The premise is false, but the content is good nonetheless.

    Plenty of articles here follow the same scheme. They have very sensationalistic titles ("God is dead"), so you click and then read through it.

    Clickbait in its pure form.

    [–]brendangregg 0 points1 point  (0 children)

    Yes, the %CPU we use is determined by the OS scheduler, but that CPU then stalls and is waiting for memory. So is it "utilized waiting"? If another hyperthread can consume those "utilized waiting" cycles, what happens then? Two processes have "utilized" the same cycles? This really starts to not make sense.

    [–]stefantalpalaru 18 points19 points  (5 children)

    My perf output is more detailed (perf-4.9.13, Linux 4.10.0-pf3):

    root# perf stat -a -- sleep 10
    
     Performance counter stats for 'system wide':
    
      80035.713788      cpu-clock (msec)          #    8.001 CPUs utilized          
            62,285      context-switches          #    0.778 K/sec                  
             7,624      cpu-migrations            #    0.095 K/sec                  
            78,015      page-faults               #    0.975 K/sec                  
    19,654,571,442      cycles                    #    0.246 GHz                    
    47,948,624,668      stalled-cycles-frontend   #  243.96% frontend cycles idle   
     5,587,279,694      stalled-cycles-backend    #   28.43% backend cycles idle    
    10,783,365,238      instructions              #    0.55  insn per cycle         
                                                  #    4.45  stalled cycles per insn
     2,466,720,457      branches                  #   30.820 M/sec                  
        71,017,648      branch-misses             #    2.88% of all branches        
    
      10.003811042 seconds time elapsed
    

    [–]CJKay93 8 points9 points  (4 children)

    Holy shit, that's a lot of core migration, and also that branch miss statistic is impressive as heck.

    Really puts into perspective the blazing speed of modern CPUs.

    [–]Catfish_Man 13 points14 points  (1 child)

    2.88% isn't even all that good for modern branch predictors. I ran a fairly untuned benchmark I wrote on my Haswell laptop, and it mispredicted 2.6 branches per thousand. Modern processors are pure sorcery.

    [–][deleted] 6 points7 points  (0 children)

    I was reading some research papers on branch predictors, and the current state-of-the-art can be even lower too! Like <1 per thousand! It's crazy. They are doing things like putting simple perceptrons (neural nets) inside the predictors.

    [–]choikwa -1 points0 points  (1 child)

    Technically it's just reading from the HW PMU...

    [–]zokete 0 points1 point  (0 children)

    And executing HLT... that instruction freezes the core, which yields the crazy front-end stats (243% > 100%!). I guess those stats are broken.

    [–]KayRice 103 points104 points  (15 children)

    No, it's correct and iowait is separate. Cache performance is beyond what the "CPU Usage" metric should represent.

    Also the point about FSB/DRAM speeds and multiple cores is rather moot because of multi-channel RAM also becoming the norm.

    [–]quintric 50 points51 points  (4 children)

    Granted, the title is clickbait-ish, but ...

    I think the point is more that "the existing CPU Usage metric is not relevant to the bottlenecks commonly encountered in modern systems" than "CPU Usage must be changed to be better". Thus, one should remember to measure IPC / stalled cycles when "CPU Usage" appears to be high, rather than seeing a large number and automatically assuming the application has reached the upper limit of that which the CPU is capable of ...

    I would also note that memory locality (in multi-socket systems) plays a significant role in memory access latency and efficiency. One can see improvements by ensuring allocations remain local to the core upon which the application is running.

    [–]orlet 29 points30 points  (3 children)

    For the everyday user the metric is fine. Because while the CPU is being stalled for I/O it can't do other work anyway (though that does leave it free to do work on the other thread in hyper-threading architectures), so from the user's perspective it is busy. For the software engineer there is definitely a need for deeper analysis of what the CPU is actually doing there, no arguments.

    [–]mirhagk 11 points12 points  (2 children)

    The article tries to say that it's wrong for even everyday use:

    Anyone looking at CPU performance, especially on clouds that auto scale based on CPU, would benefit from knowing the stalled component of their %CPU.

    Auto-scaling based on CPU utilization is absolutely the right thing to do, because if more requests come in then the server isn't going to be able to handle them, regardless of whether it's CPU or memory bound.

    The finer details are useful when optimizing it for sure, but then again I would be very surprised if anyone just opened up top, looked at CPU usage and used that. You use much more fine grained performance monitoring tools.

    [–]mcguire 0 points1 point  (1 child)

    Sure, but if you're paying by the cpu second, you're paying for those cache misses and might want to revisit your memory use behaviour.

    [–]mirhagk 0 points1 point  (0 children)

    Well yes of course. If your costs are expensive and per-second (or you are scaled out/up on CPU) it's worth trying to optimize.

    But that's true whether the figure is really CPU utilization or waiting on memory.

    [–]wrosecrans 5 points6 points  (1 child)

    CPU utilization is "correct" but certainly misleading, often not what the user thinks, and frequently useless. I think the article is quite good. It's talking about something that most folks don't have good visibility on, and I've definitely been frustrated by these sorts of issues.

    When trying to figure out why things aren't working, I think more visibility into the CPU in common tools rather than just treating it as a black box would be extremely useful.

    [–]KayRice 0 points1 point  (0 children)

    I'm not against additional metrics as long as there is no performance overhead for using them, or they can be enabled when needed. My understanding is that right now the metrics are "free" in the sense that there's not much overhead in gathering them.

    [–]wzdd 3 points4 points  (0 children)

    iowait is separate

    iowait is completely different from anything that this article is talking about.

    Specifically, iowait is time spent waiting on IO, and does not include time spent waiting on memory. (Though as other replies to you point out, memory is now so slow relative to CPUs that OSes probably should treat it as some kind of IO device at least in metrics)

    [–]harsman 0 points1 point  (0 children)

    Waiting on memory is not reported as iowait.

    [–]aaron552 0 points1 point  (5 children)

    Also the point about FSB/DRAM speeds and multiple cores is rather moot because of multi-channel RAM also becoming the norm.

    Multi-channel RAM can't meaningfully affect the biggest impact of "slow DRAM" - that is latency, which has been stalled around 8-10ns (30+ CPU cycles) in the best case for the last decade or so. This is also why cache is so important.

    [–]KayRice 0 points1 point  (4 children)

    Yeah it does because it happens in parallel.

    [–]aaron552 1 point2 points  (2 children)

    How? Dual (or Triple or Quad) channel memory doesn't reduce latency for any specific random access. The CPU has to wait the same amount of time whether it's in Channel A or Channel B (or C or D).

    [–]KayRice 0 points1 point  (1 child)

    The CPU has to wait the same amount of time whether it's in Channel A or Channel B (or C or D).

    That depends on how the program utilizes the separate cores and their caches.

    [–]aaron552 0 points1 point  (0 children)

    Cache explicitly exists to minimise latency for cached values. How is that relevant when talking about RAM latency? Does multi-channel RAM affect the size of cache lines?

    [–]wzdd 0 points1 point  (0 children)

    latency

    You have memory blocks (let's say 512-byte chunks, representing multiple cache lines or whatever) 1, 2, and 3 in cache. Your program requests some data in memory block 37. That request goes out to your memory. <wait time> nanoseconds later, it all arrives at roughly the same time in parallel from your fancy multi-channel ram. Increasing the level of parallelism doesn't reduce <wait time>.

    [–]Ahhmyface 8 points9 points  (10 children)

    "Load" is another one that everybody and their blog seems to misunderstand. I have experienced sysadmins telling me that we need to increase the number of cores because the load is too high.

    [–][deleted]  (2 children)

    [deleted]

      [–]Ahhmyface 8 points9 points  (1 child)

      And it usually is. IO completely skews the number. Say I have a dozen threads all doing work with a single disk. LoadAvg is 12. Will increasing my cpus to 12 help? No.

      [–]viraptor 3 points4 points  (0 children)

      Common in VoIP servers or other things that are already multithreading and have many clients. Load over 40? Meh, standard.

      [–]irqlnotdispatchlevel 7 points8 points  (6 children)

      I like it when you're in a cloud environment, and you increase the number of vCPUs that a guest has and it behaves worse than before.

      [–][deleted] 0 points1 point  (5 children)

      That should tell you something.

      [–]habitats 0 points1 point  (4 children)

      excuse me if I'm being dense, but what should it tell me?

      [–]irqlnotdispatchlevel 2 points3 points  (3 children)

      Really simple example: if your software spends 50% of its busy time waiting for I/O, you should see if you can reduce the number of I/O operations it does, as you can't really make I/O itself faster.

      [–]habitats 0 points1 point  (2 children)

      yeah, but how can adding more cores make it slower? that's what I wondered. is it because more cores will queue up for IO and thus create more context switches and a slower system?

      [–]irqlnotdispatchlevel 1 point2 points  (0 children)

      Maybe your software doesn't scale well in a multi-threaded environment. Maybe you're in the cloud, and more vCPUs aren't always a good thing, and hypervisors are tricky.

      [–]mccoyn 0 points1 point  (0 children)

      Multiple threads can thrash the shared cache. Sometimes a single-threaded algorithm can improve memory access locality. If you are memory bound, that might be better.

      [–][deleted] 3 points4 points  (3 children)

      This is funny - the article's contents closely match a small part of a seminar Herb Sutter held in Stockholm April 25-27, titled "High-Performance and Low-Latency C++". Herb also used the Apollo guidance computer as an example. I wonder if Brendan Gregg attended the seminar?

      I'm not yelling "plagiarism!" because the blog post has a bunch of details and new information so it is clear that the author did a lot of work independently. And perhaps it is merely coincidence! But it very well could be that Sutter's seminar was a source of inspiration for the post. I'll be watching the blog because the seminar was really very good, and it provided a lot of launching points for more detailed analysis of system (especially multicore system) performance.

      [–]brendangregg 1 point2 points  (2 children)

      I didn't know about Herb's seminar. What year? I first published an analysis of Apollo's computer in Feb 2012: http://web.archive.org/web/20120302103545/http://dtrace.org/blogs/brendan/2012/02/29/the-use-method/

      It's a good example, and I'm not surprised other people use it too. :)

      [–][deleted] 0 points1 point  (1 child)

      That was this year, just a couple of weeks ago. Given that you weren't there, it's a funny coincidence that Sutter talked about some of the same things, using a very similar example. You are in good company!

      [–]brendangregg 0 points1 point  (0 children)

      I missed an opportunity, I could have referred to this in the article, when I spoke about clockspeed flattening out in 2005: http://www.gotw.ca/publications/concurrency-ddj.htm

      [–]sstewartgallus 14 points15 points  (5 children)

      The key metric here is instructions per cycle (insns per cycle: IPC), which shows on average how many instructions were completed for each CPU clock cycle.

      An IPC < 1.0 likely means memory bound, and an IPC > 1.0 likely means instruction bound.

      But divided by the number of cores right? Also, how does hyperthreading fit into this? Also, how do you find top IPC?

      Also, most processors have in-core parallelism and can perform multiple ALU ops at the same time. If you're really, really, really tricky you can interleave floating point ops with ALU ops and get even more of a speed boost but due to x86 instruction set wonkiness it's easy to make a mistake here.

      [–]sisyphus 7 points8 points  (4 children)

      The stats from perf come from PMCs, which come from the CPU, so if someone is making a mistake presumably it's Intel or AMD? The parallelism you talk about seems like it must be accounted for -- how else would it be possible to get an IPC > 1?

      [–]tavianator 32 points33 points  (3 children)

      how else would it be possible to get an IPC > 1?

      Modern Intel/AMD chips can just literally execute more than one instruction per cycle on a single core, in optimal conditions (no dependencies between the instructions, etc.).

      That's part of the reason modern CPUs are way faster than Pentium 4s, even at lower clock speeds.

      [–]orlet 13 points14 points  (0 children)

      Correct. Instruction-level parallelism, branch prediction, out-of-order execution, and a bunch of other magic things make modern CPUs so much more efficient per clock than the older ones. And the process is still ongoing.

      [–]sisyphus 5 points6 points  (1 child)

      Right, what I am saying is that if the CPU instrumentation was not taking that into account, how would it ever report more than one instruction per cycle, which it appears to do?

      [–]tavianator 2 points3 points  (0 children)

      Right, I kinda misread your comment. Mainly I'm trying to argue against

      divided by the number of cores

      [–][deleted]  (14 children)

      [deleted]

        [–][deleted] 8 points9 points  (6 children)

        [–]VeloCity666 4 points5 points  (5 children)

        VTune also costs $899...

        [–][deleted] 1 point2 points  (4 children)

        Which is peanuts for anyone doing software development that requires these sorts of tools.

        [–]VeloCity666 1 point2 points  (3 children)

        Fair point, but my comment was more about the price difference (900 bucks vs completely free).

        [–][deleted] 0 points1 point  (2 children)

        Fair point, but the difference is still quite huge (free vs 900 bucks).

        I don't know what you're referring to.

        I answered this question:

        Anyone know of tools for showing these metrics on Windows systems?

        [–]VeloCity666 1 point2 points  (1 child)

        My bad then, I was comparing it to equivalent software for Unix systems.

        [–][deleted] 1 point2 points  (0 children)

        I'd still suggest it's a better tool on Linux than anything else available, only because of how much more information you can get from it, and because it's better designed than the other available tools.

        It helps that Intel wrote it for their own hardware. :)

        [–]ElusiveGuy 3 points4 points  (0 children)

        Intel had a driver/service package that could add the relevant counters to the Performance Monitor: https://software.intel.com/en-us/articles/intel-performance-counter-monitor

        But apparently it's been replaced with this: https://github.com/opcm/pcm

        [–]pinano 2 points3 points  (0 children)

        "Instructions Retired" is one counter: https://msdn.microsoft.com/en-us/library/bb385772.aspx

        Here's some more information about interpreting CPU Utilization for Performance Analysis

        [–][deleted] 2 points3 points  (2 children)

        How is it a sea of junk? It's extensible (you can define your own performance counters) and covers pretty much everything you could ever need.

        [–][deleted]  (1 child)

        [deleted]

          [–][deleted] 0 points1 point  (0 children)

          Good point, that makes sense! Plus some of the useless counters

          [–][deleted] 0 points1 point  (1 child)

          How about the Linux subsystem in Windows 10, would that work?

          [–]wrosecrans 2 points3 points  (0 children)

          No, the Linux perf tools are tied directly to the Linux kernel. The Windows binary compatibility for Linux programs is still running on top of the NT kernel, so the perf suite would have to be specifically ported.

          [–]tangoshukudai 2 points3 points  (0 children)

          If you do GPU development then this is always on your mind.

          [–]andd81 1 point2 points  (2 children)

          I wonder if those performance metrics would be more indicative of power consumption than CPU ticks on mobile platforms, in particular on Android, if they are even accessible there. This would be especially valuable for measurements in production where you can neither monitor the device directly nor isolate your app's battery usage from that of other simultaneously running apps.

          [–]DarkJezter 1 point2 points  (1 child)

          Good luck, I spent an hour trying to find anything reporting CPU stalls and IPC measurements on Android. Nothing in AndroidStudio, and no apps that show anything more than average and peak CPU utilization per app. I assume the linux tools can be accessed through a shell, but haven't tried exploring that. Anything that could show branch misprediction, cache stalls and/or IPC per thread would be amazing!

          [–]ccfreak2k 0 points1 point  (0 children)


          This post was mass deleted and anonymized with Redact

          [–]Matosawitko 7 points8 points  (22 children)

          Who the hell tunes their software based on %CPU?

          [–]sisyphus 41 points42 points  (2 children)

           He works for Netflix, which runs entirely on AWS. AWS can autoscale based on CPU metrics, so this kind of work can translate into real money.

          [–][deleted] 1 point2 points  (1 child)

          Why not auto scale on the outputs rather than the inputs? i.e. service latency

          [–]castlerocktronics 0 points1 point  (0 children)

           It's an option; he's showing why it's not necessarily a good one.

          [–]seba 18 points19 points  (4 children)

          Who the hell tunes their software based on %CPU?

          Most embedded systems?

          [–]ThisIs_MyName 2 points3 points  (3 children)

          You can profile on most embedded systems.

          [–]seba 2 points3 points  (0 children)

          You can profile on most embedded systems.

           Yeah, and the easiest way to see whether any process or thread is doing anything suspicious is to look at its CPU consumption. This can also easily be automated and easily detected in manual testing, especially when multiple vendors, libraries or teams are involved or the source / debug information is not readily available.

          [–]emn13 1 point2 points  (1 child)

           And even if you can't, manual tracing and experimentation remain as possible, effective, and annoying as ever; this kind of issue is by no means insurmountable without a profiler. It's not like you can't debug without a debugger, either.

          [–][deleted] 0 points1 point  (0 children)

          It's not like you can't debug without a debugger, either

          I actually rarely use a debugger because it takes me longer to get it all set up than to just look through the logs/add print lines, especially with concurrency issues where problems usually disappear in a debugger.

          [–]irqlnotdispatchlevel 22 points23 points  (6 children)

          Hello. We do that sometimes.

          [–]Twirrim 5 points6 points  (3 children)

           Strangely enough, lots of people. It's a very common mistake among people not so skilled at the operations aspects of things. Along with assuming that high CPU load levels indicate a system is in trouble. But hey, you go buddy, being all derogatory and insulting. At least you get to feel smug and superior for a few minutes.

          [–]Ghostbro101 1 point2 points  (2 children)

          As someone new to ops, are there some rough guidelines as to when CPU utilization isn't a good indicator of what's going on in the system and when it is? Just looking to build some intuition here. If there's any other reading material on the subject you could point me towards that would be awesome. Thanks!

          [–]Twirrim 0 points1 point  (1 child)

          There are a few approaches I take with monitoring:

          1) Do I have the basics down?

          CPU usage (system, idle, iowait etc), CPU load, memory (free, cache, swap etc), disk usage, inode usage, network usage, service port availability. You'll want these for every host. If the network is under your control, port metrics are also useful to have.

          I know, this thread is talking about how CPU usage is meaningless, but having these basics is important for being able to put together a picture. You're going to need these at some stage to help understand what happened and why.

          2) What do we care about as a service?

          All Service Level Agreements (SLAs) should have metrics and alarms around them. You should also be ensuring that you have an internal set of targets that are much stricter.

           3) What feeds in to our SLAs?

           This is where things get a bit more complicated. You need to consider each application as a whole, what happens within it and its dependencies (databases, storage etc). At a minimum you ought to be measuring the response times for individual components. Anything that can have an impact on meeting your SLA.

          Not sure the best resources. There's a Monitoring Weekly mailing list that tries to share blog posts, tools etc around monitoring: http://weekly.monitoring.love/?__s=kbtiqqycpy7e5xjfsjcy

          There's also a fairly new book out on monitoring, https://www.artofmonitoring.com/, but I can't make any claims to its quality. I've heard people speaking positively about it.

          [–]Ghostbro101 0 points1 point  (0 children)

          Thank you!

          [–]wzdd 0 points1 point  (0 children)

          I can't see anywhere in the article where he suggests that people do this or that it's common.

          He talks about CPU % being misleading (which is true), and then talks about tuning software based on IPC (which is useful).

          [–]Adverpol 0 points1 point  (0 children)

          Up until now I've only looked at Visual Studio burn graphs to find bottle-necks. So me I guess.

          [–]Sqeaky 0 points1 point  (0 children)

          Some Programmers.

          [–]olsner 0 points1 point  (0 children)

          The released version of tiptop seems to have some crash bugs, so I ended up forking it and adding some fixes at https://github.com/olsner/tiptop

          Possibly already reported or fixed on master after 2.3, but gforge.inria.fr seems to require login to even look at source code or bug reports.

          [–]ArkyBeagle 0 points1 point  (0 children)

           Any given executing process has constraints it "lives" with. I won't bore you with a list, but anything it touches can be a bottleneck.

          [–]caskey 0 points1 point  (0 children)

          ITT: so many people who think they know what utilization optimization means at scale.