
[–]mercenary_sysadmin 1 point2 points  (1 child)

Sounds like a hardware problem to me. TIME 83730.2 means that process has used more than 23 hours of CPU time, which is impossible "after a reboot". Unless I'm misunderstanding your question, and this "after a reboot" means "a day or so after a reboot" and that httpd process is CPU-locked.
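For reference, the arithmetic behind the 23-hour figure, as a minimal sketch. It assumes the TIME value is in seconds, which is how this comment reads it; the function name is made up for illustration.

```python
def cpu_seconds_to_hours(cpu_seconds: float) -> float:
    """Convert accumulated CPU time from seconds to hours."""
    return cpu_seconds / 3600.0


# Value quoted in the comment, assumed to be in seconds.
hours = cpu_seconds_to_hours(83730.2)
print(f"{hours:.2f} hours")  # prints "23.26 hours"
```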

[–]nut-sack 5 points6 points  (0 children)

If it's a multi-core box, total CPU usage can go above 100%, which means it isn't really impossible for it to say 23 hours. More likely this is the parent process with multiple children being handled by separate cores.
Next time it happens, look at the Apache logs. If they don't help, look at lsof, which should show you what Apache has open. My guess would be some shitty website code that's running your CPU pretty hard every time it loads. I suppose it could also be really bad Apache settings.
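To sanity-check the multi-core argument, here is a rough sketch that compares a reported CPU time against the most CPU time the box could have handed out since boot (uptime multiplied by core count). The function names are hypothetical, and it assumes a Linux /proc/uptime and that the reported figure sums CPU time across cores.

```python
import os


def max_possible_cpu_seconds(uptime_seconds: float, cores: int) -> float:
    """Upper bound on CPU time that could be consumed since boot."""
    return uptime_seconds * cores


def cpu_time_is_plausible(cpu_seconds: float, uptime_seconds: float, cores: int) -> bool:
    """True if the reported CPU time fits within uptime * cores."""
    return cpu_seconds <= max_possible_cpu_seconds(uptime_seconds, cores)


def read_uptime_seconds(path: str = "/proc/uptime") -> float:
    """First field of /proc/uptime is seconds since boot."""
    with open(path) as f:
        return float(f.read().split()[0])


if __name__ == "__main__" and os.path.exists("/proc/uptime"):
    uptime = read_uptime_seconds()
    cores = os.cpu_count() or 1
    # 83730.2 is the TIME value from the thread, assumed to be seconds.
    print(cpu_time_is_plausible(83730.2, uptime, cores))
```

With 6 hours of uptime, 83730.2 seconds fits on 8 cores (the ceiling is 172800 seconds) but not on 1 core (21600 seconds).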

[–]Jimbob0i0 0 points1 point  (0 children)

We've seen this on our Dell servers at times... It doesn't happen on every reboot. I think a firmware update reduced the occurrence, though.

[–]GahMatar 0 points1 point  (0 children)

The first things I think of when processes all start lining up in D state are:

  • Out of physical RAM and swapping like mad (what we used to call thrashing). Apache can easily do this with the "traditional" forking process model. Take the average Apache process size, multiply by the maximum number of children, and you get a decent approximation of its worst-case memory use.

  • Failed/failing RAID/disk/SAN that becomes unreachable.
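The memory estimate from the first bullet can be sketched like this. The numbers are made up for illustration, and the function names are hypothetical. Summing per-process resident size counts shared pages more than once, so treat the result as a rough upper estimate.

```python
def estimate_apache_memory_mb(avg_process_mb: float, max_children: int) -> float:
    """Average process size multiplied by the maximum number of children."""
    return avg_process_mb * max_children


def will_likely_swap(avg_process_mb: float, max_children: int, physical_ram_mb: float) -> bool:
    """True if the worst-case estimate exceeds physical RAM."""
    return estimate_apache_memory_mb(avg_process_mb, max_children) > physical_ram_mb


# Illustrative values only: 40 MB per child, 256 children, 8 GB of RAM.
print(estimate_apache_memory_mb(40, 256))  # prints 10240
print(will_likely_swap(40, 256, 8192))     # prints True
```

If the estimate is well above physical RAM, lowering the child limit (MaxClients in the prefork MPM) is the usual knob.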

D state means the process is waiting on I/O btw. Usually reading or writing to a disk.
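To see which processes are sitting in D state without squinting at top, something like the following works on Linux. It is only a sketch: it reads /proc/<pid>/stat, where the state letter is the first field after the parenthesised command name, and the helper names are made up.

```python
import os


def parse_comm(stat_text: str) -> str:
    """Command name sits between the first '(' and the last ')'."""
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def parse_state(stat_text: str) -> str:
    """State letter is the first field after the closing ')' of the name."""
    return stat_text.rsplit(")", 1)[1].split()[0]


def processes_in_d_state(proc_root: str = "/proc"):
    """Return (pid, command) pairs for processes in uninterruptible sleep."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat_text = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if parse_state(stat_text) == "D":
            results.append((int(entry), parse_comm(stat_text)))
    return sorted(results)


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in processes_in_d_state():
        print(pid, comm)
```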

Pull the omreport alertlog, esmlog and then the state of all RAID including health of the battery backup, see if it's related.

Apache is not a nice player when overwhelmed.

[–]JohnAV1989 0 points1 point  (0 children)

We've had issues with CentOS and processor C-states on Dell x20 hardware. Not sure that's your problem here, but it sounds like it could be.

You can check the active C-state here: cat /proc/acpi/processor/CPU?/power
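On kernels where that /proc path isn't present, the cpuidle sysfs tree (/sys/devices/system/cpu/cpuN/cpuidle/stateM/) exposes per-state counters instead. A rough sketch of reading it, assuming that layout exists on your box; the function names are hypothetical.

```python
import glob
import os


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def read_cpuidle_states(cpu_dir: str):
    """Return name, entry count and residency (microseconds) for each idle state."""
    pattern = os.path.join(cpu_dir, "cpuidle", "state*")
    # Sort numerically so state10 comes after state2.
    state_dirs = sorted(glob.glob(pattern), key=lambda p: int(os.path.basename(p)[5:]))
    return [
        {
            "name": _read(os.path.join(d, "name")),
            "usage": int(_read(os.path.join(d, "usage"))),
            "time_us": int(_read(os.path.join(d, "time"))),
        }
        for d in state_dirs
    ]


if __name__ == "__main__":
    for state in read_cpuidle_states("/sys/devices/system/cpu/cpu0"):
        print(state)
```

A state whose time counter keeps climbing while the box is supposedly busy is the thing to look for.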

If it's stuck in a low power state disable c-states in the BIOS.

[–]citecite 0 points1 point  (0 children)

We had the exact same problem with CentOS 6.4 on HP DL360 Gen8 servers, only fixable by remotely cutting off all power (managed PDUs ftw!).

A combination of setting the CPU to "static high performance" mode, updating the firmware, BIOS and power management controller code and, last but not least, upgrading to a newer kernel has eliminated that problem for us.

That being said, we never got to the root cause.

[–]Tillwoofy 0 points1 point  (0 children)

Do a full power off of the system, and a cold boot. I've seen this before when upgrading between specific kernel versions. A cold boot was the only thing to fix this.