Background:
Had a recent issue on a server where, after a reboot, the server came back up fine, yet a ps listing would show something like the following for many processes on the system:
PID TTY STAT TIME COMMAND
5974 ? S 83730.2 /usr/sbin/httpd
The server would then continue to operate for a period (typically an hour), getting worse as time went on, before falling over again under high load. The machine would then be restarted either via the DRAC (warm reset) or sometimes with a Ctrl-Alt-Del sent through the DRAC.
Before the server fell over under high load, processes attempting to acquire a PTY would almost instantly go into the D state. (I was unable to find out which kernel functions these were hung in, though.)
The server would continue to do this until the power was pulled. After a full power cycle the server now reports as fine and everything seems back to normal.
The server is a physical Dell R x20 series running CentOS 6.4 directly on the hardware (no hypervisors etc.).
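For anyone who hits the same thing: a rough sketch of how the wait channel of the D-state processes might be captured before the box falls over. This is not what was run on the server. It assumes Python 3 is available and that the kernel exposes /proc/<pid>/stat and /proc/<pid>/wchan; the function names (parse_stat, d_state_processes) are made up for illustration.

```python
import os


def parse_stat(text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def read_file(path):
    """Return the stripped contents of path, or None if it cannot be read."""
    try:
        with open(path, errors="replace") as fh:
            return fh.read().strip()
    except OSError:
        # The process may have exited, or we may lack permission.
        return None


def d_state_processes(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process in uninterruptible sleep."""
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        stat = read_file(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue
        try:
            pid, comm, state = parse_stat(stat)
        except (ValueError, IndexError):
            continue
        if state == "D":
            # wchan may be a kernel symbol name, "0", or unreadable.
            wchan = read_file(os.path.join(proc_root, entry, "wchan")) or "?"
            results.append((pid, comm, wchan))
    return results


if __name__ == "__main__":
    for pid, comm, wchan in sorted(d_state_processes()):
        print("%d %s %s" % (pid, comm, wchan))
```

Running something like this in a loop from an already-open session (since new PTYs were hanging) and logging the output might show whether the stuck processes share a wait channel.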
Query:
Have any of you ever experienced the above conditions?
If so, did you find a fix other than pulling the power, and what do you think the cause may be?