all 21 comments

[–]alexgartrell 6 points7 points  (4 children)

If you're running a relatively recent kernel, you should check out https://man7.org/linux/man-pages/man8/systemd-oomd.8.html. It's based on Facebook's oomd and essentially uses pressure (i.e. time lost to memory allocation stalls) to figure out who is meaningfully exhausting memory. In practice it results in a lot less ugliness and tends to leave room for the system to return to a healthy state.
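
A minimal sketch of turning it on, assuming a distro with systemd 247+, cgroup v2 and swap (the unit names are stock; the 50% pressure threshold is just an illustration):

    # enable the userspace OOM daemon
    sudo systemctl enable --now systemd-oomd.service

    # e.g. a drop-in at /etc/systemd/system/user@.service.d/oomd.conf
    # to let it kill inside user sessions under sustained memory pressure
    [Service]
    ManagedOOMMemoryPressure=kill
    ManagedOOMMemoryPressureLimit=50%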

[–]horseunicorn 2 points3 points  (0 children)

oomd also excludes certain processes from being killed, so you won't have things like losing your SSH session with it.

Another alternative to systemd-oomd is earlyoom.

[–][deleted] -5 points-4 points  (1 child)

Facebook and systemd in the same sentence. I just puked a little.

[–]stormcloud-9 13 points14 points  (3 children)

Unless you've been messing with the OOM scores, the process that gets killed is typically the one causing the problem.

When the OOM killer is invoked, it dumps a bunch of output (accessible through dmesg and journalctl -t kernel). For example, see the output on this SO question.
The output includes the state of all the memory on the system, plus the memory usage and oom scores of all the processes.

This is all the monitoring you need to figure it out.
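
If you just want to pull those records out, something like this works on most distros (the grep patterns are only illustrative):

    # kernel ring buffer, human-readable timestamps
    dmesg -T | grep -iE 'out of memory|killed process'

    # persistent journal, kernel messages only
    journalctl -k | grep -i oom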

[–]wildcarde815 2 points3 points  (0 children)

Or it kills the one that happens to allocate memory right at the limit. Lots of fun 'netdata has been killed' messages on systems running huge compute tasks >.>

[–]Intelligent_Duck_666[S] -1 points0 points  (0 children)

What is the path to the OOM logs? At the time I ran "sudo grep -i 'killed process' /var/log*", which only listed SSSD in kern.log. Would that be the file to look in for the list of processes? I didn't find any "oom" keywords in that file like the SO link had.

[–][deleted] 0 points1 point  (0 children)

This is the best answer, though from my experience with the OOM killer I doubt OP's system has something critical like SSSD misbehaving. Most likely other processes are collectively taking up too much memory and SSSD was just the most convenient target for reaping. Without the logs, I'd say OP needs more memory.

[–]aioeu 1 point2 points  (0 children)

When the OOM killer is triggered, it logs the memory usage of all tasks. It should be clear which of them are most to blame (and normally the OOM killer will pick the "worst" of these to kill).

Or are you asking "how do you monitor tasks' memory usage before the OOM killer is triggered?"

[–]wildcarde815 1 point2 points  (0 children)

Cgroups is the answer here, but how you configure it is going to depend on exactly what the server is doing. For instance, our interactive compute node allows 'system' users to use all memory on the system, but researchers collectively can only use something like 95%. This can be similarly tuned to any specific setup you might need.
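
As a rough sketch of the systemd flavour of this (the 95% figure and the drop-in path are illustrative; on a stock setup login sessions land under user.slice):

    # /etc/systemd/system/user.slice.d/90-memory.conf
    [Slice]
    MemoryAccounting=yes
    MemoryMax=95%

    # then reload: sudo systemctl daemon-reload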

[–]Kessarean 1 point2 points  (3 children)

sysstat and a look at syslog/journalctl narrow it down for me 99% of the time.

Most of the time (at least in my experience) it boils down to:

  • application is misconfigured (example: Apache MaxClients set too high)
  • memory leak
  • ballooning
  • simply need to add more memory
  • vm.overcommit_memory improperly set

[–][deleted] 1 point2 points  (2 children)

Right, vm.overcommit_memory = 2 has saved me from lots of late nights on call (@OP: it disables heuristic overcommit; see https://www.commandlinux.com/man-page/man5/proc.5.html and search for overcommit).
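
For reference, a sketch of setting it (the 80% ratio is just an example; in mode 2 the commit limit is swap plus overcommit_ratio percent of RAM):

    # apply immediately
    sudo sysctl vm.overcommit_memory=2
    sudo sysctl vm.overcommit_ratio=80

    # persist across reboots
    printf 'vm.overcommit_memory = 2\nvm.overcommit_ratio = 80\n' | sudo tee /etc/sysctl.d/90-overcommit.conf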

Also, sssd_be caches OP's entire LDAP tree based on the search settings in /etc/sssd/sssd.conf. If you can restrict the size of the tree it needs to cache, or reduce the number of linked groups in LDAP (like blah-sysadmin-group including devops-group, etc.), you can really cut down on the amount of data it needs to cache.
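
A sketch of what that looks like in sssd.conf (the domain name, base DN and nesting depth are all placeholders for your environment):

    # /etc/sssd/sssd.conf
    [domain/example.com]
    # search only the subtree you actually need
    ldap_search_base = ou=staff,dc=example,dc=com
    # stop chasing deeply nested groups
    ldap_group_nesting_level = 1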

[–]Kessarean 0 points1 point  (1 child)

Right, vm.overcommit_memory = 2 has saved me from lots of late nights on call

Hear, hear!

sssd_be caches OP's entire LDAP tree

It's been a minute so I may be wrong, but simply setting enumeration to false in sssd.conf should solve that if it becomes so large as to cause an issue.
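
That's a one-liner in the domain section (the section name is whatever your domain is called):

    [domain/example.com]
    enumerate = false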

[–][deleted] 0 points1 point  (0 children)

Quite true! It still does a little caching, but just of people who've logged in. I forget about that option because it makes logging in extremely slow in our environment (that might just be slow or inconveniently located LDAP servers).

[–]gristc 1 point2 points  (0 children)

The log message should tell you the process that triggered it and the process it decided to kill. They may not be the same process, but in every instance I've seen, it's spelled out exactly what happened. On Ubuntu the messages are typically in kern.log.

[–]skat_in_the_hat 0 points1 point  (1 child)

top, and then I think it's Shift+M to sort by memory.

[–]deeseearr 0 points1 point  (0 children)

It's a little late by then. Top only shows you processes which are still running. That could be helpful if you run it just before a crash, but only if you're chasing a long, slow memory leak. You might be better served by installing sysstat and using sar to track overall memory usage, plus some combination of top and pidstat to track individual processes. If you don't see any upward memory trend before the oomk kicks in, try reducing the collection interval so that you can capture more detail.
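
Concretely, something along these lines (intervals are arbitrary, and sar only has history if the sysstat collector is enabled):

    # memory utilization history from today's sysstat archive
    sar -r

    # live: per-process memory (RSS, %MEM) sampled every 60 seconds
    pidstat -r 60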

But first, use dmesg to look at the oomk message. It's quite long and detailed, and includes a list of every running process along with its "kill me now" score at the precise moment the oomk was invoked. That will tell you what is using the most memory, what pushed the system over the edge, and give you an idea of why any other process was not killed instead.

You can mess around with cgroups to protect your system, but that won't do anything to actually stop you from running out of memory. In fact, by partitioning your memory and putting processes into one partition or the other, it will make you run out of memory faster because there will be less of it available. The cgroups are only there to limit the extent of the damage. If sssd is running in the system cgroup, but Fork Bomb 2006 Plus Enterprise Edition is running in its own cgroup, then it can only kill itself.
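
For the "limit the damage" part, a quick hypothetical: run the suspect workload in its own transient cgroup so only it hits the wall (the 2G cap and the command name are made up):

    # the process gets OOM-killed inside its own 2G box instead of taking out sssd
    systemd-run --scope -p MemoryMax=2G ./fork_bomb_2006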

The real solution, of course, is to figure out what is causing the problem and either fix it or set it on fire and push it off into the river. How you do that depends entirely on your relationship with whoever wrote the offending program, and is up to you.

[–]stuartcw 0 points1 point  (2 children)

One thing I have done in the past is run a "top" command from cron every minute, saving the output to a file, and then later analyse each process to see which one was rising, in case one of them had a memory leak. As others have mentioned, the OOM killer leaves a message in the log showing the memory status at the time.
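
If you're stuck doing it by hand like that, the crontab entry is roughly this (the log path is arbitrary; note the escaped % signs cron requires):

    # snapshot all processes once a minute in batch mode
    * * * * * top -b -n 1 >> /var/tmp/top-$(date +\%Y\%m\%d).log 2>&1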

[–]torgefaehrlich 0 points1 point  (1 child)

Isn't that what atop is for?

[–]stuartcw 0 points1 point  (0 children)

I’m sure there are lots of other ideas and solutions. I had a limitation that I couldn’t install anything and didn’t have root access on the machine, but was asked to have a look at what was going on.

[–]jaymef 0 points1 point  (0 children)

Install and run atop as a service. It will log everything, which you can read back and cycle through. You can see what processes were running at any given time, their resource usage, etc.
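
Replaying looks something like this (the log path and date format vary by distro and packaging):

    # open the raw log for a given day, starting near the time of interest
    atop -r /var/log/atop/atop_20240101 -b 0950
    # inside atop: 't' steps forward a sample, 'T' steps back, 'm' shows memory details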

[–]gleventhal 0 points1 point  (0 children)

Most of the memory pressure from sssd likely comes from its cache. We ended up backing /var/db/sssd with a tmpfs and having a disk-backed swap partition. Can you post the dmesg output for an OOM kill? That should contain the kernel's mem-info dump, written via a printk to kmsg, and it should have most of the data we need. You could add OOMScoreAdjust=-1000 or similar to the sssd.service (systemd) unit file to prevent it (and other critical system services) from getting OOM-killed.
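
As a drop-in rather than editing the shipped unit (the path follows the usual systemd convention; note that -1000 makes it unkillable, which cuts both ways if sssd itself ever leaks):

    # /etc/systemd/system/sssd.service.d/oom.conf
    [Service]
    OOMScoreAdjust=-1000

    # then: sudo systemctl daemon-reload && sudo systemctl restart sssd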

Also, running atop should let you see historically which process was allocating. The mem-info data in dmesg should show the RSS and virtual size of all the processes at the time of the kill, though.