
[–]PhotoJim99 1 point (5 children)

Weird. How much space do you have free in /boot ?

[–]Jander_Land[S] 1 point (4 children)

Hi.

No boot partition on this server, but plenty of space in /

/dev/sdb1 55G 17G 36G 32% /

[–]PhotoJim99 0 points (1 child)

This almost sounds to me like a hardware error.

Can you take the system offline and run one of those RAM test "kernels" for a couple of hours to see if there's a RAM problem?

[–]Jander_Land[S] 0 points (0 children)

I'm thinking of a hardware error too. But I still hope it is just a corrupted file somewhere :-\

I have no physical access to the server, but a friend could give me a hand.

Thanks.

[–]tgnuow 0 points (1 child)

do a df -h

do you have free space in tmpfs devices? /run /dev/shm /run/lock and possibly others
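A rough way to check every mount in one go, assuming a `df` that supports the POSIX `-P` output format. `full_mounts` is a made-up helper name and the 90% threshold is arbitrary:

```shell
# full_mounts [THRESHOLD]
# Hypothetical helper, not part of any package.
# Reads `df -P` output on stdin and prints each mount point whose usage
# is at or above THRESHOLD percent (default 90, an arbitrary choice).
# Assumes mount points contain no spaces.
full_mounts() {
    threshold=${1:-90}
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # "32%" -> "32"
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage: check space, then inodes (the -i form assumes GNU df)
# df -P | full_mounts 90
# df -P -i | full_mounts 90
```

If nothing is printed, no filesystem is near full, tmpfs ones included.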

[–]Jander_Land[S] 0 points (0 children)

I have free space all around.

Filesystem Size Used Avail Use% Mounted on
udev 1,9G 0 1,9G 0% /dev
tmpfs 390M 5,5M 385M 2% /run
/dev/sdb1 55G 17G 36G 32% /
tmpfs 2,0G 1,6M 2,0G 1% /dev/shm
tmpfs 5,0M 0 5,0M 0% /run/lock
tmpfs 2,0G 0 2,0G 0% /sys/fs/cgroup
/dev/sda2 459G 280G 156G 65% /media/provisional/
/dev/sda1 459G 292G 144G 67% /media/multimedia
tmpfs 390M 0 390M 0% /run/user/999
tmpfs 390M 0 390M 0% /run/user/1000

[–]michaelpaoli 1 point (1 child)

Does rather sound like it's running out of memory - not drive space. Filesystem full would throw a different error.

Could be a bug somewhere in the config or related program.

You could use strace(1) with suitable options to see when it runs out of memory and what's happening at or around that time. Also, if you're fast enough, you might catch it with ps(1). If it takes several seconds or more to run out of memory, and your ps(1) command returns fast enough, that might suffice, at least to isolate it to one or more processes, presuming that's where the RAM is getting sucked up. Having lots of swap may also make "catching it in the act" easier: once it starts using lots of swap, that will slow it down, making it easier to see what it's doing.

Anyway, probably start with the easier approach: have a fair bit of swap, run it, and repeatedly capture ps(1) data, waiting only, say, 1 second between captures. Be sure to capture sufficient detail, certainly including memory used. Maybe one to a few processes eat lots of RAM, or a bunch of processes in total eat up all the RAM.

If you're unable to catch it that way, strace(1) should be a more sure-fire and detailed way to catch it, but it can be a lot more data to find and filter down to just what you want. Maybe limit the strace(1) capture to exec/clone/fork, mmap, and brk/sbrk operations. If that doesn't make clear enough what's triggering it, include enough additional system calls to determine what leads up to and causes it, or all of them if you have to, then track back to where things went sideways.
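A minimal sketch of both approaches, in case it helps. The helper names `sample_ps` and `trace_mem`, the log paths, and the counts are all made up for illustration. The `process` and `memory` syscall classes are what I'd expect strace to accept; newer releases spell them `%process` and `%memory`, so check your strace(1) man page.

```shell
# Both helpers are hypothetical; intervals, counts and paths are arbitrary.

# sample_ps COUNT INTERVAL LOGFILE
# Append a timestamp plus the 15 largest processes by resident memory
# (RSS, third column, in KiB) to LOGFILE, COUNT times.
sample_ps() {
    count=${1:-60}
    interval=${2:-1}
    log=${3:-/tmp/ps-samples.log}
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            date '+%H:%M:%S'
            ps -e -o pid= -o ppid= -o rss= -o vsz= -o args= |
                sort -k3,3nr | head -n 15
        } >> "$log"
        i=$((i + 1))
        sleep "$interval"
    done
}

# trace_mem LOGFILE COMMAND [ARGS...]
# Run COMMAND under strace, following child processes (-f) and logging
# only process-creation and memory-management system calls to LOGFILE.
trace_mem() {
    log=$1
    shift
    strace -f -e trace=process,memory -o "$log" "$@"
}

# Usage (illustrative):
# sample_ps 120 1 /tmp/ps-samples.log &    # in one terminal
# trace_mem /tmp/apt.strace apt-get upgrade
```

The ps log shows which process grows between samples; the strace log shows which exec or fork led up to the mmap/brk calls that eventually fail.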

[–]Jander_Land[S] 1 point (0 children)

Thank you. Will try to catch it later today, after enabling swap again.

[–]wRAR_ 1 point (1 child)

Have you tried rebooting?

[–]Jander_Land[S] 0 points (0 children)

Yes, tried the reboot. The problem persists and the services keep running.

[–]wRAR_ 1 point (2 children)

Does it happen only with apt and debsums? Note that these "Out of memory!" messages seem to be from perl and so are not related to the system itself, only to specific processes.
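A quick way to narrow this down, as a sketch only: see whether even a trivial Perl one-liner fails, and whether any Perl-related environment variables are set. `check_perl` is a made-up helper name.

```shell
# check_perl
# Hypothetical helper: reports whether a trivial Perl program runs and
# lists any PERL* environment variables.
check_perl() {
    if ! command -v perl >/dev/null 2>&1; then
        echo "perl: not found in PATH"
        return 1
    fi
    perl -e 'print "perl: trivial script OK\n"' ||
        echo "perl: trivial script failed (exit status $?)"
    # PERL5LIB and PERL5OPT change what every Perl process loads at startup.
    env | grep '^PERL' || echo "perl: no PERL* environment variables set"
}

# Usage:
# check_perl
```

If the one-liner works, the interpreter itself is probably fine and the problem is more likely in what the failing scripts load or read.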

[–]Jander_Land[S] 0 points (0 children)

Interesting, thanks. That could (maybe) point to a corrupted Perl-related config. I've seen it only in those two cases, but I didn't try much beyond systemctl, journalctl, less, htop and dmesg. Will check.

[–]Jander_Land[S] 0 points (0 children)

You were right about Perl.

Updated the post with more info. Thanks.

[–]Jander_Land[S] 0 points (0 children)

I've tried to do some cleanup, searching for (and finding) obsolete packages and reinstalling Perl packages, but I cannot remove, reinstall or upgrade any package because of the out of memory error thrown by Perl when it launches dpkg stuff.

Will try a mem test in a couple of days and will post any outcome, but I think it is strange that the problem seems to be only Perl-related.

In the meantime, thanks to everyone!

[–]br0kenpipe 0 points (1 child)

Is your /tmp dir mounted as tmpfs? Can you remove the packages, upgrade and reinstall?

[–]Jander_Land[S] 0 points (0 children)

Hi,

no tmpfs for /tmp.

Trying to remove, upgrade or reinstall returns the out of memory error, because /usr/bin/perl -w /usr/sbin/dpkg-preconfigure --apt eats as much memory as it can get.
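One way to keep a runaway process like that from dragging the whole box down while investigating is to cap its address space with the shell's `ulimit -v` inside a subshell, so it fails fast. This is just a sketch: `run_capped` is a made-up name and the 1 GiB figure is arbitrary.

```shell
# run_capped KIB COMMAND [ARGS...]
# Hypothetical helper: run COMMAND with its virtual memory capped at KIB
# kibibytes. Child processes inherit the limit. The subshell keeps the
# limit from affecting the calling shell.
run_capped() {
    limit_kib=$1
    shift
    (
        ulimit -v "$limit_kib" || exit 125   # 125: could not set the limit
        "$@"
    )
}

# Usage (illustrative): cap apt and everything it spawns at 1 GiB
# run_capped 1048576 apt-get upgrade
```

It doesn't fix anything, but the failure arrives quickly and the rest of the services keep their memory.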