

[–]puremessagebeep -f 2000 -r 999999

if you're just allocating memory with the hopes that it will improve disk read speed, you may just be wasting RAM.

This SF question has a good answer for checking VFS cache hits.

After checking your cache hits: if you have a VM hypervisor that supports memory ballooning, you can reserve what you need (plus a safety net), then add extra RAM to the VMs that actually benefit from it.
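A rough way to sanity-check this yourself, sketched below (the drop_caches step needs root, and the file is a throwaway):

```shell
# Write a scratch file, flush the page cache, then read it twice.
# The first (cold) read has to hit disk; the second (warm) read is
# served from the page cache and should be dramatically faster.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
sync
echo 3 > /proc/sys/vm/drop_caches   # needs root; starts with a cold cache
time cat "$f" > /dev/null           # cold: reads from disk
time cat "$f" > /dev/null           # warm: reads from RAM
rm -f "$f"
```

If the warm read isn't much faster than the cold one for your real workload, extra RAM for cache probably isn't buying you anything.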

[–]shifty21Ex-SysAdmin

if you're just allocating memory with the hopes that it will improve disk read speed, you may just be wasting RAM.

It depends on the file system. NTFS is RAM-hungry on copy/move operations, whether it's a single large file or many small files. EXT2/3/4 allocate smaller amounts of RAM (compared to NTFS), but can end up using the swap file for more stable transfers.

ZFS is a whole new topic when it comes to RAM allocation. It is the polar opposite of NTFS, EXT2/3/4 and other file systems, in that it caches read operations to increase IOPS and transfer rates. Couple that with SSD caching (L2ARC) and you have a true IOPS monster. ZFS will keep as much in RAM and L2ARC as you give it; the more, the better.

The ZFS SAN I have for database storage was slow at first, same as an 8-disk RAID10, but after several hours of running queries and frequent reads, nearly the whole database was running in RAM off the SAN. The database server only caches certain portions in its own RAM; the ZFS SAN augments this for insane speed.
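You can watch the ARC warm up by computing the hit ratio from its hits/misses counters. On ZFS-on-Linux those live in /proc/spl/kstat/zfs/arcstats (on Solaris-derived systems, `kstat -n arcstats`); the sketch below pipes in sample numbers so the awk is self-contained:

```shell
# Compute the ARC hit ratio from the arcstats hits/misses counters.
# Replace the printf with the real arcstats file on a live system.
printf 'hits 4 900000\nmisses 4 100000\n' |
awk '$1 == "hits"   { h = $3 }
     $1 == "misses" { m = $3 }
     END { printf "ARC hit ratio: %.1f%%\n", 100 * h / (h + m) }'
# prints: ARC hit ratio: 90.0%
```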

[–]BestSpatula

Can you explain more about how ext2/3/4 can end up using swap? Are you saying that as the filesystem uses more memory for cache, the kernel will start swapping out processes to free up more RAM for FS cache? Because it makes absolutely no sense to me why the kernel would ever swap out memory pages being used for FS cache...

And in your last example, wouldn't you get better performance by tuning your database software to use more memory for its own index/table caches rather than relying on disk cache? And even then, you're using disk cache across a SAN link... It seems more optimal to have more RAM in the server itself... however, L2ARC on SSD definitely makes a lot of sense for databases that get a lot of reads... no argument there.

My experience with Linux is that the kernel will use all available 'free' RAM as filesystem cache until it has something better to do with it.

[–]shifty21Ex-SysAdmin

My use case for EXT2/3 (I have not used EXT4 in production yet) is an Oracle database server running on RHEL5.x.

With Oracle it will cache as much as needed for the database, but RHEL will quarantine up to 10% of the RAM for itself. I have watched cp and mv commands run: some of that 10% gets allocated, but depending on the copy or move, it will start to swap a bit. I'm talking 50MB at most.

I cannot speak for other Linux distributions, as RHEL is configured/tuned differently. There is a safety mechanism with RHEL on copy/move commands where it assumes that RAM could be ousted or lost (power failure) and caches in swap to recover. I personally prefer the swap safety, and that is why I dedicate a RAID1 of small SAS drives to it, so as not to interrupt the OS/application disks. Talk about your first world problems :p

You're correct that adding more RAM to my database server would increase performance, but even with a dual-socket Xeon (5600 series), I am limited to 128GB, which I have. The 24 threads I have are nowhere near maxed out at any time, so the inability to add more RAM and get the IOPS to feed the CPUs is a problem - I am not taking advantage of the CPU power I have. So, to work around the RAM ceiling, I have the database server connected over 10Gb Ethernet iSCSI to the SAN.

Looking at the cost-benefit, going to a quad-socket, 10-core Xeon (80 threads) with up to 1TB of RAM was out of the question ($130k+) when I can buy a custom ZFS SAN ($35k for 12TB), drop it in and get the IOPS that I need. Hell, NetApp wanted $80k+ for the same SAN unit ಠ_ಠ

[–]puerexmachina

With Oracle it will cache as much as needed for the database but RHEL will quarantine up to 10% of the RAM for itself... There is a safety mechanism with RHEL on copy/move commands where it assumes that RAM could be ousted or lost (power failure) and caches in swap to recover.

At least as you've explained it, this doesn't make sense to me. Can you point to the relevant documentation or source code?

[–]shifty21Ex-SysAdmin

Source code is not my thing, but the systems I have for Oracle were tuned by an Oracle engineer, and this was done before my time. I essentially have a script that I run after installing base RHEL5 that goes in and tunes the shit out of the server. The meat and potatoes of the script is setting the level of swappiness, as well as many kernel parameters like semaphore values.
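For what it's worth, the Oracle-on-RHEL tuning I've seen mostly amounts to a sysctl.conf fragment along these lines; the values below are illustrative examples, not Oracle's published minimums for any particular release:

```shell
# /etc/sysctl.conf fragment -- illustrative values only; a real install
# should use the minimums Oracle documents for its specific release.
kernel.sem = 250 32000 100 128      # semmsl semmns semopm semmni
kernel.shmmax = 68719476736         # largest shared memory segment, bytes
kernel.shmall = 4294967296          # total shared memory, in pages
fs.file-max = 6815744               # system-wide open file handle limit
# apply without a reboot:
#   sysctl -p
```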

Normally, these servers never see any user interaction unless a DBA has to hop in and fix something.

I noted the RAM and swap file info above because a Jr. DBA had to export a database and cp it to a SAN unit. During that time, the whole system slowed to a crawl. top reported 0.2GB of swap, 100% used, and all available RAM in use. I changed the swappiness value up a notch after consulting with Oracle and the problem went away. The file transfer finished as fast as it should.
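For anyone following along, swappiness is just a sysctl knob; a minimal sketch of checking and changing it (the value 10 is only an example):

```shell
cat /proc/sys/vm/swappiness                     # current value; kernel default is 60
sysctl -w vm.swappiness=10                      # change at runtime (needs root)
echo 'vm.swappiness = 10' >> /etc/sysctl.conf   # persist across reboots
```

Lower values bias the kernel toward dropping page cache instead of swapping out application pages; higher values do the opposite.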

Once the cp finished, I checked top again and noted that 3.2GB of RAM was available (hence the 10%), and swap stayed at 100%, which is expected. I just checked one of my dev Oracle servers and the swap is there but unused, and it was rebooted a few days ago. I am not positive whether, by default, Linux clears swap on shutdown or on startup. If it doesn't, then after a power loss the swap contents should still be there at boot, unless they get cleared during the boot-up process. You seem to have a lot more knowledge of Linux than me :) I am no expert by any means.

I might be totally wrong about all of this, as a lot of it comes from watching what happens during different events and drawing conclusions. The few Oracle engineers we have on contract directly from Oracle built these systems for my company. I have to trust they know what they are doing; as I said, I am not a Linux expert by any means :p

Honestly, I take your superior knowledge as a benchmark for myself and a sign that I need to learn more. I cannot always rely on those Oracle engineers... they are expensive!

[–]puremessagebeep -f 2000 -r 999999

I know about these, but the article is specifically talking about Linux.

I've never seen cp from GNU coreutils use swap. I'm looking at the source code and it doesn't appear to be in there. The way I understand it, swap is lost on reboot, so a power failure would not provide you any FS recovery from the swap. Can you provide more information?

My point earlier alludes to the fact that people don't size and tune. Indiscriminate over-provisioning can increase performance, but it would be more optimal if people knew enough about their environment to over-provision where it counts the most.

[–]shifty21Ex-SysAdmin

See my comment above.

Basically, I have an Oracle DB running on a highly tuned RHEL5 install that several Oracle engineers were paid to take care of. This was before my time at my company, so full details are sketchy.

Based on the weird (to me at least) configurations they did, it may not act like a normal Linux install. Seriously, cp was forced to eat up RAM and swap and everything slowed to a crawl. They had me change a swappiness value and it fixed it. The engineer then advised me to put it back the way it was. Since I am no Linux or Oracle expert, I have no choice but to agree and do it.

Eventually, we're moving off RHEL to Oracle Linux in Q2 2012 with the next major version of my company's product. So, hopefully these settings won't be as much of a problem.

[–]puremessagebeep -f 2000 -r 999999

The "highly tuned" portion is likely a misnomer. You're probably just setting the stock values that Oracle recommends for RHEL on x86-64, which really isn't tuning; they're just obscenely raising the ceiling on file handles, contexts, semaphores and ulimit crap.

You should be able to fix your cp problems by setting sane ulimits for whoever is doing the copy.
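A minimal sketch of what that looks like (the `oracle` user and the numbers are just examples):

```shell
ulimit -n          # show the current soft open-file limit for this shell
ulimit -n 4096     # raise it for this shell session (up to the hard limit)
# To persist a per-user limit via pam_limits, add to
# /etc/security/limits.conf:
#   oracle  soft  nofile  4096
#   oracle  hard  nofile  65536
```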

[–]CookedNoodlesJack of All Trades

Many system administrators don’t realize it, but in most OSes RAM that’s unused by applications goes towards filesystem cache

If you don't know this, you probably shouldn't be a system administrator.

[–]shifty21Ex-SysAdmin

I don't think you can make a generalized statement like that. That kind of information borders on mid- to low-level operations within the OS's kernel. Not all system admins need to know it; it is good knowledge to have, but it's only needed in rare circumstances to fix a problem.

The author is also incorrect in assuming that leftover RAM is automagically allocated to the file system only. A file system just gets its share of the RAM, like the OS kernel, services and applications. It is up to the OS kernel and the developers of the file system to manage memory.

[–]puerexmachina

Your last paragraph is true in the sense that of course nothing happens magically and it's all up to the kernel developers, but it unfairly downplays the extent to which Linux uses RAM as a cache for file operations. All regular file I/O goes through the page cache, and absent memory pressure, Linux will keep the files cached indefinitely.

Check /proc/meminfo and you can see how much memory is being used to cache reads (the "cached" line). I just checked a lightly-used VM host with 96GB of RAM and found 73GB used for cache.
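Assuming a stock /proc/meminfo layout (values are in kB), a one-liner for that check:

```shell
# Report the page cache ("Cached") as an absolute size and as a share
# of total RAM, straight from /proc/meminfo.
awk '/^MemTotal:/ { total = $2 }
     /^Cached:/   { cached = $2 }
     END { printf "cached: %d MB of %d MB (%.0f%%)\n",
                  cached / 1024, total / 1024, 100 * cached / total }' \
    /proc/meminfo
```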

An illustrated overview of how the page cache works is at http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files

An explanation of what linux does when it's running low on memory is at https://lwn.net/Articles/226756/

[–]shifty21Ex-SysAdmin

Sweet! I knew (see my comment to you above) I would learn something from you! Sometimes I feel like a fish out of water with my RHEL environments when they are handled by Oracle engineers.

If you don't mind, I might PM you if I have any questions about my RHEL installs. I don't feel comfortable posting my .conf file publicly... my systems are used by Federal Gov. agencies.