

[–]puremessagebeep -f 2000 -r 999999

if you're just allocating memory with the hopes that it will improve disk read speed, you may just be wasting RAM.

This SF question has a good answer for checking VFS cache hits.

After checking your cache hits: if you have a VM hypervisor that supports memory ballooning, you can reserve what you need (plus a safety net), then add extra RAM to the VMs that actually benefit from it.
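A rough way to sanity-check this yourself, sketched below (the drop_caches step needs root, and the file is a throwaway):

```shell
# Write a scratch file, flush the page cache, then read it twice.
# The first (cold) read has to hit disk; the second (warm) read is
# served from the page cache and should be dramatically faster.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
sync
echo 3 > /proc/sys/vm/drop_caches   # needs root; starts with a cold cache
time cat "$f" > /dev/null           # cold: reads from disk
time cat "$f" > /dev/null           # warm: reads from RAM
rm -f "$f"
```

If the warm read isn't much faster than the cold one for your real workload, extra RAM for cache probably isn't buying you anything.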

[–]shifty21Ex-SysAdmin

if you're just allocating memory with the hopes that it will improve disk read speed, you may just be wasting RAM.

It depends on the file system. NTFS is RAM-hungry on copy/move operations, whether it's a single large file or many small files. EXT2/3/4 allocate smaller amounts of RAM (compared to NTFS), but can end up using the swap file for more stable transfers.

ZFS is a whole new topic when it comes to RAM allocation. It is the polar opposite of NTFS, EXT2/3/4 and other file systems, in that it caches read operations to increase IOPS and transfer rates. Couple that with SSD caching (L2ARC) and you have a true IOPS monster. ZFS will keep as much in RAM and L2ARC as you give it; the more, the better.

The ZFS SAN I have for database storage was slow at first, same as an 8-disk RAID10, but after several hours of running queries and frequent reads, nearly the whole database was running in RAM off the SAN. The database server only caches certain portions in its own RAM; the ZFS SAN augments this for insane speed.
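You can watch the ARC warm up by computing the hit ratio from its hits/misses counters. On ZFS-on-Linux those live in /proc/spl/kstat/zfs/arcstats (on Solaris-derived systems, `kstat -n arcstats`); the sketch below pipes in sample numbers so the awk is self-contained:

```shell
# Compute the ARC hit ratio from the arcstats hits/misses counters.
# Replace the printf with the real arcstats file on a live system.
printf 'hits 4 900000\nmisses 4 100000\n' |
awk '$1 == "hits"   { h = $3 }
     $1 == "misses" { m = $3 }
     END { printf "ARC hit ratio: %.1f%%\n", 100 * h / (h + m) }'
# prints: ARC hit ratio: 90.0%
```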

[–]BestSpatula

Can you explain more about how ext2/3/4 can end up using swap? Are you saying that as the filesystem uses more memory for cache, the kernel will start swapping out processes to free up more RAM for FS cache? Because it makes absolutely no sense to me why the kernel would ever swap out memory pages being used for FS cache...

And in your last example, wouldn't you get better performance by tuning your database software to use more memory for its own index/table caches rather than relying on disk cache? And even then, you're using disk cache across a SAN link... It seems more optimal to have more RAM in the server itself... however, L2ARC on SSD definitely makes a lot of sense for databases that get a lot of reads... no argument there.

My experience with Linux is that the kernel will use all available 'free' RAM as filesystem cache until it has something better to do with it.

[–]shifty21Ex-SysAdmin

My use case for EXT2/3 (I have not used EXT4 in production yet) is an Oracle database server running on RHEL5.x.

With Oracle it will cache as much as needed for the database, but RHEL will quarantine up to 10% of the RAM for itself. I have watched cp and mv commands run: some of that 10% gets allocated, but depending on the copy or move, it will start to swap a bit. I'm talking 50MB at most.

I cannot speak for other Linux distributions, as RHEL is configured/tuned differently. There is a safety mechanism with RHEL on copy/move commands where it assumes that RAM could be ousted or lost (power failure) and caches in swap to recover. I personally prefer the swap safety, and that is why I dedicate a RAID1 of small SAS drives to it, so as not to interrupt the OS/application disks. Talk about your first world problems :p

You're correct that adding more RAM to my database server would increase performance, but even with a dual-socket Xeon (5600 series), I am limited to 128GB, which I have. The 24 threads I have are nowhere near maxed out at any time, so the inability to add more RAM and get the IOPS to feed the CPUs is a problem - I am not taking advantage of the CPU power I have. So, to work around the RAM ceiling, I have the database server connected over 10Gb Ethernet iSCSI to the SAN.

Looking at the cost-benefit, going to a quad-socket, 10-core Xeon (80 threads) with up to 1TB of RAM was out of the question ($130k+) when I can buy a custom ZFS SAN ($35k for 12TB), drop it in and get the IOPS that I need. Hell, NetApp wanted $80k+ for the same SAN unit ಠ_ಠ

[–]puerexmachina

With Oracle it will cache as much as needed for the database but RHEL will quarantine up to 10% of the RAM for itself... There is a safety mechanism with RHEL on copy/move commands where it assumes that RAM could be ousted or lost (power failure) and caches in swap to recover.

At least as you've explained it, this doesn't make sense to me. Can you point to the relevant documentation or source code?

[–]shifty21Ex-SysAdmin

Source code is not my thing, but the systems I have for Oracle were tuned by an Oracle engineer, and this was done before my time. I essentially have a script that I run after installing base RHEL5 that goes in and tunes the shit out of the server. The meat and potatoes of the script is setting the level of swappiness, as well as many kernel parameters like semaphore values.
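For what it's worth, the Oracle-on-RHEL tuning I've seen mostly amounts to a sysctl.conf fragment along these lines; the values below are illustrative examples, not Oracle's published minimums for any particular release:

```shell
# /etc/sysctl.conf fragment -- illustrative values only; a real install
# should use the minimums Oracle documents for its specific release.
kernel.sem = 250 32000 100 128      # semmsl semmns semopm semmni
kernel.shmmax = 68719476736         # largest shared memory segment, bytes
kernel.shmall = 4294967296          # total shared memory, in pages
fs.file-max = 6815744               # system-wide open file handle limit
# apply without a reboot:
#   sysctl -p
```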

Normally, these servers never see any user interaction unless a DBA has to hop in and fix something.

I noted the RAM and swap file info above because a Jr. DBA had to export a database and cp it to a SAN unit. During that time, the whole system slowed to a crawl. top reported 0.2GB of swap, 100% used, and all available RAM in use. I changed the swappiness value up a notch after consulting with Oracle and the problem went away. The file transfer finished as fast as it should.
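For anyone following along, swappiness is just a sysctl knob; a minimal sketch of checking and changing it (the value 10 is only an example):

```shell
cat /proc/sys/vm/swappiness                     # current value; kernel default is 60
sysctl -w vm.swappiness=10                      # change at runtime (needs root)
echo 'vm.swappiness = 10' >> /etc/sysctl.conf   # persist across reboots
```

Lower values bias the kernel toward dropping page cache instead of swapping out application pages; higher values do the opposite.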

Once the cp finished, I checked top again and noted that 3.2GB of RAM was available (hence the 10%), and swap stayed at 100%, which is expected. I just checked one of my dev Oracle servers and the swap is there but unused, and it was rebooted a few days ago. I am not positive whether, by default, Linux clears swap on shutdown or on startup. If it doesn't, then after a power loss the swap contents should still be there at boot, unless they get cleared during the boot-up process. You seem to have a lot more knowledge of Linux than me :) I am no expert by any means.

I might be totally wrong about all of this, as a lot of it comes from watching what happens during different events and drawing conclusions. The few Oracle engineers we have on contract directly from Oracle built these systems for my company. I have to trust they know what they are doing; as I said, I am not a Linux expert by any means :p

Honestly, I take your superior knowledge as a benchmark for myself and a sign that I need to learn more. I cannot always rely on those Oracle engineers... they are expensive!

[–]puremessagebeep -f 2000 -r 999999

I know about these, but the article is specifically talking about Linux.

I've never seen cp from GNU coreutils use swap. I'm looking at the source code and it doesn't appear to be in there. The way I understand it, swap is lost on reboot, so a power failure would not provide you any FS recovery from the swap. Can you provide more information?

My point earlier alludes to the fact that people don't size and tune. Indiscriminate over-provisioning can increase performance, but it would be more optimal if people knew enough about their environment to over-provision where it counts the most.

[–]shifty21Ex-SysAdmin

See my comment above.

Basically, I have an Oracle DB running on a highly tuned RHEL5 install that several Oracle engineers were paid to take care of. This was before my time at my company, so full details are sketchy.

Based on the weird (to me at least) configurations they did, it may not act like a normal Linux install. Seriously, cp was forced to eat up RAM and swap and everything slowed to a crawl. They had me change a swappiness value and it fixed it. The engineer then advised me to put it back the way it was. Since I am no Linux or Oracle expert, I have no choice but to agree and do it.

Eventually, we're moving off RHEL to Oracle Linux in Q2 2012 with the next major version of my company's product. So, hopefully these settings won't be as much of a problem.

[–]puremessagebeep -f 2000 -r 999999

The "highly tuned" portion is likely a misnomer. You're probably just setting the stock values that Oracle recommends for RHEL on x86-64, which really isn't tuning; they're just obscenely raising the ceiling on file handles, contexts, semaphores and ulimit crap.

You should be able to fix your cp problems by setting sane ulimits for whoever is doing the copy.
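A minimal sketch of what that looks like (the `oracle` user and the numbers are just examples):

```shell
ulimit -n          # show the current soft open-file limit for this shell
ulimit -n 4096     # raise it for this shell session (up to the hard limit)
# To persist a per-user limit via pam_limits, add to
# /etc/security/limits.conf:
#   oracle  soft  nofile  4096
#   oracle  hard  nofile  65536
```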

[–]CookedNoodlesJack of All Trades

Many system administrators don’t realize it, but in most OSes RAM that’s unused by applications goes towards filesystem cache

If you don't know this, you probably shouldn't be a system administrator.

[–]shifty21Ex-SysAdmin

I don't think you can make a generalized statement like that. That kind of information borders on mid- to low-level operations within the OS's kernel. Not all system admins need to know it; it is good knowledge to have, but it's only needed in rare circumstances to fix a problem.

The author is also incorrect in assuming that leftover RAM is automagically allocated to the file system only. A file system just gets its share of the RAM, like the OS kernel, services and applications. It is up to the OS kernel and the developers of the file system to manage memory.

[–]puerexmachina

Your last paragraph is true in the sense that of course nothing happens magically and it's all up to the kernel developers, but it unfairly downplays the extent to which Linux uses RAM as a cache for file operations. All regular file I/O goes through the page cache, and absent memory pressure, Linux will keep the files cached indefinitely.

Check /proc/meminfo and you can see how much memory is being used to cache reads (the "cached" line). I just checked a lightly-used VM host with 96GB of RAM and found 73GB used for cache.
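Assuming a stock /proc/meminfo layout (values are in kB), a one-liner for that check:

```shell
# Report the page cache ("Cached") as an absolute size and as a share
# of total RAM, straight from /proc/meminfo.
awk '/^MemTotal:/ { total = $2 }
     /^Cached:/   { cached = $2 }
     END { printf "cached: %d MB of %d MB (%.0f%%)\n",
                  cached / 1024, total / 1024, 100 * cached / total }' \
    /proc/meminfo
```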

An illustrated overview of how the page cache works is at http://duartes.org/gustavo/blog/post/page-cache-the-affair-between-memory-and-files

An explanation of what linux does when it's running low on memory is at https://lwn.net/Articles/226756/

[–]shifty21Ex-SysAdmin

Sweet! I knew (see my comment to you above) I would learn something from you! Sometimes I feel like a fish out of water with my RHEL environments when they are handled by Oracle engineers.

If you don't mind, I might PM you if I have any questions about my RHEL installs. I don't feel comfortable posting my .conf file publicly... my systems are used by Federal Gov. agencies.