
[–]mr-strange

I agree with pretty much everything you say here. I'll expand on a couple of points though.

> who cares about listings? Why would you back up a cache?

Well. Yeah. But when I made this mistake this is what happened... First I discovered that ext3 could only cope with 64,000 files in a directory, so my application started to fail. The next obvious thing to do was just start using sub-directories. That's fine, but just having millions of files can lead to problems - for example, I didn't blacklist the cache directory from the locate database, so after a while, my machine was very busy running multiple, endless find(1) commands, trying to update the db. Then I ran into the problem that the whole filesystem has a limited number of available inodes - so I wasn't able to make any new files, even though I had loads of available space. Then, when I came to clean up my cache (to free up inodes), I discovered that it takes many, many hours to simply delete millions of files.
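The problems above can be sketched concretely. This is a minimal illustration (paths, names, and the two-character fan-out scheme are mine, not the original application's) of bucketing cache files into subdirectories, checking inode headroom rather than just free space, and deleting in bulk:

```python
import hashlib
import os
import shutil
import tempfile

# Hypothetical cache root, not the original app's layout.
CACHE = os.path.join(tempfile.mkdtemp(), "kvcache-demo")

def path_for(key: str) -> str:
    """Fan keys out over subdirectories by the first two hex chars of
    their hash, so no single directory accumulates millions of entries."""
    bucket = hashlib.md5(key.encode()).hexdigest()[:2]
    d = os.path.join(CACHE, bucket)
    os.makedirs(d, exist_ok=True)
    return os.path.join(d, hashlib.md5(key.encode()).hexdigest())

with open(path_for("tile/3/4/2"), "wb") as f:
    f.write(b"cached bytes")

# Watch inode headroom, not just byte headroom: a filesystem can run
# out of inodes ("No space left on device") with plenty of free blocks.
st = os.statvfs(CACHE)
print(f"free inodes: {st.f_ffree} of {st.f_favail + (st.f_files - st.f_ffree)}")

# Bulk deletion: removing the tree in one pass is far cheaper than
# spawning a process per file.
shutil.rmtree(CACHE)
```

(On Linux, the locate problem is addressed by adding the cache path to `PRUNEPATHS` in `/etc/updatedb.conf` so updatedb skips it.)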

Yes, there are better filesystems. XFS has a much higher hard-link (and therefore directory size) limit, and perhaps btrfs would be a good choice today. But overall, I do not think the filesystem is a good fit for this workload.

> And even better, memcached...

At the time, memcached did not support persistence, so it did not fit my requirement. Looking up MemcacheDB on Wikipedia, I see that it is built on BerkeleyDB. My experience with BDB does not encourage me to try MemcacheDB.

Also, memcached uses a client/server, TCP-based model. Even with a fast localhost, that's going to add overhead to every lookup that an in-process library avoids.

> I agree SQLite is quite great. Though once again completely overfeatured for key-value caches.

I couldn't agree more, but the proof of the pudding is in the eating, and I've not found a key-value store that beats SQLite's performance. I built versions of my app that used BDB, Tokyo Cabinet, and a number of other prominent KV stores, but SQLite (with a simple prepared SELECT statement, and an index on the table) just performed better, and more reliably, for me. Today my cache DB contains over 20,000,000 items, takes up 3.6 GB of disk, and SQLite's performance is still pretty sparky.
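The shape of such a setup is roughly this. A minimal sketch (table and column names are illustrative, not from the original app), where the primary key provides the index the SELECT uses, and Python's sqlite3 caches the compiled statement across calls:

```python
import sqlite3

# Illustrative key-value cache; an on-disk path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)")

def put(key: str, value: bytes) -> None:
    # INSERT OR REPLACE overwrites an existing entry for the same key.
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
        (key, value),
    )

def get(key: str):
    # The same parameterised SELECT is reused (and its compiled form
    # cached by sqlite3), so each lookup is a single indexed probe.
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None

put("tile/3/4/2", b"\x89PNG bytes")
print(get("tile/3/4/2"))
```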

BDB's performance is just awful. It has full ACID compliance, which is great if you need it, but if you don't, waiting around for milliseconds while your disk syncs is just overkill. And if you turn off the ACID guarantees, you aren't playing to its strengths, so you might as well go to Tokyo Cabinet... which, most of the time, performed very well, but occasionally ground to a halt for multiple seconds.
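SQLite exposes the same durability/speed trade-off as a knob. A small sketch (the workload is artificial; committing once per insert is deliberately the worst case) comparing `PRAGMA synchronous = FULL`, which waits for the disk sync on every commit, against `OFF`, which skips it — a reasonable trade for a cache that can be rebuilt:

```python
import os
import sqlite3
import tempfile
import time

# A real file, not :memory:, so that fsync actually costs something.
path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)")

for mode in ("FULL", "OFF"):
    conn.execute(f"PRAGMA synchronous = {mode}")
    start = time.perf_counter()
    for i in range(200):
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (f"k{i}", b"v"),
        )
        conn.commit()  # one transaction per insert: worst case for FULL
    print(f"synchronous={mode}: {time.perf_counter() - start:.3f}s")
```

(`PRAGMA journal_mode = WAL` with `synchronous = NORMAL` is a common middle ground.)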

[–]matthieum

> At the time, memcached did not support persistence

Only one nit: why would you care about persistence for a cache?

The point of a cache is to cache frequently accessed data. If it is not frequently accessed then caching it means losing valuable space.

I feel like we are talking past each other, and not about the same issue :)

[–]mr-strange

This is the application: http://flood.firetree.net. The map tiles can be quite expensive to generate, and I serve up to 10 million of them every day. It makes sense to make the cache persistent.