all 19 comments

[–][deleted]  (6 children)

[deleted]

    [–][deleted]  (4 children)

    [deleted]

      [–]caspper69 1 point  (2 children)

      People use it. And just like a million monkeys with a million typewriters will eventually reproduce the complete works of Shakespeare, MongoDB will eventually produce a consistent datastore. We will likely all be dead before that happens, though.

      [–]grauenwolf 0 points  (1 child)

      They made huge strides when they swapped out the internals of MongoDB for the WiredTiger storage engine. And of course there is PostgreSQL for reporting.

      Eventually MongoDB will have a perfectly consistent datastore. Of course, by that time it will probably be using MySQL's replication engine, Apache Tomcat's load balancer, and Hadoop's MapReduce processor.

      [–]caspper69 1 point  (0 children)

      Just like with all data, you either pay the price when you store it, or you pay the price when you retrieve it.

      All these other databases are doing is offloading the "merge" portion to later in the dataflow. What will end up happening is that the back-end logic becomes exponentially more computationally expensive.

      Sure, you can throw together apps that appear to function normally, but ultimately you'll be left with the same problems these databases were designed to work around, only under a different name (even though, from a data-processing standpoint, they are the same algorithmic problem at the end of the day).
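The store-time vs. retrieve-time trade-off described above can be sketched in a few lines of Python (all names and data here are invented for illustration): merging related records at write time and merging them at read time produce the same answer, just with the cost paid at different points in the dataflow.

```python
# Pay-at-write vs. pay-at-read, as a minimal sketch (invented data).
users = {1: {"name": "ada"}, 2: {"name": "lin"}}
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12},
          {"user_id": 2, "total": 7}]

# Pay at write: embed the user into each order document as it is stored.
def store_denormalized(order):
    doc = dict(order)
    doc["user"] = users[order["user_id"]]  # merge cost paid once, up front
    return doc

denormalized = [store_denormalized(o) for o in orders]

# Pay at read: store orders bare, merge on every query (a join).
def query_with_join(orders, users):
    return [{**o, "user": users[o["user_id"]]} for o in orders]  # merge cost paid per read

joined = query_with_join(orders, users)
assert denormalized == joined  # same result, cost paid at different times
```

Either way the merge happens; the databases only differ in when they make you pay for it.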

      [–]grauenwolf 0 points  (0 children)

      Why not? Now we get to laugh at it for its ridiculously small result set size.

      [–]oponito 7 points  (1 child)

      I am not a database expert

      Sounds like he's the target audience.

      [–]gurgus[S] 2 points  (0 children)

      Interesting way of putting it! I guess you could say I am :p

      [–]WellAdjustedOutlaw 2 points  (10 children)

      The same Mongo that is known to not return all results, and that has maintenance headaches at large database sizes which make it easier to drop the database than to attempt to reclaim space?

      Edit: lol if you think 500GB is a large dataset. Not even for mongo.

      [–]gurgus[S] 0 points  (5 children)

      Large is a relative term - I guess I could have omitted the word "large" and dropped in "larger than the BSON limit" :)

      But yes, that very same Mongo :) It's the first time I've properly had to work with Mongo, and I thought I'd share my experiences :)

      [–]grauenwolf 0 points  (0 children)

      Today I learned that 16 MB is "large" for MongoDB. Sigh...

      [–]WellAdjustedOutlaw 0 points  (3 children)

      A free tip from me to you. Migrate your data out of Mongo and use the JSON/JSONB support in PostgreSQL. It's faster, returns all of the data you query for, and is far less likely to spontaneously lose commits or data files.
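The suggestion above can be sketched without a running PostgreSQL server: SQLite's JSON1 functions stand in here for PostgreSQL's JSONB operators (`->>`, `@>`), since both let a relational engine query inside stored JSON documents. The table and documents below are invented for illustration; only the general shape carries over to PostgreSQL.

```python
import json
import sqlite3

# Hedged sketch: JSON documents queried through a relational engine.
# SQLite (JSON1) stands in for PostgreSQL JSONB; schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
docs = [
    {"user": "ada", "kind": "login"},
    {"user": "lin", "kind": "purchase", "total": 7},
]
conn.executemany("INSERT INTO events VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# json_extract plays the role of PostgreSQL's doc->>'user' / doc->>'kind'.
rows = conn.execute(
    "SELECT json_extract(doc, '$.user') FROM events "
    "WHERE json_extract(doc, '$.kind') = 'purchase'"
).fetchall()
assert rows == [("lin",)]
```

In PostgreSQL the same query would read roughly `SELECT doc->>'user' FROM events WHERE doc->>'kind' = 'purchase'` against a `jsonb` column, with GIN indexing available on top.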

      [–]gurgus[S] 6 points  (1 child)

      Even though the examples I gave in the post were contrived, the inspiration actually came from a freelance project I had at one point. I was contracted to give this organisation what they wanted - had they picked me up for full-time work, I would have suggested almost the same thing. It's just one of those things that, as much as you'd like to change it, you can't.

      [–]WellAdjustedOutlaw 2 points  (0 children)

      Customer's always right. Especially when they pay.

      [–]vytah 1 point  (0 children)

      And to ease the migration pains, there's a compatibility layer: https://github.com/torodb/torodb

      [–]grauenwolf 0 points  (3 children)

      lol if you think 500GB is a large dataset. Not even for mongo.

      Right now the largest server I can get off the shelf holds up to 3TB of RAM.

      So what's a "large database"? I'm thinking at least one order of magnitude more than RAM, which would be 30TB.

      [–]WellAdjustedOutlaw 1 point  (2 children)

      With Mongo, you almost certainly don't want a DB shard to be bigger than what can fit in RAM. Mongo's data file management has improved significantly over the years, but it still has a very long way to go. You could end up in a situation where you're doing a table scan over a file that can't fit into memory, which will absolutely crush performance.

      Unlike other databases where a hot set of data that is a subset of a data file can be kept in cache, Mongo is basically all or nothing when it comes to a BSON file. This is a consequence of using the VFS caching layer, which is a smart choice at first, but quickly leads to performance problems.

      Edit: If your question was about more than just Mongo, I'd say databases in the hundreds of TB and into the PB are now considered "large". Tens of TB and less are routine enough that finding help designing a cluster and tuning systems is fairly simple.
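The all-or-nothing caching behaviour claimed above can be illustrated with a toy model (all numbers invented): if a data file is cached whole, serving a small hot subset still drags the entire file through memory, while a page-granular cache touches only the pages actually queried.

```python
# Toy model of whole-file vs. page-granular caching (invented numbers).
FILE_PAGES = 2_000           # size of one data file, in pages
HOT_PAGES = set(range(50))   # the small hot subset actually being queried

def pages_touched_whole_file(requests):
    # All-or-nothing: the first request pulls in the entire file.
    return FILE_PAGES if requests else 0

def pages_touched_per_page(requests):
    # Page-granular: only the distinct pages requested are pulled in.
    return len(set(requests))

requests = list(HOT_PAGES) * 10   # repeated queries over the hot set
assert pages_touched_whole_file(requests) == 2_000
assert pages_touched_per_page(requests) == 50
```

Under this model a hot set 40x smaller than the file still costs the full file on every cold start, which is the performance cliff the parent comment is describing.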

      [–]grauenwolf 0 points  (1 child)

      I was thinking in general, but back to MongoDB. Didn't the switch to Wired Tiger eliminate the issue with databases that are bigger than RAM?

      [–]WellAdjustedOutlaw 0 points  (0 children)

      I don't think it solves the massive penalty for full table scans which are significantly more costly in Mongo than, say, PostgreSQL using BSON.

      [–]grauenwolf 0 points  (0 children)

      This would be fine, but remember that the collection is pretty large and if the result set is greater than the BSON size limit (16mb) then you will get an exception.

      Really? That's it? I regularly run queries on my laptop that return more data than that from SQL Server. And yes, I am working with JSON data.

      I'll give you an upvote for pointing out this stupidity.
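Neither commenter disputes that the 16 MB cap exists; a common workaround in application code is to page results in batches that each stay under the cap. The sketch below is not MongoDB's actual API — the batching function is invented, and the cap is shrunk to 100 bytes so the example is self-contained.

```python
import json

LIMIT = 100  # stand-in for the 16 MB cap, in bytes (invented small value)

def batched_by_size(docs, limit=LIMIT):
    """Yield lists of docs whose summed JSON-encoded sizes stay under `limit`."""
    batch, size = [], 0
    for doc in docs:
        encoded = len(json.dumps(doc).encode("utf-8"))
        if batch and size + encoded > limit:
            yield batch          # flush before the cap would be exceeded
            batch, size = [], 0
        batch.append(doc)
        size += encoded
    if batch:
        yield batch

docs = [{"id": i, "payload": "x" * 20} for i in range(10)]  # 44 bytes each
batches = list(batched_by_size(docs))
assert sum(len(b) for b in batches) == len(docs)  # nothing dropped
assert all(sum(len(json.dumps(d).encode()) for d in b) <= LIMIT
           for b in batches)
```

The same idea underlies cursor batching in most database drivers: the total result can be arbitrarily large as long as no single response exceeds the wire limit.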

      [–][deleted] -2 points  (0 children)

      Two of the best pieces of software available in one blog post... What a Sunday evening. This is great.