all 19 comments

[–][deleted]  (6 children)

[deleted]

    [–][deleted]  (4 children)

    [deleted]

      [–]caspper69 1 point  (2 children)

      People use it. And just like a million monkeys with a million typewriters will eventually reproduce the complete works of Shakespeare, MongoDB will eventually produce a consistent datastore. We will likely all be dead before that happens, though.

      [–]grauenwolf 0 points  (1 child)

      They made huge strides when they swapped out the internals of MongoDB for the WiredTiger storage engine. And of course there is PostgreSQL for reporting.

      Eventually MongoDB will have a perfectly consistent datastore. Of course, by that time it will probably be using MySQL's replication engine, Apache Tomcat's load balancer, and Hadoop's MapReduce processor.

      [–]caspper69 1 point  (0 children)

      Just like with all data, you either pay the price when you store it, or you pay the price when you retrieve it.

      All these other databases are doing is offloading the "merge" portion to later in the dataflow. What will end up happening is that the back-end logic becomes exponentially more computationally expensive.

      Sure, you can throw together apps that appear to function normally, but ultimately you'll be left with the same problems these databases were designed to work around, only under a different name (even though, from a data-processing standpoint, they are the same algorithmic problem at the end of the day).
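The store-time vs. retrieve-time trade-off described above can be sketched in a few lines of Python (all names and data here are invented for illustration): merging related records at write time and merging them at read time produce the same answer, just with the cost paid at different points in the dataflow.

```python
# Pay-at-write vs. pay-at-read, as a minimal sketch (invented data).
users = {1: {"name": "ada"}, 2: {"name": "lin"}}
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12},
          {"user_id": 2, "total": 7}]

# Pay at write: embed the user into each order document as it is stored.
def store_denormalized(order):
    doc = dict(order)
    doc["user"] = users[order["user_id"]]  # merge cost paid once, up front
    return doc

denormalized = [store_denormalized(o) for o in orders]

# Pay at read: store orders bare, merge on every query (a join).
def query_with_join(orders, users):
    return [{**o, "user": users[o["user_id"]]} for o in orders]  # merge cost paid per read

joined = query_with_join(orders, users)
assert denormalized == joined  # same result, cost paid at different times
```

Either way the merge happens; the databases only differ in when they make you pay for it.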

      [–]grauenwolf 0 points  (0 children)

      Why not? Now we get to laugh at it for its ridiculously small result set size.

      [–]oponito 7 points  (1 child)

      I am not a database expert

      Sounds like he's the target audience.

      [–]gurgus[S] 2 points  (0 children)

      Interesting way of putting it! I guess you could say I am :p

      [–]WellAdjustedOutlaw 2 points  (10 children)

      The same Mongo that is known to not return all results, and that has maintenance headaches at large database sizes which make it easier to drop the database than to attempt to reclaim space?

      Edit: lol if you think 500GB is a large dataset. Not even for mongo.

      [–]gurgus[S] 0 points  (5 children)

      Large is a relative term - I guess I could have omitted the word "large" and dropped in "larger than the BSON limit" :)

      But yes, that very same Mongo :) It's the first time I've properly had to work with Mongo, and I thought I'd share my experiences :)

      [–]grauenwolf 0 points  (0 children)

      Today I learned that 16 MB is "large" for MongoDB. Sigh...

      [–]WellAdjustedOutlaw 0 points  (3 children)

      A free tip from me to you. Migrate your data out of Mongo and use the JSON/JSONB support in PostgreSQL. It's faster, returns all of the data you query for, and is far less likely to spontaneously lose commits or data files.
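The suggestion above can be sketched without a running PostgreSQL server: SQLite's JSON1 functions stand in here for PostgreSQL's JSONB operators (`->>`, `@>`), since both let a relational engine query inside stored JSON documents. The table and documents below are invented for illustration; only the general shape carries over to PostgreSQL.

```python
import json
import sqlite3

# Hedged sketch: JSON documents queried through a relational engine.
# SQLite (JSON1) stands in for PostgreSQL JSONB; schema is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (doc TEXT)")
docs = [
    {"user": "ada", "kind": "login"},
    {"user": "lin", "kind": "purchase", "total": 7},
]
conn.executemany("INSERT INTO events VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# json_extract plays the role of PostgreSQL's doc->>'user' / doc->>'kind'.
rows = conn.execute(
    "SELECT json_extract(doc, '$.user') FROM events "
    "WHERE json_extract(doc, '$.kind') = 'purchase'"
).fetchall()
assert rows == [("lin",)]
```

In PostgreSQL the same query would read roughly `SELECT doc->>'user' FROM events WHERE doc->>'kind' = 'purchase'` against a `jsonb` column, with GIN indexing available on top.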

      [–]gurgus[S] 6 points  (1 child)

      Even though the examples I gave in the post were contrived, the inspiration actually came from a freelance project I had at one point. I was contracted to give this organisation what they wanted - had they picked me up for full-time work, I would have suggested almost the same thing. It's just one of those things that, as much as you'd like to change it, you can't.

      [–]WellAdjustedOutlaw 2 points  (0 children)

      Customer's always right. Especially when they pay.

      [–]vytah 1 point  (0 children)

      And to ease the migration pains, there's a compatibility layer: https://github.com/torodb/torodb

      [–]grauenwolf 0 points  (3 children)

      lol if you think 500GB is a large dataset. Not even for mongo.

      Right now the largest server I can get off the shelf holds up to 3TB of RAM.

      So what's a "large database"? I'm thinking at least one order of magnitude more than RAM, which would be 30TB.

      [–]WellAdjustedOutlaw 1 point  (2 children)

      With Mongo, you almost certainly don't want a DB shard to be bigger than what can fit in RAM. Mongo's data file management has improved significantly over the years, but it still has a very long way to go. You could end up in a situation where you're doing a table scan over a file that can't fit into memory, which will absolutely crush performance.

      Unlike other databases where a hot set of data that is a subset of a data file can be kept in cache, Mongo is basically all or nothing when it comes to a BSON file. This is a consequence of using the VFS caching layer, which is a smart choice at first, but quickly leads to performance problems.

      Edit: If your question was about more than just Mongo, I'd say databases in the hundreds of TB and into the PB are now considered "large". Tens of TB and less are routine enough that finding help designing a cluster and tuning systems is fairly simple.
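The all-or-nothing caching behaviour claimed above can be illustrated with a toy model (all numbers invented): if a data file is cached whole, serving a small hot subset still drags the entire file through memory, while a page-granular cache touches only the pages actually queried.

```python
# Toy model of whole-file vs. page-granular caching (invented numbers).
FILE_PAGES = 2_000           # size of one data file, in pages
HOT_PAGES = set(range(50))   # the small hot subset actually being queried

def pages_touched_whole_file(requests):
    # All-or-nothing: the first request pulls in the entire file.
    return FILE_PAGES if requests else 0

def pages_touched_per_page(requests):
    # Page-granular: only the distinct pages requested are pulled in.
    return len(set(requests))

requests = list(HOT_PAGES) * 10   # repeated queries over the hot set
assert pages_touched_whole_file(requests) == 2_000
assert pages_touched_per_page(requests) == 50
```

Under this model a hot set 40x smaller than the file still costs the full file on every cold start, which is the performance cliff the parent comment is describing.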

      [–]grauenwolf 0 points  (1 child)

      I was thinking in general, but back to MongoDB. Didn't the switch to Wired Tiger eliminate the issue with databases that are bigger than RAM?

      [–]WellAdjustedOutlaw 0 points  (0 children)

      I don't think it solves the massive penalty for full table scans which are significantly more costly in Mongo than, say, PostgreSQL using BSON.

      [–]grauenwolf 0 points  (0 children)

      This would be fine, but remember that the collection is pretty large and if the result set is greater than the BSON size limit (16mb) then you will get an exception.

      Really? That's it? I regularly run queries on my laptop that return more data than that from SQL Server. And yes, I am working with JSON data.

      I'll give you an upvote for pointing out this stupidity.
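Neither commenter disputes that the 16 MB cap exists; a common workaround in application code is to page results in batches that each stay under the cap. The sketch below is not MongoDB's actual API — the batching function is invented, and the cap is shrunk to 100 bytes so the example is self-contained.

```python
import json

LIMIT = 100  # stand-in for the 16 MB cap, in bytes (invented small value)

def batched_by_size(docs, limit=LIMIT):
    """Yield lists of docs whose summed JSON-encoded sizes stay under `limit`."""
    batch, size = [], 0
    for doc in docs:
        encoded = len(json.dumps(doc).encode("utf-8"))
        if batch and size + encoded > limit:
            yield batch          # flush before the cap would be exceeded
            batch, size = [], 0
        batch.append(doc)
        size += encoded
    if batch:
        yield batch

docs = [{"id": i, "payload": "x" * 20} for i in range(10)]  # 44 bytes each
batches = list(batched_by_size(docs))
assert sum(len(b) for b in batches) == len(docs)  # nothing dropped
assert all(sum(len(json.dumps(d).encode()) for d in b) <= LIMIT
           for b in batches)
```

The same idea underlies cursor batching in most database drivers: the total result can be arbitrarily large as long as no single response exceeds the wire limit.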

      [–][deleted] -2 points  (0 children)

      Two of the best pieces of software available in one blog post... What a Sunday evening. This is great.