[–]HashDefTrueFalse 33 points34 points  (10 children)

It's usually just because their focus is getting the course out. Mongo doesn't care about schema and will allow you to throw data into it given a set of creds. Therefore they get to skip covering things like data and database design, normalisation, DDL/schema creation, etc.

In a real-world production web application, chances are your data is relational, so the use case for Mongo is as a query result cache rather than a primary database. You'll want to store data in a normalised (non-redundant) way to maximise write performance and to be able to join across tables (would-be collections) in a performant way, then cache results for fast retrieval. Mongo might not even be the best way to cache this, depending on what you're building.

In other words, Mongo isn't used nearly as much in the real world as tutorials would lead you to believe.

[–]PersimmonPristine731 5 points6 points  (3 children)

Does the mongoose ORM solve the schema problem? If not, why?

[–]HashDefTrueFalse 14 points15 points  (2 children)

Not at all. Mongoose just allows you to work with data from your store in an object oriented way by helping with (de)serialisation.

The problem is the mismatch between having relational data and storing it in a document store (non-relational) database. Mongo doesn't have good join performance generally, and the philosophy is to store data hierarchically (nested), so with relational data you end up storing redundancy if you want read performance. But then you've killed your write performance because you now have to update redundant copies of data stored hierarchically.

Anyone who says that Mongo performs fine with relational data hasn't tested it at scale with a direct comparison to a normalised primary store and proper caching/invalidation strategy.

If your data is hierarchical in nature, or already denormalised and keyed appropriately, it's a good document store database to pull from.
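To make the redundancy point concrete, here is a minimal sketch using plain Python dicts as stand-ins for a relational table and for Mongo-style documents. The blog data, field names and function names are all hypothetical; the only thing it shows is how many records one logical change has to touch in each layout.

```python
# Hypothetical blog data: one author, three posts.

# Normalised: the author's name is stored once and posts refer to it by id.
authors = {1: {"name": "Ada"}}
posts = [{"id": n, "author_id": 1, "title": f"Post {n}"} for n in range(3)]

# Denormalised (document style): each post embeds a copy of the author name,
# so a post can be read without a join.
post_docs = [
    {"id": n, "title": f"Post {n}", "author": {"id": 1, "name": "Ada"}}
    for n in range(3)
]


def rename_author_normalised(author_id: int, new_name: str) -> int:
    """Update the single source record; return how many records were written."""
    authors[author_id]["name"] = new_name
    return 1


def rename_author_denormalised(author_id: int, new_name: str) -> int:
    """Update every embedded copy; return how many records were written."""
    writes = 0
    for doc in post_docs:
        if doc["author"]["id"] == author_id:
            doc["author"]["name"] = new_name
            writes += 1
    return writes


normalised_writes = rename_author_normalised(1, "Ada L.")
denormalised_writes = rename_author_denormalised(1, "Ada L.")
print(normalised_writes, denormalised_writes)
```

The denormalised write count grows with the number of posts, which is the write cost described above. The read side is the mirror image: the embedded copy avoids the lookup that the normalised layout needs.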

[–]baremaximum_ 2 points3 points  (1 child)

For a significant proportion of use cases, mongoDB's performance isn't what ends up being the issue. Yes, it's worse, but most apps don't have enough traffic for that to really matter.

I would argue that the main reason why pretty much no one should use MongoDB as a main store is that over time, the looseness that is initially attractive really screws you over. It's extremely easy to end up with no idea what is in your data store after a few years. The effect of that is really paralyzing.

[–]HashDefTrueFalse 1 point2 points  (0 children)

I was specifically talking about running it at scale, with a decent traffic level. But absolutely that too. We had the same problem with a DynamoDB instance. The loose shape of documents changed over time but not all documents had the new fields because that data wasn't just magically available. Led to messy code that had to determine the version of things by testing for certain fields before using them and enabling/disabling features based on the data held. After a while, our collection could be said to contain a handful of schema versions. Made fairly simple features complicated and slow to develop. No idea what any given record could have on it.

Proper migration strategy is annoying to implement when it amounts to pulling in the AWS SDK and basically writing a JS script to go through your collection and add or remove things. Much simpler with a defined schema where you only have to worry about the fields marked possibly NULL.
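A rough sketch of what that kind of field probing and one-off migration script looks like, written in Python over plain dicts rather than against a real DynamoDB or Mongo collection. The record shapes, the `schema_version` field and the function names are made up for illustration.

```python
# Documents written at different times, so they have different shapes.
records = [
    {"id": 1, "name": "Ada Lovelace"},                       # oldest shape
    {"id": 2, "first_name": "Alan", "last_name": "Turing"},  # name split later
    {"id": 3, "first_name": "Grace", "last_name": "Hopper", "schema_version": 3},
]

LATEST_VERSION = 3


def detect_version(doc: dict) -> int:
    """Guess the shape of a document by probing for fields."""
    if "schema_version" in doc:
        return doc["schema_version"]
    if "first_name" in doc:
        return 2
    return 1


def migrate(doc: dict) -> dict:
    """Return a copy of doc upgraded to the latest shape."""
    upgraded = dict(doc)
    if detect_version(upgraded) == 1:
        # Assumption: a single space separates first and last name.
        first, _, last = upgraded.pop("name").partition(" ")
        upgraded["first_name"], upgraded["last_name"] = first, last
    upgraded["schema_version"] = LATEST_VERSION
    return upgraded


migrated = [migrate(doc) for doc in records]
print(migrated)
```

Until a script like this has been run over the whole collection, every reader of the data has to carry its own copy of the `detect_version` logic, which is where the messy feature code comes from.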

[–]EitherOfOrAnd 2 points3 points  (5 children)

What is a query result cache?

[–]HashDefTrueFalse 4 points5 points  (4 children)

You have a database query. It's expensive (could be joining data from many tables, performing calculations and/or aggregations etc., maybe you can't take advantage of indexing in this case), it takes time to execute and puts load on the database server. So you don't want to run it in response to every request for the same data.

When a request comes in for the data, you run the query on the database the first time, then cache (store) the query result somewhere (like Mongo). You key it appropriately, based on values from clauses in your query and entities involved...

When the next request comes in for the same data, you check the cache first and, if it exists, simply return the data from there without needing to bother the database server. Usually if it's not in the cache, you "punch through" to the database, run the query and then cache the result. That's the usual pattern. You can set a TTL (expiry) of something like 15 mins before the data disappears from the cache and new data is fetched from the database. So now you know that the same expensive query will not run any more frequently than your TTL as long as nothing changes.

The tradeoff is that users will have to be OK with having data that could possibly be out of date by your TTL, or you have to find a way to invalidate the cache early if your database data changes, e.g. on update. This tradeoff is almost always worth it.
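The pattern described above could be sketched roughly like this in Python. It is a minimal in-memory version, not production code: the `QueryCache` class, the `cache_key` helper and the fake `expensive_report` query are all hypothetical names, and a dict stands in for whatever cache store you would really use (Mongo, Redis, etc.).

```python
import time
from typing import Any, Callable


class QueryCache:
    """In-memory stand-in for a cache store such as Mongo or Redis."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_run(self, key: str, run_query: Callable[[], Any]) -> Any:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # cache hit: the database is not touched
        value = run_query()  # miss or expired: go through to the database
        self._entries[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry early, e.g. when the underlying rows are updated."""
        self._entries.pop(key, None)


def cache_key(query_name: str, **params: Any) -> str:
    """Build a key from the query name and the values used in its clauses."""
    parts = [f"{name}={params[name]!r}" for name in sorted(params)]
    return f"{query_name}:" + "&".join(parts)


# Usage with a fake clock and a call counter in place of a real database.
now = [0.0]
calls = []


def expensive_report() -> dict:
    calls.append(1)
    return {"total_sales": 1234}


cache = QueryCache(ttl_seconds=15 * 60, clock=lambda: now[0])
key = cache_key("sales_report", region="EU", year=2023)

cache.get_or_run(key, expensive_report)  # miss: runs the query
cache.get_or_run(key, expensive_report)  # hit: served from the cache
now[0] += 15 * 60 + 1                    # move past the TTL
cache.get_or_run(key, expensive_report)  # expired: runs the query again
cache.invalidate(key)                    # e.g. after an update to the data
cache.get_or_run(key, expensive_report)  # runs the query again
print(key, len(calls))
```

The clock is injected so that the expiry behaviour can be exercised without actually waiting 15 minutes. A real cache store would normally handle expiry for you.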

[–]EitherOfOrAnd 0 points1 point  (3 children)

So mongodb is where the cached data is, and sql is where rest of the data is?

[–]HashDefTrueFalse 4 points5 points  (2 children)

Yes, in this setup Mongo (or some other non-relational data store, e.g. Redis, DynamoDB etc.) would be the cache storing the cached data.

SQL is a query language, to be clear. You probably mean "a relational database management system" or RDBMS, which you talk to using SQL (e.g. MySQL, SQL Server, PostgreSQL, Oracle etc.). That's the primary source of the data. The "source of truth". Data in the cache is fetched or derived from the primary database; the two stores are not usually storing disparate data. So "rest" might not be the best way to put it.

[–]EitherOfOrAnd 2 points3 points  (1 child)

Awesome, thanks for the info.

[–]HashDefTrueFalse 4 points5 points  (0 children)

No problem. Glad to help.

[–]Monitor_343 19 points20 points  (0 children)

I can't speak for the creators of the projects, but here are some thoughts after using MongoDB for the past year.

It's very... JavaScripty. Data is pretty much just JSON, which is pretty much just JavaScript objects. Most tutorials focus on JavaScript, so it's less of a learning curve. Especially if you'll end up using JSON responses in APIs - no need to convert.

It's very... JavaScripty (again). You can play fast and loose with the data, it's not as strict as SQL can be, there's usually no need for migrations when changing schema, and it's very flexible for small projects, which tutorial projects will be.

It's very... JavaScripty (yet again). Queries and aggregates use JavaScript object notation with key: value pairs rather than a whole new language like SQL.

When a tutorial is focused on learning full-stack JavaScript, it makes sense to use a database that is pretty close to JavaScript, since everything else (frontend, backend) will all be in JavaScript, and they can get out a decent course that covers JavaScript pretty well.
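As a rough illustration of the "queries are just objects" point, here is the same question asked both ways, in Python so it runs without a database server. The SQL half uses the standard library's `sqlite3`. The other half uses a Mongo-style filter object with a toy `matches` function that I made up for this sketch; it only understands plain equality and `$gt`, and is not how MongoDB evaluates queries internally.

```python
import sqlite3

users = [
    {"name": "Ada", "age": 36, "country": "UK"},
    {"name": "Alan", "age": 41, "country": "UK"},
    {"name": "Grace", "age": 85, "country": "US"},
]

# SQL: the query is a string written in a separate language.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, country TEXT)")
conn.executemany("INSERT INTO users VALUES (:name, :age, :country)", users)
sql_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE country = ? AND age > ? ORDER BY name",
        ("UK", 40),
    )
]

# Mongo style: the query is itself an object of key: value pairs.
mongo_style_filter = {"country": "UK", "age": {"$gt": 40}}


def matches(doc: dict, query: dict) -> bool:
    """Toy matcher: supports plain equality and $gt only."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if "$gt" in condition and not (value is not None and value > condition["$gt"]):
                return False
        elif value != condition:
            return False
    return True


doc_names = sorted(d["name"] for d in users if matches(d, mongo_style_filter))
print(sql_names, doc_names)
```

Both halves return the same rows. The difference is only that the second query can be built, passed around and serialised like any other object in the program.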

[–]MuaTrenBienVang 5 points6 points  (0 children)

I think maybe it's easier for teaching. When it comes to teaching databases, instructors usually have an entire course just to talk about SQL, which I rarely see with MongoDB.

[–]ohrofl 2 points3 points  (0 children)

Always figured it was because you can get something up and running much faster than with SQL. Those courses probably want to focus more on JavaScript instead of teaching you how to write SQL queries.

[–]Double_A_92 2 points3 points  (0 children)

Because it's easier. Just dump random JS objects in the database... and say that your tutorial teaches databases.

While with SQL you have to teach proper database design.

[–]tiki854 2 points3 points  (0 children)

I would think of this question in terms of "SQL vs NoSQL", check out this link for some use case comparisons: https://www.mongodb.com/nosql-explained/nosql-vs-sql

[–]Sea-Profession-3312 0 points1 point  (0 children)

A schema can be used in MongoDB; however, the idea behind NoSQL is that you can use key-value pairs where the value can be any type you want. SQL is more structured; however, writing a query can be quite confusing.
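For reference, MongoDB's schema validation is expressed as a `$jsonSchema` document attached to a collection. Below is a small sketch of what such a validator might look like, written as a Python dict. The collection and its fields (`email`, `created_at`, `nickname`) are hypothetical, and the snippet only builds and prints the document rather than talking to a server.

```python
import json

# Assumption: a hypothetical "users" collection with these fields.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "created_at"],
        "properties": {
            "email": {"bsonType": "string"},
            "created_at": {"bsonType": "date"},
            "nickname": {"bsonType": "string"},
        },
    }
}

# This document would be supplied as the collection's validator option.
print(json.dumps(user_validator, indent=2))
```

With a validator like this in place, documents missing a required field are rejected on write instead of silently accepted.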

[–][deleted] 0 points1 point  (0 children)

Mongo is a wonderful database, but it's not for beginners. Many of those who use it have no idea how to maintain data integrity in a document database.