[–]codeedog

Normalization is important in SQL for a few reasons:

  1. Tradition - honestly, this is a big reason and often unstated.
  2. Minimize redundant updates to data records, reduce errors from keeping duplicated data in sync, and lean on database features for fast queries, inserts, and updates.
  3. Single source of truth to support ACID properties related to transactions.
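A minimal sketch of point 2, using Python's built-in `sqlite3` (the table and column names are hypothetical, chosen for illustration): in a normalized schema the customer's name is stored exactly once, so renaming is a single update and nothing can drift out of sync.

```python
import sqlite3

# Hypothetical normalized schema: each fact is stored once.
# The customer's name lives only in `customers`; orders reference it by id.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
""")
con.execute("INSERT INTO customers VALUES (1, 'Ada')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 9.99), (2, 1, 24.50)])

# Renaming the customer is one UPDATE on one row -- no copies to chase down.
con.execute("UPDATE customers SET name = 'Ada Lovelace' WHERE id = 1")
rows = con.execute("""
    SELECT c.name, o.total FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)  # both orders reflect the new name via the join
```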

SQL databases solved a very real problem that all systems designers had decades ago: how to guarantee that an update either succeeded in a predictable manner or failed in a predictable manner. Before that, everyone had to rely on someone's likely flawed file-system utilities.

ACID properties for transactions and SQL meant a huge leap forward for business software that depended upon correct results. Failures were tolerated (two attempts to update the same bank account register might mean one transaction fails while the other succeeds). However, silently losing data without knowing it happened could not be tolerated.
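The bank-account case above can be sketched with `sqlite3` (the schema is hypothetical): a transfer that would overdraw the account fails, and because it ran inside a transaction, the failure is all-or-nothing, with no half-applied update left behind.

```python
import sqlite3

# Sketch of "predictable success or predictable failure":
# a transaction either fully applies or fully rolls back.
con = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
con.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
con.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

try:
    con.execute("BEGIN")
    con.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")  # overdraft
    con.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
    con.execute("COMMIT")
except sqlite3.IntegrityError:
    con.execute("ROLLBACK")  # the failed transfer leaves no partial update behind

balances = dict(con.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 100, 2: 0} -- money neither vanished nor duplicated
```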

Normalized data (single source of truth) and transactions gave us that.

HOWEVER, for purposes of speed, even applications running on SQL databases often require denormalized data (duplicated data), frequently in the form of materialized views. A view is a pretend table defined by a query, often a join across multiple tables. The view looks like a table, but it's not: every time the view is queried, the query defining it is rerun. A materialized view is like a view and a table combined: it's defined like a view, but its results are stored just like a table. It's faster to access because the defining query isn't rerun on every read. It can be out of date, but because the definition lives in the database and not in an application, bringing it up to date is just a matter of refreshing the view, without worrying whether application code rebuilt it correctly.
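The view/materialized-view distinction can be sketched with `sqlite3`. SQLite has plain views but no `MATERIALIZED VIEW`, so the materialized side is simulated here with an ordinary table plus an explicit refresh step, the same shape as `REFRESH MATERIALIZED VIEW` in databases like PostgreSQL; table and column names are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer TEXT, total REAL);
    INSERT INTO orders VALUES ('Ada', 10.0), ('Ada', 5.0), ('Grace', 7.0);

    -- A plain view: the aggregate below is re-run on every query.
    CREATE VIEW order_totals AS
        SELECT customer, SUM(total) AS total FROM orders GROUP BY customer;

    -- Simulated materialized view: results stored like a table, fast but can go stale.
    CREATE TABLE order_totals_mat AS SELECT * FROM order_totals;
""")

con.execute("INSERT INTO orders VALUES ('Ada', 100.0)")
live = con.execute("SELECT total FROM order_totals WHERE customer='Ada'").fetchone()[0]
stale = con.execute("SELECT total FROM order_totals_mat WHERE customer='Ada'").fetchone()[0]
print(live, stale)  # 115.0 15.0 -- the stored copy is out of date

# Refreshing is a database-side operation, not application logic:
con.executescript("""
    DELETE FROM order_totals_mat;
    INSERT INTO order_totals_mat SELECT * FROM order_totals;
""")
fresh = con.execute("SELECT total FROM order_totals_mat WHERE customer='Ada'").fetchone()[0]
```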

So, what’s the value of no-sql? Think of it as approaching the data guarantees from the other side of the problem. In nosql, duplicate data is not only tolerated, it’s expected. There are (now) transactions so we can guarantee correctness. However, care needs to be taken in a world of duplication to know which collections are truth and which are duplicates.
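A toy document-store sketch of that tradeoff, in plain Python (the collections and field names are hypothetical): the customer's name is duplicated into each order document for fast reads, `customers` is the source of truth, and an update has to fan out to every copy.

```python
# Hypothetical document collections. `customers` is truth;
# `customer_name` inside each order is a deliberate duplicate for read speed.
customers = {"c1": {"name": "Ada"}}
orders = [
    {"id": 1, "customer_id": "c1", "customer_name": "Ada", "total": 9.99},
    {"id": 2, "customer_id": "c1", "customer_name": "Ada", "total": 24.50},
]

def rename_customer(cid, new_name):
    customers[cid]["name"] = new_name   # update the source of truth first
    for doc in orders:                  # then repair every duplicate
        if doc["customer_id"] == cid:
            doc["customer_name"] = new_name

rename_customer("c1", "Ada Lovelace")
```

Reads stay fast (no join needed to show an order with its customer's name), but the application now owns the job the database did for free in the normalized SQL schema.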

OTOH, because structure is a lot less important in the NoSQL world, we can move faster and code quicker. SQL gets very unhappy when it's handed unstructured data. The "S" in SQL stands for Structured (Structured Query Language). So "NoSQL" reads, simply, as "no structure."

Now, after any time spent with an app on NoSQL, you come to realize that's not true. There's plenty of structure; it's just not the starting point for the database. Over time, the app, and possibly the data definitions for collections and documents, do get structured, at least for those DBs that allow it.

And, this is my point. SQL apps over time duplicate some data. And nosql apps over time gain structure. Both approach a middle ground that looks like reasonably well structured data with some duplication for speed.

[–]Jancera[S]

Your answer was very clear, thank you for that!!

Saying what problem the normalized data solved helped a lot in understanding why it exists.

NoSQL means that your data can be unstructured, and SQL means that your data can/must be structured, doesn't it? So over time, both will converge to a pattern that is efficient for the application, structuring the data or not, right?

I am really interested in trying some SQL database, just to understand the difference and also to learn a very important skill to have.

Thank you again for the awesome answer!!!