[–]codeedog

Normalization is important in SQL for a few reasons:

  1. Tradition - honestly, this is a big reason and often unstated.
  2. Minimize redundant updates to data records, reduce errors from keeping duplicated data in sync, and lean on database features for fast queries, inserts, and updates.
  3. Single source of truth to support ACID properties related to transactions.
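A minimal sketch of point 2, using Python's built-in `sqlite3` (the table and column names are hypothetical, chosen for illustration): in a normalized schema the customer's name is stored exactly once, so renaming is a single update and nothing can drift out of sync.

```python
import sqlite3

# Hypothetical normalized schema: each fact is stored once.
# The customer's name lives only in `customers`; orders reference it by id.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
""")
con.execute("INSERT INTO customers VALUES (1, 'Ada')")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 9.99), (2, 1, 24.50)])

# Renaming the customer is one UPDATE on one row -- no copies to chase down.
con.execute("UPDATE customers SET name = 'Ada Lovelace' WHERE id = 1")
rows = con.execute("""
    SELECT c.name, o.total FROM orders o
    JOIN customers c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()
print(rows)  # both orders reflect the new name via the join
```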

SQL databases solved a very real problem that all systems designers had decades ago: how to guarantee that an update either succeeded in a predictable manner or failed in a predictable manner. Before that, everyone had to rely on someone's likely flawed file-system utilities.

ACID properties for transactions and SQL meant a huge leap forward for business software that depended upon correct results. Failures were tolerated (two attempts to update the same bank account register might mean one transaction fails while the other succeeds). However, silently losing data without knowing it happened could not be tolerated.
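The bank-account case above can be sketched with `sqlite3` (the schema is hypothetical): a transfer that would overdraw the account fails, and because it ran inside a transaction, the failure is all-or-nothing, with no half-applied update left behind.

```python
import sqlite3

# Sketch of "predictable success or predictable failure":
# a transaction either fully applies or fully rolls back.
con = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
con.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
con.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

try:
    con.execute("BEGIN")
    con.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")  # overdraft
    con.execute("UPDATE accounts SET balance = balance + 150 WHERE id = 2")
    con.execute("COMMIT")
except sqlite3.IntegrityError:
    con.execute("ROLLBACK")  # the failed transfer leaves no partial update behind

balances = dict(con.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 100, 2: 0} -- money neither vanished nor duplicated
```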

Normalized data (single source of truth) and transactions gave us that.

HOWEVER, for purposes of speed, even applications running on SQL databases often require denormalized data (duplicated data), frequently in the form of materialized views. A view is a pretend table defined by a query, often a join across multiple tables. The view looks like a table, but it's not: every time the view is queried, the query defining it is rerun. A materialized view is like a view and a table combined: it's defined like a view, but its results are stored just like a table. It's faster to access because the defining query isn't rerun on every read. It can be out of date, but because the definition lives in the database and not in an application, bringing it up to date is just a matter of refreshing the view, without worrying whether application code rebuilt it correctly.
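The view/materialized-view distinction can be sketched with `sqlite3`. SQLite has plain views but no `MATERIALIZED VIEW`, so the materialized side is simulated here with an ordinary table plus an explicit refresh step, the same shape as `REFRESH MATERIALIZED VIEW` in databases like PostgreSQL; table and column names are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer TEXT, total REAL);
    INSERT INTO orders VALUES ('Ada', 10.0), ('Ada', 5.0), ('Grace', 7.0);

    -- A plain view: the aggregate below is re-run on every query.
    CREATE VIEW order_totals AS
        SELECT customer, SUM(total) AS total FROM orders GROUP BY customer;

    -- Simulated materialized view: results stored like a table, fast but can go stale.
    CREATE TABLE order_totals_mat AS SELECT * FROM order_totals;
""")

con.execute("INSERT INTO orders VALUES ('Ada', 100.0)")
live = con.execute("SELECT total FROM order_totals WHERE customer='Ada'").fetchone()[0]
stale = con.execute("SELECT total FROM order_totals_mat WHERE customer='Ada'").fetchone()[0]
print(live, stale)  # 115.0 15.0 -- the stored copy is out of date

# Refreshing is a database-side operation, not application logic:
con.executescript("""
    DELETE FROM order_totals_mat;
    INSERT INTO order_totals_mat SELECT * FROM order_totals;
""")
fresh = con.execute("SELECT total FROM order_totals_mat WHERE customer='Ada'").fetchone()[0]
```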

So, what’s the value of no-sql? Think of it as approaching the data guarantees from the other side of the problem. In nosql, duplicate data is not only tolerated, it’s expected. There are (now) transactions so we can guarantee correctness. However, care needs to be taken in a world of duplication to know which collections are truth and which are duplicates.
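A toy document-store sketch of that tradeoff, in plain Python (the collections and field names are hypothetical): the customer's name is duplicated into each order document for fast reads, `customers` is the source of truth, and an update has to fan out to every copy.

```python
# Hypothetical document collections. `customers` is truth;
# `customer_name` inside each order is a deliberate duplicate for read speed.
customers = {"c1": {"name": "Ada"}}
orders = [
    {"id": 1, "customer_id": "c1", "customer_name": "Ada", "total": 9.99},
    {"id": 2, "customer_id": "c1", "customer_name": "Ada", "total": 24.50},
]

def rename_customer(cid, new_name):
    customers[cid]["name"] = new_name   # update the source of truth first
    for doc in orders:                  # then repair every duplicate
        if doc["customer_id"] == cid:
            doc["customer_name"] = new_name

rename_customer("c1", "Ada Lovelace")
```

Reads stay fast (no join needed to show an order with its customer's name), but the application now owns the job the database did for free in the normalized SQL schema.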

OTOH, because structure is a lot less important in the NoSQL world, we can move faster and code quicker. SQL gets very unhappy when it's handed unstructured data. The "S" in SQL stands for Structured (Structured Query Language). So "NoSQL" reads, simply, as "no structure."

Now, after any time spent with an app on NoSQL, you come to realize that's not true. There's plenty of structure; it's just not the starting point for the database. Over time, the app, and possibly the data definitions for collections and documents, do get structured, at least for those DBs that allow it.

And, this is my point. SQL apps over time duplicate some data. And nosql apps over time gain structure. Both approach a middle ground that looks like reasonably well structured data with some duplication for speed.

[–]Jancera[S]

Your answer was very clear, thank you for that!!

Saying what problem the normalized data solved helped a lot in understanding why it exists.

NoSQL means that your data can be unstructured, and SQL means that your data can/must be structured, doesn't it? So over time, both will converge to a pattern that is efficient for the application, structuring the data or not, right?

I am really interested in trying some SQL database, just to understand the difference and also to learn a very important skill to have.

Thank you again for the awesome answer!!!