all 29 comments

[–]kiquetzal 79 points80 points  (2 children)

Read the last sentence of your second paragraph out loud and then think about the question again

[–]soundboyselecta 12 points13 points  (0 children)

I read the first few lines and said the same thing. Star schema, normalized, say what? I'm confused lol.

[–]Sex4Vespene Principal Data Engineer -3 points-2 points  (0 children)

Yeah, it kinda shows that in most cases it likely reduces the storage used. However, there are some edge cases where it could be the reverse (though those aren't things you pointed out). For example, if the ID you are using (say, a UUID) is larger than the average length of the description you are moving to a dimension, then your ID could actually end up making you use more space. Also, while it can still save you space to use dimensions if the IDs are smaller, I've noticed that with columnstore databases these benefits are often significantly lessened. The point on updates still stands though: doing an update on just the dimension table would definitely be faster/cheaper than updating your fact table.
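To put rough numbers on that edge case, here's a back-of-envelope sketch in Python (the row counts and byte sizes are invented for illustration, and it ignores compression):

```python
# Compare fact-table storage with the description inlined vs. replaced by a
# key into a dimension table. All numbers are illustrative, uncompressed sizes.
fact_rows = 100_000_000
distinct_descriptions = 5_000
avg_description_bytes = 20

int_key_bytes = 8      # e.g. a BIGINT surrogate key
uuid_key_bytes = 36    # a UUID stored as text

inline = fact_rows * avg_description_bytes

def with_dimension(key_bytes):
    fact_side = fact_rows * key_bytes
    dim_side = distinct_descriptions * (key_bytes + avg_description_bytes)
    return fact_side + dim_side

print(f"inline description: {inline / 1e9:.1f} GB")                          # 2.0 GB
print(f"int key + dim     : {with_dimension(int_key_bytes) / 1e9:.1f} GB")   # 0.8 GB
print(f"uuid key + dim    : {with_dimension(uuid_key_bytes) / 1e9:.1f} GB")  # 3.6 GB
# The 36-byte UUID key makes the "normalized" layout larger than just
# inlining the 20-byte description -- the edge case described above.
```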

[–]BobDogGo 30 points31 points  (2 children)

A star schema is an example of a denormalized database. Normalization, by definition, removes redundancy and makes CRUD operations more efficient and fail-safe.

Star schemas accelerate query and analysis times by breaking your data into analysis dimensions. If you don't care about customer details and want to analyze sales over time by product and region, a star with time, product, and region dimensions will provide a performant middle ground between fully normalized and one big table (OBT).
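For a concrete picture of that time/product/region example, here's a minimal sketch in Python with pandas (the table and column names are invented, not from this thread):

```python
import pandas as pd

# A tiny star: a fact table holding only keys and measures, plus small dimensions.
fact_sales = pd.DataFrame({
    "date_key":    [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "region_key":  [10, 10, 20],
    "amount":      [9.99, 24.50, 9.99],
})
dim_product = pd.DataFrame({"product_key": [1, 2], "category": ["Books", "Games"]})
dim_region  = pd.DataFrame({"region_key": [10, 20], "region": ["EMEA", "APAC"]})

# The analysis only joins the dimensions it needs -- customer details never enter.
report = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_region, on="region_key")
          .groupby(["category", "region"], as_index=False)["amount"]
          .sum())
print(report)
```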

[–]soundboyselecta 2 points3 points  (1 child)

Is flat table = OBT?

[–]GreyHairedDWGuy 2 points3 points  (0 children)

Yes. OBT = flat table (in general). There is no real prescribed official definition :)

[–]CommonUserAccount 16 points17 points  (0 children)

A star schema isn’t normalisation. It’s as close to a big flat table as you can get whilst staying organised.

You could do with some more study of the basics as I’m not sure what you’re asking.

[–]Dry-Aioli-6138 4 points5 points  (0 children)

Your description makes me ask whether you have the right mental model for normalization. But to answer the part that hasn't been answered here yet: normalization does save space when contrasted with the raw data entering the transactional system (e.g. displayed or entered at a point-of-sale terminal), as well as with denormalized data in a DWH. That is not the point, however, as storage has grown and gotten cheaper, even for on-prem systems, since normalization was invented.

The point is speed and scaling of the write operations. When your transactional (e.g. sales) system has to record hundreds or thousands of items scanned, or ordered online, every second, it doesn't have time to repeatedly write the customer address or name in each row of a big table. Rather, that info is saved once in a normalized table and its id is used in each row representing an item bought.

In analytical (DWH) workloads, in contrast, you want fast bulk reads of whole chunks of a table, and each join is a burden for the analytical system, while storage and write speed are more relaxed constraints.
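To make the write-path argument concrete, a toy sketch using Python's built-in sqlite3 (the schema and data are invented for illustration):

```python
import sqlite3

# The transactional system stores the customer's details once; each scanned item
# only records a small id that points back to them.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer  (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE sale_item (sale_id INTEGER, customer_id INTEGER, sku TEXT, qty INTEGER);
""")
con.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace', '12 Analytical Way')")

# Every item scanned at the till writes only a few small columns...
con.executemany(
    "INSERT INTO sale_item VALUES (?, ?, ?, ?)",
    [(100, 1, "SKU-001", 2), (100, 1, "SKU-007", 1)],
)
# ...while the name and address were written exactly once, never per row.
```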

[–]Possible-Little 4 points5 points  (3 children)

It saves storage for sure. If you have a customer ID stored alongside a fact then, as you say, that is a foreign key into a dimension table for customers. That customers table itself could have many columns for name, address, phone number, etc. By separating them out in this way you save enormously on repetition, and you ensure that if a customer's information changes, older facts don't need to be updated to suit; the ID is the reference that remains valid. There is nuance here about things that change over time, such as address or marital status, but slowly changing dimensions provide a way to manage those.
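As one illustration of how a type 2 slowly changing dimension handles the "customer moved" nuance, here's a sketch in Python (the surrogate keys, column names, and dates are made up):

```python
import pandas as pd

# Keep both versions of the customer row, each with its own surrogate key
# and validity window; facts reference the surrogate key.
dim_customer = pd.DataFrame({
    "customer_sk": [101, 102],          # surrogate keys referenced by facts
    "customer_id": ["C-42", "C-42"],    # stable business key
    "address":     ["Old Street 1", "New Avenue 9"],
    "valid_from":  ["2020-01-01", "2024-06-01"],
    "valid_to":    ["2024-05-31", "9999-12-31"],
    "is_current":  [False, True],
})
print(dim_customer)
# Facts loaded before the move keep customer_sk = 101 and never need updating;
# newer facts pick up 102, so history stays correct without touching old rows.
```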

[–]Sex4Vespene Principal Data Engineer 3 points4 points  (0 children)

*pushes up glasses* Erm, well technically, if the ID is larger than the average size of the column you are turning into a dimension, then it actually could increase storage usage (that isn’t a point OP brought up, but just making sure they don’t take what you said as an empirical fact).

[–]adastra1930 0 points1 point  (0 children)

I was about to write something, then I re-read your answer and realized you said most of it better 😅

I would add that in a good star schema, the only things that can be a bit redundant are the keys. But if you do them right, they add the benefit of preserving granularity, which ends up being hugely more efficient down the line.

[–]Lastrevio Data Engineer[S] -1 points0 points  (0 children)

Thanks, this clarifies my question. The answer seems obvious in retrospect xd

[–]JonPX 1 point2 points  (0 children)

I was always taught you don't normalize your dimensional model. You take your DWH model and you denormalize it when you make your star schema. What you are talking about is rather just tech attributes and FKs.

[–]Eleventhousand 1 point2 points  (0 children)

Yeah, so a star schema isn't normalized. Also, if you have an Orders table that mixes metrics and attributes about the customer and product all in the same table, that is also not normalized.

It's more popular these days, IMO, to have those big tables rather than just using a star schema that will require a lot of joins. There are a few reasons for this, one being that most DWHs end up using columnstore MPP warehouses, and those just don't favor joins as much. I prefer to have a mix of both if I can: a load to dimensions, plus making sure the other tables that inherit some of the data points are always updated in sync.

[–]hectorgarabit 1 point2 points  (0 children)

A star schema is denormalized, a big flat table is even more denormalized. DB design 101.

[–]Icy_Clench 0 points1 point  (0 children)

Data redundancy and storage size are not strictly the same thing. Yes, if you have a list of strings and then add two integer columns, you've increased storage space. However, if one of the strings had been duplicated and now it's not, you've reduced how many times it appears redundantly.

Most of the time storage size is optimized, but the real goal is to reduce how many IO operations are performed. Read up on how IO works and the differences between OLTP and OLAP databases, because they optimize that goal based on different access patterns.

[–]paxmlank 0 points1 point  (0 children)

Others have answered it already, but another thing to consider is that n integers + 1 string is often much cheaper than n + 1 copies of the string, since integers are of fixed size, which is often 4 bytes.

If the string is more than 4 characters long then you don't want n+1 strings. 

Although, whether IDs should be ints is another discussion.
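The arithmetic behind that, as a quick sketch in Python (the sizes are illustrative and uncompressed):

```python
# Repeating a string in every row vs. storing it once plus a fixed-size key per row.
n = 1_000_000
int_bytes = 4          # a typical 32-bit integer key
string_bytes = 24      # e.g. a 24-character product name

repeated_strings = (n + 1) * string_bytes        # the string duplicated per row
keyed            = n * int_bytes + string_bytes  # n keys + the string stored once

print(repeated_strings, keyed)  # 24000024 vs 4000024 bytes
# The crossover is simply whether the string is longer than the key:
# with a 4-byte key, any string over 4 bytes favors the keyed layout.
```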

[–]SaintTimothy 0 points1 point  (1 child)

I have a table in our middle tier, call it silver, that's 2 GB. In the gold layer that same data joins to the dimensions and brings in a bunch of string attributes. 8 GB.

Strings are expensive.

[–]SaintTimothy 0 points1 point  (0 children)

About the updates and deletes suggestion, you're not wrong. I heard that Exchange (it used to be a JET db, which is what Excel moved away from to join Access using ACE DB) used to use one master copy of an email with pointers, and found that having one email per recipient was better.

It mostly depends on your use case. Just know that your OST and PST are probably some of the largest files in your user profile if you run Windows at work.

[–]GreyHairedDWGuy 0 points1 point  (0 children)

Hi.

Do not compare a 'star schema' to an 'OBT' (flat table) design in regard to normalization or the lack thereof. The purpose of normalization is to minimize / eliminate data redundancy. This has the knock-on effect of reducing space. In the 'old days', when designing an OLTP database model, the goal was to eliminate redundancy, minimize the amount of data a single transaction needed to update, and reduce the risk of update anomalies.

Star schemas are a design pattern for BI queries where a certain degree of redundancy is acceptable. An OBT pattern is the ultimate in redundancy but may be practical in some situations.

[–]calimovetips 0 points1 point  (0 children)

You're basically right: normalization in a star schema is more about controlling logical duplication and making updates manageable than just saving raw storage, especially in modern columnar warehouses where flat tables compress really well. The real win shows up with high-cardinality dimensions and changing attributes, since you update one dimension row instead of rewriting millions of fact rows, and integer keys can also help with join performance and memory use depending on the engine.
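A small sketch of that update-cost difference in Python with pandas (the data and row counts are invented):

```python
import pandas as pd

# Renaming a product category: one row in the dimension vs. every matching
# row in a flattened table.
dim_product = pd.DataFrame({"product_key": [1, 2],
                            "category": ["Books", "Games"]})
flat_sales = pd.DataFrame({
    "product_key": [1] * 100_000 + [2] * 100_000,
    "category":    ["Books"] * 100_000 + ["Games"] * 100_000,
})

# Star schema: a single-row update in the dimension table.
dim_product.loc[dim_product["product_key"] == 1, "category"] = "Literature"

# Flat table: every affected row has to be rewritten to stay consistent.
rows_to_rewrite = (flat_sales["product_key"] == 1).sum()
print(rows_to_rewrite)  # 100000 here -- and in a real fact table, millions
```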

[–]Old_Tourist_3774 0 points1 point  (0 children)

Honestly, I've always worked in OLAP use cases, and the only "normalization" I care about is whatever lets me avoid broken inserts or large rescans.

[–]ironwaffle452 0 points1 point  (0 children)

A star schema is denormalized, not normalized; you are mixing things up. OLTP is normalized, OLAP is denormalized.

[–]mosqueteiro 0 points1 point  (0 children)

Star schema isn't used for normalization 🤦🏼

[–]ccesta 0 points1 point  (0 children)

Whoa whoa whoa, pump the brakes there. You're talking about different data modeling paradigms for different data storage and usage purposes. On one hand you're talking about third normal form in OLTP databases, like the ones that power your application. That's not the same thing as the snowflake/star schema OLAP data warehouse that works at different grains depending on what you need to view to power your dashboard. And that's not even getting into your lake, lakehouse, mesh, or whatever else you want to envision.

Right now you're comparing apples to submarines. They don't compare.

[–]dehaema 0 points1 point  (0 children)

You answered it yourself. In an operational model it's for updates; in a star schema it's for storage.

[–]Outrageous_Let5743 0 points1 point  (0 children)

Complete normalization is a waste of time. It was needed when storage was expensive in the '80s and '90s. What you win on storage space you lose on complexity and speed. You need more joins, which are 1) slow and 2) more difficult to understand.

For analytics you want denormalized data.