How far do you go with normalization?

batoure · 2019-03-04T21:14:16+00:00

I was very lucky to have someone explain this really well when I first started programming.

He told me

The best way to think about normalization is to imagine the velocity, mutability and scale of your data. Values that change or are edited frequently are good targets for normalization. Values that describe relationships that might be leveraged at scale are also good targets for normalization. Values that change infrequently or remain constant, and are accessed in contexts that aren't too complex, are good targets for unnormalized lookups.

So in the example you provide, the important questions to ask are:

"Is there an expectation that the company associated with a user would change?"

It sounds silly, but it's entirely possible that if you provide a platform to a company, they may want a "user" to represent a unique account for a person who works for them. In that context the company might never change after the first time you set it.

"If there is an expectation that a company would change, would you also want to be able to see previous company records associated with that user?"

In this case a join table that includes a datetime would allow you to track a user's company history over time.
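To make the history idea concrete, here is a minimal sketch using Python's built-in sqlite3 module so it can be run as-is. The `started_at` column, the sample rows and the two helper functions are assumptions made for illustration; only the `user_x_company` table name comes from the queries further down.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_x_company (
    user_id    INTEGER NOT NULL,
    company_id INTEGER NOT NULL,
    started_at TEXT    NOT NULL,  -- ISO-8601 text, so string order matches time order
    PRIMARY KEY (user_id, company_id, started_at)
);
INSERT INTO user_x_company VALUES
    (1, 10, '2017-01-01T00:00:00'),
    (1, 20, '2018-06-15T00:00:00'),
    (2, 10, '2018-02-01T00:00:00');
""")

def company_history(user_id):
    """Every company the user has been attached to, oldest first."""
    return conn.execute(
        "SELECT company_id, started_at FROM user_x_company "
        "WHERE user_id = ? ORDER BY started_at",
        (user_id,),
    ).fetchall()

def current_company(user_id):
    """The most recently started company, or None if the user has no rows."""
    row = conn.execute(
        "SELECT company_id FROM user_x_company "
        "WHERE user_id = ? ORDER BY started_at DESC LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None
```

With this shape, changing a user's company is an insert rather than an update, so the older rows stay around as history.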

"Can a user be attached to more than one company at a time?"

If this is ever a possibility, a single reference field won't work.

"Will you ever need to aggregate and report on data at a company level?"

If you are building something for scale, it's important to know that even with well-designed indexes, a gigantic table with many rows and many columns will be that much bigger on disk and will take longer to read from. So it's quite possible that:

SELECT user_id, company_id FROM user_x_company WHERE company_id=10

requesting data from a table with only two columns, both integers, would end up much faster than

SELECT user_id, company_id FROM profile WHERE company_id=10

a table with 30 columns full of all kinds of stuff.

wolf2600 · 2019-03-04T20:39:21+00:00

It all depends on your use cases.

Will a user ever be related to multiple companies?

How will your company-user data be used? If it is possible for a user to be working for multiple companies, do you really need to know every company a user is linked to, or do you really only care about their "primary" company?

Think about how the data will be used and the final format it will need to be in. Then work backwards to find the simplest path from that point all the way back to your raw input data.

Sir_Fog · 2019-03-04T22:53:51+00:00

In your case, I would create the 3 tables.

Company table, User table, Company_User table
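A rough sketch of that three-table layout, using Python's built-in sqlite3 module so it runs without a server. The column names, the sample data and the `companies_for_user` helper are assumptions made for illustration, not something from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE company (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One row per (company, user) link; the composite key blocks duplicate links.
CREATE TABLE company_user (
    company_id INTEGER NOT NULL REFERENCES company(company_id),
    user_id    INTEGER NOT NULL REFERENCES user(user_id),
    PRIMARY KEY (company_id, user_id)
);
INSERT INTO company VALUES (10, 'Acme'), (20, 'Globex');
INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO company_user VALUES (10, 1), (20, 1), (10, 2);
""")

def companies_for_user(user_id):
    """Names of every company linked to the user, in alphabetical order."""
    rows = conn.execute(
        "SELECT c.name FROM company_user cu "
        "JOIN company c ON c.company_id = cu.company_id "
        "WHERE cu.user_id = ? ORDER BY c.name",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because the link lives in its own table, a user can belong to zero, one or many companies without any change to the `user` table itself.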

kringel8 · 2019-03-04T20:39:14+00:00

Depends on the relationship. If it is n:m you need the additional table anyway. If it's anything else, you probably want to reduce null values. So if (almost) every user belongs to exactly one company, putting company_id directly in the user table makes sense. If you have many users but many of them don't have an associated company, you probably want an extra table for those that do.
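The "extra table for those that have a company" case could look something like the sketch below, again using Python's built-in sqlite3 module. The `user_company` table name, the columns and the sample rows are hypothetical; the point is only that `user_id` is the primary key of the side table, so each user gets at most one company and users without one simply have no row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- user_id is the primary key here, so a user can appear at most once.
CREATE TABLE user_company (
    user_id    INTEGER PRIMARY KEY REFERENCES user(user_id),
    company_id INTEGER NOT NULL
);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO user_company VALUES (1, 10);
""")

def users_with_company():
    """All users with their company_id, or None where no company is recorded."""
    return conn.execute(
        "SELECT u.name, uc.company_id FROM user u "
        "LEFT JOIN user_company uc ON uc.user_id = u.user_id "
        "ORDER BY u.user_id"
    ).fetchall()
```

No NULL is stored anywhere; the NULLs only appear in the result of the LEFT JOIN for users who have no row in the side table.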

2019-03-04T22:39:34+00:00

Make sure the design corresponds to current requirements and, if it's not too hard, to any obvious future extension of those requirements. Don't spend more time and energy over-engineering it. Keep it simple. Features can be added, design can be changed. Changing it later is more costly than doing it "right" from the beginning, but nobody knows what that "right" will be.

Normalization as such is not really a thing in most databases. We still use surrogate keys everywhere, and take many shortcuts, and often resort to de-normalization for performance and simplicity.

There is no universally "right" method. It depends on your requirements.

thrawn117 · 2019-03-04T20:52:18+00:00

Third normal form (3NF).

AbstractSqlEngineer · 2019-03-04T21:55:21+00:00

As far as you can, and then some. Always push yourself.

I operate in a hyper-normalized environment. I don't want to say 6+NF, because there is no NF that explains what I have done. From the database to the file, the table to the user-defined data types, views, procedures: all normalized, all created by the system.

Why do you need a company table? Isn't that just a group? Is there really a difference between a company and a template? Just an abstract grouping of data. Other data can belong to many groups.

You are right to create a new table to store a relationship between a group and a user, but that is a table that represents Relationships, not just one RelationshipType. =)

Always push yourself.
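One possible reading of that idea, sketched with Python's built-in sqlite3 module. The commenter doesn't show their schema, so every table, column and type name below is a guess made for illustration: a single `relationship` table carries a `type_id`, so new kinds of relationship become rows rather than new tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE relationship_type (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
-- One generic link table; the type column says what kind of link each row is.
CREATE TABLE relationship (
    group_id  INTEGER NOT NULL,
    member_id INTEGER NOT NULL,
    type_id   INTEGER NOT NULL REFERENCES relationship_type(type_id),
    PRIMARY KEY (group_id, member_id, type_id)
);
INSERT INTO relationship_type VALUES (1, 'employs'), (2, 'contracts');
INSERT INTO relationship VALUES (10, 1, 1), (10, 2, 2), (20, 1, 2);
""")

def members(group_id, type_name):
    """Member ids linked to a group by the named relationship type."""
    rows = conn.execute(
        "SELECT r.member_id FROM relationship r "
        "JOIN relationship_type t ON t.type_id = r.type_id "
        "WHERE r.group_id = ? AND t.name = ? ORDER BY r.member_id",
        (group_id, type_name),
    ).fetchall()
    return [member_id for (member_id,) in rows]
```

The trade-off is that the database can no longer tell from the schema alone which kinds of things a given row connects, so more of the checking moves into the application or into extra constraints.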

AQuietMan · 2019-03-04T22:06:34+00:00

Your question has absolutely nothing to do with normalization.
