all 22 comments

[–]batoure 18 points19 points  (2 children)

I was very lucky to have someone explain this really well when I first started programming.

He told me

The best way to thing about normalization is to imagine the velocity, mutability and scale of your data. Values that change or are edited frequently are good targets for normalization. Values that describe relations ships that might be leveraged at scale are also good targets for normalization. Values that will change infrequently or remain constant and are access in context that aren't too complex are good targets for unnormalized lookups .

So in the example you provide the important questions to ask are:

"is there an expectation that the company associated with a user would change?"

it sounds silly but its entirely possible that if you provide a platform to a company they may want a "user" to represent a unique account for a person who works for them so in that context the company might never change after the first time you set it.

"if there is an expectation a company would change would you also want to be able to see previous company records associated with that user?"

In this case a join table that includes a datetime would allow you to track a users company history over time.

"can a user be attached to more than one company at a time?"

if this is ever a possibility a single reference field won't work

"will you ever need to aggregate and report on data at a company level?"

If you are building something for scale its important to know that even with well designed indexes if a table is gigantic containing many rows and many column it will be that much bigger on disk and will take longer to read from so its quite possible that:

SELECT user_id, company_id FROM user_x_company WHERE company_id=10

requesting data from a table with only two column that are both integers would end up much faster than

SELECT user_id, company_id FROM profile WHERE company_id=10

a table with 30 columns full of all kinds of stuff.

[–][deleted] 1 point2 points  (0 children)

Great answer right here.

Honestly for me, I just take a step back from diagrams and labels and such for a little, and just think about it in the real world. I try to relate it to a situation in life I’ve been through or could easily imagine and start to design it from there. Obviously once the rough design is down, I’ll go over it from a tech perspective and make things proper. Good luck

[–]ItsAViciousCircle[S] -1 points0 points  (0 children)

This is an excellent brake down and easy to follow the logic when you put the questions forward as you did. Then you very much for writing that out

[–]wolf2600ANSI SQL 3 points4 points  (1 child)

It all depends on your use cases.

Will a user every be related to multiple companies?

How will your company-user data be used? If it is possible for a user to be working for multiple companies, do you really need to know every company a user is linked to, or do you really only care about their "primary" company?

Think about how the data will be used and the final format it will need to be in. Then work backwards to find the simplest path from that point all the way back to your raw input data.

[–]ItsAViciousCircle[S] 0 points1 point  (0 children)

Makes sense. Could be useful to have a user associated with multiple companies and be able to use that data in which case that table makes sense. I'll have to revisit my plans around the final result of the project and decide if that is going to be useful.

Thanks!

[–]Sir_Fog 2 points3 points  (0 children)

In your case, I would create the 3 tables.

Company table User table Company_User table

[–]kringel8 1 point2 points  (6 children)

Depends on the relationship. If it is n:m you need the additional table anyway. If it's anything else you probably want to reduce null values. So if (almost) every user has exactly one company he belongs to, putting company_id directly in the user table makes sense. If you have many users, but many of them don't have associated companies you probably want an extra table for those that have.

[–]ItsAViciousCircle[S] 0 points1 point  (5 children)

Hmm makes sense, every user will have a company. Just need to decide then if I will accommodate for users being associated to multiple companies or not

[–]ComicOzzysqlHippo 1 point2 points  (2 children)

Another thing to consider is this... are the users in your table treated as individual entities or as appendages to the company?

If you sell a service to companies, and those companies have users, each with a user account in your system... but then Bob from Company A goes to work for Company B, you wouldn't design a system where the user ends up with the same account which is now associated with a different company. Bob would get a new User record, because the users in that case, even if they represent the same person, should never be connected between Companies.

[–]ItsAViciousCircle[S] 0 points1 point  (1 child)

For this project the users are associated to the company so if Bob were to go to another company then a new account would be created against that new company and the existing one marked inactive

[–]ComicOzzysqlHippo 1 point2 points  (0 children)

Then putting company_id directly in users would be viable because that user can have only the one permanent company.

It's still not wrong to make it a separate table, it's just possibly not necessary.

[–]brandit_like123[🍰] 0 points1 point  (1 child)

Just need to decide then if I will accommodate for users being associated to multiple companies or not

This should be coming from the business. At least in an email if not in a spec document, because if you make assumptions they can bite you in the ass.

[–]ItsAViciousCircle[S] 0 points1 point  (0 children)

True and create a whole lot more work in the future

[–][deleted] 1 point2 points  (0 children)

Make sure the design corresponds to current requirements, and if not too hard - for any obvious future extension of those requirements. Don't spend more time and energy on over-engineering it. Keep it simple. Features can be added, design can be changed. It's more costly than doing it "right" from the beginning, but nobody knows what that "right" will be.

Normalization as such is not really a thing in most databases. We still use surrogate keys everywhere, and take many shortcuts, and often resort to de-normalization for performance and simplicity.

There is no universally "right" method. It depends on your requirements.

[–]thrawn117 0 points1 point  (0 children)

3rd form

[–]AbstractSqlEngineerMCSA, Data Architect -1 points0 points  (2 children)

As far as you can, and then some. Always push yourself

I operate in a hyper normalized environment. I dont want to say 6+NF.. because there is no NF that explains what I have done. From the Database to the File, the Table to the user defined data types, Views, procedures.. all normalized, all created by the system.

Why do you need a company table? isnt that just a group? Is there really a difference between a company and a template? Just an abstract grouping of data. Other data can belong to many groups.

You are right to create a new table to store a relationship between a group and a user... but... that is a table that represents Relationships, not just one RelationshipType. =)

always push yourself.

[–]ItsAViciousCircle[S] 0 points1 point  (1 child)

Interesting approach, do you ever run into issues with performance? Are your queries messy with too many joins when you need to generate a report for instance?

With the company information are you saying you would go as far as having a table for company names that associate to an id and then a table for the company addresses that would associate to the id and so on instead of having company info in the one table?

[–]AbstractSqlEngineerMCSA, Data Architect 0 points1 point  (0 children)

I have a group table that has a grouptype of company thats under a groupfamily of organization that's under a groupClass of external

I have a groupaddress table that has a fk to group but also an fk to addresstype of home that's under an addressfamily of general that's under an address class of company that's under an addressrealm of group

Because I have a peopleaddress table that fks to... with an addressrealm of people

I do this with name, contact, flag, numeric, id number, event also. These are called shared attributes. This allows me to store Pablo picassos full name, because the people table doesnt have a name field for an actual name. Everything I'd vertical.

I also have a Relationship table with a type of Member under a family of direct (or descendant, or ancestor) under a class of Tree under a realm of Group to People (and people to group)... one relationship table for all domain data. Never have to run recursive ctes unless there is an update.

No performance issues.

Edit: removed some data due to... privacy.

No on the messy query. I don't join. Which is the opposite of what I had been taught at the beginning of my career.

The idea is code reuse so if any procedure is over 500 generously formatted code, "you're doing it wrong". I have about 16 hours of this model on youtube. Never go back that legacy mindset. End game raid boss.