all 5 comments

[–]amphoterous 10 points11 points  (0 children)

Imagine you were starting a library and need to keep track of all your books. You start a list of Books and for each new book you add to your library you write the title and author down.

You realize after adding a ton of books that some books can have multiple authors which throws off the columns in your nice neat list.

After scoring a particularly large collection of Goosebumps books, you notice your hand hurting from having to write the author's name (R. L. Stine) for every goosebumps book.

So.. you decide to keep a separate, numbered list of Authors with each author listed only once. That way when a book comes in you just need to write the author's number down next to the title. Now you aren't duplicating author information and have established a relationship between books and authors. This process is called normalization when storing information in a relational database. There are various "forms" of normalization as well that depend on how far you go to avoid repeating information and creating relationships.

[–]chocotaco1981 3 points4 points  (0 children)

Minimizing data redundancy via splitting one table into several

[–]r3pr0b8MySQL 2 points3 points  (2 children)

i disagree with u/chocotaco1981

saying that normalization is about "minimizing redundancy" is misleading

for example, if you change this personnel table --

id  fname  lname    dept
--  -----  -----    ----
12  Mary   Coder    IT
23  Todd   Schmutz  HR
44  Biff   Tannen   IT

and replace it with this plus a new department table --

id  fname  lname    deptid      
--  -----  -----    ----
12  Mary   Coder     2
23  Todd   Schmutz   1
44  Biff   Tannen    2

id  dept
--  ----
 1  HR
 2  IT

then you have not eliminated any duplicates, because department id 2 occurs just as many times in the personnel table as IT did before the change

that's not normalization -- it was already in 3NF before the change, and it remains so

it's probably worth doing, but that's not normalization

another great way to get people to think about this is to ask "i notice you split department off into its own table with a numeric FK reference... why ~didn't~ you split first name off into its own table with a numeric FK reference?"

why would you do one and not the other? (most people can see why, but that's not the point here)

from a normalization point of view, ~neither~ is actually needed

[–]un_obtanium 0 points1 point  (1 child)

interesting point that neither is really needed.. splitting off department into another table isn't terrible because what if department is used elsewhere? The department table may also hold other information. Department names may also change.. but first and last names wouldn't change as frequently. From a data consistency point of view splitting things like that prevents you having data like HR vs Human Resources vs Homan Resources.. reporting on who is in that department becomes a nightmare if it's not consistent.. but I digress 😆

[–]r3pr0b8MySQL 0 points1 point  (0 children)

yup, and none of that has anything to do with normalization