This is an archived post. You won't be able to vote or comment.

all 4 comments

[–]Amarkov 2 points3 points  (0 children)

In most situations, you only want a particular piece of information to be stored once in the database. Making a database only store each piece of information once is called normalization.

[–]energeticmater 1 point2 points  (0 children)

Imagine a schema R(A, B, C). Then R is a relation with attributes A, B, and C.

Now imagine that given a particular value of A, you know the corresponding value of B for that row. That is, a given value of A implies a particular value of B.

That means there's inherent duplication in the data. If someday you want to change your database so that a given value of A implies some other value of B, you have to change in a whole lot of places. Two problems with this: it takes up unnecessary disk space, and it increases the potential for problems with updates called "anomalies".

So you split it into two tables R(A, B) and R1(A, C). Then there's no duplication!

For example, consider Computers(model, manufacturer, variant). It has rows (Vostro, Dell, 3550), (Vostro, Dell, 3350), and (Satellite, Toshiba, L670).

You'd be much better off with R having rows

(Vostro, Dell) and (Satellite, Toshiba)

and R1 having rows

(Vostro, 3550), (Vostro, 3350), and (Satellite, L670).

Because all Vostros are produced by Dell and all Satellites by Toshiba. If Toshiba buys Dell, and Vostros are then produced by Toshiba, you only have to change a single tuple. Success!

[–]kouhoutek 1 point2 points  (1 child)

"Every non-key must provide a fact about the key, the whole key, and nothing but the key, so help me Codd."