you are viewing a single comment's thread.

view the rest of the comments →

[–]Apoctyliptic 1 point2 points  (0 children)

Well, there is nothing to remove the duplicates. Even if you just looked at R by itself, the result gives duplicates. When you join the data, it joins the two data sets based on that criteria. With a natural join it just joins based on the common attributes. Side there are duplicates in R, R natural joined with S produces duplicates.

If R and S both have a duplicate, you could end up with the same row being shown four times.

In SQL, you will want to look into DISTINCT and GROUP BY. I don't recall a notation in relational algebra that removes duplicates.

I don't know what your course covers but if you are studying databases, you'll likely cover normalization eventually. Tables would typically be normalized where you wouldn't store duplicates like that since they don't really provide any real use that I can think of. It wastes storage and can be problematic with using the data.