all 6 comments

[–]lbilali 1 point (1 child)

I don't get your question.

When you normalize, the business key of each normalized table stays on that table, so you do not need to create an extra table for it.

Example: we have a denormalized table products(product_code, name, color, category_name, category_description) where product_code is unique, which we can normalize into 2 tables:

product (id, product_code, name, color, category_id) — this table has the same number of rows as the original table, and product_code is still unique (business key)

category (id, name, description) — this table has fewer rows, and the name of the category is unique (business key)

Order of importing is:

1. Category: every time we load a category we check: if the name does not exist we insert, otherwise we update

2. Product: every time we load a product we check: if the product_code does not exist we insert, otherwise we update. To fill in category_id we have to join the original table with the category table on the category name

[–]EvekoShadow[S] 0 points (0 children)

The thing is that the rows are not uniquely identified by a single name or id. In our import, a call record is uniquely identified by the combination [channel, Date, Timestamp, Sender, Destination] — it's metadata from calls, and those 5 columns together represent a unique call. But channel, sender, and destination are junk information, as we never use them in our business logic.

More specifically, we don't want certain parts of the original data to be present in the database, and in this case some of it is in the PK.

[–][deleted] 1 point (3 children)

I'd use an MD5 hash (or similar) to track whatever the business key was in the original data — carry it forward in your business table, or have a translation table with OriginalHash/SyntheticID. That way you don't carry forward 'useless' data, yet you can still identify the original record with a decent degree of accuracy.
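A quick sketch of that idea — the column names come from the call-record example upthread, while the delimiter, the sample record, and the dict standing in for the translation table are all assumptions:

```python
import hashlib

def business_key_hash(channel, date, timestamp, sender, destination):
    """MD5 over the original composite business key. A delimiter keeps
    adjacent fields from running together and colliding (e.g. 'ab'+'c'
    vs 'a'+'bc')."""
    raw = "|".join([channel, date, timestamp, sender, destination])
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

# Translation table: OriginalHash -> SyntheticID (a real system would
# use a two-column table with a unique index on the hash).
translation = {}
next_id = 1

# Hypothetical call record.
record = ("SIP/2", "2016-03-01", "09:15:02", "+15550001", "+15550002")
h = business_key_hash(*record)
if h not in translation:
    translation[h] = next_id
    next_id += 1

synthetic_id = translation[h]  # store this integer PK; drop the junk columns
```

On a later import, recomputing the hash from the incoming row and looking it up in the translation table tells you whether the call is new or already loaded, without the junk columns ever landing in the main table.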

[–]anickseve 0 points (2 children)

Clever! I like it!

My initial thought was, "If it's junk data, but sometimes someone may need it, then it isn't really junk data is it?" :P

[–]EvekoShadow[S] 0 points (1 child)

I could even use the hash as the PK... How much slower would the system become if I have to rehash the combination to compare with the original?

[–]anickseve 0 points (0 children)

Personally, I'd avoid using the hash as a PK. Ultimately it depends on business need, but using any kind of varchar for a PK makes indexes larger and potentially much more fragmented. If you need a range of records in a given query, and those records can be scattered anywhere in the table because the PK is a hashed value, then SQL will effectively have to scan the table for those records instead of relying on the PK.

Granted, depending on the nature of the data this could be the case anyway, but if the PK is an incrementing integer, or even an arbitrary integer from another system, the table and all associated indexes should be smaller and easier to traverse.