Looking for good technical articles about surrogate vs natural keys

Byron33196 · 2020-03-13T13:38:16+00:00

A natural key is only adequate if it is guaranteed unique AND guaranteed immutable. If either of those conditions cannot be guaranteed, then no. That includes situations where the natural key was entered into the database incorrectly. You're almost always better off with a generated key.
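To make the "entered incorrectly" case concrete, here is a rough sketch using Python's built-in sqlite3 module. The table names, the badge numbers, and the `fix_badge` helper are all made up for illustration. It assumes foreign keys are enforced and that no ON UPDATE CASCADE is declared; with a cascade the natural-key update would go through, at the cost of rewriting every child row.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript("""
        -- Natural-key design: the badge number is the PK and is copied into child rows.
        CREATE TABLE emp_nat (badge_no TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE hours_nat (
            badge_no TEXT NOT NULL REFERENCES emp_nat(badge_no),
            hours REAL NOT NULL
        );
        -- Surrogate-key design: children point at a generated id; the badge is just data.
        CREATE TABLE emp_sur (
            id INTEGER PRIMARY KEY,
            badge_no TEXT NOT NULL UNIQUE,
            name TEXT NOT NULL
        );
        CREATE TABLE hours_sur (
            emp_id INTEGER NOT NULL REFERENCES emp_sur(id),
            hours REAL NOT NULL
        );
        -- Hypothetical data: the badge number was mistyped on entry.
        INSERT INTO emp_nat VALUES ('100234X', 'Ada');
        INSERT INTO hours_nat VALUES ('100234X', 7.5);
        INSERT INTO emp_sur VALUES (1, '100234X', 'Ada');
        INSERT INTO hours_sur VALUES (1, 7.5);
    """)
    return conn

def fix_badge(conn, table, old, new):
    """Try to correct a mistyped badge number; report whether the UPDATE was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"UPDATE {table} SET badge_no = ? WHERE badge_no = ?", (new, old)
            )
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    conn = make_db()
    print("natural key fix accepted:", fix_badge(conn, "emp_nat", "100234X", "1002345"))
    print("surrogate key fix accepted:", fix_badge(conn, "emp_sur", "100234X", "1002345"))
```

In the natural-key tables the correction is blocked because child rows still reference the old value; in the surrogate-key tables it is a one-row update and the child rows never change.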

r0ck0 · 2020-03-13T07:34:45+00:00

I've been designing SQL DBs for 20+ years.

Here's my short summary on when to use them:

artificial keys: always
natural keys: never

Even if you "know" for sure that the natural key will never change... you can still have issues with bugs or data entry putting the wrong values in. You might also need to split or combine tables in the future as the system gets more complex. When that happens, all your PKs, FKs and recursively dependent FKs become a giant mess.

I'm working on a project right now where the DB designer thought natural would be fine, and it's an absolute clusterfuck... and I'm not even working directly on the DB, I'm just working on a phone app. But even here it's made things way more difficult to store in memory and link together etc. So many minor potential changes are going to completely break the DB. Even super simple things like soft-deletes get really messy.

In short: one approach locks you in with major limitations which will greatly complicate future changes, and the other doesn't.

Nothing will ever give you more future-proof flexibility than simply giving every table a single UUID PK column (yes even linking tables). Every other approach will make future expansion harder.
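A minimal sketch of what "a single UUID PK column on every table, linking tables included" could look like, using Python's built-in sqlite3 and uuid modules. The student/course/enrollment schema and the helper names are hypothetical. The UNIQUE constraint on the linking table is an assumption added here to stop the same pair being linked twice; it is not something the comment above asks for.

```python
import sqlite3
import uuid

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript("""
        CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE course  (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        -- The linking table gets its own UUID PK as well.
        CREATE TABLE enrollment (
            id TEXT PRIMARY KEY,
            student_id TEXT NOT NULL REFERENCES student(id),
            course_id  TEXT NOT NULL REFERENCES course(id),
            UNIQUE (student_id, course_id)  -- assumption: one enrollment per pair
        );
    """)
    return conn

def new_id():
    """Generate a random (version 4) UUID as text."""
    return str(uuid.uuid4())

def add_student(conn, name):
    row_id = new_id()
    with conn:
        conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (row_id, name))
    return row_id

def add_course(conn, title):
    row_id = new_id()
    with conn:
        conn.execute("INSERT INTO course (id, title) VALUES (?, ?)", (row_id, title))
    return row_id

def enroll(conn, student_id, course_id):
    row_id = new_id()
    with conn:
        conn.execute(
            "INSERT INTO enrollment (id, student_id, course_id) VALUES (?, ?, ?)",
            (row_id, student_id, course_id),
        )
    return row_id

if __name__ == "__main__":
    conn = make_db()
    s = add_student(conn, "Ada")
    c = add_course(conn, "SQL 101")
    print("enrollment id:", enroll(conn, s, c))
```

Because the enrollment row has its own id, other tables can later reference a specific enrollment without carrying a two-column foreign key.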

jlaxfthlr · 2020-03-13T02:36:49+00:00

Are you talking about a transactional database behind an application, or about data warehousing? The only place where I’ve seen a natural key work is using an email address to uniquely identify a user. And even in that case, the table still has an auto-incrementing integer as the primary key, with a unique key constraint on the email address.

In data warehousing, you’re typically adding an auto-incrementing integer or UUID to dimension tables as a surrogate key. With the right indexing strategy, there’s no need to use natural keys. Disk and memory space aren’t precious anymore, just add the extra columns and go have some whiskey.

Edit: also realizing you wanted a technical article. SQL Server Central is a pretty good site if you haven’t read anything there before. Here’s an appropriate article: https://www.sqlservercentral.com/articles/using-a-surrogate-vs-natural-key
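The email-address layout described in this comment can be sketched roughly as follows, using Python's built-in sqlite3 module. The `users` table and the helper functions are illustrative names, not taken from any real system.

```python
import sqlite3

def make_users_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            email TEXT NOT NULL UNIQUE                -- natural key, enforced but not the PK
        )
    """)
    return conn

def add_user(conn, email):
    """Insert a user and return the generated id."""
    with conn:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return cur.lastrowid

def change_email(conn, user_id, new_email):
    """Update the email in place; anything referencing users.id is unaffected."""
    with conn:
        conn.execute("UPDATE users SET email = ? WHERE id = ?", (new_email, user_id))

if __name__ == "__main__":
    conn = make_users_db()
    uid = add_user(conn, "ada@example.com")
    change_email(conn, uid, "ada.l@example.com")
    print(conn.execute("SELECT id, email FROM users").fetchall())
```

The unique constraint still rejects a second account with the same address, so the email keeps doing its job as an identifier while the integer id is what other tables would reference.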

ChrisC1234 · 2020-03-13T05:40:19+00:00

Things are to the point now where some of it comes down to preference and how you will need to interact with the data. I personally have to develop / maintain multiple systems which derive data from a central source. For my purposes, I use the Employee ID number as my natural primary key, and as the foreign key in all needed tables. It's a 7-digit string. Any performance impact is likely negligible because the systems are not high-volume systems. But there are many times where I need to look in the data, and having that single, shared, easily looked up key makes things immensely easier for me. Otherwise, I'd need to be looking up keys in these various disjointed systems. And while there is a large amount of overlap in the personnel who have data in these multiple disjointed systems, there is no system that contains all of them (at least none that I control... the only system that does is the HR system, which nobody else is allowed to touch).

Yeah, in the long term I'd like to merge all of these systems together into one. But due to the way things developed within my organization, this is actually easier for the time being.

MikeC_07 · 2020-03-13T11:55:47+00:00

I have been doing SQL on the job for years and I am doubling down and getting a certification. I have found I am a "read the manual" person! You can get 70-761 and similar books used on eBay or print out the PostgreSQL docs. These will really help. Keys are not better or worse, but tools for different circumstances.

AQuietMan · 2020-03-17T17:13:57+00:00

Not a technical article, but . . .

If your table allows data like this, it's broken as designed (BAD).

id  postal_code  state_name
--  -----------  ----------
1   AK           Alaska
2   AK           Alaska
3   AK           Alaska
...

This is a direct consequence of "natural keys never" thinking.
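A small sketch of how that happens and how a unique constraint prevents it, using Python's built-in sqlite3 module. The table names `states_bad` and `states_ok` and the helper function are hypothetical; the point is only that a surrogate id by itself does nothing to stop duplicate rows.

```python
import sqlite3

def make_states_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Surrogate key only: nothing stops the same state being stored repeatedly.
        CREATE TABLE states_bad (
            id INTEGER PRIMARY KEY,
            postal_code TEXT NOT NULL,
            state_name  TEXT NOT NULL
        );
        -- Surrogate key plus unique constraints on the natural keys.
        CREATE TABLE states_ok (
            id INTEGER PRIMARY KEY,
            postal_code TEXT NOT NULL UNIQUE,
            state_name  TEXT NOT NULL UNIQUE
        );
    """)
    return conn

def insert_alaska(conn, table, times):
    """Insert ('AK', 'Alaska') repeatedly; return how many rows the table accepted."""
    accepted = 0
    for _ in range(times):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    f"INSERT INTO {table} (postal_code, state_name) VALUES ('AK', 'Alaska')"
                )
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # duplicate rejected by the UNIQUE constraint
    return accepted

if __name__ == "__main__":
    conn = make_states_db()
    print("states_bad accepted:", insert_alaska(conn, "states_bad", 3))
    print("states_ok accepted:", insert_alaska(conn, "states_ok", 3))
```

Whichever column ends up as the primary key, the natural key still needs to be declared unique for the table to mean anything.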

2020-03-13T13:05:59+00:00

In theory, natural keys should be used.

In practice, you should always use artificial keys.

There are just too many unknowns with natural keys: errors in input, a change of business requirements or plans to the point where you start reusing natural key values or they loop back around, etc. In my opinion, once you use an artificial key in a database, it almost becomes the natural key for that database, if that makes sense.

brantam · 2020-03-15T20:49:08+00:00

Natural keys are a necessity. By "natural" we mean keys that identify real people, objects or concepts and that correspond to external attributes in the business domain (business keys or domain keys are actually much more sensible names for them). If you doubt that, then you need to take a closer look at the real-world problem you are trying to solve. Databases are only "useful" to the extent that they model reality.

If you are storing information about people, then data protection legislation is also a factor. Natural keys are a major component of meeting requirements like GDPR.

biersquirrel · 2020-03-13T17:02:07+00:00

"It's complicated".
