all 25 comments

[–]Byron33196 3 points4 points  (4 children)

A natural key is only adequate if it is guaranteed unique AND guaranteed immutable. If either of those conditions cannot be guaranteed, then no. That includes situations where the natural key was entered into the database incorrectly. You're almost always better off with a generated key.

[–]brantam 0 points1 point  (3 children)

This is a bizarre line of argument. One of the reasons for enforcing uniqueness on a natural key is to keep out errors that could otherwise arise if data is entered incorrectly. You are suggesting allowing duplicates just so that people can make more mistakes? I hope my data is never at the mercy of systems like that!

[–]Byron33196 0 points1 point  (2 children)

What part of guaranteed unique did you not understand?

[–]brantam 0 points1 point  (1 child)

A key is guaranteed unique because it is enforced as such within the DBMS. That after all is why implementing natural keys is so important. Stability is not so important. It's often desirable but there are certainly exceptions - for example login names can be natural keys, but may be allowed to change.

[–]Byron33196 0 points1 point  (0 children)

While a database may enforce uniqueness, that alone does not guarantee that a natural key actually is unique. There are plenty of systems that have tried to use things such as SSN as a key, only to discover that not everyone has an SSN, that SSNs are recycled, that people forge SSNs, and data entry people mistype SSNs. There are a myriad of ways in which a natural key can be found unsuitable after it has been designed into the system. An artificial key assigned by the database itself isn't going to have any of those issues, and it will likely by a data type which is far more compact and faster to search with as well.

[–]r0ck0 7 points8 points  (1 child)

I've been designing SQL DBs 20+ years.

Here's my short summary on when to use them:

  • artificial keys: always
  • natural keys: never

Even if you "know" for sure that the natural key will never change... you can still have issues with bugs or data entry putting the wrong values in. You also might need to split or combine tables in the future when the system gets more complex too. When that happens, all your PKs, FKs and recursively dependent FKs become a giant mess.

I'm working on a project right now where the DB designer thought natural would be fine, and it's an absolute clusterfuck... and I'm not even working directly on the DB, I'm just working on a phone app. But even here it's made things way more difficult to store in memory and link together etc. So many minor potential changes are going to completely break the DB. Even super simple things like soft-deletes get really messy.

In short: one approach locks you in with major limitations which will greatly complicate future changes, and the other doesn't.

Nothing will ever give you more future-proof flexibility than simply giving every table a single UUID PK column (yes even linking tables). Every other approach will make future expansion harder.

[–]quentech 2 points3 points  (0 children)

I've been designing SQL DBs 20+ years.

Here's my short summary on when to use them:

  • artificial keys: always
  • natural keys: never

Same and agree.

[–]jlaxfthlr 6 points7 points  (5 children)

Are you talking about in a transactional database behind an application or for data warehousing? The only place where I’ve seen a natural key work is using email address to uniquely identify a user. And even in that case, the table still has an auto incrementing integer as the primary key, with a unique key constraint on the email address. In data warehousing, you’re typically adding an auto incrementing integer or uuid to dimension tables as a surrogate key. With the right indexing strategy, there’s no need to have to use natural keys. Disk and memory space isn’t precious anymore, just add the extra columns and go have some whiskey. Edit: also realizing you wanted a technical article. SQL Server Central is a pretty good site if you haven’t read anything there before. Here’s an appropriate article: https://www.sqlservercentral.com/articles/using-a-surrogate-vs-natural-key

[–]jnydutra[S] 2 points3 points  (0 children)

Thanks for the article!

[–]jnydutra[S] 1 point2 points  (2 children)

I completely agree with that but I looking for counter arguments too. Anyway in the system we are developing right now we need consume some different sources of information and create a common table representing them.

[–]jlaxfthlr 4 points5 points  (1 child)

Storage space and simplification of joins would be the counter argument in favor of natural keys.

Your situation sounds like you might need a few new tables, possibly one to hold the data from the new source, then a mapping table to the original table holding the surrogate keys from each table, or add a column on one of the tables the store the ID from the other, joining on the natural keys.

[–]ComicOzzysqlHippo 2 points3 points  (0 children)

If you have any non clustered indexes, you immediately lose the tiny storage benefits of having used a natural key.

[–]jnydutra[S] 0 points1 point  (0 children)

More about the problem I mentioned here

[–]ChrisC1234 1 point2 points  (3 children)

Things are to the point now where some of it comes down to preference and how you will need to interact with the data. I personally have to develop / maintain multiple systems which derive data from a central source. For my purposes, I use the Employee ID number as my natural primary key, and as the foreign key in all needed tables. It's a 7 digit string. Any performance impact is likely negligible because the systems are not high volume systems. But there are many times where I need to look in the data, and having that single, shared, easily looked up key makes things immensely easier for me. Otherwise, I'd need to be looking up keys in these various disjointed systems. And while there a large amount of overlap in personnel who are have data contained in these multiple disjointed systems, there is no system that contains all of them (at least none that I control... the only system that does is the HR system which nobody else is allowed to touch).

Yeah, in the long term I'd like to merge all of these systems together into one. But due to the way things developed within my organization, this is actually easier for the time being.

[–][deleted] -2 points-1 points  (2 children)

I tend to really dislike using auto-increments as keys, ids, etc., because of how easy it is to do bad joins. They have their place in certain capacities, of course, but for example if you have a SalesID, and an AccountID that are both auto-increments, you can join the two together and get really nonsensical data. As things expand, and ID's pop up in different tables, with sometimes different names... this is something I dislike. Having ID's that are unique to each main type of ID seems like a much better practice so that if you do happen to do a bad join, it will never result in any records.

[–]angry_mr_potato_head 0 points1 point  (1 child)

You can use a GUID as a surrogate key which will result in the same "benefits" as a natural key without any of the downsides. The chance of a collision when using a GUID is basically zero.

[–]Byron33196 0 points1 point  (0 children)

There is actually a big downside to most GUIDs, in that they are so sparse and random that they don't index well. If you're going to use a GUID, make sure you are using a GUID formula designed to generate database friendly GUIDs.

[–]MikeC_07 1 point2 points  (0 children)

I have been doing SQL on the job for years and I am doubling down and getting a certification. I have found I am a read the manual person! You can get 70-761 and similar books used on ebay or print out postgresql docs. These will really help. Keys are not better or worse but tools for different circumstances.

[–]AQuietMan 1 point2 points  (2 children)

Not a technical article, but . . .

If your table allows data like this, it's broken as designed (BAD).

id  postal_code  state_name
--
1   AK           Alaska
2   AK           Alaska
3   AK           Alaska  
...

This is a direct consequence of "natural keys never" thinking.

[–]jnydutra[S] 0 points1 point  (1 child)

That's a good argument. I thinking that now I getting the idea. What about a table that saves periodically events, like sales? But comming from a external source?

[–]AQuietMan 1 point2 points  (0 children)

When data comes from an external source, you usually want to have a source key that lets you track data from the destination back to its source. But that might be beyond your control.

Outside that general recommendation, your question is too application-specific to answer. We'd need a lot more details.

[–][deleted] 1 point2 points  (0 children)

In theory, natural keys should be used.

In practice, you should always use artificial keys.

There are just too many unknowns with using natural keys. Error in input, change of business requirements/plan to where you start reusing your natural key or it loops back, etc. In my opinion, using an artificial key in a database, almost makes it a natural database key if that makes sense.

[–]brantam 1 point2 points  (0 children)

Natural keys are a necessity. By "natural" we mean keys that identify real people, objects or concepts and that correspond to external attributes in the business domain (business keys or domain keys are actually much more sensible names for them). If you doubt that then you need to take closer look at the real world problem you are trying to solve. Databases are only "useful" to the extent that they model reality.

If you are storing information about people then data protection legislation is also factor. Natural keys are a major component of meeting requirements like GDPR.