Natural Keys versus Surrogate Keys….Here we are again!

Posted by

Well….that blew up! 😀

Last week I published a video (which you can watch below if you haven’t seen it) about the evolution of the ROWID.

One particular demo in the video shows how the ROWID for a given row might change simply by issuing an UPDATE statement. Unsurprisingly, Twitter (as Twitter often does) totally lost its mind 😀. People were saying that every application ever built would be going to hell in a handbasket and that this would be the downfall of modern IT etc etc.

Of course, ever since 9i there have been plenty of mechanisms via which the ROWID for a given row might change as a result of some operation, for example, the FLASHBACK TABLE command, ALTER TABLE SHRINK command, partition table row movement, and in later releases, the ability to shuffle data around in an online fashion, aka ALTER TABLE MOVE ONLINE. Thus for over 20 years now the reality has been that if you are using the ROWID as a mechanism to re-query a previously selected row from a table in the database, you always should have been coding your applications to handle the possibility that the ROWID might point to row that no longer exists, or could even be a different row altogether. Of course people have not been doing this, and thus the ensuing panic when my video showed that the ROWID is not as unchanging as people assumed it to be.

But this post is not about the ROWID it is about database design. The discussion on Twitter turned from ROWID’s to the fact that application development tools such as APEX and Forms offer developers the choice of using ROWIDs or primary keys for database table row access. We’ve always recommended to use primary keys over ROWID’s but we do give the flexibility of choice. So now the topic morphed into what the best key should be when building APEX apps.

image

History has proven time and time again that the moment you commence a discourse on surrogate keys, there will be an equal and opposite reaction from the natural keys camp.

image

I stress – I ‘m not trying to jump down the throats of the Twitter community here. The conversation was civil and people presented their arguments without profanity or insult. This is why I’m blurring the tweet authors – it is not about who made the tweets, but just that ultimately, we were heading down the path of the same arguments on natural versus surrogate keys, and there is plenty of validity in both sides of the discussion

image

At this point you’re probably thinking “Hey Connor, will you get to the damn point of the blog post!” 😀 and that’s fair enough.  Here’s is my point: I don’t think the issue here is natural keys or surrogate keys. When we talk about “Natural versus Surrogate” I think the issue here is one word:

VERSUS.

Why is it always “VERSUS”?

This is where I think we IT practitioners can get lost in the weeds some times. We are all so focused on logical, methodical solutions to problems, that we always want to have one solution. One key to rule them all so to speak.

But I don’t see the need for any such mutual exclusion for how we choose our keys. For example, if I have a GENDER table, I’m totally happy to have a natural key for this, eg

image

In these more enlightened times, I may need to add additional rows such as Non-binary or Omni-gender, but there’s an excellent chance I can achieve this with a simple one or two letter key. That is going to make things easier not just for developers, but also for business users when they come to do ad-hoc query. The GENDER column on their CUSTOMERS table will have a natural easily understood meaning for when users are building predicates and they might only need to join back to the parent table when they truly need to (for descriptive names etc). I see no downsides to this natural key choice.

Conversely, if in this same database, the sales orders I take from customers have a natural key structure of

image

which then yields a child table of items for each order:

image

which then gives me a status history for each item in the order:

image

then you can safely assume that I’m getting increasingly concerned about

  • the size of that key,
  • the number of join conditions I’m going to have to write,
  • the nightmare data maintenance scenario if the “immutable” key suddenly is no longer so due to a business decision or initial design error. Update cascade anyone? 😀

Thus I would definitely going to explore surrogate keys for these table(s).

“Oh my goodness! Natural and surrogate keys in the same schema?!?!? Is he insane???”

Similarly, this might not be isolated to the just the case of keys getting increasingly longer. If I was designing a table of invoices with an intended primary key of the invoice number, then if that real world invoice number is (say): INV-5D3H77A4DF, then I’m reasonably confident that this is going to fine as a natural key. But if that same business had real world invoice numbers of: PERTH-AUG-2022-174 then alarm bells are going off for me, because it looks obvious that historically this business has added meaning to various parts of the invoice number, and once it has meaning, then as sure as night follows day, there will one day be a request of “We need to change this invoice number from A to B everywhere it appears in the database”

It is for this same reason that a month ago at Kscope, when I had to step in and give Chris Saxon’s talk on Normalisation because sadly he could not travel to the event, I augmented his deck with the following slide, scattered at various places throughout the presentation

image

Remember that our goal is to build robust, functional, practical databases to sit beneath robust, functional, practical customer applications. It is not to win some fictional “Annual Coolest Tightest Database Design for 2022” award. If that was the case, we’d all be building databases in 6th normal form and throwing ourselves off cliff tops in frustration when we try to achieve anything practical with them.

Let’s ditch the “VERSUS” and just focus on building great databases that serve our customers well. Feel free to use natural and surrogate keys wherever they fit best based on the data you are modelling, the tools you are using, and the customer needs.

10 comments

  1. Hi Connor,

    While reading your post, I just asked myself what will happen in an autonomous database with some features that are inherently based on ROWID usage.
    One case that comes to my mind is that of materialized views containing joins, where, for making them fast refreshable, it is mandatory that the SELECT list include a ROWID column for each table, in addition to having the materialized view logs containing ROWID, a.s.o.
    Isn’t such a feature going to become completely unreliable when used in an AT ?

    I think that there is an essential difference between the case when you, as a developer or DBA choose to deliberately ENABLE ROW MOVEMENT for a table, and, as such, you become responsible of the features that might be impacted, like, in the MV example, maybe having to perform a complete refresh of an MV after some database operations that might change ROWID values, vs the case when the database is doing this “internally” and “silently”, and might break operations that otherwise are correctly following all the best practices rules, like for example the join MV case, vs using ROWID to locate a row in an APEX application, which is arguably a bad practice.

    Cheers & Best Regards,
    Iudith Mentzel

    1. Yes but in the MV, its on us. Our MV’s and MV LOGS have to handle rowid changes due to the fluidity of the database they are built on.. If they don’t work, its OUR bug. There will still be plenty of features that Oracle provides based on the ROWID, but those features will also take into account the properties of ROWIDs.

      1. Worth pointing out that, if you use a WHERE CURRENT OF in PL/SQL for a cursor specified “FOR UPDATE’, that implicitly rewrites it into an UPDATE… WHERE ROWID = :B1 if you trace the SQL or look it up in v$sql

        declare
        cursor c_rid is
        select colour
        from rid
        for update of colour;
        begin
        for r_rid in c_rid loop
        update rid
        set colour = ‘red’
        where current of c_rid;
        end loop;
        end;

  2. Nice discourse! But … isn’t the existence of surrogate keys (when caused by an impossibility to have a set of columns as ‘natural keys’) an indication that we experienced a failure in applying the relational paradigm correctly during the design phase?

    1. Of course, but we’re generally not paid by customers to produce relational paradigms, we’re paid to produce working product.

      No example I mentioned in the blog post is about not being able to *find* a natural key, it was whether the natural key is going to be *best* for successful development/running/maintenance of the application.

      1. Similar to your GENDER example, if I had a table of POLICY_STATUSES with static entries such as ‘INITIAL’, ‘PENDING’, ‘AWAITING ATTENTION’, ‘CLOSED’, I would rather see an actual English word as my foreign key than some arbitrary surrogate number.

        ‘INIT’ is preferable to 1.
        ‘PENDING’ is preferable to 2.
        ‘AWAITING’ is preferable to 3.
        ‘CLOSED’ is preferable to 4, and so on. (Prob. under 10 chars is good as a FK).

        But if it were a question of the POLICY itself, a surrogate physical PK (seq. no.) may well be preferable to using the human-readable policy no. (or the logical PK of POLICY_CREATE_DATE, PERSON etc.). [Both would ofc. need to have their own UNIQUE constraint].

  3. Just a reminder that Apex now needs upgrading to do away with the use of ROWID, and instead allow for 32 column PKs.

    (Whilst 32 columns is ofc. excessive, 4-5 columns is not. How can we ever use IOTs again in Apex for example, if this isn’t catered for and we can no longer use UROWID ?)

  4. Just a reminder that Apex now needs upgrading to do away with the use of ROWID, and instead allow for 32 column PKs.

    (Whilst 32 columns is ofc. excessive, 4-5 columns is not. How can we ever use IOTs again in Apex for example, if this isn’t catered for and we can no longer use UROWID ?)

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.