Age calculations – Just how old ARE you?

Posted by

Any database application that stores information about people will very likely store a person’s date of birth as part of that data. Whether it be for marketing purposes, or part of the security checks for a password reset, or part of the authentication when the inevitable call centre calls you with their “latest, greatest offer”, the date of birth is common.

But when it comes to reporting, typically it is not the date of birth that we present on our screen or hard-copy, but the person’s age. Clearly we cannot store a persons age in the database because it changes every day, month or year depending on the granularity with which you present it. Which leads to the ponderance – Surely it cannot be that hard to calculate someone’s age from their date of birth?

As it turns out, it is not as simple as it looks, and often developers have come up with “close but not close” enough methods to calculate the age, and by “not close enough” I mean, the methods break down as we hit boundary cases.

Lets create some test data, where the columns in the table T below are

  • birth – the date of birth someone provides to our application
  • sys_date – a representation of the standard “sysdate” function which reflects the date at which we run our age calculation
  • true_age – what the correct age is given the “birth” and “sys_date” column values.

SQL> create table t
  2    ( birth      date,
  3      sys_date   date,
  4      true_age   int );

Table created.

SQL>
SQL> insert into t values ( date '2000-05-12',date '2016-03-30',15 );

1 row created.

SQL> insert into t values ( date '2000-05-12',date '2016-05-30',16 );

1 row created.

SQL> insert into t values ( date '2000-05-12',date '2016-05-12',16 );

1 row created.

SQL> insert into t values ( date '2000-05-12',date '2076-06-12',76 );

1 row created.

SQL> insert into t values ( date '2000-02-29',date '2013-03-01',13 );

1 row created.

SQL> insert into t values ( date '2000-02-29',date '2016-02-29',16 );

1 row created.

SQL> insert into t values ( date '2000-02-29',date '2017-02-28',16 );

1 row created.

SQL> insert into t values ( date '2000-02-29',date '2087-02-28',86 );

1 row created.

SQL> insert into t values ( date '2000-01-01',date '2087-12-31',87 );

1 row created.

Using this data, lets look at common yet flawed mechanism for calculating age. This first one I’ve seen many times over the course of my career, namely to do a simple date subtraction. Sometimes people divide by 365, or 366 or even 365.25, but no matter what divisor is chosen, you’ll always hit some errors in the calculation.


SQL> select
  2  x.*,
  3   case when true_age = calculated_age then 'OK' else 'ERROR' end tag
  4  from
  5  (
  6  select
  7    t.*,
  8    trunc((sys_date-birth)/365) calculated_age
  9  from t
 10  ) x;

BIRTH     SYS_DATE    TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16         15             15 OK
12-MAY-00 30-MAY-16         16             16 OK
12-MAY-00 12-MAY-16         16             16 OK
12-MAY-00 12-JUN-76         76             76 OK
29-FEB-00 01-MAR-13         13             13 OK
29-FEB-00 29-FEB-16         16             16 OK
29-FEB-00 28-FEB-17         16             17 ERROR
29-FEB-00 28-FEB-87         86             87 ERROR
01-JAN-00 31-DEC-87         87             88 ERROR

9 rows selected.

Side note: Date subtraction in itself needs care. See my recent post for more details on this.

A better method is to take advantage of the MONTHS_BETWEEN function in the Oracle Database, because unlike the days in a year which is variable, there is always 12 months in a year, so this sounds more promising as a divisor. However, as you can see below, even MONTHS_BETWEEN struggles when someone has a date of birth on a “leap day”. How annoying of them Smile


SQL> select
  2  x.*,
  3   case when true_age = calculated_age then 'OK' else 'ERROR' end tag
  4  from
  5  (
  6  select
  7    t.*,
  8    trunc(months_between(sys_date,birth)/12) calculated_age
  9  from t
 10  ) x;

BIRTH     SYS_DATE    TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16         15             15 OK
12-MAY-00 30-MAY-16         16             16 OK
12-MAY-00 12-MAY-16         16             16 OK
12-MAY-00 12-JUN-76         76             76 OK
29-FEB-00 01-MAR-13         13             13 OK
29-FEB-00 29-FEB-16         16             16 OK
29-FEB-00 28-FEB-17         16             17 ERROR
29-FEB-00 28-FEB-87         86             87 ERROR
01-JAN-00 31-DEC-87         87             87 OK

9 rows selected.

Somewhat counter-intuitively, a reliable formula to derive a persons age is to dispense with the date arithmetic altogether and convert the dates to numbers.


SQL> select
  2  x.*,
  3   case when true_age = calculated_age then 'OK' else 'ERROR' end tag
  4  from
  5  (
  6  select
  7    t.*,
  8    trunc((to_number(to_char(sys_date,'YYYYMMDD'))-
  9          to_number(to_char(birth,'YYYYMMDD')))/10000) calculated_age
 10  from t
 11  ) x;

BIRTH     SYS_DATE    TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16         15             15 OK
12-MAY-00 30-MAY-16         16             16 OK
12-MAY-00 12-MAY-16         16             16 OK
12-MAY-00 12-JUN-76         76             76 OK
29-FEB-00 01-MAR-13         13             13 OK
29-FEB-00 29-FEB-16         16             16 OK
29-FEB-00 28-FEB-17         16             16 OK
29-FEB-00 28-FEB-87         86             86 OK
01-JAN-00 31-DEC-87         87             87 OK

9 rows selected.

You can see how this neat little trick works by looking at the components of the expression


SQL> select
  2    to_number(to_char(sys_date,'YYYYMMDD')) d1,
  3    to_number(to_char(birth,'YYYYMMDD')) d2,
  4    to_number(to_char(sys_date,'YYYYMMDD')) -
  5      to_number(to_char(birth,'YYYYMMDD')) delt
  6  from t;

        D1         D2       DELT
---------- ---------- ----------
  20160330   20000512     159818
  20160530   20000512     160018
  20160512   20000512     160000
  20760612   20000512     760100
  20130301   20000229     130072
  20160229   20000229     160000
  20170228   20000229     169999
  20870228   20000229     869999
  20871231   20000101     871130

9 rows selected.

The conversion to numbers means that elements like Feb 28th and Feb 29th are still distinct as opposed to both being seen as “the last day of Feburary” by internal date functions, and (for example on the second last line), the age calculation stay just below 87 as opposed to crossing over the boundary to 87.

Now that you have a reliable formula, you can easily add the AGE as a column in a view on your table and not have to worry about the internals.

2 comments

    1. We are converting the dates to YYYYxxxx

      So the years are in effect, multiples of 10,000 (because the ‘xxxx’ is capped at 9999). The ‘xxxx’ (ie months/days) becomes the means via which we decide on boundary points, that is, when comparing YYYYxxxx to ZZZZwwww, we only need to care about whether the ‘xxxx’ or greater or less than ‘wwww’ to decide whether a year has “ticked over”. THis way, we dont need to worry about leap years etc etc

      I didnt “invent” this – I saw it many years ago somewhere, so apologies for not doing a proper acknowledgement to whoever first used it.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.