Any database application that stores information about people will very likely store a person’s date of birth as part of that data. Whether it be for marketing purposes, or part of the security checks for a password reset, or part of the authentication when the inevitable call centre calls you with their “latest, greatest offer”, the date of birth is common.
But when it comes to reporting, typically it is not the date of birth that we present on our screen or hard-copy, but the person’s age. Clearly we cannot store a persons age in the database because it changes every day, month or year depending on the granularity with which you present it. Which leads to the ponderance – Surely it cannot be that hard to calculate someone’s age from their date of birth?
As it turns out, it is not as simple as it looks, and often developers have come up with “close but not close” enough methods to calculate the age, and by “not close enough” I mean, the methods break down as we hit boundary cases.
Lets create some test data, where the columns in the table T below are
- birth – the date of birth someone provides to our application
- sys_date – a representation of the standard “sysdate” function which reflects the date at which we run our age calculation
- true_age – what the correct age is given the “birth” and “sys_date” column values.
SQL> create table t
2 ( birth date,
3 sys_date date,
4 true_age int );
Table created.
SQL>
SQL> insert into t values ( date '2000-05-12',date '2016-03-30',15 );
1 row created.
SQL> insert into t values ( date '2000-05-12',date '2016-05-30',16 );
1 row created.
SQL> insert into t values ( date '2000-05-12',date '2016-05-12',16 );
1 row created.
SQL> insert into t values ( date '2000-05-12',date '2076-06-12',76 );
1 row created.
SQL> insert into t values ( date '2000-02-29',date '2013-03-01',13 );
1 row created.
SQL> insert into t values ( date '2000-02-29',date '2016-02-29',16 );
1 row created.
SQL> insert into t values ( date '2000-02-29',date '2017-02-28',16 );
1 row created.
SQL> insert into t values ( date '2000-02-29',date '2087-02-28',86 );
1 row created.
SQL> insert into t values ( date '2000-01-01',date '2087-12-31',87 );
1 row created.
Using this data, lets look at common yet flawed mechanism for calculating age. This first one I’ve seen many times over the course of my career, namely to do a simple date subtraction. Sometimes people divide by 365, or 366 or even 365.25, but no matter what divisor is chosen, you’ll always hit some errors in the calculation.
SQL> select
2 x.*,
3 case when true_age = calculated_age then 'OK' else 'ERROR' end tag
4 from
5 (
6 select
7 t.*,
8 trunc((sys_date-birth)/365) calculated_age
9 from t
10 ) x;
BIRTH SYS_DATE TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16 15 15 OK
12-MAY-00 30-MAY-16 16 16 OK
12-MAY-00 12-MAY-16 16 16 OK
12-MAY-00 12-JUN-76 76 76 OK
29-FEB-00 01-MAR-13 13 13 OK
29-FEB-00 29-FEB-16 16 16 OK
29-FEB-00 28-FEB-17 16 17 ERROR
29-FEB-00 28-FEB-87 86 87 ERROR
01-JAN-00 31-DEC-87 87 88 ERROR
9 rows selected.
Side note: Date subtraction in itself needs care. See my recent post for more details on this.
A better method is to take advantage of the MONTHS_BETWEEN function in the Oracle Database, because unlike the days in a year which is variable, there is always 12 months in a year, so this sounds more promising as a divisor. However, as you can see below, even MONTHS_BETWEEN struggles when someone has a date of birth on a “leap day”. How annoying of them
SQL> select
2 x.*,
3 case when true_age = calculated_age then 'OK' else 'ERROR' end tag
4 from
5 (
6 select
7 t.*,
8 trunc(months_between(sys_date,birth)/12) calculated_age
9 from t
10 ) x;
BIRTH SYS_DATE TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16 15 15 OK
12-MAY-00 30-MAY-16 16 16 OK
12-MAY-00 12-MAY-16 16 16 OK
12-MAY-00 12-JUN-76 76 76 OK
29-FEB-00 01-MAR-13 13 13 OK
29-FEB-00 29-FEB-16 16 16 OK
29-FEB-00 28-FEB-17 16 17 ERROR
29-FEB-00 28-FEB-87 86 87 ERROR
01-JAN-00 31-DEC-87 87 87 OK
9 rows selected.
Somewhat counter-intuitively, a reliable formula to derive a persons age is to dispense with the date arithmetic altogether and convert the dates to numbers.
SQL> select
2 x.*,
3 case when true_age = calculated_age then 'OK' else 'ERROR' end tag
4 from
5 (
6 select
7 t.*,
8 trunc((to_number(to_char(sys_date,'YYYYMMDD'))-
9 to_number(to_char(birth,'YYYYMMDD')))/10000) calculated_age
10 from t
11 ) x;
BIRTH SYS_DATE TRUE_AGE CALCULATED_AGE TAG
--------- --------- ---------- -------------- -----
12-MAY-00 30-MAR-16 15 15 OK
12-MAY-00 30-MAY-16 16 16 OK
12-MAY-00 12-MAY-16 16 16 OK
12-MAY-00 12-JUN-76 76 76 OK
29-FEB-00 01-MAR-13 13 13 OK
29-FEB-00 29-FEB-16 16 16 OK
29-FEB-00 28-FEB-17 16 16 OK
29-FEB-00 28-FEB-87 86 86 OK
01-JAN-00 31-DEC-87 87 87 OK
9 rows selected.
You can see how this neat little trick works by looking at the components of the expression
SQL> select
2 to_number(to_char(sys_date,'YYYYMMDD')) d1,
3 to_number(to_char(birth,'YYYYMMDD')) d2,
4 to_number(to_char(sys_date,'YYYYMMDD')) -
5 to_number(to_char(birth,'YYYYMMDD')) delt
6 from t;
D1 D2 DELT
---------- ---------- ----------
20160330 20000512 159818
20160530 20000512 160018
20160512 20000512 160000
20760612 20000512 760100
20130301 20000229 130072
20160229 20000229 160000
20170228 20000229 169999
20870228 20000229 869999
20871231 20000101 871130
9 rows selected.
The conversion to numbers means that elements like Feb 28th and Feb 29th are still distinct as opposed to both being seen as “the last day of Feburary” by internal date functions, and (for example on the second last line), the age calculation stay just below 87 as opposed to crossing over the boundary to 87.
Now that you have a reliable formula, you can easily add the AGE as a column in a view on your table and not have to worry about the internals.
Why you are using 10000 to findout the age?? May I know the logic
We are converting the dates to YYYYxxxx
So the years are in effect, multiples of 10,000 (because the ‘xxxx’ is capped at 9999). The ‘xxxx’ (ie months/days) becomes the means via which we decide on boundary points, that is, when comparing YYYYxxxx to ZZZZwwww, we only need to care about whether the ‘xxxx’ or greater or less than ‘wwww’ to decide whether a year has “ticked over”. THis way, we dont need to worry about leap years etc etc
I didnt “invent” this – I saw it many years ago somewhere, so apologies for not doing a proper acknowledgement to whoever first used it.