Existence checks

Posted by

Often, to do an existence check for data, people issue a “SELECT COUNT(…)” against the relevant table. So in pseudo-code, the logic looks like something like this:



select count(*)
into   my_variable
from   MY_TABLE
where  COL = 

if my_variable > 0 then ...


For unique key lookups, then that’s fine, but it’s a risky strategy as a general premise, because counting all of the records can be costly, especially if you’re only interested in the first one. You are coding always hoping that some index lookup is going to be possible. And in strict terms, you actually are not meeting the requirement. Someone asked you to check for existence, NOT to get an exact count of how many matches there were. You are doing more work than you need to.

So here are some potentially better alternatives:

Option 1: stop after the first row

This is a quick and easy win, because you are simply adding a predicate to the existing code.



select count(*)
into   my_variable
from   MY_TABLE
where  COL = 
and    rownum = 1

if my_variable > 0 then ...


Option 2: use an existence check

This is often referred to as making the SQL “self-documenting”, because the code is describing the kind of check you were doing, that is, you were interested in existence, not the count


select count(*)
into my_variable
from dual
where exists (
    select *
    from MY_TABLE
    where COL = )

if my_variable > 0 then ..


The big caveat…

Perhaps the best way to check if something exists is sometimes just not to check at all ! Because lets face it, if your code looks something like:

<check for existence>

if <not exists> then
insert into …

then you really haven’t achieved much. The check for existence does nothing to ensure that by the time you execute the INSERT, the row doesn’t now exist, so your code still has to handle the case where you tried to insert a row and found it already there.

And if you have to do that…then did you really get any benefit in running the <check for existence> code in the first place ?

8 comments

  1. Your option 1 will not stop that count at first row found in the source table, only limit the resultset to one row (which it would yield anyway).

    As per documentation ‘For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.’

  2. Try this 🙂

    SQL> create table T ( x int, y int );

    Table created.

    SQL> insert /*+ APPEND */ into T
    2 select rownum, decode(rownum,5000000,1,rownum)
    3 from
    4 ( Select 1 from dual connect by level <= 1000 ),
    5 ( Select 1 from dual connect by level <= 5000 );

    5000000 rows created.

    SQL> commit;

    Commit complete.

    SQL> set autotrace on stat
    SQL> select count(*) from t where y = 1;

    COUNT(*)
    ———-
    2

    Statistics
    ———————————————————-
    1 recursive calls
    0 db block gets
    11467 consistent gets
    11462 physical reads
    0 redo size
    542 bytes sent via SQL*Net to client
    551 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL> select count(*) from t where y = 1 and rownum = 1;

    COUNT(*)
    ———-
    1

    Statistics
    ———————————————————-
    1 recursive calls
    0 db block gets
    4 consistent gets
    13 physical reads
    0 redo size
    542 bytes sent via SQL*Net to client
    551 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    1. Hi,

      Very good explanation and a good point to note too.
      Is that insert statement correct? I

      SQL> insert /*+ APPEND */ into T
      2 select rownum, decode(rownum,5000000,1,rownum)
      3 from
      4 ( Select 1 from dual connect by level ed
      Wrote file afiedt.buf

      1 insert /*+ APPEND */ into T
      2 select rownum, decode(rownum,5000000,1,rownum)
      3 from
      4 ( Select 1 from dual connect by level <= 1000 ),
      5* ( Select 1 from dual connect by level /

      5000000 rows created.

      SQL> commit;

      Commit complete.

      Thanks again!!

  3. the following query is the best (provoided statistics are not stale)

    select num_rows from user_tables where table_name = ‘DEPT’;

  4. it is a good idea to check when the table has been last analyzed:

    select LAST_ANALYZED from user_TAB_STATISTICS where table_name = ‘DEPT’ and owner = USER;

    if we know that the table is reasonably stable and nobody has truncated it etc. then num_rows will work nicely

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.