Often, to do an existence check for data, people issue a “SELECT COUNT(…)” against the relevant table. So in pseudo-code, the logic looks like something like this:
select count(*)
into my_variable
from MY_TABLE
where COL =
if my_variable > 0 then ...
For unique key lookups, then that’s fine, but it’s a risky strategy as a general premise, because counting all of the records can be costly, especially if you’re only interested in the first one. You are coding always hoping that some index lookup is going to be possible. And in strict terms, you actually are not meeting the requirement. Someone asked you to check for existence, NOT to get an exact count of how many matches there were. You are doing more work than you need to.
So here are some potentially better alternatives:
Option 1: stop after the first row
This is a quick and easy win, because you are simply adding a predicate to the existing code.
select count(*)
into my_variable
from MY_TABLE
where COL =
and rownum = 1
if my_variable > 0 then ...
Option 2: use an existence check
This is often referred to as making the SQL “self-documenting”, because the code is describing the kind of check you were doing, that is, you were interested in existence, not the count
select count(*)
into my_variable
from dual
where exists (
select *
from MY_TABLE
where COL = )
if my_variable > 0 then ..
The big caveat…
Perhaps the best way to check if something exists is sometimes just not to check at all ! Because lets face it, if your code looks something like:
<check for existence>
if <not exists> then
insert into …
then you really haven’t achieved much. The check for existence does nothing to ensure that by the time you execute the INSERT, the row doesn’t now exist, so your code still has to handle the case where you tried to insert a row and found it already there.
And if you have to do that…then did you really get any benefit in running the <check for existence> code in the first place ?
Your option 1 will not stop that count at first row found in the source table, only limit the resultset to one row (which it would yield anyway).
As per documentation ‘For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.’
Try this 🙂
SQL> create table T ( x int, y int );
Table created.
SQL> insert /*+ APPEND */ into T
2 select rownum, decode(rownum,5000000,1,rownum)
3 from
4 ( Select 1 from dual connect by level <= 1000 ),
5 ( Select 1 from dual connect by level <= 5000 );
5000000 rows created.
SQL> commit;
Commit complete.
SQL> set autotrace on stat
SQL> select count(*) from t where y = 1;
COUNT(*)
———-
2
Statistics
———————————————————-
1 recursive calls
0 db block gets
11467 consistent gets
11462 physical reads
0 redo size
542 bytes sent via SQL*Net to client
551 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from t where y = 1 and rownum = 1;
COUNT(*)
———-
1
Statistics
———————————————————-
1 recursive calls
0 db block gets
4 consistent gets
13 physical reads
0 redo size
542 bytes sent via SQL*Net to client
551 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Hi,
Very good explanation and a good point to note too.
Is that insert statement correct? I
SQL> insert /*+ APPEND */ into T
2 select rownum, decode(rownum,5000000,1,rownum)
3 from
4 ( Select 1 from dual connect by level ed
Wrote file afiedt.buf
1 insert /*+ APPEND */ into T
2 select rownum, decode(rownum,5000000,1,rownum)
3 from
4 ( Select 1 from dual connect by level <= 1000 ),
5* ( Select 1 from dual connect by level /
5000000 rows created.
SQL> commit;
Commit complete.
Thanks again!!
Thanks for that – output got mangled when I posted it. Should be fixed now.
I tested several simliar examples too, but in the context of exception handling: http://orasql.org/2012/05/18/about-the-performance-of-exception-handling/
True in moderation, but in case of high concurrency, failed inserts on duplicate key can lead to deadlocks.
the following query is the best (provoided statistics are not stale)
select num_rows from user_tables where table_name = ‘DEPT’;
it is a good idea to check when the table has been last analyzed:
select LAST_ANALYZED from user_TAB_STATISTICS where table_name = ‘DEPT’ and owner = USER;
if we know that the table is reasonably stable and nobody has truncated it etc. then num_rows will work nicely