Never rely on an assumed order

We’ve (hopefully) all had it drummed into us enough times – you cannot assume the order of results from queries unless you explicitly include an ORDER BY statement.

Here’s another trivial example of this- I was doing a little demo script for an AskTom question.

Here’s the script running in 11.2.0.4


SQL> create table T
  2  as
  3  select
  4    rownum c1,
  5    mod(rownum,10) c2,
  6    trunc(rownum/1000) c3,
  7    mod(rownum,100) c4,
  8    trunc(rownum/100) c5,
  9    mod(rownum,1000) c6,
 10    trunc(rownum/10) c7
 11  from dual
 12  connect by level <= 100;

Table created.

SQL>
SQL> begin
  2  for i in 1 .. 7 loop
  3    execute immediate 'create bitmap index IX'||i||' on T ( c'||i||')';
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select
  2    index_name,
  3    leaf_blocks,
  4    avg_leaf_blocks_per_key,
  5    avg_data_blocks_per_key
  6  from user_indexes
  7  where table_name = 'T';

INDEX_NAME                     LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ----------- ----------------------- -----------------------
IX1                                      1                       1                       1
IX2                                      1                       1                       1
IX3                                      1                       1                       1
IX4                                      1                       1                       1
IX5                                      1                       1                       1
IX6                                      1                       1                       1
IX7                                      1                       1                       1

And here’s the same script running in 12.1.0.2


SQL> create table T
  2  as
  3  select
  4    rownum c1,
  5    mod(rownum,10) c2,
  6    trunc(rownum/1000) c3,
  7    mod(rownum,100) c4,
  8    trunc(rownum/100) c5,
  9    mod(rownum,1000) c6,
 10    trunc(rownum/10) c7
 11  from dual
 12  connect by level <= 100;

Table created.

SQL>
SQL> begin
  2  for i in 1 .. 7 loop
  3    execute immediate 'create bitmap index IX'||i||' on T ( c'||i||')';
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select
  2    index_name,
  3    leaf_blocks,
  4    avg_leaf_blocks_per_key,
  5    avg_data_blocks_per_key
  6  from user_indexes
  7  where table_name = 'T';

INDEX_NAME                     LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ----------- ----------------------- -----------------------
IX7                                      1                       1                       1
IX6                                      1                       1                       1
IX5                                      1                       1                       1
IX4                                      1                       1                       1
IX3                                      1                       1                       1
IX2                                      1                       1                       1
IX1                                      1                       1                       1

A little known ORDER BY extension

Within a CONNECT BY statement, you can order siblings, that is, under a particular branch of the hierarchy, the child entries can be ordered, but you still preserve the hierarchy.


SQL> select lpad('*', level, '*' ) || ename ename
  2  from emp
  3  start with mgr is null
  4  connect by prior empno = mgr
  5  order SIBLINGS by ename
  6  /

ENAME
------------------------------
*KING
**BLAKE
***ALLEN      <==| 
***JAMES      <==|
***MARTIN     <==| ordered within the "BLAKE" branch
***TURNER     <==|
***WARD       <==|
**CLARK
***MILLER
**JONES
***FORD
****SMITH
***SCOTT
****ADAMS

14 rows selected.

SQL> select lpad('*', level, '*' ) || ename ename
  2  from emp
  3  start with mgr is null
  4  connect by prior empno = mgr
  5  order SIBLINGS by ename DESC
  6  /

ENAME
------------------------------
*KING
**JONES
***SCOTT
****ADAMS
***FORD
****SMITH
**CLARK
***MILLER
**BLAKE
***WARD
***TURNER
***MARTIN
***JAMES
***ALLEN

14 rows selected.

Mr DISTINCT might not be your friend

Whenever you have the need to use the DISTINCT keyword, its worth just pausing for a second, and making sure that you are not hiding just a larger issue. It actually might represent either incorrect use of SQL or incorrect assumptions from the data model.

Consider the following example

SELECT DISTINCT d.dname
FROM   emp e, dept d
WHERE  e.ename = 'SMITH'
AND    e.deptno = d.deptno

The query is certainly valid, but when I see “distinct” I ask myself the following questions:

Has the DISTINCT has been added in an attempt to only return a single row ?, ie, is someone working under the assumption being that an employee name can only refer to a single department ? Unless there is a unique constraint on the ENAME column, then we can still just as easily get multiple rows back (even with the DISTINCT), so the SQL will be a “sleeping problem” in the application until the data causes it to fail.

Because the DISTINCT keyword here:

  • doesn’t give you any guarantee that you’ll only get one row, and
  • might be making the database do excessive work (get all rows, sort them and remove duplicates), every single time you run this SQL

Alternatively, perhaps there is meant to be a single department for an ENAME, ie, ENAME is unique. If that is the case, then we should change the database model/design to reflect it and not use DISTINCT to “fake” it.

I’m not saying DISTINCT is always a bug – but it’s always worth just asking yourself if you were correct to be using it.