ROWNUM and ORDER BY

Just a quick revisit on an old topic so I could link to this post in a Stack Overflow discussion.



SQL> create table t ( x int );

Table created.

SQL>
SQL> insert into t
  2  select dbms_random.value(1,1000)
  3  from dual
  4  connect by level <= 100;

100 rows created.

SQL>
SQL> select count(*) from t;

  COUNT(*)
----------
       100

1 row selected.

--
-- DOES NOT WORK
--
SQL>
SQL> select * from t
  2  where rownum <= 10
  3  order by x desc;

         X
----------
       995
       909
       873
       871
       640
       555
       318
       197
       134
        70

10 rows selected.

--
-- CORRECT ALTERNATIVES
--

SQL>
SQL> select *
  2  from
  3    ( select * from t order by x desc )
  4  where rownum <= 10;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

SQL> select * from t
  2  order by x desc
  3  fetch first 10 rows only;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

SQL>
SQL> select x
  2  from
  3    ( select t.*,
  4         row_number() over ( order by x desc ) r
  5      from t  )
  6  where r <= 10;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

As you can see, you cannot simply have WHERE ROWNUM and ORDER BY in the same SELECT statement if you are after the “top” rows based on some criteria.

Never rely on an assumed order

We’ve (hopefully) all had it drummed into us enough times – you cannot assume the order of results from queries unless you explicitly include an ORDER BY statement.

Here’s another trivial example of this- I was doing a little demo script for an AskTom question.

Here’s the script running in 11.2.0.4


SQL> create table T
  2  as
  3  select
  4    rownum c1,
  5    mod(rownum,10) c2,
  6    trunc(rownum/1000) c3,
  7    mod(rownum,100) c4,
  8    trunc(rownum/100) c5,
  9    mod(rownum,1000) c6,
 10    trunc(rownum/10) c7
 11  from dual
 12  connect by level <= 100;

Table created.

SQL>
SQL> begin
  2  for i in 1 .. 7 loop
  3    execute immediate 'create bitmap index IX'||i||' on T ( c'||i||')';
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select
  2    index_name,
  3    leaf_blocks,
  4    avg_leaf_blocks_per_key,
  5    avg_data_blocks_per_key
  6  from user_indexes
  7  where table_name = 'T';

INDEX_NAME                     LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ----------- ----------------------- -----------------------
IX1                                      1                       1                       1
IX2                                      1                       1                       1
IX3                                      1                       1                       1
IX4                                      1                       1                       1
IX5                                      1                       1                       1
IX6                                      1                       1                       1
IX7                                      1                       1                       1

And here’s the same script running in 12.1.0.2


SQL> create table T
  2  as
  3  select
  4    rownum c1,
  5    mod(rownum,10) c2,
  6    trunc(rownum/1000) c3,
  7    mod(rownum,100) c4,
  8    trunc(rownum/100) c5,
  9    mod(rownum,1000) c6,
 10    trunc(rownum/10) c7
 11  from dual
 12  connect by level <= 100;

Table created.

SQL>
SQL> begin
  2  for i in 1 .. 7 loop
  3    execute immediate 'create bitmap index IX'||i||' on T ( c'||i||')';
  4  end loop;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select
  2    index_name,
  3    leaf_blocks,
  4    avg_leaf_blocks_per_key,
  5    avg_data_blocks_per_key
  6  from user_indexes
  7  where table_name = 'T';

INDEX_NAME                     LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ----------- ----------------------- -----------------------
IX7                                      1                       1                       1
IX6                                      1                       1                       1
IX5                                      1                       1                       1
IX4                                      1                       1                       1
IX3                                      1                       1                       1
IX2                                      1                       1                       1
IX1                                      1                       1                       1

A little known ORDER BY extension

Within a CONNECT BY statement, you can order siblings, that is, under a particular branch of the hierarchy, the child entries can be ordered, but you still preserve the hierarchy.


SQL> select lpad('*', level, '*' ) || ename ename
  2  from emp
  3  start with mgr is null
  4  connect by prior empno = mgr
  5  order SIBLINGS by ename
  6  /

ENAME
------------------------------
*KING
**BLAKE
***ALLEN      <==| 
***JAMES      <==|
***MARTIN     <==| ordered within the "BLAKE" branch
***TURNER     <==|
***WARD       <==|
**CLARK
***MILLER
**JONES
***FORD
****SMITH
***SCOTT
****ADAMS

14 rows selected.

SQL> select lpad('*', level, '*' ) || ename ename
  2  from emp
  3  start with mgr is null
  4  connect by prior empno = mgr
  5  order SIBLINGS by ename DESC
  6  /

ENAME
------------------------------
*KING
**JONES
***SCOTT
****ADAMS
***FORD
****SMITH
**CLARK
***MILLER
**BLAKE
***WARD
***TURNER
***MARTIN
***JAMES
***ALLEN

14 rows selected.

Mr DISTINCT might not be your friend

Whenever you have the need to use the DISTINCT keyword, its worth just pausing for a second, and making sure that you are not hiding just a larger issue. It actually might represent either incorrect use of SQL or incorrect assumptions from the data model.

Consider the following example

SELECT DISTINCT d.dname
FROM   emp e, dept d
WHERE  e.ename = 'SMITH'
AND    e.deptno = d.deptno

The query is certainly valid, but when I see “distinct” I ask myself the following questions:

Has the DISTINCT has been added in an attempt to only return a single row ?, ie, is someone working under the assumption being that an employee name can only refer to a single department ? Unless there is a unique constraint on the ENAME column, then we can still just as easily get multiple rows back (even with the DISTINCT), so the SQL will be a “sleeping problem” in the application until the data causes it to fail.

Because the DISTINCT keyword here:

  • doesn’t give you any guarantee that you’ll only get one row, and
  • might be making the database do excessive work (get all rows, sort them and remove duplicates), every single time you run this SQL

Alternatively, perhaps there is meant to be a single department for an ENAME, ie, ENAME is unique. If that is the case, then we should change the database model/design to reflect it and not use DISTINCT to “fake” it.

I’m not saying DISTINCT is always a bug – but it’s always worth just asking yourself if you were correct to be using it.