Work colleague Chris Saxon published a post about GROUP BY not guaranteeing that a sort would occur, and concluded (correctly) that the only way to ensure that a query result will be sorted is to have an ORDER BY.

Modern versions of the database will use a hashing mechanism to perform a GROUP BY. If you want a more detailed dive into that, check out this video

In the comments section of Chris’s post, someone mentioned “in a prior release a sort would be issued” which was a reference to a time before we started using the GROUP BY HASH operation. However, even without a hash operation, the statement is incorrect.

It would have been correct if we swapped “would” for “could”, namely, that in most situations a sort would be performed. The problem was, developers back in the day would assume that the sort was hence mandatory and thus use a GROUP BY in lieu of a sort which is when all the trouble started.

But “most” is not the same as “always”. Lets look at a couple of examples, which go all the way back to Oracle 7 thus proving you could never blindly assume a sort would occur.

Example 1: We GROUP BY but do not sort.


SQL> create table tab ( pk1 int,
  2   pk2 int,
  3   c1 int,
  4   c2 int,
  5  primary key(pk1,pk2) );

Table created.

SQL>
SQL> insert into tab values (1, 1, 2, 2 );

1 row created.

SQL> insert into tab values (2, 1, 1, 1 );

1 row created.

SQL>
SQL> create index tab_idx on tab(pk1,c1,pk2,c2);

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select c1, c2, pk1, pk2, count(*)
  2  from tab
  3  group by c1, c2, pk1, pk2;

Execution Plan
----------------------------------------------------------
Plan hash value: 1837474081

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     2 |   104 |     1   (0)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT|         |     2 |   104 |     1   (0)| 00:00:01 |
|   2 |   INDEX FULL SCAN    | TAB_IDX |     2 |   104 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

You can see the existence an operation called GROUP BY NOSORT, which as the name suggests…did not do a sort.

Examples 2: We don’t even do a GROUP BY !


SQL> create table tab ( pk1 int,
  2     c1 int,
  3     c2 int,
  4    primary key(pk1) );

Table created.

SQL>
SQL> insert into tab values (1, 2, 2 );

1 row created.

SQL> insert into tab values (2, 1, 1 );

1 row created.

SQL> create index tab_idx on tab(pk1,c1,c2);

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select c1, c2, pk1, count(*)
  2    from tab
  3    group by c1, c2, pk1;

Execution Plan
----------------------------------------------------------
Plan hash value: 41970856

----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |     2 |    78 |     1   (0)| 00:00:01 |
|   1 |  INDEX FULL SCAN | TAB_IDX |     2 |    78 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Bottom Line – If you want something sorted, you must have an ORDER BY

Ho Ho Ho … Merry Christmas!

2 responses to “Kris Kringle the Database – GROUP BY out of sorts?”

    1. Yup – that’s where I sourced some of this demo (Chris and I look after AskTOM). Plenty of other examples as well (partitioning, external tables, parallel queries, the list goes on).

      We’ll keep telling people to add an ORDER BY …. and people will keep on not listening 🙂

Leave a reply to sqlmonster7621ed93a5 Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending