Attribute clustering (part 2)

In the previous post, I demonstrated attribute clustering by

creating a table of source data which contained data in randomised order, via


SQL> create table source_data  as
  2      select d.*
  3      from dba_objects d
  4      where object_id is not null
  5      order by dbms_random.random;

and then it loading into a table with attribute clustering defined to cluster the data into the desired order.

The obvious question is then “Why not just cluster the data by combining the two operations ?”, namely


SQL> create table t1
  2  clustering by linear order(object_id)
  3  as select d.* from dba_objects d
  4  order by dbms_random.random;

There was a reason I did my first demo in the way I did. Let’s look at the execution plan for that CREATE statement. Now, bear with me, for simplicity sake, rather than show the execution plan for DBA_OBJECTS access (which (which is a huge complex view), I’ll use SOURCE_DATA both with and without ORDER BY clauses as a representative.


SQL> create table t1
  2  clustering by linear order(object_id)
  3  as select * from source_data;

------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes |
------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |             | 99386 |    10M|
|   1 |  LOAD AS SELECT                  | T1          |       |       |
|   2 |   OPTIMIZER STATISTICS GATHERING |             | 99386 |    10M|
|   3 |    SORT ORDER BY                 |             | 99386 |    10M|
|   4 |     TABLE ACCESS FULL            | SOURCE_DATA | 99386 |    10M|
------------------------------------------------------------------------

SQL> create table t1
  2  clustering by linear order(object_id)
  3  as select * from dba_objects
  4  order by dbms_random.random;
  
------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes |
------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |             | 99386 |    10M|
|   1 |  LOAD AS SELECT                  | T1          |       |       |
|   2 |   OPTIMIZER STATISTICS GATHERING |             | 99386 |    10M|
|   3 |    SORT ORDER BY                 |             | 99386 |    10M|
|   4 |     TABLE ACCESS FULL            | SOURCE_DATA | 99386 |    10M|
------------------------------------------------------------------------

Notice that the execution plans appear the same. We can see a SORT ORDER BY in the first CTAS, even though there is no ORDER BY in the SELECT part. We can hence reasonably assume this is to achieve the desired clustering. But the second CTAS raises an interesting question. If I have to sort the data for the clustering and there was an ORDER BY clause in the query, which one of those, if any, does the SORT ORDER BY in the execution relate to ? Well, lets run our CTAS command and find out.


SQL> create table t1
  2  clustering by linear order(object_id)
  3  as select * from source_data
  4  order by dbms_random.random;

Table created.

SQL>
SQL> select object_id, owner from t1 where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
     72256 SYS
     29101 PUBLIC
     26037 SYS
     47229 PUBLIC
     54002 SYS
     65203 PUBLIC
     49264 SYS
     71998 SYS
     91256 APEX_040200
     85311 PUBLIC
     72948 SYS
     68626 SYS
       337 SYS
     54564 SYS
     11256 SYS
     40694 SYS
     44193 PUBLIC
     31856 SYS
     26153 SYS
     25604 PUBLIC
  ...
  ...

Notice that the data is not clustered as per our expectation. It looks like the final ORDER BY took precedence over our attempt to cluster the data.

So you need to be careful here when using attribute clustering. The database engine is quite particular on the syntax and ordering operations.

In the example above, simply in-lining the order by statement resolves the issue


SQL> create table t1
  2  clustering by linear order(object_id)
  3  as
  4  select *
  5  from
  6  ( select * from source_data order by dbms_random.random );

Table created.

SQL>
SQL> select object_id, owner from t1 where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
         2 SYS
         3 SYS
         4 SYS
         5 SYS
         6 SYS
         7 SYS
         8 SYS
         9 SYS
        10 SYS
        11 SYS
        12 SYS
        13 SYS
        14 SYS
        15 SYS
        16 SYS
        17 SYS
        ...
        ...

And just in case you are wondering, there was no change the execution plan for the above statement. That is, you don’t see any view resolution and multiple SORT ORDER BY lines. It looks as before


------------------------------------------------------------------------
| Id  | Operation                        | Name        | Rows  | Bytes |
------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT           |             | 99386 |    10M|
|   1 |  LOAD AS SELECT                  | T1          |       |       |
|   2 |   OPTIMIZER STATISTICS GATHERING |             | 99386 |    10M|
|   3 |    SORT ORDER BY                 |             | 99386 |    10M|
|   4 |     TABLE ACCESS FULL            | SOURCE_DATA | 99386 |    10M|
------------------------------------------------------------------------

But the 1 level of in-lining was enough to fix the issue. So always validate that you are getting the clustering you are expecting.

4 responses to “Attribute clustering (part 2)”

Attribute clustering (part 3) | Learning is not a spectator sport

November 25, 2016 at 12:58 pm

[…] in part1 and part2, we looked at creating tables with clustered data. If you’re ready to climb aboard the […]

Loading…

BC

July 23, 2017 at 8:23 am

Connor,

What is the advantage of using clustering instead of just using CTAS and order by?

Thanks

BC

Loading…

1. connormcdonald
  
  January 15, 2018 at 2:01 pm
  
  More for ongoing operations. You could simply issue:
  
  alter table T move
  
  from time to time to re-cluster your data. In particular, with 12.2 you could do it online
  
  Loading…
  
Attribute clustering….super cool – Learning is not a spectator sport

October 18, 2018 at 6:30 pm

[…] spoken about attribute clustering before here, here and here. So from that you can probably glean that I’m a […]

Loading…