In the previous post, I demonstrated attribute clustering by
- creating a table of source data which contained data in randomised order, via
SQL> create table source_data as
2 select d.*
3 from dba_objects d
4 where object_id is not null
5 order by dbms_random.random;
- and then it loading into a table with attribute clustering defined to cluster the data into the desired order.
The obvious question is then “Why not just cluster the data by combining the two operations ?”, namely
SQL> create table t1
2 clustering by linear order(object_id)
3 as select d.* from dba_objects d
4 order by dbms_random.random;
There was a reason I did my first demo in the way I did. Let’s look at the execution plan for that CREATE statement. Now, bear with me, for simplicity sake, rather than show the execution plan for DBA_OBJECTS access (which (which is a huge complex view), I’ll use SOURCE_DATA both with and without ORDER BY clauses as a representative.
SQL> create table t1
2 clustering by linear order(object_id)
3 as select * from source_data;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 99386 | 10M|
| 1 | LOAD AS SELECT | T1 | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 99386 | 10M|
| 3 | SORT ORDER BY | | 99386 | 10M|
| 4 | TABLE ACCESS FULL | SOURCE_DATA | 99386 | 10M|
------------------------------------------------------------------------
SQL> create table t1
2 clustering by linear order(object_id)
3 as select * from dba_objects
4 order by dbms_random.random;
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 99386 | 10M|
| 1 | LOAD AS SELECT | T1 | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 99386 | 10M|
| 3 | SORT ORDER BY | | 99386 | 10M|
| 4 | TABLE ACCESS FULL | SOURCE_DATA | 99386 | 10M|
------------------------------------------------------------------------
Notice that the execution plans appear the same. We can see a SORT ORDER BY in the first CTAS, even though there is no ORDER BY in the SELECT part. We can hence reasonably assume this is to achieve the desired clustering. But the second CTAS raises an interesting question. If I have to sort the data for the clustering and there was an ORDER BY clause in the query, which one of those, if any, does the SORT ORDER BY in the execution relate to ? Well, lets run our CTAS command and find out.
SQL> create table t1
2 clustering by linear order(object_id)
3 as select * from source_data
4 order by dbms_random.random;
Table created.
SQL>
SQL> select object_id, owner from t1 where rownum < 50;
OBJECT_ID OWNER
---------- ------------------------------
72256 SYS
29101 PUBLIC
26037 SYS
47229 PUBLIC
54002 SYS
65203 PUBLIC
49264 SYS
71998 SYS
91256 APEX_040200
85311 PUBLIC
72948 SYS
68626 SYS
337 SYS
54564 SYS
11256 SYS
40694 SYS
44193 PUBLIC
31856 SYS
26153 SYS
25604 PUBLIC
...
...
Notice that the data is not clustered as per our expectation. It looks like the final ORDER BY took precedence over our attempt to cluster the data.
So you need to be careful here when using attribute clustering. The database engine is quite particular on the syntax and ordering operations.
In the example above, simply in-lining the order by statement resolves the issue
SQL> create table t1
2 clustering by linear order(object_id)
3 as
4 select *
5 from
6 ( select * from source_data order by dbms_random.random );
Table created.
SQL>
SQL> select object_id, owner from t1 where rownum < 50;
OBJECT_ID OWNER
---------- ------------------------------
2 SYS
3 SYS
4 SYS
5 SYS
6 SYS
7 SYS
8 SYS
9 SYS
10 SYS
11 SYS
12 SYS
13 SYS
14 SYS
15 SYS
16 SYS
17 SYS
...
...
And just in case you are wondering, there was no change the execution plan for the above statement. That is, you don’t see any view resolution and multiple SORT ORDER BY lines. It looks as before
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |
------------------------------------------------------------------------
| 0 | CREATE TABLE STATEMENT | | 99386 | 10M|
| 1 | LOAD AS SELECT | T1 | | |
| 2 | OPTIMIZER STATISTICS GATHERING | | 99386 | 10M|
| 3 | SORT ORDER BY | | 99386 | 10M|
| 4 | TABLE ACCESS FULL | SOURCE_DATA | 99386 | 10M|
------------------------------------------------------------------------
But the 1 level of in-lining was enough to fix the issue. So always validate that you are getting the clustering you are expecting.
Connor,
What is the advantage of using clustering instead of just using CTAS and order by?
Thanks
BC
More for ongoing operations. You could simply issue:
alter table T move
from time to time to re-cluster your data. In particular, with 12.2 you could do it online