One of the nice facilities in 12c is the ability to instruct the database to co-locate data during loading in order to facilitate reduced resource consumption for subsequent queries.
Here’s an example of it in use. First I’ll create table SOURCE_DATA to mimic some randomised incoming data feed
SQL> exec dbms_random.seed(0)
PL/SQL procedure successfully completed.
SQL> drop table source_data purge;
drop table source_data purge
*
ERROR at line 1:
ORA-00942: table or view does not exist
SQL>
SQL> create table source_data as
2 select d.*
3 from dba_objects d
4 where object_id is not null
5 order by dbms_random.random;
Table created.
So let’s now populate (create) our table T with a straight copy of the data from SOURCE_DATA and index a column of interest
SQL> select object_id, owner from t where rownum < 50;
OBJECT_ID OWNER
---------- ------------------------------
59002 PUBLIC
50435 SYS
37276 SYS
2280 SYS
65169 PUBLIC
36054 SYS
47375 SYS
66267 PUBLIC
34259 SYS
74791 APEX_050100
37002 SYS
30269 SYS
33423 SYS
70990 MDSYS
29479 SYS
33347 SYS
49426 SYS
44222 SYS
17796 PUBLIC
44212 SYS
35609 SYS
45084 SYS
29310 SYS
55452 SYS
30311 SYS
...
SQL> create index t_idx on t(object_id);
Index created.
You can see the data is in pseudo-random order on both OBJECT_ID and OWNER. (We only care about OBJECT_ID for the sake of this demo). Let us now see what the I/O of an index range scan query on OBJECT_ID costs us.
SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;
4997 rows selected.
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5339 consistent gets
12 physical reads
0 redo size
629403 bytes sent via SQL*Net to client
4214 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4997 rows processed
SQL> set autotrace off
SQL>
SQL>
Just under 5400 consistent gets, which is not dissimilar from the rows we fetched, because those rows from OBJECT_ID 10,000 to 15,000 they are scattered throughout the table. Let us now repeat the exercise, this time using the clustering facilities available under 12c.
SQL>
SQL> drop table t purge;
Table dropped.
SQL>
SQL> create table t
2 clustering by linear order(object_id)
3 as select * from source_data;
Table created.
SQL> select object_id, owner from t where rownum < 50;
OBJECT_ID OWNER
---------- ------------------------------
2 SYS
3 SYS
4 SYS
5 SYS
6 SYS
7 SYS
8 SYS
9 SYS
10 SYS
11 SYS
12 SYS
13 SYS
14 SYS
15 SYS
16 SYS
17 SYS
18 SYS
19 SYS
20 SYS
21 SYS
22 SYS
23 SYS
24 SYS
25 SYS
26 SYS
27 SYS
28 SYS
29 SYS
30 SYS
31 SYS
32 SYS
33 SYS
34 SYS
35 SYS
36 SYS
37 SYS
38 SYS
39 SYS
40 SYS
41 SYS
42 SYS
43 SYS
44 SYS
45 SYS
46 SYS
...
...
SQL> create index t_idx on t(object_id);
Index created.
SQL>
SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;
4997 rows selected.
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
756 consistent gets
12 physical reads
0 redo size
629403 bytes sent via SQL*Net to client
4214 bytes received via SQL*Net from client
335 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4997 rows processed
SQL> set autotrace off
SQL>
SQL>
Before we even used the index, you could see from the sample of rows we queried from the table, the data has been stored in an ordered sequence. As a result, our index lookup is dramatically improved, dropping down to just ~750 consistent gets. Clustering is covered in great detail in the Data Warehousing Guide, but I presented this example to help with my next blog post on the same topic (coming soon).
Some people have mentioned that Attribute Clustering is an engineered system feature only. This is not correct – you’re free to Attribute Clustering in Enterprise Edition. It is only when you want to incorporate Zone Maps that licensing things get more complicated
Looks nice, but is this just a new implementation of clustering by cluster key, available since about (?) Version 6?
Single table hash clusters are I suppose a bit more complex to define and old style clustering was not used much in my experience.
There is a lot more to it that the conventional clustering by key – check the docs. There linear clustering, grouped clustering, data movement handling, and the combination of zone maps can make order of magnitude differences. Plus no overhead of clustered indexes etc, or needing to know the final size of the object in advance.
Thank you for wonderful article on Attribute Clustering.
However I have one question.
If we use the attribute clustering(based on non key attribute), what effect it would create on the fragmentation of any clustered index(based on key attribute ) defined on that table.
Thanks.
If you mean the clustering factor, then any re-ordering of rows in a table can impact the clustering factor in positive (or negative) way. Obviously you want to choose the clustering option that gives you the best overall result.
Other options would include (for example) hash partitioning on “column1” to give a moderate amount of clustering, and then applying attribute clustering on “column2” within those partitions etc.