Attribute clustering (part 1)

One of the nice facilities in 12c is the ability to instruct the database to co-locate data during loading in order to facilitate reduced resource consumption for subsequent queries.

Here’s an example of it in use. First I’ll create table SOURCE_DATA to mimic some randomised incoming data feed



SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL> drop table source_data purge;
drop table source_data purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL>
SQL> create table source_data  as
  2      select d.*
  3      from dba_objects d
  4      where object_id is not null
  5      order by dbms_random.random;

Table created.

So let’s now populate (create) our table T with a straight copy of the data from SOURCE_DATA and index a column of interest


SQL> select object_id, owner from t where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
     59002 PUBLIC
     50435 SYS
     37276 SYS
      2280 SYS
     65169 PUBLIC
     36054 SYS
     47375 SYS
     66267 PUBLIC
     34259 SYS
     74791 APEX_050100
     37002 SYS
     30269 SYS
     33423 SYS
     70990 MDSYS
     29479 SYS
     33347 SYS
     49426 SYS
     44222 SYS
     17796 PUBLIC
     44212 SYS
     35609 SYS
     45084 SYS
     29310 SYS
     55452 SYS
     30311 SYS
...

SQL> create index t_idx on t(object_id);

Index created.

You can see the data is in pseudo-random order on both OBJECT_ID and OWNER. (We only care about OBJECT_ID for the sake of this demo). Let us now see what the I/O of an index range scan query on OBJECT_ID costs us.


SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5339  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Just under 5400 consistent gets, which is not dissimilar from the rows we fetched, because those rows from OBJECT_ID 10,000 to 15,000 they are scattered throughout the table. Let us now repeat the exercise, this time using the clustering facilities available under 12c.


SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t
  2  clustering by linear order(object_id)
  3  as select * from source_data;

Table created.

SQL> select object_id, owner from t where rownum < 50;

 OBJECT_ID OWNER
---------- ------------------------------
         2 SYS
         3 SYS
         4 SYS
         5 SYS
         6 SYS
         7 SYS
         8 SYS
         9 SYS
        10 SYS
        11 SYS
        12 SYS
        13 SYS
        14 SYS
        15 SYS
        16 SYS
        17 SYS
        18 SYS
        19 SYS
        20 SYS
        21 SYS
        22 SYS
        23 SYS
        24 SYS
        25 SYS
        26 SYS
        27 SYS
        28 SYS
        29 SYS
        30 SYS
        31 SYS
        32 SYS
        33 SYS
        34 SYS
        35 SYS
        36 SYS
        37 SYS
        38 SYS
        39 SYS
        40 SYS
        41 SYS
        42 SYS
        43 SYS
        44 SYS
        45 SYS
        46 SYS
...
...

SQL> create index t_idx on t(object_id);

Index created.

SQL>
SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        756  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Before we even used the index, you could see from the sample of rows we queried from the table, the data has been stored in an ordered sequence. As a result, our index lookup is dramatically improved, dropping down to just ~750 consistent gets.  Clustering is covered in great detail in the Data Warehousing Guide, but I presented this example to help with my next blog post on the same topic (coming soon).

Some people have mentioned that Attribute Clustering is an engineered system feature only.  This is not correct – you’re free to Attribute Clustering in Enterprise Edition. It is only when you want to incorporate Zone Maps that licensing things get more complicated Smile

7 Comments on “Attribute clustering (part 1)

  1. Looks nice, but is this just a new implementation of clustering by cluster key, available since about (?) Version 6?

    Single table hash clusters are I suppose a bit more complex to define and old style clustering was not used much in my experience.

  2. There is a lot more to it that the conventional clustering by key – check the docs. There linear clustering, grouped clustering, data movement handling, and the combination of zone maps can make order of magnitude differences. Plus no overhead of clustered indexes etc, or needing to know the final size of the object in advance.

  3. Pingback: Attribute clustering (part 2) | Learning is not a spectator sport

  4. Pingback: Attribute clustering (part 3) | Learning is not a spectator sport

  5. Thank you for wonderful article on Attribute Clustering.

    However I have one question.

    If we use the attribute clustering(based on non key attribute), what effect it would create on the fragmentation of any clustered index(based on key attribute ) defined on that table.
    Thanks.

  6. If you mean the clustering factor, then any re-ordering of rows in a table can impact the clustering factor in positive (or negative) way. Obviously you want to choose the clustering option that gives you the best overall result.

    Other options would include (for example) hash partitioning on “column1” to give a moderate amount of clustering, and then applying attribute clustering on “column2” within those partitions etc.

  7. Pingback: Attribute clustering….super cool | Learning is not a spectator sport

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: