Attribute clustering (part 1)

One of the nice facilities in 12c is the ability to instruct the database to co-locate data during loading in order to facilitate reduced resource consumption for subsequent queries.

Here’s an example of it in use. First I’ll create table SOURCE_DATA to mimic some randomised incoming data feed



SQL> exec dbms_random.seed(0)

PL/SQL procedure successfully completed.

SQL> drop table source_data purge;
drop table source_data purge
           *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL>
SQL> create table source_data  as
  2      select d.*
  3      from dba_objects d
  4      where object_id is not null
  5      order by dbms_random.random;

Table created.

So let’s now populate (create) our table T with a straight copy of the data from SOURCE_DATA and index a column of interest


SQL> create table t as select * from source_data;

Table created.

SQL>
SQL> select object_id, owner from t where rownum < 50;  OBJECT_ID OWNER ---------- ------------------------------      37627 PUBLIC      79081 ORDSYS      26953 SYS      74744 PUBLIC      47975 PUBLIC      94003 APEX_050000      59014 SYS      66409 PUBLIC      36977 PUBLIC      74886 PUBLIC        565 SYS      35456 SYS      74656 SYS      61451 PUBLIC      11486 SYS      61444 SYS      44048 SYS      24810 PUBLIC      27578 SYS      68871 PUBLIC      22010 SYS      31915 PUBLIC      60762 SYS      45858 SYS      44413 PUBLIC      61323 PUBLIC      81339 ORDSYS      31560 SYS       8621 SYS      42483 PUBLIC      35172 SYS      88978 APEX_040200      67118 PUBLIC      27550 SYS       7244 SYS      96985 APEX_050000      80677 PUBLIC      65032 SYS      42454 SYS      69067 PUBLIC      15227 SYS      29374 SYS      19280 XDB      45063 PUBLIC      61815 PUBLIC      94727 APEX_050000      31122 SYS      51869 PUBLIC      39565 PUBLIC 49 rows selected. SQL>
SQL> create index t_idx on t(object_id);

Index created.

You can see the data is in pseudo-random order on both OBJECT_ID and OWNER. (We only care about OBJECT_ID for the sake of this demo). Let us now see what the I/O of an index range scan query on OBJECT_ID costs us.


SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5339  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Just under 5400 consistent gets, which is not dissimilar from the rows we fetched, because those rows from OBJECT_ID 10,000 to 15,000 they are scattered throughout the table. Let us now repeat the exercise, this time using the clustering facilities available under 12c.


SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t
  2  clustering by linear order(object_id)
  3  as select * from source_data;

Table created.

SQL>
SQL> select object_id, owner from t where rownum < 50;  OBJECT_ID OWNER ---------- ------------------------------          2 SYS          3 SYS          4 SYS          5 SYS          6 SYS          7 SYS          8 SYS          9 SYS         10 SYS         11 SYS         12 SYS         13 SYS         14 SYS         15 SYS         16 SYS         17 SYS         18 SYS         19 SYS         20 SYS         21 SYS         22 SYS         23 SYS         24 SYS         25 SYS         26 SYS         27 SYS         28 SYS         29 SYS         30 SYS         31 SYS         32 SYS         33 SYS         34 SYS         35 SYS         36 SYS         37 SYS         38 SYS         39 SYS         40 SYS         41 SYS         42 SYS         43 SYS         44 SYS         45 SYS         46 SYS         47 SYS         48 SYS         49 SYS         50 SYS 49 rows selected. SQL>
SQL> create index t_idx on t(object_id);

Index created.

SQL>
SQL> set autotrace traceonly stat
SQL> select /*+ index(t) */ * from t where object_id between 10000 and 15000;

4997 rows selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        756  consistent gets
         12  physical reads
          0  redo size
     629403  bytes sent via SQL*Net to client
       4214  bytes received via SQL*Net from client
        335  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       4997  rows processed

SQL> set autotrace off
SQL>
SQL>

Before we even used the index, you could see from the sample of rows we queried from the table, the data has been stored in an ordered sequence. As a result, our index lookup is dramatically improved, dropping down to just ~750 consistent gets.  Clustering is covered in great detail in the Data Warehousing Guide, but I presented this example to help with my next blog post on the same topic (coming soon).

Some people have mentioned that Attribute Clustering is an engineered system feature only.  This is not correct – you’re free to Attribute Clustering in Enterprise Edition. It is only when you want to incorporate Zone Maps that licensing things get more complicated Smile

4 thoughts on “Attribute clustering (part 1)

  1. Looks nice, but is this just a new implementation of clustering by cluster key, available since about (?) Version 6?

    Single table hash clusters are I suppose a bit more complex to define and old style clustering was not used much in my experience.

  2. There is a lot more to it that the conventional clustering by key – check the docs. There linear clustering, grouped clustering, data movement handling, and the combination of zone maps can make order of magnitude differences. Plus no overhead of clustered indexes etc, or needing to know the final size of the object in advance.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s