Attribute clustering… super cool

I’ve spoken about attribute clustering before here, here and here. So from that you can probably glean that I’m a fan.

I recently spoke about an example of this as well during my AskTOM Office Hours session which you can watch below:

After posting that video, I had a follow-up question which I thought would be worth exploring. My examples so far on attribute clustering have either loaded a table from empty, or issued an “alter table move” on a table that already contained data. In both of those situations, it is reasonable to assume that clustering is going to help, because it aligns conceptually with what we’d expect to happen, namely:

  • read all of the data
  • re-organize all of the data so that it is clustered by the nominated clustering key(s)
  • reload all of the data

But what if we do not have that luxury of moving all of the data around? What if I have a table that is already clustered, but now I need to load more data into it? As we know, we only see clustering in effect if we perform a direct load operation. But a direct load operation only creates new blocks. That would suggest we cannot cluster the data, no?

Let’s explore this scenario with an example.

As a control to this experiment, I’ll perform some index lookups to a “conventional” table, that is, random data without any clustering attributes.

I’ll create a table with the data arranged in random order, and then add the same amount of data again. (I could do this in a single operation, but I’m keeping it in alignment with the experiments I’ll perform shortly, to mimic the concept of loading data and then appending new data.)


SQL> create table t as
  2  select rownum id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> insert /*+ APPEND */ into t
  2  select rownum+100000 id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

100000 rows created.

SQL>
SQL> commit;

Commit complete.

I’ve got 200,000 randomly distributed rows for 100 distinct CUST_ID values. Now I’ll add an index and perform a simple lookup to examine the cost. (Note: The statistics you see are not from an initial execution, but from subsequent executions to eliminate parsing and other costs)


SQL> create index ix on t ( cust_id );

Index created.

SQL>
SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1502  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So our control figure is approximately 1500 logical I/Os.

In the previous blog posts on this topic, I then added attribute clustering definitions and performed an “alter table move” to reorganize all of the data into a nice clustered arrangement. This time, I’ll repeat the above experiment in a different manner. The sequence of steps will be:

  1. load the data,
  2. add the clustering attributes,
  3. reorganize the data,
  4. add an index,
  5. observe the expected benefits of clustering,
  6. add more data, which we might expect could not be clustered because the table is already clustered now,
  7. perform more index lookups and see what the cost is.

If clustering were only useful for an initial data load, then we’d expect to see a significant degradation when the final index lookups are performed in step (7). First of all, just to reinforce the coolness of attribute clustering one more time, here are steps (1) through (5). Remember – the control figure here is 1500 LIOs.


SQL> create table t as
  2  select rownum id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> create table t2 as select * from t;

Table created.

SQL>
SQL> alter table t ADD clustering by linear order(cust_id);

Table altered.

SQL> alter table t move;

Table altered.

SQL>
SQL> create index ix on t ( cust_id );

Index created.

SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         20  consistent gets
          2  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Ooooh yeahhhhh… 20 LIOs!!! But now the real test comes. You can see I also created a table called T2, being a clone of T in its randomised data state. Now I am going to append the contents of T2 to my (clustered) T table.


SQL> insert /*+ APPEND */ into t select * from t2;

100000 rows created.

SQL> commit;

Commit complete.

One thing that does look promising: notice the “SORT ORDER BY” step in the execution plan. That suggests that even though this is appending new data to an existing set of data, some clustering might be taking place.


SQL> explain plan for insert /*+ APPEND */ into t select * from t2;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 528765404

-------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |      |   100K|    10M|       |  2905   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | T    |       |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |   100K|    10M|       |  2905   (1)| 00:00:01 |
|   3 |    SORT ORDER BY                 |      |   100K|    10M|    11M|  2905   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL            | T2   |   100K|    10M|       |   442   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------

11 rows selected.

But the real test comes in the index lookup. Let’s repeat that.


SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         43  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

And we’re still awesomely good. The amount of data for CUST_ID = 17 has doubled, and the LIOs approximately doubled as well. So we are still seeing benefits of clustering even though we are appending to a table with previously clustered data.

Just how close to optimal is this? I’ll perform an ‘alter table move’ against the entire table, and re-measure:


SQL> alter table t move online;

Table altered.

SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
---------------------------------------------------------------------------
x

1 row selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         37  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Hence optimal clustering would have yielded 37 LIOs for this data, so 43 is very close to that, and a huge step up in efficiency over the 1500 LIOs from the base experiment. In effect, we now have “clumps” of clustered data in the table – so larger, lower-frequency data loads will give better results than smaller, higher-frequency loads. But still… very cool stuff that you should be taking advantage of in your databases.
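One way to put a number on that “clumpiness” (this is my own suggested check, not something measured above) is the index clustering factor in the dictionary. A clustering factor close to the table’s block count means the rows for each key are well co-located; one close to the row count means they are scattered:

```sql
-- Sketch: compare the clustering factor of index IX against the
-- block and row counts of table T. The closer CLUSTERING_FACTOR
-- is to BLOCKS, the better the rows for each CUST_ID are packed.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
from   user_indexes i,
       user_tables  t
where  i.table_name = t.table_name
and    i.index_name = 'IX';
```

Running this before and after the “alter table move” would be another way of seeing the same improvement the LIO figures showed.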

LOBs from afar

This has always been a nuisance. There you are – getting all the bells and whistles with LOBs… until a database link enters the room.


--
-- Database: DB11
--
SQL> create table t ( id int, c clob );

Table created.

SQL> insert into t values (1,rpad('x',32000,'x'));

1 row created.

SQL> select * 
  2  from   t;

        ID C
---------- --------------------------------------------
         1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

--
-- Database: anywhere except DB11
--
SQL> select * 
  2  from   t@db11;

ERROR:
ORA-22992: cannot use LOB locators selected from remote tables

 

Well that sucks. But look what has snuck its way into 12.2!


SQL> select * 
  2  from   t@db122;

        ID C
---------- --------------------------------------------
         1 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

 

There is a whole chapter on it here in the official docs. Very nice indeed.
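For those stuck on an earlier release, the commonly used workaround (a sketch – the exact restrictions vary by version, so check the docs for yours) was to materialize the LOB locally rather than select the locator across the link, since CREATE TABLE AS SELECT and INSERT AS SELECT over a database link are permitted for LOB columns:

```sql
-- Pre-12.2 workaround sketch: materialize the remote LOB data
-- into a local table instead of selecting the LOB locator.
create table t_local as
  select id, c
  from   t@db11;

select *
from   t_local;
```

Not as elegant as just querying the remote table directly, but it got the data across.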

Interval partitioning just got better

Interval partitioning was a great feature when it arrived in version 11, because we no longer had to worry so much about ensuring partitions were available for new data when it arrived. Partitions would just be created on the fly as required. I’m not going to talk about interval partitioning in detail because there’s plenty of good content already out there. But one key element of interval partitioning is that the intervals have to start from somewhere, which is why you always have to define the table with at least one partition.


So what if I want to drop the partition that is the conceptual “starting point”? Well… I get problems.


SQL> create table sales
  2  ( tstamp       timestamp    not null,
  3    empno        number(10)   not null,
  4    ename        varchar2(10) not null,
  5    deptno       varchar2(10) not null
  6  )
  7  partition by range (tstamp)
  8  interval( numtoyminterval(1,'YEAR'))
  9  (
 10    partition p00 values less than (timestamp '2010-01-01 00:00:00')
 11  );

Table created.

SQL>
SQL> insert into sales values ( timestamp '2011-01-01 00:00:00' , 0,0,0);

1 row created.

SQL> insert into sales values ( timestamp '2012-01-01 00:00:00' , 0,0,0);

1 row created.

SQL> insert into sales values ( timestamp '2013-01-01 00:00:00' , 0,0,0);

1 row created.

SQL>
SQL> col high_value format a36
SQL> col pname format a10
SQL>
SQL> select
  2    partition_name pname,
  3    partition_position pos,
  4    high_value,
  5    interval
  6  from   user_tab_partitions
  7  where  table_name = 'SALES';

PNAME             POS HIGH_VALUE                           INT
---------- ---------- ------------------------------------ ---
P00                 1 TIMESTAMP' 2010-01-01 00:00:00'      NO
SYS_P121            2 TIMESTAMP' 2012-01-01 00:00:00'      YES
SYS_P122            3 TIMESTAMP' 2013-01-01 00:00:00'      YES
SYS_P123            4 TIMESTAMP' 2014-01-01 00:00:00'      YES

4 rows selected.

SQL>
SQL> alter table sales drop partition p00;
alter table sales drop partition p00
                                 *
ERROR at line 1:
ORA-14758: Last partition in the range section cannot be dropped

There is a simple workaround for that issue. If you re-issue the INTERVAL definition for the table, all of the existing partitions will be “upgraded” (or should it be “downgraded”? I dunno) to being range partitions. Hence we will have “moved” the starting point, and can then drop the problem partition.


SQL> alter table sales set interval( numtoyminterval(1,'YEAR'));

Table altered.

SQL> select
  2    partition_name pname,
  3    partition_position pos,
  4    high_value,
  5    interval
  6  from   user_tab_partitions
  7  where  table_name = 'SALES';

PNAME             POS HIGH_VALUE                           INT
---------- ---------- ------------------------------------ ---
P00                 1 TIMESTAMP' 2010-01-01 00:00:00'      NO
SYS_P121            2 TIMESTAMP' 2012-01-01 00:00:00'      NO
SYS_P122            3 TIMESTAMP' 2013-01-01 00:00:00'      NO
SYS_P123            4 TIMESTAMP' 2014-01-01 00:00:00'      NO

4 rows selected.

SQL> alter table sales drop partition p00;

Table altered.

So that’s all pretty easy, but of course, you must now run this “re-interval” command all the time to be sure that you will always be able to drop any partition you want.

Unless of course… you’re on 12.2! Let’s repeat the demo on 12.2.


SQL> create table sales
  2  ( tstamp       timestamp    not null,
  3    empno        number(10)   not null,
  4    ename        varchar2(10) not null,
  5    deptno       varchar2(10) not null
  6  )
  7  partition by range (tstamp)
  8  interval( numtoyminterval(1,'YEAR'))
  9  (
 10    partition p00 values less than (timestamp '2010-01-01 00:00:00')
 11  );

Table created.

SQL>
SQL> insert into sales values ( timestamp '2011-01-01 00:00:00' , 0,0,0);

1 row created.

SQL> insert into sales values ( timestamp '2012-01-01 00:00:00' , 0,0,0);

1 row created.

SQL> insert into sales values ( timestamp '2013-01-01 00:00:00' , 0,0,0);

1 row created.

SQL>
SQL> col high_value format a36
SQL> col pname format a10
SQL>
SQL> select
  2    partition_name pname,
  3    partition_position pos,
  4    high_value,
  5    interval
  6  from   user_tab_partitions
  7  where  table_name = 'SALES';

PNAME             POS HIGH_VALUE                           INT
---------- ---------- ------------------------------------ ---
P00                 1 TIMESTAMP' 2010-01-01 00:00:00'      NO
SYS_P3415           2 TIMESTAMP' 2012-01-01 00:00:00'      YES
SYS_P3416           3 TIMESTAMP' 2013-01-01 00:00:00'      YES
SYS_P3417           4 TIMESTAMP' 2014-01-01 00:00:00'      YES

4 rows selected.

SQL>
SQL> alter table sales drop partition p00;

Table altered.

SQL>
SQL> select
  2    partition_name pname,
  3    partition_position pos,
  4    high_value,
  5    interval
  6  from   user_tab_partitions
  7  where  table_name = 'SALES';

PNAME             POS HIGH_VALUE                           INT
---------- ---------- ------------------------------------ ---
SYS_P3415           1 TIMESTAMP' 2012-01-01 00:00:00'      NO
SYS_P3416           2 TIMESTAMP' 2013-01-01 00:00:00'      YES
SYS_P3417           3 TIMESTAMP' 2014-01-01 00:00:00'      YES

3 rows selected.

How cool is that! Oracle now automatically converts one of the interval partitions to a range partition, so you’ll not get the error.

The little features are often the coolest.

12c Release 2 – Transparent Data Encryption online !

It doesn’t take a rocket scientist to know that even if you have incredibly stringent controls on user authentication, user authorisation and the like, that won’t save you if your data on disk is not encrypted. All you need is an errant tape, a missing disk, a misplaced flash stick… and kersplat, someone has a copy of the datafiles from your Oracle database.

Data at rest should be encrypted, but that often meant taking applications offline to do so.
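The headline capability is that in 12.2 a tablespace can be encrypted while it stays online. A sketch of the syntax is below (from memory – the tablespace name and file names are purely illustrative, so verify against the 12.2 documentation before using it):

```sql
-- Assumed 12.2 syntax sketch: encrypt an existing tablespace
-- without taking it (or the application) offline.
-- FILE_NAME_CONVERT supplies names for the re-written datafiles;
-- the paths here are hypothetical.
alter tablespace users
  encryption online
  using 'AES256' encrypt
  file_name_convert = ('users01.dbf', 'users01_enc.dbf');
```

Previously this kind of conversion meant an outage while the data was exported and reloaded, or the datafiles rebuilt offline.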

I’ve put my Speed Racer hat on, and here’s a video on a new 12c Release 2 feature covered in 60 seconds!

Oracle Database 12c Release 2 is here !


Since OpenWorld 2016, when we first saw some of the cool features in Oracle Database 12c Release 2, many IT professionals out there have been exploring the release via our various cloud offerings. But if your organization has not yet embraced the cloud, then March 2017 is a great month for you, because you can now download the latest and greatest release of our database from the usual downloads page and run it on your own servers in your own data centre.

Of course, there’s a difference between downloading and installing the software, and being up to speed with all of the great features that have come in 12c and 12c Release 2 – so keep an eye on my blog, and on my YouTube channel.

Over the coming weeks I’ll be talking about lots of the new features in 12.2 in short, easy-to-digest videos to help you appreciate all of the goodness that is 12c Release 2.

To whet your appetite, how cool is this simple enhancement to SQL*Plus – command history!



SQL> history
  1  select * from dba_users where username = 'SYSTEM';
  2  select count(*)
     from dba_tables
     where table_Name like 'A%';
  3  select * from tab;
  4  show sga
  5  set timing off
  6  select * from dba_tablespaces;

SQL> history 4 run

Total System Global Area 1048576000 bytes
Fixed Size                  8795840 bytes
Variable Size             251660608 bytes
Database Buffers          784334848 bytes
Redo Buffers                3784704 bytes
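If history is not already enabled in your session, it can be switched on with a SET option (a sketch – check that your SQL*Plus version is 12.2 or later, and the SQL*Plus User’s Guide for the full syntax):

```sql
-- Enable command history for this SQL*Plus session.
SQL> set history on

-- Or cap the number of entries kept (the 50 here is arbitrary):
SQL> set history 50

-- Then list, re-run, edit or clear entries:
SQL> history
SQL> history 4 run
SQL> history clear
```

A small thing, but a long-awaited one for anyone who lives at the SQL> prompt.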

A 12.2 treat for the festive season

We’re being asked to store more and more data, yet keep backup windows, query performance and the like unchanged, no matter how much we store. As a result, more and more database shops are needing to partition their data. The problem is that partitioning the data is a significant restructure, which incurs a large outage and the accompanying planning and coordination.

Unless you’re on 12.2.

Here’s a demo where we can take an existing table and

  • convert it to a range-partitioned table, also using intervals so we don’t need to worry about ongoing partition creation maintenance, and
  • take one of the existing indexes, and convert it to a local index to mirror the table partitioning scheme, and
  • take the other existing index, and globally partition it.

Oh… and of course, we will do the whole thing online, without service disruption.

Oh… and of course, we will do it with a single SQL command.

Ab…so…lute…ly awesome!



SQL*Plus: Release 12.1.0.2.0 Production on Thu Dec 22 09:53:37 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Mon Dec 19 2016 13:38:54 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.0.3 - 64bit Production


SQL> create table T as
  2  select d.*
  3  from dba_Objects d,
  4       ( select 1 from dual
  5         connect by level <= 20 )
  6  where d.object_id is not null;

Table created.

SQL> create index IX on t ( object_id );

Index created.

SQL> create index IX2 on t ( created, object_name );

Index created.

SQL> alter table T modify
  2  partition by range (object_id) interval (10000)
  3  (
  4    partition p1 values less than (20000)
  5  ) online
  6  update indexes
  7  ( ix  local,
  8    ix2 global partition by range (created)
  9    (
 10     partition ix2_p1 values less than (date '2016-08-01'),
 11     partition ix2_p2 values less than (maxvalue)
 12   )
 13  );

Table altered.

SQL> select partition_name
  2  from   user_tab_partitions
  3  where  table_name = 'T';

PARTITION_NAME
--------------------------------------------------------------------------------------------------------------------------------
P1
SYS_P1145
SYS_P1146
SYS_P1147
SYS_P1148
SYS_P1149
SYS_P1150

7 rows selected.

SQL> select index_name, partition_name
  2  from   user_ind_partitions
  3  where  index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME
------------------------------ ------------------------------
IX                             P1
IX                             SYS_P1145
IX                             SYS_P1146
IX                             SYS_P1147
IX                             SYS_P1148
IX                             SYS_P1149
IX                             SYS_P1150
IX2                            IX2_P1
IX2                            IX2_P2

9 rows selected.

SQL>