Cool stuff with partition elimination

Sometimes in the IT world, the term “surprise” is not a good one.

“I woke up this morning and got a surprise…my database was down.”

“I ran a SELECT COUNT(*) on my most important table, and got a surprise result of zero rows.”

and so forth. Generally, for us IT professionals, encountering the unexpected is not a good start to the day.

But sometimes a surprise can be a nice thing. Here is my example for today: I found that the database can do a remarkably good job of reducing the workload when querying partitioned tables.

Most people will already be familiar with the concept of partition elimination.  If you have a table partitioned into (say) yearly segments, and you ask for all of the data for the past 2 years, then the optimizer is intelligent enough to only scan the relevant partitions rather than the entire table.  Here is an example of that in action. I have a table that is partitioned by year on an END_DATE column, and then sub-partitioned into quarters based on a START_DATE column.


SQL> create table t
  2      partition by range( end_dt )
  3      subpartition by range( start_dt )
  4      (
  5         partition p_2014 values less than ( to_date('01-Jan-2015','dd-mon-yyyy') )
  6                 (
  7                         subpartition sp_2014_q1 values less than ( to_date('01-apr-2014','dd-mon-yyyy') ) ,
  8                         subpartition sp_2014_q2 values less than ( to_date('01-jul-2014','dd-mon-yyyy') ) ,
  9                         subpartition sp_2014_q3 values less than ( to_date('01-oct-2014','dd-mon-yyyy') ) ,
 10                         subpartition sp_2014_q4 values less than ( to_date('01-jan-2015','dd-mon-yyyy') )
 11                ) ,
 12        partition p_2015 values less than ( to_date('01-Jan-2016','dd-mon-yyyy') )
 13                (
 14                        subpartition sp_2015_q1 values less than ( to_date('01-apr-2015','dd-mon-yyyy') ) ,
 15                        subpartition sp_2015_q2 values less than ( to_date('01-jul-2015','dd-mon-yyyy') ) ,
 16                        subpartition sp_2015_q3 values less than ( to_date('01-oct-2015','dd-mon-yyyy') ) ,
 17                        subpartition sp_2015_q4 values less than ( to_date('01-jan-2016','dd-mon-yyyy') )
 18                )
 19     )
 20     as
 21     select a.* ,
 22        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730)-1 as start_dt,
 23        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730) as end_dt
 24     from all_objects a;

Table created.

SQL>
SQL> alter table t modify start_dt not null;

Table altered.

SQL> alter table t modify end_dt not null;

Table altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183203583

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL    |      |   101 | 15049 |   491   (1)| 00:00:01 |     1 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


You can see that we had to scan all of the partitions, but within each of the partitions we only had to scan a single sub-partition (as indicated by “PARTITION RANGE SINGLE”).  So for each year (based on END_DT) we scanned a single one of the 4 subpartitions.  We are doing only 25% of the work of scanning the entire table.  But one pleasant surprise I saw today was how the optimizer can take advantage of additional information to improve things even more.  Let us now add a fairly obvious rule about the data:


SQL>
SQL> alter table t add constraint t_chk check( start_dt < end_dt );

Table altered.


And have another look at that query execution plan.


SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3874588989

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   247   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE |      |   101 | 15049 |   247   (1)| 00:00:01 |     2 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   247   (1)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   247   (1)| 00:00:01 |     5 |     5 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "END_DT">TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL>

How cool is that!  The constraint let us remove even more partitions from consideration; in fact, we ended up scanning just a single subpartition. Moreover, we never specified END_DT as a predicate, but you can see in the Predicate Information section that the optimizer synthesized such a predicate using the rule defined by the check constraint we added.
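
If you want to convince yourself of what happened, a query that supplies the equivalent END_DT predicate explicitly should produce the same single-subpartition plan. A quick sanity check along these lines (the exact plan output will of course vary by version):


set autotrace traceonly explain
select *
from   t
where  start_dt = to_date('13-Jan-2015','dd-mon-yyyy')
and    end_dt   > to_date('13-Jan-2015','dd-mon-yyyy');
set autotrace off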

I suppose the moral of the story is two-fold here.

1) The optimizer can be pretty smart with partition elimination,

2) As always, never hide rules and facts about the data from the database.  Make sure you define those constraints in the database tier.

18c merge partition online

One of the cool things in 18c is the ability to merge partitions without causing a service interruption.  Here’s a video demonstration of that in action:

This is just an accompanying blog post to let you grab the scripts for the demo so that you can try this yourself on livesql.oracle.com, or any of the Oracle Cloud services that will be running 18c in the near future.

But also, make sure that you watch to the end of the video, because I also discuss some of the opportunities that this new feature provides.  I would contend that 99% of people using partitioning in their real world applications rarely change the size or structure of their partitions.  With online merge, you now have much more freedom in those choices.  For analytics, you might now be able to run workloads against your live Production transaction processing systems by utilizing a fine-grained partitioning regime, and later merging those partitions to become larger granules over time.

So try to think “outside of the box” on this one.  There could be a lot of cool benefits for you that you might not have initially considered.  The true feature is not “just” merge online – it is that you have new partitioning designs and implementations at your fingertips.


 create table sales
   ( ts        date,
     id        int,
     amt       number,
     completed varchar2(1)
   )
 partition by range ( ts )
 interval ( numtodsinterval(1,'DAY') )
 (
   partition p1 values less than ( date '2017-01-01' ),
   partition p2 values less than ( date '2017-02-01' ),
   partition p3 values less than ( date '2017-03-01' ),
   partition p4 values less than ( date '2017-04-01' ),
   partition p5 values less than ( date '2017-05-01' )
 );

 insert /*+ APPEND */ into sales
 select date '2017-01-01' - 10 + rownum / 10000 ,rownum, dbms_random.value(1,100), null
 from dual
 connect by level < 10000*120;

 commit;

 exec dbms_stats.gather_table_stats('','SALES')

 select partition_name, num_rows
 from user_tab_partitions
 where table_name = 'SALES'
 order by partition_position;

 create index sales_ix on sales ( id );

 alter table sales merge partitions p3, p4 
     into partition p3a update indexes ONLINE;
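
If you want to confirm that both the table partitions and the index survived the online merge in good shape, a couple of follow-up queries (nothing fancy, just checking the dictionary) would be:

 select index_name, status
 from   user_indexes
 where  table_name = 'SALES';

 select partition_name, num_rows
 from   user_tab_partitions
 where  table_name = 'SALES'
 order  by partition_position;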

iASH–my “infinite ASH” routine

I love Active Session History (ASH) data because a lot of the work I’ve done in my consulting life was “after the fact” diagnosis.  By this I mean that many of us have been in a similar circumstance where the customer will contact you not when a problem is occurring, but only when you contact them for some other potentially unrelated reason.  At which point you will hear that dreaded sentence:

“Yeah, the Order Entry screen was really slow a couple of hours ago…”

And this is where ASH is an awesome resource.  With the ASH data available, there is a good chance you will be able to diagnose the issue without having to make an embarrassing request for the customer to repeat the task so that you can trace the underlying database activity.  Because no-one likes to be the person that says:

“Yeah that performance must have really sucked for you … Hey, let’s do it again!”

But even ASH has its limitations, because sometimes the customer sentence is phrased slightly differently:

“Yeah, the Order Entry screen was really slow last Tuesday…”

In this case, it is unlikely that the ASH data will still be available.  Whilst a subset of the invaluable ASH data is retained in DBA_HIST_ACTIVE_SESS_HISTORY, I would prefer to keep the complete set of ASH data available for longer than is practical to hold in memory. So I wrote myself a simple little routine that keeps all of the ASH data around for longer.  It’s hardly rocket science – just a little partitioned table to capture V$ACTIVE_SESSION_HISTORY at regular intervals.  Let’s walk through it so you can understand it and adapt it for your own use.

First I’ll create a partitioned table to hold the ASH data.  I’m using partitioning to avoid the need to index the table, so the insertion cost is minimal.  I’m partitioning by day and the code assumes this, so take care if you intend to modify it.


SQL> create table ash_hist
  2  partition by range (sample_time)
  3  interval( numtodsinterval(1,'day'))
  4  (partition p1 values less than (timestamp' 2017-01-01 00:00:00'))
  5  as select * from sys.gv_$active_session_history;

Table created.

Here is my procedure to capture the data.  The essentials of the routine are:

  • Starting with the most recent partition, find the last recorded entry in ASH_HIST.  We’ll look back up to 10 days to find our starting point (hence the daily partitions).
  • If there is no data for the last 10 days, we’ll bomb out, because we haven’t been running the routine frequently enough.
  • Copy all the ASH data from this point to now into ASH_HIST using a nice efficient INSERT-APPEND, but we’ll skip the session that is doing the copying. (You can include it if you want just by removing line 8)
  • Once per week (you can control this by tinkering with the IF conditions on line 34) we’ll drop the oldest partitions.  By default I keep 90 days, but you can set this by altering “l_retention” on line 5.

SQL>
SQL> CREATE OR REPLACE procedure save_ash_hist is
  2    l_hi_val_as_string varchar2(1000);
  3    l_hi_val_as_date   date;
  4    l_max_recorded     timestamp;
  5    l_retention        number := 90;
  6
  7  begin
  8    dbms_application_info.set_module('$$SAVE_ASH$$','');
  9    -- we are looping to take advantage
 10    -- of partition elimination
 11
 12    for i in 0 .. 10 loop
 13       select max(sample_time)
 14       into   l_max_recorded
 15       from   ash_hist
 16       where  sample_time > systimestamp - i;
 17
 18       exit when l_max_recorded is not null;
 19    end loop;
 20
 21    if l_max_recorded is null then
 22      raise_application_error(-20000,'No max sample time within the last 10 days');
 23    end if;
 24    dbms_output.put_line('Last copied time was '||l_max_recorded);
 25
 26    insert /*+ append */ into ash_hist
 27    select *
 28    from sys.gv_$active_session_history
 29    where sample_time > l_max_recorded
 30    and   ( module != '$$SAVE_ASH$$' or module is null );
 31    dbms_output.put_line('Copied '||sql%rowcount||' rows');
 32    commit;
 33
 34    if to_char(sysdate,'DYHH24') between 'TUE01' and 'TUE06' then
 35
 36      begin
 37        execute immediate 'alter table ash_hist set interval ()';
 38      exception
 39        when others then null;
 40      end;
 41      execute immediate 'alter table ash_hist set interval (NUMTODSINTERVAL(1,''DAY''))';
 42
 43      for i in ( select *
 44                 from   user_tab_partitions
 45                 where  table_name = 'ASH_HIST'
 46                 and    partition_position > 1
 47                 order by partition_position )
 48      loop
 49        l_hi_val_as_string := i.high_value;
 50        execute immediate 'select '||l_hi_val_as_string||' from dual' into l_hi_val_as_date;
 51
 52        if l_hi_val_as_date < sysdate - l_retention then
 53          execute immediate 'alter table ash_hist drop partition '||i.partition_name;
 54        else
 55          exit;
 56        end if;
 57
 58      end loop;
 59    end if;
 60  end;
 61  /

Procedure created.

And that is all there is to it.  Each time we run the procedure, we’ll grab all the ASH data since the last time we ran and keep it in ASH_HIST.


SQL>
SQL> select count(*) from ash_hist;

  COUNT(*)
----------
       792

1 row selected.


SQL>
SQL> exec save_ash_hist

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from ash_hist;

  COUNT(*)
----------
       819

1 row selected.

Add a simple scheduler job to run the routine every couple of hours (I’m assuming your SGA holds at least 2 hours of samples in V$ACTIVE_SESSION_HISTORY – if not, you’d need to adjust the frequency) and you’re off and running.


SQL>
SQL>
SQL> BEGIN
  2      dbms_scheduler.create_job (
  3         job_name           =>  'ASH_CAPTURE',
  4         job_type           =>  'PLSQL_BLOCK',
  5         job_action         =>  'save_ash_hist;',
  6         start_date         =>  CAST((TRUNC(SYSDATE,'HH') + (1/24) + (55/24/60)) AS TIMESTAMP), -- job commences at 55 mins past the next hour
  7         repeat_interval    =>  'FREQ=HOURLY; INTERVAL=2',
  8         enabled            =>  true,
  9         comments           =>  'Permanent record of ASH data');
 10  END;
 11  /

PL/SQL procedure successfully completed.
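
If you are not sure how much ASH history your instance currently holds in memory (and hence whether a two hour interval is safe), a quick check before settling on the frequency is something like:

select min(sample_time) oldest_sample,
       max(sample_time) newest_sample
from   v$active_session_history;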

You can assume all of the standard disclaimers here. Use at own risk, blah blah, no warranty, blah blah, etc
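
And once the history is being captured, answering the “last Tuesday” style of question becomes a simple query against ASH_HIST. As a rough starting point (the time window here is just a placeholder to adapt):

select sql_id, event, count(*) as samples
from   ash_hist
where  sample_time >= timestamp '2017-06-06 09:00:00'
and    sample_time <  timestamp '2017-06-06 10:00:00'
group  by sql_id, event
order  by samples desc
fetch first 20 rows only;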

Addendum:  I should add that you could write this complete level of detail directly to DBA_HIST_ACTIVE_SESS_HISTORY by tinkering with “_ash_disk_filter_ratio”, but please get the endorsement of Support first.

Partition-wise join

So just what is a “partition-wise” join?  We will use a metaphor to (hopefully) explain the benefit.


Let’s say two people, Logan and Shannon, decide to move in together.  If each of them already has an existing residence, they will both have a lot of the common items that you find in any household.  So they have a decision to make – do they keep two of everything, or do they have a bit of a “cull” of the things they have in common?  In this imaginary scenario, we will focus on household items in the bathroom and the kitchen.  Logan grabs a set of kitchen knives in a knife block, calls Shannon and asks: “Hey Shannon, do you already have a knife block?”

What do you think Shannon will do? Search the entire house for an existing knife block?  Of course not.  If there is a knife block, the only place it will be is in the kitchen.  In fact, when matching up the items throughout the house, Shannon and Logan will restrict their investigation to the room that makes sense for the item in question.  That is just common sense – why would anyone search in the bathroom for (say) forks and spoons?  It would just be a waste of effort.

(Editor’s note:  Anyone with young children will of course dispute this metaphor, stating quite correctly that you can probably find every possible household item in every possible room, and probably outside as well, but we’ll omit that possibility for the sake of this discussion.)


And that is exactly what a partition-wise join enables us to do in the database.  If two tables are partitioned with the same definition, and we are joining on the partition key, then that definition guarantees that for a row in one table with partition key “K” (and hence in partition “P”), we only need to look for matching rows in the corresponding partition of the table we are joining to (where “corresponding” is based on the partitioning definition).  It is the partitioning equivalent of “only searching in the kitchen and not the bathroom”.  We can see this via the execution plan when doing such a join.  Let’s create two tables with equal partition definitions and then join on the partition key.


SQL> --
SQL> -- Example 1
SQL> --
SQL>
SQL> drop table t1 purge;

Table dropped.

SQL> drop table t2 purge;

Table dropped.

SQL>
SQL> create table t1 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p_kitchen values less than (10000),
  5  partition p_bathroom values less than (20000),
  6  partition p_dining values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p_kitchen values less than (10000),
  5  partition p_bathroom values less than (20000),
  6  partition p_dining values less than (30000)
  7  );

Table created.

SQL>
SQL>
SQL> insert into t1 select rownum, rownum from dual connect by level < 30000;

29999 rows created.

SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 3155849676

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    20 |  1641   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |    20 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      | 29999 |   585K|  1641   (1)| 00:00:01 |     1 |     3 |
|*  3 |    HASH JOIN         |      | 29999 |   585K|  1641   (1)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS FULL| T1   | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |     TABLE ACCESS FULL| T2   | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."X"="T2"."X")

SQL> set autotrace off
SQL>

The key part of the execution plan here is that the HASH JOIN is occurring within (or “under”) the PARTITION RANGE ALL iteration.  This can be interpreted as: “Start with the first partition in each table and do a hash join on that partition.  Then move on to the next partition and do a hash join on that partition”, and so on.  This is efficient on resources because at no point are we trying (and obviously failing) to join a row from table T1 partition P_KITCHEN to table T2 partition P_BATHROOM or P_DINING.  Each hash join is a smaller operation and hence more likely to be completed in the available PGA allocation for that session.  Also, when it comes to running such a query in parallel, each parallel slave can tackle the job of handling a partition in isolation from the other slaves.
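
As an aside, if you run the same query in parallel (for example with a PARALLEL hint) and the partitions align, you would typically see the join sitting under a “PX PARTITION RANGE ALL” style of operation, which is the tell-tale sign that each parallel slave is processing a partition pair on its own. A sketch of how you might check this (the plan details will vary with your version and parallel configuration):

set autotrace traceonly explain
select /*+ parallel(4) */ count(t1.y), count(t2.y)
from t1,t2
where t1.x = t2.x;
set autotrace off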

If the partitions do not align (see the Editor’s note above), then our join will not be as efficient.


SQL> --
SQL> -- Example 2
SQL> --
SQL>
SQL>
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (15000),
  5  partition p3 values less than (30000)
  6  );

Table created.

SQL>
SQL> --
SQL> -- all partitions do NOT align, so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select * from t1;

29999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |  1369   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |       |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|  1369   (1)| 00:00:01 |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|  1369   (1)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 29999 |   292K|   548   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 29999 |   292K|   548   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>

The key element here is that the HASH JOIN now sits above the cycling through all of the partitions.  In earlier releases of Oracle, you would not see the line containing the :BF0000, so it would be a simple join across all the rows as if the tables were not partitioned at all.  But when the partitions do not align, things are slightly better in modern releases.  We use a “Bloom filter” (hence the :BF prefix) to reduce the overhead of joining the two tables.  Since I’m using metaphors in this post, think of “phoning ahead” to the cinema to see if there are seats available for your favourite movie.  If the cinema owner says the movie is sold out, you have saved yourself a car trip. But just because the owner says there are seats available, it is still possible you might drive there and find that the movie has sold out during that time.  A Bloom filter is like phoning ahead – there’s a good chance you can avoid some work, but it is not a guarantee.  You can read about Bloom filters here in a great whitepaper by Christian Antognini.

Note that all of the partitions must align. Here is an example where the first three partitions are in alignment, with boundaries of 10000, 20000 and 30000, but our second table T2 has an additional partition defined.  Once again, we fall back to the Bloom filter option.


SQL> --
SQL> -- Example 3
SQL> --
SQL> drop table t2 purge;

Table dropped.

SQL> create table t2 ( x int, y int )
  2  partition by range ( x )
  3  (
  4  partition p1 values less than (10000),
  5  partition p2 values less than (20000),
  6  partition p3 values less than (30000),
  7  partition p4 values less than (40000)
  8  );

Table created.

SQL>
SQL> --
SQL> -- all partitions do NOT align, so we do NOT see partition-wise join
SQL> --
SQL>
SQL> insert into t2 select rownum, rownum from dual connect by level < 40000;

39999 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t2')

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select count(t1.y), count(t2.y)
  2  from t1,t2
  3  where t1.x = t2.x;

Execution Plan
----------------------------------------------------------
Plan hash value: 666786458

---------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |    20 |  1913   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |    20 |            |          |       |       |
|*  2 |   HASH JOIN                   |         | 29999 |   585K|  1913   (1)| 00:00:01 |       |       |
|   3 |    PART JOIN FILTER CREATE    | :BF0000 | 29999 |   585K|  1913   (1)| 00:00:01 |       |       |
|   4 |     PARTITION RANGE ALL       |         | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   5 |      TABLE ACCESS FULL        | T1      | 29999 |   292K|   820   (1)| 00:00:01 |     1 |     3 |
|   6 |    PARTITION RANGE JOIN-FILTER|         | 39999 |   390K|  1093   (1)| 00:00:01 |:BF0000|:BF0000|
|   7 |     TABLE ACCESS FULL         | T2      | 39999 |   390K|  1093   (1)| 00:00:01 |:BF0000|:BF0000|
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."X"="T2"."X")

Note
-----
   - this is an adaptive plan

SQL> set autotrace off
SQL>
SQL>
SQL>


So faster queries on partitioned tables are not just about partition pruning.  Partition-wise joins can also have a beneficial impact on query response times.

Updating indexes with partition maintenance

An index is basically a structure that maps keys (values) in columns to the physical location of their corresponding rows in a table.  So if you move the rows (i.e., change the physical location of the rows), then the index entries for those rows need to be updated, or the index is no longer usable.  And as most people are aware, the latter is the default when you perform a partition maintenance operation on a table that relocates rows.  For example, we’ll create a simple partitioned table, add both a local and a global index, split one of the partitions, and then see what the impact on those indexes is.


SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (x int, y int, z int )
  2  partition by range (x)
  3  (
  4    partition p1 values less than (4000),
  5    partition p2 values less than (8000)
  6  );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );

548338 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2    into ( partition p2a, partition p2b ) ;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1                   USABLE
IX_LOCAL                       P2A                  UNUSABLE
IX_LOCAL                       P2B                  UNUSABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      UNUSABLE
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL>
SQL>

The result is what we would expect.  The global index has become unusable because a portion of the rows that had entries in the index have been moved as part of the partition split operation.  The local index has also become unusable, but only for those partitions that were impacted by the split.  Partition P1 was not affected by the split, and hence the corresponding local index partition is still valid.  Just a quick footnote – the “N/A” in USER_INDEXES for the local index indicates that the STATUS of the local index should be determined by looking at the status of each underlying partition.  In early releases of Oracle this could create problems in terms of availability for partitioned tables, because if you tried to use an index that had become unusable, you would get an error (“ORA-01502: index or partition of such index is in unusable state”), which is not the most pleasant thing to be sending back to your application users.

In Oracle 9, a workaround to the problem was introduced: the UPDATE GLOBAL INDEXES clause.  This keeps the (global) index entries valid by correcting the index entries as the partition operation is performed.  This has some costs in terms of resource consumption, because obviously maintaining a large number of index entries will consume CPU and redo, but it keeps the index available to application users.  Here’s a demo of that in action:


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (x int, y int, z int )
  2  partition by range (x)
  3  (
  4    partition p1 values less than (4000),
  5    partition p2 values less than (8000)
  6  );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );

548338 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2    into ( partition p2a, partition p2b )
  3    update global indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1                   USABLE
IX_LOCAL                       P2A                  UNUSABLE
IX_LOCAL                       P2B                  UNUSABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL> alter index ix_local rebuild partition P2A;

Index altered.

SQL> alter index ix_local rebuild partition P2B;

Index altered.

But notice that the local index has not been spared from being marked unusable.  We had to rebuild each local index partition after the operation. You might be thinking that having only UPDATE GLOBAL INDEXES was hence a waste of time, but don’t forget that some partition operations, such as DROP and TRUNCATE, do not impact local indexes, so updating the global index entries is all that is required:


SQL> alter table t drop partition p2a update global indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1B                  USABLE
IX_LOCAL                       P2B                  USABLE

2 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL>

But if you do have existing scripts that contain the UPDATE GLOBAL INDEXES clause, it is perhaps worth revisiting them to take advantage of the more complete implementation of this facility, which arrived in Oracle 11.  The UPDATE INDEXES clause will take care of both global and local indexes during partition maintenance operations.


SQL>
SQL> alter table t split partition p1 at (2000)
  2    into ( partition p1a, partition p1b )
  3    update indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1A                  USABLE
IX_LOCAL                       P1B                  USABLE
IX_LOCAL                       P2A                  USABLE
IX_LOCAL                       P2B                  USABLE

4 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL> alter table t drop partition p1a update indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1B                  USABLE
IX_LOCAL                       P2A                  USABLE
IX_LOCAL                       P2B                  USABLE

3 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>

Not that I can see a real need for it, but it’s worth noting that these are your only two options.  There is no such thing as keeping only the local index entries updated and not the global ones.


SQL>  alter table t split partition p2 at (6000)
  2      into ( partition p2a, partition p2b ) update local indexes;
    into ( partition p2a, partition p2b ) update local indexes
                                                 *
ERROR at line 2:
ORA-00905: missing keyword

Finally, remember that since Oracle 10g we decided that it is better to allow a query to run slowly rather than have it crash with an error, so the default value for “skip_unusable_indexes” is now TRUE.  So if you have indexes that have a status of UNUSABLE, you will not detect this via errors in your SQL queries.  The queries will just be optimized as if the indexes did not exist, which might have an impact on their execution time.  Here is a simple starting point for a monitoring routine to keep an eye on unusable indexes (and rebuild them if necessary):


SQL> set serverout on
SQL> declare
  2    procedure ddl(p_cmd varchar2) is
  3    begin
  4      --execute immediate p_cmd;
  5      dbms_output.put_line(p_cmd);
  6    end;
  7  begin
  8      for i in (
  9          select index_owner, index_name, partition_name, 'partition' ddl_type
 10          from all_ind_partitions
 11          where status = 'UNUSABLE'
 12          union all
 13          select index_owner, index_name, subpartition_name, 'subpartition' ddl_type
 14          from all_ind_subpartitions
 15          where status = 'UNUSABLE'
 16          union all
 17          select owner, index_name, null, null
 18          from all_indexes
 19          where status = 'UNUSABLE'
 20      )
 21      loop
 22        if i.ddl_type is null then
 23          ddl('alter index '||i.index_owner||'.'||i.index_name||' rebuild');
 24        else
 25          ddl('alter index '||i.index_owner||'.'||i.index_name||' modify '||i.ddl_type||' '||i.partition_name||' rebuild');
 26        end if;
 27      end loop;
 28  end;
 29  /
alter index MCDONAC.T_PAR_IX modify partition P1 rebuild
alter index MCDONAC.T_PAR_IX modify partition P2 rebuild

PL/SQL procedure successfully completed.

And finally, as Tim reminded me, with 12c Release 2, many partition operations (and many other maintenance operations as well) can now be done online simply by specifying ONLINE as a suffix to the operation. That (as the name suggests) will keep all indexes in a USABLE state.


SQL>  create table t (x int, y int, z int )
  2      partition by range (x)
  3      (
  4        partition p1 values less than (4000),
  5        partition p2 values less than (8000)
  6      );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );

547862 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL>
SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2      into ( partition p2a, partition p2b ) online;

Table altered.

SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME                 STATUS
------------------------------ ------------------------------ --------
IX_LOCAL                       P1                             USABLE
IX_LOCAL                       P2A                            USABLE
IX_LOCAL                       P2B                            USABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.



12c Statistics on load–special cases

One of the cool features in 12c is the automatic collection of optimizer statistics when a table is either created or loaded via direct path from empty.  This makes a lot of sense because it saves us from what used to be the mandatory second step of gathering statistics whenever we loaded an empty table with data.

For example

11.2.0.4


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
(null)

SQL> select count(*) from t1;

  COUNT(*)
----------
     86802

So at this point, we’d then have to scan the table we just loaded with a DBMS_STATS call in order to come up with some representative statistics. This was fixed in 12c with “statistics on load”.
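
In 11g and earlier, that follow-up step would just be the familiar call, something along the lines of:

exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'T1')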

12c 


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL> -- no dbms_stats call
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78170

But this facility has been blogged about extensively, so I won’t labour the point. This post is more about just a couple of things to be aware of when taking advantage of the facility.

Partitioned tables

The 12c feature is for statistics on load of the table. So if you are creating/loading a partitioned table, whilst you will still get statistics collected, they are at the table level only. For example,


SQL> create table t1
  2  partition by range ( object_id )
  3  interval ( 20000 )
  4  (
  5    partition p1 values less than ( 20000 ),
  6    partition p2 values less than ( 200000 )
  7  )
  8  as select d.* from dba_objects d
  9  where object_id is not null;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78165

SQL>
SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'T1';

PARTITION_   NUM_ROWS
---------- ----------
P1         (null)
P2         (null)

Notice that the partition level statistics are not collected. (I will come back to how you can gather those at the end of this section.) Interestingly, indexes will be fine whether they are created during or after the load.


--
-- indexes created after the load on the table above
--
SQL> create index ix1 on t1 ( object_id ) local;

Index created.

SQL> create index ix2 on t1 ( object_name);

Index created.

SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
IX1                                    173
IX2                                    535

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
IX1                            P1                  43
IX1                            P2                 130

--
-- indexes created as part of the table creation process
--
SQL> create table t1 (
  2     owner
  3    ,object_name
  4    ,subobject_name
  5    ,object_id
  6    ,data_object_id
  7    ,object_type
  8    ,created
  9    ,last_ddl_time
 10    ,timestamp
 11    ,status
 12    ,temporary
 13    ,generated
 14    ,secondary
 15    ,namespace
 16    ,edition_name
 17    ,sharing
 18    ,editionable
 19    ,oracle_maintained
 20    ,application
 21    ,default_collation
 22    ,duplicated
 23    ,sharded
 24    ,created_appid
 25    ,created_vsnid
 26    ,modified_appid
 27    ,modified_vsnid
 28    ,constraint pk primary key ( object_id ) using index local
 29    ,constraint uq unique ( owner, object_id ) using index
 30  )
 31  partition by range ( object_id )
 32  interval ( 20000 )
 33  (
 34    partition p1 values less than ( 20000 ),
 35    partition p2 values less than ( 200000 )
 36  )
 37  as select d.* from dba_objects d
 38  where object_id is not null;

Table created.

SQL>
SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ('PK','UQ');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
PK                                     163
UQ                                     263

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ('PK','UQ');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
PK                             P1                  41
PK                             P2                 122

2 rows selected.
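
Coming back to the missing partition-level statistics from the earlier example: if you do want them populated after such a load, a follow-up DBMS_STATS call is all it takes. A minimal sketch (adjust the granularity and other preferences to suit):

exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'T1', granularity=>'PARTITION')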

CTAS empty is still OK

A common trick for creating an empty table whilst copying the definition of an existing table is to use a predicate with an always false condition. For example,


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

Statistics on load will still take place for such a table:


SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

Now you might be worried that, since there are now statistics on the table, a direct load using INSERT /*+ APPEND */ (which would normally perform statistics on load) will no longer do so. But there is no need to panic. We’ll still determine that the table is empty, and do another statistics-on-load operation when you perform that direct load.


SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

But this also has implications if you want to retain some existing statistics on the table, because when we truncate the table, that same reset of statistics collection can occur, and the next direct-path load will overwrite whatever statistics were there.


SQL> create table t1 as
  2  select * from dba_objects
  3  where rownum <= 10;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
        10

SQL>
SQL> truncate table t1;

Table truncated.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

Similarly, all you need is one transaction that “brings the table to life” for automatic statistics collection to no longer be active, even if that transaction rolls back. For example:


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

SQL> -- a normal mode insert of a row
SQL> insert into t1
  2  select * from dba_objects where rownum = 1;

1 row created.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

All of these behaviours are pretty much how I would expect them to work (your opinion may differ, of course), and I know that some people have opted to disable the automatic collection altogether to avoid any ambiguity, but that strikes me as overkill.

If you are really concerned about it, it is easy enough to suffix your load routines with a simple wrapper to check the statistics and issue the appropriate DBMS_STATS call to make up the shortfall.   And it would be remiss of me not to mention the latest optimizer whitepaper, which covers in detail other optimizer enhancements in 12c.
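
To give you a flavour of what I mean by a wrapper, here is a minimal sketch (the table name is hard-coded and there is no error handling, so treat it as a starting point only). It performs the direct load, checks whether statistics-on-load kicked in, and issues a DBMS_STATS call if it did not:


create or replace procedure load_t1_with_stats is
  l_num_rows user_tables.num_rows%type;
begin
  -- the direct path load
  insert /*+ APPEND */ into t1
  select * from dba_objects;
  commit;

  -- did statistics-on-load give us anything useful ?
  select num_rows
  into   l_num_rows
  from   user_tables
  where  table_name = 'T1';

  -- if not, make up the shortfall ourselves
  if l_num_rows is null or l_num_rows = 0 then
    dbms_stats.gather_table_stats(ownname=>user, tabname=>'T1');
  end if;
end;
/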

Happy stats collecting!