Cool stuff with partition elimination

Posted by

Sometimes in the IT world, the term “surprise” is not a good one.

“I woke up this morning and got a surprise…my database was down.”

“I ran a SELECT COUNT(*) on my most important table, and got a surprise result of zero rows.”

and so forth. Generally as IT professionals, encountering the unexpected is not a good start to the day Smile.

But sometimes, surprises can be a nice thing.  Here is my example for today – when I found that the database can do a remarkably good job when it comes to reducing the workload with partitioned tables.  

Most people will already be familiar with the concept of partition elimination.  If you have a table partitioned into (say) yearly segments, and you ask for all of the data for the past 2 years, then the optimizer is intelligent enough to only scan the relevant partitions rather than the entire table.  Here is an example of that in action. I have a table that is partitioned by year on a END_DATE column, and then sub-partitioned into quarters based on a START_DATE column.


SQL> create table t
  2      partition by range( end_dt )
  3      subpartition by range( start_dt )
  4      (
  5         partition p_2014 values less than ( to_date('01-Jan-2015','dd-mon-yyyy') )
  6                 (
  7                         subpartition sp_2014_q1 values less than ( to_date('01-apr-2014','dd-mon-yyyy') ) ,
  8                         subpartition sp_2014_q2 values less than ( to_date('01-jul-2014','dd-mon-yyyy') ) ,
  9                         subpartition sp_2014_q3 values less than ( to_date('01-oct-2014','dd-mon-yyyy') ) ,
 10                         subpartition sp_2014_q4 values less than ( to_date('01-jan-2015','dd-mon-yyyy') )
 11                ) ,
 12        partition p_2015 values less than ( to_date('01-Jan-2016','dd-mon-yyyy') )
 13                (
 14                        subpartition sp_2015_q1 values less than ( to_date('01-apr-2015','dd-mon-yyyy') ) ,
 15                        subpartition sp_2015_q2 values less than ( to_date('01-jul-2015','dd-mon-yyyy') ) ,
 16                        subpartition sp_2015_q3 values less than ( to_date('01-oct-2015','dd-mon-yyyy') ) ,
 17                        subpartition sp_2015_q4 values less than ( to_date('01-jan-2016','dd-mon-yyyy') )
 18                )
 19     )
 20     as
 21     select a.* ,
 22        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730)-1 as start_dt,
 23        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730) as end_dt
 24     from all_objects a;

Table created.

SQL>
SQL> alter table t modify start_dt not null;

Table altered.

SQL> alter table t modify end_dt not null;

Table altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183203583

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL    |      |   101 | 15049 |   491   (1)| 00:00:01 |     1 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


You can see that we had to scan all of the partitions, but within each of the partitions we only had to scan a single sub-partition (as indicated by “PARTITION RANGE SINGLE”).  So for each year (based on END_DT) we scanned a single one of the 4 subpartitions.  We are doing only 25% of the work of scanning the entire table.  But one pleasant surprise I saw today was how the optimizer can take advantage of additional information to improve things even more.  Let us now add a fairly obvious rule about the data:


SQL>
SQL> alter table t add constraint t_chk check( start_dt < end_dt );

Table altered.


And have another look at that query execution plan.


SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3874588989

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   247   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE |      |   101 | 15049 |   247   (1)| 00:00:01 |     2 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   247   (1)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   247   (1)| 00:00:01 |     5 |     5 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "END_DT">TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL>

How cool is that!  The constraint let us remove even more partitions from consideration. In fact, we ended up only scanning a single partition. Moreover, we never specified END_DT as a predicate, but you can see in the FILTER section, we could synthesize such a predicate using the rule defined by the check constraint we added.

I suppose the moral of the story is two-fold here.

1) The optimizer can be pretty smart with partition elimination,

2) As always, never hide rules and facts about the data from the database.  Make sure you define those constraints in the database tier.

8 comments

  1. Hi

    Isn’t it so that querying
    select * from t where start_dt = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    there is very little benefit to have “END_DT”>TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’) in where clause because there isn’t any rows where end_date is smaller than to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    and thus pruned partitions are empty.

    If query would have been:
    select * from t where end_dt = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    when there whould have been added “START_DT”= to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    and start_date TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’)
    would be beneficial because partitions would then be pruned.

    If table would have been partitioned by just: START_DATE and both END_DATE and START_DATE would have been not null,
    then having
    select * from t where end_dt = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    rewritten with “START_DT”<TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
    would use partition pruning, but that would have typically little value, because most queries have in where clause recent dates and
    there are where few partitions that actually pruned.

    If table would have been partitioned by virtual column MAX_PARTITIONING_DATE = greatest(START_DATE, END_DATE) and there would have been constraints
    START_DATE <= MAX_PARTITIONING_DATE, END_DATE <= MAX_PARTITIONING_DATE.
    then
    select * from t where start_dt = to_date('13-Jan-2015','dd-mon-yyyy')
    would be rewritten as
    select * from t where start_dt = to_date('13-Jan-2015','dd-mon-yyyy')
    and to_date('13-Jan-2015','dd-mon-yyyy') <= MAX_PARTITIONING_DATE
    and some partition pruning whould be used.
    Statement
    select * from t where end_dt = to_date('13-Jan-2015','dd-mon-yyyy')
    would be rewritten as
    select * from t where end_dt = to_date('13-Jan-2015','dd-mon-yyyy')
    and end_date = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    and start_date = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    and start_date = to_date(’13-Jan-2015′,’dd-mon-yyyy’)
    and partition pruning would be used.

    But please be are that there are currently (atleast at 12.1.0.2) some problems with this greatest function technique.
    If interval partitioning and constraints are used then array inserts cause errors.

    This use of constraints in optimizing sql-statements is great feature. But current version does still have some enhancement needs to enable its whole potential.

    lh

  2. Very interesting, but wouldn’t it be simpler, more useful & more normal to forget the partitioning, keep the constraint, but use an index?

    Simply, CREATE INDEX T_I1 ON T (start_dt, end_dt) COMPRESS 1;

  3. Hi

    Partitioning is more usefull (e.g. in dw systems) than having indexes if percentage of rows satisfing criteria is high.

    Please also notice that index (start_dt, end_dt) typically isn’t very usefull in situations where
    criteria is (“START_DT”=TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’))

    Mostly recent rows are queried and then majority of rows do satisfy criteria START_DATE <= recent_date.
    Almost all of the index would be scanned.

    Better results can typically be accomplished by having index as (end_dt, start_dt).

    In newer Oracle versions there is something called 'band join' (?). I have not used it and cannot be sure if it would have benefit for these indexes.

    Using this technique of having partitioning column defined as greatest of some columns does have some interesting possibilities; multiple partitioning columns and partitioning by column which has null values.
    It is 'approximative' partitioning technique but could help in some cases.

    lh

    1. Hi

      In previous comment should have been:

      criteria is (“START_DT”=TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’))

  4. Hi,

    Interesting: this changed my original where clause criteria twice:

    Trying to write it again…

    criteria is (“START_DT” equal or less than TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’) AND
    “END_DT” equal or greater than TO_DATE(‘ 2015-01-13 00:00:00’, ‘syyyy-mm-dd hh24:mi:ss’))

  5. Nice feature. However, the check constraint must be enabled, validated, and not deferrable.
    It is not like foreign key join elimination where it still works if you alter the constraint DISABLE NOVALIDATE RELY.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.