12c Statistics on load–special cases

Posted by

One of the cool features in 12c is the automatic collection of optimizer statistics when a table is either created or loaded via direct path from empty.  This makes a lot of sense because it saves us from what used to be the mandatory second step of gathering statistics whenever we loaded an empty table with data.

For example

11.2.0.4


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
(null)

SQL> select count(*) from t1;

  COUNT(*)
----------
     86802

So at this point, we’d then have to scan the table we just loaded with a DBMS_STATS call in order to come up with some representative statistics. This was fixed in 12c with “statistics on load”.

12c 


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL> – no dbms_stats call
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78170

But this facility has been blogged about extensively, so I wont labour the point. This post is more about just a couple of things to be aware of when taking advantage of the facility.

Partitioned tables

The 12c feature is for statistics on load of the table. So if you are creating/loading a partitioned table, whilst you will still get statistics collected, they are at the table level only. For example,


SQL> create table t1
  2  partition by range ( object_id )
  3  interval ( 20000 )
  4  (
  5    partition p1 values less than ( 20000 ),
  6    partition p2 values less than ( 200000 )
  7  )
  8  as select d.* from dba_objects d
  9  where object_id is not null;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78165

SQL>
SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'T1';

PARTITION_   NUM_ROWS
---------- ----------
P1         (null)
P2         (null)

Notice that the partition level statistics are not collected. Interestingly, indexes however will be ok whether they are created during or after the load.


--
-- indexes created after the load on the table above
--
SQL> create index ix1 on t1 ( object_id ) local;

Index created.

SQL> create index ix2 on t1 ( object_name);

Index created.

SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
IX1                                    173
IX2                                    535

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
IX1                            P1                  43
IX1                            P2                 130

--
-- indexes created as part of the table creation process
--
SQL> create table t1 (
  2     owner
  3    ,object_name
  4    ,subobject_name
  5    ,object_id
  6    ,data_object_id
  7    ,object_type
  8    ,created
  9    ,last_ddl_time
 10    ,timestamp
 11    ,status
 12    ,temporary
 13    ,generated
 14    ,secondary
 15    ,namespace
 16    ,edition_name
 17    ,sharing
 18    ,editionable
 19    ,oracle_maintained
 20    ,application
 21    ,default_collation
 22    ,duplicated
 23    ,sharded
 24    ,created_appid
 25    ,created_vsnid
 26    ,modified_appid
 27    ,modified_vsnid
 28    ,constraint pk primary key ( object_id ) using index local
 29    ,constraint uq unique ( owner, object_id ) using index
 30  )
 31  partition by range ( object_id )
 32  interval ( 20000 )
 33  (
 34    partition p1 values less than ( 20000 ),
 35    partition p2 values less than ( 200000 )
 36  )
 37  as select d.* from dba_objects d
 38  where object_id is not null;

Table created.

SQL>
SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ('PK','UQ');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
PK                                     163
UQ                                     263

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ('PK','UQ');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
PK                             P1                  41
PK                             P2                 122

2 rows selected.

CTAS empty is still OK

A common trick for creating an empty table whilst copying the definition of an existing table is to use a predicate with an always false condition. For example,


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

Statistics on load will still take place for such a table, ie


SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

Now you might be worried that since there are now statistics on the table, that perhaps a direct load using INSERT /*+ APPEND */,  which would normally perform statistics on load will no longer work. But there is no need to panic. We’ll still determine that the table is empty, and do another statistics on load operation when you perform that direct load.


SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

But this also has implications if you want to retain some existing statistics on the table. Because when we truncate the table, that same reset of statistics collection can occur.


SQL> create table t1 as
  2  select * from dba_objects
  3  where rownum 
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
        10

SQL>
SQL> truncate table t1;

Table truncated.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

Similarly, all you need is one transaction that “brings the table to life” for automatic statistics collection to be no longer active. Even if that transactions rolls back. For example:


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

SQL> -- a normal mode insert of a row
SQL> insert into t1
  2  select * from dba_objects where rownum = 1;

1 row created.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

All of these behaviour are pretty much how I would expect them to work (your opinion may differ of course Smile), and I know that some people have opted to disable the automatic collection altogether to avoid any ambiguity but that strikes me as overkill.

If you are really concerned about it, it is easy enough to suffix your load routines with a simple wrapper function to check the statistics and issue the appropriate DBMS_STATS call to make up the shortfall.   And it would be remiss of me not to mention the latest optimizer whitepaper which covers in detail other optimizer enhancements in 12c. 

Happy stats collecting !

2 comments

  1. Not sure it’s worth mentioning, doing it just in case (I’m ok if you prefer to reject the comment 🙂

    Stats are gathered just at the partition level in case you specify extended partition syntax (hopefully that’s the right name), assuming anybody does that.

    create table tc (object_id number, object_name varchar2(30))
    partition by range ( object_id )
    interval ( 20000 )
    (
    partition p1 values less than ( 20000 ),
    partition p2 values less than ( 200000 )
    );

    insert /*+ APPEND */ into tc partition (p1) select mod(rownum, 20000), lpad(‘x’,30,’x’) from dual connect by rownum select num_rows from user_tables where table_name =’TC’;

    NUM_ROWS
    ———-

    Elapsed: 00:00:00.19
    SQL> select partition_name, num_rows from user_tab_partitions where table_name = ‘TC’;

    PARTITION_NAME NUM_ROWS
    —————————— ———-
    P1 1000
    P2

    Cheers,
    Mauro

  2. Thanks for stopping by Mauro !

    With respect to: “if you prefer to reject the comment”, conversely, I think that’s actually an important case, because to leave the table stats untouched could lead to nasty contradictions like:

    SQL> create table tc (object_id number, object_name varchar2(30))
    2 partition by range ( object_id )
    3 interval ( 20000 )
    4 (
    5 partition p1 values less than ( 20000 ),
    6 partition p2 values less than ( 200000 )
    7 );

    Table created.

    SQL>
    SQL> insert /*+ APPEND */ into tc
    2 select 100000, ‘x’ from dual connect by level commit;

    Commit complete.

    SQL>
    SQL> select num_rows from user_tables where table_name =’TC’;

    NUM_ROWS
    ———-
    100

    1 row selected.

    SQL> select partition_name, num_rows from user_tab_partitions where table_name = ‘TC’;

    PARTITION_ NUM_ROWS
    ———- ———-
    P1
    P2

    2 rows selected.

    SQL> insert /*+ APPEND */ into tc partition (p1)
    2 select mod(rownum, 20000), lpad(‘x’,30,’x’) from dual connect by rownum
    SQL> select num_rows from user_tables where table_name =’TC’;

    NUM_ROWS
    ———-
    100

    1 row selected.

    SQL> select partition_name, num_rows from user_tab_partitions where table_name = ‘TC’;

    PARTITION_ NUM_ROWS
    ———- ———-
    P1 40000
    P2

    2 rows selected.

    Cheers,
    Connor

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.