Statistics on Load

Posted by

One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features etc, but also, one of the simplest but coolest optimizer improvements when it comes to loading data. When you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action


SQL> create table t (id primary key, tag)
  2  as select rownum id, to_char(rownum) tag
  3  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------
     50000

1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------
        103

1 row selected.

Of course, many people are now aware of this functionality, so why does it warrant a blog post? Mainly to make readers aware of a boundary case. If the table you are creating or loading is an index-organized table (IOT), then you do not get the statistics gathered automatically.


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (id primary key, tag)
  2  organization index
  3  as select rownum id, to_char(rownum) tag
  4  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------


1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------


1 row selected.

Initially I suspected that the reason for this is that since the statistics that are being picked up on the fly are being collected on the source data, it only becomes possible for those statistics to be transposed if the target table structure is the same as the source. But if that were (strictly) the case, then other examples would exhibit a similar restriction, such as going from compressed data to uncompressed or vice versa. But you can see that the CTAS on load statistics are fine in this circumstance:


SQL> create table t1 (id primary key, tag)
  2  compress
  3  as select * from t;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T1';

  NUM_ROWS
----------
     99999

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T1';

LEAF_BLOCKS
-----------
        208

Similarly, even if I am copying from an identical index-organized table, the statistics will not be collected.


SQL> create table t2 (id primary key, tag)
  2  organization index
  3  as select * from t;

Table created.

SQL> select num_rows from user_tables
  2  where table_name = 'T2';

  NUM_ROWS
----------


SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T2';

LEAF_BLOCKS
-----------

Whatever the root cause is, just be aware, that if you are direct-path loading IOTs, then make sure you take the time to to also collect statistics on that table before you start querying it in earnest.

8 comments

  1. Hi Connor,
    A clarification that might needed to this part:
    ” there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you.”

    That is if you don’t need any histograms. Otherwise you need to run dbms_stats anyway.

  2. I didn’t think you could do direct path loads on an index organized table? I thought the whole purpose of an IOT was to put the table data in “index order” and that direct path means “add data to the end of the segment (beyond the high water mark)” …thus the two are mutually exclusive? All CTAS’s for index organized tables I’ve seen use conventional path inserts. But by all means, please show me if I am wrong…

  3. Hi Connor,
    There also appears to be a restriction with partitioned tables. In 12cR1 the table-level statistics are gathered during a CTAS or direct-path load into an empty table, but partition-level statistics are not. I haven’t checked what happens to index statistics. This means that for partitiones tables it’s best to load using the NO_GATHER_OPTIMIZER_STATISTICS hint and use DBMS_STATS to gather comprehensive statistics after the load. Another consideration is that if a table is being audited using FGA, direct-path insert is disabled and a conventional-path load is executed. Again, I haven’t checked to see if the optimizer recognises this or if it proceeds as though a direct-path load is occurring.
    Regards,
    Stuart.

    1. True. Ultimately, I think it comes back to my comments on “we get our stats from the source”… We really couldn’t implement stats on load with “Oh and by the way, we’ll be slowing down your loads by 50%”. So we can really only get those stats that can be obtained with minimal overhead to the load.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.