One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features etc, but also, one of the simplest but coolest optimizer improvements when it comes to loading data. When you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action
SQL> create table t (id primary key, tag)
2 as select rownum id, to_char(rownum) tag
3 from dual connect by level <= 50000;
Table created.
SQL>
SQL> select num_rows from user_tables
2 where table_name = 'T';
NUM_ROWS
----------
50000
1 row selected.
SQL>
SQL> select leaf_blocks
2 from user_indexes
3 where table_name = 'T';
LEAF_BLOCKS
-----------
103
1 row selected.
Of course, many people are now aware of this functionality, so why does it warrant a blog post? Mainly to make readers aware of a boundary case. If the table you are creating or loading is an index-organized table (IOT), then you do not get the statistics gathered automatically.
SQL> drop table t purge;
Table dropped.
SQL>
SQL> create table t (id primary key, tag)
2 organization index
3 as select rownum id, to_char(rownum) tag
4 from dual connect by level <= 50000;
Table created.
SQL>
SQL> select num_rows from user_tables
2 where table_name = 'T';
NUM_ROWS
----------
1 row selected.
SQL>
SQL> select leaf_blocks
2 from user_indexes
3 where table_name = 'T';
LEAF_BLOCKS
-----------
1 row selected.
Initially I suspected that the reason for this is that since the statistics that are being picked up on the fly are being collected on the source data, it only becomes possible for those statistics to be transposed if the target table structure is the same as the source. But if that were (strictly) the case, then other examples would exhibit a similar restriction, such as going from compressed data to uncompressed or vice versa. But you can see that the CTAS on load statistics are fine in this circumstance:
SQL> create table t1 (id primary key, tag)
2 compress
3 as select * from t;
Table created.
SQL>
SQL> select num_rows from user_tables
2 where table_name = 'T1';
NUM_ROWS
----------
99999
SQL>
SQL> select leaf_blocks
2 from user_indexes
3 where table_name = 'T1';
LEAF_BLOCKS
-----------
208
Similarly, even if I am copying from an identical index-organized table, the statistics will not be collected.
SQL> create table t2 (id primary key, tag)
2 organization index
3 as select * from t;
Table created.
SQL> select num_rows from user_tables
2 where table_name = 'T2';
NUM_ROWS
----------
SQL>
SQL> select leaf_blocks
2 from user_indexes
3 where table_name = 'T2';
LEAF_BLOCKS
-----------
Whatever the root cause is, just be aware, that if you are direct-path loading IOTs, then make sure you take the time to to also collect statistics on that table before you start querying it in earnest.
Hi Connor,
It seems to be a known restriction which is documented in the SQL Tuning Guide: https://docs.oracle.com/database/121/TGSQL/tgsql_statscon.htm#GUID-CDDB5A54-0991-4E68-A9D7-2305777B608B
Hi Connor,
A clarification that might needed to this part:
” there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you.”
That is if you don’t need any histograms. Otherwise you need to run dbms_stats anyway.
I didn’t think you could do direct path loads on an index organized table? I thought the whole purpose of an IOT was to put the table data in “index order” and that direct path means “add data to the end of the segment (beyond the high water mark)” …thus the two are mutually exclusive? All CTAS’s for index organized tables I’ve seen use conventional path inserts. But by all means, please show me if I am wrong…
Direct load becomes possible for IOT in 12c, although its an “interesting” version of what I would call direct load. I blogged about it back in 2016 here https://connor-mcdonald.com/2016/07/04/direct-mode-operations-on-iots/
Hi Connor,
There also appears to be a restriction with partitioned tables. In 12cR1 the table-level statistics are gathered during a CTAS or direct-path load into an empty table, but partition-level statistics are not. I haven’t checked what happens to index statistics. This means that for partitiones tables it’s best to load using the NO_GATHER_OPTIMIZER_STATISTICS hint and use DBMS_STATS to gather comprehensive statistics after the load. Another consideration is that if a table is being audited using FGA, direct-path insert is disabled and a conventional-path load is executed. Again, I haven’t checked to see if the optimizer recognises this or if it proceeds as though a direct-path load is occurring.
Regards,
Stuart.
True. Ultimately, I think it comes back to my comments on “we get our stats from the source”… We really couldn’t implement stats on load with “Oh and by the way, we’ll be slowing down your loads by 50%”. So we can really only get those stats that can be obtained with minimal overhead to the load.
Hi Connor,
There is another not yet documented restriction. For the statistics to be gathered automatically during such kind of operation all the table columns should be included into the insert/select. I have mentioned this restriction in this article:
https://hourim.wordpress.com/2017/12/07/optimizer-statistics-gathering/
Best regards
Very useful info.
Thanks Connor.