Standard Edition–different optimizer but still cool

One cool technique that the optimizer can employ is the BITMAP CONVERSION TO ROWIDS method to take advantage of B-tree indexes in a means that we would normally associate with a bitmap index. This can be particularly useful with multiple predicates on individually indexed columns because it lets us establish the rows of interest before having to visit the heap blocks.  Here’s an example of that in action, even when the indexes in question are Text indexes.

Enterprise Edition plan


SQL> create table MY_TAB ( col1 varchar2(50), col2 varchar2(50));

Table created.

SQL>
SQL> insert into MY_TAB
  2  select dbms_random.string('x', 30), dbms_random.string('x', 30)
  3  from dual
  4  connect by level <= 100000;

100000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> create index MY_TAB_IX1 on MY_TAB(col1) indextype is ctxsys.context;

Index created.

SQL> create index MY_TAB_IX2 on MY_TAB(col2) indextype is ctxsys.context;

Index created.

SQL>
SQL> exec  dbms_stats.gather_table_stats('','MY_TAB');

PL/SQL procedure successfully completed.

SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 4174159475

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |     1 |    62 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| MY_TAB      |     1 |    62 |     2   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |             |       |       |            |          |
|   3 |    BITMAP OR                        |             |       |       |            |          |
|   4 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   5 |      SORT ORDER BY                  |             |       |       |            |          |
|*  6 |       DOMAIN INDEX                  | MY_TAB_IX1  |       |       |     1   (0)| 00:00:01 |
|   7 |     BITMAP CONVERSION FROM ROWIDS   |             |       |       |            |          |
|   8 |      SORT ORDER BY                  |             |       |       |            |          |
|*  9 |       DOMAIN INDEX                  | MY_TAB_IX2  |       |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   9 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

But a quick trip to the licensing guide states the following about the optimizer in Standard Edition:

The following methods are not available in SE:

Bitmapped index, bitmapped join index, and bitmap plan conversions

which make sense given that you do not have access to bitmap indexes in Standard Edition. 

But all is not lost.  Even though the bitmap conversion optimization is not available, the optimizer can still come up with alternate and intelligent mechanisms to extract the required rows. Whereas you might think that an OR condition on two different columns might force a full table scan, here is the same example as above in Standard Edition.

Standard Edition plan


SQL> set autotrace traceonly explain
SQL> select * from MY_TAB where contains(col1, 'string1') > 0 or contains(col2, 'string2') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1568130183

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     2 |   108 |     2   (0)| 00:00:01 |
|   1 |  VIEW                         | VW_ORE_A5827389 |     2 |   108 |     2   (0)| 00:00:01 |
|   2 |   UNION-ALL                   |                 |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  4 |     DOMAIN INDEX              | MY_TAB_IX1      |       |       |     1   (0)| 00:00:01 |
|*  5 |    TABLE ACCESS BY INDEX ROWID| MY_TAB          |     1 |    62 |     1   (0)| 00:00:01 |
|*  6 |     DOMAIN INDEX              | MY_TAB_IX2      |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("COL1",'string1')>0)
   5 - filter(LNNVL("CTXSYS"."CONTAINS"("COL1",'string1')>0))
   6 - access("CTXSYS"."CONTAINS"("COL2",'string2')>0)

SQL> set autotrace off

A “UNION ALL” suggests the two separate table access paths will potentially return duplicate rows. But notice the nice touch on line 5 – when probing the rows via MY_TAB_IDX2, the optimizer threw in an additional FILTER (LNNVL(“CTXSYS”.”CONTAINS”(“COL1”,’string1′)>0)) to remove those rows that will be returned by the the “partnering” half of the UNION ALL.  In this way, we avoided a potentially expensive sort to remove duplicated results.

So you might see the occasional difference between optimizer plans for Enterprise edition versus Standard edition – but both should do a fine job at executing your queries Smile

(Both tests done on 12.2)

Hybrid histograms

Just a quick post here so I could consolidate some information about histograms in 12c.

On my last Office Hours session, one of the questions that was posed was asking for an explanation of the new types of histograms in use in 12c.  So I had a few slides and described conceptually at a high level how they work and what they represent.  If you’re new to 12c, or new to optimizer histograms in general, then take a look at the video below to get up to speed.  But for those who want to dig into a deeper level, I also saw recently two absolutely cracking articles which discuss the algorithms and implementation in much more detail.  So I’ve linked them here as well.

http://www.oracle.com/technetwork/articles/database/histogram-construction-oracle-12c-4426845.html

http://www.oracle.com/technetwork/articles/database/maintenance-histograms-db-12c-4426850.html

On that note…if you celebrate Easter – have a safe and happy break.  So you on the other side of the weekend!

Hare, Easter Bunny, Plush Bunny, Easter

Execution plans on LiveSQL

To protect the integrity of people’s data, and isolate sessions on LiveSQL, we lock down the environment.  Clearly if you are doing some testing with sensitive data, you don’t want an anonymous member of the user community mining V$SQL to see what commands you have been running.  Conversely, we want to allow people to perform most of the tasks that would do on a standard database installation without having to install or configure anything.  That’s the great thing about LiveSQL.

So for that reason, you don’t get access to all of the features of DBMS_XPLAN.  But since we do provide access to a limited set of V$ view access, you can get execution plan details by going back to first principles.  Here’s some scripts you can use


--
-- First run your query
--
select /*+ gather_plan_statistics */ .... (my_query)

--
--  Then locate the SQL_ID/CHILD_NUMBER for your just executed query
--
select * from v$sqlstats where sql_text like '...';


--
-- Plug them into the queries below.  
--
-- The first one gets the execution plan, the second one get the runtime statistics
--
with plan_table as 
(
select * from v$sql_plan_statistics_all
where sql_id = "my sql id"
and child_number = "my child"
)
select id "Id",
  rpad(
  lpad(' ',2*level) || operation || ' ' || options , 40 ) "Operation",
  object_name "Name",
  cardinality "Rows",
  bytes "Bytes", 
  cost "Cost",
  rtrim(to_char(trunc(time/3600),'fm00')||':'||
  to_char(mod(trunc(time/60),1),'fm00')||':'||
  to_char(trunc(mod(time,60)),'fm00'),':')     "Time"
from plan_table
connect by prior id = parent_id 
start with id = 0
union all
select null, null,null, null,null, null,null from dual
union all
select null, lpad(id,4) ||'-filter '||FILTER_PREDICATES,null, null,null, null,null 
from plan_table where FILTER_PREDICATES is not null
union all
select null, lpad(id,4) ||'-access '||ACCESS_PREDICATES,null, null,null, null,null 
from plan_table where ACCESS_PREDICATES is not null
order by 1 nulls last, 2 nulls first;



with plan_table as 
(
select * from v$sql_plan_statistics_all
where sql_id = 'd09nv17gy4j3z'
and child_number = 0
)
select id "Id",
  rpad(
  lpad(' ',2*level) || operation || ' ' || options , 40 ) "Operation",
  object_name "Name",
  last_starts "Starts",
  cardinality "E-Rows", 
  LAST_OUTPUT_ROWS "A-Rows",
  rtrim(to_char(trunc(time/3600),'fm00')||':'||
  to_char(mod(trunc(time/60),1),'fm00')||':'||
  to_char(trunc(mod(time,60)),'fm00'),':')     "Time",
  LAST_CR_BUFFER_GETS+LAST_CU_BUFFER_GETS "Buffers"
from plan_table p
connect by prior id = parent_id 
start with id = 0
union all
select null, null,null, null,null, null,null,null from dual
union all
select null, lpad(id,4) ||'-filter '||FILTER_PREDICATES,null, null,null, null,null,null 
from plan_table where FILTER_PREDICATES is not null
union all
select null, lpad(id,4) ||'-access '||ACCESS_PREDICATES,null, null,null, null,null,null 
from plan_table where ACCESS_PREDICATES is not null
order by 1 nulls last, 2 nulls first;


12c Statistics on load–special cases

One of the cool features in 12c is the automatic collection of optimizer statistics when a table is either created or loaded via direct path from empty.  This makes a lot of sense because it saves us from what used to be the mandatory second step of gathering statistics whenever we loaded an empty table with data.

For example

11.2.0.4


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
(null)

SQL> select count(*) from t1;

  COUNT(*)
----------
     86802

So at this point, we’d then have to scan the table we just loaded with a DBMS_STATS call in order to come up with some representative statistics. This was fixed in 12c with “statistics on load”.

12c 


SQL> create table t1 as
  2  select * from dba_objects;

Table created.

SQL> – no dbms_stats call
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78170

But this facility has been blogged about extensively, so I wont labour the point. This post is more about just a couple of things to be aware of when taking advantage of the facility.

Partitioned tables

The 12c feature is for statistics on load of the table. So if you are creating/loading a partitioned table, whilst you will still get statistics collected, they are at the table level only. For example,


SQL> create table t1
  2  partition by range ( object_id )
  3  interval ( 20000 )
  4  (
  5    partition p1 values less than ( 20000 ),
  6    partition p2 values less than ( 200000 )
  7  )
  8  as select d.* from dba_objects d
  9  where object_id is not null;

Table created.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78165

SQL>
SQL> select partition_name, num_rows
  2  from user_tab_partitions
  3  where table_name = 'T1';

PARTITION_   NUM_ROWS
---------- ----------
P1         (null)
P2         (null)

Notice that the partition level statistics are not collected. Interestingly, indexes however will be ok whether they are created during or after the load.


--
-- indexes created after the load on the table above
--
SQL> create index ix1 on t1 ( object_id ) local;

Index created.

SQL> create index ix2 on t1 ( object_name);

Index created.

SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
IX1                                    173
IX2                                    535

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ( 'IX1','IX2');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
IX1                            P1                  43
IX1                            P2                 130

--
-- indexes created as part of the table creation process
--
SQL> create table t1 (
  2     owner
  3    ,object_name
  4    ,subobject_name
  5    ,object_id
  6    ,data_object_id
  7    ,object_type
  8    ,created
  9    ,last_ddl_time
 10    ,timestamp
 11    ,status
 12    ,temporary
 13    ,generated
 14    ,secondary
 15    ,namespace
 16    ,edition_name
 17    ,sharing
 18    ,editionable
 19    ,oracle_maintained
 20    ,application
 21    ,default_collation
 22    ,duplicated
 23    ,sharded
 24    ,created_appid
 25    ,created_vsnid
 26    ,modified_appid
 27    ,modified_vsnid
 28    ,constraint pk primary key ( object_id ) using index local
 29    ,constraint uq unique ( owner, object_id ) using index
 30  )
 31  partition by range ( object_id )
 32  interval ( 20000 )
 33  (
 34    partition p1 values less than ( 20000 ),
 35    partition p2 values less than ( 200000 )
 36  )
 37  as select d.* from dba_objects d
 38  where object_id is not null;

Table created.

SQL>
SQL>
SQL> select index_name, leaf_blocks
  2  from user_indexes
  3  where index_name in ('PK','UQ');

INDEX_NAME                     LEAF_BLOCKS
------------------------------ -----------
PK                                     163
UQ                                     263

2 rows selected.

SQL>
SQL> select index_name, partition_name, leaf_blocks
  2  from user_ind_partitions
  3  where index_name in ('PK','UQ');

INDEX_NAME                     PARTITION_ LEAF_BLOCKS
------------------------------ ---------- -----------
PK                             P1                  41
PK                             P2                 122

2 rows selected.

CTAS empty is still OK

A common trick for creating an empty table whilst copying the definition of an existing table is to use a predicate with an always false condition. For example,


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

Statistics on load will still take place for such a table, ie


SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

Now you might be worried that since there are now statistics on the table, that perhaps a direct load using INSERT /*+ APPEND */,  which would normally perform statistics on load will no longer work. But there is no need to panic. We’ll still determine that the table is empty, and do another statistics on load operation when you perform that direct load.


SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

But this also has implications if you want to retain some existing statistics on the table. Because when we truncate the table, that same reset of statistics collection can occur.


SQL> create table t1 as
  2  select * from dba_objects
  3  where rownum 
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
        10

SQL>
SQL> truncate table t1;

Table truncated.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
     78171

Similarly, all you need is one transaction that “brings the table to life” for automatic statistics collection to be no longer active. Even if that transactions rolls back. For example:


SQL> create table t1 as
  2  select * from dba_objects
  3  where 1=0;

Table created.

SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

SQL> -- a normal mode insert of a row
SQL> insert into t1
  2  select * from dba_objects where rownum = 1;

1 row created.

SQL>
SQL> rollback;

Rollback complete.

SQL>
SQL> insert /*+ APPEND */ into t1
  2  select * from dba_objects;

78171 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select num_rows
  2  from user_tables
  3  where table_name = 'T1';

  NUM_ROWS
----------
         0

All of these behaviour are pretty much how I would expect them to work (your opinion may differ of course Smile), and I know that some people have opted to disable the automatic collection altogether to avoid any ambiguity but that strikes me as overkill.

If you are really concerned about it, it is easy enough to suffix your load routines with a simple wrapper function to check the statistics and issue the appropriate DBMS_STATS call to make up the shortfall.   And it would be remiss of me not to mention the latest optimizer whitepaper which covers in detail other optimizer enhancements in 12c. 

Happy stats collecting !

AskTOM TV episode 8

On AskTOM episode 8, I’ve taken a look at locating the SQL Plan Directives used for a particular query.  Here is the script output from the video if you want to use this for your own exploration


SQL>
SQL> create table t as
  2  select *
  3  from dba_objects
  4  where owner = 'SYS' and rownum <= 20
  5  union all
  6  select *
  7  from dba_objects
  8  where owner = 'SYSTEM'
  9  and rownum 
SQL> create index ix on t ( owner);

Index created.

SQL>
SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from   user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
OWNER                                     2           7          0         .5
OBJECT_NAME                             199          18          0 .005025126
SUBOBJECT_NAME                            1           3        199          1
OBJECT_ID                               220           4          0 .004545455
DATA_OBJECT_ID                          167           4         49 .005988024
OBJECT_TYPE                               9           8          0 .111111111
CREATED                                  12           8          0 .083333333
LAST_DDL_TIME                            16           8          0      .0625
TIMESTAMP                                13          20          0 .076923077
STATUS                                    1           6          0          1
TEMPORARY                                 2           2          0         .5
GENERATED                                 2           2          0         .5
SECONDARY                                 1           2          0          1
NAMESPACE                                 4           3          0        .25
EDITION_NAME                              0           0        220          0
SHARING                                   2          10          0         .5
EDITIONABLE                               1           2        206          1
ORACLE_MAINTAINED                         1           2          0          1
APPLICATION                               1           2          0          1
DEFAULT_COLLATION                         1           7        136          1
DUPLICATED                                1           2          0          1
SHARDED                                   1           2          0          1
CREATED_APPID                             0           0        220          0
CREATED_VSNID                             0           0        220          0
MODIFIED_APPID                            0           0        220          0
MODIFIED_VSNID                            0           0        220          0

26 rows selected.

SQL>
SQL> insert into t  select * from dba_objects;

78329 rows created.

SQL> insert into t  select * from dba_objects;

78329 rows created.

SQL> insert into t  select * from dba_objects;

78329 rows created.

SQL> insert into t  select * from dba_objects;

78329 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> select /*+ gather_plan_statistics */ count(created)
  2  from t
  3  where owner = 'SYS'
  4  and object_type = 'JAVA CLASS';

COUNT(CREATED)
--------------
        138424

1 row selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3qyuxjtjy92m5, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(created) from t where owner
= 'SYS' and object_type = 'JAVA CLASS'

Plan hash value: 2143077847

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |      1 |        |      1 |00:00:00.08 |    5991 |
|   1 |  SORT AGGREGATE                      |      |      1 |      1 |      1 |00:00:00.08 |    5991 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |     12 |    138K|00:00:00.07 |    5991 |
|*  3 |    INDEX RANGE SCAN                  | IX   |      1 |    110 |    207K|00:00:00.03 |    1218 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_TYPE"='JAVA CLASS')
   3 - access("OWNER"='SYS')


22 rows selected.

SQL>
SQL> select sql_id, child_number,is_reoptimizable  from v$sql where sql_id = '3qyuxjtjy92m5';

SQL_ID        CHILD_NUMBER I
------------- ------------ -
3qyuxjtjy92m5            0 Y

1 row selected.

SQL>
SQL> exec dbms_spd.FLUSH_SQL_PLAN_DIRECTIVE

PL/SQL procedure successfully completed.

SQL>
SQL> select /*+ gather_plan_statistics */ count(created)
  2  from t
  3  where owner = 'SYS'
  4  and object_type = 'JAVA CLASS';

COUNT(CREATED)
--------------
        138424

1 row selected.

SQL>
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3qyuxjtjy92m5, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(created) from t where owner
= 'SYS' and object_type = 'JAVA CLASS'

Plan hash value: 2143077847

-------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |      1 |        |      1 |00:00:00.06 |    5383 |
|   1 |  SORT AGGREGATE                      |      |      1 |      1 |      1 |00:00:00.06 |    5383 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |      1 |    138K|    138K|00:00:00.05 |    5383 |
|*  3 |    INDEX RANGE SCAN                  | IX   |      1 |    220 |    207K|00:00:00.02 |     610 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_TYPE"='JAVA CLASS')
   3 - access("OWNER"='SYS')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - statistics feedback used for this statement
   - performance feedback used for this statement
   - 1 Sql Plan Directive used for this statement


29 rows selected.

SQL>
SQL> select count(*) from dba_sql_plan_directives;

  COUNT(*)
----------
       354

1 row selected.

SQL>
SQL> set lines 60
SQL>
SQL> desc v$sql_plan
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ADDRESS                                RAW(8)
 HASH_VALUE                             NUMBER
 SQL_ID                                 VARCHAR2(13)
 PLAN_HASH_VALUE                        NUMBER
 FULL_PLAN_HASH_VALUE                   NUMBER
 CHILD_ADDRESS                          RAW(8)
 CHILD_NUMBER                           NUMBER
 TIMESTAMP                              DATE
 OPERATION                              VARCHAR2(30)
 OPTIONS                                VARCHAR2(30)
 OBJECT_NODE                            VARCHAR2(40)
 OBJECT#                                NUMBER
 OBJECT_OWNER                           VARCHAR2(128)
 OBJECT_NAME                            VARCHAR2(128)
 OBJECT_ALIAS                           VARCHAR2(261)
 OBJECT_TYPE                            VARCHAR2(20)
 OPTIMIZER                              VARCHAR2(20)
 ID                                     NUMBER
 PARENT_ID                              NUMBER
 DEPTH                                  NUMBER
 POSITION                               NUMBER
 SEARCH_COLUMNS                         NUMBER
 COST                                   NUMBER
 CARDINALITY                            NUMBER
 BYTES                                  NUMBER
 OTHER_TAG                              VARCHAR2(35)
 PARTITION_START                        VARCHAR2(64)
 PARTITION_STOP                         VARCHAR2(64)
 PARTITION_ID                           NUMBER
 OTHER                                  VARCHAR2(4000)
 DISTRIBUTION                           VARCHAR2(20)
 CPU_COST                               NUMBER
 IO_COST                                NUMBER
 TEMP_SPACE                             NUMBER
 ACCESS_PREDICATES                      VARCHAR2(4000)
 FILTER_PREDICATES                      VARCHAR2(4000)
 PROJECTION                             VARCHAR2(4000)
 TIME                                   NUMBER
 QBLOCK_NAME                            VARCHAR2(128)
 REMARKS                                VARCHAR2(4000)
 OTHER_XML                              CLOB
 CON_ID                                 NUMBER

SQL> set lines 200
SQL>
SQL>
SQL> select other from v$sql_plan
  2  where other is not null
  3  and sql_id = '3qyuxjtjy92m5'
  4  and child_number > 0;

no rows selected

SQL>
SQL> select remarks from v$sql_plan
  2  where remarks is not null
  3  and sql_id = '3qyuxjtjy92m5'
  4  and child_number > 0;

no rows selected

SQL>
SQL> select other_xml from v$sql_plan
  2  where other_xml is not null
  3  and sql_id = '3qyuxjtjy92m5'
  4  and child_number > 0;

OTHER_XML
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
yes
SQL> select xmltype(other_xml) from v$sql_plan
  2  where other_xml is not null
  3  and sql_id = '3qyuxjtjy92m5'
  4  and child_number > 0;

XMLTYPE(OTHER_XML)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  yes
  yes
  12.2.0.1
  
  2
  1068910003
  2143077847
  1068910003
  
    0
    1
  
  
    
    
    
    
    
    
    
  



1 row selected.

SQL>
SQL> explain plan for
  2  select count(created)
  3  from t
  4  where owner = 'SYS'
  5  and object_type = 'JAVA CLASS';

Explained.

SQL>
SQL> select * from table(dbms_xplan.display(format=>'all'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2143077847

---------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |     1 |    23 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |      |     1 |    23 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |   143K|  3226K|     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX   |   220 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / T@SEL$1
   3 - SEL$1 / T@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_TYPE"='JAVA CLASS')
   3 - access("OWNER"='SYS')

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT("CREATED")[22]
   2 - "CREATED"[DATE,7]
   3 - "T".ROWID[ROWID,10]

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

35 rows selected.

SQL>
SQL> select * from table(dbms_xplan.display(format=>'all +metrics'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2143077847

---------------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |     1 |    23 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |      |     1 |    23 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |   143K|  3226K|     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX   |   220 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / T@SEL$1
   3 - SEL$1 / T@SEL$1

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_TYPE"='JAVA CLASS')
   3 - access("OWNER"='SYS')

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT("CREATED")[22]
   2 - "CREATED"[DATE,7]
   3 - "T".ROWID[ROWID,10]

Sql Plan Directive information:
-------------------------------

  Used directive ids:
    14906410523430420431

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

41 rows selected.

SQL>
SQL>

Subtle variations in optimizer stats

Subtle variances in the way you work with statistics can have a significant impact on how your optimizer plans work out…so you need to take care.

Let’s look at the following example


SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create index TX on T ( upper(x) ) ;

Index created.

So you can see that in reality, ALL of the rows have a single value for UPPER(X), namely “X”. So let’s look at an execution plan.


SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1501193905

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |  1882 |   189K|   354   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1882 |   189K|   354   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TX   |   753 |       |   342   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("X")=:B1)

14 rows selected.

That seems incorrect. Bind peeking etc aside (because we are just using standard EXPLAIN), we would expect to not use an index when the there is only 1 value across the whole table.

Let’s try that experiment again.


SQL> drop table T;

Table dropped.

SQL>
SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL>
SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   188K|    18M|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   188K|    18M|   786   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("X")=:B1)

13 rows selected.

And this time it has got the estimate just right. So what did we do different ? If you look at the two scripts, in the first one, we did

“gather table stats, create index, gather index stats”

and in the second one, we did

“create index, gather table stats, gather index stats”

The subtle difference there is what statistics were calculated at column level on the hidden column (which got created as part of our function based index).

In the second (“correct”) example, statistics were gathered on ALL of the columns, including the hidden one


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$                              1           2          0          1

whereas in the first example, because the hidden column did not yet exist when we gathered the table stats, it got missed.


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$

So if you see execution plans that don’t seem right…double check your statistics – you might be missing some.

TOP-N histograms on 12c

I had an interesting question on AskTom today, with a nicely provided test case !!!,  so I thought I’d share it here with some additional commentary.

The basic premise of the TOP-N histogram in 12c is that when the number of distinct values in a column exceeded the number of available buckets for histograms, the historical fall-back position was to go to the height-balanced histograms, which is better than no histogram at all, but loses much of the precision we’d like to see from (say) a frequency based histogram. The TOP-N histogram represents a “halfway house” between the two histogram types. When we exceed the number of available histogram buckets, it might be the case that the overwhelming majority of rows do fall within the desired number of buckets, with just a few outliers being the proverbial “straw the broke the camels back”.

In such an instance, it makes more sense to perhaps store a frequency histogram for the most prevalent values, and discard (aka, lump into a single “nonpopular” bucket) the outliers on the premise that either they will not be queried anyway, or that even if they do, their “absence” from the histogram lets us come up with an appropriate cost of access anyway.

(* – I’m only using the term “nonpopular” to align with the documentation…perhaps an unpopular choice Smile )

Let’s look at an example of this in action. We’ll use ALL_OBJECTS as a base, and the OWNER column will be ideal for a skewed set of data.


SQL> create table t1 as select * from all_objects;

Table created.

SQL> select column_name,histogram,num_nulls,num_distinct,sample_size,density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

COLUMN_NAME          HISTOGRAM        NUM_NULLS NUM_DISTINCT SAMPLE_SIZE    DENSITY
-------------------- --------------- ---------- ------------ ----------- ----------
OWNER                NONE                     0           38       94322 .026315789

1 row selected.

SQL> select owner,count(*)
  2  from t1
  3  group by owner
  4  order by 2 desc;

OWNER                            COUNT(*)
------------------------------ ----------
SYS                                 42057
PUBLIC                              37138
APEX_050000                          3192
ORDSYS                               3157
APEX_040200                          3021
MDSYS                                1647
SYSTEM                                589
MCDONAC                               470
XDB                                   403
CTXSYS                                401
WMSYS                                 371
SH                                    300
DVSYS                                 292
ORDDATA                               274
LBACSYS                               237
ORDS_METADATA                         184
GSMADMIN_INTERNAL                     104
OE                                     93
DBSNMP                                 55
IX                                     48
SOE                                    47
SCOTT                                  39
HR                                     34
OLAPSYS                                25
OJVMSYS                                22
APEX_LISTENER                          22
DVF                                    19
FLOWS_FILES                            12
ORDPLUGINS                             10
PM                                     10
AUDSYS                                  9
SI_INFORMTN_SCHEMA                      8
OUTLN                                   8
BI                                      8
ORACLE_OCM                              6
APPQOSSYS                               5
TOP_DOG                                 3
JOE_SCHMO                               2

38 rows selected.

So we can see that SYS, PUBLIC et al are popular values, whereas JOE_SCHMO is not, and for example, APEX_PUBLIC_USER even less so (because it does not appear in the table at all, but is indeed a valid user). Now we’ll calculate a histogram with 25 buckets (which is less than the 38 distinct values in the column) but represents the majority of all data in the table.


SQL> begin
  2     dbms_stats.gather_table_stats(user,'T1',
  3             method_opt=>'for columns OWNER size 25');
  4  end;
  5  /

PL/SQL procedure successfully completed.

SQL> select
  2  column_name,histogram,num_nulls,num_distinct,sample_size,density
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  and column_name='OWNER';

COLUMN_NAME          HISTOGRAM        NUM_NULLS NUM_DISTINCT SAMPLE_SIZE    DENSITY
-------------------- --------------- ---------- ------------ ----------- ----------
OWNER                TOP-FREQUENCY            0           38       94322 .000005301

1 row selected.

We can see the new 12c histogram, the TOP-FREQUENCY which has done what was alluded to above, namely a frequency histogram the popular values rather than a generic height-balanced one.  Now lets count the number of rows in the table, and look at the end point values for the values in the histogram.  We’ll need those shortly.


SQL> col c new_value nvals
SQL> select count(*) c from t1;

         C
----------
     94322

1 row selected.

SQL> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T1'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALUE
--------------- ---------------------
           3021 APEX_040200
           6213 APEX_050000
           6235 APEX_LISTENER
           6636 CTXSYS
           6691 DBSNMP
           6983 DVSYS
           7087 GSMADMIN_INTERNAL
           7121 HR
           7169 IX
           7406 LBACSYS
           7876 MCDONAC
           9523 MDSYS
           9616 OE
           9641 OLAPSYS
           9915 ORDDATA
          13072 ORDSYS
          13256 ORDS_METADATA
          50394 PUBLIC
          50433 SCOTT
          50733 SH
          50780 SOE
          92837 SYS
          93426 SYSTEM
          93797 WMSYS
          94200 XDB

25 rows selected.

Our first query now shows the benefit of this kind of histogram. When I query for a popular value (one in the histogram), I get very good cost estimation.  There are 403 XDB rows and the execution plan got it spot on.


SQL> set autotrace on explain
SQL> select count(*) from t1
  2  where owner ='XDB';

  COUNT(*)
----------
       403

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   441   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   403 |  2418 |   441   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='XDB')


So what do we do with an nonpopular value ?  After all, it is not in the histogram.  The documentation tells us:

image

So let’s give that a go, and see if the costs align.


SQL> set autotrace off
SQL> select density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

   DENSITY
----------
.000005301

1 row selected.

SQL>
SQL>
SQL> select &&nvals*density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

94322*DENSITY
-------------
           .5

1 row selected.

So according to this, we would expect a cost value of 1 (ie, 0.5 rounded up), but look what happens:



SQL> set autotrace on explain
SQL> select count(*) from t1
  2  where owner ='APEX_PUBLIC_USER';

  COUNT(*)
----------
         0

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   441   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     9 |    54 |   441   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='APEX_PUBLIC_USER')

SQL>
SQL>

The challenge now becomes, where does the “9” come from ? The secret lies in the 10053 trace file.  When we look at this, some new information comes to light.



SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for T1[T1] 
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): 
    NewDensity:0.000099, OldDensity:0.000005 BktCnt:94208.000000, PopBktCnt:94208.000000, PopValCnt:25, NDV:38
  Column (#1): OWNER(VARCHAR2)
    AvgLen: 6 NDV: 38 Nulls: 0 Density: 0.000099
    Histogram: Top-Freq  #Bkts: 94208  UncompBkts: 94208  EndPtVals: 25  ActualVal: yes


Notice the presence of a value “NewDensity”. If we use that value instead of the “density” from user_tab_cols, then we arrive at our estimate of 9 rows as per the explain plan. Since we are doing a costing for a nonpopular value, the NewDensity is a density designed to more closer reflect that of the nonpopular values, not all of rows.

“NewDensity” appears to be: number of nonpopular rows / ( number of distinct nonpopular values * num_rows )