Subtle variations in optimizer stats

Subtle variances in the way you work with statistics can have a significant impact on how your optimizer plans work out…so you need to take care.

Let’s look at the following example


SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL> create index TX on T ( upper(x) ) ;

Index created.

So you can see that in reality, ALL of the rows have a single value for UPPER(X), namely “X”. So let’s look at an execution plan.


SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 1501193905

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |  1882 |   189K|   354   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1882 |   189K|   354   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | TX   |   753 |       |   342   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("X")=:B1)

14 rows selected.

That seems incorrect. Bind peeking etc aside (because we are just using standard EXPLAIN), we would expect to not use an index when the there is only 1 value across the whole table.

Let’s try that experiment again.


SQL> drop table T;

Table dropped.

SQL>
SQL> create table T (
  2    x varchar2(20) , y varchar2(100));

Table created.

SQL>
SQL> insert into T
  2  select 'x' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> insert into T
  2  select 'X' , rpad('z',100) from all_objects;

94117 rows created.

SQL>
SQL> create index TX on T ( upper(x) ) ;

Index created.

SQL>
SQL> exec dbms_stats.gather_table_stats(user,'T',cascade=>true)

PL/SQL procedure successfully completed.

SQL>
SQL> exec dbms_stats.gather_index_stats(user,'TX')

PL/SQL procedure successfully completed.

SQL>
SQL> explain plan for select * from T where upper(x) = :b1
  2  /

Explained.

SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   188K|    18M|   786   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   188K|    18M|   786   (1)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(UPPER("X")=:B1)

13 rows selected.

And this time it has got the estimate just right. So what did we do different ? If you look at the two scripts, in the first one, we did

“gather table stats, create index, gather index stats”

and in the second one, we did

“create index, gather table stats, gather index stats”

The subtle difference there is what statistics were calculated at column level on the hidden column (which got created as part of our function based index).

In the second (“correct”) example, statistics were gathered on ALL of the columns, including the hidden one


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$                              1           2          0          1

whereas in the first example, because the hidden column did not yet exist when we gathered the table stats, it got missed.


SQL> select COLUMN_NAME,NUM_DISTINCT,AVG_COL_LEN, num_nulls, density
  2  from user_tab_cols
  3  where  table_name = 'T'
  4  order by COLUMN_ID;

COLUMN_NAME                    NUM_DISTINCT AVG_COL_LEN  NUM_NULLS    DENSITY
------------------------------ ------------ ----------- ---------- ----------
X                                         2           2          0         .5
Y                                         1         101          0          1
SYS_NC00003$

So if you see execution plans that don’t seem right…double check your statistics – you might be missing some.

Learning is not a spectator sport

Leave a ReplyCancel reply

Trending

First dabblings with APEXlang

Vanishing objects; hello ORA-08103

A very robust query

The secret APEX performance hack