India bound!

As part of the activities organised by the Oracle Ace Program and the All India Oracle User Group, the OTN Yathra tour is coming up in a few weeks.


I’ll be speaking on 12c developer features and advanced SQL techniques at the following locations:

  • 23-Apr-16 Saturday, Chennai
  • 24-Apr-16 Sunday, Bangalore
  • 26-Apr-16 Tuesday, Hyderabad
  • 28-Apr-16 Thursday, Pune

If you’re in any of these places, come along for some great education and good fun, and don’t be shy – come up and say hello.

Full details are here.

(Special thanks to Sai for his organizational efforts in getting this all together).

The death of the demo script

On AskTom, one of the things we always strive to do is provide demo scripts in our answers where appropriate.  That way, people can see our line of thought, and hopefully get an understanding of how we arrived at a solution.  And there are two very important concepts in that last sentence:

  • our line of thought
  • how we arrived

The demo script itself is a representation of that, not a cast-iron guarantee that you should be cutting and pasting it into your Production system!  If you honestly expect that a demo script we run will run exactly the same way on your system, you’re perhaps being a little naive.

Why? Because for a long time in the speaker community, there’s been a humorous theme about how demo scripts have required more and more “terms and conditions” as each version of Oracle became more sophisticated.  Namely, the evolution of what a demo script might be impacted by:

(Now before anyone takes me to task here on versions and features etc – just like the demo scripts themselves, I’m using this as an example of what you need to be aware of, not a definitive or exhaustive list.)

Oracle 7:

“Here’s my script…and chances are, it will just run like this on your Oracle 7 installation.”

Oracle 8: (multiple blocksizes)

“Here’s my script…it was built on an 8k blocksize, your results may differ if you are not.”

Oracle 8i: (locally managed tablespace, and auto segment space management introduced)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, your results may differ if you are not.”

Oracle 9i: (system stats, cpu costing introduced)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with no system stats calculated or stored, your results may differ if you are not.”

Oracle 10g: (NOWORKLOAD system stats always on, dynamic sampling defaults)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with NOWORKLOAD system stats gathered, dynamic sampling set to default, your results may differ if you are not.”

Oracle 10.2: (auto histogram, ‘default’ system stats)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with installation default system stats gathered, dynamic sampling set to default, stats gathered with default parameters, and no previous queries run against your table other than the one in the demo, your results may differ if you are not.”

Oracle 11: (auto multiblock read count)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with installation default system stats gathered, dynamic sampling set to default, stats gathered with default parameters, and no previous queries run against your table other than the one in the demo, with multiblock read count explicitly set to 128, and a file system that supports 1meg physical I/O, your results may differ if you are not.”

Oracle 11.2: (serial direct read, exadata-specific system stats)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with installation default system stats gathered, dynamic sampling set to default, stats gathered with default parameters, and no previous queries run against your table other than the one in the demo, with multiblock read count explicitly set to 128, and a file system that supports 1meg physical I/O, with serial direct read enabled, and a buffer cache size of exactly 2G, and on a non-Exadata system, your results may differ if you are not.”

Oracle 12: (auto-stats, clustering factor optimization, automatic extended stats)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with installation default system stats gathered, dynamic sampling set to default, stats gathered with default parameters, and no previous queries run against your table other than the one in the demo, with multiblock read count explicitly set to 128, and a file system that supports 1meg physical I/O, with serial direct read enabled, and a buffer cache size of exactly 2G, and on a non-Exadata system, with clustering factor TABLE_CACHED_BLOCKS left unchanged, and no hidden columns automatically added to your table by extended stats, your results may differ if you are not.”

Oracle 12.1.0.2: (in-memory, large table caching, full database caching, on-chip functionality)

“Here’s my script… it was built on an 8k blocksize, on a locally managed tablespace, with uniform extent size 1m, in segment space auto-managed, with installation default system stats gathered, dynamic sampling set to default, stats gathered with default parameters, and no previous queries run against your table other than the one in the demo, with multiblock read count explicitly set to 128, and a file system that supports 1meg physical I/O, with serial direct read enabled, and a buffer cache size of exactly 2G, and on a non-Exadata system, with clustering factor TABLE_CACHED_BLOCKS left unchanged, and no hidden columns automatically added to your table by extended stats, on an Intel chipset, without large table caching activated, and no auto capture of SQL plans, your results may differ if you are not.”

And I’ve not even mentioned the myriad of platform differences (things might run slightly differently on Windows versus Linux versus AIX), and of course, the question may have come from someone running Oracle Apps, where (depending on the version being used) parameters such as those below may be set to non-default values:

  • _b_tree_bitmap_plans
  • _fast_full_scan_enabled
  • _like_with_bind_as_equality
  • _sort_elimination_cost_ratio
  • optimizer_secure_view_merging

which change how queries are optimized, and even how they may be transformed before optimization.
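
As a quick sanity check of your own environment, a query along these lines lists the parameters that have been explicitly changed from their defaults. This is just a sketch – explicitly set underscore parameters will generally appear here too, but only once they have actually been set:

-- list parameters explicitly changed from their defaults
-- (requires access to V$PARAMETER)
select name, value
from   v$parameter
where  isdefault = 'FALSE'
order  by name;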

But what if we eliminate the version discrepancy? … Rest assured, even if you have the exact same version of the software as us, the same potential for difference applies.  We’re typically on a VM inside our laptop on a private network, and you are on a server with different CPU, different RAM, different parameters, different storage, different platform.  What patches have you applied? What parameters have you set? There are just so many variables in play here.

So are demo scripts a waste of time? … Of course not.  Just remember what they are – an addendum to a line of thought, an application of logic, a glimpse into a potentially useful process.  Your job is then to take advantage of that thought and logic, and discover for yourself the possible benefits on your system.

Happy scripting!

Views as “stored text”

You’ll often see comments that views are not copies of the data; they are merely the stored text of the query that defines them.

This is by and large true, but don’t forget, this is not the same as saying we simply take whatever text you give us and store it verbatim.

Let’s take a look at a simple example proving that this is not the case.


SQL> create table t as select * from all_objects;

Table created.

SQL>
SQL> create or replace
  2  view V as select * from T;

View created.

SQL>
SQL> select text_vc from user_views
  2  where view_name = 'V';

TEXT_VC
------------------------------------------------------------
select "OWNER","OBJECT_NAME","SUBOBJECT_NAME","OBJECT_ID","D
ATA_OBJECT_ID","OBJECT_TYPE","CREATED","LAST_DDL_TIME","TIME
STAMP","STATUS","TEMPORARY","GENERATED","SECONDARY","NAMESPA
CE","EDITION_NAME","SHARING","EDITIONABLE","ORACLE_MAINTAINE
D" from T

Notice that “*” has been expanded out to the full list of columns.

Sidebar: notice also the nice column TEXT_VC, which is the VARCHAR2 equivalent of the older-style LONG column TEXT. You get that in 12c.
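
If you’re on an earlier release and want the view text as something friendlier than a LONG, one option (a sketch, not the only approach) is DBMS_METADATA, which returns the full DDL as a CLOB:

-- returns the complete CREATE VIEW statement as a CLOB
-- (in SQL*Plus you may need SET LONG to see all of it)
select dbms_metadata.get_ddl('VIEW','V') from dual;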

Interestingly, aliases will be preserved in the view definition if provided.


SQL> create or replace
  2  view V as select t.* from T;

View created.

SQL>
SQL> select text_vc from user_views
  2  where view_name = 'V';

TEXT_VC
------------------------------------------------------------
select t."OWNER",t."OBJECT_NAME",t."SUBOBJECT_NAME",t."OBJEC
T_ID",t."DATA_OBJECT_ID",t."OBJECT_TYPE",t."CREATED",t."LAST
_DDL_TIME",t."TIMESTAMP",t."STATUS",t."TEMPORARY",t."GENERAT
ED",t."SECONDARY",t."NAMESPACE",t."EDITION_NAME",t."SHARING"
,t."EDITIONABLE",t."ORACLE_MAINTAINED" from T

One important takeaway from this is that since the “*” is not preserved in the view definition, if you add columns to (or drop columns from) the base table, the view needs to be recreated to correctly reflect that change.
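
Here’s a quick sketch of that behaviour (EXTRA_COL is just an illustrative column name):

alter table T add (extra_col number);

-- V still exposes only the columns that existed when it was created;
-- EXTRA_COL will not appear until the view is recreated.
select * from V;

create or replace view V as select * from T;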

Database dictionary corruption? Maybe not.

At first glance, this looks like a major drama. Some sort of database dictionary corruption, or internal error. But we’ll see, it’s actually fine.

Let’s do a simple describe on an existing table:


SQL> desc MY_TABLE
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 EMPNO                                  NUMBER(38)
 ENAME                                  VARCHAR2(30)
 HIREDATE                               DATE
 DEPTNO                                 NUMBER(38)
 SAL                                    NUMBER(38)

Now of course, that could be a synonym, or a view… so let’s also make sure that we are actually dealing with a real table.


SQL> select object_type, object_name
  2  from   user_objects
  3  where  object_name = 'MY_TABLE';

OBJECT_TYPE             OBJECT_NAME
----------------------- ----------------------------------------
TABLE                   MY_TABLE

So then, it’s definitely a table.  Let’s find out more about the attributes of that table.


SQL> select table_name
  2  from   user_tables
  3  where  table_name = 'MY_TABLE';

no rows selected

Uh oh… you can feel the adrenalin levels rising.  Is my database corrupt? Do I need to recover? Maybe I’ll check to see if I can query the table.


SQL> select * from MY_TABLE;

     EMPNO ENAME                          HIREDATE      DEPTNO        SAL
---------- ------------------------------ --------- ---------- ----------
      1234 John                           16-MAR-16         10       1000
      1235 Sue                            04-MAR-16         20       2000
      1236 Jill                           17-DEC-15         30       2200
      1237 Mary                           08-NOV-15         30       3400

That all seems good.  So how did things go so wrong? Well, it’s actually perfectly normal behaviour.  xxx_TABLES lists only the relational tables, and MY_TABLE is not a relational table.  I created it like this:


SQL> create or replace
  2  type my_obj as object ( empno int,  ename varchar2(30), hiredate date, deptno int, sal int);
  3  /

Type created.

SQL> create table MY_TABLE of my_obj;

Table created.

And if you want to find out where the object tables are, then naturally you look in xxx_OBJECT_TABLES:


SQL> select table_name, table_type
  2  from user_object_tables
  3  where table_name = 'MY_TABLE';

TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------
MY_TABLE                       MY_OBJ
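
And if you’d rather not remember which flavour of table lives in which view, the xxx_ALL_TABLES family describes both relational and object tables, so a sketch like this finds MY_TABLE either way:

-- USER_ALL_TABLES covers relational *and* object tables;
-- TABLE_TYPE is the object type name (null for relational tables)
select table_name, table_type
from   user_all_tables
where  table_name = 'MY_TABLE';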

The first matching row

I was recently presented with a query along these lines:


SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1
 11  /

and asked – will the “ROWNUM” predicate lead Oracle to be smart enough to stop after the “top” half of the UNION ALL if it gets a row back?

Let’s find out. First we create two tablespaces to hold the tables, and plonk a table in each.


SQL> create tablespace TS1 datafile 'C:\ORACLE\ORADATA\NP12\T1.DBF' size 10m;

Tablespace created.

SQL> create tablespace TS2 datafile 'C:\ORACLE\ORADATA\NP12\T2.DBF' size 10m;

Tablespace created.

SQL> create table T1 ( x number) tablespace ts1;

Table created.

SQL> insert /*+ APPEND */ into T1
  2  select rownum from dual connect by level < 10;

9 rows created.

SQL> create table T2 ( x number) tablespace ts2;

Table created.

SQL> insert /*+ APPEND */ into T2
  2  select rownum from dual connect by level < 10;

9 rows created.

Now the big question: if I am doing a “find the first row and exit” style of query, will Oracle be smart enough to stop before it hits table T2? To prove it, I’ll take the tablespace holding T2 offline – if the query still succeeds, then Oracle never needed to touch T2.


SQL> alter tablespace ts2 offline;

Tablespace altered.

SQL> select * from t2;
select * from t2
              *
ERROR at line 1:
ORA-00376: file 9 cannot be read at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\T2.DBF'


SQL> variable b1 number
SQL> exec :b1 := 1

PL/SQL procedure successfully completed.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1;

         X
----------
         1

Yes it does! And just to reinforce it, I’ll reset “b1” to zero, so it will not find anything in T1 and hence will need to move on to T2:


SQL> variable b1 number
SQL> exec :b1 := 0

PL/SQL procedure successfully completed.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1;
  from   t2
         *
ERROR at line 7:
ORA-00376: file 9 cannot be read at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\T2.DBF'


SQL>
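If taking a tablespace offline is not an option on your system, a gentler way to verify the same behaviour (a sketch, using standard row source statistics) is to check that the plan line for T2 shows zero starts:

select /*+ gather_plan_statistics */ * from
( select x from t1 where x = :b1
  union all
  select x from t2 where x = :b1
)
where rownum = 1;

-- a "Starts" of 0 on the TABLE ACCESS FULL of T2 confirms it was never visited
select * from table(dbms_xplan.display_cursor(format=>'ALLSTATS LAST'));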

Nice to know also that even if we need to do dynamic sampling, we’re still not going to get into trouble. I’ll repeat the tests without the APPEND hint, so that the tables do not pick up statistics automatically via online statistics gathering.


SQL> create tablespace TS1 datafile 'C:\ORACLE\ORADATA\NP12\T1.DBF' size 10m;

Tablespace created.

SQL> create tablespace TS2 datafile 'C:\ORACLE\ORADATA\NP12\T2.DBF' size 10m;

Tablespace created.

SQL> create table T1 ( x number) tablespace ts1;

Table created.

SQL> insert into T1
  2  select rownum from dual connect by level < 10;

9 rows created.

SQL> create table T2 ( x number) tablespace ts2;

Table created.

SQL> insert into T2
  2  select rownum from dual connect by level < 10;

9 rows created.

SQL> alter tablespace ts2 offline;

Tablespace altered.

SQL> select * from t2;
select * from t2
              *
ERROR at line 1:
ORA-00376: file 9 cannot be read at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\T2.DBF'


SQL> variable b1 number
SQL> exec :b1 := 1

PL/SQL procedure successfully completed.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1;

         X
----------
         1

SQL> variable b1 number
SQL> exec :b1 := 0

PL/SQL procedure successfully completed.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1;
  from   t2
         *
ERROR at line 7:
ORA-00376: file 9 cannot be read at this time
ORA-01110: data file 9: 'C:\ORACLE\ORADATA\NP12\T2.DBF'


SQL>
SQL> alter tablespace ts2 online;

Tablespace altered.

SQL> set autotrace on
SQL> variable b1 number
SQL> exec :b1 := 0

PL/SQL procedure successfully completed.

SQL> select * from
  2  ( select x
  3    from   t1
  4    where  x = :b1
  5    union all
  6    select x
  7    from   t2
  8    where  x = :b1
  9  )
 10  where rownum = 1;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 957534384

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    13 |     2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY       |      |       |       |            |          |
|   2 |   VIEW               |      |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |    UNION-ALL         |      |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |     1 |    13 |     3   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL| T2   |     1 |    13 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   4 - filter("X"=TO_NUMBER(:B1))
   5 - filter("X"=TO_NUMBER(:B1))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Tightening up your data model

Having NOT NULL constraints declared on columns that are genuinely not nullable is not only good practice, but can also yield performance gains. Here’s a little routine that lists the columns that probably need a NOT NULL constraint. It looks at all columns and:

  • checks whether a constraint with the definition “COLUMN_NAME” IS NOT NULL is present
  • counts the null values

If the latter is zero and no constraint was found, then we print out ‘alter table …’ DDL for those columns that may need such a constraint defined. It works across a nominated schema and/or a single table.


SQL> create or replace
  2  procedure check_uncons_columns_for_null(
  3                   p_schema varchar2,
  4                   p_table_name varchar2 default null) is
  5    cursor c_list is
  6     select t.table_name, t.column_name, c.search_condition,
  7            max(t.column_name) over (partition by t.table_name) as lastcol
  8     from (
  9       select a.table_name, a.column_name
 10       from dba_tab_columns a
 11       where a.owner = p_schema
 12       and a.table_name = nvl(upper(p_table_name),a.table_name)
 13       ) t,
 14       (
 15       select a.table_name, b.column_name, a.search_condition
 16       from dba_cons_columns b, dba_constraints a
 17       where a.owner = p_schema
 18       and   a.constraint_type = 'C'
 19       and   a.table_name = nvl(upper(p_table_name),a.table_name)
 20       and   a.table_name = b.table_name
 21       and   a.owner = b.owner
 22       and   a.constraint_name = b.constraint_name
 23       ) c
 24    where t.table_name = c.table_name(+)
 25    and   t.column_name = c.column_name(+)
 26    order by 1,2;
 27
 28    str0 varchar2(32767);
 29    str1 varchar2(32767);
 30    str2 varchar2(32767);
 31    str3 varchar2(32767);
 32
 33    search_cond varchar2(32767);
 34
 35    prev varchar2(100) := '*';
 36    cnt number;
 37    trailer varchar2(5) := ','||chr(10);
 38
 39  procedure do_sql(thesql varchar2) is
 40    tcursor integer;
 41    dummy integer;
 42  begin
 43    tcursor := dbms_sql.open_cursor;
 44    dbms_sql.parse(tcursor,thesql,2);
 45    dummy := dbms_sql.execute(tcursor);
 46    dbms_sql.close_cursor(tcursor);
 47  end;
 48
 49  begin
 50  for i in c_list loop
 51
 52    if prev != i.table_name then
 53      str0 := 'declare ';
 54      str1 := 'begin select ';
 55      str2 := ' into ';
 56      str3 := ' '; cnt := 1;
 57    end if;
 58
 59    --
 60    -- approximation only
 61    --
 62    if i.search_condition is null or i.search_condition != '"'||i.column_name||'" IS NOT NULL' then
 63      str0 := str0 || 'v'||ltrim(cnt)||' number;';
 64      str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
 65      str2 := str2 || 'v'||ltrim(cnt)||trailer;
 66      str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
 67        'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
 68        ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
 69    end if;
 70    if i.column_name = i.lastcol then
 71      str1 := rtrim(str1,trailer);
 72      str2 := rtrim(str2,trailer) ||' from '||p_schema||'.'||i.table_name||';';
 73      str3 := rtrim(str3,trailer) ||' end;';
 74      do_sql(str0||' '||str1||' '||str2||' '||str3);
 75    end if;
 76    prev := i.table_name;
 77    cnt := cnt + 1;
 78
 79
 80  end loop;
 81  end;
 82  /

Procedure created.

SQL> set serverout on
SQL> exec check_uncons_columns_for_null(user)
alter table SCOTT.CHD modify (A not null);
alter table SCOTT.CHD modify (DATA not null);
alter table SCOTT.CHD modify (F not null);
alter table SCOTT.CHD modify (H not null);
alter table SCOTT.CHD modify (W1 not null);
alter table SCOTT.CHD modify (W2 not null);
alter table SCOTT.PAR modify (A not null);
alter table SCOTT.PAR modify (DATA not null);
alter table SCOTT.PAR modify (F not null);
alter table SCOTT.PAR modify (W not null);

PL/SQL procedure successfully completed.

SQL>
SQL>
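
As an aside, if your optimizer statistics are reasonably fresh, the dictionary can give you a cheap first-pass version of the same list without scanning any data. This is only a sketch – NUM_NULLS is only as trustworthy as your last stats gather, so verify before altering anything:

-- nullable columns that contained no nulls at the last stats gather:
-- candidates for a NOT NULL constraint
select table_name, column_name, num_nulls, last_analyzed
from   user_tab_columns
where  nullable = 'Y'
and    num_nulls = 0
and    last_analyzed is not null
order  by table_name, column_name;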

TOP-N histograms on 12c

I had an interesting question on AskTom today, with a nicely provided test case, so I thought I’d share it here with some additional commentary.

The basic premise of the TOP-N histogram in 12c is this: when the number of distinct values in a column exceeds the number of available histogram buckets, the historical fall-back position was the height-balanced histogram, which is better than no histogram at all, but loses much of the precision we’d like to see from (say) a frequency-based histogram. The TOP-N histogram represents a halfway house between the two histogram types. When we exceed the number of available buckets, it might be the case that the overwhelming majority of rows fall within the desired number of buckets, with just a few outliers being the proverbial straw that broke the camel’s back.

In such an instance, it makes more sense to store a frequency histogram for the most prevalent values, and discard (that is, lump into a single “nonpopular” bucket*) the outliers, on the premise that either they will not be queried anyway, or that even if they are, their “absence” from the histogram still lets us come up with an appropriate cost of access.

(* I’m only using the term “nonpopular” to align with the documentation… perhaps an unpopular choice.)

Let’s look at an example of this in action. We’ll use ALL_OBJECTS as a base, and the OWNER column will be ideal for a skewed set of data.


SQL> create table t1 as select * from all_objects;

Table created.

SQL> select column_name,histogram,num_nulls,num_distinct,sample_size,density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

COLUMN_NAME          HISTOGRAM        NUM_NULLS NUM_DISTINCT SAMPLE_SIZE    DENSITY
-------------------- --------------- ---------- ------------ ----------- ----------
OWNER                NONE                     0           38       94322 .026315789

1 row selected.

SQL> select owner,count(*)
  2  from t1
  3  group by owner
  4  order by 2 desc;

OWNER                            COUNT(*)
------------------------------ ----------
SYS                                 42057
PUBLIC                              37138
APEX_050000                          3192
ORDSYS                               3157
APEX_040200                          3021
MDSYS                                1647
SYSTEM                                589
MCDONAC                               470
XDB                                   403
CTXSYS                                401
WMSYS                                 371
SH                                    300
DVSYS                                 292
ORDDATA                               274
LBACSYS                               237
ORDS_METADATA                         184
GSMADMIN_INTERNAL                     104
OE                                     93
DBSNMP                                 55
IX                                     48
SOE                                    47
SCOTT                                  39
HR                                     34
OLAPSYS                                25
OJVMSYS                                22
APEX_LISTENER                          22
DVF                                    19
FLOWS_FILES                            12
ORDPLUGINS                             10
PM                                     10
AUDSYS                                  9
SI_INFORMTN_SCHEMA                      8
OUTLN                                   8
BI                                      8
ORACLE_OCM                              6
APPQOSSYS                               5
TOP_DOG                                 3
JOE_SCHMO                               2

38 rows selected.

So we can see that SYS, PUBLIC et al are popular values, whereas JOE_SCHMO is not, and APEX_PUBLIC_USER even less so (because it does not appear in the table at all, although it is a valid user). Now we’ll calculate a histogram with 25 buckets, fewer than the 38 distinct values in the column, but enough to cover the majority of the data in the table.


SQL> begin
  2     dbms_stats.gather_table_stats(user,'T1',
  3             method_opt=>'for columns OWNER size 25');
  4  end;
  5  /

PL/SQL procedure successfully completed.

SQL> select
  2  column_name,histogram,num_nulls,num_distinct,sample_size,density
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  and column_name='OWNER';

COLUMN_NAME          HISTOGRAM        NUM_NULLS NUM_DISTINCT SAMPLE_SIZE    DENSITY
-------------------- --------------- ---------- ------------ ----------- ----------
OWNER                TOP-FREQUENCY            0           38       94322 .000005301

1 row selected.

We can see the new 12c histogram type, TOP-FREQUENCY, which has done what was alluded to above, namely a frequency histogram on the popular values rather than a generic height-balanced one.  Now let’s count the number of rows in the table, and look at the endpoint values in the histogram.  We’ll need those shortly.


SQL> col c new_value nvals
SQL> select count(*) c from t1;

         C
----------
     94322

1 row selected.

SQL> select endpoint_number,endpoint_actual_value
  2  from user_tab_histograms
  3  where table_name ='T1'
  4  and column_name ='OWNER';

ENDPOINT_NUMBER ENDPOINT_ACTUAL_VALUE
--------------- ---------------------
           3021 APEX_040200
           6213 APEX_050000
           6235 APEX_LISTENER
           6636 CTXSYS
           6691 DBSNMP
           6983 DVSYS
           7087 GSMADMIN_INTERNAL
           7121 HR
           7169 IX
           7406 LBACSYS
           7876 MCDONAC
           9523 MDSYS
           9616 OE
           9641 OLAPSYS
           9915 ORDDATA
          13072 ORDSYS
          13256 ORDS_METADATA
          50394 PUBLIC
          50433 SCOTT
          50733 SH
          50780 SOE
          92837 SYS
          93426 SYSTEM
          93797 WMSYS
          94200 XDB

25 rows selected.

Our first query now shows the benefit of this kind of histogram. When I query for a popular value (one in the histogram), I get very good cardinality estimation.  There are 403 XDB rows, and the execution plan got it spot on.


SQL> set autotrace on explain
SQL> select count(*) from t1
  2  where owner ='XDB';

  COUNT(*)
----------
       403

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   441   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   403 |  2418 |   441   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='XDB')


So what do we do with a nonpopular value? After all, it is not in the histogram.  The documentation tells us:

[Documentation excerpt: for a value not present in a top frequency histogram, the optimizer estimates cardinality as num_rows * density.]

So let’s give that a go, and see if the estimates align.


SQL> set autotrace off
SQL> select density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

   DENSITY
----------
.000005301

1 row selected.

SQL>
SQL>
SQL> select &&nvals*density
  2  from user_tab_col_statistics
  3  where table_name='T1'
  4  and column_name='OWNER';

94322*DENSITY
-------------
           .5

1 row selected.

So according to this, we would expect a cardinality estimate of 1 (i.e., 0.5 rounded up), but look what happens:



SQL> set autotrace on explain
SQL> select count(*) from t1
  2  where owner ='APEX_PUBLIC_USER';

  COUNT(*)
----------
         0

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     6 |   441   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     9 |    54 |   441   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='APEX_PUBLIC_USER')

SQL>
SQL>

The challenge now becomes: where does the “9” come from? The secret lies in the 10053 trace file.  When we look at this, some new information comes to light.



SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for T1[T1] 
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): 
    NewDensity:0.000099, OldDensity:0.000005 BktCnt:94208.000000, PopBktCnt:94208.000000, PopValCnt:25, NDV:38
  Column (#1): OWNER(VARCHAR2)
    AvgLen: 6 NDV: 38 Nulls: 0 Density: 0.000099
    Histogram: Top-Freq  #Bkts: 94208  UncompBkts: 94208  EndPtVals: 25  ActualVal: yes


Notice the presence of a value “NewDensity”. If we use that value instead of the “density” from user_tab_cols, then we arrive at our estimate of 9 rows, as per the explain plan. Since we are costing a nonpopular value, NewDensity is designed to more closely reflect the density of the nonpopular values, not of all the rows.

“NewDensity” appears to be: number of nonpopular rows / ( number of distinct nonpopular values * num_rows )
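
Plugging in the numbers from this example as a back-of-the-envelope check (using the figures from the trace file above):

-- nonpopular rows   = 94322 (num_rows) - 94208 (bucket count)    = 114
-- nonpopular values = 38 (NDV)         - 25 (endpoint values)    = 13
select (94322 - 94208) / ((38 - 25) * 94322) as new_density from dual;

That returns roughly 0.000093, in the same ballpark as the 0.000099 reported in the trace, and either way the cardinality works out to about 0.000099 * 94322 = 9.3, which rounds to the 9 rows we saw in the execution plan.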