Hybrid histograms

Just a quick post here so I could consolidate some information about histograms in 12c.

On my last Office Hours session, one of the questions posed asked for an explanation of the new types of histograms in use in 12c. So I had a few slides and described, at a high conceptual level, how they work and what they represent. If you’re new to 12c, or new to optimizer histograms in general, take a look at the video below to get up to speed. But for those who want to dig to a deeper level, I also recently saw two absolutely cracking articles which discuss the algorithms and implementation in much more detail, so I’ve linked them here as well.

http://www.oracle.com/technetwork/articles/database/histogram-construction-oracle-12c-4426845.html

http://www.oracle.com/technetwork/articles/database/maintenance-histograms-db-12c-4426850.html
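
If you want to see which flavour of histogram the optimizer has chosen for your own columns, the data dictionary will tell you directly. Here is a minimal sketch; the table name is just a placeholder for one of your own:

--
-- Which histogram (NONE, FREQUENCY, TOP-FREQUENCY, HYBRID, HEIGHT BALANCED)
-- did DBMS_STATS build for each column?  'MY_TABLE' is a placeholder.
--
select column_name, num_distinct, num_buckets, histogram
from   user_tab_col_statistics
where  table_name = 'MY_TABLE'
order  by column_name;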

On that note…if you celebrate Easter – have a safe and happy break. See you on the other side of the weekend!


Trip down memory lane

I did a little video for St Patrick’s Day, but it also brought back memories of my first computer experiences: a Sinclair ZX80, a Commodore 64, and many other machines that I thought were so cool for their time.

Feel free to share your experiences in comments.

Text indexes for numbers

We had an AskTOM question recently about being able to search for numbers within a concatenated list. The particular issue was a list of mobile phone numbers, but this is really just about tackling the larger issue of finding numbers within a list.

Let’s create a simple example to see where things can break down (and how we can fix them).


SQL> create table t (  cid int, mobs varchar2(1000));

Table created.

SQL>
SQL> insert into t
  2  select c, listagg(num,',') within group ( order by num )
  3  from
  4  ( select mod(rownum,1000) c, trunc(dbms_random.value(1000000,9999999)) num from dual connect by level <= 4000 )
  5  group by c;

1000 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where rownum <= 10;

       CID MOBS
---------- --------------------------------------------------
         0 2644307,3565512,5481105,7725189
         1 1570287,2092729,6127058,6546683
         2 6018800,6408347,6592531,8456137
         3 2087673,3086382,6692756,9377699
         4 2964558,3887606,6305557,7441515
         5 2219544,4331436,5246494,5303583
         6 1005450,1625403,2271986,4493049
         7 2605217,5143371,7444316,9073658
         8 1205487,4660509,5148296,9578099
         9 3736741,8385346,8758352,9496363

10 rows selected.

My sample table has 1000 rows, and there are four “mobile phone” numbers concatenated into a list on each row.

(Side note: I’m not using 555-prefixed numbers like you’ll see in the movies. For the reason behind that convention, see Why phone numbers in movies start with 555.)

Now let us query the table for one of the phone numbers in the list above (7444316, from the row with CID 7).


SQL>
SQL> select * from t where mobs like '%7444316%';

       CID MOBS
---------- ----------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

SQL> set autotrace traceonly explain
SQL> select * from t where mobs like '%7444316%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1800 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    50 |  1800 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MOBS" LIKE '%7444316%' AND "MOBS" IS NOT NULL)

I got the correct row back from the query, but the full scan of the table might be an issue as this table grows to larger sizes. (I’m working on the assumption here that a search for a single number will never return a large number of rows).

Since this is a “term that could be anywhere within the string” style of search, creating a Text index on the column seems an obvious choice. So I’ll do that and try again:


SQL> set autotrace off
SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context;

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

That is looking a lot better. I’ll be able to take advantage of this index… or so it might first appear! (Cue ominous-sounding music…)

There is an ever so slight problem here – we don’t get the rows we need!


SQL> set autotrace off
SQL> select * from t where contains(mobs,'7444316') > 0;

no rows selected

In fact, even if we concocted a search for the value of the entire column, we still do not get a result back from our query.



SQL> select * from t where contains(mobs,'2605217,5143371,7444316,9073658') > 0;

no rows selected

You need to be careful when indexing terms that are not what could be thought of as “common” text. After all, it is a Text index, and by default the assumption is that we are indexing document-style text.

But the solution is easy. By default, the lexer treats the comma as a numeric group separator (think thousands separators), so our whole comma-separated list gets treated as a single token. We just need to point the standard lexer at a separator character that never appears in our data, so the commas no longer perturb our index entries. I’m opting for a tilde (~) here because it does not appear in the source column.



SQL>
SQL> drop index ix;

Index dropped.

SQL> begin
  2        ctx_ddl.drop_preference('my_lexer');
  3        ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  4        ctx_ddl.set_attribute('my_lexer', 'numgroup', '~');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context PARAMETERS('lexer my_lexer');

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

SQL>
SQL> set autotrace off
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

       CID MOBS
---------- --------------------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

Our execution plan still shows that we can take advantage of the Text index, and now I’m also getting the results I expected.
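
If you are curious about what changed under the covers, you can peek at the token table that Oracle Text maintains for the index. This is just a sketch, assuming the index is called IX as above, which makes its token table DR$IX$I:

--
-- After the lexer change, each phone number should appear as its own token
--
select token_text
from   dr$ix$i
where  rownum <= 10;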

As an aside, Text indexes are one of those gems in the Oracle database that often gets a great amount of new functionality with each new release. So every time you upgrade, take a fresh look at the Text Index documentation. You might get a lot of cool ideas for how to use them in your applications.

Cool stuff with partition elimination

Sometimes in the IT world, the term “surprise” is not a good one.

“I woke up this morning and got a surprise…my database was down.”

“I ran a SELECT COUNT(*) on my most important table, and got a surprise result of zero rows.”

and so forth. Generally, for us IT professionals, encountering the unexpected is not a good start to the day.

But sometimes, surprises can be a nice thing. Here is my example for today, where I found that the database can do a remarkably good job of reducing the workload on partitioned tables.

Most people will already be familiar with the concept of partition elimination. If you have a table partitioned into (say) yearly segments, and you ask for all of the data for the past 2 years, then the optimizer is intelligent enough to only scan the relevant partitions rather than the entire table. Here is an example of that in action. I have a table that is partitioned by year on an END_DATE column, and then sub-partitioned into quarters based on a START_DATE column.


SQL> create table t
  2      partition by range( end_dt )
  3      subpartition by range( start_dt )
  4      (
  5         partition p_2014 values less than ( to_date('01-Jan-2015','dd-mon-yyyy') )
  6                 (
  7                         subpartition sp_2014_q1 values less than ( to_date('01-apr-2014','dd-mon-yyyy') ) ,
  8                         subpartition sp_2014_q2 values less than ( to_date('01-jul-2014','dd-mon-yyyy') ) ,
  9                         subpartition sp_2014_q3 values less than ( to_date('01-oct-2014','dd-mon-yyyy') ) ,
 10                         subpartition sp_2014_q4 values less than ( to_date('01-jan-2015','dd-mon-yyyy') )
 11                ) ,
 12        partition p_2015 values less than ( to_date('01-Jan-2016','dd-mon-yyyy') )
 13                (
 14                        subpartition sp_2015_q1 values less than ( to_date('01-apr-2015','dd-mon-yyyy') ) ,
 15                        subpartition sp_2015_q2 values less than ( to_date('01-jul-2015','dd-mon-yyyy') ) ,
 16                        subpartition sp_2015_q3 values less than ( to_date('01-oct-2015','dd-mon-yyyy') ) ,
 17                        subpartition sp_2015_q4 values less than ( to_date('01-jan-2016','dd-mon-yyyy') )
 18                )
 19     )
 20     as
 21     select a.* ,
 22        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730)-1 as start_dt,
 23        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730) as end_dt
 24     from all_objects a;

Table created.

SQL>
SQL> alter table t modify start_dt not null;

Table altered.

SQL> alter table t modify end_dt not null;

Table altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183203583

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL    |      |   101 | 15049 |   491   (1)| 00:00:01 |     1 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


You can see that we had to scan all of the partitions, but within each of the partitions we only had to scan a single sub-partition (as indicated by “PARTITION RANGE SINGLE”).  So for each year (based on END_DT) we scanned a single one of the 4 subpartitions.  We are doing only 25% of the work of scanning the entire table.  But one pleasant surprise I saw today was how the optimizer can take advantage of additional information to improve things even more.  Let us now add a fairly obvious rule about the data:


SQL>
SQL> alter table t add constraint t_chk check( start_dt < end_dt );

Table altered.


And have another look at that query execution plan.


SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3874588989

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   247   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE |      |   101 | 15049 |   247   (1)| 00:00:01 |     2 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   247   (1)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   247   (1)| 00:00:01 |     5 |     5 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "END_DT">TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL>

How cool is that! The constraint let us remove even more partitions from consideration; in fact, we ended up scanning just a single subpartition of a single partition. Moreover, we never specified END_DT as a predicate, but you can see in the filter predicates that the optimizer synthesized one using the rule defined by the check constraint we added.

I suppose the moral of the story is two-fold here.

1) The optimizer can be pretty smart with partition elimination,

2) As always, never hide rules and facts about the data from the database.  Make sure you define those constraints in the database tier.
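
And if adding such a constraint is not an option in your environment, you can still hand the optimizer the same information by spelling out the rule in the query itself. A sketch against the table T from above:

--
-- We know an end date is always after its start date, so supplying the
-- END_DT predicate manually gives the optimizer the same pruning
-- opportunity that the check constraint provided.
--
select *
from   t
where  start_dt = to_date('13-Jan-2015','dd-mon-yyyy')
and    end_dt   > to_date('13-Jan-2015','dd-mon-yyyy');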

ROWNUM and ORDER BY

Just a quick revisit on an old topic so I could link to this post in a Stack Overflow discussion.



SQL> create table t ( x int );

Table created.

SQL>
SQL> insert into t
  2  select dbms_random.value(1,1000)
  3  from dual
  4  connect by level <= 100;

100 rows created.

SQL>
SQL> select count(*) from t;

  COUNT(*)
----------
       100

1 row selected.

--
-- DOES NOT WORK
--
SQL>
SQL> select * from t
  2  where rownum <= 10
  3  order by x desc;

         X
----------
       995
       909
       873
       871
       640
       555
       318
       197
       134
        70

10 rows selected.

--
-- CORRECT ALTERNATIVES
--

SQL>
SQL> select *
  2  from
  3    ( select * from t order by x desc )
  4  where rownum <= 10;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

SQL> select * from t
  2  order by x desc
  3  fetch first 10 rows only;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

SQL>
SQL> select x
  2  from
  3    ( select t.*,
  4         row_number() over ( order by x desc ) r
  5      from t  )
  6  where r <= 10;

         X
----------
      1000
       995
       991
       977
       977
       971
       955
       909
       909
       900

10 rows selected.

As you can see, you cannot simply put WHERE ROWNUM and ORDER BY in the same SELECT statement if you are after the “top” rows based on some criteria. ROWNUM is assigned to the rows before the ORDER BY is applied, so the first query grabs an arbitrary 10 rows and only then sorts them.
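
As a related aside, the FETCH FIRST syntax shown above also extends naturally to pagination. A sketch against the same table T:

--
-- Skip the top 10 values and return the next 10
--
select *
from   t
order  by x desc
offset 10 rows fetch next 10 rows only;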

Choose your names wisely

It all seems simple enough. You pick a name for an object… and that is the only object that can have that name, right? For example:


SQL> create table MY_UNIQUE_NAME ( x int );

Table created.

SQL>
SQL> create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME;
create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME
*
ERROR at line 1:
ORA-01471: cannot create a synonym with same name as object


SQL>
SQL> create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME;
create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL>
SQL> create package MY_UNIQUE_NAME is
  2    x int;
  3  end;
  4  /
create package MY_UNIQUE_NAME is
*
ERROR at line 1:
ORA-00955: name is already used by an existing object


So naturally we’d expect an error if we continued on with other object types. But then this happens:


SQL> create or replace
  2  trigger MY_UNIQUE_NAME
  3  before insert on MY_UNIQUE_NAME
  4  begin
  5    null;
  6  end;
  7  /

Trigger created.

SQL>
SQL> create index MY_UNIQUE_NAME on MY_UNIQUE_NAME ( x );

Index created.

It might all seem a bit random, but this is all related to what is known as the NAMESPACE: names of certain objects must be unique within a particular namespace. We can see the namespace for each object type by querying DBA_OBJECTS:



SQL> select distinct namespace, object_type from dba_Objects order by 1,2;

 NAMESPACE OBJECT_TYPE
---------- -----------------------
         1 CREDENTIAL
         1 DESTINATION
         1 FUNCTION
         1 INDEXTYPE
         1 JAVA CLASS
         1 JOB
         1 JOB CLASS
         1 LIBRARY
         1 OPERATOR
         1 PACKAGE
         1 PROCEDURE
         1 PROGRAM
         1 SCHEDULE
         1 SCHEDULER GROUP
         1 SEQUENCE
         1 SYNONYM
         1 TABLE
         1 TABLE PARTITION
         1 TABLE SUBPARTITION
         1 TYPE
         1 VIEW
         1 WINDOW
         2 PACKAGE BODY
         2 TYPE BODY
         3 TRIGGER
         4 INDEX
         4 INDEX PARTITION
         5 CLUSTER
         8 LOB
         8 LOB PARTITION
         9 DIRECTORY
        10 QUEUE
        13 JAVA SOURCE
        14 JAVA RESOURCE
        19 MATERIALIZED VIEW
        21 CONTEXT
        23 RULE SET
        24 CONSUMER GROUP
        24 RESOURCE PLAN
        25 XML SCHEMA
        32 JAVA DATA
        36 RULE
        38 EVALUATION CONTEXT
        51 UNDEFINED
        52 UNDEFINED
        64 EDITION
        88 CONTAINER
        93 UNIFIED AUDIT POLICY
       132 LOCKDOWN PROFILE
           DATABASE LINK

50 rows selected.



Because indexes and triggers are in a different namespace to tables, synonyms, procedures, packages, etc., they can share the same name.

I’d still probably recommend unique names across all namespaces, just so there is no confusion.
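
If you want to check whether you already have such shared names lurking in your schema, a query along these lines will find them (a sketch using USER_OBJECTS):

--
-- Object names used by more than one object type in the current schema
--
select object_name,
       listagg(object_type, ', ') within group (order by object_type) as object_types
from   user_objects
group  by object_name
having count(distinct object_type) > 1;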
