Hyper-partitioned index avoidance thingamajig

As you can tell, I have no idea what to call the thing I am about to describe, so let me start from the beginning and set the scene for an idea I have for putting a cool new 18c feature to use.

Often in a transactional-style system the busiest table (let us call it SALES for the sake of this discussion) is also

  • the biggest table, after all, it has all of our sales in it,
  • the most in-demand table, in that almost every query in our application wants to access it in some way, shape or form.

This is in effect the database version of the Pareto Principle. Everyone wants a slice of that SALES “pie”, and the piece of that pie that is in most demand is typically the most recent data. Your application may have pages that will be showing:

  • sales in the past hour,
  • sales in the past day,
  • products sold in the past hour,
  • largest purchase amount for sales in the past few hours,
  • suspicious activity for today,
  • a customer's own sales over the last few days

The list goes on and on, the common theme being that the data being requested is bound by a range of time in the recent past. Our SALES table will already have a primary key, probably some sort of unique transaction ID for each purchase, but to satisfy the style of requests in the list above, we will probably also need (sketched as DDL just after the list):

  • an index on the time(stamp) of the sale,
  • an index on the product ID that was sold,
  • an index on the customer ID who made the purchase,
  • potentially even an index on the sale amount
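
In the conventional approach, that shopping list translates into DDL along these lines – a sketch only, with illustrative index names and the column names from the SALES table created later in this post:

create index sales_ts_ix       on sales ( ts );
create index sales_product_ix  on sales ( product );
create index sales_customer_ix on sales ( customer );
create index sales_amt_ix      on sales ( amt );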

You can see the troubled waters into which we are sailing here. It is our biggest and busiest table, and here we are, adding index after index after index to improve query performance, whilst at the same time:

  1. adding overhead/contention to DML operations on the SALES table,
  2. increasing the size of the database,
  3. increasing the duration of the backups,
  4. increasing time for maintenance and copies to Development and Test

None of this is looking great but we might be thinking “What choice do we have?”

Here is perhaps an alternative strategy: let’s not create any indexes at all. Challenges (1) through (4) above simply evaporate, but of course we now have a remaining challenge in getting those queries to run efficiently.

I’ll tackle that in a different way – I will partition the SALES table very “aggressively”, hence the “hyper-partitioning” in my blog post title. Considering the typical query requirements listed above, I will partition SALES as finely as one partition for every hour.


SQL> create table sales
  2    ( ts   timestamp,
  3      id   int,
  4      amt  number,
  5      product int,
  6      customer int,
  7      item_cnt int,
  8      terminal int,
  9      operator int,
 10      credit_card int,
         ...
         ...
 17    )
 18  partition by range ( ts )
 19  interval ( numtodsinterval(1,'HOUR') )
 20  (
 21    partition p1 values less than ( timestamp '2018-07-01 00:00:00' )
 22  );

Table created.

SQL>
SQL> insert /*+ APPEND */ into sales (ts,id,amt,product,customer)
  2  select date '2018-07-01' + rownum / 240, rownum, dbms_random.value(1,100),
  3        dbms_random.value(1,100),dbms_random.value(1,100)
  4  from dual
  5  connect by level <= 399;   -- upper bound reconstructed here; the original value was lost in formatting

SQL> set serverout on
SQL> declare
  2    h varchar2(1000);
  3  begin
  4  for i in (
  5    select partition_name, high_value
  6    from user_tab_partitions
  7    where table_name = 'SALES'
  8    and   interval = 'YES'
  9    and   partition_name like 'SYS_P%'
 10    order by partition_position
 11  ) loop
 12    h := i.high_value;
 13    execute immediate 'select to_char('||h||'-0.01,''yyyymmdd_hh24'') from dual' into h;
 14    execute immediate 'alter table sales rename partition '||i.partition_name||' to p'||h;
 15  end loop;
 16  end;
 17  /

PL/SQL procedure successfully completed.

SQL>
SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ ------------------------------------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701_00                   TIMESTAMP' 2018-07-01 01:00:00'
P20180701_01                   TIMESTAMP' 2018-07-01 02:00:00'
P20180701_02                   TIMESTAMP' 2018-07-01 03:00:00'
P20180701_03                   TIMESTAMP' 2018-07-01 04:00:00'
P20180701_04                   TIMESTAMP' 2018-07-01 05:00:00'
P20180701_05                   TIMESTAMP' 2018-07-01 06:00:00'
P20180701_06                   TIMESTAMP' 2018-07-01 07:00:00'
P20180701_07                   TIMESTAMP' 2018-07-01 08:00:00'
P20180701_08                   TIMESTAMP' 2018-07-01 09:00:00'
P20180701_09                   TIMESTAMP' 2018-07-01 10:00:00'
P20180701_10                   TIMESTAMP' 2018-07-01 11:00:00'
P20180701_11                   TIMESTAMP' 2018-07-01 12:00:00'
P20180701_12                   TIMESTAMP' 2018-07-01 13:00:00'
P20180701_13                   TIMESTAMP' 2018-07-01 14:00:00'
P20180701_14                   TIMESTAMP' 2018-07-01 15:00:00'
P20180701_15                   TIMESTAMP' 2018-07-01 16:00:00'
P20180701_16                   TIMESTAMP' 2018-07-01 17:00:00'
P20180701_17                   TIMESTAMP' 2018-07-01 18:00:00'
P20180701_18                   TIMESTAMP' 2018-07-01 19:00:00'
P20180701_19                   TIMESTAMP' 2018-07-01 20:00:00'
P20180701_20                   TIMESTAMP' 2018-07-01 21:00:00'
P20180701_21                   TIMESTAMP' 2018-07-01 22:00:00'
P20180701_22                   TIMESTAMP' 2018-07-01 23:00:00'
P20180701_23                   TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

41 rows selected.

I’ve run a small anonymous block to rename the (system-named) interval partitions to sensible names that reflect the date range each partition covers. Let’s now look at the typical queries we will be performing against the SALES table:


SQL> set autotrace traceonly explain
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     9 |   387 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     9 |   387 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and product = 12;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PRODUCT"=12 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select * from sales
  2  where ts > timestamp '2018-07-02 15:00:00'
  3  and customer = 25;

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |       |     1 |    43 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR|       |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
|*  2 |   TABLE ACCESS FULL      | SALES |     1 |    43 |     7   (0)| 00:00:01 |    41 |1048575|
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("CUSTOMER"=25 AND "TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL>
SQL> select max(amt) from sales
  2  where ts > timestamp '2018-07-02 15:00:00';

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    33 |     7   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    33 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
|*  3 |    TABLE ACCESS FULL      | SALES |     9 |   297 |     7   (0)| 00:00:01 |    41 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TS">TIMESTAMP' 2018-07-02 15:00:00.000000000')

SQL> set autotrace off

All of them scan only a tiny portion of the data, namely just the hours of sales relevant to the query, and the response times will be relatively consistent across the board no matter which customer, product or other predicate is passed, because the data to be scanned is always a fixed number of hours.

But there’s a problem here. If I am partitioning to the hour, or even to the minute…then it won’t be long before I have a lot of partitions. In the latter case (minutes) I will be over 500,000 partitions in the first year of SALES alone (365 days x 24 hours x 60 minutes is 525,600)! That is a lot of database metadata to store. There are the partitions themselves, plus optimizer statistics on them, plus historical optimizer statistics, plus potentially histograms on every column. All of that optimizer data might lead to expensive parse times, simply because there is so much information to wade through when optimizing queries.
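
To get a feel for how quickly that metadata accumulates, a few counts against the standard dictionary views can be revealing – a sketch only, but every partition contributes rows to each of these:

select count(*) from user_tab_partitions      where table_name = 'SALES';
select count(*) from user_part_col_statistics where table_name = 'SALES';
select count(*) from user_part_histograms     where table_name = 'SALES';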

But we only need the extreme granularity of partitions for the SALES table for today. Once today ticks over and becomes “yesterday”, then we might only need a partition for the entire day.  And once “yesterday” ticks over to “last week”, then maybe only weekly partitions are needed and so forth.

One of the cool things in 18c is the ability to do this style of maintenance with negligible disruption to service. Because I have named my partitions in a logical fashion, here is a simple routine to merge “yesterday’s” hourly partitions into a single partition for the day.


SQL> set serverout on
SQL> declare
  2    d date := date '2018-07-01';
  3    ddl varchar2(4000);
  4  begin
  5    select listagg(partition_name||chr(10),',') within group ( order by partition_position )
  6    into   ddl
  7    from   user_tab_partitions
  8    where  table_name = 'SALES'
  9    and    partition_name like 'P'||to_char(d,'yyyymmdd')||'%';
 10
 11    ddl := 'alter table sales merge partitions '||ddl||' into partition p'||to_char(d,'yyyymmdd')||' online';
 12
 13    dbms_output.put_line(ddl);
 14    execute immediate ddl;
 15  end;
 16  /
alter table sales merge partitions
 P20180701_00
,P20180701_01
,P20180701_02
,P20180701_03
,P20180701_04
,P20180701_05
,P20180701_06
,P20180701_07
,P20180701_08
,P20180701_09
,P20180701_10
,P20180701_11
,P20180701_12
,P20180701_13
,P20180701_14
,P20180701_15
,P20180701_16
,P20180701_17
,P20180701_18
,P20180701_19
,P20180701_20
,P20180701_21
,P20180701_22
,P20180701_23
 into partition p20180701 online

PL/SQL procedure successfully completed.

Now I have a single partition for yesterday’s data, and hourly partitions for today’s data.


SQL> col high_value format a60
SQL> select partition_name, high_value
  2  from user_tab_partitions
  3  where table_name = 'SALES'
  4  order by partition_position;

PARTITION_NAME                 HIGH_VALUE
------------------------------ -------------------------------
P1                             TIMESTAMP' 2018-07-01 00:00:00'
P20180701                      TIMESTAMP' 2018-07-02 00:00:00'
P20180702_00                   TIMESTAMP' 2018-07-02 01:00:00'
P20180702_01                   TIMESTAMP' 2018-07-02 02:00:00'
P20180702_02                   TIMESTAMP' 2018-07-02 03:00:00'
P20180702_03                   TIMESTAMP' 2018-07-02 04:00:00'
P20180702_04                   TIMESTAMP' 2018-07-02 05:00:00'
P20180702_05                   TIMESTAMP' 2018-07-02 06:00:00'
P20180702_06                   TIMESTAMP' 2018-07-02 07:00:00'
P20180702_07                   TIMESTAMP' 2018-07-02 08:00:00'
P20180702_08                   TIMESTAMP' 2018-07-02 09:00:00'
P20180702_09                   TIMESTAMP' 2018-07-02 10:00:00'
P20180702_10                   TIMESTAMP' 2018-07-02 11:00:00'
P20180702_11                   TIMESTAMP' 2018-07-02 12:00:00'
P20180702_12                   TIMESTAMP' 2018-07-02 13:00:00'
P20180702_13                   TIMESTAMP' 2018-07-02 14:00:00'
P20180702_14                   TIMESTAMP' 2018-07-02 15:00:00'
P20180702_15                   TIMESTAMP' 2018-07-02 16:00:00'

18 rows selected.
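
And following the same reasoning, once those daily partitions age out to “last week”, the identical technique can roll a week of daily partitions into a single weekly one. Here is a sketch along the lines of the daily merge above – it assumes the daily merge has already been run for each day of the (hypothetical) week being rolled up, and that the target weekly partition name does not already exist:

declare
  wk_start date := date '2018-07-01';   -- hypothetical week to roll up
  ddl      varchar2(4000);
begin
  -- collect the daily partitions (named PYYYYMMDD by the daily merge routine above)
  select listagg(partition_name,',') within group ( order by partition_position )
  into   ddl
  from   user_tab_partitions
  where  table_name = 'SALES'
  and    partition_name in (
           select 'P'||to_char(wk_start + level - 1,'yyyymmdd')
           from   dual connect by level <= 7 );

  ddl := 'alter table sales merge partitions '||ddl||
         ' into partition p'||to_char(wk_start,'yyyymmdd')||'_wk online';

  dbms_output.put_line(ddl);
  execute immediate ddl;
end;
/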

I stress – this is not a recommendation to race out, partition every transactional table you have and drop all of the indexes. But with so many online partitioning operations now available in 18c, there are some exciting new opportunities that were not available in previous releases. So start thinking about how you can exploit the partitioning option in ways that might sit “outside the box” of the standard usage.

18.3 As easy as 1…2…3

Well, finally it’s here! 18c is available for on-premises installation, so the whole world can get stuck into the cool new features of the latest release on their own laptops. At least that is what I’ll be doing!

Naturally as soon as I heard the news, I downloaded the software and got ready to set aside the day for installation and creation of an 18c database. But I didn’t need that long – I didn’t need that long at all. Just a few clicks and a few commands and there it was – my 18c database up and running.

Check out how easy it is with my three videos.

  • Software Installation
  • Listener Creation
  • Database Creation

It really is as easy as 1…2…3

Enjoy 18c !

More triggers are better

Yes, you heard me correctly. If you have one trigger on a table, then you might be surprised to find that having a second one can be a better option. Then again, I also love the sweet scent of a clickbaity, inflammatory blog post title to draw readers in, so you’ll just have to read on to see which is true.

As much as I am not a fan of using triggers to populate metadata in tables, I also concede that it is a popular approach taken by many of us as we code up our applications. Sure, we’d like to have our application code store that all-important “who did this change” information with each database row, but more often than not, this gets shoe-horned into the codebase via triggers as an afterthought to the development process.

But our well-read developer will often be thinking: “I’m a savvy developer – I’ve been reading the blogs and websites that tell me fewer triggers are better, so I’ll use as few as I can”, which leads to a scenario like the one I’ll present now.

I’ll start with a standard parent and child table setup, connected via an obvious foreign key


SQL> create table par (
  2    id           number(10)    not null primary key,
  3    description  varchar2(20)  not null,
  4    created_by   varchar2(20)  not null,
  5    updated_by   varchar2(20)  not null
  6    );

Table created.

SQL>
SQL> create sequence par_seq;

Sequence created.

SQL>
SQL> create sequence chd_seq;

Sequence created.

SQL>
SQL> create table chd (
  2    id           number(10)    not null primary key,
  3    par_id       number(10)    not null,
  4    constraint  fk foreign key ( par_id ) references par ( id )
  5    );

Table created.

SQL>
SQL>
SQL> create index chd_fk_idx on chd (par_id);

Index created.

I’ve got a sequence for each table, and in order to “bind” that sequence to the primary key of each table I’ll create a trigger. While I’m there, I’ll add some trigger code to implement some other common application requirements at the same time:

  • Populate the primary key with a sequence value,
  • Populate the CREATED_BY, UPDATED_BY columns on insert of a new row
  • Amend the UPDATED_BY column when I later update that row

Here is a trigger to implement that.


SQL> create or replace trigger trg_par
  2  before insert or update on par
  3  for each row
  4  begin
  5      if inserting then
  6         :new.id := par_seq.nextval;
  7         :new.created_by := user;
  8         :new.updated_by := user;
  9      end if;
 10
 11      if updating then
 12         :new.updated_by := user;
 13      end if;
 14  end;
 15  /

Trigger created.

Now my application is ready to go. I’ll insert some data to verify that my trigger has not broken any functionality.


SQL> insert into par (description) values ('test');

1 row created.

SQL> insert into par (description) values ('test2');

1 row created.

SQL> select * from par;

  ID DESCRIPTION          CREATED_BY           UPDATED_BY
---- -------------------- -------------------- --------------------
   1 test                 MCDONAC              MCDONAC
   2 test2                MCDONAC              MCDONAC

2 rows selected.

SQL>
SQL> insert into chd (id,par_id) values (chd_seq.nextval,1);

1 row created.

SQL> insert into chd (id,par_id) values (chd_seq.nextval,2);

1 row created.

SQL> select * from chd;

  ID     PAR_ID
---- ----------
   1          1
   2          2

2 rows selected.

SQL>
SQL> commit;

Commit complete.

So far so good. Let us now see what happens when I do a simple update of the DESCRIPTION column on the parent table. Note that this column is not involved in any primary key, index or foreign key relationship – it’s just a simple string column.


SQL> update par set description = 'anything' where id = 1;

1 row updated.

SQL> select locked_mode, object_name
  2  from v$locked_object l, all_objects ob
  3  where ob.object_id =l.object_id;

LOCKED_MODE OBJECT_NAME
----------- ----------------------------------------
          3 PAR
          3 CHD

2 rows selected.

Notice the locks that have been taken. Understandably, I had to take a lock on the PAR table because I’ve just updated a row in it, and I need to make sure that no-one drops it or modifies its structure whilst I have an outstanding transaction. But here is the interesting part – we also took a lock on the child table CHD. That seems superfluous because, as I said, we are not updating anything that could possibly impact the child table in any way.

Why did the database choose to add this extra lock?

The cause is the way I coded the trigger. Notice that the trigger body contains a reference to the primary key column:


  5      if inserting then
  6         :new.id := par_seq.nextval;

Even though that part of the trigger body is not executed (because I am performing an update, not an insert), its mere presence is enough for the database to cautiously add that extra lock on the CHD table, because we “just might” be messing around with the parent table’s primary key.

So now I’ll roll back that update and convert the single trigger into two triggers – one for insert and one dedicated to update.


SQL> rollback;

Rollback complete.

SQL>
SQL> drop trigger trg_par ;

Trigger dropped.

SQL>
SQL> create or replace trigger trg1_par
  2  before insert on par
  3  for each row
  4  begin
  5      :new.id := par_seq.nextval;
  6      :new.created_by := user;
  7      :new.updated_by := user;
  8  end;
  9  /

Trigger created.

SQL>
SQL> create or replace trigger trg2_par
  2  before update on par
  3  for each row
  4  begin
  5      if updating then
  6          :new.updated_by := user;
  7      end if;
  8  end;
  9  /

Trigger created.

Now I’ll run the same update


SQL> update par set description = 'anything' where id = 1;

1 row updated.

SQL> select locked_mode, object_name
  2  from v$locked_object l, all_objects ob
  3  where ob.object_id =l.object_id;

LOCKED_MODE OBJECT_NAME
----------- ----------------------------------------
          3 PAR

1 row selected.

Because the reference to the primary key column of table PAR is no longer in the firing (update) trigger, I no longer see the lock on the CHD table.

In practice, it is unlikely that this additional lock is going to cause you a lot of harm – it will only block operations on the child table such as a truncate or structural DDL, and it is reasonable to assume these are rare occurrences. But I’m a fan of the philosophy of: have as few locks as possible, but always as many as required. So in this case, you might want to consider opting for two triggers instead of one.

And of course, perhaps getting to zero triggers might be the best option.
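
If you do want to head down the zero-trigger path, one option (assuming Oracle 12c or later) is to let identity columns and column DEFAULT expressions do the insert-time work. A sketch is below – PAR2 is a hypothetical name to avoid clashing with the demo table, and keeping UPDATED_BY current on later updates would still need either the dedicated update trigger or application code:

-- sketch only: identity + defaults replace the insert trigger (assumes 12c or later)
create table par2 (
  id           number(10)    generated always as identity primary key,
  description  varchar2(20)  not null,
  created_by   varchar2(20)  default user not null,
  updated_by   varchar2(20)  default user not null
);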

DDL for constraints – subtle things

The DBMS_METADATA package is very cool. I remember the days of hand-crafting DDL statements based on queries to the data dictionary, and many a DBA will be familiar with running “imp show=y” or “imp indexfile=…” and then laboriously extracting the required DDL from the import log file. DBMS_METADATA removed all of those annoyances and gave us a simple API to get the true and complete DDL for a database object.

But when extracting DDL from the database using the DBMS_METADATA package, you need to be aware of some subtleties especially if you plan on executing that DDL in the database.

Consider this example – I have a few tables and I want to extract the referential integrity constraints for one of them. Being a good, cautious developer, I’ll just output the DDL first before attempting to execute any of the statements:


SQL> create table tab1(id number, name varchar2(100),
  2                      constraint pk_tab1_id primary key(id));

Table created.

SQL> create table tab2(id number, name varchar2(100),
  2                      constraint pk_tab2_id primary key(id));

Table created.

SQL> create table tab3(id number, name varchar2(100), int_id number,
  2                      constraint pk_tab3_id primary key(id),
  3                      constraint fk_tab1_id foreign key(int_id) references tab1(id),
  4                      constraint fk_tab2_id foreign key(int_id) references tab2(id));

Table created.

SQL>
SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4          dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line(i.ddl);
 10            --execute immediate i.ddl;
 11    end loop;
 12  end;
 13  /

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

That all looks fine – I have my two ALTER statements ready to go. So now I’ll comment the ‘execute immediate’ call back in, and everything should just work.


SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4             dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line(i.ddl);
 10            execute immediate i.ddl;
 11    end loop;
 12  end;
 13  /

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE
begin
*
ERROR at line 1:
ORA-01735: invalid ALTER TABLE option
ORA-06512: at line 10
ORA-06512: at line 10

That seems an odd result. Since in this simple example I’m just running the commands straight back into the same database, I might have expected a “constraint already exists” or “object name exists” style of error, but this is different. This error is telling us that the statement itself is invalid – which obviously should not be the case if it came straight out of DBMS_METADATA. But a simple amendment to my anonymous block will reveal the answer: I will output a separator line each time I cycle through the cursor loop.



SQL> set serverout on
SQL> begin
  2    for i in (
  3      select t.table_name,
  4             dbms_metadata.get_dependent_ddl('REF_CONSTRAINT', table_name) ddl
  5      from   user_tables t
  6      where  table_name = 'TAB3'
  7      )
  8    loop
  9            dbms_output.put_line('========================================');
 10            dbms_output.put_line(i.ddl);
 11            --execute immediate i.ddl;
 12    end loop;
 13  end;
 14  /
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE

ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

And therein lies the issue. I got two ALTER commands back on screen, but in reality they both came back from a single row fetched from the cursor. The ALTER commands were separated by a carriage return, but if I try to execute that single row, the statement is invalid because it is an attempt to run two ALTER commands in a single statement. The output looked like two rows from the cursor, but it was not.

That is a problem if I want to store that DDL in a table or a file, because I can’t use it as it currently stands, and I don’t want to have to write some scripts to parse that DDL to add semi-colons or split it into multiple commands, because one of the motivations for DBMS_METADATA in the first place was to avoid all that irritation.

There is an easy fix to this. Rather than getting the dependent DDL for a table, we can get the “direct” DDL for the constraints themselves.  In this way, you’ll get a row from the cursor for each constraint, and hence one DDL statement per constraint as well.



SQL> set serverout on
SQL> begin
  2    for i in (
  3         select t.table_name,
  4                dbms_metadata.get_ddl('REF_CONSTRAINT', constraint_name) ddl
  5         from user_constraints t
  6         where table_name = 'TAB3'
  7         and constraint_type = 'R'
  8      )
  9    loop
 10            dbms_output.put_line('========================================');
 11            dbms_output.put_line(i.ddl);
 12            --execute immediate i.ddl;
 13    end loop;
 14  end;
 15  /
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB1_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB1" ("ID") ENABLE
========================================

  ALTER TABLE "MCDONAC"."TAB3" ADD CONSTRAINT "FK_TAB2_ID" FOREIGN KEY ("INT_ID") REFERENCES "MCDONAC"."TAB2" ("ID") ENABLE

PL/SQL procedure successfully completed.

Text indexes for numbers

We had an AskTOM question recently about being able to search for numbers within a concatenated list. The particular issue was a list of mobile phone numbers, but this is really just about tackling the larger issue of finding numbers within a list.

Let’s create a simple example to see where things can break down (and how we can fix them).


SQL> create table t (  cid int, mobs varchar2(1000));

Table created.

SQL>
SQL> insert into t
  2  select c, listagg(num,',') within group ( order by num )
  3  from
  4  ( select mod(rownum,1000) c, trunc(dbms_random.value(1000000,9999999)) num from dual connect by level <= 4000 )
  5  group by c;

1000 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where rownum <= 10;

       CID MOBS
---------- --------------------------------------------------
         0 2644307,3565512,5481105,7725189
         1 1570287,2092729,6127058,6546683
         2 6018800,6408347,6592531,8456137
         3 2087673,3086382,6692756,9377699
         4 2964558,3887606,6305557,7441515
         5 2219544,4331436,5246494,5303583
         6 1005450,1625403,2271986,4493049
         7 2605217,5143371,7444316,9073658
         8 1205487,4660509,5148296,9578099
         9 3736741,8385346,8758352,9496363

10 rows selected.

My sample table has 1000 rows, and there are four “mobile phone” numbers concatenated into a list in each row.

(Side note: I’m not using the 555-prefixed numbers you’ll see in the movies – see “Why phone numbers in movies start with 555”.)

Now let us try to query the table for one of the phone numbers from the listing above (7444316).


SQL>
SQL> select * from t where mobs like '%7444316%';

       CID MOBS
---------- ----------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

SQL> set autotrace traceonly explain
SQL> select * from t where mobs like '%7444316%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1800 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    50 |  1800 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MOBS" LIKE '%7444316%' AND "MOBS" IS NOT NULL)

I got the correct row back from the query, but the full scan of the table might be an issue as this table grows to larger sizes. (I’m working on the assumption here that a search for a single number will never return a large number of rows).

Since this is a “term that could be anywhere within the string” style of search, creating a Text index on the column seems an obvious choice. So I’ll do that and try again:


SQL> set autotrace off
SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context;

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

That is looking a lot better. I’ll be able to take advantage of this index….or so it might first appear! (Cue ominous sounding music…)

There is an ever so slight problem here – we don’t get the rows we need!


SQL> set autotrace off
SQL> select * from t where contains(mobs,'7444316') > 0;

no rows selected

In fact, even if we concocted a search for the value of the entire column, we still do not get a result back from our query.



SQL> select * from t where contains(mobs,'2605217,5143371,7444316,9073658') > 0;

no rows selected

You need to be careful when indexing terms that are not what could be thought of as “common” text. After all, it is a Text index, and by default the assumption is that we are indexing document-style text.

But the solution is easy – we just need to adjust the standard lexer so that common numeric separators (comma, period, etc.) no longer perturb our index entries. I’ll set the NUMGROUP attribute to a tilde (~), because that character does not appear in the source column.



SQL>
SQL> drop index ix;

Index dropped.

SQL> begin
  2        ctx_ddl.drop_preference('my_lexer');
  3        ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  4        ctx_ddl.set_attribute('my_lexer', 'numgroup', '~');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context PARAMETERS('lexer my_lexer');

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

SQL>
SQL> set autotrace off
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

       CID MOBS
---------- --------------------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

Our execution plan still shows that we can take advantage of the Text index, but I’m also getting the results I expected.

As an aside, Text indexes are one of those gems in the Oracle database that often gets a great amount of new functionality with each new release. So every time you upgrade, take a fresh look at the Text Index documentation. You might get a lot of cool ideas for how to use them in your applications.

Cool stuff with partition elimination

Sometimes in the IT world, the term “surprise” is not a good one.

“I woke up this morning and got a surprise…my database was down.”

“I ran a SELECT COUNT(*) on my most important table, and got a surprise result of zero rows.”

and so forth. Generally, for us IT professionals, encountering the unexpected is not a good start to the day.

But sometimes, surprises can be a nice thing. Here is my example for today, where I found that the database can do a remarkably good job of reducing the workload when querying partitioned tables.

Most people will already be familiar with the concept of partition elimination. If you have a table partitioned into (say) yearly segments, and you ask for all of the data for the past 2 years, then the optimizer is intelligent enough to scan only the relevant partitions rather than the entire table. Here is an example of that in action. I have a table that is partitioned by year on an END_DT column, and then sub-partitioned into quarters based on a START_DT column.


SQL> create table t
  2      partition by range( end_dt )
  3      subpartition by range( start_dt )
  4      (
  5         partition p_2014 values less than ( to_date('01-Jan-2015','dd-mon-yyyy') )
  6                 (
  7                         subpartition sp_2014_q1 values less than ( to_date('01-apr-2014','dd-mon-yyyy') ) ,
  8                         subpartition sp_2014_q2 values less than ( to_date('01-jul-2014','dd-mon-yyyy') ) ,
  9                         subpartition sp_2014_q3 values less than ( to_date('01-oct-2014','dd-mon-yyyy') ) ,
 10                         subpartition sp_2014_q4 values less than ( to_date('01-jan-2015','dd-mon-yyyy') )
 11                ) ,
 12        partition p_2015 values less than ( to_date('01-Jan-2016','dd-mon-yyyy') )
 13                (
 14                        subpartition sp_2015_q1 values less than ( to_date('01-apr-2015','dd-mon-yyyy') ) ,
 15                        subpartition sp_2015_q2 values less than ( to_date('01-jul-2015','dd-mon-yyyy') ) ,
 16                        subpartition sp_2015_q3 values less than ( to_date('01-oct-2015','dd-mon-yyyy') ) ,
 17                        subpartition sp_2015_q4 values less than ( to_date('01-jan-2016','dd-mon-yyyy') )
 18                )
 19     )
 20     as
 21     select a.* ,
 22        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730)-1 as start_dt,
 23        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730) as end_dt
 24     from all_objects a;

Table created.

SQL>
SQL> alter table t modify start_dt not null;

Table altered.

SQL> alter table t modify end_dt not null;

Table altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183203583

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL    |      |   101 | 15049 |   491   (1)| 00:00:01 |     1 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


You can see that we had to scan all of the partitions, but within each of the partitions we only had to scan a single sub-partition (as indicated by “PARTITION RANGE SINGLE”).  So for each year (based on END_DT) we scanned a single one of the 4 subpartitions.  We are doing only 25% of the work of scanning the entire table.  But one pleasant surprise I saw today was how the optimizer can take advantage of additional information to improve things even more.  Let us now add a fairly obvious rule about the data:


SQL>
SQL> alter table t add constraint t_chk check( start_dt < end_dt );

Table altered.


And have another look at that query execution plan.


SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3874588989

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   247   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE |      |   101 | 15049 |   247   (1)| 00:00:01 |     2 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   247   (1)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   247   (1)| 00:00:01 |     5 |     5 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "END_DT">TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL>

How cool is that! The constraint let us remove even more partitions from consideration – in fact, we ended up scanning just a single subpartition. Moreover, we never specified END_DT as a predicate, but as you can see in the Predicate Information section, the optimizer could synthesize such a predicate using the rule defined by the check constraint we added.

I suppose the moral of the story is two-fold here.

1) The optimizer can be pretty smart with partition elimination,

2) As always, never hide rules and facts about the data from the database.  Make sure you define those constraints in the database tier.

Choose your names wisely

It all seems simple enough. You pick a name for an object…and that is the only object that can have that name, right? For example:


SQL> create table MY_UNIQUE_NAME ( x int );

Table created.

SQL>
SQL> create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME;
create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME
*
ERROR at line 1:
ORA-01471: cannot create a synonym with same name as object


SQL>
SQL> create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME;
create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL>
SQL> create package MY_UNIQUE_NAME is
  2    x int;
  3  end;
  4  /
create package MY_UNIQUE_NAME is
*
ERROR at line 1:
ORA-00955: name is already used by an existing object


So naturally we’d expect an error if we continued on with other object types. But then this happens:


SQL> create or replace
  2  trigger MY_UNIQUE_NAME
  3  before insert on MY_UNIQUE_NAME
  4  begin
  5    null;
  6  end;
  7  /

Trigger created.

SQL>
SQL> create index MY_UNIQUE_NAME on MY_UNIQUE_NAME ( x );

Index created.

It might all seem a bit random, but it is all related to what is known as the NAMESPACE – names of certain object types must be unique within a particular namespace. We can see the namespace for each object type by querying DBA_OBJECTS:



SQL> select distinct namespace, object_type from dba_Objects order by 1,2;

 NAMESPACE OBJECT_TYPE
---------- -----------------------
         1 CREDENTIAL
         1 DESTINATION
         1 FUNCTION
         1 INDEXTYPE
         1 JAVA CLASS
         1 JOB
         1 JOB CLASS
         1 LIBRARY
         1 OPERATOR
         1 PACKAGE
         1 PROCEDURE
         1 PROGRAM
         1 SCHEDULE
         1 SCHEDULER GROUP
         1 SEQUENCE
         1 SYNONYM
         1 TABLE
         1 TABLE PARTITION
         1 TABLE SUBPARTITION
         1 TYPE
         1 VIEW
         1 WINDOW
         2 PACKAGE BODY
         2 TYPE BODY
         3 TRIGGER
         4 INDEX
         4 INDEX PARTITION
         5 CLUSTER
         8 LOB
         8 LOB PARTITION
         9 DIRECTORY
        10 QUEUE
        13 JAVA SOURCE
        14 JAVA RESOURCE
        19 MATERIALIZED VIEW
        21 CONTEXT
        23 RULE SET
        24 CONSUMER GROUP
        24 RESOURCE PLAN
        25 XML SCHEMA
        32 JAVA DATA
        36 RULE
        38 EVALUATION CONTEXT
        51 UNDEFINED
        52 UNDEFINED
        64 EDITION
        88 CONTAINER
        93 UNIFIED AUDIT POLICY
       132 LOCKDOWN PROFILE
           DATABASE LINK

50 rows selected.



Because indexes and triggers are in different namespaces from tables, synonyms, procedures, packages and so on, they can share the same name.
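
To see that in action with the demo objects above, the NAMESPACE column is also available in USER_OBJECTS, so a quick query like this (a sketch) should show the table, the trigger and the index each sitting in a different namespace:

select object_name, object_type, namespace
from   user_objects
where  object_name = 'MY_UNIQUE_NAME'
order  by namespace;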

I’d still probably recommend unique names across all namespaces, just so there is no confusion.
