Text indexes for numbers

We had an AskTOM question recently about being able to search for numbers within a concatenated list. The particular issue was a list of mobile phone numbers, but this is really just about tackling the larger issue of finding numbers within a list.

Let’s create a simple example to see where things can break down (and how we can fix them).


SQL> create table t (  cid int, mobs varchar2(1000));

Table created.

SQL>
SQL> insert into t
  2  select c, listagg(num,',') within group ( order by num )
  3  from
  4  ( select mod(rownum,1000) c, trunc(dbms_random.value(1000000,9999999)) num from dual connect by level <= 4000 )
  5  group by c;

1000 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where rownum <= 10;

       CID MOBS
---------- --------------------------------------------------
         0 2644307,3565512,5481105,7725189
         1 1570287,2092729,6127058,6546683
         2 6018800,6408347,6592531,8456137
         3 2087673,3086382,6692756,9377699
         4 2964558,3887606,6305557,7441515
         5 2219544,4331436,5246494,5303583
         6 1005450,1625403,2271986,4493049
         7 2605217,5143371,7444316,9073658
         8 1205487,4660509,5148296,9578099
         9 3736741,8385346,8758352,9496363

10 rows selected.

My sample table has 1000 rows, and there are four “mobile phone” numbers concatenated into a list in each row.

(Side note: I’m not using 555-prefixed numbers like you’ll see in the movies – see “Why phone numbers in movies start with 555”.)

Now let us try to query the table for one of the phone numbers shown in the previous list.


SQL>
SQL> select * from t where mobs like '%7444316%';

       CID MOBS
---------- ----------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

SQL> set autotrace traceonly explain
SQL> select * from t where mobs like '%7444316%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1800 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    50 |  1800 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MOBS" LIKE '%7444316%' AND "MOBS" IS NOT NULL)

I got the correct row back from the query, but the full scan of the table might be an issue as this table grows to larger sizes. (I’m working on the assumption here that a search for a single number will never return a large number of rows).

Since this is a “term that could be anywhere within the string” style of search, creating a Text index on the column seems an obvious choice. So I’ll do that and try again:


SQL> set autotrace off
SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context;

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

That is looking a lot better. I’ll be able to take advantage of this index… or so it might first appear! (Cue ominous-sounding music…)

There is an ever so slight problem here – we don’t get the rows we need!


SQL> set autotrace off
SQL> select * from t where contains(mobs,'7444316') > 0;

no rows selected

In fact, even if we concocted a search for the value of the entire column, we still do not get a result back from our query.



SQL> select * from t where contains(mobs,'2605217,5143371,7444316,9073658') > 0;

no rows selected

You need to be careful with indexing terms that are not what could be thought of as “common” text. After all, it is a Text index, and by default the assumption is that we are indexing document-style text.

But the solution is easy – we just need to adjust the standard lexer so that common numeric separators (comma, period, etc) do not perturb our index entries. I’m opting for a tilde (~) here because it does not appear in the source column.



SQL>
SQL> drop index ix;

Index dropped.

SQL> begin
  2        ctx_ddl.drop_preference('my_lexer');
  3        ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  4        ctx_ddl.set_attribute('my_lexer', 'numgroup', '~');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context PARAMETERS('lexer my_lexer');

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

SQL>
SQL> set autotrace off
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

       CID MOBS
---------- --------------------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

Our execution plan still shows that we can take advantage of the Text index, but I’m also getting the results I expected.
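If you’re curious about what the lexer actually stored, you can peek at the token table that Oracle Text maintains behind the scenes (a quick sketch, assuming the index name IX from above – the token table is named DR$&lt;index_name&gt;$I):

```sql
-- Each indexed term appears as a row in the $I token table
select token_text
from   dr$ix$i
where  rownum <= 5;
```

With the modified lexer in place, you should see the individual seven-digit numbers as separate tokens, rather than entire comma-separated strings.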

As an aside, Text indexes are one of those gems in the Oracle database that often get a great deal of new functionality with each release. So every time you upgrade, take a fresh look at the Text index documentation. You might get a lot of cool ideas for how to use them in your applications.

Updating indexes with partition maintenance

An index is basically a structure that maps keys (values) in columns to the physical location of their corresponding rows in a table.  So if you move the rows (i.e., change the physical location of a row), then the index entries for those rows need to be updated, or the index is no longer usable.  And as most people are aware, the latter is the default when you perform a partition maintenance operation on a table that re-locates rows.  For example, we’ll create a simple partitioned table, add both a local and a global index, split one of the partitions, and then see what the impact on those indexes is.


SQL>
SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (x int, y int, z int )
  2  partition by range (x)
  3  (
  4    partition p1 values less than (4000),
  5    partition p2 values less than (8000)
  6  );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );
SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2    into ( partition p2a, partition p2b ) ;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1                   USABLE
IX_LOCAL                       P2A                  UNUSABLE
IX_LOCAL                       P2B                  UNUSABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      UNUSABLE
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL>
SQL>

The result is what we would expect.  The global index has become unusable because a portion of the rows that had entries in the index have been moved as part of the partition split operation.  The local index has also become unusable, but only for those partitions that were impacted by the split.  Partition P1 was not affected by the split, and hence the corresponding local index partition is still valid.  Just a quick footnote – the “N/A” in USER_INDEXES for the local index indicates that the STATUS of a local index is determined by looking at the status of each underlying partition.  In early releases of Oracle this could create availability problems for partitioned tables, because if you tried to use an index that had become unusable, you would get an error – ORA-01502: index or partition of such index is in unusable state – which is not the most pleasant thing to be sending back to your application users.

In Oracle 9, a workaround to the problem was introduced: the UPDATE GLOBAL INDEXES clause.  This keeps the (global) index entries valid by correcting them as the partition operation is performed.  This has some cost in terms of resource consumption, because maintaining a large number of index entries will consume CPU and redo, but it keeps the index available to application users.  Here’s a demo of that in action:


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (x int, y int, z int )
  2  partition by range (x)
  3  (
  4    partition p1 values less than (4000),
  5    partition p2 values less than (8000)
  6  );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );
SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2    into ( partition p2a, partition p2b )
  3    update global indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1                   USABLE
IX_LOCAL                       P2A                  UNUSABLE
IX_LOCAL                       P2B                  UNUSABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL> alter index ix_local rebuild partition P2A;

Index altered.

SQL> alter index ix_local rebuild partition P2B;

Index altered.

But notice that the local index has not been spared from being marked unusable.  We had to rebuild each local index partition after the operation.  You might be thinking that having only UPDATE GLOBAL INDEXES was hence a waste of time, but don’t forget that some partition operations, such as DROP and TRUNCATE, do not impact local indexes, so updating the global index entries is all that is required:


SQL> alter table t drop partition p2a update global indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1                   USABLE
IX_LOCAL                       P2B                  USABLE

2 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL>

But if you do have existing scripts that contain the UPDATE GLOBAL INDEXES clause, it is perhaps worth revisiting them to take advantage of the more complete implementation of this facility, which arrived in Oracle 11.  The UPDATE INDEXES clause takes care of both global and local indexes during partition maintenance operations.


SQL>
SQL> alter table t split partition p1 at (2000)
  2    into ( partition p1a, partition p1b )
  3    update indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1A                  USABLE
IX_LOCAL                       P1B                  USABLE
IX_LOCAL                       P2A                  USABLE
IX_LOCAL                       P2B                  USABLE

4 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>
SQL> alter table t drop partition p1a update indexes;

Table altered.

SQL>
SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
IX_LOCAL                       P1B                  USABLE
IX_LOCAL                       P2A                  USABLE
IX_LOCAL                       P2B                  USABLE

3 rows selected.

SQL>
SQL> select index_name,  status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.

SQL>

Not that I can see a real need for it, but it’s worth noting that these are your only two options.  There is no such thing as keeping only the local index entries updated and not the global ones.


SQL>  alter table t split partition p2 at (6000)
  2      into ( partition p2a, partition p2b ) update local indexes;
    into ( partition p2a, partition p2b ) update local indexes
                                                 *
ERROR at line 2:
ORA-00905: missing keyword

Finally, remember that since Oracle 10g the philosophy is that it is better to allow a query to run slowly than to have it fail with an error, so the default value for “skip_unusable_indexes” is now TRUE.  So if you have indexes with a status of UNUSABLE, you will not detect this via errors in your SQL queries.  The queries will just be optimized as if the indexes did not exist, which might have an impact on their execution time.  Here is a simple starting point for a monitoring query to keep an eye on unusable indexes (and rebuild them if necessary):


SQL> set serverout on
SQL> declare
  2    procedure ddl(p_cmd varchar2) is
  3    begin
  4      --execute immediate p_cmd;
  5      dbms_output.put_line(p_cmd);
  6    end;
  7  begin
  8      for i in (
  9          select index_owner, index_name, partition_name, 'partition' ddl_type
 10          from all_ind_partitions
 11          where status = 'UNUSABLE'
 12          union all
 13          select index_owner, index_name, subpartition_name, 'subpartition' ddl_type
 14          from all_ind_subpartitions
 15          where status = 'UNUSABLE'
 16          union all
 17          select owner, index_name, null, null
 18          from all_indexes
 19          where status = 'UNUSABLE'
 20      )
 21      loop
 22        if i.ddl_type is null then
 23          ddl('alter index '||i.index_owner||'.'||i.index_name||' rebuild');
 24        else
 25          ddl('alter index '||i.index_owner||'.'||i.index_name||' modify '||i.ddl_type||' '||i.partition_name||' rebuild');
 26        end if;
 27      end loop;
 28  end;
 29  /
alter index MCDONAC.T_PAR_IX modify partition P1 rebuild
alter index MCDONAC.T_PAR_IX modify partition P2 rebuild

PL/SQL procedure successfully completed.
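If you would rather be told about an unusable index at query time than have it silently ignored, you can flip the parameter at the session level (a quick sketch; with the parameter set to FALSE, queries that would have used the unusable index raise ORA-01502 instead):

```sql
-- Default is TRUE since 10g: unusable indexes are silently ignored
show parameter skip_unusable_indexes

-- Revert to the old behaviour of raising ORA-01502
alter session set skip_unusable_indexes = false;
```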

And finally, as Tim reminded me, with 12c Release 2, many partition operations (and many other maintenance operations as well) can now be done online simply by specifying ONLINE as a suffix to the operation. That (as the name suggests) will keep all indexes in a USABLE state.


SQL>  create table t (x int, y int, z int )
  2      partition by range (x)
  3      (
  4        partition p1 values less than (4000),
  5        partition p2 values less than (8000)
  6      );

Table created.

SQL>
SQL> insert into t select q*1000,rownum,rownum from dba_objects, ( select rownum q from dual connect by level <= 7 );
SQL> commit;

Commit complete.

SQL>
SQL> create index ix_local on t ( y ) local;

Index created.

SQL>
SQL> create index ix_global on t ( z ) ;

Index created.

SQL>
SQL> alter table t split partition p2 at (6000)
  2      into ( partition p2a, partition p2b ) online;

Table altered.

SQL> select index_name, partition_name, status
  2  from user_ind_partitions
  3  where index_name like 'IX%';

INDEX_NAME                     PARTITION_NAME                 STATUS
------------------------------ ------------------------------ --------
IX_LOCAL                       P1                             USABLE
IX_LOCAL                       P2A                            USABLE
IX_LOCAL                       P2B                            USABLE

3 rows selected.

SQL>
SQL> select index_name, status
  2  from user_indexes
  3  where index_name like 'IX%';

INDEX_NAME                     STATUS
------------------------------ --------
IX_GLOBAL                      VALID
IX_LOCAL                       N/A

2 rows selected.



Index compression–quick tip

If you’re appropriately licensed and want to use advanced index compression, you can set a tablespace default so that indexes created in that tablespace are automatically compressed at a nominated level.

Here is an example of that in action.  We’ll set our tablespace default accordingly:


SQL> create tablespace demo
  2  datafile 'C:\ORACLE\ORADATA\DB122\DEMO.DBF'
  3  size 100M
  4  default index compress advanced high;

Tablespace created.

Now we’ll create a table and an index on it in that tablespace


SQL> column index_name format a10
SQL> column tablespace_name format a15
SQL> create table t tablespace demo as
  2  select *
  3  from dba_objects;

Table created.

SQL> create index t_idx on t(owner) tablespace demo;

Index created.

SQL> select index_name,compression
  2  from user_indexes
  3  where table_name ='T';

INDEX_NAME COMPRESSION  
---------- ------------- 
T_IDX      ADVANCED HIGH   

1 row selected.

You can see that even though we did not specify any kind of compression for the index, it picked up the setting from the tablespace. Thus existing DDL scripts you have for indexes et al will not need to be changed.
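The tablespace setting is just a default – if a particular index should not be compressed, you can still say so explicitly at create time (a quick sketch; T_IDX2 is a hypothetical index name for illustration):

```sql
-- Explicit NOCOMPRESS overrides the tablespace-level default
create index t_idx2 on t ( object_name ) tablespace demo nocompress;
```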

Similarly, rebuilding the index does not lose the setting


SQL> alter index t_idx rebuild tablespace demo;

Index altered.

SQL> select index_name,compression
  2  from user_indexes
  3  where table_name ='T';

INDEX_NAME COMPRESSION   
---------- ------------- 
T_IDX      ADVANCED HIGH 

1 row selected.

You can see what the defaults are for a tablespace by querying DBA_TABLESPACES (or USER_TABLESPACES, as here) as per normal.


SQL> select tablespace_name,compress_for,index_compress_for
  2  from user_tablespaces where tablespace_name = 'DEMO';

TABLESPACE_NAME COMPRESS_FOR                   INDEX_COMPRES
--------------- ------------------------------ -------------
DEMO                                           ADVANCED HIGH

Apex Interactive Grid and IOT’s

I love the interactive grid in Application Express.  And here’s why… (Warning: Ranting mode is now on.)

You can tell people

  • here’s an application built with almost no code – in fact, you probably could have built it yourself
  • it’s multi-user, with optimistic locking built in for you
  • it’s secure
  • it’s backed up and recoverable
  • it scales ridiculously well
  • it doesn’t need any complicated middle tier or software libraries
  • it can be accessed anywhere you have a browser…which is…anywhere!
  • it has a responsive look and feel
  • it was built with software that doesn’t cost a single dollar
  • it centralises the data so you have a single source of truth

and after you have told them all of that….do you know what they’ll say ?

“Yeah…but I like to double-click on a field to edit it…So I’ll just use Excel and store it on my hard drive”

AGGGGHHHHHHHH!!!!!!!!!!!!!  Somebody…..shoot….me……

Interactive grids blow that flaccid argument out of the water!  So get on board with Application Express 5.1+ for some interactive grid awesome-ness.

One discovery I did make with interactive grids is that if you base the grid on an index-organized table, you will erroneously get a ROWID column in your grid.


This is not a major drama – just delete the item in the designer and the grid will still work just fine; this minor detail is fixed in an upcoming release.

How important is “up to date” to you?

I wrote a post a while back showing that one of the options with an Oracle Text index is to keep that index perfectly updated with its underlying table.  (Long-termers with Oracle may know that this is a relatively recent addition to Text indexes, which historically were updated with calls to CTXSYS.CTX_DDL.SYNC_INDEX, either on demand or via the scheduler.)

I also said that there is “no such thing as a free lunch”, because refreshing that index for every single commit introduces overheads.  So let’s explore that a little more thoroughly.

First I sourced a file of hundreds of thousands of random English sentences, and exposed them to Oracle via an external table:


SQL> DROP TABLE sentences;

Table dropped.

SQL>
SQL> CREATE TABLE sentences (
  2    line  VARCHAR2(4000)
  3  )
  4  ORGANIZATION EXTERNAL
  5  (
  6    TYPE ORACLE_LOADER
  7    DEFAULT DIRECTORY temp
  8    ACCESS PARAMETERS
  9    (
 10      RECORDS DELIMITED BY NEWLINE
 11      BADFILE 'sentences.bad'
 12      LOGFILE 'sentences.log'
 13      FIELDS TERMINATED BY '@'
 14      MISSING FIELD VALUES ARE NULL
 15      (
 16        line  CHAR(4000)
 17      )
 18    )
 19    LOCATION ('english.dat')
 20  )
 21  REJECT LIMIT UNLIMITED
 22  /

Table created.

SQL> SELECT * FROM sentences where rownum <= 10;

LINE
-------------------------------------------------------------
Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel".
I will be back soon.
I'm at a loss for words.
This is never going to end.
I just don't know what to say.
That was an evil bunny.

10 rows selected.

And now we will load 50,000 of those sentences into a Text-indexed table, one row at a time, committing (and syncing) for every row inserted.


SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> begin
  2  ctx_ddl.create_preference('demo_text_idx', 'BASIC_STORAGE');
  3  ctx_ddl.set_attribute('demo_text_idx', 'I_TABLE_CLAUSE', 'tablespace demo');
  4  ctx_ddl.set_attribute('demo_text_idx', 'K_TABLE_CLAUSE', 'tablespace demo');
  5  ctx_ddl.set_attribute('demo_text_idx', 'R_TABLE_CLAUSE', 'tablespace demo');
  6  ctx_ddl.set_attribute('demo_text_idx', 'N_TABLE_CLAUSE', 'tablespace demo');
  7  ctx_ddl.set_attribute('demo_text_idx', 'I_INDEX_CLAUSE', 'tablespace demo');
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('sync (on commit) storage demo_text_idx');

Index created.

SQL> conn mcdonac/******
Connected.

SQL> set timing on
SQL>
SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )
  3  loop
  4    insert into t1 values (i.r, i.line );
  5    commit;
  6  end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:14:34.98

SQL> @stat

[some stats omitted]

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                           250000
         9 recursive calls                                       4351635
        14 session logical reads                                 7248007
        19 CPU used by this session                                35831
        55 physical read total bytes                           410017792
       128 db block gets                                         2387570
       132 consistent gets                                       4860437
       146 physical read bytes                                 407281664
       147 db block changes                                      3112305
       148 consistent changes                                      50041
       159 physical write bytes                                   712704
       258 redo size                                           666954796
      1089 execute count                                         2000136

1162 rows selected.

SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                                  BYTES
---------------------------------------- ----------
DR$T1_IZ$I                                 14680064
DR$T1_IZ$R                                    65536
DR$T1_IZ$X                                 10485760
DRC$T1_IZ$R                                   65536
SYS_IL0000162067C00006$$                      65536
SYS_IL0000162072C00002$$                      65536
SYS_IOT_TOP_162070                          2097152
SYS_IOT_TOP_162076                            65536
SYS_LOB0000162067C00006$$                    131072
SYS_LOB0000162072C00002$$                   2293760
T1                                          3145728

11 rows selected.

So we can see that

  • the process took a little over 14 minutes
  • consumed around 660M of redo
  • the segment size totalled around 31M

Let’s now repeat the exercise with a more relaxed strategy on index freshness.  We will still commit every row inserted, but only sync the index every 100 rows.



SQL> drop table t1 purge;

Table dropped.

SQL>
SQL> create table t1 (
  2  x int,
  3  y varchar2(3000) ) tablespace demo
  4  ;

Table created.

SQL>
SQL>
SQL> create index t1_iz on T1  ( y )
  2  indextype is ctxsys.context
  3  parameters ('storage demo_text_idx');

Index created.

SQL>
SQL> conn mcdonac/******
Connected.

SQL> begin
  2  for i in ( select rownum r, line from sentences where rownum <= 50000 )
  3  loop
  4    insert into t1 values (i.r, i.line );
  5    commit;
  6
  7    if mod(i.r,100) = 0 then
  8      ctxsys.ctx_ddl.sync_index('T1_IZ');
  9    end if;
 10  end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:38.45

SQL> select segment_name, bytes from dba_segments
  2  where tablespace_name = 'DEMO'
  3  order by 1;

SEGMENT_NAME                        BYTES
------------------------------ ----------
DR$T1_IZ$I                       10485760
DR$T1_IZ$R                          65536
DR$T1_IZ$X                        7340032
DRC$T1_IZ$R                         65536
SYS_IL0000162255C00006$$            65536
SYS_IL0000162260C00002$$            65536
SYS_IOT_TOP_162258                2097152
SYS_IOT_TOP_162264                  65536
SYS_LOB0000162255C00006$$          131072
SYS_LOB0000162260C00002$$         2293760
T1                                3145728

11 rows selected.

Elapsed: 00:00:00.02
SQL>
SQL> @stat

 SID_STAT# NAME                                                    VALUE
---------- -------------------------------------------------- ----------
         6 user commits                                            52000
         9 recursive calls                                        349084
        14 session logical reads                                 1352820
        19 CPU used by this session                                 2719
        55 physical read total bytes                             6709248
       128 db block gets                                          940374
       132 consistent gets                                        412446
       146 physical read bytes                                   4087808
       147 db block changes                                      1184146
       148 consistent changes                                        539
       159 physical write bytes                                   712704
       258 redo size                                           270443980
      1089 execute count                                          176163

1162 rows selected.


By relaxing our sync strategy, now we have

  • the process took less than 1 minute
  • consumed around 270M of redo
  • the segment size totalled around 24M

Hence if your document-loading apparatus is high-volume OLTP style, you might want to carefully consider the impact of saying to your customers “Hey, we’ll keep that index updated in real time”.  If it truly is high-volume OLTP, it is unlikely they would even notice a more streamlined approach with regular syncs, where “regular” is defined either by a time interval or a pending-document count threshold.  Conversely, if document loading is rare, then an on-commit strategy combined with an occasional optimize and/or rebuild might be perfectly adequate.
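For the interval-based option, the sync schedule can be baked into the index definition itself rather than calling SYNC_INDEX manually (a sketch, re-using the demo_text_idx storage preference from earlier; note that the EVERY clause requires the CREATE JOB privilege, and the interval expression shown – hourly – is just an example):

```sql
-- Sync pending changes into the Text index once per hour via the scheduler
create index t1_iz on t1 ( y )
  indextype is ctxsys.context
  parameters ('sync (every "SYSDATE+1/24") storage demo_text_idx');
```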

Direct mode operations on IOT’s

An AskTom contributor brought to my attention that direct-mode insert on index-organized tables now appears possible in 12c.  We can see the difference by running a simple script in both v11 and v12.


SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for 64-bit Windows: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production

5 rows selected.

SQL> drop table t purge;

Table dropped.

SQL> create table t(x int primary key, y int) organization index;

Table created.

SQL> insert /*+ append */ into t select 1 x, 1 y from dual;

1 row created.

SQL> select * from t;  -- no error, so no direct mode was done

         X          Y
---------- ----------
         1          1

1 row selected.


SQL> select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0
PL/SQL Release 12.1.0.2.0 - Production                                                    0
CORE    12.1.0.2.0      Production                                                        0
TNS for 64-bit Windows: Version 12.1.0.2.0 - Production                                   0
NLSRTL Version 12.1.0.2.0 - Production                                                    0

5 rows selected.

SQL> drop table t purge;

Table dropped.

SQL> create table t(x int primary key, y int) organization index;

Table created.

SQL> insert /*+ append */ into t select 1 x, 1 y from dual;

1 row created.

SQL> select * from t;  -- now we do get the error, so append hint was respected
select * from t
              *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

So that looks pretty cool. People may recall an older blog post of mine where it was not possible to do a large scale IOT population without a large underlying sort operation. So I thought I’d revisit that, and see what we can discover.

In this case, we are using table T_SOURCE as the source of our data. It was created simply as:


SQL> create table T_SOURCE ( n , r , constraint T_SOURCE_PK primary key( n ))
  2  organization index
  3  as select rownum, rpad(rownum,128)
  4  from dual
  5  connect by level <= 1000000
  6  /

Table created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','t_source')

PL/SQL procedure successfully completed.

So we are going to copy those 1 million rows from T_SOURCE to a new table, and try to avoid having to sort them by asking the optimizer to “walk” the source data in index order.

So let’s try direct mode first. You’ll see references to a script called “mystat”; it’s simply a query against v$mystat to pick up session-level statistics that might be relevant.

We’ll run these before and after to see what deltas pop out.
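For reference, “mystat” is probably little more than something along these lines (my reconstruction, not the actual script):

```sql
-- mystat.sql (a reconstruction): report session statistics whose
-- name matches the supplied string
select n.name, s.value
from   v$statname n,
       v$mystat   s
where  n.statistic# = s.statistic#
and    lower(n.name) like '%'||lower('&statname')||'%'
order  by n.statistic#
/
```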


SQL> truncate table t_tgt;

Table truncated.

SQL> alter table T_TGT nologging;

Table altered.

SQL> @mystat
Enter value for statname: redo size

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                              135736
redo size for lost write detection                          0
redo size for direct writes                                 0

3 rows selected.

SQL> @mystat
Enter value for statname: sorts

NAME                                                    VALUE
-------------------------------------------------- ----------
sorts (memory)                                            117
sorts (disk)                                                0
sorts (rows)                                              359

3 rows selected.

SQL> set timing on
SQL> insert /*+ APPEND */ into T_TGT select /*+ index_asc(t_source) */ * from T_SOURCE order by 1  ;

1000000 rows created.

Elapsed: 00:00:46.30

SQL> select count(*) from t_tgt where rownum < 10;
select count(*) from t_tgt where rownum < 10
                     *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


SQL> commit;

Commit complete.

SQL> @mystat
Enter value for statname: redo size

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                           497096676
redo size for lost write detection                          0
redo size for direct writes                                 0

3 rows selected.

SQL> @mystat
Enter value for statname: sorts

NAME                                                    VALUE
-------------------------------------------------- ----------
sorts (memory)                                            122
sorts (disk)                                                1
sorts (rows)                                          1000436

3 rows selected.

SQL> @mystat
Enter value for statname: direct

NAME                                                    VALUE
-------------------------------------------------- ----------
db block gets direct                                        0
consistent gets direct                                      0
physical reads direct                                   19231
physical writes direct                                  19231
physical reads direct temporary tablespace              19231
physical writes direct temporary tablespace             19231

 

So we can make the following deductions:

  • Direct mode seems to have been activated, given the ORA-12838 error.
  • I didn’t avoid the sorting – I still sorted 1,000,000 rows.
  • The NOLOGGING attribute was of no benefit here – we still consumed around 500 megabytes of redo.

So whether it was really a direct mode operation is perhaps up for debate, because the direct reads/writes recorded were all against the temporary tablespace.

Let’s explore that last one a little.

I repeated the same population script, but before I did so, I set my workarea_size_policy to manual, and cranked up my sort_area_size to 1G.
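The session settings for that test were along these lines (1G expressed in bytes; exact limits are platform dependent):

```sql
-- Force manual PGA management for this session, and give the sort a
-- 1GB in-memory work area so it need not spill to temp
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1073741824;
```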

I’ll save you all the output again, but here’s the stats output


SQL> @mystat
Enter value for statname: direct

NAME                                                    VALUE
-------------------------------------------------- ----------
db block gets direct                                        0
consistent gets direct                                      0
physical reads direct                                       0
physical writes direct                                      0
physical reads direct temporary tablespace                  0
physical writes direct temporary tablespace                 0

So…no direct operations here now. This would appear to confirm that the direct mode operations were simply sorting operations.

Now we’ll perform the same process without the direct mode hint. We’ll set the table back to LOGGING, since we’ve seen NOLOGGING was of no benefit, and use a fresh session to pick up fresh statistics.


SQL> truncate table t_tgt;

Table truncated.

SQL> alter table T_TGT logging;

Table altered.

SQL> @mystat
Enter value for statname: redo size

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                              137236
redo size for lost write detection                          0
redo size for direct writes                                 0

3 rows selected.

SQL> @mystat
Enter value for statname: sorts

NAME                                                    VALUE
-------------------------------------------------- ----------
sorts (memory)                                             21
sorts (disk)                                                0
sorts (rows)                                               86

3 rows selected.

SQL> insert into T_TGT select /*+ index_asc(t_source) */ * from T_SOURCE order by 1  ;

1000000 rows created.

Elapsed: 00:00:13.96
SQL> set timing off
SQL> select count(*) from t_tgt where rownum < 10;

  COUNT(*)
----------
         9

1 row selected.

SQL> commit;

Commit complete.

SQL> @mystat
Enter value for statname: redo size

NAME                                                    VALUE
-------------------------------------------------- ----------
redo size                                           500630220
redo size for lost write detection                          0
redo size for direct writes                                 0

3 rows selected.

SQL> @mystat
Enter value for statname: sorts

NAME                                                    VALUE
-------------------------------------------------- ----------
sorts (memory)                                             26
sorts (disk)                                                0
sorts (rows)                                              146

3 rows selected.

SQL> @mystat
Enter value for statname: direct

NAME                                                    VALUE
-------------------------------------------------- ----------
db block gets direct                                        0
consistent gets direct                                      0
physical reads direct                                       0
physical writes direct                                      0
physical reads direct temporary tablespace                  0
physical writes direct temporary tablespace                 0

So for the conventional mode we observe:

  • I didn’t have to sort the rows
  • I consumed a similar amount of redo
  • It was three times faster than the case where we had to do all that sorting to disk

This just goes to show that (like everything in software) advice of the form “do ‘X’ and it will be faster in every case” is never a good thing to adhere to.

Direct mode insert is a very cool facility, but that doesn’t mean it’s going to be the best option in every situation.

Quick tip on Function Based Indexes

For “normal” indexes, USER_IND_COLUMNS will contain the column name(s) for an index, but things (appear to) go astray when looking at function-based indexes.


SQL> create table blah ( x varchar2(30));

Table created.

SQL> create index blah_ix on blah ( upper(x));

Index created.

SQL> select column_name from user_ind_columns
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_NAME
------------------------------
SYS_NC00002$

Don’t panic. Simply take a look at USER_IND_EXPRESSIONS to find the function that you used.


SQL> select column_expression from user_ind_expressions
  2  where index_name = 'BLAH_IX'
  3  /

COLUMN_EXPRESSION
--------------------------------------------------------------------------------
UPPER("X")

Easy peasy
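If you want the regular columns and the expressions in a single query, you can outer join the two views on index name and column position. One caveat: COLUMN_EXPRESSION is a LONG, so you can’t wrap it in functions like NVL in plain SQL.

```sql
-- One row per index column; COLUMN_EXPRESSION is populated only for
-- function-based columns, COLUMN_NAME shows the system-generated name
select c.index_name,
       c.column_position,
       c.column_name,
       e.column_expression
from   user_ind_columns     c,
       user_ind_expressions e
where  c.index_name      = e.index_name (+)
and    c.column_position = e.column_position (+)
and    c.index_name      = 'BLAH_IX'
order  by c.column_position;
```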