I wrote a post a while back showing how one of the options with an Oracle Text index is to keep that index perfectly updated with its underlying table. (Long termers with Oracle may know that this is a relatively recent addition to Text indexes, which historically were updated with calls to CTXSYS.CTX_DDL.SYNC_INDEX, either on demand or via the scheduler).
I also said that there is “no such thing as a free lunch”, as refreshing that index for every single commit introduces overheads. So let’s explore that a little more thoroughly
First I sourced a file of hundreds of thousands of random english sentences, and exposed them to Oracle via an external table
SQL> DROP TABLE sentences;
Table dropped.
SQL>
SQL> CREATE TABLE sentences (
2 line VARCHAR2(4000)
3 )
4 ORGANIZATION EXTERNAL
5 (
6 TYPE ORACLE_LOADER
7 DEFAULT DIRECTORY temp
8 ACCESS PARAMETERS
9 (
10 RECORDS DELIMITED BY NEWLINE
11 BADFILE 'sentences.bad'
12 LOGFILE 'sentences.log'
13 FIELDS TERMINATED BY '@'
14 MISSING FIELD VALUES ARE NULL
15 (
16 line CHAR(4000)
17 )
18 )
19 LOCATION ('english.dat')
20 )
21 REJECT LIMIT UNLIMITED
22 /
Table created.
SQL> SELECT * FROM sentences where rownum <= 10;
LINE
-------------------------------------------------------------
Let's try something.
I have to go to sleep.
Today is June 18th and it is Muiriel's birthday!
Muiriel is 20 now.
The password is "Muiriel".
I will be back soon.
I'm at a loss for words.
This is never going to end.
I just don't know what to say.
That was an evil bunny.
10 rows selected.
And now we will load 50,000 of those sentences into a Text-indexed table, one row at a time, committing (and syncing) for every row inserted.
SQL> create table t1 (
2 x int,
3 y varchar2(3000) ) tablespace demo
4 ;
Table created.
SQL>
SQL>
SQL> begin
2 ctx_ddl.create_preference('demo_text_idx', 'BASIC_STORAGE');
3 ctx_ddl.set_attribute('demo_text_idx', 'I_TABLE_CLAUSE', 'tablespace demo');
4 ctx_ddl.set_attribute('demo_text_idx', 'K_TABLE_CLAUSE', 'tablespace demo');
5 ctx_ddl.set_attribute('demo_text_idx', 'R_TABLE_CLAUSE', 'tablespace demo');
6 ctx_ddl.set_attribute('demo_text_idx', 'N_TABLE_CLAUSE', 'tablespace demo');
7 ctx_ddl.set_attribute('demo_text_idx', 'I_INDEX_CLAUSE', 'tablespace demo');
8 end;
9 /
PL/SQL procedure successfully completed.
SQL> create index t1_iz on T1 ( y )
2 indextype is ctxsys.context
3 parameters ('sync (on commit) storage demo_text_idx');
Index created.
SQL> conn mcdonac/******
Connected.
SQL> set timing on
SQL>
SQL> begin
2 for i in ( select rownum r, line from sentences where rownum <= 50000 ) 3 loop 4 insert into t1 values (i.r, i.line ); 5 commit; 6 end loop; 7 end; 8 / PL/SQL procedure successfully completed. Elapsed: 00:14:34.98 SQL> @stat
[some stats omitted]
SID_STAT# NAME VALUE
---------- -------------------------------------------------- ----------
6 user commits 250000
9 recursive calls 4351635
14 session logical reads 7248007
19 CPU used by this session 35831
55 physical read total bytes 410017792
128 db block gets 2387570
132 consistent gets 4860437
146 physical read bytes 407281664
147 db block changes 3112305
148 consistent changes 50041
159 physical write bytes 712704
258 redo size 666954796
1089 execute count 2000136
1162 rows selected.
SQL> select segment_name, bytes from dba_segments
2 where tablespace_name = 'DEMO'
3 order by 1;
SEGMENT_NAME BYTES
---------------------------------------- ----------
DR$T1_IZ$I 14680064
DR$T1_IZ$R 65536
DR$T1_IZ$X 10485760
DRC$T1_IZ$R 65536
SYS_IL0000162067C00006$$ 65536
SYS_IL0000162072C00002$$ 65536
SYS_IOT_TOP_162070 2097152
SYS_IOT_TOP_162076 65536
SYS_LOB0000162067C00006$$ 131072
SYS_LOB0000162072C00002$$ 2293760
T1 3145728
11 rows selected.
So we can see that
- the process took a little over 14 minutes
- consumed around 660M of redo
- the segment size totalled around 31M
Let’s now repeat the exercise with a more relaxed strategy on index freshness. We will still commit every row inserted, but only sync the index every 100 rows.
SQL> drop table t1 purge;
Table dropped.
SQL>
SQL> create table t1 (
2 x int,
3 y varchar2(3000) ) tablespace demo
4 ;
Table created.
SQL>
SQL>
SQL> create index t1_iz on T1 ( y )
2 indextype is ctxsys.context
3 parameters ('storage demo_text_idx');
Index created.
SQL>
SQL> conn mcdonac/******
Connected.
SQL> begin
2 for i in ( select rownum r, line from sentences where rownum <= 50000 ) 3 loop 4 insert into t1 values (i.r, i.line ); 5 commit; 6 7 if mod(i.r,100) = 0 then 8 ctxsys.ctx_ddl.sync_index('T1_IZ'); 9 end if; 10 end loop; 11 end; 12 / PL/SQL procedure successfully completed. Elapsed: 00:00:38.45 SQL> select segment_name, bytes from dba_segments
2 where tablespace_name = 'DEMO'
3 order by 1;
SEGMENT_NAME BYTES
------------------------------ ----------
DR$T1_IZ$I 10485760
DR$T1_IZ$R 65536
DR$T1_IZ$X 7340032
DRC$T1_IZ$R 65536
SYS_IL0000162255C00006$$ 65536
SYS_IL0000162260C00002$$ 65536
SYS_IOT_TOP_162258 2097152
SYS_IOT_TOP_162264 65536
SYS_LOB0000162255C00006$$ 131072
SYS_LOB0000162260C00002$$ 2293760
T1 3145728
11 rows selected.
Elapsed: 00:00:00.02
SQL>
SQL> @stat
SID_STAT# NAME VALUE
---------- -------------------------------------------------- ----------
6 user commits 52000
9 recursive calls 349084
14 session logical reads 1352820
19 CPU used by this session 2719
55 physical read total bytes 6709248
128 db block gets 940374
132 consistent gets 412446
146 physical read bytes 4087808
147 db block changes 1184146
148 consistent changes 539
159 physical write bytes 712704
258 redo size 270443980
1089 execute count 176163
1162 rows selected.
By relaxing our sync strategy, now we have
- the process took less than 1 minute
- consumed around 270M of redo
- the segment size totalled around 24M
Hence if your document loading apparatus is high volume OLTP style, you might want to carefully consider the impact of saying to your customers “Hey, we’ll keep that index updated in real time”. If it truly is high volume OLTP style…it is unlikely they would even notice a more streamlined approach with regular sync’s, where “regular” is defined either on a time interval or pending-document count threshold. Conversely, if document loading is rare, then an on-commit strategy combined with an occasional optimize and/or rebuild might be perfectly adequate.
One comment