Text indexes for numbers

We had an AskTOM question recently about being able to search for numbers within a concatenated list. The particular issue was a list of mobile phone numbers, but this is really just about tackling the larger issue of finding numbers within list.

Let’s create a simple example to see where things can break down (and how we can fix them).


SQL> create table t (  cid int, mobs varchar2(1000));

Table created.

SQL>
SQL> insert into t
  2  select c, listagg(num,',') within group ( order by num )
  3  from
  4  ( select mod(rownum,1000) c, trunc(dbms_random.value(1000000,9999999)) num from dual connect by level <= 4000 )
  5  group by c;

1000 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where rownum <= 10;

       CID MOBS
---------- --------------------------------------------------
         0 2644307,3565512,5481105,7725189
         1 1570287,2092729,6127058,6546683
         2 6018800,6408347,6592531,8456137
         3 2087673,3086382,6692756,9377699
         4 2964558,3887606,6305557,7441515
         5 2219544,4331436,5246494,5303583
         6 1005450,1625403,2271986,4493049
         7 2605217,5143371,7444316,9073658
         8 1205487,4660509,5148296,9578099
         9 3736741,8385346,8758352,9496363

10 rows selected.

My sample table has 1000 rows and there a four “mobile phone” numbers concatenated into a list for each row.

(Side note: I’m not using 555-prefixed numbers like you’ll see in the movies Smile Why phone numbers in movies start with 555)

Now let us try query the table for one of the phone numbers as highlighted in red in the previous list.


SQL>
SQL> select * from t where mobs like '%7444316%';

       CID MOBS
---------- ----------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

SQL> set autotrace traceonly explain
SQL> select * from t where mobs like '%7444316%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1800 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    50 |  1800 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MOBS" LIKE '%7444316%' AND "MOBS" IS NOT NULL)

I got the correct row back from the query, but the full scan of the table might be an issue as this table grows to larger sizes. (I’m working on the assumption here that a search for a single number will never return a large number of rows).

Since this is a “term that could be anywhere within the string” style of search, creating a Text index on the column seems an obvious choice. So I’ll do that and try again:


SQL> set autotrace off
SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context;

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

That is looking a lot better. I’ll be able to take advantage of this index….or so it might first appear! (Cue ominous sounding music…)

There is an ever so slight problem here – we don’t get the rows we need!


SQL> set autotrace off
SQL> select * from t where contains(mobs,'7444316') > 0;

no rows selected

In fact, even if we concocted a search for the value of the entire column, we still do not get a result back from our query.



SQL> select * from t where contains(mobs,'2605217,5143371,7444316,9073658') > 0;

no rows selected

You need to careful with indexing terms that are not what could be thought of as “common” text. After all, it is a Text index, and by default, the assumption here is that we are indexing document style text.

But the solution is easy – we just need to manipulate the standard lexer to avoid common numeric separators (comma, period, etc) perturbing our index entries. I’m opting for a tilde (~) here because it does not appear in the source column.



SQL>
SQL> drop index ix;

Index dropped.

SQL> begin
  2        ctx_ddl.drop_preference('my_lexer');
  3        ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  4        ctx_ddl.set_attribute('my_lexer', 'numgroup', '~');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context PARAMETERS('lexer my_lexer');

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

SQL>
SQL> set autotrace off
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

       CID MOBS
---------- --------------------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

Our execution plan still shows that we can take advantage of the Text index, but I’m also getting the results I expected.

As an aside, Text indexes are one of those gems in the Oracle database that often gets a great amount of new functionality with each new release. So every time you upgrade, take a fresh look at the Text Index documentation. You might get a lot of cool ideas for how to use them in your applications.

2 thoughts on “Text indexes for numbers

  1. Hello Connor,

    You will probably not believe me, but I have done exactly the same back in 2011 🙂

    And, the interesting was that I did it with practically no effective knowledge of Oracle Text 🙂

    The original query did not have a text index at all, but was using an INSTR in the WHERE condition.
    It produced the correct results because the separators in the string were handled correctly,
    but it was terribly slow !

    I just had a feeling that an Oracle text index might be the solution, and started with a quick browse
    through the documentation … and finally I was so proud of having been able to help “out of nowhere” 🙂

    It is a great feature indeed, and, since then, I have a permanent plan of reading more and mastering it
    “more consciously” and deeply one day 🙂

    Thanks a lot & Best Regards,
    Iudith Mentzel

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.