Text indexes for numbers

We had an AskTOM question recently about being able to search for numbers within a concatenated list. The particular issue was a list of mobile phone numbers, but this is really just about tackling the larger issue of finding numbers within list.

Let’s create a simple example to see where things can break down (and how we can fix them).


SQL> create table t (  cid int, mobs varchar2(1000));

Table created.

SQL>
SQL> insert into t
  2  select c, listagg(num,',') within group ( order by num )
  3  from
  4  ( select mod(rownum,1000) c, trunc(dbms_random.value(1000000,9999999)) num from dual connect by level <= 4000 )
  5  group by c;

1000 rows created.

SQL>
SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t
  2  where rownum <= 10;

       CID MOBS
---------- --------------------------------------------------
         0 2644307,3565512,5481105,7725189
         1 1570287,2092729,6127058,6546683
         2 6018800,6408347,6592531,8456137
         3 2087673,3086382,6692756,9377699
         4 2964558,3887606,6305557,7441515
         5 2219544,4331436,5246494,5303583
         6 1005450,1625403,2271986,4493049
         7 2605217,5143371,7444316,9073658
         8 1205487,4660509,5148296,9578099
         9 3736741,8385346,8758352,9496363

10 rows selected.

My sample table has 1000 rows and there a four “mobile phone” numbers concatenated into a list for each row.

(Side note: I’m not using 555-prefixed numbers like you’ll see in the movies Smile Why phone numbers in movies start with 555)

Now let us try query the table for one of the phone numbers as highlighted in red in the previous list.


SQL>
SQL> select * from t where mobs like '%7444316%';

       CID MOBS
---------- ----------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

SQL> set autotrace traceonly explain
SQL> select * from t where mobs like '%7444316%';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1800 |     5   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    50 |  1800 |     5   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MOBS" LIKE '%7444316%' AND "MOBS" IS NOT NULL)

I got the correct row back from the query, but the full scan of the table might be an issue as this table grows to larger sizes. (I’m working on the assumption here that a search for a single number will never return a large number of rows).

Since this is a “term that could be anywhere within the string” style of search, creating a Text index on the column seems an obvious choice. So I’ll do that and try again:


SQL> set autotrace off
SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context;

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

That is looking a lot better. I’ll be able to take advantage of this index….or so it might first appear! (Cue ominous sounding music…)

There is an ever so slight problem here – we don’t get the rows we need!


SQL> set autotrace off
SQL> select * from t where contains(mobs,'7444316') > 0;

no rows selected

In fact, even if we concocted a search for the value of the entire column, we still do not get a result back from our query.



SQL> select * from t where contains(mobs,'2605217,5143371,7444316,9073658') > 0;

no rows selected

You need to careful with indexing terms that are not what could be thought of as “common” text. After all, it is a Text index, and by default, the assumption here is that we are indexing document style text.

But the solution is easy – we just need to manipulate the standard lexer to avoid common numeric separators (comma, period, etc) perturbing our index entries. I’m opting for a tilde (~) here because it does not appear in the source column.



SQL>
SQL> drop index ix;

Index dropped.

SQL> begin
  2        ctx_ddl.drop_preference('my_lexer');
  3        ctx_ddl.create_preference('my_lexer', 'BASIC_LEXER');
  4        ctx_ddl.set_attribute('my_lexer', 'numgroup', '~');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> create index ix on t ( mobs ) indextype is ctxsys.context PARAMETERS('lexer my_lexer');

Index created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 1339481741

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    36 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T    |     1 |    36 |     4   (0)| 00:00:01 |
|*  2 |   DOMAIN INDEX              | IX   |       |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("MOBS",'7444316')>0)

SQL>
SQL> set autotrace off
SQL> select * from t
  2  where contains(mobs,'7444316') > 0;

       CID MOBS
---------- --------------------------------------------------
         7 2605217,5143371,7444316,9073658

1 row selected.

Our execution plan still shows that we can take advantage of the Text index, but I’m also getting the results I expected.

As an aside, Text indexes are one of those gems in the Oracle database that often gets a great amount of new functionality with each new release. So every time you upgrade, take a fresh look at the Text Index documentation. You might get a lot of cool ideas for how to use them in your applications.

Cool stuff with partition elimination

Sometimes in the IT world, the term “surprise” is not a good one.

“I woke up this morning and got a surprise…my database was down.”

“I ran a SELECT COUNT(*) on my most important table, and got a surprise result of zero rows.”

and so forth. Generally as IT professionals, encountering the unexpected is not a good start to the day Smile.

But sometimes, surprises can be a nice thing.  Here is my example for today – when I found that the database can do a remarkably good job when it comes to reducing the workload with partitioned tables.  

Most people will already be familiar with the concept of partition elimination.  If you have a table partitioned into (say) yearly segments, and you ask for all of the data for the past 2 years, then the optimizer is intelligent enough to only scan the relevant partitions rather than the entire table.  Here is an example of that in action. I have a table that is partitioned by year on a END_DATE column, and then sub-partitioned into quarters based on a START_DATE column.


SQL> create table t
  2      partition by range( end_dt )
  3      subpartition by range( start_dt )
  4      (
  5         partition p_2014 values less than ( to_date('01-Jan-2015','dd-mon-yyyy') )
  6                 (
  7                         subpartition sp_2014_q1 values less than ( to_date('01-apr-2014','dd-mon-yyyy') ) ,
  8                         subpartition sp_2014_q2 values less than ( to_date('01-jul-2014','dd-mon-yyyy') ) ,
  9                         subpartition sp_2014_q3 values less than ( to_date('01-oct-2014','dd-mon-yyyy') ) ,
 10                         subpartition sp_2014_q4 values less than ( to_date('01-jan-2015','dd-mon-yyyy') )
 11                ) ,
 12        partition p_2015 values less than ( to_date('01-Jan-2016','dd-mon-yyyy') )
 13                (
 14                        subpartition sp_2015_q1 values less than ( to_date('01-apr-2015','dd-mon-yyyy') ) ,
 15                        subpartition sp_2015_q2 values less than ( to_date('01-jul-2015','dd-mon-yyyy') ) ,
 16                        subpartition sp_2015_q3 values less than ( to_date('01-oct-2015','dd-mon-yyyy') ) ,
 17                        subpartition sp_2015_q4 values less than ( to_date('01-jan-2016','dd-mon-yyyy') )
 18                )
 19     )
 20     as
 21     select a.* ,
 22        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730)-1 as start_dt,
 23        to_date('01-Jan-2014','dd-mon-yyyy') + mod(rownum,730) as end_dt
 24     from all_objects a;

Table created.

SQL>
SQL> alter table t modify start_dt not null;

Table altered.

SQL> alter table t modify end_dt not null;

Table altered.

SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2183203583

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL    |      |   101 | 15049 |   491   (1)| 00:00:01 |     1 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   491   (1)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


You can see that we had to scan all of the partitions, but within each of the partitions we only had to scan a single sub-partition (as indicated by “PARTITION RANGE SINGLE”).  So for each year (based on END_DT) we scanned a single one of the 4 subpartitions.  We are doing only 25% of the work of scanning the entire table.  But one pleasant surprise I saw today was how the optimizer can take advantage of additional information to improve things even more.  Let us now add a fairly obvious rule about the data:


SQL>
SQL> alter table t add constraint t_chk check( start_dt < end_dt );

Table altered.


And have another look at that query execution plan.


SQL>
SQL> set autotrace traceonly explain
SQL> select *
  2      from t
  3      where start_dt = to_date('13-Jan-2015','dd-mon-yyyy') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3874588989

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |   101 | 15049 |   247   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE |      |   101 | 15049 |   247   (1)| 00:00:01 |     2 |     2 |
|   2 |   PARTITION RANGE SINGLE|      |   101 | 15049 |   247   (1)| 00:00:01 |     1 |     1 |
|*  3 |    TABLE ACCESS FULL    | T    |   101 | 15049 |   247   (1)| 00:00:01 |     5 |     5 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("START_DT"=TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "END_DT">TO_DATE(' 2015-01-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

SQL>
SQL>

How cool is that!  The constraint let us remove even more partitions from consideration. In fact, we ended up only scanning a single partition. Moreover, we never specified END_DT as a predicate, but you can see in the FILTER section, we could synthesize such a predicate using the rule defined by the check constraint we added.

I suppose the moral of the story is two-fold here.

1) The optimizer can be pretty smart with partition elimination,

2) As always, never hide rules and facts about the data from the database.  Make sure you define those constraints in the database tier.

Choose your names wisely

It all seems simple enough.  You pick a name for an object…and that is the only object that have than name right ?  For example:


SQL> create table MY_UNIQUE_NAME ( x int );

Table created.

SQL>
SQL> create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME;
create synonym MY_UNIQUE_NAME for MY_UNIQUE_NAME
*
ERROR at line 1:
ORA-01471: cannot create a synonym with same name as object


SQL>
SQL> create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME;
create view MY_UNIQUE_NAME as select * from MY_UNIQUE_NAME
            *
ERROR at line 1:
ORA-00955: name is already used by an existing object


SQL>
SQL> create package MY_UNIQUE_NAME is
  2    x int;
  3  end;
  4  /
create package MY_UNIQUE_NAME is
*
ERROR at line 1:
ORA-00955: name is already used by an existing object


So naturally we’d expect an error if we continued on with other object types.  But then this happens ?


SQL> create or replace
  2  trigger MY_UNIQUE_NAME
  3  before insert on MY_UNIQUE_NAME
  4  begin
  5    null;
  6  end;
  7  /

Trigger created.

SQL>
SQL> create index MY_UNIQUE_NAME on MY_UNIQUE_NAME ( x );

Index created.

It might all seem a bit random.  But this is all related to what is known as the NAMESPACE, that is, names of certain objects must be unique within a particular NAMESPACE.  We can see the namespaces for the objects by querying DBA_OBJECTS



SQL> select distinct namespace, object_type from dba_Objects order by 1,2;

 NAMESPACE OBJECT_TYPE
---------- -----------------------
         1 CREDENTIAL
         1 DESTINATION
         1 FUNCTION
         1 INDEXTYPE
         1 JAVA CLASS
         1 JOB
         1 JOB CLASS
         1 LIBRARY
         1 OPERATOR
         1 PACKAGE
         1 PROCEDURE
         1 PROGRAM
         1 SCHEDULE
         1 SCHEDULER GROUP
         1 SEQUENCE
         1 SYNONYM
         1 TABLE
         1 TABLE PARTITION
         1 TABLE SUBPARTITION
         1 TYPE
         1 VIEW
         1 WINDOW
         2 PACKAGE BODY
         2 TYPE BODY
         3 TRIGGER
         4 INDEX
         4 INDEX PARTITION
         5 CLUSTER
         8 LOB
         8 LOB PARTITION
         9 DIRECTORY
        10 QUEUE
        13 JAVA SOURCE
        14 JAVA RESOURCE
        19 MATERIALIZED VIEW
        21 CONTEXT
        23 RULE SET
        24 CONSUMER GROUP
        24 RESOURCE PLAN
        25 XML SCHEMA
        32 JAVA DATA
        36 RULE
        38 EVALUATION CONTEXT
        51 UNDEFINED
        52 UNDEFINED
        64 EDITION
        88 CONTAINER
        93 UNIFIED AUDIT POLICY
       132 LOCKDOWN PROFILE
           DATABASE LINK

50 rows selected.



Because indexes, triggers are in a different namespace to tables, synonyms, procedures, packages etc, they can share the same name.

I’d still probably recommend unique names across all namespaces, just so there is no confusion.

connor_speaking

NULL’s vs NOT NULL’s and Performance

When it comes to giving the cost based optimiser the best possible chance to make the “right” decisions, many DBA’s are diligent in keeping statistics up to date, using histograms where appropriate, creating more indexes (or removing surplus indexes).

However one often neglected area is that the the null-ness of columns also impacts the optimiser decisions. NULL and NOT NULL do more than just act as constraints, they also add (or detract) to the value of indexes on those columns. Here’s an example of how the null-ness of a column impacts optimizer decisions. I have a table T which is a copy of DBA_OBJECTS, indexed on OBJECT_ID.


SQL> create table t as
  2  select * from dba_objects
  3  where object_id is not null;

Table created.

SQL>
SQL> create index IX on T ( object_id );

Index created.

Now I want to perform a standard pagination style query, namely, get the first 5 rows in order of OBJECT_ID


SQL>
SQL> set autotrace traceonly
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     5 |  2405 |       |  2755   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 78750 |    36M|       |  2755   (1)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      | 78750 |     9M|    14M|  2755   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | T    | 78750 |     9M|       |   428   (1)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         52  recursive calls
        130  db block gets
       1591  consistent gets
          0  physical reads
      25420  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

I scanned the entire table and burned around 1500 consistent gets. Now I’ll let the database know what I already know, that is, that the OBJECT_ID column is contains no nulls. Then I’ll repeat the same query.


SQL>
SQL> alter table T modify object_id not null;

Table altered.

SQL>
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3114946973

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     5 |  2405 |     3   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |      |       |       |            |          |
|   2 |   VIEW                        |      |     5 |  2405 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    | 78750 |     9M|     3   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX   |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         85  recursive calls
        132  db block gets
         72  consistent gets
          1  physical reads
      27192  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

Woo hoo! 20 times more efficient. The extra information we provided to the database allowed for more execution paths to be considered by the optimizer.

That’s all well and good. By how do we know which columns might be missing an appropriate NOT NULL constraint ?

Well, the following routine might help Smile

The PL/SQL procedure below deduces a ‘number of rows that are null’ count for all columns that are indexed for all tables in the schema (passed as parameter P_SCHEMA), although you can pass a table name to restrict the procedure to just that table. For any column that contains no nulls, the appropriate ‘alter table … ( column_name not null)’ command will be outputted. (Make sure you set serveroutput on).

Notes:

  • The client for whom I wrote this script, had no NOT NULL constraints on any table so the procedure only takes a rough stab at ALL_CONSTRAINTS for existing constraints.
  • It tries to keep things sensible – avoiding complex data types, tables that are auto-generated or used for queues etc, but there’s no guarantee it won’t either miss a table, or pick up it should not.
  • This is a brute force approach – it works by scanning every table in the schema, so use your discretion as to when would be a suitable time for running this routine.  But it will only scan each table once to determine the null count for all candidate columns.
  • (As with any diagnosis script), you should not apply it’s recommendations without some careful thought first.

12.2 version


create or replace 
procedure check_indexed_columns_for_null(
                 p_schema varchar2, 
                 p_table_name varchar2 default null) is
  cursor x is
   select 
     table_name,
     column_name,
     count(*) over ( partition by table_name ) as colcount
     from
     (
       select 
         table_name,
         column_name,
         min(existing_constraint)
       from 
       (
       select  
           a.table_name, 
           a.column_name, 
           ( select  count(*)
             from    all_constraints x,
                     all_cons_columns cc
             where   x.owner = c.owner
             and     x.table_name = c.table_name
             and     cc.owner      = x.owner
             and     cc.constraint_name = x.constraint_name
             and     
            (
               ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                   or 
               ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
             )
            ) existing_constraint
       from 
         all_ind_columns a,
         all_tables c,
         all_tab_columns ccol
       where a.index_owner = p_schema
       and a.index_owner = p_schema
       and a.table_name = nvl(upper(p_table_name),a.table_name)
       and c.table_name = a.table_name
       and c.owner      = a.table_owner
       and c.owner      = ccol.owner
       and c.table_name = ccol.table_name
       and a.column_name = ccol.column_name
       and c.secondary = 'N'
       and c.temporary = 'N'
       and c.nested    = 'NO'
       and c.external  = 'NO'
       and ccol.data_type_owner is null
       and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
       and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = p_schema )
       )
       group by 
         table_name,
         column_name
       having min(existing_constraint) = 0
     );

  str0 varchar2(32767); 
  str1 varchar2(32767); 
  str2 varchar2(32767); 
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  -- dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin
for i in x loop
  if prev != i.table_name then
    str0 := 'declare ';
    str1 := 'begin select '; str2 := ' into ';
    str3 := ' '; cnt := 1;
  end if;
  if cnt = i.colcount then 
    trailer := ' ';
  else
    trailer := ','||chr(10);
  end if;
  str0 := str0 || 'v'||ltrim(cnt)||' number;';
  str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
  str2 := str2 || 'v'||ltrim(cnt)||trailer;
  str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
    'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
    ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
  if cnt = i.colcount then
    str2 := str2 ||' from '||p_schema||'.'||i.table_name||';';
    str3 := str3 ||' end;';
    do_sql(str0||' '||str1||' '||str2||' '||str3);
  end if;
  prev := i.table_name;
  cnt := cnt + 1;
end loop;
end;
/
sho err

set serverout on
exec check_indexed_columns_for_null('SCOTT')

11.2 version

(Needs a direct CREATE TABLE grant to the owner of the procedure)


create or replace
procedure check_indexed_columns_for_null(
                 p_schema varchar2,
                 p_table_name varchar2 default null) is
  cursor c_template is
    select table_name, column_name, 0 colcount
    from   all_tab_columns;

  type rowlist is table of c_template%rowtype;  
  r    rowlist;
  rc   sys_refcursor;

  str0 varchar2(32767);
  str1 varchar2(32767);
  str2 varchar2(32767);
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin

  begin
    execute immediate 'drop table tmp$cons purge';
  exception when others then null;
  end;

  execute immediate 'create table tmp$cons as  select owner, table_name, constraint_name, constraint_type, to_lob(search_condition) search_condition_vc '||
                    'from all_constraints';

  open rc for
    q'{select
      table_name,
      column_name,
      count(*) over ( partition by table_name ) as colcount
      from
      (
        select
          table_name,
          column_name,
          min(existing_constraint)
        from
        (
        select
            a.table_name,
            a.column_name,
            ( select  count(*)
              from    ( select owner, table_name, constraint_name, constraint_type, cast(search_condition_vc as varchar2(4000)) search_condition_vc 
                        from tmp$cons ) x,
                      all_cons_columns cc
              where   x.owner = c.owner
              and     x.table_name = c.table_name
              and     cc.owner      = x.owner
              and     cc.constraint_name = x.constraint_name
              and
             (
                ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                    or
                ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
              )
             ) existing_constraint
        from
          all_ind_columns a,
          all_tables c,
          all_tab_columns ccol
        where a.index_owner = :p_schema
        and a.index_owner = :p_schema
        and a.table_name = nvl(upper(:p_table_name),a.table_name)
        and c.table_name = a.table_name
        and c.owner      = a.table_owner
        and c.owner      = ccol.owner
        and c.table_name = ccol.table_name
        and a.column_name = ccol.column_name
        and c.secondary = 'N'
        and c.temporary = 'N'
        and c.nested    = 'NO'
        and (c.owner,c.table_name) not in ( select owner, table_name from all_external_tables where owner = :p_schema )
        and ccol.data_type_owner is null
        and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
        and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = :p_schema )
        )
        group by
          table_name,
          column_name
        having min(existing_constraint) = 0
      )
      }' using p_schema,p_schema,p_table_name,p_schema,p_schema;

  fetch rc bulk collect into r;
  close rc;
  
  for i in 1 .. r.count loop
    if prev != r(i).table_name then
      str0 := 'declare ';
      str1 := 'begin select '; str2 := ' into ';
      str3 := ' '; cnt := 1;
    end if;
    if cnt = r(i).colcount then
      trailer := ' ';
    else
      trailer := ','||chr(10);
    end if;
    str0 := str0 || 'v'||ltrim(cnt)||' number;';
    str1 := str1 || 'sum(decode('||r(i).column_name||',null,1,0))'||trailer;
    str2 := str2 || 'v'||ltrim(cnt)||trailer;
    str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
      'dbms_output.put_line(''alter table '||p_schema||'.'||r(i).table_name||
      ' modify ('||r(i).column_name||' not null);''); end if;'||chr(10);
    if cnt = r(i).colcount then
      str2 := str2 ||' from '||p_schema||'.'||r(i).table_name||';';
      str3 := str3 ||' end;';
      do_sql(str0||' '||str1||' '||str2||' '||str3);
    end if;
    prev := r(i).table_name;
    cnt := cnt + 1;
  end loop;
end;
/

 

If you want to see the underlying table scan queries that are being run, simply comment back in the “dbms_output.put_line” in the DO_SQL subroutine.

Enjoy!

Getting started…adding an account to use

If you’ve read my previous post about getting started with the Oracle database, then hopefully you now have your very own database installed and running, and you have a explored a little with the sample schemas using SQL Developer.  Perhaps now you want to venture out into your own database development, and for that, you will want to create your own user account and create your own tables.  Here’s another video which will guide you through the process.

The video shows the process of creating a user via SQL Developer. If you are a command line addict, then here’s the commands you will need run to enact the same. You can use SQL Plus or SQLcl to run these commands. If you’ve never done that, simply

  • open up a terminal or MS-DOC command window,
  • type sqlplus and hit Enter
  • Type SYSTEM for the username and hit Enter
  • Type the password you specified when you created the database as per my previous blog post.

You will see a prompt “SQL>”.  You enter the following commands to create your first user account.


SQL> create user DEVELOPER identified by MyCaseSensitivePassword;

User created.

SQL> grant connect, resource to DEVELOPER;

Grant succeeded.

SQL> alter user DEVELOPER quota unlimited on USERS;

User altered.

And you are now good to go. Fire up SQL Developer as shown in the video, and you are ready to connect as your new user account as shown at this moment in the video.

Remember – we’re doing this to learn and explore Oracle. It is very easy to see all of functionality available to you, and to leap in and start building a fully-fledged application.  But don’t forget – for that, you need more than just a play-thing at home. You need backups; you need security; you need scalability; you need redundancy and failover.  So if you’re ready to get serious, you probably need to think about using a cloud account or similar.

Enjoy your database !

Clone a table

Sometimes doing a CREATE TABLE AS SELECT is all we need to copy the data from an existing table.  But what if we want more than that ?  What if we really want to clone that table to match the original as closely as possible.  We had a question along these lines on AskTOM today.  A standard CTAS copies the NOT NULL attributes and the data types, but not really much else.  We know that Data Pump will take care of it, but that is more complex than a simple CTAS.

So here is a simple routine to wrap the Data Pump calls so that the CTAS can be achieved with just as simple a command.  A database link pointing back to the same database is all we need.

Note:  The true innovation in this blog post came from Laurent’s excellent idea here.  I am just adding a small wrapper to make the process a little more palatable.  So all credit to Laurent here please.


SQL> create table emp as select * from scott.emp;

Table created.

SQL> create sequence seq start with 8000;

Sequence created.

SQL> alter table emp modify empno default seq.nextval;

Table altered.

SQL> alter table emp add primary key ( empno );

Table altered.

SQL> alter table emp add unique ( ename );

Table altered.

SQL> alter table emp compress;

Table altered.

SQL> alter table emp enable row movement;

Table altered.

And here is the routine to clone it, whilst keeping all of those additional bits of metadata.


SQL> create or replace
  2  procedure clone_tab(p_source varchar2,p_target varchar2) is
  3    n number;
  4    g varchar2(30);
  5    j varchar2(30);
  6  begin
  7    select global_name into g from global_name;
  8    begin
  9      execute immediate 'alter session close database link tmp$1';
 10    exception
 11      when others then null;
 12    end;
 13
 14    begin
 15      execute immediate 'drop database link tmp$1';
 16    exception
 17      when others then null;
 18    end;
 19
 20    execute immediate 'create database link tmp$1 using '''||g||'''';
 21
 22    if p_target like '%.%' or p_source like '%.%' then
 23      raise_application_error(-20000,'No schema prefix allowed');
 24    end if;
 25
 26    n := dbms_datapump.open('IMPORT','TABLE','TMP$1');
 27    dbms_datapump.metadata_filter(n,'NAME_LIST',''''||upper(p_source)||'''');
 28    dbms_datapump.metadata_remap(n,'REMAP_TABLE',upper(p_source),upper(p_target));
 29    dbms_datapump.start_job(n);
 30    dbms_datapump.wait_for_job(n, j);
 31  end;
 32  /

Procedure created.

SQL>
SQL> set serverout on
SQL> exec clone_tab('emp','emp2');

PL/SQL procedure successfully completed.

SQL>
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual
  2
SQL> select dbms_metadata.get_ddl('TABLE','EMP2',user) from dual;

DBMS_METADATA.GET_DDL('TABLE','EMP2',USER)
---------------------------------------------------------------------------------------

  CREATE TABLE "MCDONAC"."EMP2"
   (    "EMPNO" NUMBER(4,0) DEFAULT "MCDONAC"."SEQ"."NEXTVAL" NOT NULL ENABLE,
        "ENAME" VARCHAR2(10),
        "JOB" VARCHAR2(9),
        "MGR" NUMBER(4,0),
        "HIREDATE" DATE,
        "SAL" NUMBER(7,2),
        "COMM" NUMBER(7,2),
        "DEPTNO" NUMBER(2,0),
         UNIQUE ("ENAME")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE,
         PRIMARY KEY ("EMPNO")
  USING INDEX PCTFREE 10 INITRANS 2 MAXTRANS 255 COMPUTE STATISTICS
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
 COMPRESS BASIC LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS"  ENABLE ROW MOVEMENT

SQL> select count(*) from emp;

  COUNT(*)
----------
        14

SQL> select count(*) from emp2;

  COUNT(*)
----------
        14

Licensed for Advanced Compression? Don’t forget the network

We often think of Advanced Compression being exclusively about compressing data “at rest”, ie, on some sort of storage device.  And don’t get me wrong, if we consider just that part of Advanced Compression, that still covers a myriad of opportunities that could yield benefits for your databases and database applications:

  • Heat maps
  • Automatic Data Optimization
  • XML, JSON and LOB compression (including de-duplication)
  • Compression on backups
  • Compression on Data Pump files
  • Additional compression options on indexes and tables
  • Compressed Flashback Data Archive storage
  • Storage snapshot compression

However, if you are licensed for the option, there are other things that you can also take advantage of when it comes to compression of data on the network.

In 12c, Advanced Network Compression is part of the Advanced Compression option.  Obviously, the modern data centre typically co-locates application servers and database servers within a high speed high bandwidth network, but once data needs to be accessed over less optimal networks, for example, users running ad-hoc queries to their desktops, then compressing data across the wire can yield benefits.

Trust me, when you live in Perth, Western Australia, any kind of improvement to access data across the network is a good thing! Smile

Here’s an example of the option in action. Here’s my source table – which I’ve deliberately created with lots of potential for compression:


SQL create table comp_Test
  2  as select rownum x, rpad('x',1000,'x') y from dual
  3  connect by level <= 100000;

Table created.

Every row is almost a replica of the others, and column Y is 1000 repeated characters. So it is an artificial but perfect candidate to see the compression in action. To ensure that the network transmission of the data is a impactful factor here, I’ve located this table in a “database server” (my spare laptop) accessed via a slow wireless network from my other machine. We can see the impact of the network by starting with the default settings

Default settings


C:\temp>sqlplus scott/tiger@pdb122

SQL*Plus: Release 12.2.0.1.0 Production on Wed Dec 27 13:48:05 2017

Copyright (c) 1982, 2016, Oracle.  All rights reserved.

Last Successful login time: Wed Dec 27 2017 13:48:26 +08:00

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL> set timing on
SQL> set arraysize 1000
SQL> set feedback only
SQL> select * from scott.comp_test;

100000 rows selected.

Elapsed: 00:00:24.68

So a touch more than 1000 bytes * 100000 rows yields the following session level statistics


SQL> @netstat

NAME                                                              VALUE
------------------------------------------------------------ ----------
bytes sent via SQL*Net to client                              102138497

Now I’ll activate compression by adding the following parameters to sqlnet.ora on client and server.


SQLNET.COMPRESSION=on
SQLNET.COMPRESSION_LEVELS=(low,high)
SQLNET.COMPRESSION_THRESHOLD=1024

and repeat the test in the a fresh connection. To see if we can potentially use compression, we can check V$SESSION_CONNECT_INFO


SQL> select sid, network_service_banner from V$SESSION_CONNECT_INFO 
  2  where sid = sys_context('USERENV','SID')
  3  @pr
==============================
SID                           : 324
NETWORK_SERVICE_BANNER        : Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 12.2.0.1.0 - Production
==============================
SID                           : 324
NETWORK_SERVICE_BANNER        : Oracle Advanced Network Compression Service for 64-bit Windows: Version 12.2.0.1.0 - Production
==============================
SID                           : 324
NETWORK_SERVICE_BANNER        : NTS Authentication service adapter for 64-bit Windows: Version 2.0.0.0.0 - Production
==============================
SID                           : 324
NETWORK_SERVICE_BANNER        : Encryption service for 64-bit Windows: Version 12.2.0.1.0 - Production
==============================
SID                           : 324
NETWORK_SERVICE_BANNER        : Crypto-checksumming service for 64-bit Windows: Version 12.2.0.1.0 - Production

or if you enable a SQL*Net trace, you’ll entries like this in the log files


nlstddp_dump_ptable:   SQLNET.COMPRESSION_LEVELS = HIGH
nlstddp_dump_ptable:   SQLNET.COMPRESSION = ON

Compression enabled


SQL> set timing on
SQL> set arraysize 1000
SQL> set feedback only
SQL> select * from scott.comp_test;

100000 rows selected.

Elapsed: 00:00:01.93

You can see the huge difference in elapsed time. Unfortunately, as far I can ascertain, the session network statistics are derived after decompression, because the bytes transferred still reflect the original sizes


SQL> @netstat

NAME                                                              VALUE
------------------------------------------------------------ ----------
bytes sent via SQL*Net to client                              102114310

We can also take advantage of the larger SDU sizes available in 12c. Now my sqlnet.ora looks like this:


DEFAULT_SDU_SIZE=1048576
SQLNET.COMPRESSION=on
SQLNET.COMPRESSION_LEVELS=(low,high)
SQLNET.COMPRESSION_THRESHOLD=1024

and then the test can be repeated.

Compression on, larger SDU


SQL> set timing on
SQL> set arraysize 1000
SQL> set feedback only
SQL> select * from scott.comp_test;

100000 rows selected.

Elapsed: 00:00:00.98

(* – elapsed times are averaged across a number of executions)

There is a whitepaper on Advanced Compression with more details here, so if you have the license, make sure you take a look to see if there are some benefits for you to maximize the return on your investment.