NULL’s vs NOT NULL’s and Performance

Posted by

When it comes to giving the cost based optimiser the best possible chance to make the “right” decisions, many DBA’s are diligent in keeping statistics up to date, using histograms where appropriate, creating more indexes (or removing surplus indexes).

However one often neglected area is that the the null-ness of columns also impacts the optimiser decisions. NULL and NOT NULL do more than just act as constraints, they also add (or detract) to the value of indexes on those columns. Here’s an example of how the null-ness of a column impacts optimizer decisions. I have a table T which is a copy of DBA_OBJECTS, indexed on OBJECT_ID.


SQL> create table t as
  2  select * from dba_objects
  3  where object_id is not null;

Table created.

SQL>
SQL> create index IX on T ( object_id );

Index created.

Now I want to perform a standard pagination style query, namely, get the first 5 rows in order of OBJECT_ID


SQL>
SQL> set autotrace traceonly
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     5 |  2405 |       |  2755   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 78750 |    36M|       |  2755   (1)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      | 78750 |     9M|    14M|  2755   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | T    | 78750 |     9M|       |   428   (1)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         52  recursive calls
        130  db block gets
       1591  consistent gets
          0  physical reads
      25420  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

I scanned the entire table and burned around 1500 consistent gets. Now I’ll let the database know what I already know, that is, that the OBJECT_ID column is contains no nulls. Then I’ll repeat the same query.


SQL>
SQL> alter table T modify object_id not null;

Table altered.

SQL>
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3114946973

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     5 |  2405 |     3   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |      |       |       |            |          |
|   2 |   VIEW                        |      |     5 |  2405 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    | 78750 |     9M|     3   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX   |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         85  recursive calls
        132  db block gets
         72  consistent gets
          1  physical reads
      27192  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

Woo hoo! 20 times more efficient. The extra information we provided to the database allowed for more execution paths to be considered by the optimizer.

That’s all well and good. By how do we know which columns might be missing an appropriate NOT NULL constraint ?

Well, the following routine might help Smile

The PL/SQL procedure below deduces a ‘number of rows that are null’ count for all columns that are indexed for all tables in the schema (passed as parameter P_SCHEMA), although you can pass a table name to restrict the procedure to just that table. For any column that contains no nulls, the appropriate ‘alter table … ( column_name not null)’ command will be outputted. (Make sure you set serveroutput on).

Notes:

  • The client for whom I wrote this script, had no NOT NULL constraints on any table so the procedure only takes a rough stab at ALL_CONSTRAINTS for existing constraints.
  • It tries to keep things sensible – avoiding complex data types, tables that are auto-generated or used for queues etc, but there’s no guarantee it won’t either miss a table, or pick up it should not.
  • This is a brute force approach – it works by scanning every table in the schema, so use your discretion as to when would be a suitable time for running this routine.  But it will only scan each table once to determine the null count for all candidate columns.
  • (As with any diagnosis script), you should not apply it’s recommendations without some careful thought first.

12.2 version


create or replace 
procedure check_indexed_columns_for_null(
                 p_schema varchar2, 
                 p_table_name varchar2 default null) is
  cursor x is
   select 
     table_name,
     column_name,
     count(*) over ( partition by table_name ) as colcount
     from
     (
       select 
         table_name,
         column_name,
         min(existing_constraint)
       from 
       (
       select  
           a.table_name, 
           a.column_name, 
           ( select  count(*)
             from    all_constraints x,
                     all_cons_columns cc
             where   x.owner = c.owner
             and     x.table_name = c.table_name
             and     cc.owner      = x.owner
             and     cc.constraint_name = x.constraint_name
             and     
            (
               ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                   or 
               ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
             )
            ) existing_constraint
       from 
         all_ind_columns a,
         all_tables c,
         all_tab_columns ccol
       where a.index_owner = p_schema
       and a.index_owner = p_schema
       and a.table_name = nvl(upper(p_table_name),a.table_name)
       and c.table_name = a.table_name
       and c.owner      = a.table_owner
       and c.owner      = ccol.owner
       and c.table_name = ccol.table_name
       and a.column_name = ccol.column_name
       and c.secondary = 'N'
       and c.temporary = 'N'
       and c.nested    = 'NO'
       and c.external  = 'NO'
       and ccol.data_type_owner is null
       and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
       and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = p_schema )
       )
       group by 
         table_name,
         column_name
       having min(existing_constraint) = 0
     );

  str0 varchar2(32767); 
  str1 varchar2(32767); 
  str2 varchar2(32767); 
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  -- dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin
for i in x loop
  if prev != i.table_name then
    str0 := 'declare ';
    str1 := 'begin select '; str2 := ' into ';
    str3 := ' '; cnt := 1;
  end if;
  if cnt = i.colcount then 
    trailer := ' ';
  else
    trailer := ','||chr(10);
  end if;
  str0 := str0 || 'v'||ltrim(cnt)||' number;';
  str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
  str2 := str2 || 'v'||ltrim(cnt)||trailer;
  str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
    'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
    ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
  if cnt = i.colcount then
    str2 := str2 ||' from '||p_schema||'.'||i.table_name||';';
    str3 := str3 ||' end;';
    do_sql(str0||' '||str1||' '||str2||' '||str3);
  end if;
  prev := i.table_name;
  cnt := cnt + 1;
end loop;
end;
/
sho err

set serverout on
exec check_indexed_columns_for_null('SCOTT')

11.2 version

(Needs a direct CREATE TABLE grant to the owner of the procedure)


create or replace
procedure check_indexed_columns_for_null(
                 p_schema varchar2,
                 p_table_name varchar2 default null) is
  cursor c_template is
    select table_name, column_name, 0 colcount
    from   all_tab_columns;

  type rowlist is table of c_template%rowtype;  
  r    rowlist;
  rc   sys_refcursor;

  str0 varchar2(32767);
  str1 varchar2(32767);
  str2 varchar2(32767);
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
  tcursor integer;
  dummy integer;
begin
  dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin

  begin
    execute immediate 'drop table tmp$cons purge';
  exception when others then null;
  end;

  execute immediate 'create table tmp$cons as  select owner, table_name, constraint_name, constraint_type, to_lob(search_condition) search_condition_vc '||
                    'from all_constraints';

  open rc for
    q'{select
      table_name,
      column_name,
      count(*) over ( partition by table_name ) as colcount
      from
      (
        select
          table_name,
          column_name,
          min(existing_constraint)
        from
        (
        select
            a.table_name,
            a.column_name,
            ( select  count(*)
              from    ( select owner, table_name, constraint_name, constraint_type, cast(search_condition_vc as varchar2(4000)) search_condition_vc 
                        from tmp$cons ) x,
                      all_cons_columns cc
              where   x.owner = c.owner
              and     x.table_name = c.table_name
              and     cc.owner      = x.owner
              and     cc.constraint_name = x.constraint_name
              and
             (
                ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                    or
                ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
              )
             ) existing_constraint
        from
          all_ind_columns a,
          all_tables c,
          all_tab_columns ccol
        where a.index_owner = :p_schema
        and a.index_owner = :p_schema
        and a.table_name = nvl(upper(:p_table_name),a.table_name)
        and c.table_name = a.table_name
        and c.owner      = a.table_owner
        and c.owner      = ccol.owner
        and c.table_name = ccol.table_name
        and a.column_name = ccol.column_name
        and c.secondary = 'N'
        and c.temporary = 'N'
        and c.nested    = 'NO'
        and (c.owner,c.table_name) not in ( select owner, table_name from all_external_tables where owner = :p_schema )
        and ccol.data_type_owner is null
        and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
        and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = :p_schema )
        )
        group by
          table_name,
          column_name
        having min(existing_constraint) = 0
      )
      }' using p_schema,p_schema,p_table_name,p_schema,p_schema;

  fetch rc bulk collect into r;
  close rc;
  
  for i in 1 .. r.count loop
    if prev != r(i).table_name then
      str0 := 'declare ';
      str1 := 'begin select '; str2 := ' into ';
      str3 := ' '; cnt := 1;
    end if;
    if cnt = r(i).colcount then
      trailer := ' ';
    else
      trailer := ','||chr(10);
    end if;
    str0 := str0 || 'v'||ltrim(cnt)||' number;';
    str1 := str1 || 'sum(decode('||r(i).column_name||',null,1,0))'||trailer;
    str2 := str2 || 'v'||ltrim(cnt)||trailer;
    str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
      'dbms_output.put_line(''alter table '||p_schema||'.'||r(i).table_name||
      ' modify ('||r(i).column_name||' not null);''); end if;'||chr(10);
    if cnt = r(i).colcount then
      str2 := str2 ||' from '||p_schema||'.'||r(i).table_name||';';
      str3 := str3 ||' end;';
      do_sql(str0||' '||str1||' '||str2||' '||str3);
    end if;
    prev := r(i).table_name;
    cnt := cnt + 1;
  end loop;
end;
/

 

If you want to see the underlying table scan queries that are being run, simply comment back in the “dbms_output.put_line” in the DO_SQL subroutine.

Enjoy!

4 comments

  1. Hey Connor,

    I simply copy-pasted your procedure code and executed it in 11.2.0.4 and it wasn’t able to compile, throwing several errors:-

    Error(49,12): PL/SQL: ORA-00904: “C”.”EXTERNAL”: invalid identifier
    Error(79,14): PLS-00364: loop index variable ‘i’ use is invalid

    Also, throws an error for “search_condition_vc” variable being undeclared.

  2. Hello Connor,

    Your innermost query is checking for the presence of a CHECK constraint with a “col IS NOT NULL” condition.

    As far as I know, there exists a difference between how the optimizer uses such a constraint
    vs NOT NULL-ness of a column.

    For example, a query containing a “col IS NULL” predicate will benefit from the presence of a NOT NULL
    defined for the column, but NOT from the presence of a check constraint with “col IS NOT NULL”.

    So, I wonder whether it is not advisable to alter a column to NOT NULL even it already does have such a
    check constraint, or even replace that check constraint by making the column NOT NULL instead.

    Second, regarding the example of the pagination query:

    I think that if the table does have correct statistics in place, the optimizer still does have enough information
    about the number of NULL vs NOT NULL values in the “object_id” column.
    So, considering the “high” number of NON-NULL values and the “low” number of NULL-s (in fact, effectively 0) and also that the possible (few) NULL-s always sort last when using an ascending ORDER BY,
    then for a pagination style query, at least in the case of a hardcoded limit like “ROWNUM <= 5",
    the optimizer could still come up with a better execution plan, in the worst case by performing a "concatenation" of the two operations, the INDEX FULL SCAN and the TABLE ACCESS FULL, when the second operation would be effectively executed only if the first one did not return "enough rows" (here 5).

    That is, a kind of "adapting execution plan" … maybe not yet implemented.

    Cheers & Best Regards,
    Iudith Mentzel

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.