When it comes to giving the cost based optimiser the best possible chance to make the “right” decisions, many DBA’s are diligent in keeping statistics up to date, using histograms where appropriate, creating more indexes (or removing surplus indexes).
However one often neglected area is that the the null-ness of columns also impacts the optimiser decisions. NULL and NOT NULL do more than just act as constraints, they also add (or detract) to the value of indexes on those columns. Here’s an example of how the null-ness of a column impacts optimizer decisions. I have a table T which is a copy of DBA_OBJECTS, indexed on OBJECT_ID.
SQL> create table t as
2 select * from dba_objects
3 where object_id is not null;
Table created.
SQL>
SQL> create index IX on T ( object_id );
Index created.
Now I want to perform a standard pagination style query, namely, get the first 5 rows in order of OBJECT_ID
SQL>
SQL> set autotrace traceonly
SQL> select *
2 from
3 ( select * from t
4 order by object_id
5 )
6 where rownum <= 5;
5 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 2405 | | 2755 (1)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | | |
| 2 | VIEW | | 78750 | 36M| | 2755 (1)| 00:00:01 |
|* 3 | SORT ORDER BY STOPKEY| | 78750 | 9M| 14M| 2755 (1)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T | 78750 | 9M| | 428 (1)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
3 - filter(ROWNUM<=5)
Statistics
----------------------------------------------------------
52 recursive calls
130 db block gets
1591 consistent gets
0 physical reads
25420 redo size
2735 bytes sent via SQL*Net to client
607 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
5 rows processed
I scanned the entire table and burned around 1500 consistent gets. Now I’ll let the database know what I already know, that is, that the OBJECT_ID column is contains no nulls. Then I’ll repeat the same query.
SQL>
SQL> alter table T modify object_id not null;
Table altered.
SQL>
SQL> select *
2 from
3 ( select * from t
4 order by object_id
5 )
6 where rownum <= 5;
5 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3114946973
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 | 2405 | 3 (0)| 00:00:01 |
|* 1 | COUNT STOPKEY | | | | | |
| 2 | VIEW | | 5 | 2405 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| T | 78750 | 9M| 3 (0)| 00:00:01 |
| 4 | INDEX FULL SCAN | IX | 5 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(ROWNUM<=5)
Statistics
----------------------------------------------------------
85 recursive calls
132 db block gets
72 consistent gets
1 physical reads
27192 redo size
2735 bytes sent via SQL*Net to client
607 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
5 rows processed
Woo hoo! 20 times more efficient. The extra information we provided to the database allowed for more execution paths to be considered by the optimizer.
That’s all well and good. By how do we know which columns might be missing an appropriate NOT NULL constraint ?
Well, the following routine might help
The PL/SQL procedure below deduces a ‘number of rows that are null’ count for all columns that are indexed for all tables in the schema (passed as parameter P_SCHEMA), although you can pass a table name to restrict the procedure to just that table. For any column that contains no nulls, the appropriate ‘alter table … ( column_name not null)’ command will be outputted. (Make sure you set serveroutput on).
Notes:
- The client for whom I wrote this script, had no NOT NULL constraints on any table so the procedure only takes a rough stab at ALL_CONSTRAINTS for existing constraints.
- It tries to keep things sensible – avoiding complex data types, tables that are auto-generated or used for queues etc, but there’s no guarantee it won’t either miss a table, or pick up it should not.
- This is a brute force approach – it works by scanning every table in the schema, so use your discretion as to when would be a suitable time for running this routine. But it will only scan each table once to determine the null count for all candidate columns.
- (As with any diagnosis script), you should not apply it’s recommendations without some careful thought first.
12.2 version
create or replace
procedure check_indexed_columns_for_null(
p_schema varchar2,
p_table_name varchar2 default null) is
cursor x is
select
table_name,
column_name,
count(*) over ( partition by table_name ) as colcount
from
(
select
table_name,
column_name,
min(existing_constraint)
from
(
select
a.table_name,
a.column_name,
( select count(*)
from all_constraints x,
all_cons_columns cc
where x.owner = c.owner
and x.table_name = c.table_name
and cc.owner = x.owner
and cc.constraint_name = x.constraint_name
and
(
( x.constraint_type = 'C' and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
or
( x.constraint_type = 'P' and cc.column_name = a.column_name )
)
) existing_constraint
from
all_ind_columns a,
all_tables c,
all_tab_columns ccol
where a.index_owner = p_schema
and a.index_owner = p_schema
and a.table_name = nvl(upper(p_table_name),a.table_name)
and c.table_name = a.table_name
and c.owner = a.table_owner
and c.owner = ccol.owner
and c.table_name = ccol.table_name
and a.column_name = ccol.column_name
and c.secondary = 'N'
and c.temporary = 'N'
and c.nested = 'NO'
and c.external = 'NO'
and ccol.data_type_owner is null
and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = p_schema )
)
group by
table_name,
column_name
having min(existing_constraint) = 0
);
str0 varchar2(32767);
str1 varchar2(32767);
str2 varchar2(32767);
str3 varchar2(32767);
prev varchar2(100) := '*';
cnt number;
trailer varchar2(5);
procedure do_sql(thesql varchar2) is
tcursor integer;
dummy integer;
begin
-- dbms_output.put_line(thesql);
execute immediate thesql;
end;
begin
for i in x loop
if prev != i.table_name then
str0 := 'declare ';
str1 := 'begin select '; str2 := ' into ';
str3 := ' '; cnt := 1;
end if;
if cnt = i.colcount then
trailer := ' ';
else
trailer := ','||chr(10);
end if;
str0 := str0 || 'v'||ltrim(cnt)||' number;';
str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
str2 := str2 || 'v'||ltrim(cnt)||trailer;
str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
' modify ('||i.column_name||' not null);''); end if;'||chr(10);
if cnt = i.colcount then
str2 := str2 ||' from '||p_schema||'.'||i.table_name||';';
str3 := str3 ||' end;';
do_sql(str0||' '||str1||' '||str2||' '||str3);
end if;
prev := i.table_name;
cnt := cnt + 1;
end loop;
end;
/
sho err
set serverout on
exec check_indexed_columns_for_null('SCOTT')
11.2 version
(Needs a direct CREATE TABLE grant to the owner of the procedure)
create or replace
procedure check_indexed_columns_for_null(
p_schema varchar2,
p_table_name varchar2 default null) is
cursor c_template is
select table_name, column_name, 0 colcount
from all_tab_columns;
type rowlist is table of c_template%rowtype;
r rowlist;
rc sys_refcursor;
str0 varchar2(32767);
str1 varchar2(32767);
str2 varchar2(32767);
str3 varchar2(32767);
prev varchar2(100) := '*';
cnt number;
trailer varchar2(5);
procedure do_sql(thesql varchar2) is
tcursor integer;
dummy integer;
begin
dbms_output.put_line(thesql);
execute immediate thesql;
end;
begin
begin
execute immediate 'drop table tmp$cons purge';
exception when others then null;
end;
execute immediate 'create table tmp$cons as select owner, table_name, constraint_name, constraint_type, to_lob(search_condition) search_condition_vc '||
'from all_constraints';
open rc for
q'{select
table_name,
column_name,
count(*) over ( partition by table_name ) as colcount
from
(
select
table_name,
column_name,
min(existing_constraint)
from
(
select
a.table_name,
a.column_name,
( select count(*)
from ( select owner, table_name, constraint_name, constraint_type, cast(search_condition_vc as varchar2(4000)) search_condition_vc
from tmp$cons ) x,
all_cons_columns cc
where x.owner = c.owner
and x.table_name = c.table_name
and cc.owner = x.owner
and cc.constraint_name = x.constraint_name
and
(
( x.constraint_type = 'C' and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
or
( x.constraint_type = 'P' and cc.column_name = a.column_name )
)
) existing_constraint
from
all_ind_columns a,
all_tables c,
all_tab_columns ccol
where a.index_owner = :p_schema
and a.index_owner = :p_schema
and a.table_name = nvl(upper(:p_table_name),a.table_name)
and c.table_name = a.table_name
and c.owner = a.table_owner
and c.owner = ccol.owner
and c.table_name = ccol.table_name
and a.column_name = ccol.column_name
and c.secondary = 'N'
and c.temporary = 'N'
and c.nested = 'NO'
and (c.owner,c.table_name) not in ( select owner, table_name from all_external_tables where owner = :p_schema )
and ccol.data_type_owner is null
and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = :p_schema )
)
group by
table_name,
column_name
having min(existing_constraint) = 0
)
}' using p_schema,p_schema,p_table_name,p_schema,p_schema;
fetch rc bulk collect into r;
close rc;
for i in 1 .. r.count loop
if prev != r(i).table_name then
str0 := 'declare ';
str1 := 'begin select '; str2 := ' into ';
str3 := ' '; cnt := 1;
end if;
if cnt = r(i).colcount then
trailer := ' ';
else
trailer := ','||chr(10);
end if;
str0 := str0 || 'v'||ltrim(cnt)||' number;';
str1 := str1 || 'sum(decode('||r(i).column_name||',null,1,0))'||trailer;
str2 := str2 || 'v'||ltrim(cnt)||trailer;
str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
'dbms_output.put_line(''alter table '||p_schema||'.'||r(i).table_name||
' modify ('||r(i).column_name||' not null);''); end if;'||chr(10);
if cnt = r(i).colcount then
str2 := str2 ||' from '||p_schema||'.'||r(i).table_name||';';
str3 := str3 ||' end;';
do_sql(str0||' '||str1||' '||str2||' '||str3);
end if;
prev := r(i).table_name;
cnt := cnt + 1;
end loop;
end;
/
If you want to see the underlying table scan queries that are being run, simply comment back in the “dbms_output.put_line” in the DO_SQL subroutine.
Enjoy!
Hey Connor,
I simply copy-pasted your procedure code and executed it in 11.2.0.4 and it wasn’t able to compile, throwing several errors:-
Error(49,12): PL/SQL: ORA-00904: “C”.”EXTERNAL”: invalid identifier
Error(79,14): PLS-00364: loop index variable ‘i’ use is invalid
Also, throws an error for “search_condition_vc” variable being undeclared.
I built it on 12.2. It’s time for you to upgrade ! 🙂
I’ll port it to 11g and post an addenda.
Hello Connor,
Your innermost query is checking for the presence of a CHECK constraint with a “col IS NOT NULL” condition.
As far as I know, there exists a difference between how the optimizer uses such a constraint
vs NOT NULL-ness of a column.
For example, a query containing a “col IS NULL” predicate will benefit from the presence of a NOT NULL
defined for the column, but NOT from the presence of a check constraint with “col IS NOT NULL”.
So, I wonder whether it is not advisable to alter a column to NOT NULL even it already does have such a
check constraint, or even replace that check constraint by making the column NOT NULL instead.
Second, regarding the example of the pagination query:
I think that if the table does have correct statistics in place, the optimizer still does have enough information
about the number of NULL vs NOT NULL values in the “object_id” column.
So, considering the “high” number of NON-NULL values and the “low” number of NULL-s (in fact, effectively 0) and also that the possible (few) NULL-s always sort last when using an ascending ORDER BY,
then for a pagination style query, at least in the case of a hardcoded limit like “ROWNUM <= 5",
the optimizer could still come up with a better execution plan, in the worst case by performing a "concatenation" of the two operations, the INDEX FULL SCAN and the TABLE ACCESS FULL, when the second operation would be effectively executed only if the first one did not return "enough rows" (here 5).
That is, a kind of "adapting execution plan" … maybe not yet implemented.
Cheers & Best Regards,
Iudith Mentzel