A common pattern I see in database code is the use of a PL/SQL package to hold constants to be used throughout the rest of the PL/SQL code base.
In most cases, there is nothing wrong with this practice, but once you start incorporating this model into your SQL statements (as opposed to just using them purely in the PL/SQL components). Because global package variables are not referencable in SQL, a common “workaround” is to create PL/SQL functions to expose those values to the SQL engine.
That is where you might hit a few unexpected things with the optimizer, which we can see from the example below
First I’ll create a table and I’ve added a fairly “broad” index, that is, one that covers many columns with the aim of allowing certain queries to satisfy much of their data discovery solely via the index.
SQL> create table t as
2 select * from dba_objects;
Table created.
SQL>
SQL> create index ix on t ( owner, object_name, created, object_type, object_id, status );
Index created.
Now I’ll take a query which while not being able to drill directly into the index on all of its predicates, it can at least the index as a filtering mechanism.
SQL> explain plan for
2 select t.*, rowid
3 from t
4 where owner = :b1
5 and object_name = :b2
6 and (
7 nvl(object_type,:b8) = :b9
8 or
9 ( object_type = :b3 and object_id > to_number(:b4) )
10 or
11 ( status = :b5 and object_type = :b6 )
12 or
13 ( status in (:b10,:b11) and created = nvl(:b12,sysdate) )
14 ) ;
Explained.
SQL>
SQL> select * from dbms_xplan.display();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3947747388
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 131 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 131 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("OWNER"=:B1 AND "OBJECT_NAME"=:B2)
filter("OBJECT_TYPE"=:B6 AND "STATUS"=:B5 OR NVL("OBJECT_TYPE",:B8)=:B9 OR
"OBJECT_TYPE"=:B3 AND "OBJECT_ID">TO_NUMBER(:B4) OR "CREATED"=NVL(:B12,SYSDATE@!)
AND ("STATUS"=:B10 OR "STATUS"=:B11))
17 rows selected.
You can see that we will be able to navigate directly to the relevant OWNER,OBJECT_NAME elements in the index because they are the leading columns. This is the ACCESS part of the execution plan. But also, although the other predicates where are mixture of OR/AND/IN, we could “remain” in the index to satisfy the rest of those predicates. This is the FILTER part of the plan, and both ACCESS and FILTER occurred on line 2 of the plan whilst we are scanning the index.
But now, lets assume that the use of SYSDATE is to be abstraced away with a function call. Maybe we want to be able to override it to a static value or adjust it based on client timezones etc. My new function THE_DATE will just return SYSDATE so logically it is identical to the original query
SQL> create or replace
2 function the_date return date is
3 begin
4 return sysdate;
5 end;
6 /
Function created.
Now lets take a look at resultant execution plan
SQL> explain plan for
2 select t.*, rowid
3 from t
4 where owner = :b1
5 and object_name = :b2
6 and (
7 nvl(object_type,:b8) = :b9
8 or
9 ( object_type = :b3 and object_id > to_number(:b4) )
10 or
11 ( status = :b5 and object_type = :b6 )
12 or
13 ( status in (:b10,:b11) and created = nvl(:b12,the_date) )
14 ) ;
Explained.
SQL> select * from dbms_xplan.display();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3947747388
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 131 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 131 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX | 1 | | 3 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_TYPE"=:B6 AND "STATUS"=:B5 OR NVL("OBJECT_TYPE",:B8)=:B9 OR
"OBJECT_TYPE"=:B3 AND "OBJECT_ID">TO_NUMBER(:B4) OR
"CREATED"=NVL(:B12,"THE_DATE"()) AND ("STATUS"=:B10 OR "STATUS"=:B11))
2 - access("OWNER"=:B1 AND "OBJECT_NAME"=:B2)
17 rows selected.
The FILTER and ACCESS information looks the same, but check where the operations will occur. Now only the OWNER, OBJECT_NAME access can be done on the index (line 2), but then we are scanning the table (line 1) before we run the FILTER operation. Depending on the distribution of the data, this could mean a lot more I/O for this query.
This limitation is most likely because the optimizer does not know what is inside that function. It does not know what it might be doing – there is nothiing to indicate that it is returning a static value.
Thus a quick fix would be get the value from the function, store it in a local variable, and that use that variable in the query. In this case, that is also more accurate because in this particular example, each call to a function that returns SYSDATE could return a slightly different value during the execution of the query.
If that’s not possible, then you can tweak the query to let the optimizer know we’re just getting one fixed value, which returns the plan back to utilizing the index for all of the predicate evaluations.
SQL> explain plan for
2 with qq as ( select /*+ materialize */ the_date xx from dual )
3 select t.*, t.rowid
4 from t, qq
5 where owner = :b1
6 and object_name = :b2
7 and (
8 nvl(object_type,:b8) = :b9
9 or
10 ( object_type = :b3 and object_id > to_number(:b4) )
11 or
12 ( status = :b5 and object_type = :b6 )
13 or
14 ( status in (:b10,:b11) and created = nvl(:b12,xx) )
15 ) ;
Explained.
SQL>
SQL> select * from dbms_xplan.display();
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
Plan hash value: 1909715515
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 140 | 7 (0)| 00:00:01 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6909_56ECC793 | | | | |
| 3 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
| 4 | NESTED LOOPS | | 1 | 140 | 5 (0)| 00:00:01 |
| 5 | NESTED LOOPS | | 1 | 140 | 5 (0)| 00:00:01 |
| 6 | VIEW | | 1 | 9 | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6909_56ECC793 | 1 | 9 | 2 (0)| 00:00:01 |
|* 8 | INDEX RANGE SCAN | IX | 1 | | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID | T | 1 | 131 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
8 - access("OWNER"=:B1 AND "OBJECT_NAME"=:B2)
filter(NVL("OBJECT_TYPE",:B8)=:B9 OR "OBJECT_TYPE"=:B3 AND "OBJECT_ID">TO_NUMBER(:B4) OR
"OBJECT_TYPE"=:B6 AND "STATUS"=:B5 OR ("STATUS"=:B10 OR "STATUS"=:B11) AND "CREATED"=NVL(:B12,"XX"))
23 rows selected.
TL;DR: If you are going to use PL/SQL functions to return static values to use in SQL queries, the most common example of this being exposing PL/SQL package constants to the SQL engine, then consider evaluation those functions once to store the result in a local variable, and then use that variable as a standard bind variable in your SQL. Happy optimizer = happy customers.
What happens if the function is deterministic?
Hello Connor, All,
I asked myself the same question and tested this, and neither the DETERMINISTIC nor the RESULT_CACHE
declarations do change the optimizer behavior, though, they would probably slightly improve performance.
Declaring a function without arguments as DETERMINISTIC should be sufficient for the optimizer
to consider it the same as a constant.
Or, maybe even a separate new declaration of a function as CONSTANT could be implemented
so that to be recognized as such by the optimizer.
In fact, I don’t exactly understand the reason for the restriction that prevents SQL to access package variables.
Cheers & Best Regards,
Iudith Mentzel