In a perfect world, the optimizer would reach out from the server room, say to us: “Hey, lets grab a coffee and have a chat about that query of yours”. Because ultimately, that is the task we are bestowing on the optimizer – to know what our intent was in terms of running a query in a way that meets the performance needs of our applications. It generally does a pretty good job even without the coffee , but if we can keep that caffeine hit in mind, we can do our bit as SQL developers to give the optimizer as much assistance as we can.
Here’s such an example. Let’s assume users of your application can perform searches for street addresses. They nominate what kind of search they want to do (street name or suburb), and then provide a value to search on.
Here’s some base tables to support the application.
SQL> create table address ( street int, suburb int, data char(100));
Table created.
SQL> insert into address
2 select mod(rownum,1000), mod(rownum,10), rownum from dual connect by level
SQL> commit;
Commit complete.
SQL>
SQL> exec dbms_stats.gather_table_stats('','ADDRESS');
PL/SQL procedure successfully completed.
SQL> create index ix1 on address ( street );
Index created.
SQL> create index ix2 on address ( suburb );
Index created.
To run that search from the application, probably the most straightforward SQL that handles the requirement is:
SQL> select data
2 from address
3 where ( :choice = 1 and street = :val )
4 or ( :choice = 2 and suburb = :val );
on the assumption here that the application sends “1” for street search type, and “2” for a suburb search.
Let’s give that a run for a nominated street and see what the optimizer makes of this SQL.
SQL> variable choice number = 1
SQL> variable val number = 6
SQL> set feedback only
SQL> select data
2 from address
3 where ( :choice = 1 and street = :val )
4 or ( :choice = 2 and suburb = :val );
100 rows selected.
SQL> set feedback on
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID cwwcr79bfx5zz, child number 0
-------------------------------------
select data from address where ( :choice = 1 and street = :val ) or
( :choice = 2 and suburb = :val )
Plan hash value: 3645838471
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 445 (100)| |
|* 1 | TABLE ACCESS FULL| ADDRESS | 101 | 10908 | 445 (1)| 00:00:01 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(((:CHOICE=2 AND "SUBURB"=:VAL) OR ("STREET"=:VAL AND
:CHOICE=1)))
Notice that even though we have indexes on both STREET and SUBURB it opted for a full table scan. The driving metrics behind this decision is that the selectivity on the SUBURB index is very poor (only 10 distinct values). Of course, this means that users who perform searches on STREET are also going to be punished with this table scan.
In an ideal world, the optimizer would be clever enough to dive into the SQL, notice that the same bind variable is being used in each predicate, and that the values for that bind variable are mutually exclusive, and thus know that only one of the OR conditions can ever be true and optimize the query with that knowledge. But as I mentioned, we never get that chance to “grab a coffee” with the optimizer.
However, as developers, we can assist by restructuring our query to give that little bit more information to the optimizer. I’ve rewritten the query as as UNION ALL query to separate both predicates into their own SQL.
SQL> set feedback only
SQL> select data
2 from address
3 where ( :choice = 1 and street = :val )
4 union all
5 select data
6 from address
7 where ( :choice = 2 and suburb = :val );
100 rows selected.
SQL> set feedback on
SQL>
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 3x1nc068ntchg, child number 0
-------------------------------------
select data from address where ( :choice = 1 and street = :val )
union all select data from address where ( :choice = 2 and suburb
= :val )
Plan hash value: 1995695946
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 545 (100)| |
| 1 | UNION-ALL | | | | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS | 100 | 10500 | 101 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IX1 | 100 | | 1 (0)| 00:00:01 |
|* 5 | FILTER | | | | | |
|* 6 | TABLE ACCESS FULL | ADDRESS | 10000 | 1015K| 444 (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:CHOICE=1)
4 - access("STREET"=:VAL)
5 - filter(:CHOICE=2)
6 - filter("SUBURB"=:VAL)
After running that SQL, the plan might look worse, but the key elements here are line 2 and 5. Notice that these are FILTER steps which means that if they evaluate to false, then the underlying steps will not be run. In effect, the optimizer now knows that it will only need to run one “half” of the UNION ALL based on the outcome of the two FILTER steps.
That all might sound good in theory but we need to prove this hypothesis. I’ll run the first query and check the statistics:
SQL> set autotrace traceonly stat
SQL> select data
2 from address
3 where ( :choice = 1 and street = :val )
4 or ( :choice = 2 and suburb = :val );
100 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1578 consistent gets
0 physical reads
0 redo size
11184 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed
and now compare that to the second query.
SQL> select data
2 from address
3 where ( :choice = 1 and street = :val )
4 union all
5 select data
6 from address
7 where ( :choice = 2 and suburb = :val );
100 rows selected.
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
103 consistent gets
0 physical reads
0 redo size
11184 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed
Nice stuff. Note that this does not stop the full table scan when searching for SUBURB, but what has been achieved that we are getting much better search performance for STREET lookups.
Using the GATHER_PLAN_STATISTICS hint, it is also possible to see the benefit of the FILTER steps by looking at the Starts/A-Rows columns.
SQL> select /*+ gather_plan_statistics */ data
2 from address
3 where ( :choice = 1 and street = :val )
4 union all
5 select data
6 from address
7 where ( :choice = 2 and suburb = :val );
100 rows selected.
SQL> set lines 200
SQL> set feedback on
SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID ch0h5t76r7d2m, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ data from address where (
:choice = 1 and street = :val ) union all select data from address
where ( :choice = 2 and suburb = :val )
Plan hash value: 1995695946
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 100 |00:00:00.01 | 103 |
| 1 | UNION-ALL | | 1 | | 100 |00:00:00.01 | 103 |
|* 2 | FILTER | | 1 | | 100 |00:00:00.01 | 103 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ADDRESS | 1 | 100 | 100 |00:00:00.01 | 103 |
|* 4 | INDEX RANGE SCAN | IX1 | 1 | 100 | 100 |00:00:00.01 | 3 |
|* 5 | FILTER | | 1 | | 0 |00:00:00.01 | 0 |
|* 6 | TABLE ACCESS FULL | ADDRESS | 0 | 10000 | 0 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:CHOICE=1)
4 - access("STREET"=:VAL)
5 - filter(:CHOICE=2)
6 - filter("SUBURB"=:VAL)
Notice that we never commenced (started) the full scan for the SUBURB lookup, because the overarching FILTER step returned nothing.
So when you discover that you know something about a query that the optimizer does not, look for ways pass that information onto the optimizer so that it (and your users) benefit from the knowledge you have as a developer of your applications.
One comment