Perhaps the most famous (or infamous) performance tuning problem you’ll find spanning decades of blog posts when it comes to Oracle and SQL is the “mystery” of why Oracle is not using an index when it is totally obvious to us as the developer that it should be. The demo code is always along the following lines
SQL> select *
2 from t
3 where rownum <= 10;
PK DATA
---------- ---------------------------------------------------------------
1 I_FILE#_BLOCK#
2 I_OBJ3
3 I_TS1
4 I_CON1
5 IND$
6 CDEF$
7 C_TS#
8 I_CCOL2
9 I_PROXY_DATA$
10 I_CDEF4
10 rows selected.
SQL>
SQL> select index_name
2 from user_indexes
3 where table_name = 'T';
INDEX_NAME
------------------------------
IX
SQL>
SQL> select column_name
2 from user_ind_columns
3 where index_name = 'IX';
COLUMN_NAME
------------------------------
PK
SQL>
SQL> set autotrace traceonly explain
SQL> select *
2 from t
3 where pk = 12;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 152 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 38 | 152 (2)| 00:00:01 |
--------------------------------------------------------------------------
SQL> select /*+ index(t ix) */ *
2 from t
3 where pk = 12;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 152 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 38 | 152 (2)| 00:00:01 |
--------------------------------------------------------------------------
It just looks like Oracle is broken doesn’t it? There is my primary key column, it is indexed (and rest assured I have not pulled any stunts like making the index unusable or invisible) and yet the database steadfastly refuses to use it, even when I force it with a hint.
Savvy readers will spot something in the output the gives the true cause. Notice that the output for the SELECT on the table shows that the PK column data is left justified. A DESCRIBE command gives the game away.
SQL> desc t
Name Null? Type
----------------------------- -------- --------------
PK VARCHAR2(40)
DATA VARCHAR2(128)
The PK column is a VARCHAR2 column even though it is holding numeric data. If I include the entire output from the execution plan, you can immediately see why an index was not used, because the database silently added a TO_NUMBER function around the PK column in order to align the datatypes.
SQL> select /*+ index(t ix) */ *
2 from t
3 where pk = 12;
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 38 | 152 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 1 | 38 | 152 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(TO_NUMBER("PK")=12)
As I mentioned, this is a well known issue and many of us can now spot it immediately when it occurs.
But a customer approached me recently with a performance tuning issue that did not appear to be related to this (but as we’ll soon see, my assumption was incorrect). This time I’ll make it obvious that the PK is indeed a string by prefixing every value with “str”, and then I’ll re-run the same demo, this time with a bind variable
SQL> create table t
2 as select 'str'||to_char(rownum) pk, object_name data
3 from dba_objects;
Table created.
SQL>
SQL> create index ix on t ( pk );
Index created.
SQL>
SQL> select *
2 from t
3 where rownum <= 10;
PK DATA
---------- ----------------------------------------------------------------------------------
str1 I_FILE#_BLOCK#
str2 I_OBJ3
str3 I_TS1
str4 I_CON1
str5 IND$
str6 CDEF$
str7 C_TS#
str8 I_CCOL2
str9 I_PROXY_DATA$
str10 I_CDEF4
10 rows selected.
SQL>
SQL> select index_name
2 from user_indexes
3 where table_name = 'T';
INDEX_NAME
------------------------------
IX
SQL>
SQL> select column_name
2 from user_ind_columns
3 where index_name = 'IX';
COLUMN_NAME
------------------------------
PK
SQL>
SQL> set autotrace traceonly explain
SQL> exec :input := 'str123'
PL/SQL procedure successfully completed.
SQL> select *
2 from t
3 where pk = :input;
Execution Plan
----------------------------------------------------------
Plan hash value: 3947747388
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 41 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 41 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IX | 1 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
So far everything looks fine – the index is being used as expected.
But is it? What happens when I execute the query?
SQL> select *
2 from t
3 where pk = :input;
PK DATA
---------- ----------------------------------------------------------------
str123 OPQTYPE$
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
------------------------
SQL_ID 9nmz653d6u2wm, child number 0
-------------------------------------
select * from t where pk = :input
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 163 (100)| |
|* 1 | TABLE ACCESS FULL| T | 880 | 36080 | 163 (2)| 00:00:01 |
--------------------------------------------------------------------------
I have gone back to scanning the entire table, and yet it is obvious here that everything here is strings and no number datatypes to confuse the database.
Before I reveal the answer, because as the blog post title suggests, this is also a datatype conversion issue, I’ll digress slightly as a “cautionary tale” about tuning SQL. When the customer first approached me, they were requesting “assistance with stored outlines and Exadata smart scan”. They were encountered the full table scan as shown above and were trying apply a stored outline in order to force an index. Because that was not working, they were then asking me for information about how the optimized costed full scans under Exadata and whether that was impacting the decision and so forth.
So what did I do? I took a narrow view and spent a good couple of hours or so exploring potential issues with stored outlines and smart scan, and of course, there was absolutely nothing wrong with stored outlines and smart scan!
Let me reveal more of the demo code to show the true cause.
SQL> variable input nvarchar2(10)
SQL> exec :input := 'str123'
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
SQL_ID 9nmz653d6u2wm, child number 0
-------------------------------------
select * from t where pk = :input
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 163 (100)| |
|* 1 | TABLE ACCESS FULL| T | 880 | 36080 | 163 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(SYS_OP_C2C("PK")=:INPUT)
The bind variable I was using was not a VARCHAR2 but NVARCHAR2. That subtle difference means we have a datatype mismatch, just like the common NUMBER versus VARCHAR2 mismatch demo. Thus the database has to perform an conversion to bring them into alignment. In this case, the internal function SYS_OP_C2C is used to change the VARCHAR2 values in the PK column to an NVARCHAR2.
For this customer, they had not explicitly been using NVARCHAR2 so this was a revelation to them. It turned out that some of their driver software automatically bound variables as NVARCHAR2 when the database characterset was UTF8.
So there are two lessons to take away here:
- Even if all of the inputs to your SQL appear to be strings, that is still not a guarantee that you won’t get caught by the datatype conversion trap. Always check the predicates section of your SQL plan to ensure no silent conversions are taking place.
- When someone requests your help, don’t fall into the trap of focussing your attention on what they say is the problem. Always take that time to step back, and take a look at the root issue and work from there. Sometimes you might end up walking down some same paths that have already been taken, but at least your vision will not be clouded by assumptions (and/or errors) made by others.
For this customer, we temporarily created an index on the expression SYS_OP_C2C(their_column) which solved the performance issue, until such stage as they could roll out a new version of their code which correctly bound their variables to the right datatypes.
Thanks Connor, good to remember that !
However, I have a problem with lesson 1. you mention.
You are talking about “strings”, however, you don’t define it. What is a “string” ?
I don’t know what a “string” is in the context of Oracle-datatypes, a “string” simply does not exist there.
In Oracle, there exist datatypes like “CHAR”, “VARCHAR”, “VARCHAR2”, “NVARCHAR2” (and maybe others), which users and developers tend to think of “string”s.
But in reality, they are not strings, they are just CHAR, VARCHAR, VARCHAR2, …
This inaccuracy in our head in reality might be the root-cause for the issue you mention in my opinion.
java (and other programming languages) for example has a clear concept of “string”. Interesting things may and do also happen when fetching an Oracle-CHAR/VARCHAR/VARCHAR2/NVARCHAR2 into a java-string (or the other direction) – depending on the jdbc-driver you use and your local client settings (NLS or other).
I would like to not use the wording “string” at all when talking about Oracle-SQL-datatypes.
Very interesting, thanks for the info.
I was wondering, what is the best solution?
1. creating an index on the expression SYS_OP_C2C(their_column) or
2. creating a function-based index like this: create index my_indx on jbatista.t(to_nchar(PK));
I tried both ways and they worked…
SQL> create index my_indx on jbatista.t(to_nchar(PK));
Index created.
SQL> variable input nvarchar2(10)
SQL> exec :input := ‘str123’
PL/SQL procedure successfully completed.
SQL> select * from jbatista.t where pk = :input;
PK DATA
——- —————
str123 I_OPQTYPE1
SQL> SELECT * FROM TABLE(DBMS_XPLAN.display_cursor());
PLAN_TABLE_OUTPUT
——————————————————-
SQL_ID 98pv9dgkr0t4h, child number 0
——————————————————-
select * from jbatista.t where pk = :input
Plan hash value: 3355105303
———————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————————–
| 0 | SELECT STATEMENT | | | | 74 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 937 | 31858 | 74 (0)| 00:00:01 |
PLAN_TABLE_OUTPUT
———————————————————————————————–
|* 2 | INDEX RANGE SCAN | MY_INDX | 375 | | 1 (0)| 00:00:01 |
———————————————————————————————–
Predicate Information (identified by operation id):
—————————————————
2 – access(“T”.”SYS_NC00003$”=:INPUT)
Probably the latter is the better alternative because its in the documentation. Thanks for the input!
OK, thanks
One more question, is there a way to identify queries where Oracle change the SQL to address type mismatches? Trying to be proactive…
Regards,
jorge
Jorge,
you may start with
select
s.sql_id, s.sql_text, s.child_number,
t.plan_table_output
from
v$sql s,
table(dbms_xplan.display_cursor(s.sql_id, s.child_number)) t
where
t.plan_table_output like ‘%filter(_%(%’
and t.plan_table_output not like ‘%filter((%’
🙂