Another day…another "use the right datatype" post

December 29, 2018

2–3 minutes

Here’s an interesting little oddity (aka bug) with scalar queries.

We’ll start with a simple working example


SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 int, c2 varchar2(10));

Table created.

SQL>
SQL> insert into t2 values(1,'t1');

1 row created.

SQL> insert into t2 values(1,'t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

That all seems straightforward:

We got the value 1 from T1.C1,
used that as an input to the query into T2
got the maximum of the 2 matching rows from table T2
return the result as a column outer query

Let us now repeat the example, but we’ll now make column C1 in table T2 a VARCHAR2. The query will remain unchanged, so now we are comparing a numeric “1” with a varchar2 value of “1”.


SQL> create table t1 ( c1 number );

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> create table t2 ( c1 varchar2(10), c2 varchar2(10));

Table created.

SQL>
SQL> insert into t2 values('1','t1');

1 row created.

SQL> insert into t2 values('01','t01');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','T1')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('','T2')

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = b.c1 )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1
         1 t01

2 rows selected.

Whoops…that doesn’t look right.

Don’t get me wrong, that is an error in the way that we are processing the query, because we should not be getting 2 rows back from it. But it is another example of where you can encounter boundary cases (and hence bugs) when you stray from the well-trodden route. Something is obviously going awry during the implicit conversion, because if we explicitly take care of it, then things are fine.


SQL> select a.c1,
  2    ( select max(b.c2) from t2 b where a.c1 = to_number(b.c1) )
  3  from t1 a;

        C1 (SELECTMAX
---------- ----------
         1 t1

1 row selected.

I’d love it if there was an init.ora parameter that could be used in (say) development environments that would error out any SQL statement with an implicit data type conversion, but until that day comes (and it may be never!) then please make sure you take care with your data types!

Happy New Year everyone!

6 responses to “Another day…another "use the right datatype" post”

Iudith Mentzel

December 30, 2018 at 3:29 am

Hello Connor,

The same seems to happen if we use a LATERAL join:

This one correctly returns 2 rows:

select a.c1, v.m
from t1 a,
lateral ( select b.c2 m from t2 b where a.c1 = b.c1 ) v
/

C1 M
——————————-
1 t1
1 t01

2 rows selected.

But this one does the same !!!

select a.c1, v.m, v.c
from t1 a,
lateral ( select max(b.c2) m, count(*) c from t2 b where a.c1 = b.c1 ) v
/

C1 M
——————————-
1 t01
1 t1

2 rows selected.

However, if I add one more column to the LATERAL view, it works ok

select a.c1, v.m, v.c
from t1 a,
lateral ( select max(b.c2) m, count(*) c from t2 b where a.c1 = b.c1 ) v
/

C1 M C
—————-
1 t1 2

In the scalar query, if we use COUNT instead of MAX, it also works correctly

select a.c1,
( select count(b.c2) from t2 b where a.c1 = b.c1 )
from t1 a
/

Loading…

Reply
1. Iudith Mentzel
  
  December 30, 2018 at 3:32 am
  
  ( continued … )
  
  select a.c1,
  ( select count(b.c2) from t2 b where a.c1 = b.c1 )
  from t1 a
  /
  
  C1 (SELECTCOUNT(B.C2)FROMT2BWHEREA.C1=B.C1)
  1 2
  
  Very strange …
  
  I wish you a BRIGHT AND HAPPY NEW YEAR 2019
  and I’m looking forward to see you in January :):)
  
  Best Regards,
  Iudith
  
  Loading…
  
  Reply
jayt22

December 30, 2018 at 7:12 pm

This applies to any and all implicit type conversions as well of course. Different clients and databases can have different default NLS date formats, so if you compare a string date to a DATE type you can get an error or worse, wrong results.

Implicit conversions also used to be able to disable indexes…

Happy 2019!

Loading…

Reply
hourim

January 1, 2019 at 7:54 pm

Hello
“Implicit conversions also used to be able to disable indexes…”

This depends on the implicit data type conversion hierarchy

https://www.red-gate.com/simple-talk/sql/oracle/oracle-data-type-implicit-conversion-hierarchy/

Best regards
Mohamed Houri

Loading…

Reply
Racer I.

January 2, 2019 at 3:15 pm

Hi,

If you look at the explain plan there may be some hints :

Oracle first does a GROUP BY (presumably on t2.c1), which yields two rows (for ‘1’ and ’01’).
Then an outer join (with TO_NUMBER) which results in still 2 rows. No additional grouping.
Oracle seems to assume that first GROUP BY then TO_NUMBER (or any other conversion) yields
the same as first TO_NUMBER then GROUP BY which is not the case here.
If you use ‘1’/’1′ (or as I did at first 1/1 which converts to the same) you get only one row.
If you use an explizit TO_NUMBER only the access changes to

1 – access(“A”.”C1″=”ITEM_1″(+))

Which probably means that the GROUP BY will now be over TO_NUMBER(t2.c1) which yields only one row.

So Oracle probably transforms the queries like this :
{code}
select a.c1, b.c2 y — wrong, two rows
from t1 a
LEFT JOIN (select c1, MIN(c2) c2 from t2 GROUP BY c1) b ON (a.c1 = b.c1)

select a.c1, b.c2 y — ok, one row
from t1 a
LEFT JOIN (select TO_NUMBER(c1) c1, MIN(c2) c2 from t2 GROUP BY TO_NUMBER(c1)) b ON (a.c1 = b.c1)
{code}
—
{code}
—————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————————–
| 0 | SELECT STATEMENT | | 82 | 4674 | 5 (20)| 00:00:01 |
|* 1 | HASH JOIN OUTER | | 82 | 4674 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS STORAGE FULL | T1 | 82 | 1066 | 2 (0)| 00:00:01 |
| 3 | VIEW | VW_SSQ_1 | 82 | 3608 | 3 (34)| 00:00:01 |
| 4 | HASH GROUP BY | | 82 | 3608 | 3 (34)| 00:00:01 |
| 5 | TABLE ACCESS STORAGE FULL| T2 | 82 | 3608 | 2 (0)| 00:00:01 |
—————————————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – access(“A”.”C1″=TO_NUMBER(“ITEM_1″(+)))
{code}
regards,

Racer I.

Loading…

Reply
Racer I.

January 2, 2019 at 3:34 pm

Hi,

The plan for COUNT looks quite different. More like a CROSS JOIN (hopefully only logically)
followed by filtering on the join (with TO_NUMBER so Ok).
{code}
———————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————–
| 0 | SELECT STATEMENT | | 82 | 1066 | 7 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 44 | | |
|* 2 | TABLE ACCESS STORAGE FULL| TX2 | 1 | 44 | 2 (0)| 00:00:01 |
| 3 | TABLE ACCESS STORAGE FULL | TX1 | 82 | 1066 | 2 (0)| 00:00:01 |
———————————————————————————–

Predicate Information (identified by operation id):
—————————————————

2 – filter(TO_NUMBER(“B”.”C1″)=:B1)

{code}
If you switch to TO_CHAR you get ‘1’ as COUNT :
{code}
select a.c1,
( select count(b.c2) from tx2 b where TO_CHAR(a.c1) = b.c1) y
from tx1 a;
{code}
regards,

Racer I.

Loading…

Reply