Literals and VARCHAR2

We all know that Oracle treats two VARCHAR2 values that differ only in trailing spaces as not EQUAL, because VARCHAR2 comparisons are not blank-padded:


SQL> drop table T purge;

Table dropped.

SQL> create table T ( x varchar2(10), y varchar2(10));

Table created.

SQL> insert into T values ('abc','abc    ');

1 row created.

SQL> select * from T where x = y;

no rows selected

SQL> select * from T where y = 'abc';

no rows selected

But interestingly enough, if there are no references to tables or columns, that is, you are comparing plain literals, VARCHAR2 is not the datatype in play. Text literals are CHAR, and CHAR-to-CHAR comparisons are blank-padded, so trailing spaces are effectively ignored. For example:


SQL> select * from dual where 'abc' = 'abc  ';

D
-
X

SQL> set serverout on
SQL> exec begin if 'abc' = 'abc  ' then dbms_output.put_line('YES'); end if; end;
YES

PL/SQL procedure successfully completed.



You can see this from the DUMP function: Typ=1 is VARCHAR2, and Typ=96 is CHAR.


SQL> select dump(x) from t;

DUMP(X)
-------------------------------
Typ=1 Len=3: 97,98,99


SQL> select dump('abc') from dual;
 
DUMP('ABC')
----------------------
Typ=96 Len=3: 97,98,99
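
If you ever need the non-padded behaviour with literals, one option (a minimal sketch, assuming you control the expression) is to cast one side to VARCHAR2, because blank-padded comparison only applies when both sides are fixed-length CHAR values:

-- both sides are CHAR literals: blank-padded comparison, so DUAL returns its row
select * from dual where 'abc' = 'abc  ';

-- cast one side to VARCHAR2: non-padded comparison, so no rows are returned
select * from dual where 'abc' = cast('abc  ' as varchar2(10));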

NUMBER data type…what harm can it do?

There's a somewhat sour discussion going on in the comments attached to the video at https://www.youtube.com/watch?v=jZW-uLb52xk

Whether you agree or disagree with the video or the comments, or (sadly) the animosity in them, it does lead to an interesting bit of investigation into data types with arbitrary precision, which a friend at work stumbled upon.

Let's start with a simple calculation: the difference between two dates. In this case, I've just used times, because that is enough to demonstrate the anomaly.

SQL> SELECT (TO_DATE('14:03:15','hh24:mi:ss')- TO_DATE('14:02:25','hh24:mi:ss')) * 24 * 60 * 60 delta
  2  FROM   dual;

     DELTA
----------
        50

Thankfully, the database has come back with the right answer of 50 seconds. Of course, we might want to remember that result, so let's store it in a bind variable:

SQL> variable x number
SQL> begin
  2  SELECT (TO_DATE('14:03:15','hh24:mi:ss')- TO_DATE('14:02:25','hh24:mi:ss')) * 24 * 60 * 60
  3  into   :x
  4  FROM   dual;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> print x

         X
----------
        50

So far so good…let's now make the tiniest of changes. We'll use a local PL/SQL variable instead:

SQL> set serverout on
SQL> declare
  2    x number;
  3  begin
  4  SELECT (TO_DATE('14:03:15','hh24:mi:ss')- TO_DATE('14:02:25','hh24:mi:ss')) * 24 * 60 * 60
  5  into x
  6  FROM   dual;
  7  dbms_output.put_line(x);
  8  end;
  9  /
50.00000000000000000000000000000000000004

PL/SQL procedure successfully completed.

The use of arbitrary precision for 'x' shows how things can go a little bit awry. Similarly, let's look at what happens if we convert the dates to simple 'seconds past midnight'.

SQL> SELECT to_char(TO_DATE('14:03:15','hh24:mi:ss'),'SSSSS') d1,
  2         to_char(TO_DATE('14:02:25','hh24:mi:ss'),'SSSSS') d2
  3  FROM   dual;

D1    D2
----- -----
50595 50545

You might think that the calculation would be the same, but when we plug those numbers into the equivalent calculation, we get a slightly different answer:

SQL> exec dbms_output.put_line((50595/86400 - 50545/86400)* 24 * 60 * 60 );
49.99999999999999999999999999999999999968

Ultimately, this probably boils down to the fact that certain division operations can never give a perfect answer in finite-precision arithmetic. In the example above, 50595/86400 yields the never-ending result 0.58559027777777777777777…
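
As a rough illustration (a sketch only; the exact digits you see may vary by version), you can expose those truncated intermediate results yourself:

set serverout on
declare
  a number := 50595/86400;   -- 0.5855902777... cut off after NUMBER's 38 to 40 significant digits
  b number := 50545/86400;   -- 0.5850115740... likewise truncated
begin
  -- the tiny truncation errors survive the subtraction and are then
  -- magnified by the multiplication back up to seconds
  dbms_output.put_line(a);
  dbms_output.put_line(b);
  dbms_output.put_line((a - b) * 24 * 60 * 60);
end;
/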

Putting PL/SQL aside, there might also be hidden costs if you start using unbounded precision in your database tables. I always remember this example from Steve Adams many years ago.

SQL> create table T ( x1 number, x2 number(6,3));

Table created.

SQL> insert into T values ( 3*(1/3), 3*(1/3) );

1 row created.

SQL> commit;

Commit complete.

SQL> select * from T;

        X1         X2
---------- ----------
         1          1

No strange decimals there…but then take a look at how that data was actually stored:

SQL> select vsize(x1), vsize(x2) from T;

 VSIZE(X1)  VSIZE(X2)
---------- ----------
        21          2
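
The reason becomes visible if you look at what was actually stored (a sketch; the exact digit string will depend on your settings). X1 holds 3*(1/3) at full NUMBER precision, that is, 0.99999… recurring out to the limit of the datatype, and the "1" you saw above is just the default display format rounding it. X2, on the other hand, was rounded to 1.000 on insert because of its scale of 3:

-- the unconstrained column keeps the full 0.999... expansion (hence 21 bytes),
-- while the constrained column was rounded to fit NUMBER(6,3) at insert time (2 bytes)
select dump(x1) raw_x1, dump(x2) raw_x2 from T;

-- subtracting 1 makes the difference obvious: x1 comes back as a tiny
-- negative number, x2 comes back as exactly zero
select x1 - 1 delta_x1, x2 - 1 delta_x2 from T;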

Bottom line: things can go astray when you don't keep a handle on the appropriate precision to use for your data types.

Datatype discipline

(This is based on some ‘issues’ a client was having, so the example is architected to make a particular point, but is relevant nonetheless)

In the distributed database world, we often have different names for the same piece of information.

Consider two tables which hold customer information. In one table, they called the customer key “REF_NUM”, and in the other table “CUST_ID”.

That’s not ideal, but it’s not a big drama because at least the data types align.

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL NUMBER(10)
                   T2_DATA                                              VARCHAR2(1000)

When I do a join between these two tables, it costs me about 9,400 I/Os:

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  1  db block gets
               9408  consistent gets
                  0  physical reads

Consider now the same two tables, with exactly the same data, but with the data types not aligned. In one system they used a numeric, and in the other system they used a string:

        SQL> desc T1

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   REF_NUM                                     NOT NULL NUMBER(10)
                   T1_DATA                                              VARCHAR2(1000)

        SQL> desc T2

                   Name                                        Null?    Type
                   ------------------------------------------- -------- ----------------------
                   CUST_ID                                     NOT NULL VARCHAR2(10)
                   T2_DATA                                              VARCHAR2(1000)

Now look at what happens to my SQL:

        SQL> select *
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t1.t1_data like '100%';

        Statistics
        ----------------------------------------------------------
                  0  recursive calls
                  2  db block gets
              18428  consistent gets
                  0  physical reads

It costs DOUBLE the number of I/Os, and it runs roughly twice as slowly.
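
The likely culprit is the implicit conversion Oracle has to apply to make the join possible: when a NUMBER is compared with a VARCHAR2, the character value is converted with TO_NUMBER. You can see it in the predicate section of the execution plan (a sketch; the exact plan shape will depend on your data and indexes):

explain plan for
  select *
  from  t1, t2
  where t1.ref_num = t2.cust_id
  and   t1.t1_data like '100%';

select * from table(dbms_xplan.display);

-- look for something like:
--   access("T1"."REF_NUM"=TO_NUMBER("T2"."CUST_ID"))
-- that TO_NUMBER() wrapped around CUST_ID is the implicit conversion, and it
-- also stops any index on CUST_ID from being used for this join.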

Moreover, it's not just performance that is at risk…your whole application might just start to crash randomly.

Using the same tables above (with the conflicting data types), here's a SQL statement that works fine “today”…

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

          COUNT(*)
        ----------
                71

Then a pesky user comes along and does what users do…adds data to one of the tables.

        SQL> insert into T2 values ('X123','100some data');

        1 row created.

And now let's run that same query again:

        SQL> select count(*)
          2  from  t1, t2
          3  where t1.ref_num = t2.cust_id
          4  and   t2.t2_data like '100%';

        ERROR: ORA-01722: invalid number

Ker-splat….your application is toast…
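
The real fix is to align the data types (or cleanse the data), but if you are stuck with the mismatch, one defensive option (a sketch only, and it comes with its own correctness and performance trade-offs, because string comparison semantics differ from numeric ones) is to make the conversion explicit on the NUMBER side, so TO_NUMBER is never applied to the bad row:

-- explicit conversion on the NUMBER column: no TO_NUMBER on CUST_ID,
-- so the 'X123' row can no longer raise ORA-01722
select count(*)
from  t1, t2
where to_char(t1.ref_num) = t2.cust_id
and   t2.t2_data like '100%';

From 12.2 onwards, the VALIDATE_CONVERSION function can also help you hunt down the offending rows before they break a query.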

Not being disciplined with data type selection causes pain later.