(This is based on some ‘issues’ a client was having, so the example is architected to make a particular point, but it is relevant nonetheless)
In the distributed database world, we often have different names for the same piece of information.
Consider two tables which hold customer information. In one table, the customer key is called “REF_NUM”, and in the other, “CUST_ID”.
That’s not ideal, but it’s not a big drama because at least the data types align.
SQL> desc T1
Name Null? Type
------------------------------------------- -------- ----------------------
REF_NUM NOT NULL NUMBER(10)
T1_DATA VARCHAR2(1000)
SQL> desc T2
Name Null? Type
------------------------------------------- -------- ----------------------
CUST_ID NOT NULL NUMBER(10)
T2_DATA VARCHAR2(1000)
When I do a join between these two tables, it costs me about 9,400 I/Os:
SQL> select *
2 from t1, t2
3 where t1.ref_num = t2.cust_id
4 and t1.t1_data like '100%';
Statistics
----------------------------------------------------------
0 recursive calls
1 db block gets
9408 consistent gets
0 physical reads
Consider now the same two tables, with exactly the same data, but with data types that are not aligned. In one system they used a numeric, and in the other they used a string.
SQL> desc T1
Name Null? Type
------------------------------------------- -------- ----------------------
REF_NUM NOT NULL NUMBER(10)
T1_DATA VARCHAR2(1000)
SQL> desc T2
Name Null? Type
------------------------------------------- -------- ----------------------
CUST_ID NOT NULL VARCHAR2(10)
T2_DATA VARCHAR2(1000)
Now look what happens to my SQL:
SQL> select *
2 from t1, t2
3 where t1.ref_num = t2.cust_id
4 and t1.t1_data like '100%';
Statistics
----------------------------------------------------------
0 recursive calls
2 db block gets
18428 consistent gets
0 physical reads
It costs DOUBLE the I/Os, which means roughly double the work.
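Where does the extra work, and the risk, come from? Under Oracle’s implicit conversion rules, when a NUMBER is compared to a VARCHAR2, it is the character value that gets converted, so the join predicate is silently treated as TO_NUMBER(t2.cust_id) = t1.ref_num. A quick way to confirm this on your own system (just a sketch; the exact plan output will vary with version and data) is to look at the Predicate Information section of the execution plan:

SQL> explain plan for
  2  select *
  3  from t1, t2
  4  where t1.ref_num = t2.cust_id
  5  and t1.t1_data like '100%';

Explained.

SQL> select * from table(dbms_xplan.display);

You will typically see something like access("T1"."REF_NUM"=TO_NUMBER("T2"."CUST_ID")) in the predicates, which is the smoking gun for the implicit conversion.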
Moreover, it’s not just performance that is at risk… your whole application might just start to crash randomly.
Using the same tables as above (with the conflicting data types), here’s a SQL statement that works fine “today”…
SQL> select count(*)
2 from t1, t2
3 where t1.ref_num = t2.cust_id
4 and t2.t2_data like '100%';
COUNT(*)
----------
71
Then a pesky user comes along and does what users do… adds data to one of the tables.
SQL> insert into T2 values ('X123','100some data');
1 row created.
And now let’s re-run that same query:
SQL> select count(*)
2 from t1, t2
3 where t1.ref_num = t2.cust_id
4 and t2.t2_data like '100%';
ERROR: ORA-01722: invalid number
Ker-splat… your application is toast.
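If you can’t realign the data types straight away, one stop-gap (a sketch, and it has its own trade-offs) is to push the conversion onto the numeric side, so that Oracle never attempts TO_NUMBER on data that might not be numeric:

SQL> select count(*)
  2  from t1, t2
  3  where to_char(t1.ref_num) = t2.cust_id
  4  and t2.t2_data like '100%';

Be aware that this changes the comparison semantics (for example, a CUST_ID of '0123' will no longer match a REF_NUM of 123) and the explicit TO_CHAR can defeat any index on REF_NUM. On 12.2 and later you can also hunt down the offending rows up front with VALIDATE_CONVERSION(cust_id AS NUMBER). The real fix, of course, is aligned data types.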
Not being disciplined with data type selection causes pain later.
Connor, you’re still thinking procedurally!
Abstract the data type and store everything as a LOB.
Pronto! No more data conversion issues.
Now all you need is a “scalable” application server, clustered across 10000 nodes, to dynamically convert and interpret the data as needed so it can produce whatever results one might want.
Simple, and always the correct result no matter what data is input!
/sarcasm
ker-splat!