XMLTABLE and data “corruption”

One of the cool things with the native SQL handling of complex datatypes is the ability to take something like XML and, with just a simple column mapping definition, get the data you want back in familiar rows-and-columns form. That is typically a better fit for all sorts of tools and applications, and ultimately for the customers of those applications.

Here’s a (very) trivial example of how we can extract data from an XML record using XMLTABLE:


SQL> declare
  2    l_xml xmltype:= xmltype('<set><tag>abc</tag></set>');
  3
  4    cursor c_xmltab is
  5      select col1
  6      from xmltable('/set' passing l_xml
  7              columns col1 path 'tag' 
  8              ) x;
  9
 10    l_rows c_xmltab%rowtype;
 11  begin
 12    open c_xmltab;
 13    fetch c_xmltab into l_rows;
 14    dbms_output.put_line('l_rows.col1: '||l_rows.col1);
 15    close c_xmltab;
 16  end;
 17  /
l_rows.col1: abc

PL/SQL procedure successfully completed.

I’m using PL/SQL here but I could have just as easily used plain SQL or even wrapped that SQL in a CREATE VIEW statement in order to make the results appear as if we were dealing with a table and not XML at all.


SQL> create or replace view v_xmltab as
  2  select col1
  3    from xmltable('/set' passing xmltype('<set><tag>abc</tag></set>')
  4                  columns col1 path 'tag'
  5                  ) x;

View created.

SQL> select * from v_xmltab;

COL
---
abc

Notice in the XMLTABLE definition I simply specified that COL1 was from the path TAG. I didn’t nominate whether it was a string, a date or a number. So what does the database do with that? It opts to be as “generous” as possible, as we can see from the resulting view definition.


SQL> desc v_xmltab
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 COL1                                   VARCHAR2(32767)

32K might be fine for most situations, but if you had a query or view that was extracting dozens of columns, then having the database think that every query is going to return rows where each row could be hundreds of thousands of bytes long is probably not such a great idea.
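If you leave every column untyped, each one gets that maximal default, so the advertised row width grows fast. Here is a sketch of a three-column version (the element and view names are invented for illustration):

```sql
-- Every column left untyped in COLUMNS defaults to VARCHAR2(32767),
-- so three columns already advertise roughly 96K per row to clients.
create or replace view v_wide as
select col1, col2, col3
from   xmltable('/row' passing xmltype('<row><a>1</a><b>2</b><c>3</c></row>')
         columns col1 path 'a',
                 col2 path 'b',
                 col3 path 'c'
       ) x;
```

A desc of V_WIDE should report all three columns as VARCHAR2(32767).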

You can of course specify the datatype you want right there in the COLUMNS definition within XMLTABLE. Here’s the same code with the additional VARCHAR2(3) clause, and you can see that this is preserved all the way into the view definition.


SQL> declare
  2    l_xml xmltype:= xmltype('<set><tag>abc</tag></set>');
  3    cursor c_xmltab is
  4      select col1
  5      from xmltable('/set' passing l_xml
  6              columns col1 varchar2(3) path 'tag'
  7              ) x;
  8
  9    l_rows c_xmltab%rowtype;
 10
 11  begin
 12    open c_xmltab;
 13    fetch c_xmltab into l_rows;
 14    dbms_output.put_line('l_rows.col1: '||l_rows.col1);
 15    close c_xmltab;
 16  end;
 17  /
l_rows.col1: abc

PL/SQL procedure successfully completed.


SQL> create or replace view v_xmltab as
  2  select col1
  3    from xmltable('/set' passing xmltype('<set><tag>abc</tag></set>')
  4                  columns col1 varchar2(3) path 'tag'
  5                  ) x;

View created.

SQL> desc v_xmltab
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 COL1                                   VARCHAR2(3)

However, you need to be careful here when choosing the size of this datatype. Consider the following example, where I have left everything the same except that the XML content now consists of 5 characters, which I’m going to try to map into a VARCHAR2(3). Normally, whenever you exceed the size of a VARCHAR2, you would expect something like “ORA-40478: output value too large” or a similar error to be emitted.


SQL> declare
  2    l_xml xmltype:= xmltype('<set><tag>abcde</tag></set>');
  3    cursor c_xmltab is
  4      select col1
  5      from xmltable('/set' passing l_xml
  6              columns col1 varchar2(3) path 'tag'
  7              ) x;
  8
  9    l_rows c_xmltab%rowtype;
 10
 11  begin
 12    open c_xmltab;
 13    fetch c_xmltab into l_rows;
 14    dbms_output.put_line('l_rows.col1: '||l_rows.col1);
 15    close c_xmltab;
 16  end;
 17  /
l_rows.col1: abc

PL/SQL procedure successfully completed.

But things are different with XMLTABLE. The database silently truncated the “abcde” down to “abc” in order for it to fit within the nominated VARCHAR2(3) specification. Similarly, the view definition also silently truncates the data.


SQL> create or replace view v_xmltab as
  2  select col1
  3    from xmltable('/set' passing xmltype('<set><tag>abcde</tag></set>')
  4                  columns col1 varchar2(3) path 'tag'
  5                  ) x;

View created.

SQL>
SQL> desc v_xmltab
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 COL1                                   VARCHAR2(3)

SQL> select * from v_xmltab;

COL
---
abc

Thus, if you are going to pick a datatype size, it is important to make sure it will be large enough to handle the largest value you could possibly expect. Just in case you’re thinking “this is behaviour we should never see in the database”, the XMLTABLE processing can be thought of as similar to the CAST function; namely, we are casting some XML data into an equivalent string datatype. And what does CAST do? Exactly the same.


SQL> select cast('12345' as varchar2(2)) from dual;

CA
--
12
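Coming back to XMLTABLE, one way to choose a sensible size is to let the column take its wide default first and ask the database for the longest value actually present. A sketch (the inline XML here stands in for whatever real documents you are processing):

```sql
-- Probe the real maximum length before committing to a column size.
-- With no datatype in the COLUMNS clause, nothing can be truncated here.
select max(length(col1)) as longest_tag
from   xmltable('/set' passing xmltype('<set><tag>abcde</tag></set>')
         columns col1 path 'tag'
       ) x;
```

Armed with that number (5 in this case), you can declare the column with some headroom, say VARCHAR2(10), rather than guessing.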

You’ll be pleased to know that if you are no longer “old school cool” and have left XML behind in favour of JSON, then you have more options in this area. If you do not specify a datatype at all, by default you’ll see the same style of behaviour: the column defaults to VARCHAR2(4000), and note that you’ll get that VARCHAR2(4000) even if your database has MAX_STRING_SIZE set to EXTENDED.


SQL> select col1
  2  from   ( select '{ "tag" : "abc" }' data from dual ),
  3         json_table(data, '$'
  4           columns (col1   path '$.tag'));

COL1
------------------------------------------------------------
abc

SQL>
SQL> select col1
  2  from   ( select '{ "tag" : "abc" }' data from dual ),
  3         json_table(data, '$'
  4           columns (col1   varchar2(3) path '$.tag'));

COL
---
abc

SQL> create or replace
  2  view v_jsontab as
  3  select col1
  4  from   ( select '{ "tag" : "abc" }' data from dual ),
  5         json_table(data, '$'
  6           columns (col1   path '$.tag'));

View created.

SQL>
SQL> desc v_jsontab
 Name                          Null?    Type
 ----------------------------- -------- ------------------
 COL1                                   VARCHAR2(4000)

But if you do pick a VARCHAR2 size which is too small for the JSON data you want to process, then rather than getting a truncated value, you’ll get a null (indicating that we could not successfully process the data). You also have the choice of adding the “ERROR ON ERROR” clause to let you know that things did not proceed as planned.


SQL> select col1
  2  from   ( select '{ "tag" : "abcde" }' data from dual ),
  3         json_table(data, '$'
  4           columns (col1   varchar2(3) path '$.tag'));

COL
---


SQL>
SQL> select col1
  2  from   ( select '{ "tag" : "abcde" }' data from dual ),
  3         json_table(data, '$'
  4           columns (col1   varchar2(3) path '$.tag' error on error));
       json_table(data, '$'
       *
ERROR at line 3:
ORA-40478: output value too large (maximum: 3)
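There is also a middle ground between a silent null and a hard error. If memory serves, the JSON_TABLE column error clause accepts a DEFAULT literal as well, so you can substitute a recognisable marker when a value does not fit. A sketch (the ‘###’ marker is just my choice, not anything special):

```sql
-- DEFAULT ... ON ERROR substitutes a literal marker instead of
-- raising an error or returning null when the value will not fit.
select col1
from   ( select '{ "tag" : "abcde" }' data from dual ),
       json_table(data, '$'
         columns (col1 varchar2(3) path '$.tag' default '###' on error));
```

The oversized value should come back as ‘###’, making failed conversions easy to spot in the output. Check the JSON_TABLE documentation for the full error-clause syntax.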

Got some thoughts? Leave a comment
