Go back a few decades before XML and JSON and you could easily categorise data
into (roughly) two categories:
- small data, like names, addresses, descriptions, numbers, dates, etc
- big data, like documents, videos, images, etc
That was pretty much “it” when it came to storing data in your database. As such, in Oracle we had VARCHAR2, DATE, NUMBER etc for the small data, and CLOB, BLOB for the big stuff.
Anything that fell into the “big” category was typically orders of magnitude larger than the small stuff. Documents, images, videos are typically measured in megabytes, as opposed to bytes, and as such, the CLOB and BLOB datatypes in Oracle reflected this. LOBs were stored in multiple chunks, with an associated index segment (a LOB INDEX) to allow us to navigate through the LOB. When querying a table containing a LOB, it made more sense for the client issuing the query to be sent just a pointer to the LOB with the actual retrieval deferred to such point as the client explicitly said: “Yep, I need that particular LOB data”
That logic still makes a lot of sense today when you have large LOBs. But of course, the world changed when XML came along, and now JSON. Suddenly it has become commonplace to have “large” objects, where the definition of large is no longer megabytes like days gone by. We now had a
“middle ground”, where we now had a lot of data in the kilobyte range. It was too big for our “small” datatypes, but too small to need all of the chunking and indexing of traditional LOBs.
For example, I’ll create a table of CLOB data which is just a source for some tables I’ll create later
SQL> create table data_source as
2 with base as (
3 select level as id, 'data ' || level as info
4 connect by level <= 80
5 ),
6 one_clob as (
7 select json_arrayagg(json_object(*) returning clob) as jdoc
8 from base
9 )
10 select rownum pk, jdoc
11 from one_clob
12 connect by level <= 1000
13 /
Table created.
And I’ll create a table with a CLOB in the standard way and populate it with the source data
SQL> create table t_reference_clob(
2 pk int,
3 jdoc clob
4 )
5 /
Table created.
SQL> insert into t_reference_clob select * from data_source;
1000 rows created.
Let’s now query that table in the same way a client program would. I’ve deliberately done this test in a database that is some 4000km from the client to help illustrate my point.
SQL> set autotrace traceonly stat
SQL> select * from t_reference_clob;
1000 rows selected.
Elapsed: 00:01:06.92
Statistics
----------------------------------------------------------
18 recursive calls
14 db block gets
1051 consistent gets
2000 physical reads
2548 redo size
491795 bytes sent via SQL*Net to client
242494 bytes received via SQL*Net from client
1012 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000 rows processed
1000 rows selected.
Notice the SQL*Net round trips. Because our client treats this CLOB data with a perspective of “This could be anything from a few bytes to gigabytes“, the client makes a trip to retrieve the CLOB data for each row returned, navigating down through the LOB structure to pick up all of the pieces. In this case, our LOB is small but the client program is unaware of this. Because my database is on the other side of the continent, all those round trips add up to give me over a minute elapsed time.
In 23ai onwards, we have enhanced LOBs to let you convey to the database how you would like the LOB data to be retrieved. If the data is small, it makes more sense to query that data like a standard string value. This can now be done declaratively:
SQL> create table t_value_clob(
2 pk int,
3 jdoc clob
4 )
5 lob (jdoc) query as value;
Table created.
SQL>
SQL>
SQL> insert into t_value_clob select * from data_source;
1000 rows created.
Now I re-run my query – notice the improvement in round trips (and hence performance)
SQL> set autotrace traceonly stat
SQL> select * from t_value_clob;
1000 rows selected.
Elapsed: 00:00:02.92
Statistics
----------------------------------------------------------
17 recursive calls
14 db block gets
1051 consistent gets
0 physical reads
2696 redo size
4357795 bytes sent via SQL*Net to client
530 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000 rows processed
That is all well and good if all of the LOBs in your table are small. But what if there’s a mix? You can also decide on a query by query basis how you would the LOB data to be retrieved. Here’s my original table queried, but with the LOB_VALUE function
SQL> select pk, lob_value(jdoc)
2 from t_reference_clob;
1000 rows selected.
Statistics
----------------------------------------------------------
9 recursive calls
4 db block gets
33 consistent gets
1000 physical reads
596 redo size
4357829 bytes sent via SQL*Net to client
530 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000 rows processed
If your data happens to be JSON, then you can also provide that instruction to the database as part of the JSON_SERIALIZE function
SQL> select pk, json_serialize(jdoc returning clob value)
2 from t_reference_clob;
1000 rows selected.
Statistics
----------------------------------------------------------
11 recursive calls
4 db block gets
47 consistent gets
1000 physical reads
596 redo size
4357829 bytes sent via SQL*Net to client
530 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000 rows processed
As you might expect, you really see the benefits of this function if all LOBs in your result set can be queried in this way. Here’s a second example where I have two LOBs and I query them in various combinations with LOB_VALUE. Whilst you do still get some benefits using LOB_VALUE on one of the columns, the real value comes from reducing the total SQL*Net roundtrips as much as you can.
SQL> insert into t_reference_clob2 select pk, jdoc, jdoc from t_source_clob;
1000 rows created.
SQL> commit;
Commit complete.
SQL>
SQL>
SQL> set arraysize 100
SQL> set autotrace traceonly stat
SQL> set timing on
SQL> select jdoc1, lob_value(jdoc2) from t_reference_clob2;
1000 rows selected.
Elapsed: 00:01:02.98
Statistics
----------------------------------------------------------
21 recursive calls
17 db block gets
1052 consistent gets
3000 physical reads
3020 redo size
4838916 bytes sent via SQL*Net to client
242403 bytes received via SQL*Net from client
1012 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
SQL> select * from t_reference_clob2;
1000 rows selected.
Elapsed: 00:02:06.45
Statistics
----------------------------------------------------------
17 recursive calls
8 db block gets
2034 consistent gets
4000 physical reads
1112 redo size
976870 bytes sent via SQL*Net to client
484361 bytes received via SQL*Net from client
2012 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000 rows processed
SQL> select lob_value(jdoc1), lob_value(jdoc2) from t_reference_clob2;
1000 rows selected.
Elapsed: 00:00:02.09
Statistics
----------------------------------------------------------
9 recursive calls
6 db block gets
32 consistent gets
2000 physical reads
1060 redo size
8704927 bytes sent via SQL*Net to client
548 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000 rows processed
SQL>
Shout out to Oracle ACE Anthony Harper, with whom discussions on the LOB_VALUE function inspired this post.




Leave a reply to Anthony Harper Cancel reply