Go back a few decades before XML and JSON and you could easily categorise data
into (roughly) two categories:

  • small data, like names, addresses, descriptions, numbers, dates, etc
  • big data, like documents, videos, images, etc

That was pretty much “it” when it came to storing data in your database. As such, in Oracle we had VARCHAR2, DATE, NUMBER etc for the small data, and CLOB, BLOB for the big stuff.

Anything that fell into the “big” category was typically orders of magnitude larger than the small stuff. Documents, images, videos are typically measured in megabytes, as opposed to bytes, and as such, the CLOB and BLOB datatypes in Oracle reflected this. LOBs were stored in multiple chunks, with an associated index segment (a LOB INDEX) to allow us to navigate through the LOB. When querying a table containing a LOB, it made more sense for the client issuing the query to be sent just a pointer to the LOB with the actual retrieval deferred to such point as the client explicitly said: “Yep, I need that particular LOB data

That logic still makes a lot of sense today when you have large LOBs. But of course, the world changed when XML came along, and now JSON. Suddenly it has become commonplace to have “large” objects, where the definition of large is no longer megabytes like days gone by. We now had a
“middle ground”, where we now had a lot of data in the kilobyte range. It was too big for our “small” datatypes, but too small to need all of the chunking and indexing of traditional LOBs.

For example, I’ll create a table of CLOB data which is just a source for some tables I’ll create later


SQL> create table data_source as
  2  with base as (
  3    select level as id, 'data ' || level as info
  4    connect by level <= 80
  5  ),
  6  one_clob as (
  7  select json_arrayagg(json_object(*) returning clob) as jdoc
  8  from base
  9  )
 10  select rownum pk, jdoc
 11  from one_clob
 12  connect by level <= 1000
 13  /

Table created.

And I’ll create a table with a CLOB in the standard way and populate it with the source data


SQL> create table t_reference_clob(
  2  pk int,
  3  jdoc clob
  4  )
  5  /

Table created.

SQL> insert into t_reference_clob select * from data_source;

1000 rows created.

Let’s now query that table in the same way a client program would. I’ve deliberately done this test in a database that is some 4000km from the client to help illustrate my point.


SQL> set autotrace traceonly stat
SQL> select * from t_reference_clob;

1000 rows selected.

Elapsed: 00:01:06.92

Statistics
----------------------------------------------------------
         18  recursive calls
         14  db block gets
       1051  consistent gets
       2000  physical reads
       2548  redo size
     491795  bytes sent via SQL*Net to client
     242494  bytes received via SQL*Net from client
       1012  SQL*Net roundtrips to/from client 
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

1000 rows selected.



Notice the SQL*Net round trips. Because our client treats this CLOB data with a perspective of “This could be anything from a few bytes to gigabytes“, the client makes a trip to retrieve the CLOB data for each row returned, navigating down through the LOB structure to pick up all of the pieces. In this case, our LOB is small but the client program is unaware of this. Because my database is on the other side of the continent, all those round trips add up to give me over a minute elapsed time.

In 23ai onwards, we have enhanced LOBs to let you convey to the database how you would like the LOB data to be retrieved. If the data is small, it makes more sense to query that data like a standard string value.  This can now be done declaratively:


SQL> create table t_value_clob(
  2  pk int,
  3  jdoc clob
  4  )
  5  lob (jdoc) query as value;

Table created.

SQL>
SQL>
SQL> insert into t_value_clob select * from data_source;

1000 rows created.

Now I re-run my query – notice the improvement in round trips (and hence performance)


SQL> set autotrace traceonly stat
SQL> select * from t_value_clob;

1000 rows selected.

Elapsed: 00:00:02.92

Statistics
----------------------------------------------------------
         17  recursive calls
         14  db block gets
       1051  consistent gets
          0  physical reads
       2696  redo size
    4357795  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client 
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

That is all well and good if all of the LOBs in your table are small. But what if there’s a mix? You can also decide on a query by query basis how you would the LOB data to be retrieved. Here’s my original table queried, but with the LOB_VALUE function


SQL> select pk, lob_value(jdoc)
  2  from  t_reference_clob;

1000 rows selected.


Statistics
----------------------------------------------------------
          9  recursive calls
          4  db block gets
         33  consistent gets
       1000  physical reads
        596  redo size
    4357829  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

If your data happens to be JSON, then you can also provide that instruction to the database as part of the JSON_SERIALIZE function


SQL> select pk, json_serialize(jdoc returning clob value)
  2  from  t_reference_clob;

1000 rows selected.


Statistics
----------------------------------------------------------
         11  recursive calls
          4  db block gets
         47  consistent gets
       1000  physical reads
        596  redo size
    4357829  bytes sent via SQL*Net to client
        530  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed

As you might expect, you really see the benefits of this function if all LOBs in your result set can be queried in this way. Here’s a second example where I have two LOBs and I query them in various combinations with LOB_VALUE. Whilst you do still get some benefits using LOB_VALUE on one of the columns, the real value comes from reducing the total SQL*Net roundtrips as much as you can.


SQL> insert into t_reference_clob2 select pk, jdoc, jdoc from t_source_clob;

1000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL>
SQL> set arraysize 100
SQL> set autotrace traceonly stat
SQL> set timing on
SQL> select jdoc1, lob_value(jdoc2) from t_reference_clob2;

1000 rows selected.

Elapsed: 00:01:02.98

Statistics
----------------------------------------------------------
         21  recursive calls
         17  db block gets
       1052  consistent gets
       3000  physical reads
       3020  redo size
    4838916  bytes sent via SQL*Net to client
     242403  bytes received via SQL*Net from client
       1012  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

SQL> select * from t_reference_clob2;

1000 rows selected.

Elapsed: 00:02:06.45

Statistics
----------------------------------------------------------
         17  recursive calls
          8  db block gets
       2034  consistent gets
       4000  physical reads
       1112  redo size
     976870  bytes sent via SQL*Net to client
     484361  bytes received via SQL*Net from client
       2012  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed
       
       
SQL> select lob_value(jdoc1), lob_value(jdoc2) from t_reference_clob2;

1000 rows selected.

Elapsed: 00:00:02.09

Statistics
----------------------------------------------------------
          9  recursive calls
          6  db block gets
         32  consistent gets
       2000  physical reads
       1060  redo size
    8704927  bytes sent via SQL*Net to client
        548  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1000  rows processed

SQL>


Shout out to Oracle ACE Anthony Harper, with whom discussions on the LOB_VALUE function inspired this post.

7 responses to “Turbo boost your LOBs and JSON”

  1. I had no idea this very real problem now has such an easy, declarative solution. Thanks not only for the information, but also for linking to Anthony Harper’s blog, which was a great discovery.

    Did you or could you look into the JSON datatype, assuming an up-to-date Oracle driver on the client side?

    1. I did … The JSON datatype looks like it takes care of this as well (I’ll add this an addenda in due course)

      SQL> create table t_json(
      2 pk int,
      3 jdoc json
      4 )
      5 /

      Table created.

      SQL>
      SQL> insert into t_json select * from data_source;

      1000 rows created.

      SQL> set autotrace traceonly stat
      SQL> select * from t_json;

      1000 rows selected.

      Statistics
      ----------------------------------------------------------
      23 recursive calls
      15 db block gets
      412 consistent gets
      0 physical reads
      2596 redo size
      1694286 bytes sent via SQL*Net to client
      1259 bytes received via SQL*Net from client
      68 SQL*Net roundtrips to/from client
      1 sorts (memory)
      0 sorts (disk)
      1000 rows processed

      SQL> truncate table t_json;

      Table truncated.

      SQL> insert into t_json
      2 select pk, '{ "x1" : '||jdoc||','||'"x2" : '||jdoc||','||'"x3" : '||jdoc||'}'
      3 from data_source;

      1000 rows created.

      SQL> set autotrace traceonly stat
      SQL> select * from t_json;

      1000 rows selected.

      Statistics
      ----------------------------------------------------------
      14 recursive calls
      1 db block gets
      1105 consistent gets
      0 physical reads
      184 redo size
      4908286 bytes sent via SQL*Net to client
      1259 bytes received via SQL*Net from client
      68 SQL*Net roundtrips to/from client
      1 sorts (memory)
      0 sorts (disk)
      1000 rows processed

      SQL>

  2. iudithd5bf8e4d8d Avatar
    iudithd5bf8e4d8d

    Hello All,
    We owe a lot to Anthony, each of his Devgym quizzes is a very good lesson, usually payed “at a high price” 🙂

    A few weeks ago we had a quiz on this topic, from where we learned that JSON_SERIALIZE will always return a CLOB as VALUE,
    even if you explicitly specify JSON_SERIALIZE( data RETURNING CLOB REFERENCE).

    In that example, “data” was defined as a “data clob check (data is json)” with “lob(data) query as reference”.

    As Anthony specifies in his explanation, this is the behavior as by now, though it looks a little inconsistent
    with both clauses REFERENCE / VALUE being specifiable in JSON_SERIALIZE.

    Cheers & Best Regards,
    Iudith Mentzel

  3. Connor,

    Thanks for the post, I like the autotrace approach as a metric to confirming use of a value lob.

    Unfortunately, on VirtualBox 23.9Free, autotrace fails with the following message even when the appropriate grants are in place:

    Unable to gather statistics please ensure user has correct access.The statistic feature requires that the user is granted select on v_$sesstat, v_$statname and v_$session.

    This issue prevents seeing the SQL*Net roundtrips in a VirtualBox installation. On OCI ADW the stats agree with what you are showing here.

    Interestingly, if you choose to serialize a reference clob as a reference clob:

    select json_serialize(jdoc returning clob reference) as jserial from t_reference_clob;

    It shows the same reduced round trips on OCI and the value lob hint appears in the explain plan on both OCI Autonomous and VirtualBox 23Free:

    SQL> column grantee format a8SQL> column table_name format a12SQL> set pagesize 100SQL>SQL> prompt show that user has correct grants for v_$session, v_$sesstat and v_$statnameshow that user has correct grants for v_$session, v_$sesstat and v_$statnameSQL> select grantee, table_name, privilege2 from all_tab_privs3 where grantee = user and grantor = ‘SYS’4 and table_name in (‘V_$SESSION’, ‘V_$SESSTAT’,’V_$STATNAME’)5 /

    GRANTEE TABLE_NAME PRIVILEGE

    DEVGYM V_$SESSION SELECTDEVGYM V_$SESSTAT SELECTDEVGYM V_$STATNAME SELECT

    SQL> set autotrace traceonly statAutotrace TraceOnlyExhibits the performance statistics with silent query outputSQL> set arraysize 100SQL>SQL> select json_serialize(jdoc returning clob reference) as jserval2 from t_reference_clob3 /

    1,000 rows selected. PLAN_TABLE_OUTPUT SQL_ID bzcsh9qmpv72m, child number 0

    select json_serialize(jdoc returning clob reference) as jserval fromt_reference_clob

    Plan hash value: 3600930306 | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |

    | 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 25 | | 1 | TABLE ACCESS FULL| T_REFERENCE_CLOB | 1 | 1000 | 1000 |00:00:00.01 | 25 | Note

    dynamic statistics used: dynamic sampling (level=2)

    Unable to gather statistics please ensure user has correct access.The statistic feature requires that the user is granted select on v_$sesstat, v_$statname and v_$session.SQL>SQL> set autotrace offAutotrace DisabledSQL>SQL> explain plan for2 select json_serialize(jdoc returning clob reference) as jserval3 from t_reference_clob4 /

    Explained.

    SQL>SQL> select * from dbms_xplan.display(format => ‘BASIC PROJECTION’)2 / PLAN_TABLE_OUTPUT

    Plan hash value: 3600930306 | Id | Operation | Name |

    | 0 | SELECT STATEMENT | | | 1 | TABLE ACCESS FULL| T_REFERENCE_CLOB | Column Projection Information (identified by operation id):

    1 – “JDOC” /*+ LOB_BY_VALUE */ [LOB,4000]

    13 rows selected.

    If I knew I had a large piece of json and explicitly used json_serialize to return a reference clob, I would not expect to see a value lob returned.

    Anthony Harper

  4. Rajeshwaran Jeyabal Avatar
    Rajeshwaran Jeyabal

    One other information, these JSON datatypes are created as VALUE BASED lobs by default.

    demo@ADB26AI> create table t ( x number, y json );

    Table created.

    demo@ADB26AI> select VALUE_BASED
    2 from user_lobs
    3 where table_name =’T’
    4 and column_name =’Y’; VAL

    YES

  5. Rajeshwaran Jeyabal Avatar
    Rajeshwaran Jeyabal

    Hi Connor,

    Could you please share the DDL for the table t_reference_clob2?
    Is it similar to t_reference_clob, but with two CLOB columns?

Leave a reply to iudithd5bf8e4d8d Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Trending