Going as far back as version 12 of the database, we’ve had some nifty JSON features to allow extraction, generation and manipulation of JSON documents directly within the database. That poses some interesting challenges for the optimizer when it comes to delving into those documents.

For example, consider the three simple queries below.


SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1
  3  from   t ;

C1
--------------
Oracle

SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2
  4  from   t ;

C1             C2
-------------- --------------
Oracle         Database

SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2,
  4      json_value(b,'$.db.elems[*].tag3') c3
  5  from   t ;

C1             C2             C3
-------------- -------------- --------------
Oracle         Database       23ai

Each query expression has to dig into a different part of the JSON document to pull out the three values “Oracle”, “Database” and “23ai”.

In this case, that is not really a hardship because my JSON document is less than a kilobyte in size.


SQL> select dbms_lob.getlength(b) from t;

DBMS_LOB.GETLENGTH(B)
---------------------
                  785
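
For reference, the table and document behind the demo aren’t shown, but a setup shaped like the following would produce those results (a minimal sketch; the table name, column name and JSON keys are taken from the queries above):

create table t ( b blob check ( b is json ) );

insert into t values ( utl_raw.cast_to_raw(
  '{ "db" : { "items" : [ { "tag1" : "Oracle"   } ],
              "list"  : [ { "tag2" : "Database" } ],
              "elems" : [ { "tag3" : "23ai"     } ] } }' ) );
commit;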

But of course, JSON aims to be all things to all people, and thus a JSON document could be under a kilobyte like the one above, or it could be hundreds of megabytes in size.

Here is the same example on a JSON document I’ve inflated out to 50 megabytes. Take note now of the timings for each execution.
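
(One way to build a similar monster is something along these lines. This is just a rough sketch, and the "padding" key is simply inert filler under an unused part of the document; any filler will do.)

declare
  l_doc clob := '{"db":{"items":[{"tag1":"Oracle"}],'||
                '"list":[{"tag2":"Database"}],'||
                '"elems":[{"tag3":"23ai"}],"padding":[';
begin
  -- ~13,000 x 4,000 characters of filler gives roughly 50MB
  for i in 1 .. 13000 loop
    l_doc := l_doc || case when i > 1 then ',' end || '"' || rpad('x',4000,'x') || '"';
  end loop;
  l_doc := l_doc || ']}}';
  -- json_serialize converts the textual JSON into the BLOB the table expects
  update t set b = json_serialize(l_doc returning blob);
  commit;
end;
/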


SQL> set timing on
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1
  3  from   t ;

C1
--------------
Oracle

Elapsed: 00:00:00.18
SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2
  4  from   t ;

C1             C2
-------------- --------------
Oracle         Database

Elapsed: 00:00:00.37
SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2,
  4      json_value(b,'$.db.elems[*].tag3') c3
  5  from   t ;

C1             C2             C3
-------------- -------------- --------------
Oracle         Database       23ai

Elapsed: 00:00:00.53

You can see the linear growth in execution time as we go digging around in the JSON for the required keys.
It takes about 0.18 seconds to find a key in my large JSON document, and thus “n” keys via “n” JSON_VALUE clauses take roughly “n” * 0.18 seconds.

I could add an index onto the JSON to improve this, but the optimizer also has a cunning trick up its sleeve to assist here.
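
For example, a JSON search index covers the whole document and is probably the simplest indexing option here, although it is not the path I’ll take in this post:

create search index t_json_ix on t ( b ) for json;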

Let me run the queries again (seemingly unaltered) and look at the timings:


SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1
  3  from   t ;

C1
--------------
Oracle

Elapsed: 00:00:00.17
SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2
  4  from   t ;

C1             C2
-------------- --------------
Oracle         Database

Elapsed: 00:00:00.16
SQL>
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2,
  4      json_value(b,'$.db.elems[*].tag3') c3
  5  from   t ;

C1             C2             C3
-------------- -------------- --------------
Oracle         Database       23ai

Elapsed: 00:00:00.16

Suddenly, when we have two or more JSON_VALUE clauses, we no longer get any drop-off in performance. The big question is: how? A look at the execution plan reveals the optimizer’s “cheat code”.


SQL> set autotrace on explain
SQL> select 
  2      json_value(b,'$.db.items[*].tag1') c1,
  3      json_value(b,'$.db.list[*].tag2') c2,
  4      json_value(b,'$.db.elems[*].tag3') c3
  5  from   t ;

C1             C2             C3
-------------- -------------- --------------
Oracle         Database       23ai

Elapsed: 00:00:00.07

Execution Plan
----------------------------------------------------------
Plan hash value: 1386969201

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  8168 |  1906K|    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS          |      |  8168 |  1906K|    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL    | T    |     1 |   233 |     3   (0)| 00:00:01 |
|   3 |   JSONTABLE EVALUATION |      |       |       |            |          |
-------------------------------------------------------------------------------

There is now a “JSONTABLE EVALUATION” step in the plan, even though we never specified a JSON_TABLE clause in the query. A poke around in a 10053 trace reveals how the optimizer has transformed the original query into the following:


select p.c_03$ c1, p.c_02$ c2, p.c_01$ c3
from   t t,
       json_table( t.b, '$'
         columns( c_01$ varchar2(4000) path '$.db.elems[*].tag3' null on error,
                  c_02$ varchar2(4000) path '$.db.list[*].tag2' null on error,
                  c_03$ varchar2(4000) path '$.db.items[*].tag1' null on error
         )
       ) p
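
If you want to poke around in such a trace yourself, the standard event syntax will capture it:

alter session set tracefile_identifier = 'json_rewrite';
alter session set events '10053 trace name context forever, level 1';
-- run the multi-JSON_VALUE query here, then
alter session set events '10053 trace name context off';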

When the optimizer sees that we will be doing multiple JSON_VALUE expressions, it can transform the query to use a single JSON_TABLE so that the required JSON elements are extracted once, and then each requested value obtained without further effort.

As far as I can tell, it will do this without consideration of the size of the JSON. It appears to be a simple “is there more than one JSON_VALUE?” check, at which point we flip over to the JSON_TABLE transformation. If you have particular scenarios where you know the JSON_VALUE evaluation will be very efficient (e.g., only 2 JSON_VALUE expressions against small JSON documents), then it might be slightly more efficient to avoid the transformation. To do this, you can use the NO_JSON_TABLE_TRANSFORM hint.

In fact, to demonstrate the slow-down effect in the first demo above, that is the very hint I had to use; I just omitted it from the text. 😀
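
In other words, the first demo was really executed along these lines:

select /*+ no_json_table_transform */
       json_value(b,'$.db.items[*].tag1') c1,
       json_value(b,'$.db.list[*].tag2') c2,
       json_value(b,'$.db.elems[*].tag3') c3
from   t;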

This hint is documented in the 23ai documentation, but works in 21c and 19c as well.
