A cool feature in PL/SQL is the ability to perform common DML operations without needing to reference the individual columns of a table. The ability to do a “SELECT *” into a ROWTYPE variable dates as far back as Oracle 7, but the benefits were limited because the moment you needed to issue an UPDATE or INSERT with that variable, you were back to referencing each of the columns.

But with Oracle 9, PL/SQL was enhanced to allow ROWTYPE variables to be used directly in INSERT and UPDATE. The benefits here were in both convenience (less typing) and maintainability (if you dropped/added columns, no code changes would be needed in that part of the code).
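As a quick refresher, here is the shape of that syntax – a sketch only, assuming a table T with a numeric key column X and a column OWNER (names are illustrative, not from a real schema):

```sql
declare
  r t%rowtype;
begin
  -- fetch an entire row into the record
  select * into r from t where x = 123;

  -- adjust whichever fields we like
  r.owner := 'SCOTT';

  -- write the whole record back; no column list required
  update t set row = r where x = 123;

  -- records can also be inserted directly
  r.x := 9999;
  insert into t values r;
end;
/
```

If a column is later added to T, the record picks it up automatically on recompilation and none of this code needs to change.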

The SQL engine does not truly support this concept – the mechanism via which PL/SQL achieves it is simply to exploit the data dictionary to derive the columns that are needed. For example, I’ll create a table called T based on some columns from DBA_OBJECTS.


SQL> create table t as select distinct rownum x, owner
  2        ,object_name
  3        ,subobject_name
  4        ,object_id
  5        ,data_object_id
  6        ,object_type
  7        ,created
  8        ,last_ddl_time
  9        ,timestamp from dba_objects d;

Table created.

and then use a 10046 trace to check the DML issued when the SET ROW clause is used from an anonymous block.


SQL> alter session set events = '10046 trace name context forever, level 1';

Session altered.

SQL> declare
  2    r  t%rowtype;
  3    pk int;
  4  begin
  5    select * into r from t where x = 123;
  6    update t set row = r where x = pk;
  7  end;
  8  /

PL/SQL procedure successfully completed.

PARSING IN CURSOR #1918952408816 len=142 dep=0 uid=109 oct=47 lid=109 tim=424165845952 hv=4226578133 ad='7ffe2e15fb48' sqlid='2u5garrxysxqp'
declare
  r  t%rowtype;
  pk int;
begin
  select * into r from t where x = 123;
  update t set row = r where x = pk;
end;
END OF STMT
=====================

PARSING IN CURSOR #1918953162216 len=228 dep=1 uid=109 oct=6 lid=109 tim=424165847980 hv=2552771006 ad='7ffe164c2268' sqlid='cqak5jqc2hcdy'
UPDATE T 
SET "X" = :B1 ,
    "OWNER" = :B2 ,
    "OBJECT_NAME" = :B3 ,
    "SUBOBJECT_NAME" = :B4 ,
    "OBJECT_ID" = :B5 ,
    "DATA_OBJECT_ID" = :B6 ,
    "OBJECT_TYPE" = :B7 ,
    "CREATED" = :B8 ,
    "LAST_DDL_TIME" = :B9 ,
    "TIMESTAMP" = :B10 
WHERE X = :B1 
END OF STMT

You can see that the SQL in the PL/SQL source code does not represent what actually got executed by the database engine. PL/SQL simply expanded the column list to construct a “normal” UPDATE statement. For some applications, that is going to be just fine. You get a programming convenience, plus a little bit of future proofing for table changes.
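The “data dictionary” part is nothing exotic – it is just the standard column metadata. If you want to see the list that the expansion works from, a query along these lines against USER_TAB_COLUMNS shows the columns, in the order they will appear in the generated statement:

```sql
select column_id, column_name, data_type
from   user_tab_columns
where  table_name = 'T'
order  by column_id;
```

The COLUMN_ID ordering is why the bind variables in the trace (:B1, :B2, …) line up with the columns in table definition order.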

But…and there is always a but 😁. It is important to realise the trade-offs you could be making. Let me create a new table, T_PARENT, this time with a LOB column.


SQL> create table t_parent as select distinct rownum x, owner
  2        ,object_name
  3        ,subobject_name
  4        ,object_id
  5        ,data_object_id
  6        ,object_type
  7        ,created
  8        ,last_ddl_time
  9        ,rpad('x',32000) lobcol
 10  from dba_objects d;

Table created.

SQL> alter table t_parent add primary key (x);

Table altered.

The dictionary expansion does not distinguish between data types, and there are no optimizations to omit columns that have not been altered by the UPDATE. All columns are included, so let’s build a little benchmark procedure to hammer some heavy updates at this table. My first version codes up a conventional UPDATE command and omits the LOB, on the assumption that it is not being changed.


SQL> create or replace
  2  procedure smasher(s int) is
  3    r t_parent%rowtype;
  4    pk int;
  5    cnt int;
  6    s1 timestamp;
  7  begin
  8    dbms_random.seed(s);
  9    select count(*) into cnt from t_parent;
 10
 11    s1 := localtimestamp;
 12    for i in 1 .. 100000 loop
 13      pk := dbms_random.value(1,cnt);
 14      select * into r from t_parent where x = pk;
 15      update t_parent set
 16        owner = r.owner
 17        ,object_name = r.object_name
 18        ,subobject_name = r.subobject_name
 19        ,object_id = r.object_id
 20        ,data_object_id = r.data_object_id
 21        ,object_type = r.object_type
 22        ,created = r.created
 23        ,last_ddl_time = r.last_ddl_time
 24      where x = pk;
 25      commit;
 26    end loop;
 27    dbms_output.put_line(localtimestamp-s1);
 28  end;
 29  /

Procedure created.

SQL> set timing on
SQL> exec smasher(10)

PL/SQL procedure successfully completed.

Elapsed: 00:00:12.74

We took just under 13 seconds to run our 100,000 changes. Now, as a savvy developer 😁, I decide to simplify the code by converting the UPDATE to a simple SET ROW command.


SQL> create or replace
  2  procedure smasher(s int) is
  3    r t_parent%rowtype;
  4    pk int;
  5    cnt int;
  6    s1 timestamp;
  7  begin
  8    dbms_random.seed(s);
  9    select count(*) into cnt from t_parent;
 10
 11    s1 := localtimestamp;
 12    for i in 1 .. 100000 loop
 13      pk := dbms_random.value(1,cnt);
 14      select * into r from t_parent where x = pk;
 15      update t_parent set row = r where x = pk;
 16      commit;
 17    end loop;
 18    dbms_output.put_line(localtimestamp-s1);
 19  end;
 20  /

Procedure created.

SQL> set timing on
SQL> exec smasher(10)

PL/SQL procedure successfully completed.

Elapsed: 00:00:36.13

and boom! My run time has nearly tripled. Updating a complete LOB, especially one that is stored out of line, is far more expensive than simply updating scalar column values in place, and we pay that price even though we never touched the LOB contents.
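If you want to keep the SET ROW convenience but leave the LOB alone, one possible workaround (which I have not benchmarked here) is to declare the record against a view that excludes the LOB column. The view below is hypothetical, and I’m assuming that a simple single-table view like this remains updatable:

```sql
create or replace view t_parent_nolob as
select x, owner, object_name, subobject_name, object_id,
       data_object_id, object_type, created, last_ddl_time
from   t_parent;
```

```sql
declare
  r t_parent_nolob%rowtype;
begin
  select * into r from t_parent_nolob where x = 123;
  -- the dictionary expansion is driven by the view's columns,
  -- so the LOB never appears in the generated UPDATE
  update t_parent_nolob set row = r where x = 123;
end;
/
```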

There is another scenario to be aware of, even if you are not using LOB columns. Take another look at the trace file, taking note of the columns that are updated:


PARSING IN CURSOR #1918953162216 len=228 dep=1 uid=109 oct=6 lid=109 tim=424165847980 hv=2552771006 ad='7ffe164c2268' sqlid='cqak5jqc2hcdy'
UPDATE T 
SET "X" = :B1 ,
    "OWNER" = :B2 ,
    "OBJECT_NAME" = :B3 ,
    "SUBOBJECT_NAME" = :B4 ,
    "OBJECT_ID" = :B5 ,
    "DATA_OBJECT_ID" = :B6 ,
    "OBJECT_TYPE" = :B7 ,
    "CREATED" = :B8 ,
    "LAST_DDL_TIME" = :B9 ,
    "TIMESTAMP" = :B10 
WHERE X = :B1 
END OF STMT

All columns are updated, including the primary key, even though we are just setting the value to itself. As we know from my podcast (shameless plug), if you update a primary key in a table, this can have locking implications for any child tables that refer back to the parent table.

Let’s put that scenario in place. I’ll have a parent and a child table, and also a semaphore table which is used solely to allow some cross-session coordination for my benchmark.


SQL> create table semaphore as select rownum seq from emp ;

Table created.

SQL> create table t_parent as select distinct rownum x, owner
  2        ,object_name
  3        ,subobject_name
  4        ,object_id
  5        ,data_object_id
  6        ,object_type
  7        ,created
  8        ,last_ddl_time
  9        ,timestamp from dba_objects d;

Table created.

SQL> create table t_child as select t_parent.* from t_parent, ( select 1 from dual connect by level <= 10);

Table created.

SQL> alter table t_parent add primary key (x);

Table altered.

SQL> alter table t_child add foreign key ( x) references t_parent (x );

Table altered.

Now I’m going to have two sessions concurrently smashing UPDATE statements against the parent table with random values. First I’ll adjust my benchmark procedure to update the parent table using conventional UPDATE DML, which will not nominate the primary key column.


SQL> create or replace
  2  procedure smasher(s int) is
  3    r t_parent%rowtype;
  4    pk int;
  5    cnt int;
  6    s1 timestamp;
  7  begin
  8    dbms_random.seed(s);
  9    select count(*) into cnt from t_parent;
 10
 11    s1 := localtimestamp;
 12    for i in 1 .. 100000 loop
 13      pk := dbms_random.value(1,cnt);
 14      select * into r from t_parent where x = pk;
 15      update t_parent set
 16        owner = r.owner
 17        ,object_name = r.object_name
 18        ,subobject_name = r.subobject_name
 19        ,object_id = r.object_id
 20        ,data_object_id = r.data_object_id
 21        ,object_type = r.object_type
 22        ,created = r.created
 23        ,last_ddl_time = r.last_ddl_time
 24        ,timestamp     = r.timestamp
 25      where x = pk;
 26      commit;
 27    end loop;
 28    dbms_output.put_line(localtimestamp-s1);
 29  end;
 30  /

Procedure created.

And now I can use my semaphore table to let two sessions commence their 100,000 updates at the same time.


Session 1
=========
SQL> lock table semaphore in exclusive mode;

Table(s) Locked.

Two more sessions sit waiting for me to commit in session 1, and then they start their benchmark runs. For brevity, you can assume I got sessions 2 and 3 queued up, and then came back and committed here in session 1.


Session 2
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 2 for update;

       SEQ
----------
         2

SQL> exec smasher(10)

+000000000 00:00:04.929000000

PL/SQL procedure successfully completed.


Session 3
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 3 for update;

       SEQ
----------
         3

SQL> exec smasher(20)

+000000000 00:00:04.713000000

PL/SQL procedure successfully completed.

Once again, let’s assume our developer has decided to simplify the code with SET ROW. Here is my amended benchmark routine.


SQL> create or replace
  2  procedure smasher(s int) is
  3    r t_parent%rowtype;
  4    pk int;
  5    cnt int;
  6    s1 timestamp;
  7  begin
  8    dbms_random.seed(s);
  9    select count(*) into cnt from t_parent;
 10
 11    s1 := localtimestamp;
 12    for i in 1 .. 100000 loop
 13      pk := dbms_random.value(1,cnt);
 14      select * into r from t_parent where x = pk;
 15      update t_parent set row = r where x = pk;
 16      commit;
 17    end loop;
 18    dbms_output.put_line(localtimestamp-s1);
 19  end;
 20  /

Procedure created.

And now I’ll run the exact same set of updates in two sessions (using the same semaphore setup to control the start moment).


Session 2
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 2 for update;

       SEQ
----------
         2

SQL> exec smasher(10)

+000000000 00:00:06.328000000

PL/SQL procedure successfully completed.


Session 3
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 3 for update;

       SEQ
----------
         3

SQL> exec smasher(20)

+000000000 00:00:06.293000000

PL/SQL procedure successfully completed.

For that programming convenience, I’m taking about a 30% performance hit. This is because, as I update the primary key on the parent table, I’m taking a transient lock on the child table, because there is no index on the foreign key column of the child. In environments with a higher amount of concurrent activity on the table, you can expect the degradation to be even higher.
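If you’re not sure whether you have unindexed foreign keys lurking in your own schema, a simplified dictionary query like the one below will flag candidates. Treat it as a sketch only – it handles single-column foreign keys and ignores column position subtleties in composite indexes:

```sql
select c.table_name, cc.column_name
from   user_constraints  c,
       user_cons_columns cc
where  c.constraint_type = 'R'
and    c.constraint_name = cc.constraint_name
and    not exists (
         select 1
         from   user_ind_columns ic
         where  ic.table_name      = cc.table_name
         and    ic.column_name     = cc.column_name
         and    ic.column_position = 1 );
```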

“Easy fix!” I hear you say. “I’ll just add a foreign key index”. Let’s go ahead and do that.


SQL> create index t_childix on t_child ( x );

Index created.


Session 2
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 2 for update;

       SEQ
----------
         2

SQL> exec smasher(10)

+000000000 00:00:05.252000000

PL/SQL procedure successfully completed.


Session 3
=========
SQL> set serverout on
SQL> select * from semaphore where seq = 3 for update;

       SEQ
----------
         3

SQL> exec smasher(20)

+000000000 00:00:05.317000000

PL/SQL procedure successfully completed.

We have regained some of the performance, but it is still slower than the conventional UPDATE because we are still touching the primary key index on the parent table when we did not need to. We also now have an index, created solely to assist with a programming convenience, that we might not otherwise have needed.
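If you want to quantify the extra work directly rather than relying on elapsed times, a before/after snapshot of a couple of session statistics works well. A sketch – run this in the benchmark session before and after each call to SMASHER and diff the values:

```sql
select n.name, s.value
from   v$mystat   s,
       v$statname n
where  s.statistic# = n.statistic#
and    n.name in ('redo size', 'db block changes');
```

The SET ROW runs should show a noticeably larger delta, reflecting the additional index maintenance.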

Don’t get me wrong. I’m not dunking on the feature and telling you not to use it. But always be aware of the potential costs that might arise in certain scenarios. I’m logging an enhancement request so that we do not include the primary key in the target columns of the expanded UPDATE, but until that comes to fruition, just take an informed look at each case where you’re using SET ROW before making it a company-wide standard.
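In the interim, one middle ground is to let the dictionary do the typing for you once, at development time. A LISTAGG query along these lines (a sketch – adjust table and key column names to suit) will spit out the SET clause for a conventional UPDATE, minus the primary key column:

```sql
select listagg(lower(column_name) || ' = r.' || lower(column_name),
               chr(10) || '      ,') within group (order by column_id) set_clause
from   user_tab_columns
where  table_name = 'T_PARENT'
and    column_name <> 'X';
```

You keep the one-off convenience of not typing the column list, without paying the run-time cost of updating columns you never touched.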
