The simplest things….can be risky

Java and Oracle expert Lukas Eder tweeted yesterday about a potential optimization that could be done when reviewing database SQL code.


This looks to be a logical thing to do.  Why scan the table T twice to perform an update, when the same job could be done in a single pass.  The benefits seem obvious:

  • less I/O work
  • less time the data is spent locked
  • less risk of an error between the two operations

so don’t get me wrong – the consolidation is going to be a good thing in the majority of cases

And therein lies the rub – the “majority” of cases is the not the same as “all” cases, and that is why I don’t think a tool should ever automatically perform this change. I’d be cool with a tool making a recommendation but let’s see why you cannot just assume that the consolidation is correct.

Here’s our table with a single row and single business rule implement with a check constraint

SQL> create table t ( a int, b int );

Table created.

SQL> alter table t add constraint chk check ( a < b ) ;

Table altered.

SQL> insert into t values (1,2);

1 row created.

SQL> commit;

Commit complete.

Now I’ll implement the application in the original “unoptimized” way:

SQL> update t set a = a + 1;
update t set a = a + 1
ERROR at line 1:
ORA-02290: check constraint (MCDONAC.CHK) violated

SQL> update t set b = b*2;

1 row updated.

You can see that the first update failed – it violated the check constraint. Of course, it is not definitively clear whether this should be the case based on the business requirements, because I haven’t elaborated on whether these two updates should be two transactions or a single transaction. The correctness is not really the point I’m trying to make here, but that if I now choose to consolidate the update, I end up with a different application behaviour.

I’ll roll back the change above, and repeat the experiment using the consolidate update:

SQL> roll;
Rollback complete.
SQL> update t set a = a + 1, b = b*2;

1 row updated.

This time the update completes successfully. If a tool had automatically done this, then I will get a different behaviour in my application. That might be a good might not be. I could eliminate the difference by implementing the constraint in a DEFERRED usage, but we’re starting to depart even further from the existing implementation of the application code, which means more scrutiny and more regression testing.

So by all means, explore opportunities to improve the performance of your SQL by re-arranging it, consolidating it, and aiming to get more done with less work. But be careful that you do not unknowingly change the way your application works when you do so.


Performing a large correlated update

We had a question on AskTom recently asking for the best way to update a new column on a very large table.  Four in-place update options are at our disposal:

  • Serial update SQL
  • Parallel DML update SQL
  • PL/SQL batching (as long as sensible restart-after-error logic is readily available)

all of which will achieve the required outcome but they have a significant redo and undo cost associated with them.  Also, depending on the size of the new column, such an update might create chaos with row migration, because whilst unlikely, there is definitely the potential for every row to grow beyond the available block space required to hold it in situ.

So a common approach to tackling a large “update” is to recast the problem to creating a new version (copy) of the table.  This can be done with a CREATE TABLE AS SELECT (CTAS).  Using this method we get the benefits of less resource consumption, but two issues can get in the way of this approach:

  1. It requires significant downtime whilst the new table is populated
  2. It requires careful planning to ensure all of the dependent objects (indexes, constraints, etc etc) are correctly recreated on the new table

But perhaps there is an alternative – it might not be 100% as fast as a pure CTAS, but what if it solves both the issues mentioned above ?  That alternative is DBMS_REDEFINITION

People often dismiss DBMS_REDEFINITION as purely for changing the structure of a table without changing the underlying data (for example, partitioning a table).  But they fail to take notice of the column mapping functionality that is available – and as long as our column mapping expressions are deterministic, then we can get more value out of the package, including a correlated update.

So here’s our example:

We will have a table called T1, which is a copy of DBA_OBJECTS and will be the table to which we will want a add a new column

SQL> create table t1  as
  2  select owner, object_name, subobject_name,
  3       object_id, data_object_id, object_type,
  4       created, last_ddl_time, timestamp, status,
  5       temporary, generated, secondary
  6    from all_objects
  7    where object_id is not null;

Table created.

SQL> alter table t1
  2    add constraint t1_pk
  3    primary key(object_id);

Table altered.

T2 is the table from which we will want to source the new data to apply to T1. In this case, T2 has a primary key of OBJECT_ID which can be used to source a new data element called FNAME. In our example, this is just owner and object name concatenated.

SQL> create table t2 as select object_id, owner||'.'||object_name fname
  2  from all_objects
  3    where object_id is not null;

Table created.

SQL> alter table t2
  2    add constraint t2_pk
  3    primary key(object_id);

Table altered.

So here is the requirement. T1 gets a new column called FULL_NAME. It must be populated via the FNAME column on T2 via the OBJECT_ID lookup. We can see that T1 starts with FULL_NAME being null, and we can see a sample of the values that should come from T2

SQL> alter table t1 add full_name varchar2(200);

Table altered.

SQL> select object_id, owner, full_name from t1
  2  where rownum < 10;

 OBJECT_ID OWNER                          FULL_NAME
---------- ------------------------------ --------------------------------------------------
        30 SYS
        47 SYS
        32 SYS
        49 SYS
        17 SYS
         2 SYS
        29 SYS
        45 SYS
        10 SYS

9 rows selected.

SQL> select object_id, fname from t2
  2  where rownum < 10;

---------- ---------------------------
        30 SYS.I_COBJ#
        47 SYS.I_USER2
        32 SYS.CCOL$
        49 SYS.I_COL2
        17 SYS.FILE$
         2 SYS.C_OBJ#
        29 SYS.C_COBJ#
        45 SYS.I_TS1
        10 SYS.C_USER#

We can’t do joins or subqueries in DBMS_REDEFINITION but we can do deterministic expressions. In our case, we’ll use a deterministic PL/SQL function to mimic the join

SQL> create or replace
  2  function get_full(p_id int )  return varchar2 deterministic is
  3    f varchar2(100);
  4  begin
  5    select  fname
  6    into f
  7    from t2
  8    where object_id = p_id;
 10    return f;
 11  end;
 12  /

Function created.

As per normal, we create an interim table to hold the converted data as we redefine the table.

SQL> create table t_interim (
  2   owner                         varchar2(128),
  3   object_name                   varchar2(128),
  4   subobject_name                varchar2(128),
  5   object_id                     number,
  6   data_object_id                number,
  7   object_type                   varchar2(23),
  8   created                       date,
  9   last_ddl_time                 date,
 10   timestamp                     varchar2(19),
 11   status                        varchar2(7),
 12   temporary                     varchar2(1),
 13   generated                     varchar2(1),
 14   secondary                     varchar2(1),
 15   full_name                     varchar2(200)
 16  );

Table created.

And now we are ready to use DBMS_REDEFINITION. The critical part here is the COL_MAPPING parameter. We are using the call to GET_FULL to populate the new column that will ultimately end up on T1.

SQL> declare
  2      l_colmap varchar(512);
  3    begin
  4      l_colmap :=
  5            'OWNER
  6            ,OBJECT_NAME
  7            ,SUBOBJECT_NAME
  8            ,OBJECT_ID
  9            ,DATA_OBJECT_ID
 10            ,OBJECT_TYPE
 11            ,CREATED
 12            ,LAST_DDL_TIME
 13            ,TIMESTAMP
 14            ,STATUS
 15            ,TEMPORARY
 16            ,GENERATED
 17            ,SECONDARY
 18            ,get_full(OBJECT_ID) FULL_NAME';
 20      dbms_redefinition.start_redef_table
 21      (  uname           => user,
 22         orig_table      => 'T1',
 23         int_table       => 'T_INTERIM',
 24         col_mapping   => l_colmap );
 25   end;
 26   /

PL/SQL procedure successfully completed.

And here is the nice part. By using DBMS_REDEFINITION, we dont need to worry about the dependencies – we can let COPY_TABLE_DEPENDENTS do the heavy lifting.

SQL> variable nerrors number
SQL> begin
  2    dbms_redefinition.copy_table_dependents
  3      ( user, 'T1', 'T_INTERIM',
  4        copy_indexes => dbms_redefinition.cons_orig_params,
  5        num_errors => :nerrors );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> print nerrors


SQL> begin
  2    dbms_redefinition.finish_redef_table
  3      ( user, 'T1', 'T_INTERIM' );
  4  end;
  5  /

PL/SQL procedure successfully completed.

And we’re done ! A correlated update using DBMS_REDEFINITION.

SQL> select object_id, owner, full_name from t1
  2  where rownum < 10;

 OBJECT_ID OWNER                          FULL_NAME
---------- ------------------------------ --------------------------------------------------
        30 SYS                            SYS.I_COBJ#
        47 SYS                            SYS.I_USER2
        32 SYS                            SYS.CCOL$
        49 SYS                            SYS.I_COL2
        17 SYS                            SYS.FILE$
         2 SYS                            SYS.C_OBJ#
        29 SYS                            SYS.C_COBJ#
        45 SYS                            SYS.I_TS1
        10 SYS                            SYS.C_USER#

9 rows selected.


You might be thinking “But we’ll be slow because we’re calling that PL/SQL millions of times”.  Yes, this will be slower than a pure CTAS operation, but by using DBMS_REDEFINITION the process is online with nearly zero service disruption.  Hence the overall execution time is not nearly so critical anymore as a CTAS where applications are unavailable during the activity.