We had a question on AskTom recently asking for the best way to populate a new column on a very large table. Four in-place update options are at our disposal:
- Serial update SQL
- Parallel DML update SQL
- DBMS_PARALLEL_EXECUTE
- PL/SQL batching (as long as sensible restart-after-error logic is readily available)
All of these will achieve the required outcome, but they carry a significant redo and undo cost. Also, depending on the size of the new column, such an update can wreak havoc with row migration, because every row grows and may no longer fit in the free space of the block that currently holds it.
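As a reference point, the parallel DML flavour of such an in-place update might look like the sketch below. The table name, column expression and parallel degree are all illustrative placeholders, not part of the example that follows:

```sql
-- Sketch only: parallel DML version of an in-place update.
-- BIG_T, NEW_COL and the degree of 8 are illustrative placeholders.
-- Parallel DML is disabled by default and must be enabled per session.
alter session enable parallel dml;

update /*+ parallel(t 8) */ big_t t
set    t.new_col = upper(t.owner) || '.' || t.object_name;

commit;
```

Even spread across many parallel slaves, the redo/undo cost described above is still paid in full.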
So a common approach to tackling a large “update” is to recast the problem as creating a new version (copy) of the table, typically via a CREATE TABLE AS SELECT (CTAS). This method consumes far fewer resources, but two issues can get in the way:
- It requires significant downtime whilst the new table is populated
- It requires careful planning to ensure all of the dependent objects (indexes, constraints, etc.) are correctly recreated on the new table
But perhaps there is an alternative. It might not be quite as fast as a pure CTAS, but what if it solves both of the issues mentioned above? That alternative is DBMS_REDEFINITION.
People often dismiss DBMS_REDEFINITION as being purely for changing the structure of a table without changing the underlying data (for example, partitioning a table). But they overlook the column mapping functionality it offers, and as long as our column mapping expressions are deterministic, we can get more value out of the package, including a correlated update.
So here’s our example:
We will have a table called T1, which is a copy of DBA_OBJECTS and is the table to which we want to add a new column.
SQL>
SQL> create table t1 as
2 select owner, object_name, subobject_name,
3 object_id, data_object_id, object_type,
4 created, last_ddl_time, timestamp, status,
5 temporary, generated, secondary
6 from all_objects
7 where object_id is not null;
Table created.
SQL>
SQL> alter table t1
2 add constraint t1_pk
3 primary key(object_id);
Table altered.
T2 is the table from which we will want to source the new data to apply to T1. In this case, T2 has a primary key of OBJECT_ID which can be used to source a new data element called FNAME. In our example, this is just owner and object name concatenated.
SQL>
SQL> create table t2 as select object_id, owner||'.'||object_name fname
2 from all_objects
3 where object_id is not null;
Table created.
SQL>
SQL> alter table t2
2 add constraint t2_pk
3 primary key(object_id);
Table altered.
So here is the requirement: T1 gets a new column called FULL_NAME, which must be populated from the FNAME column on T2 via an OBJECT_ID lookup. We can see that T1 starts with FULL_NAME null, and we can see a sample of the values that should come from T2.
SQL> alter table t1 add full_name varchar2(200);
Table altered.
SQL>
SQL> select object_id, owner, full_name from t1
2 where rownum < 10;
OBJECT_ID OWNER FULL_NAME
---------- ------------------------------ --------------------------------------------------
30 SYS
47 SYS
32 SYS
49 SYS
17 SYS
2 SYS
29 SYS
45 SYS
10 SYS
9 rows selected.
SQL>
SQL> select object_id, fname from t2
2 where rownum < 10;
OBJECT_ID FNAME
---------- ---------------------------
30 SYS.I_COBJ#
47 SYS.I_USER2
32 SYS.CCOL$
49 SYS.I_COL2
17 SYS.FILE$
2 SYS.C_OBJ#
29 SYS.C_COBJ#
45 SYS.I_TS1
10 SYS.C_USER#
We can’t use joins or subqueries in a DBMS_REDEFINITION column mapping, but we can use deterministic expressions. In our case, we’ll use a PL/SQL function declared DETERMINISTIC to mimic the join.
SQL>
SQL> create or replace
2 function get_full(p_id int ) return varchar2 deterministic is
3 f varchar2(100);
4 begin
5 select fname
6 into f
7 from t2
8 where object_id = p_id;
9
10 return f;
11 end;
12 /
Function created.
As usual, we create an interim table to hold the converted data as we redefine the table.
SQL>
SQL> create table t_interim (
2 owner varchar2(128),
3 object_name varchar2(128),
4 subobject_name varchar2(128),
5 object_id number,
6 data_object_id number,
7 object_type varchar2(23),
8 created date,
9 last_ddl_time date,
10 timestamp varchar2(19),
11 status varchar2(7),
12 temporary varchar2(1),
13 generated varchar2(1),
14 secondary varchar2(1),
15 full_name varchar2(200)
16 );
Table created.
And now we are ready to use DBMS_REDEFINITION. The critical part here is the COL_MAPPING parameter. We are using the call to GET_FULL to populate the new column that will ultimately end up on T1.
SQL>
SQL>
SQL> declare
2 l_colmap varchar(512);
3 begin
4 l_colmap :=
5 'OWNER
6 ,OBJECT_NAME
7 ,SUBOBJECT_NAME
8 ,OBJECT_ID
9 ,DATA_OBJECT_ID
10 ,OBJECT_TYPE
11 ,CREATED
12 ,LAST_DDL_TIME
13 ,TIMESTAMP
14 ,STATUS
15 ,TEMPORARY
16 ,GENERATED
17 ,SECONDARY
18 ,get_full(OBJECT_ID) FULL_NAME';
19
20 dbms_redefinition.start_redef_table
21 ( uname => user,
22 orig_table => 'T1',
23 int_table => 'T_INTERIM',
24 col_mapping => l_colmap );
25 end;
26 /
PL/SQL procedure successfully completed.
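One caveat worth knowing: if START_REDEF_TABLE fails partway (for example, if the column mapping expression raises an error), the partial redefinition must be cleaned up with ABORT_REDEF_TABLE before you can retry. A minimal sketch:

```sql
-- Clean up a failed or abandoned redefinition before retrying.
begin
  dbms_redefinition.abort_redef_table
    ( uname      => user,
      orig_table => 'T1',
      int_table  => 'T_INTERIM' );
end;
/
```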
And here is the nice part. By using DBMS_REDEFINITION, we don’t need to worry about the dependencies; we can let COPY_TABLE_DEPENDENTS do the heavy lifting.
SQL>
SQL> variable nerrors number
SQL> begin
2 dbms_redefinition.copy_table_dependents
3 ( user, 'T1', 'T_INTERIM',
4 copy_indexes => dbms_redefinition.cons_orig_params,
5 num_errors => :nerrors );
6 end;
7 /
PL/SQL procedure successfully completed.
SQL>
SQL> print nerrors
NERRORS
----------
0
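If T1 continues to receive DML while the dependents are being copied, we can optionally resynchronise the interim table before the final swap, which reduces the work (and lock time) that FINISH_REDEF_TABLE has to do:

```sql
-- Optional: apply changes made to T1 since START_REDEF_TABLE
-- so that FINISH_REDEF_TABLE has less catching up to do.
begin
  dbms_redefinition.sync_interim_table
    ( uname      => user,
      orig_table => 'T1',
      int_table  => 'T_INTERIM' );
end;
/
```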
SQL> begin
2 dbms_redefinition.finish_redef_table
3 ( user, 'T1', 'T_INTERIM' );
4 end;
5 /
PL/SQL procedure successfully completed.
And we’re done! A correlated update using DBMS_REDEFINITION.
SQL>
SQL> select object_id, owner, full_name from t1
2 where rownum < 10;
OBJECT_ID OWNER FULL_NAME
---------- ------------------------------ --------------------------------------------------
30 SYS SYS.I_COBJ#
47 SYS SYS.I_USER2
32 SYS SYS.CCOL$
49 SYS SYS.I_COL2
17 SYS SYS.FILE$
2 SYS SYS.C_OBJ#
29 SYS SYS.C_COBJ#
45 SYS SYS.I_TS1
10 SYS SYS.C_USER#
9 rows selected.
SQL>
SQL>
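Once we are happy with the result, the interim table (which after the swap holds the pre-redefinition version of the data) and the helper function can be dropped:

```sql
-- T_INTERIM now contains the old version of the data;
-- drop it, and the GET_FULL helper, once the result is verified.
drop table t_interim purge;
drop function get_full;
```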
You might be thinking, “But this will be slow because we’re calling that PL/SQL function millions of times.” Yes, it will be slower than a pure CTAS, but because DBMS_REDEFINITION performs the operation online, there is almost zero service disruption. Hence the overall execution time matters far less than with a CTAS, where applications are unavailable for the duration of the activity.
Got some thoughts? Leave a comment