We had a question on AskTom recently asking for the best way to populate a new column on a very large table. Four in-place update options are at our disposal:
- Serial update SQL
- Parallel DML update SQL
- DBMS_PARALLEL_EXECUTE
- PL/SQL batching (as long as sensible restart-after-error logic is readily available)
all of which will achieve the required outcome, but all carry a significant redo and undo cost. Also, depending on the size of the new column, such an update might wreak havoc with row migration, because whilst unlikely, there is definitely the potential for every row to grow beyond the space available in its block to hold it in situ.
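By way of illustration, the parallel DML option might look like the following sketch (the table and column names anticipate the T1/T2 example later in this post; the redo, undo and row-migration costs described above still apply):

```sql
-- Sketch only: parallel DML update sourcing the new column via a lookup.
-- The degree of parallelism (8) is arbitrary here.
alter session enable parallel dml;

update /*+ parallel(t 8) */ t1 t
set    t.full_name = ( select t2.fname
                       from   t2
                       where  t2.object_id = t.object_id );

commit;
```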
So a common approach to tackling a large “update” is to recast the problem to creating a new version (copy) of the table. This can be done with a CREATE TABLE AS SELECT (CTAS). Using this method we get the benefits of less resource consumption, but two issues can get in the way of this approach:
- It requires significant downtime whilst the new table is populated
- It requires careful planning to ensure all of the dependent objects (indexes, constraints, etc etc) are correctly recreated on the new table
But perhaps there is an alternative. It might not be 100% as fast as a pure CTAS, but what if it solves both of the issues mentioned above? That alternative is DBMS_REDEFINITION.
People often dismiss DBMS_REDEFINITION as purely for changing the structure of a table without changing the underlying data (for example, partitioning a table). But they fail to take notice of the column mapping functionality that is available – and as long as our column mapping expressions are deterministic, then we can get more value out of the package, including a correlated update.
So here’s our example:
We will have a table called T1, which is a copy of ALL_OBJECTS and is the table to which we want to add a new column.
SQL>
SQL> create table t1 as
2 select owner, object_name, subobject_name,
3 object_id, data_object_id, object_type,
4 created, last_ddl_time, timestamp, status,
5 temporary, generated, secondary
6 from all_objects
7 where object_id is not null;
Table created.
SQL>
SQL> alter table t1
2 add constraint t1_pk
3 primary key(object_id);
Table altered.
T2 is the table from which we will want to source the new data to apply to T1. In this case, T2 has a primary key of OBJECT_ID which can be used to source a new data element called FNAME. In our example, this is just owner and object name concatenated.
SQL>
SQL> create table t2 as select object_id, owner||'.'||object_name fname
2 from all_objects
3 where object_id is not null;
Table created.
SQL>
SQL> alter table t2
2 add constraint t2_pk
3 primary key(object_id);
Table altered.
So here is the requirement: T1 gets a new column called FULL_NAME, which must be populated from the FNAME column on T2 via an OBJECT_ID lookup. We can see that T1 starts with FULL_NAME null, along with a sample of the values that should come from T2.
SQL> alter table t1 add full_name varchar2(200);
Table altered.
SQL>
SQL> select object_id, owner, full_name from t1
2 where rownum < 10;
OBJECT_ID OWNER FULL_NAME
---------- ------------------------------ --------------------------------------------------
30 SYS
47 SYS
32 SYS
49 SYS
17 SYS
2 SYS
29 SYS
45 SYS
10 SYS
9 rows selected.
SQL>
SQL> select object_id, fname from t2
2 where rownum < 10;
OBJECT_ID FNAME
---------- ---------------------------
30 SYS.I_COBJ#
47 SYS.I_USER2
32 SYS.CCOL$
49 SYS.I_COL2
17 SYS.FILE$
2 SYS.C_OBJ#
29 SYS.C_COBJ#
45 SYS.I_TS1
10 SYS.C_USER#
We can’t use joins or subqueries in DBMS_REDEFINITION column mappings, but we can use deterministic expressions. In our case, we’ll use a deterministic PL/SQL function to mimic the join.
SQL>
SQL> create or replace
2 function get_full(p_id int ) return varchar2 deterministic is
3 f varchar2(100);
4 begin
5 select fname
6 into f
7 from t2
8 where object_id = p_id;
9
10 return f;
11 end;
12 /
Function created.
As per normal, we create an interim table to hold the converted data as we redefine the table.
SQL>
SQL> create table t_interim (
2 owner varchar2(128),
3 object_name varchar2(128),
4 subobject_name varchar2(128),
5 object_id number,
6 data_object_id number,
7 object_type varchar2(23),
8 created date,
9 last_ddl_time date,
10 timestamp varchar2(19),
11 status varchar2(7),
12 temporary varchar2(1),
13 generated varchar2(1),
14 secondary varchar2(1),
15 full_name varchar2(200)
16 );
Table created.
And now we are ready to use DBMS_REDEFINITION. The critical part here is the COL_MAPPING parameter. We are using the call to GET_FULL to populate the new column that will ultimately end up on T1.
SQL>
SQL>
SQL> declare
2 l_colmap varchar(512);
3 begin
4 l_colmap :=
5 'OWNER
6 ,OBJECT_NAME
7 ,SUBOBJECT_NAME
8 ,OBJECT_ID
9 ,DATA_OBJECT_ID
10 ,OBJECT_TYPE
11 ,CREATED
12 ,LAST_DDL_TIME
13 ,TIMESTAMP
14 ,STATUS
15 ,TEMPORARY
16 ,GENERATED
17 ,SECONDARY
18 ,get_full(OBJECT_ID) FULL_NAME';
19
20 dbms_redefinition.start_redef_table
21 ( uname => user,
22 orig_table => 'T1',
23 int_table => 'T_INTERIM',
24 col_mapping => l_colmap );
25 end;
26 /
PL/SQL procedure successfully completed.
And here is the nice part. By using DBMS_REDEFINITION, we don’t need to worry about the dependencies – we can let COPY_TABLE_DEPENDENTS do the heavy lifting.
SQL>
SQL> variable nerrors number
SQL> begin
2 dbms_redefinition.copy_table_dependents
3 ( user, 'T1', 'T_INTERIM',
4 copy_indexes => dbms_redefinition.cons_orig_params,
5 num_errors => :nerrors );
6 end;
7 /
PL/SQL procedure successfully completed.
SQL>
SQL> print nerrors
NERRORS
----------
0
SQL> begin
2 dbms_redefinition.finish_redef_table
3 ( user, 'T1', 'T_INTERIM' );
4 end;
5 /
PL/SQL procedure successfully completed.
And we’re done! A correlated update using DBMS_REDEFINITION.
SQL>
SQL> select object_id, owner, full_name from t1
2 where rownum < 10;
OBJECT_ID OWNER FULL_NAME
---------- ------------------------------ --------------------------------------------------
30 SYS SYS.I_COBJ#
47 SYS SYS.I_USER2
32 SYS SYS.CCOL$
49 SYS SYS.I_COL2
17 SYS SYS.FILE$
2 SYS SYS.C_OBJ#
29 SYS SYS.C_COBJ#
45 SYS SYS.I_TS1
10 SYS SYS.C_USER#
9 rows selected.
SQL>
SQL>
You might be thinking: “But this will be slow because we’re calling that PL/SQL function millions of times”. Yes, this will be slower than a pure CTAS operation, but by using DBMS_REDEFINITION the process is online, with near-zero service disruption. Hence the overall execution time is nowhere near as critical as it is for a CTAS, where applications are unavailable for the duration of the activity.
Connor,
I can’t help thinking that you ought to have made something of the fact that a function that queries the database shouldn’t be declared deterministic since someone can change the result of the function call by changing the content of the table.
Somewhere you need a session to lock the table exclusively for the duration, or make it readonly.
(P.S. I’m not saying I wouldn’t do this type of thing myself, but the principle needs a mention.)
Indeed, and remiss of me not to mention it. I’ve “generic-ised” the post from the AskTom question – where the lookup table was static, so yes, definitely worth a mention.
You could conceivably work around this with something like:
create or replace
function get_full(p_id int, p_scn int ) return varchar2 deterministic is
f varchar2(100);
begin
select fname
into f
from t2 as of scn p_scn
where object_id = p_id;
return f;
end;
/
and then the process could be:
lock table t2 in exclusive mode;
col x new_value y
select dbms_flashback.get_system_change_number x from dual;
commit;
declare
l_colmap varchar(512);
begin
l_colmap :=
'OWNER
,OBJECT_NAME
,SUBOBJECT_NAME
,OBJECT_ID
,DATA_OBJECT_ID
,OBJECT_TYPE
,CREATED
,LAST_DDL_TIME
,TIMESTAMP
,STATUS
,TEMPORARY
,GENERATED
,SECONDARY
,get_full(OBJECT_ID,&&y) FULL_NAME';
dbms_redefinition.start_redef_table
( uname => user,
orig_table => 'T1',
int_table => 'T_INTERIM',
col_mapping => l_colmap );
end;
/
and whilst this appeared to work, I see this interesting artifact:
SQL> select version_count
2 from v$sqlarea
3 where sql_text like 'SELECT FNAME FROM T2 AS OF SCN :B2 WHERE OBJECT_ID = :B1%';
VERSION_COUNT
-------------
486
so perhaps a more appropriate solution would be a private copy of the lookup table.
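(A sketch of that private-copy idea: take a one-off snapshot of the lookup table and have the function read the static copy, so the deterministic declaration at least holds for the duration of the redefinition.)

```sql
-- Sketch only: a one-off private snapshot of the lookup table
create table t2_snap as
  select object_id, fname from t2;

create or replace
function get_full(p_id int) return varchar2 deterministic is
  f varchar2(100);
begin
  select fname
  into   f
  from   t2_snap          -- static copy, not the live T2
  where  object_id = p_id;
  return f;
end;
/
```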
Cheers,
Connor
Interestingly, in 12.2 they’ve added a procedure called EXECUTE_UPDATE to DBMS_REDEFINITION. You might think this would achieve what you’ve written here without the additional PL/SQL call and the deterministic-function “lie”, and it kind of does, BUT it is really as offline as the CTAS method – it will just handle your constraints, indexes etc for you. Any DML done to the table during the EXECUTE_UPDATE call just gets ignored, and I can see this causing trouble.
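(If memory serves, the 12.2 call takes the UPDATE statement itself, something like the sketch below – but verify the EXECUTE_UPDATE signature against the DBMS_REDEFINITION documentation for your release, and bear in mind the offline behaviour described above.)

```sql
-- Sketch only: 12.2 DBMS_REDEFINITION.EXECUTE_UPDATE; the exact
-- signature should be checked against the docs for your release
begin
  dbms_redefinition.execute_update(
    q'[update t1 set full_name =
         ( select fname from t2 where t2.object_id = t1.object_id )]');
end;
/
```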
What are the advantages/disadvantages of creating a large associative array with a deterministic ‘getter’ function?
create or replace package connor_test as
function get_entry(p_object_id in integer) return varchar2 deterministic;
end connor_test;
/
create or replace package body connor_test as
type hashtab is table of varchar2(200) index by pls_integer;
myhash hashtab;
function get_entry(p_object_id in integer) return varchar2 deterministic is
begin
return myhash(p_object_id);
end get_entry;
begin
for x in (select object_id, owner || '.' || object_name as fullname
from all_objects) loop
myhash(x.object_id) := x.fullname;
end loop;
end connor_test;
/
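(The getter would then slot into the column mapping in place of GET_FULL, e.g. this sketch of the relevant mapping line:)

```sql
-- Sketch only: the mapping entry for the new column would become
,connor_test.get_entry(OBJECT_ID) FULL_NAME
```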
Nice example, but quite strong privileges are required. Besides:
grant execute on dbms_redefinition
a user needs:
CREATE ANY TABLE
ALTER ANY TABLE
DROP ANY TABLE
LOCK ANY TABLE
SELECT ANY TABLE
The following additional privileges are required to execute COPY_TABLE_DEPENDENTS:
CREATE ANY TRIGGER
CREATE ANY INDEX
So, it’s hard to persuade a DBA to give me such grants …
regards,
Dusan