Multi-table insert

An interesting question came through on AskTom recently.  The requirement was to perform a single pass through a source table, and load the data into three target tables.

Now that’s trivially achieved with a multi-table insert, but there was a subtle “twist” on this requirement.  Each of the three target tables may already contain some, none or all of the rows from the source table.  Hence the requirement was to “fill in the blanks”.

So here’s a little demo of one way we could achieve this.

First, here is our source table with 10 rows (1 through 10)


SQL> create table t_source as select rownum s from dual connect by level <= 10;

Table created.

SQL>
SQL> select * from t_source;

         S
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

And here are our three target tables, T1, T2 and T3, each with a subset of the rows already


SQL> create table t1 as select rownum x from dual connect by level <= 5;

Table created.

SQL> create table t2 as select rownum y from dual connect by level <= 3;

Table created.

SQL> create table t3 as select rownum z from dual connect by level <= 6;

Table created.

SQL> select * from t1 order by 1;

         X
----------
         1
         2
         3
         4
         5

5 rows selected.

SQL> select * from t2 order by 1;

         Y
----------
         1
         2
         3

3 rows selected.

SQL> select * from t3 order by 1;

         Z
----------
         1
         2
         3
         4
         5
         6

6 rows selected.

Now obviously we could perform a simple insert-select-where-not-exists style operation for each table, but we need to meet our poster’s requirement of a single pass through the source table. So we will take advantage of an outer join to pick up just those rows that do not already match.
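For contrast, here is a sketch of the conventional approach the poster wanted to avoid, namely one insert-select per target and hence three passes of the source table:

insert into t1 (x)
select s from t_source
where not exists (select null from t1 where t1.x = t_source.s);

insert into t2 (y)
select s from t_source
where not exists (select null from t2 where t2.y = t_source.s);

insert into t3 (z)
select s from t_source
where not exists (select null from t3 where t3.z = t_source.s);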



SQL> insert all
  2    when in_tab1 is null then
  3      into t1 (x ) values (s )
  4    when in_tab2 is null then
  5      into t2 (y ) values (s )
  6    when in_tab3 is null then
  7      into t3 (z ) values (s )
  8  select
  9    t_source.s,
 10    t1.x in_tab1,
 11    t2.y in_tab2,
 12    t3.z in_tab3
 13  from t_source, t1, t2, t3
 14  where t_source.s = t1.x(+)
 15  and t_source.s = t2.y(+)
 16  and t_source.s = t3.z(+)
 17  /

16 rows created.

SQL>
SQL> select * from t1 order by 1;

         X
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

SQL> select * from t2 order by 1;

         Y
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

SQL> select * from t3 order by 1;

         Z
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9
        10

10 rows selected.

SQL>

And the job is done. Our poster never really elaborated on why a single pass was necessary – but let’s assume it was due to the source table being large. If we look at the execution plan, we see a swag of cascading hash joins, so whilst a single pass of the source table has been achieved, there is no guarantee that we’re not going to end up with other issues in processing all of those “concurrent” joins.
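For the curious, the plan below was generated from the query portion of the insert. Something along these lines will reproduce it, although the output formatting varies by version:

explain plan for
select t_source.s, t1.x in_tab1, t2.y in_tab2, t3.z in_tab3
from t_source, t1, t2, t3
where t_source.s = t1.x(+)
and t_source.s = t2.y(+)
and t_source.s = t3.z(+);

select * from table(dbms_xplan.display);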



---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |    10 |   120 |     8   (0)| 00:00:01 |
|*  1 |  HASH JOIN OUTER     |          |    10 |   120 |     8   (0)| 00:00:01 |
|*  2 |   HASH JOIN OUTER    |          |    10 |    90 |     6   (0)| 00:00:01 |
|*  3 |    HASH JOIN OUTER   |          |    10 |    60 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T_SOURCE |    10 |    30 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| T2       |     3 |     9 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL | T1       |     5 |    15 |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS FULL  | T3       |     6 |    18 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T_SOURCE"."S"="T3"."Z"(+))
   2 - access("T_SOURCE"."S"="T1"."X"(+))
   3 - access("T_SOURCE"."S"="T2"."Y"(+))

But that’s often life on AskTom. We only get half the story 🙂

14 comments

  1. Loved the solution. It’s a great day when I learn something new. But I’m not so thrilled with the old-hat, Oracle-style join syntax. 😉 I’ve adopted ANSI join syntax and never looked back. And it gives insight into the difference between join conditions and filtering conditions.

    SELECT t0.s
    ,t1.x in_tab1
    ,t2.y in_tab2
    ,t3.z in_tab3
    FROM t_source t0
    LEFT JOIN t1 ON t0.s = t1.x
    LEFT JOIN t2 ON t0.s = t2.y
    LEFT JOIN t3 ON t0.s = t3.z;

      1. Those are my sentiments too, Krishna. On my project, new code using the (+) outer join syntax won’t pass a peer code review. And if I happen to modify code near such syntax, that is an occasion to “fix what ain’t broke” under the rule of leaving code a little better than you found it.

        1. True. As you know, the (+) syntax was implemented by Oracle for outer joins way ahead of ANSI, but now it is deprecated. Students are taught the old syntax a little, in order to deal with legacy code.

  2. OMG….there’s ansi join syntax on my blog…I feel the need to wash:-)

    Signed,
    The Dinosaur

    Seriously though – Thanks for stopping by and taking the time to comment. Happy New Year

  3. I should mention that my main reason for switching was related to “old” too (me). I kept forgetting which “side” the (+) goes on. And throw a literal equality or IS NULL test into the mix and it got even tougher. I just hated the syntax.

    Happy New Year to you too.

  4. In the case of the three target tables, the WHERE … IS NULL predicates are used and full table scans result. So an index such as the following would alleviate matters a little (the trailing constant ensures rows with NULL keys are still indexed):

    create index t1_idx on t1(x, 1);

  5. If the various hash joins do indeed cause a problem, they could be replaced by a sort operation:

    create table emp as select * from scott.emp;
    create table emp1 as select * from emp where deptno = 10;
    create table emp2 as select * from emp where deptno = 20;
    create table emp3 as select * from emp where deptno = 30;

    insert all
    when bitand(flag,1) > 0 then into emp1
    values(empno, ename, job, mgr, hiredate, sal, comm, deptno)
    when bitand(flag,2) > 0 then into emp2
    values(empno, ename, job, mgr, hiredate, sal, comm, deptno)
    when bitand(flag,4) > 0 then into emp3
    values(empno, ename, job, mgr, hiredate, sal, comm, deptno)
    select sum(flag) flag,
    empno, ename, job, mgr, hiredate, sal, comm, deptno
    from (
    select 7 flag, e.* from emp e
    union all
    select -1 flag, e.* from emp1 e
    union all
    select -2 flag, e.* from emp2 e
    union all
    select -4 flag, e.* from emp3 e
    )
    group by empno, ename, job, mgr, hiredate, sal, comm, deptno
    having sum(flag) > 0;
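    To spell out the flag arithmetic above (an editorial aside): the source row contributes 7 (binary 111), and each copy already present in emp1/emp2/emp3 subtracts 1, 2 or 4, so the bits left in sum(flag) identify exactly the tables still missing that row. For example, a row already in emp1 and emp3 sums to 7 - 1 - 4 = 2:

    select bitand(2,1) emp1_bit,  -- 0: skip emp1
           bitand(2,2) emp2_bit,  -- 2: insert into emp2
           bitand(2,4) emp3_bit   -- 0: skip emp3
    from dual;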

    Happy New Year, and don’t mess it up before we get there 😉

  6. insert all
    when ins_table = 't1' then
    into t1 (x) values (ins_val)
    when ins_table = 't2' then
    into t2 (x) values (ins_val)
    when ins_table = 't3' then
    into t3 (x) values (ins_val)
    select
    s ins_val, 't1' ins_table
    from
    ts JOIN t1 on (ts.s t1.x)
    union
    select
    s ins_val, 't2' ins_table
    from
    ts JOIN t2 on (ts.s t2.x)
    union
    select
    s ins_val, 't3' ins_table
    from
    ts JOIN t2 on (ts.s t3.x);
    /

  7. insert all
    when ins_table = 't1' then
    into t1 (x) values (ins_val)
    when ins_table = 't2' then
    into t2 (x) values (ins_val)
    when ins_table = 't3' then
    into t3 (x) values (ins_val)
    select
    s ins_val, 't1' ins_table
    from
    ts JOIN t1 on (ts.s != t1.x)
    union
    select
    s ins_val, 't2' ins_table
    from
    ts JOIN t2 on (ts.s != t2.x)
    union
    select
    s ins_val, 't3' ins_table
    from
    ts JOIN t3 on (ts.s != t3.x);
    /

  8. In my first set of code, the ‘not equal to’ somehow disappeared during upload, so ignore it and read the second set of code!

  9. Connor, I can’t thank you enough for informing me about this insert feature. I’m rewriting the first package I wrote that replaced a COBOL program that maintained a multi-organization “master” table. I wasn’t happy with it, and there have been some requirements changes. It gets “full” snapshots from various organizations, and the code has to determine the action (insert, update, delete) by analyzing the differences between the snapshots and the central table. But it must also produce an actions report showing the inserts, updates, deletes and rejects (of course you can’t assume the records in the data files are clean). I’ve boiled it down to one DML statement per action, but now I can incorporate the edits into the select statement and insert rejects into the report table using INSERT FIRST. No more making a second pass for the report. All the work is done in SQL and the PL/SQL just wraps and controls it.

    Thanks again!

  10. Typical use case for single pass multi-table insert is when your input data is large and/or hard to extract. For example populating dimension tables with new entries from a flat-file extract.

    The only problem is … multi-table insert is SLOOOW … which defeats the purpose of doing a single pass.

    Same thing with the “log errors” extension – it slows things down dramatically.
