Partition loading in direct mode

Posted by

Direct mode insert using the APPEND hint is a cool piece of technology that lets you load bulk data into a table very quickly and efficiently. Obviously there are a number of implications of doing so which you can read about here, but the one that catches most people out is the that you are locking the table during the load and once the load is completed, the table is “disabled” until the transaction ends. Here’s a quick example of that in action:


SQL> create table t
  2  as select * from dba_objects d
  3  where 1=0;

Table created.

SQL> insert /*+ APPEND */ into t
  2  select * from dba_objects d;

91020 rows created.

--
-- No further INSERTs are possible
--
SQL> insert /*+ APPEND */ into t
  2  select * from dba_objects d;
insert /*+ APPEND */ into t
                          *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


--
-- No further DML at all is possible
--
SQL> delete from t;
delete from t
            *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


--
-- Not even a SELECT statement is allowed
--
SQL> select count(*) from t;
select count(*) from t
                     *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel


--
-- I must end the transaction first with commit or rollback before normal service is resumed
--
SQL> commit;

Commit complete.

SQL> select count(*) from t;

  COUNT(*)
----------
     91020 

This makes sense given that a direct mode insert is manipulating the high water mark (HWM) for a table, so it pretty much has to be an all or nothing process for the session issuing the load, because the HWM is in a state of flux until we commit or rollback.

However, what about a partitioned table? Can I do a direct mode insert into a single partition, whilst leaving the other partitions available in the same session? Let’s try it out.


SQL> create table t
  2  partition by list ( x )
  3  ( partition p1 values (1),
  4    partition p2 values (2)
  5  )
  6  as select 1 x, d.* from dba_objects d
  7  where 1=0;

Table created.

--
-- Only load into partition P1
--
SQL> insert /*+ APPEND */ into t partition (p1)
  2  select 1 x, d.* from dba_objects d;

91020 rows created.

--
-- And now see if partition P2 can be loaded as part of the same transaction
--
SQL> insert /*+ APPEND */ into t partition (p2)
  2  select 2 x, d.* from dba_objects d;
insert /*+ APPEND */ into t partition (p2)
                          *
ERROR at line 1:
ORA-12838: cannot read/modify an object after modifying it in parallel

Unfortunately not. Even though we never touched partition P2 with the first INSERT, and partition P2 is a different physical segment, we still cannot do additional direct loads on it. One easy workaround to this is to place that second load in a separate transaction, for example:


SQL> declare
  2    pragma autonomous_transaction;
  3  begin
  4    insert /*+ APPEND */ into t partition (p2)
  5    select 2 x, d.* from dba_objects d;
  6    commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

You’re probably thinking “That’s a bit silly. If we’re going to commit for each partition, then why wouldn’t we just commit after the load of the first partition P1 anyway?”. That’s a valid point, but what the above example shows is that you can do direct path loads into separate partitions concurrently. This opens up opportunities for (dramatically) increasing the throughput of direct path loads if you can segment the source data into its designated target partitions at load time. I’ll extend the table above to have 10 partitions, and use a little DBMS_JOB code to now load 10 partitions all concurrently.


SQL> create table t
  2  partition by list ( x )
  3  ( partition p1 values (1),
  4    partition p2 values (2),
  5    partition p3 values (3),
  6    partition p4 values (4),
  7    partition p5 values (5),
  8    partition p6 values (6),
  9    partition p7 values (7),
 10    partition p8 values (8),
 11    partition p9 values (9),
 12    partition p10 values (10)
 13  )
 14  as select 1 x, d.* from dba_objects d
 15  where 1=0;

Table created.

SQL>
SQL> declare
  2    j int;
  3    l_sql varchar2(200) :=
  4      'begin
  5         insert /*+ APPEND */ into t partition (p@)
  6         select @ x, d.* from dba_objects d;
  7         commit;
  8       end;';
  9  begin
 10    for i in 1 .. 10 loop
 11      dbms_job.submit(j,replace(l_sql,'@',i));
 12    end loop;
 13    
 14  end;
 15  /

PL/SQL procedure successfully completed.

SQL>
SQL> select job, what from user_jobs
  2  where what like '%dba_objects%';

       JOB WHAT
---------- --------------------------------------------------
       394 begin
           insert /*+ APPEND */ into t partition (p1)
           select 1 x, d.* from dba_objects d;
           commit;
           end;

       395 begin
           insert /*+ APPEND */ into t partition (p2)
           select 2 x, d.* from dba_objects d;
           commit;
           end;

       396 begin
           insert /*+ APPEND */ into t partition (p3)
           select 3 x, d.* from dba_objects d;
           commit;
           end;

       397 begin
           insert /*+ APPEND */ into t partition (p4)
           select 4 x, d.* from dba_objects d;
           commit;
           end;

       398 begin
           insert /*+ APPEND */ into t partition (p5)
           select 5 x, d.* from dba_objects d;
           commit;
           end;

       399 begin
           insert /*+ APPEND */ into t partition (p6)
           select 6 x, d.* from dba_objects d;
           commit;
           end;

       400 begin
           insert /*+ APPEND */ into t partition (p7)
           select 7 x, d.* from dba_objects d;
           commit;
           end;

       401 begin
           insert /*+ APPEND */ into t partition (p8)
           select 8 x, d.* from dba_objects d;
           commit;
           end;

       402 begin
           insert /*+ APPEND */ into t partition (p9)
           select 9 x, d.* from dba_objects d;
           commit;
           end;

       403 begin
           insert /*+ APPEND */ into t partition (p10)
           select 10 x, d.* from dba_objects d;
           commit;
           end;


10 rows selected.

SQL>
SQL> commit;

Commit complete.

SQL> select subobject_name, cnt
  2  from (
  3    select dbms_rowid.rowid_object(rowid) obj, count(*) cnt
  4    from   t
  5    group by dbms_rowid.rowid_object(rowid)
  6    ), user_objects
  7  where obj = data_object_id
  8  order by to_number(substr(subobject_name,2));

SUBOBJECT_        CNT
---------- ----------
P1              91020 
P2              91020 
P3              91020 
P4              91020 
P5              91020 
P6              91020 
P7              91020 
P8              91020 
P9              91020 
P10             91020 

And voila! 10 partitions all loaded in direct mode concurrently! There is also a standard means of loading data concurrently using parallel DML, but if I know the segmentation of the data in advance, the “manual” method above can sometimes be a quicker and easier to debug option.

11 comments

  1. Connor, have you used the same methodology to insert into a date interval table where the partition ( or sub-partition ) name is not know ahead of time ?

    1. Using the FOR syntax seems to fine

      SQL> create table t
      2 partition by range ( x ) interval (1)
      3 ( partition p1 values less than (2),
      4 partition p2 values less than (3)
      5 )
      6 as select 1 x, d.* from dba_objects d
      7 where 1=0;

      Table created.

      SQL>
      SQL> insert /*+ APPEND */ into t partition for ( 1 )
      2 select 1 x, d.* from dba_objects d;

      82003 rows created.

      SQL>
      SQL>
      SQL> declare
      2 pragma autonomous_transaction;
      3 begin
      4 insert /*+ APPEND */ into t partition for ( 2 )
      5 select 2 x, d.* from dba_objects d;
      6 commit;
      7 end;
      8 /

      PL/SQL procedure successfully completed.

  2. Hi Connor,

    In your last part where you assemble the INSERT string before using the DBMS_JOB, how come you didn’t assemble the insert string with DECLARE pragma_autonomous nor with /*+ APPEND */ hint? Are they not needed when using DBMS_JOB?

    1. Well spotted ! That was silly of me. Yes they ARE needed. I’ll redo the demo (to make sure the test is not invalidated) and update the blog.
      Thanks for stopping by.

      1. Thanks Connor. I’m guessing you didn’t include DECLARE pragma autonomous_transaction; BEGIN….END; because DBMS_JOB executes each INSERT block in a separate session, so no need to explicitly declare? Also, is it advisable to use DBMS_JOB or should we transition to use DBMS_SCHEDULER instead?

        1. Under the covers DBMS_JOB uses the scheduler in current releases anyway, so no real difference.

          And yes, the pragma was only because I was in the same SQL Plus session

  3. Connor, how would you modify this if you are loading into a Date Interval table and the partitions are not pre-created ? I am experimenting with this now but would appreciate your ideas. Thanks !

    1. Couple of things

      1) The FOR clause would let you nominate the partition without knowing its name, eg

      select * from t partition for ( date ‘2020-02-02′ )

      2) If you want to precreate partitions on an interval partitioned table, a lock attempt is sufficient to do so without needing to load data, eg

      SQL> create table t ( d date, c int )
      2 partition by range ( d )
      3 interval ( numtodsinterval(1,’DAY’ ))
      4 (
      5 partition p1 values less than ( date ‘2020-01-02’ )
      6 );

      Table created.

      SQL>
      SQL> lock table t partition for ( date ‘2020-02-02’ ) in exclusive mode ;

      Table(s) Locked.

      SQL>
      SQL> select count(*)
      2 from user_tab_partitions
      3 where table_name = ‘T’;

      COUNT(*)
      ———-
      2

  4. Hi Connor,

    Thanks a lot for your article as it is really helping me with some massive DELETEs and INSERTs that a colleague of mine is facing with his data workload.

    I had some questions:-

    1. What happens if one of the code-blocks for a specific partition fails for any reason by the DBMS_JOB? Will my entire procedure fail as a result of one hiccup or will it continue to process other DBMS_JOB code-blocks, despite, one or more fail?

    2. Is an EXCEPTION WHEN OTHERS THEN block needed in the assembled sql string or not necessary as DBMS_JOB will raise an exception and mark that one particular job as FAILED while continue to process other code-blocks?

    3. I notice that after all the individual, partition-specific COMMITs are applied, you still issue one final, overall, COMMIT at the end. Is this needed to release the overall Table Lock?

    Thanks a lot in advance.

    1. Each job runs in the background in its own process (just like having 10 separate SQL Plus sessions). What this *does* mean, because they are running in the background, you need some logging etc so you can track when they run, if they work or fail etc. In a more real world scenario, each job would be a plsql procedure that takes the partition name as parameter. You would build your SQL and do logging etc etc ….

      The final commit is the commit the job defintions. Only then will the database see them and start to run the jobs.

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.