Parallel DML and locking – going manual

Generally speaking, if you have a giant chunk of data over “here”, and you need to load it into a table over “there”, and you have exclusive access to that target table, then parallel DML is a great option. People often neglect parallel DML on the premise that “databases are I/O bound anyway, so more sessions won’t help”. But the world (and in particular the hardware we use) evolves, and in these times of flash-based storage, I/O bandwidth is often no longer the bottleneck.

Sidebar: In the days of spinning rust drives, you may be surprised to discover that even then, it was rarely disk speeds that were the issue – it was typically either the CPU on the storage array, or the pipe to the database server that killed you, but that’s a story for another post.

But the key part of my opening statement is “you have exclusive access”. Here’s an example of probably the most common form of parallel DML, namely, a direct path insert.


SQL> insert /*+ PARALLEL APPEND */ into t
  2  select *
  3  from
  4  ( select 1, d.* from dba_objects d
  5    union all
  6    select 2, d.* from dba_objects d
  7  );

167618 rows created.

Look at what happens in any other session whilst I’m running that load (or even after the load is complete but I have not yet committed).


SQL> insert into t
  2  select 3, d.* from dba_objects d
  3  where rownum = 1;
[waiting waiting ....]

That’s right… nothing happens. I’m stuck behind a lock.
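If you want to see exactly what that second session is stuck on, a quick look at the wait interface from a third session will show the table-lock enqueue. This is a sketch only, assuming you have SELECT access to the v$ views; your SIDs will obviously differ:

```sql
-- From a separate session: what is the blocked insert waiting on?
-- A direct-path load holds a TM enqueue, so the blocked session
-- shows up waiting on 'enq: TM - contention'.
select sid, event, blocking_session
from   v$session
where  event like 'enq: TM%';
```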

But what happens when the target table is partitioned? Here’s the same table as above, this time partitioned by list (to keep the demo simple). I’ll now load the table with data that will only fill partitions P1 and P2.


SQL> create table t
  2  partition by list ( x)
  3  ( partition p1 values (1),
  4    partition p2 values (2),
  5    partition p3 values (3)
  6  )
  7  as
  8  select 0 x, d.* from dba_objects d where 1=0;

Table created.

SQL>
SQL> alter session enable parallel dml;

Session altered.

SQL> insert /*+ PARALLEL APPEND */ into t
  2  select *
  3  from
  4  ( select 1, d.* from dba_objects d
  5    union all
  6    select 2, d.* from dba_objects d
  7  );

167624 rows created.

Let me now attempt to do some DML on partition P3 which was untouched by the operation.


SQL> insert into t
  2  select 3, d.* from dba_objects d
  3  where rownum = 1;
[waiting waiting ....]

No change. The untouched partition is still locked away. This is because the lock is taken on the target named in the DML, and in this case that target is my table T. The entire table is locked.
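You can confirm where the lock sits by peeking at v$lock while the uncommitted parallel load is in flight. Again, a sketch assuming access to the v$ views; for a TM enqueue, ID1 is the object id, and LMODE 6 is an exclusive lock:

```sql
-- TM enqueues held after the parallel direct-path insert.
-- Join ID1 to user_objects to name the locked object; if the lock
-- is at the table level, you'll see table T itself at LMODE 6,
-- not the individual partitions.
select l.sid, o.object_name, o.object_type, l.lmode
from   v$lock l
       join user_objects o on o.object_id = l.id1
where  l.type = 'TM'
and    o.object_name = 'T';
```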

It is however possible to lock at the partition level. So in instances such as this, you may sometimes be better off taking control of the parallelism yourself and using a session per partition.

Session 1


SQL> insert /*+ APPEND */ into t partition (p1 )
  2  select *
  3  from
  4  ( select 1, d.* from dba_objects d );

83812 rows created.

Session 2


SQL> insert /*+ APPEND */ into t partition (p2 )
  2  select *
  3  from
  4  ( select 2, d.* from dba_objects d );

83812 rows created.

Session 3


SQL> insert into t
  2  select 3, d.* from dba_objects d  -- will be partition p3
  3  where rownum = 1;

1 row created.

Notice that all three sessions can happily run concurrently, and the locking in sessions 1 and 2 is only at the partition level because the partition is the nominated target in the DML rather than the entire table. I often adopt this approach because the greater control also means that if one of the loads fails, I know exactly which partition needs to be fixed and re-run. With a giant parallel load across the entire table, it is an “all or nothing” operation.
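For example, if the session 2 load had died mid-flight, a hypothetical clean-up only needs to touch P2 before re-running that one statement, leaving the other partitions alone:

```sql
-- Hypothetical recovery after a failed load of partition P2 only.
-- Clear out any partial data in that partition...
alter table t truncate partition p2;

-- ...and re-run just that partition's load.
insert /*+ APPEND */ into t partition (p2)
select 2, d.* from dba_objects d;

commit;
```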

Of course, if the source data was truly massive, you could explore parallel DML on each of the partition-level insert statements assuming you had sufficient server firepower.
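As a sketch of combining the two approaches (the degree of 4 here is an arbitrary choice for illustration, not a recommendation), each session could run its own parallel direct-path load against just its nominated partition:

```sql
-- Parallel direct-path load restricted to a single partition.
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(t,4) */ into t partition (p1)
select 1, d.* from dba_objects d;
```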

So even though parallel DML is a cool piece of tech, don’t be afraid to employ your own parallel mechanisms given your knowledge of the source data.

TL;DR: If you can get down to the granularity of the partition or subpartition, sometimes using your own home-grown parallelism scheme can yield more concurrency.

5 comments

  1. I am not sure what I am not doing right, but I am using the Oracle Developer VM with Oracle 19c Enterprise Edition and cannot reproduce this behaviour. I am using two sessions, and they are not locking each other. For example, when table T is partitioned, one session is inserting 1 and 2, and the second one is inserting 3. They are not interlocked at commit level – they are not waiting for each other. Meaning that session 1, which inserts 1 and 2, finishes its job without committing, and session 2, which inserts 3, finishes immediately and can commit, even if session 1 has not committed yet.

      1. Here is the case – I am not able to reproduce what I said in my comment (apologies for that), but I am also getting some very interesting behaviour – that session 2 is committing the data from session 1:
        Session 0:
        drop table T;

        create table T
        partition by list ( x)
        ( partition p1 values (1),
        partition p2 values (2),
        partition p3 values (3)
        )
        as
        select 0 x, d.* from dba_objects d where 1=0;

        Output: Table T created.

        Session 1:
        alter session enable parallel dml;
        insert /*+ PARALLEL APPEND */ into T
        select *
        from
        ( select 1, d.* from dba_objects d
        union all
        select 2, d.* from dba_objects d
        );

        Output:
        Session altered.
        144,960 rows inserted.

        Session 2:
        insert into T
        select 3, d.* from dba_objects d
        where rownum = 1;
        commit;

        Output:
        Error starting at line : 1 in command –
        insert into T
        select 3, d.* from dba_objects d
        where rownum = 1
        Error report –
        ORA-12838: cannot read/modify an object after modifying it in parallel

        Commit complete.

        Session 3:
        select count(*) from T;

        Output:
        144960

        So, even though session 1 did not commit, its data is committed by session 2 – but I presume this is due to the direct path load?

        Sorry once more for not checking more extensively what I’ve posted.

        Best regards.

  2. Hi Connor, thanks for this article. I will be waiting for your sidebar considerations in another article, about the storage bottleneck.

  3. I prefer explicitly naming the target partition for an insert statement, if we can identify it.

    Haven’t tried using the “partition_extended_names” subclause method to name multiple partitions.
