As far back as 11g, a nifty tool for “manual” parallel processing is the DBMS_PARALLEL_EXECUTE package. In a nutshell, the basic premise is that for tasks that do not naturally fit into Oracle’s standard parallel DML/DDL options, you can use the package to break a large task into a number of smaller tasks that will be executed in parallel using the job scheduler in the database. It is nothing that you could not build yourself manually, but having a pre-packaged routine to do it just makes it all easier.

There are plenty of posts out there about how to use it, so I’m not going to rehash the basics, but I wanted to provide a solution to one shortcoming.
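(For quick orientation only, the end-to-end flow in a single block looks roughly like the sketch below. The task name, table name and SQL statement are illustrative; the :start_id and :end_id binds are the ones the package supplies for each chunk.)

begin
  -- register a task
  dbms_parallel_execute.create_task(task_name => 'MY_TASK');

  -- carve the table into rowid ranges ("chunks")
  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => 'MY_TASK',
    table_owner => user,
    table_name  => 'MY_TABLE',
    by_row      => false,
    chunk_size  => 10000);

  -- run the statement once per chunk, spread across 4 scheduler jobs
  dbms_parallel_execute.run_task(
    task_name      => 'MY_TASK',
    sql_stmt       => 'update my_table set flag = 1
                       where rowid between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 4);

  dbms_parallel_execute.drop_task(task_name => 'MY_TASK');
end;
/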

I’ll start with a typical example where I want to carve up the processing of a large table BIGTAB using the “create_chunks_by_rowid” routine.


SQL> create table bigtab 
  2  as
  3  select d.* from dba_objects d,
  4   ( select 1 from dual connect by level <= 100 );

Table created.

SQL>
SQL> exec dbms_parallel_execute.create_task(task_name=>'SPLIT_ROWS');

PL/SQL procedure successfully completed.

SQL>
SQL> begin
  2    dbms_parallel_execute.create_chunks_by_rowid(
  3      task_name  =>'SPLIT_ROWS',
  4      table_owner=>user,
  5      table_name =>'BIGTAB',
  6      by_row     =>false,
  7      chunk_size =>10000);
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select start_rowid, end_rowid
  2  from   user_parallel_execute_chunks
  3  where  task_name = 'SPLIT_ROWS'
  4  order by chunk_id;

START_ROWID        END_ROWID
------------------ ------------------
AAAjZ/AAQAAAWCAAAA AAAjZ/AAQAAAYePH//
AAAjZ/AAQAAAYeQAAA AAAjZ/AAQAAAa6fH//
AAAjZ/AAQAAAa6gAAA AAAjZ/AAQAAAdWvH//
AAAjZ/AAQAAAdWwAAA AAAjZ/AAQAAAfy/H//
AAAjZ/AAQAAAfzAAAA AAAjZ/AAQAAAiPPH//
AAAjZ/AAQAAAiPQAAA AAAjZ/AAQAAAkrfH//
AAAjZ/AAQAAAkrgAAA AAAjZ/AAQAAAnHvH//
AAAjZ/AAQAAAnHwAAA AAAjZ/AAQAAApj/H//
AAAjZ/AAQAAApkAAAA AAAjZ/AAQAAAsAPH//
AAAjZ/AAQAAAsAQAAA AAAjZ/AAQAAAucfH//
AAAjZ/AAQAAAucgAAA AAAjZ/AAQAAAw4vH//
AAAjZ/AAQAAAw4wAAA AAAjZ/AAQAAAzU/H//
AAAjZ/AAQAAAzVAAAA AAAjZ/AAQAAA1xPH//
AAAjZ/AAQAAA1xQAAA AAAjZ/AAQAAA4NfH//
AAAjZ/AAQAAA4NgAAA AAAjZ/AAQAAA6pvH//
AAAjZ/AAQAAA6pwAAA AAAjZ/AAQAAA9F/H//
AAAjZ/AAQAAA9GAAAA AAAjZ/AAQAAA/iPH//
AAAjZ/AAQAAA/iQAAA AAAjZ/AAQAABB+fH//
AAAjZ/AAQAABB+gAAA AAAjZ/AAQAABCB/H//

19 rows selected.

SQL>
SQL> exec dbms_parallel_execute.drop_task('SPLIT_ROWS');

PL/SQL procedure successfully completed.

In the absence of any other information, I picked a chunk size of 10000 blocks, and I ended up with 19 chunks to be processed. Here is the problem with this approach: if I went ahead and ran a task to process those chunks, I would create 19 scheduler jobs that all launch concurrently. Maybe my server has enough cores to handle that sudden jump in activity, but maybe it doesn’t. A common recommendation to avoid an excessive spike in server load is to lower “job_queue_processes” to a smaller value, but that comes with its own challenge: what about your existing scheduled jobs? Will they get starved out? You might be happily processing your parallel tasks, but all of a sudden no one is getting emails out of APEX anymore ☹. You could leave job_queue_processes at its default and conceivably work around the concurrency issue with dedicated services, Resource Manager, session profiles and the like, but suddenly the promise of “ease of use” with DBMS_PARALLEL_EXECUTE becomes less inviting.
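As a reminder of what is at stake, job_queue_processes is an instance-wide ceiling, so winding it down throttles every scheduler job in the database, not just your DBMS_PARALLEL_EXECUTE task (the value below is purely illustrative):

-- the instance-wide cap on concurrent scheduler job slaves
show parameter job_queue_processes

-- lowering it affects ALL scheduler jobs, not just this task
alter system set job_queue_processes = 8;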

Ideally, we would like to control how many chunks will be created by DBMS_PARALLEL_EXECUTE so that we can impose a cap on how many jobs will run concurrently for this task. And here’s a simple way to do that.

I traced the “create_chunks_by_rowid” routine to examine how it generates the ROWID ranges, and as you’d probably expect, it simply loops through the extents in DBA_EXTENTS, and every time the cumulative total of blocks passes the CHUNK_SIZE setting, it outputs a new chunk for the task. Armed with this knowledge, it is trivial to tweak the call to always build a nominated number of chunks rather than being bound by a fixed chunk size.
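You can approximate that internal logic yourself with an analytic query over DBA_EXTENTS. This is a rough sketch only – the package’s exact boundary handling differs – but the cumulative-blocks idea is the same:

select extent_id,
       blocks,
       sum(blocks) over (order by extent_id) as cumulative_blocks,
       -- a new chunk starts each time the running total crosses
       -- another multiple of the chunk size (10000 here)
       ceil(sum(blocks) over (order by extent_id) / 10000) as approx_chunk
from   dba_extents
where  owner        = user
and    segment_name = 'BIGTAB'
order  by extent_id;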

Let’s say we only want four concurrent jobs – here’s the simple adjustment to the code.



SQL> exec dbms_parallel_execute.create_task(task_name=>'SPLIT_ROWS');

PL/SQL procedure successfully completed.

SQL>
SQL> declare
  2    num_jobs int := 4;
  3    chunk_size int;
  4  begin
  5    select 8 + ceil(sum(blocks)/num_jobs)
  6    into chunk_size
  7    from dba_extents
  8    where owner = user
  9    and   segment_name = 'BIGTAB';
 10
 11    dbms_parallel_execute.create_chunks_by_rowid(
 12      task_name  =>'SPLIT_ROWS',
 13      table_owner=>user,
 14      table_name =>'BIGTAB',
 15      by_row     =>false,
 16      chunk_size =>chunk_size);
 17  end;
 18  /

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select start_rowid, end_rowid
  2  from   user_parallel_execute_chunks
  3  where  task_name = 'SPLIT_ROWS'
  4  order by chunk_id;

START_ROWID        END_ROWID
------------------ ------------------
AAAjZ/AAQAAAWCAAAA AAAjZ/AAQAAAhB/H//
AAAjZ/AAQAAAhCAAAA AAAjZ/AAQAAAsB/H//
AAAjZ/AAQAAAsCAAAA AAAjZ/AAQAAA3B/H//
AAAjZ/AAQAAA3CAAAA AAAjZ/AAQAABCB/H//

4 rows selected.
SQL>
SQL> exec dbms_parallel_execute.drop_task('SPLIT_ROWS');

PL/SQL procedure successfully completed.

You can see that although we’re still using CHUNK_SIZE in the API call, by querying DBA_EXTENTS first to determine the total number of blocks to be processed, we can now control the number of task chunks that will be created.
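If you find yourself doing this a lot, the pattern wraps up neatly into a helper – a hypothetical procedure of my own, not part of the supplied package:

create or replace procedure create_chunks_by_job_count(
  p_task_name  in varchar2,
  p_table_name in varchar2,
  p_num_jobs   in pls_integer)
is
  l_chunk_size pls_integer;
begin
  -- derive a chunk size that yields (roughly) p_num_jobs chunks;
  -- the small padding absorbs rounding at extent boundaries
  select 8 + ceil(sum(blocks) / p_num_jobs)
  into   l_chunk_size
  from   dba_extents
  where  owner        = user
  and    segment_name = p_table_name;

  dbms_parallel_execute.create_chunks_by_rowid(
    task_name   => p_task_name,
    table_owner => user,
    table_name  => p_table_name,
    by_row      => false,
    chunk_size  => l_chunk_size);
end;
/

Create the task first, then call the helper with the same number you intend to pass as parallel_level to run_task.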

Using the DBMS_PARALLEL_EXECUTE package to build ROWID ranges like this does not have to be solely for onward use in the package itself. Anywhere you need ranges of data, you might find the “create_chunks_by…” utilities useful. For example, Daniel describes a use case for Data Pump when exporting LOBs here. You could take advantage of DBMS_PARALLEL_EXECUTE to yield faster parallel Data Pump executions.
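For instance, once the chunks exist, each one can be turned into a rowid-range filter for a separate expdp run along these lines (a sketch only – quoting and parameter file mechanics are left as an exercise):

select 'query=bigtab:"where rowid between ''' || start_rowid ||
       ''' and ''' || end_rowid || '''"' as expdp_query_param
from   user_parallel_execute_chunks
where  task_name = 'SPLIT_ROWS'
order  by chunk_id;

Each resulting row becomes the QUERY parameter for one of several concurrent Data Pump exports.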

Let me know if you come up with other interesting uses for DBMS_PARALLEL_EXECUTE.

5 responses to “Using ROWID for efficient parallel execution”

  1. Hi Connor,

    Your solution is elegant and useful in more than one context, but for the specific requirement to limit the number of concurrent scheduler jobs, doesn’t the parallel_level parameter do that as well?

    Best regards, Stew

    1. Well… I steered clear of that because I didn’t want to open a can of worms :-). My “issue” with parallel_level is that it doesn’t (imo) ease the decision point about chunk size. As you can imagine, when parallel_level is less than the number of chunks, each chunk is treated like a queue message, i.e., as a job completes its work, it looks for the next unprocessed chunk and moves onto that one.

      Which means you’re back to the same issue – you need to *carefully* choose your chunk size, otherwise you’ll get skew of workload across the jobs. Here’s an example – I’ve set the “cost” of updating a row to 0.1 seconds, and I’ve got 2000 rows to update. Thus the overall cost is 200 seconds, so when I set “parallel_level” to 4, I would be expecting around 50 seconds to complete.

      SQL> create table t as
        2  select rownum x, rpad('x',4000) y from dual
        3  connect by level <= 2000;

      Table created.

      SQL> select blocks from user_tables where table_name = 'T';

          BLOCKS
      ----------
            2040

      SQL> create or replace
        2  trigger trg
        3  after update on t
        4  for each row
        5  begin
        6    dbms_session.sleep(0.1);
        7  end;
        8  /

      Trigger created.

      SQL>
      SQL>
      SQL> exec DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');

      PL/SQL procedure successfully completed.

      SQL>
      SQL> begin
        2    DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name => 'test_task',
        3      table_owner => user,
        4      table_name => 'T',
        5      by_row => false,
        6      chunk_size => 400);
        7  end;
        8  /

      PL/SQL procedure successfully completed.

      SQL>
      SQL>
      SQL>
      SQL> SELECT chunk_id, status, start_rowid, end_rowid
        2  FROM user_parallel_execute_chunks
        3  WHERE task_name = 'test_task'
        4  ORDER BY chunk_id;

        CHUNK_ID STATUS               START_ROWID        END_ROWID
      ---------- -------------------- ------------------ ------------------
             357 UNASSIGNED           AAAjdnAAMAAAACQAAA AAAjdnAAMAAB+kPH//
             358 UNASSIGNED           AAAjdnAAMAAB+kQAAA AAAjdnAAMAAB+qfH//
             359 UNASSIGNED           AAAjdnAAMAAB+qgAAA AAAjdnAAMAAB+wvH//
             360 UNASSIGNED           AAAjdnAAMAAB+wwAAA AAAjdnAAMAAB+2/H//
             361 UNASSIGNED           AAAjdnAAMAAB+3AAAA AAAjdnAAMAAB+9PH//
             362 UNASSIGNED           AAAjdnAAMAAB+9QAAA AAAjdnAAMAAB+9/H//

      6 rows selected.

      SQL>
      SQL> set timing on
      SQL> DECLARE
        2    l_sql_stmt VARCHAR2(32767);
        3  BEGIN
        4    l_sql_stmt := 'UPDATE t
        5                   SET t.x = t.x + 10
        6                   WHERE rowid BETWEEN :start_id AND :end_id';
        7
        8    DBMS_PARALLEL_EXECUTE.run_task(task_name => 'test_task',
        9      sql_stmt => l_sql_stmt,
       10      language_flag => DBMS_SQL.NATIVE,
       11      parallel_level => 4);
       12  END;
       13  /

      PL/SQL procedure successfully completed.

      Elapsed: 00:01:18.04

      But we took 80 seconds! With six unevenly sized chunks shared across four jobs, at least one job ends up processing two full chunks back to back, so the elapsed time blows out well past the ideal 50 seconds. Assuming we went with dbms_parallel_execute for the purpose of reducing run time, that’s not ideal.

      The ratio of chunks to parallel_level wasn’t good. I could work around that by having a huge number of chunks, but what if my process startup time is expensive, etc.? I’m effectively back to the same issue – needing to take care when picking chunk sizes.

      But if I choose the chunks to marry up with the number of jobs I want, as per the blog post…

      SQL> begin
        2    DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name => 'test_task',
        3      table_owner => user,
        4      table_name => 'T',
        5      by_row => false,
        6      chunk_size => 520);
        7  end;
        8  /

      PL/SQL procedure successfully completed.

      SQL>
      SQL>
      SQL>
      SQL> SELECT chunk_id, status, start_rowid, end_rowid
        2  FROM user_parallel_execute_chunks
        3  WHERE task_name = 'test_task'
        4  ORDER BY chunk_id;

        CHUNK_ID STATUS               START_ROWID        END_ROWID
      ---------- -------------------- ------------------ ------------------
             367 UNASSIGNED           AAAjdtAAMAAAACQAAA AAAjdtAAMAAB+mHH//
             368 UNASSIGNED           AAAjdtAAMAAB+mIAAA AAAjdtAAMAAB+uPH//
             369 UNASSIGNED           AAAjdtAAMAAB+uQAAA AAAjdtAAMAAB+2XH//
             370 UNASSIGNED           AAAjdtAAMAAB+2YAAA AAAjdtAAMAAB+9/H//

      4 rows selected.

      SQL>
      SQL> set timing on
      SQL> DECLARE
        2    l_sql_stmt VARCHAR2(32767);
        3  BEGIN
        4    l_sql_stmt := 'UPDATE t
        5                   SET t.x = t.x + 10
        6                   WHERE rowid BETWEEN :start_id AND :end_id';
        7
        8    DBMS_PARALLEL_EXECUTE.run_task(task_name => 'test_task',
        9      sql_stmt => l_sql_stmt,
       10      language_flag => DBMS_SQL.NATIVE,
       11      parallel_level => 4);
       12  END;
       13  /

      PL/SQL procedure successfully completed.

      Elapsed: 00:00:54.04

      I’m closer to my expected results.

  2. Nice post – I love dbms_parallel_execute and have done some really cool stuff with it.
    1) Do you have a solution when you don’t have access to dba_extents?
    2) I prefer “many” smaller chunks, as this makes large jobs more controllable in terms of progress and restartability.

    Vidar

    1. Without DBA_EXTENTS, you could use BLOCKS from USER_TABLES as a reasonable guess.
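       Something like this, for example – with the caveat that BLOCKS in USER_TABLES comes from optimizer statistics, so it is only as good as your last stats gather:

       select 8 + ceil(blocks / 4) as chunk_size
       from   user_tables
       where  table_name = 'BIGTAB';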

    2. Hi Vidar,

      Could you post some examples of what you have achieved with dbms_parallel_execute please?

      Rajesh
