As far back as 11g, a nifty tool for “manual” parallel processing is the DBMS_PARALLEL_EXECUTE package. In a nutshell, the basic premise is that for tasks that do not naturally fit into Oracle’s standard parallel DML/DDL options, you can use the package to break a large task into a number of smaller tasks that will be executed in parallel using the job scheduler in the database. It is nothing that you could not build yourself manually, but having a pre-packaged routine to do it just makes it all easier.
There are plenty of posts out there about how to use it, so I’m not going to rehash the basics, but I wanted to provide a solution to one shortcoming.
I’ll start with a typical example where I want to carve up the processing of a large table BIGTAB using the “create_chunks_by_rowid” routine.
SQL> create table bigtab
2 as
3 select d.* from dba_objects d,
4 ( select 1 from dual connect by level <= 100 );
Table created.
SQL>
SQL> exec dbms_parallel_execute.create_task(task_name=>'SPLIT_ROWS');
PL/SQL procedure successfully completed.
SQL>
SQL> begin
2 dbms_parallel_execute.create_chunks_by_rowid(
3 task_name =>'SPLIT_ROWS',
4 table_owner=>user,
5 table_name =>'BIGTAB',
6 by_row =>false,
7 chunk_size =>10000);
8 end;
9 /
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> select start_rowid, end_rowid
2 from user_parallel_execute_chunks
3 where task_name = 'SPLIT_ROWS'
4 order by chunk_id;
START_ROWID END_ROWID
------------------ ------------------
AAAjZ/AAQAAAWCAAAA AAAjZ/AAQAAAYePH//
AAAjZ/AAQAAAYeQAAA AAAjZ/AAQAAAa6fH//
AAAjZ/AAQAAAa6gAAA AAAjZ/AAQAAAdWvH//
AAAjZ/AAQAAAdWwAAA AAAjZ/AAQAAAfy/H//
AAAjZ/AAQAAAfzAAAA AAAjZ/AAQAAAiPPH//
AAAjZ/AAQAAAiPQAAA AAAjZ/AAQAAAkrfH//
AAAjZ/AAQAAAkrgAAA AAAjZ/AAQAAAnHvH//
AAAjZ/AAQAAAnHwAAA AAAjZ/AAQAAApj/H//
AAAjZ/AAQAAApkAAAA AAAjZ/AAQAAAsAPH//
AAAjZ/AAQAAAsAQAAA AAAjZ/AAQAAAucfH//
AAAjZ/AAQAAAucgAAA AAAjZ/AAQAAAw4vH//
AAAjZ/AAQAAAw4wAAA AAAjZ/AAQAAAzU/H//
AAAjZ/AAQAAAzVAAAA AAAjZ/AAQAAA1xPH//
AAAjZ/AAQAAA1xQAAA AAAjZ/AAQAAA4NfH//
AAAjZ/AAQAAA4NgAAA AAAjZ/AAQAAA6pvH//
AAAjZ/AAQAAA6pwAAA AAAjZ/AAQAAA9F/H//
AAAjZ/AAQAAA9GAAAA AAAjZ/AAQAAA/iPH//
AAAjZ/AAQAAA/iQAAA AAAjZ/AAQAABB+fH//
AAAjZ/AAQAABB+gAAA AAAjZ/AAQAABCB/H//
19 rows selected.
SQL>
SQL> exec dbms_parallel_execute.drop_task('SPLIT_ROWS');
PL/SQL procedure successfully completed.
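For context, consuming those chunks typically looks like the sketch below. The UPDATE statement and its STATUS column are hypothetical stand-ins for whatever your real workload does; the :start_id and :end_id binds are filled in per chunk by the package.

```sql
begin
  dbms_parallel_execute.run_task(
    task_name      => 'SPLIT_ROWS',
    sql_stmt       => 'update bigtab set status = ''X''
                       where rowid between :start_id and :end_id',
    language_flag  => dbms_sql.native,
    parallel_level => 19);   -- launches 19 scheduler jobs to drain the chunk queue
end;
/
```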
In the absence of any other information, I picked a chunk size of 10000, and I ended up with 19 chunks to be processed. Here is the problem with this approach – if I went ahead and built a task to process those chunks, I would now create 19 scheduler jobs that will all launch concurrently. Maybe my server has enough cores to handle this sudden jump in activity, but maybe it doesn’t. A common recommendation to avoid an excessive spike in server load is to lower “job_queue_processes” to a smaller value, but that comes with its own challenge: what about the standard, already scheduled jobs? Will they get starved out? You might be happily processing your parallel tasks, but all of a sudden no-one is getting emails out of APEX anymore ☹. You could leave job_queue_processes at its default and conceivably work around the concurrency issue with dedicated services, Resource Manager, session profiles, etc., but suddenly the promise of “ease of use” with DBMS_PARALLEL_EXECUTE becomes less inviting.
Ideally, we would like to control how many chunks will be created by DBMS_PARALLEL_EXECUTE so that we can impose a cap on how many jobs will run concurrently for this task. And here’s a simple way to do that.
I traced the “create_chunks_by_rowid” routine to examine how it generated the ROWID ranges, and as you’d probably expect, it simply loops through the extents in DBA_EXTENTS, and every time the cumulative total of blocks passes the CHUNK_SIZE setting, it outputs a new chunk for the task. Armed with this knowledge, it is now trivial to tweak the execution to always build a nominated number of chunks rather than being bound by chunk size.
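You can approximate that internal logic with a simple analytic query over DBA_EXTENTS — a sketch only, since the package rounds the boundaries at extent level, but the running total of blocks is the key idea:

```sql
select extent_id,
       blocks,
       sum(blocks) over (order by extent_id) as cumulative_blocks,
       ceil(sum(blocks) over (order by extent_id) / 10000) as approx_chunk#
from   dba_extents
where  owner = user
and    segment_name = 'BIGTAB'
order  by extent_id;
```

Every time cumulative_blocks crosses a multiple of the chunk size, a new chunk boundary is emitted.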
Let’s say we only want 4 concurrent tasks – here’s the simple adjustment to the code.
SQL> exec dbms_parallel_execute.create_task(task_name=>'SPLIT_ROWS');
PL/SQL procedure successfully completed.
SQL>
SQL> declare
2 num_jobs int := 4;
3 chunk_size int;
4 begin
5 select 8+ceil(sum(blocks)/num_jobs)
6 into chunk_size
7 from dba_extents
8 where owner = user
9 and segment_name = 'BIGTAB';
10
11 dbms_parallel_execute.create_chunks_by_rowid(
12 task_name =>'SPLIT_ROWS',
13 table_owner=>user,
14 table_name =>'BIGTAB',
15 by_row =>false,
16 chunk_size =>chunk_size);
17 end;
18 /
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> select start_rowid, end_rowid
2 from user_parallel_execute_chunks
3 where task_name = 'SPLIT_ROWS'
4 order by chunk_id;
START_ROWID END_ROWID
------------------ ------------------
AAAjZ/AAQAAAWCAAAA AAAjZ/AAQAAAhB/H//
AAAjZ/AAQAAAhCAAAA AAAjZ/AAQAAAsB/H//
AAAjZ/AAQAAAsCAAAA AAAjZ/AAQAAA3B/H//
AAAjZ/AAQAAA3CAAAA AAAjZ/AAQAABCB/H//
4 rows selected.
SQL>
SQL> exec dbms_parallel_execute.drop_task('SPLIT_ROWS');
PL/SQL procedure successfully completed.
You can see that although we’re still using CHUNK_SIZE in the API call, by querying DBA_EXTENTS first to determine the total number of blocks to be processed, we can now control the number of task chunks that will be created.
Using the DBMS_PARALLEL_EXECUTE package to build ROWID ranges like this does not have to be solely for onward use in the package itself. Anywhere you need ranges of data, you might find the “create_chunks_by…” utilities useful. For example, Daniel describes a use case for DataPump when exporting LOBs here. You could take advantage of DBMS_PARALLEL_EXECUTE to yield faster parallel DataPump executions.
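For instance, one way to do that (a sketch only — quoting rules for parameter files vary by platform) is to turn each chunk into a QUERY clause for a separate expdp run:

```sql
select 'QUERY=bigtab:"where rowid between '''||start_rowid||
       ''' and '''||end_rowid||'''"' as parfile_line
from   user_parallel_execute_chunks
where  task_name = 'SPLIT_ROWS'
order  by chunk_id;
```

Each generated line carves out a non-overlapping slice of the table, so the exports can run concurrently without duplicating rows.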
Let me know if you come up with other interesting uses for DBMS_PARALLEL_EXECUTE.
Got some thoughts? Leave a comment