DIY parallel task execution

We had a question on AskTOM recently, where a poster wanted to rebuild all of the indexes in his schema that had a status of UNUSABLE.  Running the rebuilds in serial fashion (one after the other) seemed an inefficient use of the server horsepower, and rebuilding each index with a PARALLEL clause was not particularly beneficial either, because the issue was more about the volume of indexes than the size of each index.

An obvious solution was to use the DBMS_PARALLEL_EXECUTE facility.  Our poster was pleased with our response, but then came back asking for help, because they were stuck on an old release in which DBMS_PARALLEL_EXECUTE was not yet present.

If you’re in this situation, it’s surprisingly easy to throw together your own rudimentary version using DBMS_JOB (or DBMS_SCHEDULER).  Here’s a simple demo, using our poster’s original task of rebuilding indexes.

Step 1

First we’ll identify the list of operations needed.  In our case, that’s trivial – just a query to the dictionary.  But we will store that list in a table, because as we rebuild indexes the list changes, and we want the initial static list of indexes so that we can spread the work across a number of “worker” processes.


SQL> create table ind_rebuild as
  2  select
  3    rownum seq,
  4    owner,
  5    index_name,
  6    'NOT STARTED' job_status
  7  from
  8    dba_indexes
  9  where status = 'UNUSABLE'
 10  and ...

Table created.

SQL> create index ind_rebuild_ix on ind_rebuild ( owner, index_name );

Index created.

SQL> select * from ind_rebuild;

       SEQ OWNER                          INDEX_NAME                               JOB_STATUS
---------- ------------------------------ ---------------------------------------- -----------
         1 SCOTT                          EMP_PK                                   NOT STARTED
         2 SCOTT                          DEPT_PK                                  NOT STARTED
         3 MCDONAC                        SYS_C0013656                             NOT STARTED
         4 MCDONAC                        PK_CONTAINER                             NOT STARTED
         5 MCDONAC                        UN_CONTAINER                             NOT STARTED
         6 MCDONAC                        PK_ELEMENTS                              NOT STARTED
         7 MCDONAC                        UN_ELEMENTS                              NOT STARTED
...
...
...

Step 2

Now we’ll create a simple procedure that will perform the rebuild.  This procedure will be run from several worker sessions concurrently, so we need a way of splitting the workload between them.  For any sequenced list, MOD will do the job quite nicely.  In a real production scenario, you might have more sophisticated requirements (for example, a weighting system that estimates how much effort each rebuild will need, and splits the rebuilds out accordingly so that all the worker processes finish at approximately the same time).
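One crude sketch of that weighting idea, assuming LEAF_BLOCKS is a fair proxy for rebuild effort (and bearing in mind that the statistics on an UNUSABLE index may be stale), is to build the Step 1 table with SEQ assigned in descending order of index size, so that MOD then deals the big rebuilds evenly across the workers:

SQL> create table ind_rebuild as
  2  select
  3    rownum seq,                    -- seq now reflects descending size
  4    owner,
  5    index_name,
  6    'NOT STARTED' job_status
  7  from (
  8    select owner, index_name
  9    from   dba_indexes
 10    where  status = 'UNUSABLE'
 11    order  by leaf_blocks desc     -- biggest rebuilds get the lowest seq
 12  );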




SQL> create or replace
  2  procedure worker_bee(p_jobs pls_integer default 4, p_this_job pls_integer) is
  3    l_progress     pls_integer := 0;
  4    l_err          int;
  5  begin
  6    if p_this_job not between 0 and p_jobs-1 then
  7       raise_application_error(-20000,'p_this_job must be between 0 and '||(p_jobs-1));
  8    end if;
  9
 10    for i in ( select * from ind_rebuild )
 11    loop
 12        if mod(i.seq,p_jobs) = p_this_job and
 13           i.job_status != 'DONE'
 14        then
 15          l_progress := l_progress + 1;
 16          dbms_application_info.set_client_info('Task '||l_progress||','||i.owner||'.'||i.index_name);
 17          begin
 18            execute immediate 'alter index '||i.owner||'.'||i.index_name||' rebuild';
 19            update ind_rebuild
 20            set    job_status = 'DONE'
 21            where  owner = i.owner
 22            and    index_name = i.index_name;
 23          exception
 24            when others then
 25              l_err := sqlcode;
 26              update ind_rebuild
 27              set    job_status = to_char(l_err)
 28              where  owner = i.owner
 29              and    index_name = i.index_name;
 30          end;
 31          commit;
 32        end if;
 33    end loop;
 34  end;
 35  /

Procedure created.

Line 12 shows that we pick up only those rows that belong to this particular worker process (as nominated by the parameter “p_this_job”).  This filter could of course have been in the WHERE clause itself, but I’ve taken this example from one where we instrumented all rows, including those that were skipped as not belonging to a particular worker process.  As we rebuild each index, we pop a message into V$SESSION.CLIENT_INFO so we can monitor activity, and if an index rebuild fails, we store the sqlcode in the job status column.  I’ve kept the logic pretty simple to assist with understanding the demo, but it would be relatively straightforward to extend it to handle index partitions as well.
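While the workers are running, a query along these lines shows what each one is currently doing (a sketch – the LIKE predicate simply matches the “Task” prefix we set above):

SQL> select sid, client_info
  2  from   v$session
  3  where  client_info like 'Task%';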

Step 3

Now it is just a simple matter of submitting as many jobs as the number of concurrent workers you want.



SQL> variable j number
SQL> declare
  2    c_jobs pls_integer := 4;
  3  begin
  4    for i in 0 .. c_jobs-1
  5    loop
  6      dbms_job.submit(:j,'worker_bee('||c_jobs||','||i||');');
  7    end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>
SQL> commit;

Commit complete.

Once we commit, the jobs will commence, and we can easily monitor progress either by polling our static table of candidate indexes, or by monitoring the job queue itself.
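For the job queue side, something like this will do (THIS_DATE in USER_JOBS is populated only while a job is actually running):

SQL> select job, what, this_date
  2  from   user_jobs;

And polling our static list shows progress, with any failures captured as a sqlcode: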



SQL> select * from ind_rebuild;

       SEQ OWNER                          INDEX_NAME                               JOB_STATUS
---------- ------------------------------ ---------------------------------------- -----------
         1 SCOTT                          EMP_PK                                   DONE
         2 SCOTT                          DEPT_PK                                  NOT STARTED
         3 MCDONAC                        SYS_C0013656                             DONE
         4 MCDONAC                        PK_CONTAINER                             NOT STARTED
         5 MCDONAC                        UN_CONTAINER                             -1418
         6 MCDONAC                        PK_ELEMENTS                              DONE
         7 MCDONAC                        UN_ELEMENTS                              NOT STARTED
...
...
...


So if you are on a current release, DBMS_PARALLEL_EXECUTE can do this (and more) without much fuss, but even without that facility, using the job / scheduler queue to parallelise tasks is simple to do.
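For the curious, a minimal DBMS_PARALLEL_EXECUTE version of the same idea might look like the sketch below.  REBUILD_RANGE is a hypothetical helper procedure that rebuilds the indexes in IND_REBUILD whose SEQ falls between the two bind values; the chunking is done on our SEQ column, and RUN_TASK fans the chunks out across 4 scheduler jobs.

SQL> begin
  2    dbms_parallel_execute.create_task('REBUILD_UNUSABLE');
  3    -- carve the SEQ range in IND_REBUILD into chunks of 10 rows
  4    dbms_parallel_execute.create_chunks_by_number_col(
  5      task_name    => 'REBUILD_UNUSABLE',
  6      table_owner  => user,
  7      table_name   => 'IND_REBUILD',
  8      table_column => 'SEQ',
  9      chunk_size   => 10);
 10    -- rebuild_range is a hypothetical helper; :start_id and :end_id
 11    -- are the chunk boundaries supplied by the framework
 12    dbms_parallel_execute.run_task(
 13      task_name      => 'REBUILD_UNUSABLE',
 14      sql_stmt       => 'begin rebuild_range(:start_id, :end_id); end;',
 15      language_flag  => dbms_sql.native,
 16      parallel_level => 4);
 17  end;
 18  /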

2 comments

  1. If (and sadly, *only* if) the same schema owns both the index and the table, the owner of both could run the DBMS_PCLXUTIL package as well.

  2. Connor,

    Thank you for this idea. Am I correct in assuming that DBMS_SCHEDULER will not be able to achieve the same level of concurrency as DBMS_JOB because DBMS_SCHEDULER will not wait for COMMIT to start the job like DBMS_JOB does?
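(For what it’s worth, a DBMS_SCHEDULER equivalent of Step 3 might look like the sketch below.  DBMS_SCHEDULER.CREATE_JOB is indeed not transactional in the way DBMS_JOB.SUBMIT is, so with enabled => true each job can start as soon as it is created, without waiting for a COMMIT.)

SQL> begin
  2    for i in 0 .. 3
  3    loop
  4      dbms_scheduler.create_job(
  5        job_name   => 'WORKER_BEE_'||i,   -- one job per worker slot
  6        job_type   => 'PLSQL_BLOCK',
  7        job_action => 'begin worker_bee(4,'||i||'); end;',
  8        enabled    => true);              -- starts immediately, no COMMIT needed
  9    end loop;
 10  end;
 11  /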
