DBMS_JOB – watching for failures – Learning is not a spectator sport

DBMS_JOB – watching for failures

May 27, 2019

2–3 minutes

I had a friend point this one out to me recently. They use DBMS_JOB to allow some “fire and forget” style functionality for user, and in their case, the jobs are “best efforts” in that if they fail, it is not a big deal.

So whilst this may sound counter-intuitive, but if you rely on jobs submitted via DBMS_JOB to fail, then please read on.

By default, if a job fails 16 times in a row, it will marked as broken by the the database. Here’s a simple of example that, where the anonymous block will obviously fail because we are dividing by zero each time. I’ll set the job to run every 5 seconds, so that within a couple of minutes we’ll have 16 failures. First I’ll run this on 11.2.0.4


SQL> select banner from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4 - 64bit Production

SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

[wait for a bit]

SQL> select job, what, failures from user_jobs;

       JOB WHAT                                                 FAILURES
---------- -------------------------------------------------- ----------
        23 declare x int; begin x := 1/0; end;                         8

[wait for a bit]

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                         9 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        16 Y

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        16 Y

You can see the “breaking” of the job in action here. We got to 16 failures, and the database decided “enough is enough” Smile and the job will no longer run until some sort of intervention is performed by an administrator.

Now I’ll run that demo again in 12.2


SQL> declare
  2    j int;
  3  begin
  4    dbms_job.submit(j,'declare x int; begin x := 1/0; end;',sysdate,'sysdate+5/86400');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

[wait for a bit]

SQL> select job, what, failures from user_jobs;

       JOB WHAT                                                 FAILURES
---------- -------------------------------------------------- ----------
        23 declare x int; begin x := 1/0; end;                         8

[wait for a bit]

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        13 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        19 N

SQL> /

       JOB WHAT                                                 FAILURES B
---------- -------------------------------------------------- ---------- -
         3 declare x int; begin x := 1/0; end;                        25 N

You can see that in 12.2 (and I’ve tested in 18 and 19) that failures can continue past 16 unabated. If you’re being hit by this, patches are available, so please get in touch with Support. A quick workaround until you can apply patches is to use another job to monitor the others. Here’s a small anonymous block you could run in each container as SYSDBA that you could schedule (say) every few minutes


SQL> begin
  2    for i in ( select job
  3               from dba_jobs
  4               where  failures > 16
  5               and    job not in ( sys_context('userenv','fg_job_id'),sys_context('userenv','bg_job_id'))
  6               and    job not in ( select job from dba_jobs_running )
  7               )
  8    loop
  9      dbms_ijob.broken(i.job,true);
 10    end loop;
 11    commit;
 12  end;
 13  /

PL/SQL procedure successfully completed.

Ideally, you probably want to be moving to DBMS_SCHEDULER where possible, which has a greater flexibility and control over error handling amongst many other things.

6 responses to “DBMS_JOB – watching for failures”

Christian Klammer

May 27, 2019 at 8:22 pm

Hello Connor,

running the SQL provided as sysdba will not affect jobs of other database users as dbms_job.broken can only be run as the job owner of the jov to be set broken.

Maybe 12.2 upwards changed that behaviour … up to 12.1.0.2 it is as I wrote above.

Best Regards,
Christian

Reply
1. Connor McDonald
  
  May 27, 2019 at 8:36 pm
  
  You should check that last block again…its not using DBMS_JOB 🙂
  
  Reply
Christian Klammer

May 27, 2019 at 11:18 pm

Okay, I did not know about that in fact and assumed it was a typo. Useful stuff that.

Live and Learn 😉

Thanks.

Reply
DBMS_JOB – the joy of transactions – Learning is not a spectator sport

May 28, 2019 at 10:23 am

[…] is a followup to yesterdays post on DBMS_JOB and is critical if you’re upgrading to 19c soon. Mike Dietrich wrote a nice piece […]

Reply
aminadatia

May 28, 2019 at 10:50 pm

Reading this on my phone and your code which seems to go on for a mile is obviously not readable. Would it kill you to to format the code for better readability?

Regards

Reply
1. Connor McDonald
  
  May 29, 2019 at 7:49 am
  
  I actually ran a twitter poll a while back on this. The majority said they prefer code that looks like it would be coded in an (desktop) IDE, does not wrap, and has horizontal scrolling where needed in order to preserve the layout. So I try to adhere to that. All the code sections should be “swipe right” enabled (I tested this on my Android phone under a couple of browsers)
  
  Reply