Locked rows and lunch breaks ? A simple fix

Even in the new world of stateless web applications, from time to time we still come across the “lunch break locking” problem.  That is, someone or something, commences a transaction which locks some rows, and then their session goes idle, without committing or rolling back the transaction.  The metaphor commonly used was the “someone locks a row in a table and then goes out for lunch”, with them being totally oblivious to the havoc they have caused until they return.

With modern applications being generally stateless (or transactionally stateless) you would think the problem would have disappeared, but if anything, whilst the frequency of the problem has dropped, the diagnosis of the problem when it occurs is harder than ever.  The anonymity of application server sessions, and the scale of web-based customers means when a session does “lose track” of what it was doing and leaves a transaction active, it is very hard to detect until we have a major problem or outage on our hands.

But here’s a mitigation strategy you might want to consider.  Resource Manager can do some of the heavy lifting for you.  It can kill sessions that are idle but are blocking other sessions from completing their work.

Here’s a simple demo of how to put a resource manager plan in place to do that


SQL> begin
  2    dbms_resource_manager.create_pending_area();
  3    --
  4
  5    dbms_resource_manager.create_consumer_group(
  6      CONSUMER_GROUP=>'CG_STOP_BLOCKERS',
  7      COMMENT=>'CG for stop blocking'
  8      );
  9
 10    dbms_resource_manager.create_plan(
 11      PLAN=> 'STOP_BLOCKERS',
 12      COMMENT=>'Plan for stop blocking'
 13    );
 14
 15    dbms_resource_manager.create_plan_directive(
 16      PLAN=> 'STOP_BLOCKERS',
 17      GROUP_OR_SUBPLAN=>'CG_STOP_BLOCKERS',
 18      COMMENT=>'Directive',
 19      MAX_IDLE_BLOCKER_TIME => 60
 20    );
 21
 22
 23    dbms_resource_manager.create_plan_directive(
 24      PLAN=> 'STOP_BLOCKERS',
 25      GROUP_OR_SUBPLAN=>'OTHER_GROUPS',
 26      COMMENT=>'leave others alone'
 27    );
 28
 29    DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
 30
 31    DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
 32
 33  end;
 34  /

PL/SQL procedure successfully completed.

The key element here is the plan directive, which sets MAX_IDLE_BLOCKER_TIME to 60 seconds. So if an idle session is blocking another for that duration, resource manage will kill the session. Let’s assign the the appropriate consumer group to SCOTT, enable the plan and see it in action


SQL> exec dbms_resource_manager_privs.grant_switch_consumer_group('SCOTT','CG_STOP_BLOCKERS',false);

PL/SQL procedure successfully completed.

SQL> exec dbms_resource_manager.set_initial_consumer_group('SCOTT','CG_STOP_BLOCKERS');

PL/SQL procedure successfully completed.

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = STOP_BLOCKERS;

System altered.

SQL> conn scott/tiger
Connected.

SQL> create table t as select 1 x from dual;

Table created.

SQL> delete from t;

1 row deleted.

So this session now has a lock on the row, but is now idle…Now we fire up another session who will get blocked


SQL> set timing on
SQL> delete from t;

[waiting]

After approximately 60 seconds, the second session did in fact respond


SQL> set timing on
SQL> delete from t;

1 row deleted.

Elapsed: 00:00:58.09

So presumably the lock from session 1 has been released. If we return to session 1, we can see how the lock get released


SQL> select * from dual;
select * from dual
            *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 24708
Session ID: 593 Serial number: 59128


They got killed ! So they can take their time at lunch, and come back with their double-choc muffins, chai latte and watercress salad…We don’t care, because our application is protected Smile

Forever is a long time

It’s interesting that we still see lots of code out “in the wild” that leans itself to taking forever. Now I’m not talking here about poorly performing SQL etc…because (at least theoretically) no matter how bad the SQL is, presumably it will finish one day, even if that day is years into the future. What I’m referring to here is the default mechanism for locking in Oracle.

If you try access a row that is locked by someone else, you will wait. Don’t get me wrong … that’s a very good thing. The alternative – of letting two people change the same data at the same time is equivalent to saying “I dont care about my data at all”.

And how long will we wait ? Well… we’ll wait forever. We will never stop waiting, until that lock has been released.

I don’t know about you … but for me… forever is long time. A really long time. I cannot recall ever seeing any kind of Service Level Agreement for an application’s response times as stating “Yup…forever is cool. Take as long as you want” Smile

So perhaps consider that when you’re building your applications in Oracle. Yes, the default is to wait forever, but a little code (which has been available since version 9) can really go a long way.

Rather than just coding:

select * from T where …for update

or

delete from T where …

You can opt to wait for a nominated amount of time to attempt to get that lock (and then report something sensible back to the user).


SQL> select * from T for update wait 60;

(60 seconds pass)

select * from T for update wait 60
*
ERROR at line 1:
ORA-30006: resource busy; acquire with WAIT timeout expired

Note that the error code is not the same as the error you could get with a NOWAIT clause, which is:

ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

But you don’t even need to stop there. Why not catch that error with a SERVERERROR trigger – and then do some session level probing to get more information. You could even empower the users to take their own action 🙂


SQL> create or replace trigger lock_expiry
  2  after servererror on database
  3  declare
  4    l_err varchar2(4000);
  5  begin
  6    if ( is_servererror(30006) ) then
  7      for i in (
  8      select /*+ leading(lk sess) */ sess.sid, sess.username, sess.module
  9      from   v$lock lk,
 10             v$session sess,
 11             v$session s
 12      where  lk.id1 = s.row_wait_obj#
 13      and    lk.type = 'TM'
 14      and    lk.sid = sess.sid
 15      and    s.sid = sys_context('USERENV','SID')
 16      )
 17      loop
 18        l_err := l_err || 'Session '||i.sid||' who is '||i.username||' using '||i.module || chr(10);
 19      end loop;
 20      raise_application_error(-20000,l_err);
 21    end if;
 22  end;
 23  /

Trigger created.

SQL> select * from t for update wait 60;
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20000: Session 717 who is MCDONAC using SQL*Plus
ORA-06512: at line 18
ORA-30006: resource busy; acquire with WAIT timeout expired

Just for completeness, note that if you having locking issue, V$SESSION gives you direct access to find out who is blocking you.


SQL> select sid, last_call_et,
  2         nvl2(lockwait,'BLOCKED',status) status,
  3         blocking_session
  4  from   v$session s;

       SID LAST_CALL_ET STATUS     BLOCKING_SESSION 
---------- ------------ ---------- ---------------- 
        39          376 INACTIVE                    
        40          412 INACTIVE
        41          412 INACTIVE
        44          421 INACTIVE
        46          340 BLOCKED                  39 
        49            4 ACTIVE
        50          453 INACTIVE
        51          453 INACTIVE


can you lock part of a table ?

I should stress that this is all well covered in the documentation, but it was something I stumbled upon just recently.

Its common knowledge that you can lock a table:

SQL> create table T 
  2   ( x int ) 
  3  partition by range ( x ) 
  4  ( 
  5    partition p1 values less than (10), 
  6    partition p2 values less than (20), 
  7    partition p3 values less than (30) 
  8  );

Table created.

SQL> lock table T in exclusive mode;

Table(s) Locked.

However, you can also selectively lock just partitions of a table as well, using the standard partition clause, for example:

Session 1

SQL> lock table T partition ( p1 ) in exclusive mode;

Table(s) Locked.

whilst the following is still possible in session 2:

SQL> insert into T values (20);

1 row created.