Parsing … no big deal eh ?

Most of us have probably seen the standard demo when it comes to emphasizing the need for shareable SQL, aka using bind variables where appropriate.  The demo traditionally compares two similar scripts: one generates a lot of SQL statements with literals, and the other recasts the same script with bind variables, to dramatic effect.

Here’s a simple version I’ve whipped up:


SQL> create table t ( x int primary key) organization index;

Table created.

SQL> insert into t
  2  select rownum from dual
  3  connect by level <= 100000;

100000 rows created.

SQL> exec dbms_stats.gather_table_stats('','T');

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> declare
  2    v int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      select x into v from t where x = i;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.56

SQL> set timing on
SQL> declare
  2    v int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      execute immediate 'select x from t where x = '||i into v;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:04.92

So typically what follows is a “Woo Hoo! Look how awesome bind variables are, and look how crappy the literal SQL is” style of statement, with much hand waving and pointing in the direction of the elapsed time.

But lately, there have been people happy to provide a rebuttal, namely: “So what?”.  The argument runs like this:

In the “terrible” example, we are performing approximately 1600 statements per second (100,000 statements in roughly 65 seconds).  Is that anywhere near as good as the binding case ? No…but who cares.  There are plenty of systems out there for which 16 statements per second would be fine, let alone 1600.  So is all the fuss about binding really justified?

Such a rebuttal falls to pieces on other grounds anyway, such as latch contention once we introduce multi-user scenarios and, probably more importantly in today’s security climate, the risk of SQL injection.  But I’m not going to re-hash those here.  I want to look at the claim of “Oh… the performance is good enough anyway”.
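
(As an aside on the injection point, here’s a minimal sketch against the table T from the demo above.  The “attacker-supplied” value and the variable names are purely hypothetical.)

declare
  l_input varchar2(20) := '0 or 1=1';   -- hypothetical attacker-supplied value
  l_cnt   int;
begin
  -- concatenated into the SQL text, the input becomes part of the query,
  -- so the trailing "or 1=1" predicate matches every row in T
  execute immediate
    'select count(*) from t where x = '||l_input into l_cnt;
  dbms_output.put_line('literal version: '||l_cnt||' rows');

  -- passed as a bind, the input is only ever a value; the bogus "number"
  -- raises an error instead of rewriting the query
  begin
    execute immediate
      'select count(*) from t where x = :b' into l_cnt using l_input;
  exception
    when invalid_number or value_error then
      dbms_output.put_line('bind version rejected the input');
  end;
end;
/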

We seem to have forgotten that when the first such “bind versus literals” scripts were being produced and published, it was a different era.  Personally, in the early 90’s, my Oracle database (version 7) was running on a Sparcstation 10, with its tremendously powerful, and incredibly expensive, 50 MHz processor !!!! (For a trip down memory lane, see https://en.wikipedia.org/wiki/SPARCstation_10 )

So in those days, the differential between binding and literals was massive, because processor cycles were incredibly valuable.  But we tend to forget that just as processors have evolved, so too have our requirements for accessing data.  Yes, it’s true that a modern processor can probably handle those literal-based simple primary key lookups at a rate that meets our needs.  But that’s not always what a “modern” query looks like.  Nowadays, queries have evolved just like the processors: they can be much more complicated.  What happens to the parsing costs then ? Let’s bring some complexity into the fold, and re-run our test with some more complicated queries.

Obviously a complicated query might have a long execution time, so we’ll take that out of the equation by using DBMS_SQL to parse (but not execute) the query.  We’ll use a couple of dictionary views for our query; since each of them is itself built on several other dictionary views, there’s plenty of complexity even though our query is just a simple join.


SQL> set timing on
SQL> declare
  2    c int;
  3  begin
  4   c := dbms_sql.open_cursor;
  5   for i in 1 .. 100000 loop
  6      dbms_sql.parse(c,
  7        q'{
  8        select o.object_name, o.last_ddl_time, sum(s.bytes)
  9        from   all_objects o,
 10               dba_segments s
 11        where  o.owner = s.owner
 12        and    o.object_name = s.segment_name
 13        and    o.object_type = 'TABLE'
 14        and    o.object_id = :1
 15        group by o.object_name, o.last_ddl_time
 16        }',
 17        dbms_sql.native );
 18    end loop;
 19    dbms_sql.close_cursor(c);
 20  end;
 21  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.14

SQL> declare
  2    c int;
  3  begin
  4   c := dbms_sql.open_cursor;
  5   for i in 1 .. 100000 loop
  6      dbms_sql.parse(c,
  7        q'{
  8        select o.object_name, o.last_ddl_time, sum(s.bytes)
  9        from   all_objects o,
 10               dba_segments s
 11        where  o.owner = s.owner
 12        and    o.object_name = s.segment_name
 13        and    o.object_type = 'TABLE'
 14        and    o.object_id = }'||i||
 15        q'{
 16        group by o.object_name, o.last_ddl_time
 17        }',
 18        dbms_sql.native );
 19    end loop;
 20    dbms_sql.close_cursor(c);
 21  end;
 22  /

PL/SQL procedure successfully completed.

Elapsed: 10:07:12.71


Yes…you are reading that right. Over 10 hours to get through all of that parsing load.
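
If you want to verify where that time is going on your own system, the session-level statistics expose the parse costs directly.  Here’s a quick sketch (run it in the same session before and after the loop, and compare the deltas; the statistic names come from V$STATNAME):

select n.name, s.value
from   v$mystat   s,
       v$statname n
where  s.statistic# = n.statistic#
and    n.name in ('parse time cpu','parse time elapsed','parse count (hard)');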

So just because the processors are bigger and smarter… don’t forget, for high volume calls, those literals are literally going to hurt your systems.

SQL statements using literals

16 years ago, someone “Ask-ed Tom” how to find those SQL statements that were not using bind variables.  You can see the question here (because we never delete stuff), but I’ll paraphrase the answer below:

Tom took the following approach:

  • take a copy of SQL statements in the library cache
  • create a routine that would hunt for constants in the SQL text (that is, numbers and anything within quotes) and replace them with place holders
  • then count the resulting statements, looking for duplicates

create table t1 as select sql_text from v$sqlarea;

alter table t1 add sql_text_wo_constants varchar2(1000);

create or replace function 
remove_constants( p_query in varchar2 ) return varchar2
as
    l_query long;
    l_char  varchar2(1);
    l_in_quotes boolean default FALSE;
begin
    -- walk the statement character by character, collapsing each
    -- quoted string down to the place holder '#'
    for i in 1 .. length( p_query )
    loop
        l_char := substr(p_query,i,1);
        if ( l_char = '''' and l_in_quotes )
        then
            l_in_quotes := FALSE;
        elsif ( l_char = '''' and NOT l_in_quotes )
        then
            l_in_quotes := TRUE;
            l_query := l_query || '''#';
        end if;
        if ( NOT l_in_quotes ) then
            l_query := l_query || l_char;
        end if;
    end loop;
    -- replace every digit with '@', then collapse runs of '@'
    -- (and runs of spaces) down to a single character
    l_query := translate( l_query, '0123456789', '@@@@@@@@@@' );
    for i in 0 .. 8 loop
        l_query := replace( l_query, lpad('@',10-i,'@'), '@' );
        l_query := replace( l_query, lpad(' ',10-i,' '), ' ' );
    end loop;
    return upper(l_query);
end;
/
update t1 set sql_text_wo_constants = remove_constants(sql_text);

select sql_text_wo_constants, count(*)
  from t1
 group by sql_text_wo_constants
having count(*) > 100
 order by 2
/

It’s a utility I used many times over the years.  But times change, and it’s always good to reflect on the tools and techniques you use, and on whether anything has come along in later versions to assist you.  Consider the second bullet point above:

create a routine that would hunt for constants and replace them with place holders

That sounds a lot like the processing that must be performed when “cursor_sharing” is enabled (and set to “force”), the only difference being that the place holders become bind variables.  With that in mind, even if we are not using cursor_sharing, the information that was added to the library cache when cursor_sharing was introduced all those years ago can now be used to assist us.
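
As a refresher, here’s roughly what that conversion looks like.  With cursor_sharing set to force at the session level, a literal query gets rewritten with system-generated binds before it hits the library cache.  This is just a sketch; the exact :"SYS_B_n" naming you see may vary:

alter session set cursor_sharing = force;

select x from t where x = 123;

select sql_text
from   v$sql
where  sql_text like 'select x from t where x = :%';

-- expect something along the lines of:
--   select x from t where x = :"SYS_B_0"

alter session set cursor_sharing = exact;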

In V$SQLSTATS, there is a column FORCE_MATCHING_SIGNATURE, which is a “signature” (or hash value) representing a SQL statement after it has been converted to allow for cursor_sharing = force.  So that becomes the perfect grouping mechanism for identifying repeated statements that use literals.  For example:



SQL> select force_matching_signature, count(*)
  2  from  v$sqlstats
  3  where force_matching_signature > 0
  4  group by force_matching_signature
  5  having count(*) > 10
  6  order by 2 desc;

 FORCE_MATCHING_SIGNATURE   COUNT(*)
------------------------- ----------
      7756258419218828704         73
     15993438058742417605         16
     15893216616221909352         15
     14052954841849993177         12
     10493170372233636846         11

5 rows selected.

SQL> select sql_text from v$sqlstats
  2  where FORCE_MATCHING_SIGNATURE = 7756258419218828704;

SQL_TEXT
----------------------------------------------------------------
select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 148525
 and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 8448 
and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 6309 
and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 148524
 and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 8442 
and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 8594 
and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 148693
 and bitand(FLAGS, 128)=0

select count(FA#) from SYS_FBA_TRACKEDTABLES where OBJ# = 8602 
and bitand(FLAGS, 128)=0

...
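
And once you have a signature of interest, chasing down where those statements originate is just as easy, because V$SQL exposes the same column alongside details like the parsing schema and module.  A sketch, using the signature from above:

select distinct parsing_schema_name, module
from   v$sql
where  force_matching_signature = 7756258419218828704;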

Identification of literal SQL has become trivial. Fixing it ? Well…that’s another story.

Not using binds .. banging an obsolete drum ?

We’ve all seen the mantra: “you should be using binds”.  And this is a demo that’s been rolled out for decades now, showing the performance overhead of not using them:



SQL> drop table T purge;

Table dropped.

SQL> create table T (x primary key) as
  2  select rownum x from dual
  3  connect by level <= 100000;

Table created.

SQL> set timing on
SQL> declare
  2    res int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      select x into res from t where x = i;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.61

SQL> set timing on
SQL> declare
  2    res int;
  3  begin
  4    for i in 1 .. 100000 loop
  5      execute immediate 'select x from t where x = '||i into res ;
  6    end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:58.72


And the conclusion is pretty straightforward: the parsing costs added nearly a 40-fold overhead to the execution time (58.72 seconds versus 1.61).

But in the middle of a discussion like this recently, someone responded to me: “Yeah, but I’m still getting 1700 executions per second.  That’s plenty fast enough for me.”

And this is true…even with the overhead of parsing, we can still crank out a pretty good execution rate.

Which leads me to think that the demo, which we’ve used for so long, is actually flawed.  Because in reality, we don’t run queries that are “select from one_table”.  The moment you have any sort of real application code in play, there are joins, there are views, there are security predicates… things are a lot more complicated.

So what happens to our parsing costs when we look at something closer to real application code ?  Here’s a demo joining two of the Oracle dictionary views, both of which have complex definitions.  Let me stress two things in this case:

  • we are not even running the SQL like the demo above.  All we are doing is parsing the query
  • we are only doing 10,000 calls, not 100,000.  This is only 10% of the size of the initial demo.  Why ?  Well… see for yourself below


SQL> set timing on
SQL> declare
  2    c number := dbms_sql.open_cursor;
  3  begin
  4    for i in 1 .. 10000 loop
  5      dbms_sql.parse(c,'select * from dba_objects o, dba_segments s
  6                        where object_id = '||i||'
  7                        and o.owner = s.owner
  8                        and o.object_name = s.segment_name' ,
  9                            dbms_sql.native);
 10    end loop;
 11    dbms_sql.close_cursor(c);
 12  end;
 13  /

PL/SQL procedure successfully completed.

Elapsed: 00:39:33.15

That’s nearly 40 minutes of my server being eaten alive by parsing costs.  Not using binds when you think about real application code is probably going to kill your app.
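
For contrast, here is the same loop recast with a bind place holder.  I haven’t timed this exact variant, but given the equivalent test earlier got through 100,000 parse calls in just over a second, you’d expect it to finish almost before you hit Enter:

declare
  c number := dbms_sql.open_cursor;
begin
  for i in 1 .. 10000 loop
    -- same statement, but with a bind place holder instead of a literal,
    -- so every iteration parses the identical SQL text
    dbms_sql.parse(c,'select * from dba_objects o, dba_segments s
                      where object_id = :b
                      and o.owner = s.owner
                      and o.object_name = s.segment_name' ,
                          dbms_sql.native);
  end loop;
  dbms_sql.close_cursor(c);
end;
/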

But to be fair: in a future post, I’ll talk about not using binds.