Juicing up UTL_FILE

Think about your rubbish bin for a second. Clearly this is going to be an oh-so-obvious metaphor leading into UTL_FILE, right? OK, maybe a little explanation is needed. I have a basket next to my desk into which I throw any waste paper. It is where I throw my stupid ideas and broken dreams.

Melancholy aside, once it is full I’ll take it out and empty it into the main recycling bin that is collected once a fortnight from my street front.

Does the metaphor make sense now? Still no? OK, let me bring it around to the database. When you make changes to data in the database, most of the time these changes are only made to data blocks in memory. Obviously we have to keep a permanent record of the changes in the redo log to ensure that database recovery is possible, but the changed data blocks themselves are only dumped out to disk on an intermittent basis. That way, the process of changing data in the database is nice and efficient.

That is similar to what I do with the waste paper basket. I don’t take each piece of paper out to the street front recycling bin; I wait until the waste paper basket is full and then just do the one trip. It is more efficient. (This is hopefully your light-bulb moment on my metaphor skills.)

So back to UTL_FILE. You can take the same approach when you need to unload some data using UTL_FILE. There is an inherent overhead with every call you make to write out a line of output to a flat file using UTL_FILE. Hence, if you can minimize the number of calls you make, you’ll see some benefits.

Here’s an example of that in action. I’m building a simple CSV file based on a table called T, which contains approximately 40 million rows. Here is my first cut at the solution, one which I’d wager most people have written at some point in their IT lives: we simply loop through each record and write it out to the file.


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /
+000000000 00:04:07.368000000

PL/SQL procedure successfully completed.

I’d contend that four minutes for 40 million records is pretty good, but maybe we want to go faster. Based on my metaphor, it might not be immediately apparent how I could make fewer UTL_FILE calls. Surely I need to call it once per line of data written? But don’t forget that a “line” in a file is merely the data terminated with an end-of-line pattern specific to your platform. It could be ASCII 10 or an ASCII 13,10 pair, but either way, it is really just an extension of the data you need to write to the file. A “line” is merely our interpretation of that data, where that pattern means “move to the next line”.
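
To make that concrete, here is a tiny sketch (an illustration only, assuming a file handle f already opened with UTL_FILE.FOPEN, on a Unix-style platform where the terminator is a single linefeed). Both fragments produce identical file contents:

-- two calls, one line each; PUT_LINE appends the line terminator for us
utl_file.put_line(f, 'first');
utl_file.put_line(f, 'second');

-- one call, with the terminators carried as ordinary data (chr(10) = linefeed)
utl_file.put(f, 'first'||chr(10)||'second'||chr(10));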

So rather than call UTL_FILE for each line, I’ll build up multiple lines and then write them out with a single UTL_FILE.PUT_LINE call. I’m choosing a cap of around 300 lines to ensure that my concatenation does not exceed the maximum size of a VARCHAR2 (32,767 bytes, which leaves room for lines averaging around 100 bytes each), but as a result I should cut down on the number of UTL_FILE calls by a factor of 300.


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    big_line varchar2(32767);
  5    s timestamp;
  6    cnt pls_integer := 0;
  7  begin
  8    f := utl_file.fopen('TMP','demo2.dat','w',32767);
  9    s := systimestamp;
 10    for i in ( select * from t )
 11    loop
 12      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 13      cnt := cnt + 1;
 14      if cnt < 300 then
 15         big_line := big_line || line || chr(10);
 16      else
 17         utl_file.put_line(f,big_line||line);
 18         cnt := 0;
 19         big_line := null;
 20      end if;
 21    end loop;
 22    utl_file.put_line(f,big_line);
 23    dbms_output.put_line(systimestamp-s);
 24    utl_file.fclose_all;
 25  end;
 26  /
+000000000 00:02:23.297000000

PL/SQL procedure successfully completed.

That’s pretty cool. We reclaimed around 100 seconds of elapsed time just by reducing the number of UTL_FILE.PUT_LINE calls. In this particular instance that’s about 40%, but obviously your mileage may vary based on a myriad of factors, so make sure you do your own benchmarking on your own systems.
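
One caveat on the fixed cap of 300: if your rows vary a lot in width, a row-count cap could overflow the 32,767-byte VARCHAR2 limit. Here is a minimal variation of the routine above (a sketch only, using the same table T and TMP directory) that flushes based on accumulated length instead:

declare
  f        utl_file.file_type;
  line     varchar2(255);
  big_line varchar2(32767);
begin
  f := utl_file.fopen('TMP','demo3.dat','w',32767);
  for i in ( select * from t )
  loop
    line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
    -- flush before the buffer can approach the 32767-byte VARCHAR2 limit
    -- (32000 leaves a little headroom for the next line and its terminator)
    if nvl(length(big_line),0) + length(line) + 1 > 32000 then
      utl_file.put(f, big_line);  -- PUT, because the terminators are already embedded
      big_line := null;
    end if;
    big_line := big_line || line || chr(10);
  end loop;
  -- don't forget whatever is left over in the buffer
  if big_line is not null then
    utl_file.put(f, big_line);
  end if;
  utl_file.fclose_all;
end;
/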

As you can see, with just a couple of lines of extra code, we can really make UTL_FILE sing from a performance perspective. Having said that, in an upcoming blog post I’ll make a bold assertion: you probably don’t need to use UTL_FILE ever again! Stay tuned for that one.

The simplest things… can be risky

Java and Oracle expert Lukas Eder tweeted yesterday about a potential optimization that could be done when reviewing database SQL code.

[Tweet from Lukas Eder suggesting that two consecutive UPDATE statements against the same table be consolidated into a single UPDATE]

This looks to be a logical thing to do. Why scan the table T twice to perform an update, when the same job could be done in a single pass? The benefits seem obvious:

  • less I/O work
  • less time the data spends locked
  • less risk of an error between the two operations

So don’t get me wrong: the consolidation is going to be a good thing in the majority of cases.

And therein lies the rub. The “majority” of cases is not the same as “all” cases, and that is why I don’t think a tool should ever automatically perform this change. I’d be cool with a tool making a recommendation, but let’s see why you cannot just assume that the consolidation is correct.

Here’s our table with a single row, and a single business rule implemented with a check constraint:



SQL> create table t ( a int, b int );

Table created.

SQL> alter table t add constraint chk check ( a < b ) ;

Table altered.

SQL> insert into t values (1,2);

1 row created.

SQL> commit;

Commit complete.

Now I’ll implement the application in the original “unoptimized” way:


SQL> update t set a = a + 1;
update t set a = a + 1
*
ERROR at line 1:
ORA-02290: check constraint (MCDONAC.CHK) violated


SQL> update t set b = b*2;

1 row updated.

You can see that the first update failed; it violated the check constraint. Of course, it is not definitively clear whether this should be the case based on the business requirements, because I haven’t elaborated on whether these two updates should be two transactions or a single transaction. But correctness is not really the point I’m trying to make here; the point is that if I now choose to consolidate the updates, I end up with different application behaviour.

I’ll roll back the change above and repeat the experiment using the consolidated update:


SQL> roll;
Rollback complete.
SQL> update t set a = a + 1, b = b*2;

1 row updated.

This time the update completes successfully. If a tool had automatically done this consolidation, I would get different behaviour in my application. That might be a good thing… it might not be. I could eliminate the difference by making the constraint deferrable, but then we’re departing even further from the existing implementation of the application code, which means more scrutiny and more regression testing.
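
For completeness, here is roughly what that deferrable variant would look like (a sketch only, reusing the same table and constraint name). With the check deferred to commit time, the two-statement version now succeeds too, because both columns have been updated by the time the constraint is evaluated:

alter table t drop constraint chk;
alter table t add constraint chk check ( a < b ) deferrable initially deferred;

update t set a = a + 1;   -- no longer fails immediately
update t set b = b*2;
commit;                   -- the constraint is checked here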

So by all means, explore opportunities to improve the performance of your SQL by rearranging it, consolidating it, and aiming to get more done with less work. But be careful that you do not unknowingly change the way your application works when you do so.

 

Oracle Developer Community Appreciation Day

OK, time for some controversy.

My ODC appreciation post is about support.oracle.com

Now I’m sure there will be people in the community just itching to lose their proverbial over how they lost years of their lives waiting on a support response, or some similar complaint. And since I work for Oracle, they will be just as keen to jump on the bandwagon and claim that I’m just shilling for the company I work for.

But I’m not denying that our support resources could be better. Any company’s support resources can be better.

This post is about the value I get from support.oracle.com when answering questions on asktom.oracle.com. Yes, there is some outdated or incorrect content on there, but there is also some awesome content. Many times I find myself quoting MOS notes when providing background information for a question on AskTOM.

So this is my thank you to the people who contribute support notes to MOS.

Because I’m pretty sure it’s a thankless task.
