Juicing up UTL_FILE

Think about your rubbish bin for a second. Because, clearly this is going to be an oh so obvious metaphor leading into UTL_FILE right?  OK, maybe a little explanation is needed. I have a basket next to my desk into which I throw any waste paper. It is where I throw my stupid ideas and broken dreams Smile

image

Melancholy aside, once it is full I’ll take it out and empty it into the main recycling bin that is collected once a fortnight from my street front.

image

Metaphor make sense now? Still No? OK, let me bring it around to the database. When you make changes to data in the database, most of the time these changes are only made to data blocks in memory. Obviously we have to keep a permanent record of the changes in the redo log to ensure that database recovery is possible, but the changed data blocks themselves are only dumped out to disk on an intermittent basis. That way, the process of changing data in the database is nice and efficient.

That is similar to what I do with the waste paper basket. I don’t take each piece of paper out to the street front recycling bin; I wait until the waste paper basket is full and then just do the one trip. It is more efficient. (This is hopefully your light bulb moment on my metaphor skills Smile)

So back to UTL_FILE. You can take the same approach when you need to unload some data using UTL_FILE. There is an inherent overhead with every call you make to write out a line of output to a flat file using UTL_FILE. Hence, if you can minimize the number of calls you make, you’ll see some benefits.

Here’s an example of that in action. I’m building a simple CSV file based on a table called T which contains approximately 40 million rows.  Here is my first cut at the solution which I’d wager most people have written in their IT lives. We simply loop through each record and write it out to the file:


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /
+000000000 00:04:07.368000000

PL/SQL procedure successfully completed.

I’d contend that four minutes for 40 million records is pretty good, but maybe we want to go faster.  Based on my metaphor it might not be immediately apparent how could I perform less UTL_FILE calls? Surely I need to call it once per line of data written? But don’t forget that a “line” in a file is merely the data terminated with a end-of-line pattern specific to your platform. It could be ASCII 10 or an ASCII 13,10 pair, but either way, it is really just an extension of the data you need to write to the file. A “line” is merely our interpretation of that data where that pattern means move to the next line.

So rather than call UTL_FILE for each line, I’ll build up multiple lines and then write them out with a single UTL_FILE.PUT_LINE call. I’m choosing a cap of around 300 lines to ensure that my concatenation does not exceed the maximum size of a VARCHAR2, but as a result I should cut down on the number of UTL_FILE calls by a factor of 300.


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    big_line varchar2(32767);
  5    s timestamp;
  6    cnt pls_integer := 0;
  7  begin
  8    f := utl_file.fopen('TMP','demo2.dat','w',32767);
  9    s := systimestamp;
 10    for i in ( select * from t )
 11    loop
 12      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 13      cnt := cnt + 1;
 14      if cnt < 300 then
 15         big_line := big_line || line || chr(10);
 16      else
 17         utl_file.put_line(f,big_line||line);
 18         cnt := 0;
 19         big_line := null;
 20      end if;
 21    end loop;
 22    utl_file.put_line(f,big_line);
 23    dbms_output.put_line(systimestamp-s);
 24    utl_file.fclose_all;
 25  end;
 26  /
+000000000 00:02:23.297000000

PL/SQL procedure successfully completed.

That’s pretty cool. We reclaimed around 100 seconds of elapsed time just by reducing the number of UTL_FILE.PUT_LINE calls. In this particular instance, that’s about 40% but obviously your mileage may vary based on a myriad of factors – so make you sure do your own benchmarking on your own systems.

As you can see, with just a couple of lines of extra code, we can really make UTL_FILE sing from a performance perspective. Having said that, in an upcoming blog post I’ll make a bold assertion – that you probably don’t need to use UTL_FILE ever again!  Stay tuned for that one.

5 thoughts on “Juicing up UTL_FILE

    • “… which is not true for the SQL*Plus method (directly).”

      Have a look at:

      DBMS_SCHEDULER.create_job(
      job_name …
      job_type => ‘SQL_SCRIPT’,

      It understands sqlplus s` commands.

      • Less known, faster (at least on Windows) than UTL_FILE and closer by design than sqlplus spool is:

        DBMS_SCHEDULER.PUT_FILE (
        destination_file IN VARCHAR2,
        destination_host IN VARCHAR2,
        credential_name IN VARCHAR2,
        file_contents IN {BLOB|CLOB},
        destination_permissions IN VARCHAR2 DEFAULT NULL)

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.