Think about your rubbish bin for a second. Because, clearly this is going to be an oh so obvious metaphor leading into UTL_FILE right? OK, maybe a little explanation is needed. I have a basket next to my desk into which I throw any waste paper. It is where I throw my stupid ideas and broken dreams
Melancholy aside, once it is full I’ll take it out and empty it into the main recycling bin that is collected once a fortnight from my street front.
Metaphor make sense now? Still No? OK, let me bring it around to the database. When you make changes to data in the database, most of the time these changes are only made to data blocks in memory. Obviously we have to keep a permanent record of the changes in the redo log to ensure that database recovery is possible, but the changed data blocks themselves are only dumped out to disk on an intermittent basis. That way, the process of changing data in the database is nice and efficient.
That is similar to what I do with the waste paper basket. I don’t take each piece of paper out to the street front recycling bin; I wait until the waste paper basket is full and then just do the one trip. It is more efficient. (This is hopefully your light bulb moment on my metaphor skills )
So back to UTL_FILE. You can take the same approach when you need to unload some data using UTL_FILE. There is an inherent overhead with every call you make to write out a line of output to a flat file using UTL_FILE. Hence, if you can minimize the number of calls you make, you’ll see some benefits.
Here’s an example of that in action. I’m building a simple CSV file based on a table called T which contains approximately 40 million rows. Here is my first cut at the solution which I’d wager most people have written in their IT lives. We simply loop through each record and write it out to the file:
SQL> set serverout on SQL> declare 2 f utl_file.file_type; 3 line varchar2(255); 4 s timestamp; 5 begin 6 f := utl_file.fopen('TMP','demo.dat','w'); 7 s := systimestamp; 8 for i in ( select * from t ) 9 loop 10 line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id; 11 utl_file.put_line(f,line); 12 end loop; 13 dbms_output.put_line(systimestamp-s); 14 utl_file.fclose_all; 15 end; 16 / +000000000 00:04:07.368000000 PL/SQL procedure successfully completed.
I’d contend that four minutes for 40 million records is pretty good, but maybe we want to go faster. Based on my metaphor it might not be immediately apparent how could I perform less UTL_FILE calls? Surely I need to call it once per line of data written? But don’t forget that a “line” in a file is merely the data terminated with a end-of-line pattern specific to your platform. It could be ASCII 10 or an ASCII 13,10 pair, but either way, it is really just an extension of the data you need to write to the file. A “line” is merely our interpretation of that data where that pattern means move to the next line.
So rather than call UTL_FILE for each line, I’ll build up multiple lines and then write them out with a single UTL_FILE.PUT_LINE call. I’m choosing a cap of around 300 lines to ensure that my concatenation does not exceed the maximum size of a VARCHAR2, but as a result I should cut down on the number of UTL_FILE calls by a factor of 300.
SQL> set serverout on SQL> declare 2 f utl_file.file_type; 3 line varchar2(255); 4 big_line varchar2(32767); 5 s timestamp; 6 cnt pls_integer := 0; 7 begin 8 f := utl_file.fopen('TMP','demo2.dat','w',32767); 9 s := systimestamp; 10 for i in ( select * from t ) 11 loop 12 line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id; 13 cnt := cnt + 1; 14 if cnt < 300 then 15 big_line := big_line || line || chr(10); 16 else 17 utl_file.put_line(f,big_line||line); 18 cnt := 0; 19 big_line := null; 20 end if; 21 end loop; 22 utl_file.put_line(f,big_line); 23 dbms_output.put_line(systimestamp-s); 24 utl_file.fclose_all; 25 end; 26 / +000000000 00:02:23.297000000 PL/SQL procedure successfully completed.
That’s pretty cool. We reclaimed around 100 seconds of elapsed time just by reducing the number of UTL_FILE.PUT_LINE calls. In this particular instance, that’s about 40% but obviously your mileage may vary based on a myriad of factors – so make you sure do your own benchmarking on your own systems.
As you can see, with just a couple of lines of extra code, we can really make UTL_FILE sing from a performance perspective. Having said that, in an upcoming blog post I’ll make a bold assertion – that you probably don’t need to use UTL_FILE ever again! Stay tuned for that one.