UTL_FILE_DIR and 18c

I wrote a blog post called The Death of UTL_FILE which attracted a comment from a reader:

“There is NO chance to stay at UTL_FILE as it is DESUPPORTED starting with database Version 18c”

This is not the case, but since I wanted to clarify what has changed in 18c, it warrants this small but separate blog post. When UTL_FILE first into existence in Oracle 7, the concept of directory object did not apply to UTL_FILE. Clearly we could not just let UTL_FILE to write to any destination, otherwise a malicious person could write a little PL/SQL block like this:


declare
  f utl_file.file_type;
begin
  for i in ( select 
                regexp_substr(name,'(.*)\\(.*)', 1, 1, 'i', 1) path,
                regexp_substr(name,'(.*)\\(.*)', 1, 1, 'i', 2) name
             from v$datafile 
             order by file# desc ) 
  loop
    f := utl_file.fopen(i.path,i.name,'W');
  end loop;
  end;
  utl_file.fclose_all;
end;
/

and voila! No more database. So an initialization parameter was created to nominate which directories UTL_FILE was allowed to access.  This parameter was called UTL_FILE_DIR, and the above code hopefully is justification enough to show that you should never ever set UTL_FILE_DIR to “*”, meaning it could write to wherever the OS permissions on the Oracle software account would allow it.  Bye Bye datafiles…bye bye database Smile

In more recent versions, UTL_FILE was improved so that the directory parameter could be supplied as a directory object. This is a much tighter implementation because read and write privileges on directory objects can be controlled from within the database.

In 18c, it is not UTL_FILE that has been de-supported, it is the ability to use the older style convention of hard-coded path names for the directory that is no longer allowed. You have to use directory objects. If you set the traditional UTL_FILE_DIR path in the spfile it will be ignored, and you’ll get a warning on startup.


SQL> alter system set utl_file_dir = 'c:\temp' scope=spfile;

System altered.


SQL> startup force
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.


--
-- alert log
--
Obsolete system parameters with specified values:
  utl_file_dir
End of obsolete system parameter listing

But even if you set UTL_FILE_DIR in the spfile, you will not be allowed to use OS directory paths in the UTL_FILE dir calls.


SQL> declare
  2    f utl_file.file_type;
  3  begin
  4    f := utl_file.fopen('c:\temp','demo.dat','W');
  5    utl_file.fclose_all;
  6  end;
  7  /
declare
*
ERROR at line 1:
ORA-29280: invalid directory object
ORA-06512: at "SYS.UTL_FILE", line 41
ORA-06512: at "SYS.UTL_FILE", line 478
ORA-06512: at line 4

If you prefer a video version of this explanation, I had some fun riding my bike when I talked about this when 18c first came out earlier this year.

The death of UTL_FILE

In a previous post I covered a technique to improve the performance of UTL_FILE, but concluded the post with a teaser: “you probably don’t need to use UTL_FILE ever again”.

image

Time for me to back that statement up with some concrete evidence.

UTL_FILE can read and write files. This blog post will cover the writing functionality of UTL_FILE and why I think you probably don’t need UTL_FILE for this. I’ll come back to UTL_FILE to read files in a future post.

There are two possibilities when it comes to writing a file:

  • The file is being requested by a client program and the results should be stored on a client machine. In that case, you do not use UTL_FILE anyway – you use the tools available on the client, for example, SQL Developer, SQLcl, SQL*Plus or a myriad of other tools, OR
  • the file is to be written by the database server to a location on the database server itself, or to a location that is accessible to the database server. This is where UTL_FILE has been used, the main motivations being that it can be part of existing database-resident PL/SQL code, and does not require clients to have direct SQL or OS level access to the database server.

So here is my assertion – most of the time, UTL_FILE is used to write out a file because we can’t make use an easier tool like those mentioned above directly on the database server.  After all, who wants to write code like:


SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /

PL/SQL procedure successfully completed.

when most of the time if I was doing this in a tool it would be simply:


SQL> set markup csv on
SQL> spool c:\tmp\demo.dat
SQL> select * from t;
SQL> spool off

But in recent versions of the database, you can do exactly this! The database scheduler has been enhanced to be able to to run SQL*Plus style scripts directly out of the database without needing to give OS access to database server, or the SQL*Plus executable. In the previous post I unloaded 40million rows to a file in CSV format and the responsibility for formatting the data into CSV format fell to me – I had to do all of the “heavy lifting”. Using the scheduler and the SQL_SCRIPT job type, it is as simple as writing a SQL*Plus script, and submitting it as a job.


SQL> declare
  2    l_script   VARCHAR2(32767) :=
  3  'conn /@db
  4  set markup csv on
  5  set arraysize 500
  6  set pages 0
  7  set lines 200
  8  set trimspool on
  9  spool c:\tmp\demo_sched.dat
 10  select * from t;
 11  spool off';
 12  begin
 13    dbms_scheduler.create_job(
 14      job_name        => 'UNLOAD_DATA',
 15      job_type        => 'SQL_SCRIPT',
 16      job_action      => l_script,
 17      credential_name => 'MY_ACCOUNT',
 18      enabled         => true
 19    );
 20  end;
 21  /

PL/SQL procedure successfully completed.

You will want to ensure that you have some controls over the usage of this feature, and what credentials the scripts will run under.  Also in my example, I’ve got a connection wallet setup so that I do not have to code any passwords into the connection string for my scheduler job. But suddenly it has become easy to get access to the scripting tools we are used to on our own client machines, and utilize them on the database server.

“Slow down there cowboy…” I hear you exclaim. “…That might be fine for simple SELECT * scripts, but my UTL_FILE procedures have a lot of complex logic to construct the file data”.

Even if you are using UTL_FILE because you are performing some complicated algorithms to generate the data that will be written to file, you can still utilize this scheduler facility.  After your complicated logic is completed, ultimately you typically will have a line of data you need to write to a file.  And if you have a line of data, then it is trivial to port that procedure to become a pipelined function.  And once you have a pipelined function, then we have a simple query mechanism that can be used to spool the output.  For example, if my original procedure is:


SQL> create or replace
  2  procedure my_procedure is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... writing each line to file with UTL_FILE
 12    ...
 13    end loop;
 14  end;
 15  /

then we change none of the logic – we only need replace all of that messy UTL_FILE code with a simple pipe command to allow querying that function as if it was source of rows, and then spool it to a file in the usual way.


SQL> create or replace
  2  function my_function return sys.odcivarchar2list pipelined is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... pipe row ( the_line );
 12    ...
 13    end loop;
 14    return;
 15  end;
 16  /

SQL> spool c:\tmp\demo.dat
SQL> select * from my_function();
SQL> spool off

So next time you’re cranking out some cumbersome UTL_FILE code to write a file, take a moment to see if the scheduler can look after some of the work for you. I’ll talk about (not) using UTL_FILE to read files in a future post.

Juicing up UTL_FILE

Think about your rubbish bin for a second. Because, clearly this is going to be an oh so obvious metaphor leading into UTL_FILE right?  OK, maybe a little explanation is needed. I have a basket next to my desk into which I throw any waste paper. It is where I throw my stupid ideas and broken dreams Smile

image

Melancholy aside, once it is full I’ll take it out and empty it into the main recycling bin that is collected once a fortnight from my street front.

image

Metaphor make sense now? Still No? OK, let me bring it around to the database. When you make changes to data in the database, most of the time these changes are only made to data blocks in memory. Obviously we have to keep a permanent record of the changes in the redo log to ensure that database recovery is possible, but the changed data blocks themselves are only dumped out to disk on an intermittent basis. That way, the process of changing data in the database is nice and efficient.

That is similar to what I do with the waste paper basket. I don’t take each piece of paper out to the street front recycling bin; I wait until the waste paper basket is full and then just do the one trip. It is more efficient. (This is hopefully your light bulb moment on my metaphor skills Smile)

So back to UTL_FILE. You can take the same approach when you need to unload some data using UTL_FILE. There is an inherent overhead with every call you make to write out a line of output to a flat file using UTL_FILE. Hence, if you can minimize the number of calls you make, you’ll see some benefits.

Here’s an example of that in action. I’m building a simple CSV file based on a table called T which contains approximately 40 million rows.  Here is my first cut at the solution which I’d wager most people have written in their IT lives. We simply loop through each record and write it out to the file:


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /
+000000000 00:04:07.368000000

PL/SQL procedure successfully completed.

I’d contend that four minutes for 40 million records is pretty good, but maybe we want to go faster.  Based on my metaphor it might not be immediately apparent how could I perform less UTL_FILE calls? Surely I need to call it once per line of data written? But don’t forget that a “line” in a file is merely the data terminated with a end-of-line pattern specific to your platform. It could be ASCII 10 or an ASCII 13,10 pair, but either way, it is really just an extension of the data you need to write to the file. A “line” is merely our interpretation of that data where that pattern means move to the next line.

So rather than call UTL_FILE for each line, I’ll build up multiple lines and then write them out with a single UTL_FILE.PUT_LINE call. I’m choosing a cap of around 300 lines to ensure that my concatenation does not exceed the maximum size of a VARCHAR2, but as a result I should cut down on the number of UTL_FILE calls by a factor of 300.


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    big_line varchar2(32767);
  5    s timestamp;
  6    cnt pls_integer := 0;
  7  begin
  8    f := utl_file.fopen('TMP','demo2.dat','w',32767);
  9    s := systimestamp;
 10    for i in ( select * from t )
 11    loop
 12      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 13      cnt := cnt + 1;
 14      if cnt < 300 then
 15         big_line := big_line || line || chr(10);
 16      else
 17         utl_file.put_line(f,big_line||line);
 18         cnt := 0;
 19         big_line := null;
 20      end if;
 21    end loop;
 22    utl_file.put_line(f,big_line);
 23    dbms_output.put_line(systimestamp-s);
 24    utl_file.fclose_all;
 25  end;
 26  /
+000000000 00:02:23.297000000

PL/SQL procedure successfully completed.

That’s pretty cool. We reclaimed around 100 seconds of elapsed time just by reducing the number of UTL_FILE.PUT_LINE calls. In this particular instance, that’s about 40% but obviously your mileage may vary based on a myriad of factors – so make you sure do your own benchmarking on your own systems.

As you can see, with just a couple of lines of extra code, we can really make UTL_FILE sing from a performance perspective. Having said that, in an upcoming blog post I’ll make a bold assertion – that you probably don’t need to use UTL_FILE ever again!  Stay tuned for that one.