The death of UTL_FILE

In a previous post I covered a technique to improve the performance of UTL_FILE, but concluded the post with a teaser: “you probably don’t need to use UTL_FILE ever again”.


Time for me to back that statement up with some concrete evidence.

UTL_FILE can read and write files. This blog post will cover the writing functionality of UTL_FILE and why I think you probably don’t need UTL_FILE for it. I’ll come back to reading files with UTL_FILE in a future post.

There are two possibilities when it comes to writing a file:

  • The file is being requested by a client program and the results should be stored on a client machine. In that case, you do not use UTL_FILE anyway – you use the tools available on the client, for example, SQL Developer, SQLcl, SQL*Plus or a myriad of other tools, OR
  • the file is to be written by the database server to a location on the database server itself, or to a location that is accessible to the database server. This is where UTL_FILE has been used, the main motivations being that it can be part of existing database-resident PL/SQL code, and does not require clients to have direct SQL or OS level access to the database server.

So here is my assertion – most of the time, UTL_FILE is used to write out a file because we can’t make use of an easier tool like those mentioned above directly on the database server.  After all, who wants to write code like:


SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /

PL/SQL procedure successfully completed.

when, most of the time, if I were doing this in a tool, it would simply be:


SQL> set markup csv on
SQL> spool c:\tmp\demo.dat
SQL> select * from t;
SQL> spool off

But in recent versions of the database, you can do exactly this! The database scheduler has been enhanced to be able to run SQL*Plus style scripts directly out of the database, without needing to give users OS access to the database server, or even the SQL*Plus executable. In the previous post I unloaded 40 million rows to a file in CSV format and the responsibility for formatting the data into CSV fell to me – I had to do all of the “heavy lifting”. Using the scheduler and the SQL_SCRIPT job type, it is as simple as writing a SQL*Plus script, and submitting it as a job.


SQL> declare
  2    l_script   VARCHAR2(32767) :=
  3  'conn /@db
  4  set markup csv on
  5  set arraysize 500
  6  set pages 0
  7  set lines 200
  8  set trimspool on
  9  spool c:\tmp\demo_sched.dat
 10  select * from t;
 11  spool off';
 12  begin
 13    dbms_scheduler.create_job(
 14      job_name        => 'UNLOAD_DATA',
 15      job_type        => 'SQL_SCRIPT',
 16      job_action      => l_script,
 17      credential_name => 'MY_ACCOUNT',
 18      enabled         => true
 19    );
 20  end;
 21  /

PL/SQL procedure successfully completed.

You will want to ensure that you have some controls over the usage of this feature, and what credentials the scripts will run under.  Also, in my example I’ve got a connection wallet set up so that I do not have to code any passwords into the connection string for my scheduler job. But suddenly it has become easy to take the scripting tools we are used to on our own client machines, and utilize them on the database server.
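
For reference, here is roughly what creating that credential looks like via DBMS_CREDENTIAL. This is just a sketch – “batch_user” is a placeholder for whatever suitably locked-down OS account you decide the scripts should run as:


begin
  dbms_credential.create_credential(
    credential_name => 'MY_ACCOUNT',
    username        => 'batch_user',
    password        => '<os password>'
  );
end;
/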

“Slow down there cowboy…” I hear you exclaim. “…That might be fine for simple SELECT * scripts, but my UTL_FILE procedures have a lot of complex logic to construct the file data”.

Even if you are using UTL_FILE because you are performing some complicated algorithms to generate the data that will be written to file, you can still utilize this scheduler facility.  After your complicated logic has run, you will typically end up with a line of data that needs to be written to a file.  And if you have a line of data, then it is trivial to port that procedure to become a pipelined function.  And once you have a pipelined function, we have a simple query mechanism that can be used to spool the output.  For example, if my original procedure is:


SQL> create or replace
  2  procedure my_procedure is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... writing each line to file with UTL_FILE
 12    ...
 13    end loop;
 14  end;
 15  /

then we change none of the logic – we need only replace all of that messy UTL_FILE code with a simple PIPE ROW statement, which lets us query the function as if it were a source of rows, and then spool it to a file in the usual way.


SQL> create or replace
  2  function my_function return sys.odcivarchar2list pipelined is
  3  begin
  4    for each_row in ( [some set] )
  5    loop
  6    ...
  7    ... my complex logic
  8    ...
  9
 10    ...
 11    ... pipe row ( the_line );
 12    ...
 13    end loop;
 14    return;
 15  end;
 16  /

SQL> spool c:\tmp\demo.dat
SQL> select * from my_function();
SQL> spool off
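
To make that concrete, here is a minimal end-to-end version of the pattern using the table T from the earlier examples. This is just my sketch – in real code the body of the loop would be your complex logic:


create or replace
function t_csv return sys.odcivarchar2list pipelined is
begin
  for i in ( select * from t )
  loop
    --
    -- complex logic goes here; the end product is a single line of text
    --
    pipe row ( i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id );
  end loop;
  return;
end;
/

spool c:\tmp\demo.dat
select * from t_csv();
spool off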

So next time you’re cranking out some cumbersome UTL_FILE code to write a file, take a moment to see if the scheduler can look after some of the work for you. I’ll talk about (not) using UTL_FILE to read files in a future post.

When WHEN went faster

Yeah…try saying that blog post title 10 times in a row as fast as you can!

But since we’re talking about doing things fast, this is just a quick post about a conversation I had on Twitter yesterday about the WHEN clause in a trigger.

 

The question posed was whether the WHEN clause on a trigger gives a performance benefit over doing the equivalent test with an IF statement inside the trigger body. That is an easy benchmark to whip up – I just need a couple of tables, each with a simple trigger differing only in their usage of the WHEN clause.  Here is my setup:


SQL> create table t1 ( x int );

Table created.

SQL> create table t2 ( x int );

Table created.

SQL> create or replace
  2  trigger TRG1
  3  before insert on t1
  4  for each row
  5  begin
  6    if sys_context('USERENV','SID') = 0 then
  7       :new.x := 0;
  8    end if;
  9  end;
 10  /

Trigger created.

SQL> create or replace
  2  trigger TRG2
  3  before insert on t2
  4  for each row
  5  when ( sys_context('USERENV','SID') = 0 )
  6  begin
  7     :new.x := 0;
  8  end;
  9  /

Trigger created.

I’m using an always-false condition, so the body of each trigger will never fire – the only cost incurred is evaluating the WHEN clause or the IF-test.  Now I’m ready to slam a bunch of inserts into each table and measure the performance.


SQL> set timing on
SQL> insert into t1
  2  select 1 from dual
  3  connect by level <= 1000000;

1000000 rows created.

Elapsed: 00:00:02.52

SQL> insert into t2
  2  select 1 from dual
  3  connect by level <= 1000000;

1000000 rows created.

Elapsed: 00:00:00.41

There is an immediate urge here to go screaming from the rooftops that WHEN is the all-conquering hero of this benchmark. And yes, it is indeed faster, but make sure you keep the results in perspective.  We gained back about 2 seconds for 1 million inserts. For me, once I’m inserting millions of rows into a table, I’m starting to think not about trigger performance but about whether this is the kind of table I want to be having triggers on at all.

Juicing up UTL_FILE

Think about your rubbish bin for a second. Because clearly this is going to be an oh-so-obvious metaphor leading into UTL_FILE, right?  OK, maybe a little explanation is needed. I have a basket next to my desk into which I throw any waste paper. It is where I throw my stupid ideas and broken dreams.


Melancholy aside, once it is full I’ll take it out and empty it into the main recycling bin that is collected once a fortnight from my street front.


Metaphor make sense now? Still No? OK, let me bring it around to the database. When you make changes to data in the database, most of the time these changes are only made to data blocks in memory. Obviously we have to keep a permanent record of the changes in the redo log to ensure that database recovery is possible, but the changed data blocks themselves are only dumped out to disk on an intermittent basis. That way, the process of changing data in the database is nice and efficient.

That is similar to what I do with the waste paper basket. I don’t take each piece of paper out to the street-front recycling bin; I wait until the waste paper basket is full and then just do the one trip. It is more efficient. (This is hopefully your light bulb moment on my metaphor skills.)

So back to UTL_FILE. You can take the same approach when you need to unload some data using UTL_FILE. There is an inherent overhead with every call you make to write out a line of output to a flat file using UTL_FILE. Hence, if you can minimize the number of calls you make, you’ll see some benefits.

Here’s an example of that in action. I’m building a simple CSV file based on a table called T which contains approximately 40 million rows.  Here is my first cut at the solution which I’d wager most people have written in their IT lives. We simply loop through each record and write it out to the file:


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    s timestamp;
  5  begin
  6    f := utl_file.fopen('TMP','demo.dat','w');
  7    s := systimestamp;
  8    for i in ( select * from t )
  9    loop
 10      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 11      utl_file.put_line(f,line);
 12    end loop;
 13    dbms_output.put_line(systimestamp-s);
 14    utl_file.fclose_all;
 15  end;
 16  /
+000000000 00:04:07.368000000

PL/SQL procedure successfully completed.

I’d contend that four minutes for 40 million records is pretty good, but maybe we want to go faster.  Based on my metaphor, it might not be immediately apparent how I could make fewer UTL_FILE calls. Surely I need to call it once per line of data written? But don’t forget that a “line” in a file is merely the data terminated with an end-of-line pattern specific to your platform. It could be ASCII 10 or an ASCII 13,10 pair, but either way, it is really just an extension of the data you need to write to the file. A “line” is merely our interpretation of that data, where that pattern means move to the next line.

So rather than call UTL_FILE for each line, I’ll build up multiple lines and then write them out with a single UTL_FILE.PUT_LINE call. I’m choosing a cap of around 300 lines to ensure that my concatenation does not exceed the maximum size of a VARCHAR2, but as a result I should cut down on the number of UTL_FILE calls by a factor of 300.


SQL> set serverout on
SQL> declare
  2    f utl_file.file_type;
  3    line varchar2(255);
  4    big_line varchar2(32767);
  5    s timestamp;
  6    cnt pls_integer := 0;
  7  begin
  8    f := utl_file.fopen('TMP','demo2.dat','w',32767);
  9    s := systimestamp;
 10    for i in ( select * from t )
 11    loop
 12      line := i.object_id||','||i.object_name||','||i.object_type||','||i.data_object_id;
 13      cnt := cnt + 1;
 14      if cnt < 300 then
 15         big_line := big_line || line || chr(10);
 16      else
 17         utl_file.put_line(f,big_line||line);
 18         cnt := 0;
 19         big_line := null;
 20      end if;
 21    end loop;
 22    utl_file.put_line(f,big_line);
 23    dbms_output.put_line(systimestamp-s);
 24    utl_file.fclose_all;
 25  end;
 26  /
+000000000 00:02:23.297000000

PL/SQL procedure successfully completed.

That’s pretty cool. We reclaimed around 100 seconds of elapsed time just by reducing the number of UTL_FILE.PUT_LINE calls. In this particular instance that’s about 40%, but obviously your mileage may vary based on a myriad of factors – so make sure you do your own benchmarking on your own systems.

As you can see, with just a couple of lines of extra code, we can really make UTL_FILE sing from a performance perspective. Having said that, in an upcoming blog post I’ll make a bold assertion – that you probably don’t need to use UTL_FILE ever again!  Stay tuned for that one.

“Call me!” Many many times!

Some readers might recall that classic Blondie track “Call me”.  Of course, some readers might be wishing that I wouldn’t harp on about great songs from the ’80s. But bear with me, there is a (very tenuous) link to this post. If you haven’t heard the song, you can jump to the chorus right here.  Go on, I’ll wait until you get back.

This golden oldie is relevant when it comes to dealing with object types in PL/SQL, and in particular, when you are using them in a SQL-related context.  To set the scene, I’ll start with a trivial example – creating a simple object, utilizing that object in a function, and then using a simple SELECT-FROM-DUAL on that function to check that it works.


SQL> create or replace
  2  type three_values as object
  3   ( val_1 int,
  4     val_2 int,
  5     val_3 int
  6   );
  7  /

Type created.

SQL>
SQL> create or replace
  2  function f return three_values is
  3  begin
  4     return three_values(1,2,3);
  5  end;
  6  /

Function created.

SQL>
SQL> select f from dual;

F(VAL_1, VAL_2, VAL_3)
-----------------------------------------
THREE_VALUES(1, 2, 3)

So far so good. But as is often the case, the requirement is not to get the object as an entity in its own right, but to get at the scalar values within the object. Hence the wrapping of that query in an inline view, with an outer query to extract the individual columns:


SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

But let’s explore that a little more carefully. Observation of the SQL text might lead us to think that

  • we ran the inner query,
  • saved the result we’ve already seen into a temporary result of type THREE_VALUES,
  • then extracted the elements via the outer SELECT.

But now I’ll make a small amendment to the function as follows. I’ll introduce a package variable so we can do some tracking of executions.


SQL> create or replace package pkg as
  2    x int := 0;
  3  end;
  4  /

Package created.

SQL>
SQL> create or replace
  2  function f return three_values is
  3  begin
  4     pkg.x:= pkg.x+1;
  5     return three_values(1,2,3);
  6  end;
  7  /

Function created.

As you would expect, a simple SELECT-FROM-DUAL results in the function being executed once.


SQL> select f from dual;

F(VAL_1, VAL_2, VAL_3)
-----------------------------------------------------
THREE_VALUES(1, 2, 3)

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

As before, I’ll now wrap that query in another SELECT to extract the column elements. I’ll reset my package variable to start the count again from zero.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=3

PL/SQL procedure successfully completed.

Notice the difference. The function was called three times. If that function was doing some “heavy lifting” then those additional executions might be a cause for concern. The number of calls to a function referenced from SQL has always been indeterminate, so this is nothing new, but it still might catch you by surprise. References throughout the SQL to those function scalar values can bump the count up even more:


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x
  4  where x.c.val_1 = 1 and x.c.val_2 = 2;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=5

But conversely, you cannot simply assume that every reference will result in an additional execution. For example, adding an ORDER BY clause containing references does not increment the count.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select f c from dual ) x
  4  where x.c.val_1 = 1 and x.c.val_1+10 = 11
  5  order by x.c.val_1, x.c.val_2, x.c.val_3;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> set serverout on
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=5

PL/SQL procedure successfully completed.

A 10053 trace on any of these queries reveals the reason why we can expect to see multiple executions. When I parse my initial query


select x.c.val_1, x.c.val_2, x.c.val_3
from
(select f c from dual ) x

then scrolling through the 10053 trace, I’ll end up with this:


Final query after transformations:******* UNPARSED QUERY IS *******
SELECT SYS_OP_ATG("SCOTT"."F"(),1,2,2) "C.VAL_1",SYS_OP_ATG("SCOTT"."F"(),2,3,2) "C.VAL_2",SYS_OP_ATG("SCOTT"."F"(),3,4,2) "C.VAL_3" 
FROM "SYS"."DUAL" "DUAL"

You can see that the query has been transformed to have explicit calls of the function for each column we requested.

Can the multiple calls be avoided? Yes. There are a couple of simple options to explore here. The first is to use a scalar subquery, taking advantage of the query-level caching the database performs to avoid repeated executions of query sub-components.  Here are the previous examples repeated, but with a scalar subselect to access the function.


SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select ( select f from dual ) c
  4   from dual ) x;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL>
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL>
SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select x.c.val_1, x.c.val_2, x.c.val_3
  2  from
  3  (select ( select f from dual ) c
  4   from dual ) x
  5  where x.c.val_1 = 1 and x.c.val_2 = 2;

   C.VAL_1    C.VAL_2    C.VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL>
SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

Alternatively, since we are ultimately treating the object as a source of rows and columns, we can reflect this by adding an additional nested table type and returning that from the function:


SQL> create or replace
  2  type three_val_list as
  3   table of three_values
  4  /

Type created.

SQL>
SQL> create or replace
  2  function f return three_val_list is
  3  begin
  4     pkg.x:= pkg.x+1;
  5     return three_val_list(three_values(1,2,3));
  6  end;
  7  /

Function created.

SQL> exec pkg.x := 0;

PL/SQL procedure successfully completed.

SQL> select * from f();

     VAL_1      VAL_2      VAL_3
---------- ---------- ----------
         1          2          3

1 row selected.

SQL> exec dbms_output.put_line('calls='||pkg.x);
calls=1

PL/SQL procedure successfully completed.

Before you get carried away and start screaming “Performance Bug!” from the rooftops, the moral of the story here is probably first to ask yourself: is there a valid reason for referencing my OBJECT data types via SQL? Very often I see the instantiation of a variable (complex or scalar) being done with a SELECT-FROM-DUAL. Don’t do it – it is almost never needed.
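
For example, instantiating a variable needs no trip to the SQL engine at all. A quick sketch using the THREE_VALUES type from above:


declare
  v three_values;
  d date;
begin
  -- commonly seen, but an unnecessary context switch to the SQL engine
  select three_values(1,2,3) into v from dual;
  select sysdate into d from dual;

  -- a direct PL/SQL assignment does the same job
  v := three_values(1,2,3);
  d := sysdate;
end;
/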

But, if you have truly embraced the Object-Relational features and are genuinely gathering object types from SQL statements, then consider some of the workarounds above to avoid excessive function calls.

Concurrency … the path to success and the path to failure

Let’s face it. Concurrency is a good thing when it comes to database applications. After all, if there is only a single user of your application, then chances are it is not a successful application.  Of course there are exceptions to this rule, but by and large, most of the applications we build are going to be used by large populations of users.  And given the recent publicity around users’ data and privacy, we can also be pretty confident that we want the data in our applications to be correct at all times.

As a developer, it is absolutely critical to keep concurrency in mind when building applications.  Here is a simple demonstration to drive home the point.

The task assigned to the developer here is simple – move all of the rows satisfying a particular condition from table T1 to table T2.  Let’s create our database objects first:


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

So table T1 has 10,000 rows and T2 is empty, and our condition to determine which rows to transfer will be X being non-zero.  That happens to be all of the rows, but of course, it could be any filtering rule on the existing data. I’ve chosen all of the rows so I can keep the numbers easy to digest in the upcoming examples.

There is no such thing as a MOVE command in SQL, so we can implement this with two statements: 

  • INSERT-SELECT to copy the rows from T1 to T2
  • DELETE to remove the rows we just copied.

Here is the simple PL/SQL routine to do that, and a first test run of the code.


SQL>
SQL> set serverout on
SQL> begin
  2      insert into t2 select * from t1 where x != 0;
  3      delete from t1 where x != 0;
  4      commit;
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

SQL>

Everything looks fine, doesn’t it? But it’s not.  In fact, I’ll do what any good developer should be doing, and add some instrumentation to the code to get more information about its operations.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> set serverout on
SQL> begin
  2      insert into t2 select * from t1 where x != 0;
  3      dbms_output.put_line(sql%rowcount);           
  4      delete from t1 where x != 0;
  5      dbms_output.put_line(sql%rowcount);
  6      commit;
  7  end;
  8  /
10000
10100      <=========== !!!!!

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

SQL>
SQL>

Look at the output from the PL/SQL block.  I copied (inserted) 10,000 rows from T1 to T2, but the deletion phase removed more than 10,000 rows. How was that possible?  It is because databases are about concurrency.  In another session, I had a simple script running that just adds new rows to the table continuously.  As a developer, it is vital that you do not assume that you are the only person accessing or manipulating a table.  My code in its existing form has corrupted the database, because 100 rows have been deleted that were never copied to T2.  They’re gone forever!
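
If you want to reproduce this yourself, the concurrent session does not need to be doing anything fancy. A loop of single-row inserts along these lines (my sketch – any steady trickle of inserts will do) is enough to demonstrate the problem:


begin
  for i in 1 .. 1000000 loop
    insert into t1 values (i, i, i);
    commit;
  end loop;
end;
/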

So… on to some potential solutions.  Let’s try locking the rows first.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> set serverout on
SQL> declare
  2    cursor c is select * from t1 where x != 0 FOR UPDATE;
  3  begin
  4    open c;
  5    insert into t2 select * from t1 where x != 0;
  6    dbms_output.put_line(sql%rowcount);               
  7    delete from t1 where x != 0;
  8    dbms_output.put_line(sql%rowcount);
  9    commit;
 10    close c;
 11  end;
 12  /
10000
10100

PL/SQL procedure successfully completed.

SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
         0

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

As you can see, that has not yielded any benefit, because locking only applies to existing rows.  The new rows arriving from a separate session are unhindered, and hence my DELETE statement still picks up the new rows when it should not. I could work around this by locking the entire table, but what if my filtering condition matched only a small percentage of the rows?  It seems overly aggressive to lock everyone out of the table just to copy a small subset of rows.  In this case, I can use PL/SQL to help me out. Using the BULK COLLECT / BULK BIND facilities, I can restrict my DML operations to just the ROWIDs of the rows I am copying.


SQL> create table t1 ( x int, y int, z int );

Table created.

SQL> create table t2 as select * from t1 where 1=0;

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rownum, rownum
  3  from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> set serverout on
SQL> declare
  2    cursor c is select rowid rid, t1.* from t1 where x != 0 for update;
  3    type array is table of c%rowtype index by pls_integer;
  4    l_rows array;
  5    l_idx int;
  6  begin
  7    open c;
  8    loop
  9      fetch c bulk collect into l_rows limit 20000;
 10      exit when l_rows.count = 0;
 11
 12      forall i in 1 .. l_rows.count
 13         insert into t2 values (
 14            l_rows(i).x,
 15            l_rows(i).y,
 16            l_rows(i).z );
 17      dbms_output.put_line(sql%rowcount);       
 18      forall i in 1 .. l_rows.count
 19        delete from t1
 20        where rowid = l_rows(i).rid;
 21      dbms_output.put_line(sql%rowcount);
 22    end loop;
 23  end;
 24  /
10000
10000

PL/SQL procedure successfully completed.

SQL>
SQL>
SQL> select count(*) from t1;

  COUNT(*)
----------
       147

1 row selected.

SQL> select count(*) from t2;

  COUNT(*)
----------
     10000

1 row selected.

I am collecting the 10,000 ROWIDs of interest into an array, and then deleting only those rows.  So even though new rows are arriving, I will never see them because they will not be in the set of ROWIDs that I am moving to table T2.  We can see that during the operation, 147 new rows arrived in T1 from other sessions.

(Note: normally you would choose a bulk collect size of perhaps 1000 or similar, and loop around repeatedly to avoid excessive PGA use.  I bulk collected the entire set to show the rowcounts more intuitively.)

You might be thinking that perhaps we no longer need the “FOR UPDATE” clause on the cursor, but it is still a required element.  Without it, it is possible that a different session could update a row from the candidate data set, and that updated row would still be deleted.  Or a different session could even delete, commit, and insert a fresh row that happens to pick up the same ROWID as one from the original set.

Bottom line – keep concurrency first and foremost in your mind when you are writing code to manipulate data.

NULL’s vs NOT NULL’s and Performance

When it comes to giving the cost based optimizer the best possible chance to make the “right” decisions, many DBAs are diligent in keeping statistics up to date, using histograms where appropriate, and creating more indexes (or removing surplus ones).

However, one often neglected area is that the null-ness of columns also impacts optimizer decisions. NULL and NOT NULL do more than just act as constraints; they also add to (or detract from) the value of indexes on those columns. Here’s an example of how the null-ness of a column impacts optimizer decisions. I have a table T which is a copy of DBA_OBJECTS, indexed on OBJECT_ID.


SQL> create table t as
  2  select * from dba_objects
  3  where object_id is not null;

Table created.

SQL>
SQL> create index IX on T ( object_id );

Index created.

Now I want to perform a standard pagination-style query, namely, get the first 5 rows in order of OBJECT_ID:


SQL>
SQL> set autotrace traceonly
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3299198703

----------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     5 |  2405 |       |  2755   (1)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |       |            |          |
|   2 |   VIEW                  |      | 78750 |    36M|       |  2755   (1)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      | 78750 |     9M|    14M|  2755   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | T    | 78750 |     9M|       |   428   (1)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)
   3 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         52  recursive calls
        130  db block gets
       1591  consistent gets
          0  physical reads
      25420  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          5  rows processed

I scanned the entire table and burned around 1500 consistent gets. Now I’ll let the database know what I already know, namely that the OBJECT_ID column contains no nulls. Then I’ll repeat the same query.


SQL>
SQL> alter table T modify object_id not null;

Table altered.

SQL>
SQL> select *
  2  from
  3    ( select * from t
  4      order by object_id
  5    )
  6  where rownum <= 5;

5 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3114946973

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     5 |  2405 |     3   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                |      |       |       |            |          |
|   2 |   VIEW                        |      |     5 |  2405 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    | 78750 |     9M|     3   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | IX   |     5 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=5)


Statistics
----------------------------------------------------------
         85  recursive calls
        132  db block gets
         72  consistent gets
          1  physical reads
      27192  redo size
       2735  bytes sent via SQL*Net to client
        607  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          5  rows processed

Woo hoo! 20 times more efficient. The extra information we provided to the database allowed for more execution paths to be considered by the optimizer.

That’s all well and good. But how do we know which columns might be missing an appropriate NOT NULL constraint?

Well, the following routine might help.

The PL/SQL procedure below deduces a “number of rows that are null” count for every indexed column on every table in the schema (passed as parameter P_SCHEMA), although you can pass a table name to restrict the procedure to just that table. For any column that contains no nulls, the appropriate “alter table … modify ( column_name not null)” command will be output. (Make sure you set serveroutput on.)

Notes:

  • The client for whom I wrote this script had no NOT NULL constraints on any table, so the procedure only takes a rough stab at ALL_CONSTRAINTS for existing constraints.
  • It tries to keep things sensible – avoiding complex data types, tables that are auto-generated or used for queues etc. – but there’s no guarantee it won’t either miss a table or pick up one it should not.
  • This is a brute force approach – it works by scanning every table in the schema, so use your discretion as to when would be a suitable time for running this routine.  But it will only scan each table once to determine the null count for all candidate columns.
  • As with any diagnosis script, you should not apply its recommendations without some careful thought first.

12.2 version


create or replace 
procedure check_indexed_columns_for_null(
                 p_schema varchar2, 
                 p_table_name varchar2 default null) is
  cursor x is
   select 
     table_name,
     column_name,
     count(*) over ( partition by table_name ) as colcount
     from
     (
       select 
         table_name,
         column_name,
         min(existing_constraint)
       from 
       (
       select  
           a.table_name, 
           a.column_name, 
           ( select  count(*)
             from    all_constraints x,
                     all_cons_columns cc
             where   x.owner = c.owner
             and     x.table_name = c.table_name
             and     cc.owner      = x.owner
             and     cc.constraint_name = x.constraint_name
             and     
            (
               ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                   or 
               ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
             )
            ) existing_constraint
       from 
         all_ind_columns a,
         all_tables c,
         all_tab_columns ccol
       where a.index_owner = p_schema
       and a.table_name = nvl(upper(p_table_name),a.table_name)
       and c.table_name = a.table_name
       and c.owner      = a.table_owner
       and c.owner      = ccol.owner
       and c.table_name = ccol.table_name
       and a.column_name = ccol.column_name
       and c.secondary = 'N'
       and c.temporary = 'N'
       and c.nested    = 'NO'
       and c.external  = 'NO'
       and ccol.data_type_owner is null
       and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
       and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = p_schema )
       )
       group by 
         table_name,
         column_name
       having min(existing_constraint) = 0
     );

  str0 varchar2(32767); 
  str1 varchar2(32767); 
  str2 varchar2(32767); 
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
begin
  -- dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin
for i in x loop
  if prev != i.table_name then
    str0 := 'declare ';
    str1 := 'begin select '; str2 := ' into ';
    str3 := ' '; cnt := 1;
  end if;
  if cnt = i.colcount then 
    trailer := ' ';
  else
    trailer := ','||chr(10);
  end if;
  str0 := str0 || 'v'||ltrim(cnt)||' number;';
  str1 := str1 || 'sum(decode('||i.column_name||',null,1,0))'||trailer;
  str2 := str2 || 'v'||ltrim(cnt)||trailer;
  str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
    'dbms_output.put_line(''alter table '||p_schema||'.'||i.table_name||
    ' modify ('||i.column_name||' not null);''); end if;'||chr(10);
  if cnt = i.colcount then
    str2 := str2 ||' from '||p_schema||'.'||i.table_name||';';
    str3 := str3 ||' end;';
    do_sql(str0||' '||str1||' '||str2||' '||str3);
  end if;
  prev := i.table_name;
  cnt := cnt + 1;
end loop;
end;
/
sho err

set serverout on
exec check_indexed_columns_for_null('SCOTT')

11.2 version

(Needs a direct CREATE TABLE grant to the owner of the procedure)


create or replace
procedure check_indexed_columns_for_null(
                 p_schema varchar2,
                 p_table_name varchar2 default null) is
  cursor c_template is
    select table_name, column_name, 0 colcount
    from   all_tab_columns;

  type rowlist is table of c_template%rowtype;  
  r    rowlist;
  rc   sys_refcursor;

  str0 varchar2(32767);
  str1 varchar2(32767);
  str2 varchar2(32767);
  str3 varchar2(32767);

  prev varchar2(100) := '*';
  cnt number;
  trailer varchar2(5);

procedure do_sql(thesql varchar2) is
begin
  -- dbms_output.put_line(thesql);
  execute immediate thesql;
end;

begin

  begin
    execute immediate 'drop table tmp$cons purge';
  exception when others then null;
  end;

  execute immediate 'create table tmp$cons as  select owner, table_name, constraint_name, constraint_type, to_lob(search_condition) search_condition_vc '||
                    'from all_constraints';

  open rc for
    q'{select
      table_name,
      column_name,
      count(*) over ( partition by table_name ) as colcount
      from
      (
        select
          table_name,
          column_name,
          min(existing_constraint)
        from
        (
        select
            a.table_name,
            a.column_name,
            ( select  count(*)
              from    ( select owner, table_name, constraint_name, constraint_type, cast(search_condition_vc as varchar2(4000)) search_condition_vc 
                        from tmp$cons ) x,
                      all_cons_columns cc
              where   x.owner = c.owner
              and     x.table_name = c.table_name
              and     cc.owner      = x.owner
              and     cc.constraint_name = x.constraint_name
              and
             (
                ( x.constraint_type = 'C'  and replace(search_condition_vc,'"') = a.column_name||' IS NOT NULL' )
                    or
                ( x.constraint_type = 'P'  and cc.column_name = a.column_name )
              )
             ) existing_constraint
        from
          all_ind_columns a,
          all_tables c,
          all_tab_columns ccol
        where a.index_owner = :p_schema
        and a.index_owner = :p_schema
        and a.table_name = nvl(upper(:p_table_name),a.table_name)
        and c.table_name = a.table_name
        and c.owner      = a.table_owner
        and c.owner      = ccol.owner
        and c.table_name = ccol.table_name
        and a.column_name = ccol.column_name
        and c.secondary = 'N'
        and c.temporary = 'N'
        and c.nested    = 'NO'
        and (c.owner,c.table_name) not in ( select owner, table_name from all_external_tables where owner = :p_schema )
        and ccol.data_type_owner is null
        and ccol.data_type not in ('LONG','LONG RAW','CLOB','UROWID','UNDEFINED','NCLOB','BLOB','BFILE','ROWID')
        and (c.owner,c.table_name) not in ( select owner, queue_table from all_queue_tables where owner = :p_schema )
        )
        group by
          table_name,
          column_name
        having min(existing_constraint) = 0
      )
      }' using p_schema,p_schema,p_table_name,p_schema,p_schema;

  fetch rc bulk collect into r;
  close rc;
  
  for i in 1 .. r.count loop
    if prev != r(i).table_name then
      str0 := 'declare ';
      str1 := 'begin select '; str2 := ' into ';
      str3 := ' '; cnt := 1;
    end if;
    if cnt = r(i).colcount then
      trailer := ' ';
    else
      trailer := ','||chr(10);
    end if;
    str0 := str0 || 'v'||ltrim(cnt)||' number;';
    str1 := str1 || 'sum(decode('||r(i).column_name||',null,1,0))'||trailer;
    str2 := str2 || 'v'||ltrim(cnt)||trailer;
    str3 := str3 || 'if v'||ltrim(cnt)||' = 0 then '||
      'dbms_output.put_line(''alter table '||p_schema||'.'||r(i).table_name||
      ' modify ('||r(i).column_name||' not null);''); end if;'||chr(10);
    if cnt = r(i).colcount then
      str2 := str2 ||' from '||p_schema||'.'||r(i).table_name||';';
      str3 := str3 ||' end;';
      do_sql(str0||' '||str1||' '||str2||' '||str3);
    end if;
    prev := r(i).table_name;
    cnt := cnt + 1;
  end loop;
end;
/
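
Usage is the same as for the 12.2 version:


set serverout on
exec check_indexed_columns_for_null('SCOTT')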

 

If you want to see the underlying table scan queries that are being run, simply comment back in the “dbms_output.put_line” in the DO_SQL subroutine.

Enjoy!

Dealing with IP addresses

Kris Rice made mention on Twitter about the free (but limited lifetime) IP address to Country mappings that you can get from Maxmind.


If you ever need to deal with IP addresses, you might find the following utility package I wrote a while back useful.  It lets you convert from the string representation of an IP address to its numeric equivalent, and vice versa.  It handles IPv4 and IPv6, with one caveat being that I didn’t bother with the collapsed zeros for IPv6 so I could keep the performance snappy.  Free for your use without warranty or responsibility.


SQL> create or replace
  2  package ip_util is
  3
  4  function ip_num_from_str(p_ip_str varchar2) return number deterministic;
  5  function ip_str_from_num(p_ipnum number) return varchar2 deterministic;
  6
  7  end;
  8  /

Package created.

SQL>
SQL>
SQL> create or replace
  2  package body ip_util is
  3
  4    --
  5    -- constants need to be fixed, not expressions if you want to avoid ora-4068
  6    --
  7    l_ip41 constant number(12)  := 256;        -- power(256,1);
  8    l_ip42 constant number(12)  := 65536;      -- power(256,2);
  9    l_ip43 constant number(12)  := 16777216;   -- power(256,3);
 10    l_ip44 constant number(12)  := 4294967296; -- power(256,4);
 11
 12    l_ip61 constant number(38)  := 65536;                              --power(65536,1);
 13    l_ip62 constant number(38)  := 4294967296;                         --power(65536,2);
 14    l_ip63 constant number(38)  := 281474976710656;                    --power(65536,3);
 15    l_ip64 constant number(38)  := 18446744073709551616;               --power(65536,4);
 16    l_ip65 constant number(38)  := 1208925819614629174706176;          --power(65536,5);
 17    l_ip66 constant number(38)  := 79228162514264337593543950336;      --power(65536,6);
 18    l_ip67 constant number(38)  := 5192296858534827628530496329220096; --power(65536,7);
 19
 20
 21  function ip_num_from_str(p_ip_str varchar2) return number deterministic is
 22    l_ip_num     number;
 23    l_dot1       pls_integer;
 24    l_dot2       pls_integer;
 25    l_dot3       pls_integer;
 26    l_dot4       pls_integer;
 27
 28    l_colon      pls_integer;
 29    l_colon_cnt  pls_integer;
 30    l_hex        varchar2(32);
 31    l_ip_str     varchar2(64);
 32  begin
 33    if p_ip_str like '%.%' then
 34      l_dot1   := instr(p_ip_str,'.');
 35      l_dot2   := instr(p_ip_str,'.',l_dot1+1);
 36      l_dot3   := instr(p_ip_str,'.',l_dot2+1);
 37      l_dot4   := instr(p_ip_str,'.',l_dot3+1);
 38      if l_dot4 > 0 then
 39         raise_application_error(-20000,'Cannot be resolved to an IP4 address');
 40      end if;
 41
 42      l_ip_num :=  l_ip43*to_number(substr(p_ip_str,1,l_dot1-1)) +
 43                   l_ip42*to_number(substr(p_ip_str,l_dot1+1,l_dot2-l_dot1-1)) +
 44                   l_ip41*to_number(substr(p_ip_str,l_dot2+1,l_dot3-l_dot2-1)) +
 45                   to_number(substr(p_ip_str,l_dot3+1));
 46
 47    elsif p_ip_str like '%:%' then
 48      --
 49      -- Note: The abbreviation of "Consecutive sections of zeroes are replaced with a double colon (::)" is not implemented.
 50      --
 51      l_colon_cnt := length(p_ip_str)-length(replace(p_ip_str,':'));
 52      if l_colon_cnt != 7 then
 53         raise_application_error(-20000,'Cannot be resolved to an IP6 address');
 54      end if;
 55
 56      l_ip_str := p_ip_str||':';
 57      loop
 58        l_colon := instr(l_ip_str,':');
 59        l_hex := l_hex || lpad(substr(l_ip_str,1,l_colon-1),4,'0');
 60        l_ip_str := substr(l_ip_str,l_colon+1);
 61        exit when l_ip_str is null;
 62      end loop;
 63      l_ip_num := to_number(l_hex,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx');
 64    end if;
 65
 66    return l_ip_num;
 67  end;
 68
 69
 70  function ip_str_from_num(p_ipnum number) return varchar2 deterministic is
 71  begin
 72    if p_ipnum < l_ip44 then
 73      return  mod(trunc(p_ipnum/l_ip43),l_ip41) ||'.'||
 74              mod(trunc(p_ipnum/l_ip42),l_ip41) ||'.'||
 75              mod(trunc(p_ipnum/l_ip41),l_ip41) ||'.'||
 76              mod(p_ipnum,l_ip41);
 77    else
 78      --
 79      -- Note: The abbreviation of "Consecutive sections of zeroes are replaced with a double colon (::)" is not implemented.
 80      --
 81      return  to_char(mod(trunc(p_ipnum/l_ip67),l_ip61),'fmxxxx') ||':'||
 82              to_char(mod(trunc(p_ipnum/l_ip66),l_ip61),'fmxxxx') ||':'||
 83              to_char(mod(trunc(p_ipnum/l_ip65),l_ip61),'fmxxxx') ||':'||
 84              to_char(mod(trunc(p_ipnum/l_ip64),l_ip61),'fmxxxx') ||':'||
 85              to_char(mod(trunc(p_ipnum/l_ip63),l_ip61),'fmxxxx') ||':'||
 86              to_char(mod(trunc(p_ipnum/l_ip62),l_ip61),'fmxxxx') ||':'||
 87              to_char(mod(trunc(p_ipnum/l_ip61),l_ip61),'fmxxxx') ||':'||
 88              to_char(mod(p_ipnum,l_ip61),'fmxxxx');
 89    end if;
 90  end;
 91
 92  end;
 93  /

Package body created.

SQL> select ip_util.ip_num_from_str('192.168.1.2') from dual;

IP_UTIL.IP_NUM_FROM_STR('192.168.1.2')
--------------------------------------
                            3232235778

SQL> select ip_util.ip_str_from_num(3232235778) from dual;

IP_UTIL.IP_STR_FROM_NUM(3232235778)
-----------------------------------------------------------------------------------------------------------------
192.168.1.2

SQL> select ip_util.ip_num_from_str('2001:db8:0:0:0:ff00:42:8329') ip from dual;

                                          IP
--------------------------------------------
      42540766411282592856904265327123268393

SQL> select ip_util.ip_str_from_num(42540766411282592856904265327123268393) from dual;

IP_UTIL.IP_STR_FROM_NUM(42540766411282592856904265327123268393)
-----------------------------------------------------------------------------------------------------------------
2001:db8:0:0:0:ff00:42:8329
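
If you do need to handle the collapsed-zeros abbreviation, one option is to expand the address before passing it in. Here is a quick sketch of such a helper – my addition, not part of the package above – assuming at most one “::” per address, as the notation requires:


create or replace
function ip6_expand(p_ip varchar2) return varchar2 is
  l_groups pls_integer := regexp_count(p_ip, '[^:]+');  -- hex groups present
  l_fill   varchar2(20);
begin
  if instr(p_ip, '::') = 0 then
    return p_ip;  -- already in full 8-group form
  end if;
  --
  -- build ":0:0:...:" with one zero group for each missing group,
  -- then trim any stray leading/trailing colon (e.g. for "::1")
  --
  l_fill := ':' || rpad('0:', (8 - l_groups) * 2, '0:');
  return trim(':' from replace(p_ip, '::', l_fill));
end;
/

select ip_util.ip_num_from_str(ip6_expand('2001:db8::ff00:42:8329')) from dual;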