Distributing data via a CSV file used to be commonplace, but now that databases are the norm, we often see the ubiquitous INSERT script offered as a means to seed data. This is perfectly fine when you are populating a finite list of reference data that an application needs in order to run for the first time: lists of valid genders, valid states or counties, valid post/zip codes, and the like. Such data is typically sourced from an owning authority, changes infrequently over time, and even when it does change, it is usually sufficient to make a manual correction to your database.
INSERT scripts everywhere
The problem is, when you have a hammer, you generally tend to treat everything as a nail, and thus, when the need arises to populate tables with thousands, or hundreds of thousands, or millions(!) of rows, the trusty INSERT script still tends to rear its head.
I’ve posted previously about options for improving the speed of such scripts by reducing parsing overhead and latency, but no matter what approach you are using, a common question that comes up when these scripts are well into their third hour of running is:
“How much has it done? How much longer will it take?”
Hopefully you are spooling the output, or at least writing some regular checkpoint activity to a log file, because… well… that’s called being a good I.T. citizen! But if not, are there any ways from within the database to see what is going on? To see what is available, let’s start with a simple INSERT script to populate a CUSTOMERS table:
SQL> create table customers
2 ( cust_id int,
3 first_name varchar2(50),
4 last_name varchar2(50)
5 );
Table created.
SQL>
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (101,'Neena','Kochhar');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (102,'Lex','De Haan');
1 row created.
...
...
...
SQL> insert into customers (cust_id,first_name,last_name) values (198,'Donald','OConnell');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
This is 100 customers, each loaded with a single-row insert. Let’s assume there were 1 million of these and the script is currently busily adding them to our table in a single transaction. Can we monitor progress?
Monitoring a transaction
Anything inside the Oracle database that is done within the context of a transaction must be able to be rolled back, either manually or in the event of a database crash (eg power outage), and thus all transactions record undo information. For an explanation of how that works, see here.
We can take advantage of this to do some monitoring. I’ve run my customer insertion script above, so we have created 100 rows. The script has finished, but for the sake of discussion, we can assume it has completed the first 100 rows of a much larger set. The details for the current transaction can be seen from V$TRANSACTION.
SQL> desc V$TRANSACTION
 Name                          Null?    Type
 ----------------------------- -------- ---------------
 ADDR                                   RAW(8)
 XIDUSN                                 NUMBER
 XIDSLOT                                NUMBER
 XIDSQN                                 NUMBER
 UBAFIL                                 NUMBER
 UBABLK                                 NUMBER
 UBASQN                                 NUMBER
 UBAREC                                 NUMBER
 STATUS                                 VARCHAR2(16)
 START_TIME                             VARCHAR2(20)
 START_SCNB                             NUMBER
 START_SCNW                             NUMBER
 START_UEXT                             NUMBER
 START_UBAFIL                           NUMBER
 START_UBABLK                           NUMBER
 START_UBASQN                           NUMBER
 START_UBAREC                           NUMBER
 SES_ADDR                               RAW(8)
 FLAG                                   NUMBER
 SPACE                                  VARCHAR2(3)
 RECURSIVE                              VARCHAR2(3)
 NOUNDO                                 VARCHAR2(3)
 PTX                                    VARCHAR2(3)
 NAME                                   VARCHAR2(256)
 PRV_XIDUSN                             NUMBER
 PRV_XIDSLT                             NUMBER
 PRV_XIDSQN                             NUMBER
 PTX_XIDUSN                             NUMBER
 PTX_XIDSLT                             NUMBER
 PTX_XIDSQN                             NUMBER
 DSCN-B                                 NUMBER
 DSCN-W                                 NUMBER
 USED_UBLK                              NUMBER
 USED_UREC                              NUMBER
 LOG_IO                                 NUMBER
 PHY_IO                                 NUMBER
 CR_GET                                 NUMBER
 CR_CHANGE                              NUMBER
 START_DATE                             DATE
 DSCN_BASE                              NUMBER
 DSCN_WRAP                              NUMBER
 START_SCN                              NUMBER
 DEPENDENT_SCN                          NUMBER
 XID                                    RAW(8)
 PRV_XID                                RAW(8)
 PTX_XID                                RAW(8)
 CON_ID                                 NUMBER
SQL>
Notice the two USED_ prefixed columns. These indicate the amount of undo information recorded to date for this transaction: USED_UBLK is the number of undo blocks consumed, and USED_UREC the number of undo records. We have inserted 100 rows, so let’s see what is in the USED_UREC column.
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
100
And voila! I can see how many rows have been inserted so far by looking at the USED_UREC column.
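On a busy system there may be many concurrent transactions, so in practice you would tie the undo figures back to the session doing the load. Here is a minimal sketch of that join (V$SESSION.TADDR matches V$TRANSACTION.ADDR); the SCOTT account name is just a placeholder for whatever account is running the load.

select s.sid,
       s.username,
       t.start_time,
       t.used_ublk,
       t.used_urec
from   v$session s,
       v$transaction t
where  s.taddr = t.addr
and    s.username = 'SCOTT';   -- placeholder account; adjust to suit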
Not…so…fast
The undo information is there to undo all of the changes made by a transaction. What if my table had an index, or even multiple indexes? All of those changes would need to be undone as well, so undo information must be captured for them too. I’ll roll back the 100 rows and repeat the test with a single index on the table.
SQL> rollback;
Rollback complete.
SQL>
SQL> create index CUST_IX on CUSTOMERS ( cust_id );
Index created.
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
..
..
..
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
SQL>
SQL>
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
200
Now there are 200 undo records created: one for each row in the table and one for each entry in the index. Similarly, as I add more indexes, the amount of undo information increases.
SQL> rollback;
Rollback complete.
SQL>
SQL> create index CUST_IX2 on CUSTOMERS ( first_name );
Index created.
SQL> create index CUST_IX3 on CUSTOMERS ( last_name );
Index created.
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (101,'Neena','Kochhar');
1 row created.
...
...
...
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
SQL>
SQL>
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
400
SQL>
SQL>
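Given one undo record per row plus one per index entry, a rough progress figure is just a little division away. A sketch, assuming you are connected as the table owner and this is the only active transaction:

select t.used_urec /
       ( 1 + ( select count(*)
               from   user_indexes
               where  table_name = 'CUSTOMERS' ) ) as rows_inserted_so_far
from   v$transaction t;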
Whilst there is a relationship between insert progression and the amount of undo being captured, you need to take care interpreting the results in order to get a good estimate of progress. One undo entry per table row plus one per index entry is a reasonable starting point, but don’t forget that your table might have triggers, or LOBs, or a multitude of other facets that might perturb the numbers you’re seeing. A common technique I’ll employ if I want some confidence in the numbers is to create an equivalent scratch copy of the table I’m inserting into and throw in a couple of sample rows to see how many undo records they create. I can then use that as a baseline for tracking the progress of my true population script, as sketched below.
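A minimal sketch of that baseline approach, using a hypothetical CUSTOMERS_PROBE copy of the table (same shape, same indexes), and assuming this session has no other open transaction:

-- equivalent empty copy of the target table
create table customers_probe as
  select * from customers where 1=0;

create index cust_probe_ix1 on customers_probe ( cust_id );
create index cust_probe_ix2 on customers_probe ( first_name );
create index cust_probe_ix3 on customers_probe ( last_name );

-- a couple of sample rows to measure the undo cost
insert into customers_probe values (1,'Sample','One');
insert into customers_probe values (2,'Sample','Two');

-- undo records divided by rows inserted gives the per-row baseline
select used_urec / 2 as undo_records_per_row
from   v$transaction;

rollback;
drop table customers_probe purge;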
Avoid the performance cost
There is obviously a better way of avoiding the need to dig into V$TRANSACTION to see when a stream of single-row inserts will complete, and that is to not use single-row inserts for large data population at all. My other post has options for manipulating the inserts, but don’t forget that you can also explore options unrelated to INSERT scripts, such as external tables, Data Pump and the like, for getting reference data from a source and loading it efficiently into the database.
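For example, here is a minimal external table sketch, assuming a hypothetical customers.csv sitting in a folder the database can read via a directory object:

create directory data_dir as '/tmp';   -- hypothetical location of customers.csv

create table customers_ext
( cust_id     int,
  first_name  varchar2(50),
  last_name   varchar2(50)
)
organization external
( type oracle_loader
  default directory data_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
  )
  location ( 'customers.csv' )
);

-- the million rows then arrive via a single set-based statement
insert /*+ APPEND */ into customers
select cust_id, first_name, last_name
from   customers_ext;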
Want to try this yourself? Get the scripts from this post here.