Distributing data via a CSV file used to be commonplace, but now that databases are the norm, we often see the ubiquitous INSERT script offered as a means to seed data. This is perfectly fine when you are populating a finite list of reference data that an application needs in order to run for the first time: lists of valid genders, valid states or counties, valid post/zip codes, and the like. Such data is typically sourced from an owning authority, changes infrequently over time, and even when it does change, it is usually sufficient to make a manual correction to your database.
INSERT scripts everywhere
The problem is, when you have a hammer, you generally tend to treat everything as a nail, and thus, when the need arises to populate tables with thousands, or hundreds of thousands, or millions(!) of rows, the trusty INSERT script still tends to rear its head.
I’ve posted previously about options for improving the speed of such scripts by reducing parsing overhead and latency, but no matter what approach you are using, a common question that comes up when these scripts are well into their third hour of running is:
“How much has it done? How much longer will it take?”
Hopefully you are spooling the output, or at least writing some regular checkpoint activity to a log file, because… well… that’s called being a good I.T. citizen! But if not, are there any ways from within the database to see what is going on? To see what is available, let’s start with a simple INSERT script to populate a CUSTOMERS table:
SQL> create table customers
2 ( cust_id int,
3 first_name varchar2(50),
4 last_name varchar2(50)
5 );
Table created.
SQL>
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (101,'Neena','Kochhar');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (102,'Lex','De Haan');
1 row created.
...
...
...
SQL> insert into customers (cust_id,first_name,last_name) values (198,'Donald','OConnell');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
This is 100 customers, each loaded with a single-row insert. Let’s assume there were 1 million of these and the script is currently busily adding them to our table in a single transaction. Can we monitor progress?
Monitoring a transaction
Anything inside the Oracle database that is done within the context of a transaction must be able to be rolled back, either manually or in the event of a database crash (eg power outage), and thus all transactions record undo information. For an explanation of how that works, see here.
We can take advantage of this to do some monitoring. I’ve run my customer insertion script above, so we have created 100 rows. The script has finished, but for the sake of discussion, we can assume it has completed the first 100 rows of a much larger set. The details for the current transaction can be seen from V$TRANSACTION.
SQL> desc V$TRANSACTION
 Name                          Null?    Type
 ----------------------------- -------- ---------------
 ADDR                                   RAW(8)
 XIDUSN                                 NUMBER
 XIDSLOT                                NUMBER
 XIDSQN                                 NUMBER
 UBAFIL                                 NUMBER
 UBABLK                                 NUMBER
 UBASQN                                 NUMBER
 UBAREC                                 NUMBER
 STATUS                                 VARCHAR2(16)
 START_TIME                             VARCHAR2(20)
 START_SCNB                             NUMBER
 START_SCNW                             NUMBER
 START_UEXT                             NUMBER
 START_UBAFIL                           NUMBER
 START_UBABLK                           NUMBER
 START_UBASQN                           NUMBER
 START_UBAREC                           NUMBER
 SES_ADDR                               RAW(8)
 FLAG                                   NUMBER
 SPACE                                  VARCHAR2(3)
 RECURSIVE                              VARCHAR2(3)
 NOUNDO                                 VARCHAR2(3)
 PTX                                    VARCHAR2(3)
 NAME                                   VARCHAR2(256)
 PRV_XIDUSN                             NUMBER
 PRV_XIDSLT                             NUMBER
 PRV_XIDSQN                             NUMBER
 PTX_XIDUSN                             NUMBER
 PTX_XIDSLT                             NUMBER
 PTX_XIDSQN                             NUMBER
 DSCN-B                                 NUMBER
 DSCN-W                                 NUMBER
 USED_UBLK                              NUMBER
 USED_UREC                              NUMBER
 LOG_IO                                 NUMBER
 PHY_IO                                 NUMBER
 CR_GET                                 NUMBER
 CR_CHANGE                              NUMBER
 START_DATE                             DATE
 DSCN_BASE                              NUMBER
 DSCN_WRAP                              NUMBER
 START_SCN                              NUMBER
 DEPENDENT_SCN                          NUMBER
 XID                                    RAW(8)
 PRV_XID                                RAW(8)
 PTX_XID                                RAW(8)
 CON_ID                                 NUMBER
SQL>
Notice the two USED_ prefixed columns. These indicate the amount of undo information recorded to date for this transaction: USED_UBLK is the number of undo blocks consumed, and USED_UREC the number of undo records. We have inserted 100 rows, so let’s see what is in the USED_UREC column.
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
100
And voila! I can see how many rows have been inserted so far by looking at the USED_UREC column.
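On a busy system there may be many concurrent transactions, so in practice you would tie the undo figures back to the session doing the load. Here is a minimal sketch of that join (V$SESSION.TADDR matches V$TRANSACTION.ADDR); the SCOTT account name is just a placeholder for whatever account is running the load.

select s.sid,
       s.username,
       t.start_time,
       t.used_ublk,
       t.used_urec
from   v$session s,
       v$transaction t
where  s.taddr = t.addr
and    s.username = 'SCOTT';   -- placeholder account; adjust to suit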
Not…so…fast
The undo information is there to undo all of the changes made by a transaction. What if my table had an index, or even multiple indexes? All of those changes would need to be undone as well, so undo information must be captured for them too. I’ll roll back the 100 rows and repeat the test with a single index on the table.
SQL> rollback;
Rollback complete.
SQL>
SQL> create index CUST_IX on CUSTOMERS ( cust_id );
Index created.
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
..
..
..
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
SQL>
SQL>
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
200
Now there are 200 undo records created: one for each row in the table and one for each entry in the index. Similarly, as I add more indexes, the amount of undo information increases.
SQL> rollback;
Rollback complete.
SQL>
SQL> create index CUST_IX2 on CUSTOMERS ( first_name );
Index created.
SQL> create index CUST_IX3 on CUSTOMERS ( last_name );
Index created.
SQL> insert into customers (cust_id,first_name,last_name) values (100,'Steven','King');
1 row created.
SQL> insert into customers (cust_id,first_name,last_name) values (101,'Neena','Kochhar');
1 row created.
...
...
...
SQL> insert into customers (cust_id,first_name,last_name) values (199,'Douglas','Grant');
1 row created.
SQL>
SQL>
SQL> select USED_UREC
2 from V$TRANSACTION;
USED_UREC
----------
400
SQL>
SQL>
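Given one undo record per row plus one per index entry, a rough progress figure is just a little division away. A sketch, assuming you are connected as the table owner and this is the only active transaction:

select t.used_urec /
       ( 1 + ( select count(*)
               from   user_indexes
               where  table_name = 'CUSTOMERS' ) ) as rows_inserted_so_far
from   v$transaction t;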
Whilst there is a relationship between insert progression and the amount of undo being captured, you need to take care interpreting the results in order to get a good estimate of progress. One undo entry per table row plus one per index entry is a reasonable starting point, but don’t forget that your table might have triggers, or LOBs, or a multitude of other facets that might perturb the numbers you’re seeing. A common technique I’ll employ if I want some confidence in the numbers is to create an equivalent scratch copy of the table I’m inserting into and throw in a couple of sample rows to see how many undo records they create. I can then use that as a baseline for tracking the progress of my true population script, as sketched below.
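A minimal sketch of that baseline approach, using a hypothetical CUSTOMERS_PROBE copy of the table (same shape, same indexes), and assuming this session has no other open transaction:

-- equivalent empty copy of the target table
create table customers_probe as
  select * from customers where 1=0;

create index cust_probe_ix1 on customers_probe ( cust_id );
create index cust_probe_ix2 on customers_probe ( first_name );
create index cust_probe_ix3 on customers_probe ( last_name );

-- a couple of sample rows to measure the undo cost
insert into customers_probe values (1,'Sample','One');
insert into customers_probe values (2,'Sample','Two');

-- undo records divided by rows inserted gives the per-row baseline
select used_urec / 2 as undo_records_per_row
from   v$transaction;

rollback;
drop table customers_probe purge;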
Avoid the performance cost
There is obviously a better way of avoiding the need to dig into V$TRANSACTION to see when a stream of single-row inserts will complete, and that is to not use single-row inserts for large data population at all. My other post has options for manipulating the inserts, but don’t forget that you can also explore options unrelated to INSERT scripts, such as external tables, Data Pump and the like, for getting reference data from a source and loading it efficiently into the database.
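For example, here is a minimal external table sketch, assuming a hypothetical customers.csv sitting in a folder the database can read via a directory object:

create directory data_dir as '/tmp';   -- hypothetical location of customers.csv

create table customers_ext
( cust_id     int,
  first_name  varchar2(50),
  last_name   varchar2(50)
)
organization external
( type oracle_loader
  default directory data_dir
  access parameters
  ( records delimited by newline
    fields terminated by ','
  )
  location ( 'customers.csv' )
);

-- the million rows then arrive via a single set-based statement
insert /*+ APPEND */ into customers
select cust_id, first_name, last_name
from   customers_ext;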
Want to try this yourself? Get the scripts from this post here.