A new use for DML error logging

Many moons ago I did a short video on the DML error logging feature in Oracle. The feature has been around for many years now, and is a great tool for capturing errors during a large load without losing all of the rows that successfully loaded. You can watch that video below if you’re new to DML error logging.

But here is a possible new use case for DML error logging, even if you are not doing large scale loads. Let me describe the problem first, and then show how DML error logging might be a solution.

I’ll create a table with a NOT NULL constraint on its column.


SQL> create table t1 (val number not null);

Table created.

One of the nice things that came into Oracle (I think) way back in version 8.0 was more detailed information when you violate those constraints. So when I try to insert a null into that table


SQL> insert into t1 values (null);
insert into t1 values (null)
                       *
ERROR at line 1:
ORA-01400: cannot insert NULL into ("SCOTT"."T1"."VAL")

not only am I told that I got an ORA-1400, I am also told the column (VAL) that caused the error. Even if this INSERT is performed from a PL/SQL routine, I still get that information:


SQL> exec insert into t1 values (null);
BEGIN insert into t1 values (null); END;

*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("SCOTT"."T1"."VAL")
ORA-06512: at line 1

That might not seem particularly beneficial in this instance, but consider a more true-to-life application, which might have tables with dozens or even hundreds of columns. Knowing which column was in error is a handy piece of information to have when debugging.

That all seems just sweet, until I throw a little bulk binding into the mix. Here’s a package that will bulk bind an insert into the T1 table.


SQL> create or replace package bulk_ins as
  2    type t1_tt is table of t1%rowtype;
  3    procedure bulk_insert;
  4  end bulk_ins;
  5  /

Package created.

SQL> create or replace package body bulk_ins as
  2    procedure bulk_insert is
  3      l_data t1_tt:=t1_tt();
  4    begin
  5      l_data.extend(2);
  6      l_data(1).val:=999;
  7      l_data(2).val:=null;
  8      
  9      begin
 10        forall i in 1..l_data.count save exceptions
 11          insert into t1 values l_data(i);
 12      exception
 13        when others then
 14          for i in 1..sql%bulk_exceptions.count loop
 15            dbms_output.put_line( sqlerrm( -sql%bulk_exceptions(i).error_code ) );
 16          end loop;
 17      end forall_loop_with_save_except;
 18      commit;
 19    end bulk_insert;
 20  end bulk_ins;
 21  /

Package body created.

Even without running the code, you can see on lines 6 and 7 that we are seeding the array with a null value in one of the entries, so table T1 is not going to like that upon insertion! Luckily though, we are using the SAVE EXCEPTIONS extension to capture any errors and report them to the calling environment. Here is what happens:


SQL> set serverout on
SQL> exec bulk_ins.bulk_insert;
ORA-01400: cannot insert NULL into ()

PL/SQL procedure successfully completed.

SAVE EXCEPTIONS has captured the error, but notice that the vital column name has been lost. SAVE EXCEPTIONS captures only the error code, so by the time we convert that back into standard error message text, the precision has been lost.
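For what it’s worth, the ERROR_INDEX attribute of SQL%BULK_EXCEPTIONS will at least tell you which array element failed, but the column name is still nowhere to be seen. Here is a minimal standalone sketch of that (my own variation using a simple scalar collection, not the package above):

declare
  type num_list is table of number;
  l_data num_list := num_list(999, null);
  bulk_errors exception;
  pragma exception_init(bulk_errors, -24381);  -- "error(s) in array DML"
begin
  forall i in 1..l_data.count save exceptions
    insert into t1 values (l_data(i));
exception
  when bulk_errors then
    for i in 1..sql%bulk_exceptions.count loop
      dbms_output.put_line(
        'array element '||sql%bulk_exceptions(i).error_index||' failed: '||
        sqlerrm(-sql%bulk_exceptions(i).error_code));
    end loop;
end;
/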

Here is where DML error logging can step in and improve things. I’ll recode the package body to use DML error logging and dispense with the SAVE EXCEPTIONS clause.


SQL>  exec   DBMS_ERRLOG.create_error_log(dml_table_name=>'T1')

PL/SQL procedure successfully completed.

SQL> create or replace package body bulk_ins as
  2    procedure bulk_insert is
  3      l_data t1_tt:=t1_tt();
  4    begin
  5      l_data.extend(2);
  6      l_data(1).val:=999;
  7      l_data(2).val:=null;
  8
  9      forall i in 1..l_data.count
 10        insert into t1 values l_data(i) LOG ERRORS REJECT LIMIT UNLIMITED;
 11
 12      commit;
 13    end bulk_insert;
 14  end bulk_ins;
 15  /

Package body created.

SQL>
SQL> exec bulk_ins.bulk_insert;

PL/SQL procedure successfully completed.

SQL>
SQL> select * from err$_t1
  2  @pr
==============================
ORA_ERR_NUMBER$               : 1400
ORA_ERR_MESG$                 : ORA-01400: cannot insert NULL into ("SCOTT"."T1"."VAL")
ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
VAL                           :

Notice that the error message now contains the complete information, just as it did from the standard SQL insert statement. That’s pretty cool. I’m not saying that you should rush out and replace all of your SAVE EXCEPTIONS code with DML error logging. You can see from the example that there are overheads in managing this: you need to have the error logging table defined in advance, you need to manage its rows if multiple sessions are loading, and every load must be followed by a query against the error logging table to see if any errors occurred. But if you really need that column name, DML error logging might be the way forward for you.
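One way to ease some of that management burden is to tag each load. The LOG ERRORS clause accepts a tag expression, which is stored in the ORA_ERR_TAG$ column, so each load (or session) can find and purge just its own error rows afterwards. A rough sketch, using a hard-coded tag purely for illustration:

insert into t1
select case when rownum = 5 then null else rownum end
from   dual connect by rownum <= 10
log errors into err$_t1 ('BATCH_42') reject limit unlimited;

-- only the rows for *this* load
select ora_err_mesg$, val
from   err$_t1
where  ora_err_tag$ = 'BATCH_42';

-- and clean them up once dealt with
delete from err$_t1
where  ora_err_tag$ = 'BATCH_42';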

More on DML error logging here

DDL invalidates your SQL, right?

I stumbled upon this post by optimizer guru Nigel Bayliss last week, so please have a read of that first before proceeding. But I wanted to show a simple demo of how management of cursors continues to improve with each version of the database.

Throughout the many years and versions of using Oracle, a common mantra has been: if you perform DDL on a table, then any SQL cursors that reference that table will become invalidated and re-parsed on next execution. That makes a good deal of sense because if you (say) drop a column from a table, then a “SELECT * FROM MY_TABLE WHERE ID = 123” is a significantly different proposition than it was before the drop of the column. The asterisk (‘*’) now resolves to something different, and the SQL might not even be valid anymore if the column that was dropped was the ID column. A re-parse is mandatory.

But not all DDLs are equal, or perhaps a better phrasing would be “not all DDLs are as severe as others”. Since parsing is an expensive operation, any time we can avoid it is a good thing. Some DDLs are so “innocuous” that we know a re-parse is not required. Here is an example of that in action in 18c.

I’ll create a table, issue a simple query on it, and check its details in V$SQL. I’m using the cool “set feedback on sql_id” facility in SQL*Plus 18c to immediately get the SQL_ID.


SQL> create table t as select * from all_objects;

Table created.

SQL> set feedback on sql_id
SQL> select count(*) from t;

  COUNT(*)
----------
     76921

1 row selected.

SQL_ID: cyzznbykb509s

SQL> select
  2       sql_text
  3      ,sql_fulltext
  4      ,sql_id
  5      ,executions
  6      ,parse_calls
  7      ,invalidations
  8      ,ddl_no_invalidate
  9      ,is_rolling_invalid
 10      ,is_rolling_refresh_invalid
 11   from v$sql where sql_id = 'cyzznbykb509s'
 12  @pr

==============================
SQL_TEXT                      : select count(*) from t
SQL_FULLTEXT                  : select count(*) from t
SQL_ID                        : cyzznbykb509s
EXECUTIONS                    : 1
PARSE_CALLS                   : 1
INVALIDATIONS                 : 0
DDL_NO_INVALIDATE             : N
IS_ROLLING_INVALID            : N
IS_ROLLING_REFRESH_INVALID    : N

PL/SQL procedure successfully completed.

Now I’ll perform a DDL on the table that does not change its structure, its data, or its security privileges:


SQL> alter table t read only;

Table altered.

In versions of yesteryear, this would invalidate any cursors referencing the table even though nothing about its data or structure has changed. But with 18c, the DDL_NO_INVALIDATE column tells us that even though a DDL was performed, we did not need to invalidate the cursor.


SQL> select
  2       sql_text
  3      ,sql_fulltext
  4      ,sql_id
  5      ,executions
  6      ,parse_calls
  7      ,invalidations
  8      ,ddl_no_invalidate
  9      ,is_rolling_invalid
 10      ,is_rolling_refresh_invalid
 11   from v$sql where sql_id = 'cyzznbykb509s'
 12  @pr
==============================
SQL_TEXT                      : select count(*) from t
SQL_FULLTEXT                  : select count(*) from t
SQL_ID                        : cyzznbykb509s
EXECUTIONS                    : 1
PARSE_CALLS                   : 1
INVALIDATIONS                 : 0
DDL_NO_INVALIDATE             : Y
IS_ROLLING_INVALID            : N
IS_ROLLING_REFRESH_INVALID    : N

PL/SQL procedure successfully completed.

SQL>

Note that this column in V$SQL is also present in most versions of 12c, but does not appear to be populated reliably until you get to 18c.
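If you’re curious how much of your own shared pool is benefiting from this, here is a quick (and unofficial) sketch against V$SQL:

select ddl_no_invalidate, count(*)
from   v$sql
group  by ddl_no_invalidate;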

No more stale statistics in 19c

There is an odd contradiction that we all encounter with most databases, especially those predominantly used during the business day. The contradiction lies in the way we obtain and use optimizer statistics on those databases. It runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!

We can demonstrate this easily with the following script run in 18c.


SQL> select banner from v$version where rownum = 1 ;

BANNER
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

1 row selected.

SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

So far I’ve mimicked a table that has been populated with (say) a day’s worth of data (10,000 rows), and I’ve gathered statistics at the end of the business day so that my statistics reflect the current data in the table.



SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
---------- -------------------- --------------- --------------- ------------
T          ID                   C102            C302                   10000
T          NUM                  C102            C302                   10000

2 rows selected.

Hence, if I were to optimize and run a query now, I can expect an excellent estimate from the optimizer. The query below returns a COUNT of 1000 rows, and we can see from the PLAN_TABLE output that the estimated row count was also 1000. A perfect estimate by the optimizer!



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

But of course, the next business day rolls on, and by mid-morning we may have added 1000 more rows to our table, bringing the total number of rows to 11000.


SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

Even with an explicit flush of the database monitoring information, we have not yet reached the threshold at which this table’s statistics would be considered stale, so even a GATHER_STALE operation would have no effect if we ran one. The statistics still reflect the data from last night.


SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
---------- -------------------- --------------- ---------------
T          ID                   C102            C302
T          NUM                  C102            C302

2 rows selected.
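The DML has still been tracked, though. If you want to see what the database has recorded so far, USER_TAB_MODIFICATIONS exposes it once the monitoring info has been flushed. A quick side query (not part of the original script):

select table_name, inserts, updates, deletes
from   user_tab_modifications
where  table_name = 'T';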

When I run the same query (and don’t forget, this would now be during the business day, when it is critical that I get a good optimizer plan), the result is now 2000 rows, but the plan estimate is left languishing at its original 1000 rows because the statistics no longer reflect the current state of the table.


SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
SQL_ID  8juuu5warw2z6, child number 0
-------------------------------------
select max(num),count(*) from t where id > 9000

Plan hash value: 2053823973

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

Having great statistics that are not used, and poor statistics that are used all the time seems a nonsensical way to do things, but there really hasn’t been any alternative unless you were prepared to consume precious server resources to collect statistics whilst your users are performing their transactions. I have known customers to do this in the past, but of course, it takes careful monitoring and management to ensure that the slicing up of the resource “pie” is done appropriately to keep the right balance between business needs and background database tasks.

19c brings a nifty solution to this contradiction by having the database automatically maintain details about how data is changing in a table via a feature called Real Time Statistics. DML activities can be tracked, and that information can then be looped back as input into future query parsing. Here’s the same example from above, this time performed in 19c.

The same table is seeded with the same initial 10000 rows as before. I have included a new column in my data dictionary queries called NOTES. You will see the significance of that shortly.



SQL> select banner from v$version where rownum = 1;

BANNER
--------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

1 row selected.

SQL>
SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

1 row selected.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
-------------------------------- ---------- -------------------- --------------- --------------- ------------
                                 T          ID                   C102            C302                   10000
                                 T          NUM                  C102            C302                   10000

And since this is a simple query, the optimizer performs exactly as per 18c for its estimate. Once again, at this point, where the data and statistics are in alignment, the estimate is perfect.


                                 
SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.                                 


SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |     5 (100)|
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |            |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |     5   (0)|
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |     3   (0)|
----------------------------------------------------------------------------------

The business day commences, and a fresh 1000 rows are added to the database.



SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

And just like before, this is not enough for the statistics on the table to be considered stale by the database



SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL>
SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

But, here is where the 19c enhancements come into play. When we query the data dictionary, we now have TWO rows of statistics for this single table.



SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20
STATS_ON_CONVENTIONAL_DML        T             11000         20

2 rows selected.

The first row is the original statistics we gathered. The second row is new statistics derived from the normal DML activity that has occurred on the table. This is not a full GATHER operation; it simply takes advantage of basic arithmetic, e.g. if I had 10000 rows, then inserted 50 and deleted 20, I would now have 10030 rows. Real time statistics can also keep track of simple column-level statistics, such as whether the low and high values have changed.



SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C302
                                 T          NUM                  C102            C302
STATS_ON_CONVENTIONAL_DML        T          ID                   C102            C302094A
STATS_ON_CONVENTIONAL_DML        T          NUM                  C102            C302094A

The real test is whether these statistics can be beneficial to our sample query. I’ll run that through again.



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     7 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1895 | 15160 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID">9000)

Note
-----
   - dynamic statistics used: statistics for conventional DML

Notice the estimate of 1895 is now much closer to the true value of 2000. You can expect a little variation from the true value because, as I mentioned before, real time statistics is not about performing a full gather operation on the table; that would be far too resource hungry for every single DML. So some elements of the statistics (for example, the number of distinct values, or the histogram distribution) are not feasible to keep up to date on every DML. But even having the improved row counts and column extrema has yielded a benefit for the execution plan. When the normal nightly gather job comes along, these DML statistics are no longer relevant and will be expunged:


SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             11000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C3020B
                                 T          NUM                  C102            C3020B

2 rows selected.

There are other enhancements in 19c to keep the statistics in the dictionary closer to the true data in database tables which you can read about here.

Full disclosure: Some 19c new features are available on certain Oracle Database platforms only. Check the Oracle Database Licensing Guide for more information

Plugzilla!

Cloning a pluggable database takes time, and for environments where you’d like to use clones as part of unit testing, or other elements of Agile development, it would be nice to be able to bring a clone into operation in the smallest time possible. One mechanism for that is sparse storage clones aka snapshot copy, but depending on your database version and your storage infrastructure, you might hit some limitations.

Enter … Plugzilla! This PL/SQL package allows you to clone pluggable databases extremely quickly by having pluggable databases pre-cloned in advance.

Example

Let’s say you have a development pluggable database called PDB1. You want to let developers take clones of it as quickly and as often as they like, and at various stages in its life cycle. Here is how we might do it with Plugzilla.

On Sunday June 1st, we’ve just built (say) version 3.1 of our app into PDB1. I want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> set serverout on
SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB31');
alter session set container = cdb$root
seed=SEED_PDB31
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB31 from PDB1 file_name_convert=('PDB1','SEED_PDB31')
alter pluggable database SEED_PDB31 open restricted
alter pluggable database SEED_PDB31 close immediate
alter pluggable database SEED_PDB31 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

This will create a pluggable database called SEED_PDB31 (all the seeds will have the prefix “SEED_”, but see the package constants in the source code if you want to change this).

On Wednesday June 4th, we’ve just built (say) version 3.2 of our app into PDB1. I again want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB32');
alter session set container = cdb$root
seed=SEED_PDB32
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB32 from PDB1 file_name_convert=('PDB1','SEED_PDB32')
alter pluggable database SEED_PDB32 open restricted
alter pluggable database SEED_PDB32 close immediate
alter pluggable database SEED_PDB32 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

So now we have two seed copies of PDB1 at different points in time. These are the pluggables that form the base for any developer to clone from.

We now call plugzilla.preclone (although more likely you would have this as a scheduler job; see the sketch a little further below). This will look for any seeds (we have 2 from above) and pre-create ‘n’ pluggable copies of those databases, where ‘n’ is defined by the package constant g_reserve_copies.


SQL> exec plugzilla.preclone
alter session set container = cdb$root
Processing seed: SEED_PDB31
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3100001 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100001')
create pluggable database PEND_PDB3100002 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100002')
create pluggable database PEND_PDB3100003 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100003')
Processing seed: SEED_PDB32
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3200001 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200001')
create pluggable database PEND_PDB3200002 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200002')
create pluggable database PEND_PDB3200003 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200003')
alter session set container = cdb$root

PL/SQL procedure successfully completed.

These “pending” clones are fully operational pluggables that are yet to be claimed by a developer. They are pre-created so that when a developer wants a clone, they can do it very quickly.
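As mentioned above, you would typically not call PRECLONE by hand but let the scheduler keep the pool topped up. Here is a minimal sketch of such a job; the job name and hourly interval are just suggestions:

begin
  dbms_scheduler.create_job(
    job_name        => 'PLUGZILLA_PRECLONE',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin plugzilla.preclone; end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=HOURLY;INTERVAL=1',
    enabled         => true);
end;
/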

So when a developer wants a sandbox pluggable database of the application as of version 3.1, they simply ask for one:


SQL> exec plugzilla.clone_from_seed('PDB31')
alter session set container = cdb$root
Found pending pdb PEND_PDB3100001 to use
alter pluggable database PEND_PDB3100001 open restricted
alter session set container = PEND_PDB3100001
alter pluggable database PEND_PDB3100001 rename global_name to PDB3100001
alter pluggable database PDB3100001 close immediate
alter pluggable database PDB3100001 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

The first available pending pluggable database is picked from the list and renamed to PDB3100001. This is an automatically generated name, but developers can choose their own. Because this is just a rename, irrespective of the database size, the developer will have their pluggable database available to them within 5-10 seconds.

If we want a second sandbox clone, this time with a custom name, I’ll simply run the routine again.


SQL> exec plugzilla.clone_from_seed('PDB31','MYDB') 
alter session set container = cdb$root
Found pending pdb PEND_PDB3100002 to use
alter pluggable database PEND_PDB3100002 open restricted
alter session set container = PEND_PDB3100002
alter pluggable database PEND_PDB3100002 rename global_name to MYDB
alter pluggable database MYDB close immediate
alter pluggable database MYDB open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

When the developer is done with their clone, they drop it.


SQL> exec plugzilla.drop_clone('MYDB')
alter session set container = cdb$root
Dropping clone MYDB
alter pluggable database MYDB close immediate
drop pluggable database MYDB including datafiles
alter session set container = cdb$root

PL/SQL procedure successfully completed.

Note that this does not “return” the pluggable to the pool of available pluggables, because it could contain changes, which means it will have diverged from its initial seed. It is completely dropped and the space freed up. It is ‘preclone’ alone that keeps a pre-allocation of pluggables available. Because the numeric suffix continues to rise, there is a cap of 99999 pluggables that could be created. If your application is not deployed by then, you’ve got bigger issues to worry about.

At any time, the table PLUGZILLA_META contains the state of plugzilla. After the above operations, it would look like this:


SQL> select * from plugzilla_meta;

RELATIONSHIP                   CHILD                PARENT
------------------------------ -------------------- -----------------
SEED_FROM_PDB                  SEED_PDB31           PDB1                 => we took a seed PDB31 from PDB1
SEED_FROM_PDB                  SEED_PDB32           PDB1                 => we took a seed PDB32 from PDB1
PENDING_CLONE                  PEND_PDB3100003      SEED_PDB31           => preclone built these
PENDING_CLONE                  PEND_PDB3200001      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200002      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200003      SEED_PDB32
CLONE_FROM_SEED                PDB3100001           SEED_PDB31           => we took a preclone and converted it to a clone

Notes

  • There are many, many options in terms of cloning for pluggable databases. This package goes with a keep-it-simple policy: it clones pluggables by making the source read only, cloning the datafiles, and replacing the existing pluggable name with a new one. You are of course free to modify the package to choose a cloning mechanism that best suits your needs.
  • Don’t forget – you’re messing with pluggable databases here. Don’t be THAT person that drops the wrong data!
  • When a pending pluggable becomes owned by a developer, the files are not moved or renamed. This is done to keep the operation nice and snappy, because a datafile move (done from within the database) is a copy operation. You could conceivably extend the package to call out to the OS to perform the appropriate “move” commands if you really wanted those file names changed, but Plugzilla doesn’t do that (currently).
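For reference, the public interface exercised throughout this post looks roughly like the spec below. This is a sketch reconstructed from the calls above (the constant names in particular are illustrative); the downloadable source is the authoritative version.

create or replace package plugzilla as
  -- prefixes for seed and pending (pre-cloned) pluggables; names illustrative
  g_seed_prefix    constant varchar2(10) := 'SEED_';
  g_pend_prefix    constant varchar2(10) := 'PEND_';
  -- number of pending clones PRECLONE keeps available per seed
  g_reserve_copies constant pls_integer  := 3;

  procedure new_seed_from_existing(p_source varchar2, p_seed varchar2);
  procedure preclone;
  procedure clone_from_seed(p_seed varchar2, p_clone varchar2 default null);
  procedure drop_clone(p_clone varchar2);
end plugzilla;
/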

Download plugzilla source code here

Autonomous Transaction Processing – your slice of the pie

I grabbed the following screen shot from a slide deck I’ve been giving about Autonomous Transaction Processing (ATP). It shows the services that are made available to you once you create your database. At first glance, it looks like we have a simple tier, where the lower in the tier you are, the smaller the slice of the database resource “pie” you get.

[Screenshot: the predefined ATP services, showing the resource share and parallelism setting for each]

And this is true in terms of the share of resources assigned to each service. But it is also important to note the third column, namely the parallelism setting for each service, ie, the level of parallel capability that is active on that service. For example, here’s a simple little SQL demo that creates a table and performs a couple of set-based deletes on it. I’ll start by connecting to the LOW service.


SQL> set echo on
SQL> connect xxx/xxx@myservice_low
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

I might think to myself “Hmmm…I want to make sure that these deletes get a larger slice of the pie, so I’ll use the MEDIUM service instead”. Let’s see what happens when I do that.


SQL> connect xxx/xxx@myservice_medium
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

Error starting at line : 5 in command -
delete from t where rownum <= 1000
Error report -
ORA-12838: cannot read/modify an object after modifying it in parallel

My script now crashes. So it’s important to be aware of the parallelism facilities defined for MEDIUM and HIGH: on those services the database will perform operations in parallel whenever possible. You get faster queries and faster DML, but there are implications for the way you write your scripts, the locking that will result, and how commit processing must be handled. HIGH and MEDIUM are not just “bigger” versions of the LOW service.
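The fix in this particular demo is straightforward: parallel DML leaves the table unreadable and unmodifiable within the same transaction (hence the ORA-12838), so either commit between the statements, or switch off parallel DML for the session. A quick sketch of both options:

-- option 1: commit between the parallel DML statements
delete from t where rownum <= 1000;
commit;
delete from t where rownum <= 1000;
commit;

-- option 2: revert to serial DML for this session
alter session disable parallel dml;
delete from t where rownum <= 1000;
delete from t where rownum <= 1000;
commit;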

This is all outlined in the docs:

Predefined Database Service Names for Autonomous Transaction Processing

The predefined service names provide different levels of performance and concurrency for Autonomous Transaction Processing.

  • tpurgent: The highest priority application connection service for time critical transaction processing operations. This connection service supports manual parallelism.
  • tp: A typical application connection service for transaction processing operations. This connection service does not run with parallelism.
  • high: A high priority application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing.
  • medium: A typical application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing. Using this service the degree of parallelism is limited to four (4).
  • low: A lowest priority application connection service for reporting or batch processing operations. This connection service does not run with parallelism.

TL;DR: For standard transactional activities on ATP, make sure it uses the TP or TPURGENT services. If you need faster performance for volume operations, then HIGH and MEDIUM are your friend, but understand the locking and commit implications.

Take a COPY out of SQLcl’s book

As the world continues to get smaller, it makes sense for all databases to be able to handle more than just US7ASCII, which is why the default character set for all recent versions of the Oracle database is AL32UTF8. In that way, we can handle the various challenges that arise with languages other than English.


SQL> create table t ( x varchar2(30 char));

Table created.

SQL> insert into t values ('안녕미미 99~12900원 시즌오프 원피');

1 row created.

But let’s see what happens when we copy that data using the legacy COPY facility in SQL*Plus.


SQL> COPY TO scott/tiger@db18_pdb1 CREATE t1 USING SELECT * FROM t;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T1 created.

   1 rows selected from DEFAULT HOST connection.
   1 rows inserted into T1.
   1 rows committed into T1 at scott@db18_pdb1.

SQL> desc scott.t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 X                                      VARCHAR2(120)

Notice that the definition of table T1 does not match that of the source: the column has come across as VARCHAR2(120), ie, the 30-character definition expanded to its maximum byte length under AL32UTF8 (4 bytes per character), with the CHAR semantics lost. The COPY command in SQL*Plus dates back to a time before CHAR and BYTE semantics existed, and we even stress that in the documentation:

“12.16 COPY.
The COPY command is not being enhanced to handle datatypes or features introduced with, or after Oracle8i. The COPY command is likely to be deprecated in a future release.”

Luckily, we of course have a modern successor to SQL*Plus, namely SQLcl. Running the same command under SQLcl is the easy solution here:


C:\>sqlcl

SQLcl: Release 19.2.1 Production on Mon Aug 05 10:46:43 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Mon Aug 05 2019 10:46:43 +08:00

Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.6.0.0.0

SQL> COPY TO scott/tiger@db18_pdb1 CREATE t2 USING SELECT * FROM t;
Array fetch/bind size is 15 (less if there are longs or blobs). (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T2 created.
    1 rows selected from DEFAULT HOST connection.
    1 rows inserted into T2.
    1 rows committed into T2 at scott@db18_pdb1.

SQL> desc scott.t2
Name Null? Type
---- ----- -----------------
X          VARCHAR2(30 CHAR)
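A quick way to confirm the difference between the two copies is the CHAR_USED flag in the dictionary (B for byte semantics, C for character semantics):

select table_name, column_name, data_length, char_length, char_used
from   all_tab_columns
where  owner = 'SCOTT'
and    table_name in ('T1','T2');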

Statistics on Load

One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features and so on, but also one of the simplest yet coolest optimizer improvements when it comes to loading data: when you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action:


SQL> create table t (id primary key, tag)
  2  as select rownum id, to_char(rownum) tag
  3  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------
     50000

1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------
        103

1 row selected.
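The same applies to a direct-path insert into an empty table. A quick sketch (not from the original demo) using a hypothetical copy of the table:

create table t_copy (id number, tag varchar2(40));

insert /*+ APPEND */ into t_copy
select rownum, to_char(rownum)
from   dual connect by level <= 50000;

commit;

-- the statistics were gathered as part of the direct-path load
select num_rows from user_tables where table_name = 'T_COPY';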

Of course, many people are now aware of this functionality, so why does it warrant a blog post? Mainly to make readers aware of a boundary case. If the table you are creating or loading is an index-organized table (IOT), then you do not get the statistics gathered automatically.


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (id primary key, tag)
  2  organization index
  3  as select rownum id, to_char(rownum) tag
  4  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------


1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------


1 row selected.

Initially I suspected the reason was that, since the statistics picked up on the fly are collected from the source data, they can only be transposed onto the target if the target table structure is the same as the source. But if that were (strictly) the case, then other examples would exhibit a similar restriction, such as going from compressed data to uncompressed or vice versa. Yet you can see that the on-load CTAS statistics are fine in that circumstance:


SQL> create table t1 (id primary key, tag)
  2  compress
  3  as select * from t;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T1';

  NUM_ROWS
----------
     99999

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T1';

LEAF_BLOCKS
-----------
        208

Similarly, even if I am copying from an identical index-organized table, the statistics will not be collected.


SQL> create table t2 (id primary key, tag)
  2  organization index
  3  as select * from t;

Table created.

SQL> select num_rows from user_tables
  2  where table_name = 'T2';

  NUM_ROWS
----------


SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T2';

LEAF_BLOCKS
-----------

Whatever the root cause is, just be aware that if you are direct-path loading IOTs, you need to take the time to also collect statistics on the table before you start querying it in earnest.
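In this example, that is just a one-liner after the load, using the same call shown earlier in the post:

exec dbms_stats.gather_table_stats(user,'T2')

select num_rows    from user_tables  where table_name = 'T2';
select leaf_blocks from user_indexes where table_name = 'T2';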