No more stale statistics in 19c

There is an odd contradiction that we encounter with most databases, especially those used predominantly during the business day. It arises from the way we obtain and use optimizer statistics on those databases, and it runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night.
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!

We can demonstrate this easily with the following script run in 18c.


SQL> select banner from v$version where rownum = 1 ;

BANNER
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

1 row selected.

SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

So far I’ve mimicked a table that has been populated with (say) a day’s worth of data (10,000 rows), and I’ve gathered statistics at the end of the business day so that my statistics reflect the current data in the table.



SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
---------- -------------------- --------------- --------------- ------------
T          ID                   C102            C302                   10000
T          NUM                  C102            C302                   10000

2 rows selected.

Hence, if I were to optimize and run a query now, I could expect an excellent estimate from the optimizer. The query below returns a COUNT of 1000 rows, and we can see from the PLAN_TABLE output that the estimated row count was also 1000. A perfect estimate by the optimizer!



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

But of course, the next business day rolls on, and by mid-morning we may have added 1000 more rows to our table, bringing the total number of rows to 11000.


SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

Even with an explicit flush of the database monitoring information, we have not yet reached the threshold where this table’s statistics would be considered stale, so even a GATHER_STALE operation would have no effect if we ran one. The statistics still reflect the data from last night.
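
As an aside, the threshold in question is the STALE_PERCENT statistics preference, which defaults to 10 percent, so our extra 1000 rows on top of the original 10000 sit right at, rather than beyond, the boundary. A quick way to check the preference for this table, using the standard DBMS_STATS API:

-- the staleness threshold for T (defaults to 10, meaning 10% of rows must change)
select dbms_stats.get_prefs('STALE_PERCENT', user, 'T') as stale_pct
from   dual;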


SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
---------- -------------------- --------------- ---------------
T          ID                   C102            C302
T          NUM                  C102            C302

2 rows selected.

When I run the same query (and don’t forget, this would now be during the business day, when it is critical that I get a good optimizer plan), you can see that the result is now 2000 rows, but the plan estimate is left languishing at its original 1000 rows, because the statistics no longer reflect the current state of the table.


SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
SQL_ID  8juuu5warw2z6, child number 0
-------------------------------------
select max(num),count(*) from t where id > 9000

Plan hash value: 2053823973

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

Having great statistics that are not used, and poor statistics that are used all the time seems a nonsensical way to do things, but there really hasn’t been any alternative unless you were prepared to consume precious server resources to collect statistics whilst your users are performing their transactions. I have known customers to do this in the past, but of course, it takes careful monitoring and management to ensure that the slicing up of the resource “pie” is done appropriately to keep the right balance between business needs and background database tasks.

19c brings a nifty solution to this contradiction by having the database automatically maintain details about how data is changing in a table via a feature called Real Time Statistics. DML activities can be tracked, and that information can then be looped back as input into future query parsing. Here’s the same example from above, this time performed in 19c.

The same table is seeded with the same initial 10000 rows as before. I have included a new column in my data dictionary queries called NOTES. You will see the significance of that shortly.



SQL> select banner from v$version where rownum = 1;

BANNER
--------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

1 row selected.

SQL>
SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

1 row selected.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
-------------------------------- ---------- -------------------- --------------- --------------- ------------
                                 T          ID                   C102            C302                   10000
                                 T          NUM                  C102            C302                   10000

And since this is a simple query, the optimizer performs exactly as per 18c for its estimate. Once again, at this point, where the data and statistics are in alignment, the estimate is perfect.


                                 
SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.                                 


SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |     5 (100)|
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |            |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |     5   (0)|
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |     3   (0)|
----------------------------------------------------------------------------------

The business day commences, and a fresh 1000 rows are added to the table.



SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

And just like before, this is not enough for the statistics on the table to be considered stale by the database.



SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL>
SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

But here is where the 19c enhancements come into play. When we query the data dictionary, we now have TWO rows of statistics for this single table.



SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20
STATS_ON_CONVENTIONAL_DML        T             11000         20

2 rows selected.

The first row contains the original statistics we gathered. The second row contains new statistics derived from the normal DML activity that has occurred on the table. This is not a full GATHER operation; it simply takes advantage of basic arithmetic, for example, if I have 10000 rows and then insert 50 and delete 20, I must now have 10030 rows. Real time statistics can also keep track of simple column level statistics, such as whether the low and high values have changed.



SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C302
                                 T          NUM                  C102            C302
STATS_ON_CONVENTIONAL_DML        T          ID                   C102            C302094A
STATS_ON_CONVENTIONAL_DML        T          NUM                  C102            C302094A
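
If you are curious where the raw numbers behind that arithmetic live, the DML counts themselves have long been exposed via the *_TAB_MODIFICATIONS views once the monitoring information has been flushed. A quick sketch for our table:

-- write the in-memory DML tracking information to the dictionary
exec dbms_stats.flush_database_monitoring_info

-- inserts/updates/deletes recorded against T since the last statistics gather
select table_name, inserts, updates, deletes, timestamp
from   user_tab_modifications
where  table_name = 'T';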

The real test is whether these statistics can be beneficial to our sample query. I’ll run that through again.



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     7 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1895 | 15160 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID">9000)

Note
-----
   - dynamic statistics used: statistics for conventional DML

Notice that the estimate of 1895 is now much closer to the true value of 2000. You can expect a little variation from the true value because, as I mentioned before, real time statistics is not about performing a full gather operation on the table; that would be far too resource hungry for every single DML. So some elements of the statistics (for example, the number of distinct values, or the histogram distribution) are not feasible to keep up to date every time a DML is performed. But even the improved row counts and column extrema have yielded a benefit to the execution plan. When the normal nightly gather job comes along, these DML statistics are no longer relevant and will be expunged.


SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             11000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C3020B
                                 T          NUM                  C102            C3020B

2 rows selected.
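
One final aside on this feature: if you ever want to suppress real time statistics for a particular statement, my reading of the 19c documentation is that the NO_GATHER_OPTIMIZER_STATISTICS hint (the same hint already used for online statistics gathering on bulk loads) is the switch. Treat the sketch below as an assumption to verify against your own release rather than gospel:

-- assumption: suppresses real time statistics collection for this statement only
insert /*+ NO_GATHER_OPTIMIZER_STATISTICS */ into t
select rownum+11000, rownum+11000
from   dual
connect by rownum <= 1000;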

There are other enhancements in 19c to keep the statistics in the dictionary closer to the true data in database tables, which you can read about here.

Full disclosure: Some 19c new features are available on certain Oracle Database platforms only. Check the Oracle Database Licensing Guide for more information.

Plugzilla!

Cloning a pluggable database takes time, and for environments where you’d like to use clones as part of unit testing or other elements of Agile development, it would be nice to be able to bring a clone into operation in as little time as possible. One mechanism for that is sparse storage clones (aka snapshot copy), but depending on your database version and your storage infrastructure, you might hit some limitations.

Enter …. Plugzilla! This PL/SQL package allows you to clone pluggable databases extremely quickly by having pluggable databases pre-cloned in advance.

Example

Let’s say you have a development pluggable database called PDB1. You want to let developers take clones of this as quickly and as often as they like, and at various stages in its life cycle. Here is how we might do it with plugzilla.

On Sunday June 1st, we’ve just built (say) version 3.1 of our app into PDB1. I want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> set serverout on
SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB31');
alter session set container = cdb$root
seed=SEED_PDB31
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB31 from PDB1 file_name_convert=('PDB1','SEED_PDB31')
alter pluggable database SEED_PDB31 open restricted
alter pluggable database SEED_PDB31 close immediate
alter pluggable database SEED_PDB31 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

This will create a pluggable database called SEED_PDB31 (all the seeds will have the prefix “SEED_”, but see the package constants in the source code if you want to change this).
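
To give a feel for where that is controlled, here is roughly what the configurable part of the package spec looks like. This is a reconstruction based only on what appears in this post (the SEED_/PEND_ prefixes, the g_reserve_copies constant you will meet shortly, and the routines used below), so treat the names as illustrative and check the actual source:

create or replace package plugzilla as
  -- prefix for frozen seed copies, eg SEED_PDB31
  g_seed_prefix     constant varchar2(10) := 'SEED_';
  -- prefix for pre-created clones waiting to be claimed, eg PEND_PDB3100001
  g_pending_prefix  constant varchar2(10) := 'PEND_';
  -- how many pending clones preclone keeps on standby for each seed
  g_reserve_copies  constant pls_integer  := 3;

  procedure new_seed_from_existing(p_source varchar2, p_seed varchar2);
  procedure preclone;
  procedure clone_from_seed(p_seed varchar2, p_clone varchar2 default null);
  procedure drop_clone(p_clone varchar2);
end plugzilla;
/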

On Wednesday June 4th, we’ve just built (say) version 3.2 of our app into PDB1. I again want a frozen copy of PDB1 that can be used as a seed for developers to clone from. So I’ll do:


SQL> exec plugzilla.new_seed_from_existing(p_source=>'PDB1',p_seed=>'PDB32');
alter session set container = cdb$root
seed=SEED_PDB32
src=PDB1,mode=READ WRITE
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open read only
create pluggable database SEED_PDB32 from PDB1 file_name_convert=('PDB1','SEED_PDB32')
alter pluggable database SEED_PDB32 open restricted
alter pluggable database SEED_PDB32 close immediate
alter pluggable database SEED_PDB32 open read only
alter pluggable database PDB1 close immediate
alter pluggable database PDB1 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

So now we have two seed copies of PDB1 at different points in time. These are the pluggables that form the base for any developer to clone from.

We now call plugzilla.preclone (although more likely you would have this as a scheduler job). This will look for any seeds (we have two from above) and pre-create ‘n’ pluggable copies of each, where ‘n’ is defined by the package constant g_reserve_copies.


SQL> exec plugzilla.preclone
alter session set container = cdb$root
Processing seed: SEED_PDB31
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3100001 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100001')
create pluggable database PEND_PDB3100002 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100002')
create pluggable database PEND_PDB3100003 from SEED_PDB31 file_name_convert=('SEED_PDB31','PEND_PDB3100003')
Processing seed: SEED_PDB32
- pending clones: 0
- building 3 pre-clones
create pluggable database PEND_PDB3200001 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200001')
create pluggable database PEND_PDB3200002 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200002')
create pluggable database PEND_PDB3200003 from SEED_PDB32 file_name_convert=('SEED_PDB32','PEND_PDB3200003')
alter session set container = cdb$root

PL/SQL procedure successfully completed.

These “pending” clones are fully operational pluggables that are yet to be claimed by a developer. They are pre-created so that when a developer wants a clone, they can do it very quickly.

So when a developer wants a sandbox pluggable of the application as of version 3.1, they simply ask for one:


SQL> exec plugzilla.clone_from_seed('PDB31')
alter session set container = cdb$root
Found pending pdb PEND_PDB3100001 to use
alter pluggable database PEND_PDB3100001 open restricted
alter session set container = PEND_PDB3100001
alter pluggable database PEND_PDB3100001 rename global_name to PDB3100001
alter pluggable database PDB3100001 close immediate
alter pluggable database PDB3100001 open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

The first available pending pluggable database is picked from the list and renamed to PDB3100001. This is an automatically generated name, but developers can choose their own. Because this is just a rename, irrespective of size, the developer will have their pluggable database available to them within 5-10 seconds.

If we want a second sandbox clone, this time with a custom name, I’ll simply run the routine again.


SQL> exec plugzilla.clone_from_seed('PDB31','MYDB') 
alter session set container = cdb$root
Found pending pdb PEND_PDB3100002 to use
alter pluggable database PEND_PDB3100002 open restricted
alter session set container = PEND_PDB3100002
alter pluggable database PEND_PDB3100002 rename global_name to MYDB
alter pluggable database MYDB close immediate
alter pluggable database MYDB open
alter session set container = cdb$root

PL/SQL procedure successfully completed.

When the developer is done with their clone, they drop it.


SQL> exec plugzilla.drop_clone('MYDB')
alter session set container = cdb$root
Dropping clone MYDB
alter pluggable database MYDB close immediate
drop pluggable database MYDB including datafiles
alter session set container = cdb$root

PL/SQL procedure successfully completed.

Note that this does not “return” the pluggable to the pool of available pluggables, because it could contain changes, which means it will have diverged from its initial seed. It is completely dropped and the space freed up. It is ‘preclone’ alone that keeps a pre-allocation of pluggables available. Because the numeric suffix continues to rise, there is a cap of 99999 pluggables that could be created. If your application is not deployed by then, you’ve got bigger issues to worry about 🙂

At any time, the table PLUGZILLA_META contains the state of plugzilla. After the above operations, it would look like this:


SQL> select * from plugzilla_meta;

RELATIONSHIP                   CHILD                PARENT
------------------------------ -------------------- -----------------
SEED_FROM_PDB                  SEED_PDB31           PDB1                 => we took a seed PDB31 from PDB1
SEED_FROM_PDB                  SEED_PDB32           PDB1                 => we took a seed PDB32 from PDB1
PENDING_CLONE                  PEND_PDB3100003      SEED_PDB31           => preclone built these
PENDING_CLONE                  PEND_PDB3200001      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200002      SEED_PDB32
PENDING_CLONE                  PEND_PDB3200003      SEED_PDB32
CLONE_FROM_SEED                PDB3100001           SEED_PDB31           => we took a preclone and converted it to a clone

Notes

  • There are many, many options when it comes to cloning pluggable databases. This package goes with a Keep-It-Simple policy: it clones pluggables by making the source read only, cloning the datafiles, and replacing the existing pluggable name with a new one. You are of course free to modify the package to choose a cloning mechanism that best suits your needs.
  • Don’t forget – you’re messing with pluggable databases here. Don’t be THAT person that drops the wrong data!
  • When a pending pluggable becomes owned by a developer, the files are not moved or renamed. This is done to keep the operation nice and snappy. A datafile move (done from within the database) is a copy operation; a sketch of that command follows these notes. You could conceivably extend the package to call out to the OS to perform the appropriate “move” commands if you really wanted those file names changed, but Plugzilla doesn’t do that (currently).
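
For reference, this is the in-database move the last note is talking about; it is an online operation, but under the covers the file is copied to the new location and the old one dropped. The paths below are purely illustrative:

-- online datafile move/rename (12c onwards); effectively a copy then drop
alter database move datafile
  '/u01/oradata/CDB1/pend_pdb3100001/users01.dbf'
  to '/u01/oradata/CDB1/pdb3100001/users01.dbf';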

Download plugzilla source code here

Autonomous Transaction Processing – your slice of the pie

I grabbed the following screen shot from a slide deck I’ve been giving about Autonomous Transaction Processing (ATP). It shows the services that are made available to you once you create your database. At first glance, it looks like we have a simple tier, where the lower in the tier you are, the smaller the slice of the database resource “pie” you get.

[Screenshot from the slide deck: the predefined ATP services, their share of database resources, and the parallelism setting for each]

And this is true in terms of the share of resources assigned to each service. But it is also important to note the third column, namely the degree of parallelism each service runs with. For example, here’s a simple little SQL demo that creates a table and performs a couple of set-based deletes on that table. I’ll start by connecting to the LOW service.


SQL> set echo on
SQL> connect xxx/xxx@myservice_low
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

I might think to myself “Hmmm…I want to make sure that these deletes get a larger slice of the pie, so I’ll use the MEDIUM service instead”. Let’s see what happens when I do that.


SQL> connect xxx/xxx@myservice_medium
Connected.
SQL> drop table t purge;

Table T dropped.

SQL> create table t as select * from all_objects;

Table T created.

SQL> delete from t where rownum <= 1000;

1,000 rows deleted.

SQL> delete from t where rownum <= 1000;

Error starting at line : 5 in command -
delete from t where rownum <= 1000
Error report -
ORA-12838: cannot read/modify an object after modifying it in parallel

My script now crashes 🙁. So it’s important to be aware of the parallelism facilities defined for MEDIUM and HIGH: operations will be done in parallel whenever possible. You get faster queries and faster DML, but there are implications for the way you write your scripts, the locking that will result, and how commit processing must be handled. HIGH and MEDIUM are not just “bigger” versions of the LOW service.
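
The ORA-12838 arises because the first delete ran as parallel DML, and you cannot touch that object again in the same transaction until you commit. If you do want to run a script like this against MEDIUM or HIGH, the options are to commit between the statements, or to switch the session back to serial DML. A sketch of both, assuming session-level settings are honoured on your service:

-- option 1: commit between the parallel DML statements
delete from t where rownum <= 1000;
commit;
delete from t where rownum <= 1000;
commit;

-- option 2: keep this session's DML serial
alter session disable parallel dml;
delete from t where rownum <= 1000;
delete from t where rownum <= 1000;
commit;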

This is all outlined in the docs:

Predefined Database Service Names for Autonomous Transaction Processing

The predefined service names provide different levels of performance and concurrency for Autonomous Transaction Processing.

  • tpurgent: The highest priority application connection service for time critical transaction processing operations. This connection service supports manual parallelism.
  • tp: A typical application connection service for transaction processing operations. This connection service does not run with parallelism.
  • high: A high priority application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing.
  • medium: A typical application connection service for reporting and batch operations. All operations run in parallel and are subject to queuing. Using this service the degree of parallelism is limited to four (4).
  • low: A lowest priority application connection service for reporting or batch processing operations. This connection service does not run with parallelism.

TL;DR: For standard transactional activities on ATP, make sure your application uses the TP or TPURGENT services. If you need faster performance for volume operations, then HIGH and MEDIUM are your friends, but understand the locking and commit implications.
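
For example, the same pair of deletes from the demo above runs without drama when connected through the TP service. The connect string below assumes the same suffix convention as the LOW and MEDIUM examples earlier, so substitute your own service names:

-- transaction processing service: no forced parallel DML, so no ORA-12838
connect xxx/xxx@myservice_tp

delete from t where rownum <= 1000;
delete from t where rownum <= 1000;
commit;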

Take a COPY out of SQLcl’s book

As the world continues to get smaller, it makes sense for all databases to be defined to handle more than just US7ASCII, which is why the default character set for all recent versions of the Oracle database is AL32UTF8. In that way, we can handle the various challenges that come with languages other than English.
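
If you are not sure what character set your own database was created with, it is a one-line query away:

-- database character set (AL32UTF8 is the default on recent versions)
select value
from   nls_database_parameters
where  parameter = 'NLS_CHARACTERSET';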


SQL> create table t ( x varchar2(30 char));

Table created.

SQL> insert into t values ('안녕미미 99~12900원 시즌오프 원피');

1 row created.

But let’s see what happens when we copy that data using some legacy facilities in SQL Plus.


SQL> COPY TO scott/tiger@db18_pdb1 CREATE t1 USING SELECT * FROM t;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T1 created.

   1 rows selected from DEFAULT HOST connection.
   1 rows inserted into T1.
   1 rows committed into T1 at scott@db18_pdb1.

SQL> desc scott.t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 X                                      VARCHAR2(120)

Notice that the definition of table T1 does not match that of the source. The COPY command in SQL Plus dates back to a time before CHAR and BYTE semantics existed, and we even stress that in the documentation.

“12.16 COPY.
The COPY command is not being enhanced to handle datatypes or features introduced with, or after Oracle8i. The COPY command is likely to be deprecated in a future release.”
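
If you are stuck on SQL Plus for some reason, one workaround is to keep the DDL out of COPY’s hands: pre-create the target table yourself with the semantics you want, and then use COPY in INSERT mode so it only moves the rows. A sketch, assuming the same scott@db18_pdb1 target as above:

-- run this on the target (scott@db18_pdb1): the definition we actually want
create table t1 ( x varchar2(30 char) );

-- then, from the source connection, let COPY just transfer the data
COPY TO scott/tiger@db18_pdb1 INSERT t1 USING SELECT * FROM t;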

Luckily, we of course have a newer version of SQL Plus, namely SQLcl. Running the same command under SQLcl is the easy solution here.


C:\>sqlcl

SQLcl: Release 19.2.1 Production on Mon Aug 05 10:46:43 2019

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Mon Aug 05 2019 10:46:43 +08:00

Connected to:
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production
Version 18.6.0.0.0

SQL> COPY TO scott/tiger@db18_pdb1 CREATE t2 USING SELECT * FROM t;
Array fetch/bind size is 15 (less if there are longs or blobs). (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table T2 created.
    1 rows selected from DEFAULT HOST connection.
    1 rows inserted into T2.
    1 rows committed into T2 at scott@db18_pdb1.

SQL> desc scott.t2
Name Null? Type
---- ----- -----------------
X          VARCHAR2(30 CHAR)

Statistics on Load

One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features and so forth, but also one of the simplest yet coolest optimizer improvements when it comes to loading data. When you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action:


SQL> create table t (id primary key, tag)
  2  as select rownum id, to_char(rownum) tag
  3  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------
     50000

1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------
        103

1 row selected.

Of course, many people are now aware of this functionality, so why does it warrant a blog post? Mainly to make readers aware of a boundary case. If the table you are creating or loading is an index-organized table (IOT), then you do not get the statistics gathered automatically.


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (id primary key, tag)
  2  organization index
  3  as select rownum id, to_char(rownum) tag
  4  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------


1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------


1 row selected.

Initially I suspected that the reason for this is that the statistics being picked up on the fly are collected against the source data, and hence can only be transposed onto the target if its structure is the same as the source. But if that were (strictly) the case, then other examples would exhibit a similar restriction, such as going from compressed data to uncompressed or vice versa. Yet you can see that the on-load statistics from a CTAS are fine in that circumstance:


SQL> create table t1 (id primary key, tag)
  2  compress
  3  as select * from t;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T1';

  NUM_ROWS
----------
     99999

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T1';

LEAF_BLOCKS
-----------
        208

Similarly, even if I am copying from an identical index-organized table, the statistics will not be collected.


SQL> create table t2 (id primary key, tag)
  2  organization index
  3  as select * from t;

Table created.

SQL> select num_rows from user_tables
  2  where table_name = 'T2';

  NUM_ROWS
----------


SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T2';

LEAF_BLOCKS
-----------

Whatever the root cause is, just be aware that if you are direct-path loading IOTs, you need to take the time to also collect statistics on the table before you start querying it in earnest.
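
In other words, the conventional post-load step is still required for IOTs; nothing fancy, just the usual call before the table goes into service:

-- direct path loads into an IOT do not populate the dictionary statistics,
-- so gather them explicitly
exec dbms_stats.gather_table_stats(user, 'T2')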

It’s back!

Yes indeed! Now that the dates and times are available for OpenWorld 2019, it is naturally time for the best data searching, filtering and analysis tool on the planet to step up to the plate, enter the fray and …. hmmm… I’ve run out of metaphors 🙂

But of course, if you have data to view, then it is Application Express that offers so much more flexibility than the standard session catalog.

So make your way to https://apex.oracle.com/pls/apex/f?p=OOW19 and start planning out your OpenWorld right now!

See you in San Francisco!

 

HIGH_VALUE (and other LONG columns) revisited

Just a quick post to ensure people don’t get caught out by a recent “trick” I saw on an AskTOM question for extracting the HIGH_VALUE out of its LONG storage in the dictionary into a more appropriate datatype. A reviewer (I’m sure in good faith) posted the following mechanism to extract the HIGH_VALUE:


SQL> create table t1 ( x timestamp )
  2  partition by range ( x )
  3  ( partition p1 values less than ( timestamp '2019-01-01 00:00:00' ),
  4    partition p2 values less than ( timestamp '2019-02-01 00:00:00' )
  5  );

Table created.

SQL>
SQL> set serveroutput on
SQL> declare
  2    v long;
  3    d date;
  4  begin
  5    for i in ( select high_value from user_tab_partitions where table_name = 'T1' )
  6    loop
  7      execute immediate 'select cast(:hv as date) from dual' into d using  i.high_value;
  8      dbms_output.put_line(d);
  9    end loop;
 10  end;
 11  /
01-JAN-19
01-FEB-19

PL/SQL procedure successfully completed.

You can see that we are casting the HIGH_VALUE as a date, and voila, out pop the values in a nice DATE datatype. That all seems well and good, but it only works for TIMESTAMP partitions and not the general case. Let’s replace our table with a DATE-based partitioning scheme:


SQL> drop table t1 purge;

Table dropped.

SQL> create table t1 ( x date )
  2  partition by range ( x )
  3  ( partition p1 values less than ( date '2019-01-01' ),
  4    partition p2 values less than ( date '2019-02-01' )
  5  );

Table created.

SQL>
SQL> set serveroutput on
SQL> declare
  2    d date;
  3  begin
  4    for i in ( select high_value from user_tab_partitions where table_name = 'T1' )
  5    loop
  6      execute immediate 'select cast(:hv as date) from dual' into d using i.high_value;
  7      dbms_output.put_line(d);
  8    end loop;
  9  end;
 10  /
declare
*
ERROR at line 1:
ORA-01858: a non-numeric character was found where a numeric was expected
ORA-06512: at line 6
ORA-06512: at line 6

There are plenty of methods to overcome the limitations of LONG. Here’s the one I typically use, which is just a small variant on the code above:


SQL> set serveroutput on
SQL> declare
  2    d date;
  3  begin
  4    for i in ( select high_value from user_tab_partitions where table_name = 'T1' )
  5    loop
  6      execute immediate 'select '||i.high_value||' from dual' into d;
  7      dbms_output.put_line(d);
  8    end loop;
  9  end;
 10  /
01-JAN-19
01-FEB-19

PL/SQL procedure successfully completed.

which I’ve also covered in video form here.

But just be careful using the CAST trick. It might cause you grief depending on the partitioning key you are using.

And if you’re wondering why the image associated with this post is that of a horse, it’s the old joke: “A horse walks into a bar, and the barman says: Why the LONG face?” 🙂