No more stale statistics in 19c

There is an odd contradiction that we encounter with most databases, especially those used predominantly during the business day. It comes from the way we obtain and use optimizer statistics on those databases. The contradiction runs like this:

  • To minimize service disruption, we gather statistics at a quiet time, for example, in the middle of the night (the default automatic maintenance windows follow exactly this pattern; see the sketch after this list).
  • We then use those statistics during the business day whilst user activity is at its highest.
  • Highest user activity will typically mean the highest frequency of data changes.
  • Hence the statistics are at their peak accuracy when no-one is using them to optimize queries, and they are at their least accurate when everyone is using them to optimize queries!
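
If you want to see this schedule on your own system, here is a hedged sketch (standard data dictionary views, though double-check the names on your release) that lists the automatic optimizer statistics task and the maintenance windows it runs in, which by default are night-time and weekend windows:

-- A sketch only: check the automatic stats job and its maintenance windows
select client_name, status, window_group
from   dba_autotask_client
where  client_name = 'auto optimizer stats collection';

select window_name, repeat_interval, duration, enabled
from   dba_scheduler_windows
order  by window_name;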

We can demonstrate this easily with the following script run in 18c.


SQL> select banner from v$version where rownum = 1 ;

BANNER
--------------------------------------------------------------------------------
Oracle Database 18c Enterprise Edition Release 18.0.0.0.0 - Production

1 row selected.

SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

So far I’ve mimicked a table that has been populated with (say) a day’s worth of data (10,000 rows), and I’ve gathered statistics at the end of the business day so that my statistics reflect the current data in the table.



SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
---------- -------------------- --------------- --------------- ------------
T          ID                   C102            C302                   10000
T          NUM                  C102            C302                   10000

2 rows selected.

Hence, if I were to optimize and run a query now, I could expect an excellent estimate from the optimizer. The query below returns a COUNT of 1000 rows, and we can see from the PLAN_TABLE output that the estimated row count was also 1000. A perfect estimate by the optimizer!



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

But of course, the next business day rolls on, and by mid-morning we may have added 1000 more rows to our table, bringing the total number of rows to 11000.


SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

Even with an explicit flush of the database monitoring information, we have not yet reached the threshold at which this table’s statistics would be considered stale, so even a GATHER STALE operation would have no effect if we ran one. The statistics still reflect the data from last night.


SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

SQL> select table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

TABLE_NAME NUM_ROWS     BLOCKS
---------- -------- ----------
T             10000         20

1 row selected.

SQL> select table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
---------- -------------------- --------------- ---------------
T          ID                   C102            C302
T          NUM                  C102            C302

2 rows selected.
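
If you are wondering why 1000 new rows is not enough to flag the statistics as stale, the monitoring information and the staleness preference tell the story. Here is a hedged sketch (a standard dictionary view plus a documented DBMS_STATS call): the 1000 tracked inserts do not exceed the default STALE_PERCENT threshold of 10 percent of the 10000 rows recorded at the last gather.

-- Sketch: the tracked DML volume versus the staleness threshold
select table_name, inserts, updates, deletes
from   user_tab_modifications
where  table_name = 'T';

select dbms_stats.get_prefs('STALE_PERCENT', user, 'T') as stale_percent
from   dual;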

When I run the same query (and don’t forget, this would now be during the business day, when it is critical that I get a good optimizer plan), you can see that the result is now 2000 rows, but the plan estimate is left languishing at its original 1000 rows, because the statistics no longer reflect the current state of the table.


SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------
SQL_ID  8juuu5warw2z6, child number 0
-------------------------------------
select max(num),count(*) from t where id > 9000

Plan hash value: 2053823973

---------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ID">9000)


20 rows selected.

Having great statistics that are not used, and poor statistics that are used all the time, seems a nonsensical way to do things, but there really hasn’t been any alternative unless you were prepared to consume precious server resources to collect statistics whilst your users were performing their transactions. I have known customers to do this in the past, but of course, it takes careful monitoring and management to ensure that the slicing up of the resource “pie” is done appropriately, keeping the right balance between business needs and background database tasks.

19c brings a nifty solution to this contradiction by having the database automatically maintain details about how data is changing in a table via a feature called Real Time Statistics. DML activities can be tracked, and that information can then be looped back as input into future query parsing. Here’s the same example from above, this time performed in 19c.

The same table is seeded with the same initial 10000 rows as before. I have included a new column in my data dictionary queries called NOTES. You will see the significance of that shortly.



SQL> select banner from v$version where rownum = 1;

BANNER
--------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

1 row selected.

SQL>
SQL> create table t (
  2    id number(10),
  3    num number(10),
  4    constraint t_pk primary key ( id )
  5  );

Table created.

SQL> insert into t select rownum,rownum from dual connect by rownum<=10000;

10000 rows created.

SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

1 row selected.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value, num_distinct
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE      NUM_DISTINCT
-------------------------------- ---------- -------------------- --------------- --------------- ------------
                                 T          ID                   C102            C302                   10000
                                 T          NUM                  C102            C302                   10000

And since this is a simple query, the optimizer performs exactly as per 18c for its estimate. Once again, at this point, where the data and statistics are in alignment, the estimate is perfect.


                                 
SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     10000       1000

1 row selected.                                 


SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |      |       |       |     5 (100)|
|   1 |  SORT AGGREGATE                      |      |     1 |     8 |            |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T    |  1000 |  8000 |     5   (0)|
|*  3 |    INDEX RANGE SCAN                  | T_PK |  1000 |       |     3   (0)|
----------------------------------------------------------------------------------

The business day commences, and a fresh 1000 rows are added to the database.



SQL> insert into t select rownum+10000,rownum+10000 from dual connect by rownum<=1000;

1000 rows created.

SQL> commit;

Commit complete.

And just like before, this is not enough for the statistics on the table to be considered stale by the database.



SQL> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

SQL>
SQL> select stale_stats from user_tab_statistics
  2  where  table_name = 'T';

STALE_S
-------
NO

But here is where the 19c enhancements come into play. When we query the data dictionary, we now have TWO rows of statistics for this single table.



SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             10000         20
STATS_ON_CONVENTIONAL_DML        T             11000         20

2 rows selected.

The first row is the original statistics we gathered. The second row is new statistics the database has derived from the normal DML activities that have occurred against the table. This is not a full GATHER operation; it simply takes advantage of basic arithmetic, e.g. if I have 10000 rows, then insert 50 and delete 20, I now have 10030 rows, and so on. The real-time statistics can also keep track of simple column-level statistics, such as whether the low and high values have changed.



SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C302
                                 T          NUM                  C102            C302
STATS_ON_CONVENTIONAL_DML        T          ID                   C102            C302094A
STATS_ON_CONVENTIONAL_DML        T          NUM                  C102            C302094A
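
Those LOW_VALUE and HIGH_VALUE figures are stored in Oracle’s internal raw format, so they are not immediately readable. If you want to decode them, a small PL/SQL block along these lines should do the trick (a sketch using the documented DBMS_STATS.CONVERT_RAW_VALUE overload for numeric columns; pick the appropriate overload for other datatypes). Do not be surprised if the decoded high value on the real-time rows is close to, rather than exactly equal to, the true maximum, because, as discussed below, the tracking is deliberately lightweight.

set serveroutput on
declare
  l_value number;
begin
  for r in ( select column_name, high_value
             from   user_tab_col_statistics
             where  table_name = 'T'
             and    notes = 'STATS_ON_CONVENTIONAL_DML' )
  loop
    -- convert the internal raw representation of a numeric column to a NUMBER
    dbms_stats.convert_raw_value(r.high_value, l_value);
    dbms_output.put_line(r.column_name || ' real-time high value = ' || l_value);
  end loop;
end;
/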

The real test is whether these statistics can be beneficial to our sample query. I’ll run that through again.



SQL> select max(num),count(*)
  2  from t where id > 9000;

  MAX(NUM)   COUNT(*)
---------- ----------
     11000       2000

1 row selected.

SQL> select *
  2  from table(dbms_xplan.display_cursor(format=>'typical'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     7 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1895 | 15160 |     7   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ID">9000)

Note
-----
   - dynamic statistics used: statistics for conventional DML

Notice that the estimate of 1895 is now much closer to the true value of 2000. You can expect a little variation from the true value because, as I mentioned before, real-time statistics is not about performing a full gather operation on the table; that would be far too resource hungry for every single DML. So some elements of the statistics (for example, the number of distinct values, or the histogram distribution) are not feasible to keep up to date every time a DML is performed. But even having the improved row counts and column extrema has yielded a benefit for the execution plan. When the normal nightly gather job comes along, these DML statistics are no longer relevant and will be expunged.
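
If you want to verify that for yourself before the re-gather, a query along these lines (a sketch against the same dictionary view used above) shows which column-level figures appear on the STATS_ON_CONVENTIONAL_DML rows compared to the conventionally gathered ones:

-- Sketch: compare what the real-time rows carry versus the gathered rows
select notes, column_name, num_distinct, histogram, last_analyzed
from   user_tab_col_statistics
where  table_name = 'T'
order  by column_name, notes nulls first;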


SQL> exec dbms_stats.gather_table_stats(user,'t')

PL/SQL procedure successfully completed.

SQL> select notes, table_name, num_rows, blocks
  2  from   user_tab_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME NUM_ROWS     BLOCKS
-------------------------------- ---------- -------- ----------
                                 T             11000         20

1 row selected.

SQL> select notes, table_name, column_name, low_value, high_value
  2  from   user_tab_col_statistics
  3  where  table_name = 'T';

NOTES                            TABLE_NAME COLUMN_NAME          LOW_VALUE       HIGH_VALUE
-------------------------------- ---------- -------------------- --------------- ----------
                                 T          ID                   C102            C3020B
                                 T          NUM                  C102            C3020B

2 rows selected.
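
Conversely, if you have an isolated bulk operation where you would prefer that even this lightweight tracking did not happen, the documentation describes the NO_GATHER_OPTIMIZER_STATISTICS hint as the statement-level switch. A hedged sketch (verify the behaviour on your own release and platform, since the feature itself is platform restricted, as noted below):

-- Sketch: suppress real-time statistics collection for a single bulk insert
insert /*+ no_gather_optimizer_statistics */ into t
select rownum+11000, rownum+11000
from   dual
connect by rownum <= 1000;

commit;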

There are more real-time demos of the 19c statistics improvements in the video below.

There are other enhancements in 19c to keep the statistics in the dictionary closer to the true data in database tables, which you can read about here.

Full disclosure: Some 19c new features are available on certain Oracle Database platforms only. Check the Oracle Database Licensing Guide for more information.

13 comments

  1. I think stale statistics are the great myth of most level 1 DBA support services. They get a call about poor performance and their first thought is “are there stale statistics?” rather than “are stale statistics relevant to the problem?”

    1. True, but a very common problem is queries above the low/high recorded values for columns, eg

      where sales_date > sysdate

      and of course, ‘sales_date’ continuously moves upwards as data comes in. Real time statistics could be a real winner here.

  2. I didn’t like that this feature is restricted to Exadata and Cloud services 😦
    Neither is MemOptimized RowStore available elsewhere; the most interesting features in 19c are very restricted.

  3. You could gather stats in triggers (not using DBMS_STATS though) and effectively deal with stale stats. The solution comes with some performance penalty.

    Here are the papers that explain why it is done and how it is done:

    Click to access 2013_nyoug_presentation.pdf

    http://iiotzov.files.wordpress.com/2013/09/2013_nyoug_whitepaper.docx

    And here is some code:
    http://iiotzov.files.wordpress.com/2012/03/just_stats_spec4.doc
    https://iiotzov.files.wordpress.com/2014/08/just_stats_body_patch_aug2014.docx
    https://iiotzov.files.wordpress.com/2012/03/readme.doc
    https://iiotzov.files.wordpress.com/2012/03/gathering_stats_in_trigger_example.doc

    Disclaimer:
    —————————————————————–
    I do not offer support for that package

    I do not accept any legal liability or responsibility for the content or the accuracy of the information so provided, or, for any loss or damage caused arising directly or indirectly in connection with reliance on the use of such information.

    Use at your own risk!
    —————————————————————–

  4. Cool.
    But I was wondering….

    When and how do these DML statistics get recorded into the data dictionary?

    At running the DML?
    It would feel like that means that statistics are recorded for data that is not (yet, and may never be) visible to other sessions.
    So, that would mean queries from other sessions may have different optimizer plans due to data that does not (and may never) exist as far as that session knows.

    At commit time?
    That would mean a session can’t benefit from the statistics in queries it fires before it ends the transaction.

    The only thing I can think of to get around both disadvantages would be some sort of user-level dml-statistics which get ‘promoted’ to globally visible dml-statistics at commit (or thrown away on rollback).

    I have no doubt this point has been well discussed and dealt with by Oracle, but I’m just trying to understand the end result.

  5. >When the normal nightly gather job comes along, these DML statistics are no longer relevant and will be expunged

    Does this mean that the nightly job will explicitly search for the presence of real-time statistics and re-gather them as if they were stale?

  6. An explicit re-gather clears them out as shown above. I haven’t checked yet, but my assumption is that the nightly job would look for stale stats as per normal. If they were not stale, the real-time stats would still be in use until the table became stale.

  7. I am trying this on Exadata with 19c (19.17), but I do not see the “STATS_ON_CONVENTIONAL_DML” line in the output. Is there any setting I need to check for on Exadata?
