Something I’ve been seeing lately is a strategy of not collecting statistics on global temporary tables.

The philosophy makes sense: global temporary tables (GTTs) are typically populated with an unknown or arbitrary amount of data, so rather than maintaining a fixed set of statistics on them, you just rely on dynamic sampling to take a look at the data in the table whilst your statements are being optimised.

Here’s a quick example of that. We load in 50 rows, and via dynamic sampling, the optimizer comes up with a suitable estimate for the size of the data.


SQL> create global temporary table gtt
  2  ( x int, y int )
  3  on commit preserve rows;

Table created.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50;

50 rows created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;

Execution Plan
----------------------------------------------------------
Plan hash value: 917624683

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1300 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GTT  |    50 |  1300 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL> set autotrace off

For lots of use cases, that is going to be perfectly fine. It is pretty common for GTTs to be a “one-and-done” affair. They are loaded, consumed and discarded as part of a single task.

However, you do need to take a little more care if your GTT is going to be repeatedly changed or reloaded during its lifetime in a database session. For example, if you load the table with 100 rows, run some queries, and then load another 100 rows into that table, dynamic sampling might not be the best approach.

Unfortunately, if developers are unaware of the important distinction between the plan you get via an EXPLAIN PLAN command and the true runtime execution plan for a statement, it is easy to be lulled into a false sense of security with dynamic sampling.

In the following example, I’ll

  • load 50 rows,
  • check my statement plan,
  • load some more rows,
  • check my statement plan again.

SQL> create global temporary table gtt
  2  ( x int, y int )
  3  on commit preserve rows;

Table created.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50;

50 rows created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;

Execution Plan
----------------------------------------------------------
Plan hash value: 917624683

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    50 |  1300 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GTT  |    50 |  1300 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL> set autotrace off
SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50000;

50000 rows created.

SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;

Execution Plan
----------------------------------------------------------
Plan hash value: 917624683

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 51885 |  1317K|    31   (4)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| GTT  | 51885 |  1317K|    31   (4)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL> set autotrace off

That all looks great, right? Dynamic sampling came along and worked out some suitable numbers in each case.

Not…so…fast 🙂

When you use SET AUTOTRACE, you are in effect running an EXPLAIN PLAN command for each statement, which means a fresh parse and hence a fresh set of dynamically sampled statistics. That is why the estimates look good above.
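
To see this for yourself, the autotrace output above is essentially what you would get from an explicit EXPLAIN PLAN, which optimises the statement from scratch every time it is run:

explain plan for select * from gtt;
select * from dbms_xplan.display();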

But when our poor developer, instilled with confidence from the above results, pushes their code into Production, they may be in for a shock when we look at the true execution plans:


SQL> create global temporary table gtt
  2  ( x int, y int )
  3  on commit preserve rows;

Table created.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50;

50 rows created.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
        50

1 row selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  |    50 |   650 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


18 rows selected.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50000;

50000 rows created.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
     50000

1 row selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  |    50 |   650 |     2   (0)| 00:00:01 |   --- UH OH !
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


18 rows selected.

Notice that on the second execution, because we already had a plan in the library cache, we simply re-used it, and our size estimates for the GTT were wildly off. That could be catastrophic in terms of runtime performance when the query is not as simple as a full scan of a single table.
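
If you want to see the cursor reuse for yourself, a quick look at the library cache (using the SQL_ID from the output above) should show a single child cursor whose execution count keeps climbing, rather than a fresh cursor per execution:

select sql_id, child_number, executions, invalidations
from   v$sql
where  sql_id = '64542q4x4hxdd';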

This is why I’m starting to lean toward a call to DBMS_STATS when I’m using GTTs, even when the GTT is being used just once in a session (because you never know when the code will change to have that GTT loaded multiple times).

I’ll repeat the above, but this time with an appropriate call to gather some fresh statistics.


SQL> create global temporary table gtt
  2  ( x int, y int )
  3  on commit preserve rows;

Table created.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50;

50 rows created.

SQL> exec dbms_stats.gather_table_stats('','GTT');

PL/SQL procedure successfully completed.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
        50

1 row selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  |    50 |   150 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - Global temporary table session private statistics used


18 rows selected.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 50000;

50000 rows created.

SQL> exec dbms_stats.gather_table_stats('','GTT');

PL/SQL procedure successfully completed.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
     50000

1 row selected.

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |    31 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     5 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  | 50050 |   244K|    31   (4)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - Global temporary table session private statistics used


18 rows selected.

SQL>

Also notice the commentary about the statistics being session private. One of the cool things about statistics on GTTs since version 12c is that the statistics you gather in one session will not have an impact on the same GTT being used in other sessions. Below is a brand new session, with a different number of rows. Notice that we did not use the statistics from the first session (we used dynamic sampling, because we don’t have any statistics in this session yet), and once we gather statistics, we then use the statistics collected in this session.


SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 80;

80 rows created.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
        80

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 1
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  |    80 |  1040 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


18 rows selected.

SQL>
SQL> insert into gtt
  2  select rownum ,rownum
  3  from dual
  4  connect by level <= 400;

400 rows created.

SQL> exec dbms_stats.gather_table_stats('','GTT');

PL/SQL procedure successfully completed.

SQL>
SQL> select max(x) from gtt;

    MAX(X)
----------
       400

SQL> select * from dbms_xplan.display_cursor();

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  64542q4x4hxdd, child number 1
-------------------------------------
select max(x) from gtt

Plan hash value: 3344941513

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|   2 |   TABLE ACCESS FULL| GTT  |   480 |  1920 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - Global temporary table session private statistics used


18 rows selected.
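
As an aside, this session-private behaviour is controlled by the DBMS_STATS preference GLOBAL_TEMP_TABLE_STATS, which defaults to SESSION from 12c onwards. Here is a minimal sketch of inspecting it, and (should you want the pre-12c behaviour of GTT statistics being shared across sessions) changing it:

select dbms_stats.get_prefs('GLOBAL_TEMP_TABLE_STATS', user, 'GTT') from dual;

exec dbms_stats.set_table_prefs('','GTT','GLOBAL_TEMP_TABLE_STATS','SHARED');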

So I think I’ve changed my mind on GTTs and dynamic sampling: I now prefer collecting statistics, even if it might be a little bit of overkill from time to time.

Let me know your preferences on GTTs.

6 responses to “Statistics on Global Temporary Tables”

  1. Hi Connor,

    I was just thinking that collecting table statistics after each DML operation might not be the most practical approach. While the example shows that estimated row counts were adjusted, my understanding is that the primary aim is to help the optimizer choose a better plan based on accurate statistics. In a real-world scenario, if there’s a significant data change after the initial execution where dynamic sampling kicks in, I would have expected subsequent executions to be re-optimized with Statistics Feedback (formerly Cardinality Feedback, controlled by _optimizer_use_feedback). That might be a more sustainable approach than gathering statistics after every DML – unless, of course, there’s a known limitation with global temporary tables that I’m missing. I’d really appreciate any clarification or insight on this.

    1. All the tools at your disposal are not mutually exclusive, so me saying that gathering stats is a good idea does not exclude the potential benefits of statistics feedback.
      For example, if my app is manipulating a GTT dozens, hundreds or thousands of times in a session, then I might (a) just rely on stats feedback/dynamic sampling, or (b) gather stats “occasionally”, i.e., when I know (from my knowledge of the app) which manipulations have the greatest change in the data.

      Relying 100% on stats feedback might work for many, but I’d be reluctant to be completely reliant on it.
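
      On that note, a quick way to see whether statistics feedback has flagged a cursor for re-optimisation (12c onwards) is the IS_REOPTIMIZABLE column in V$SQL, for example using the sql_id from the examples above:

      select sql_id, child_number, is_reoptimizable
      from   v$sql
      where  sql_id = '64542q4x4hxdd';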

      1. Thanks, Connor, for the clarification.

      2. Also, it looks like when you gather the private statistics on the GTT in your example, the previous child cursor #1 gets invalidated, and on the next parse a new child cursor is created (which in this case reuses child slot #1). Since this cursor is now a brand new one, cardinality feedback doesn’t have a chance to kick in yet.

  2. I remember that in some 9i version, Oracle did not do dynamic sampling for indexes on GTTs and used some defaults instead, like 9 leaf blocks…

    Has it changed since then?

    1. No longer the case (not sure it ever was, to be honest).

      You can run this and see the dyn sampling at work

      create global temporary table gtt (c1 int not null, c2 int)
      on commit preserve rows;
      create index gtt_ix on gtt ( c1 );
      insert into gtt
      select rownum, rownum
      from dual
      connect by level <= 250000;

      set autotrace traceonly explain
      select * from gtt where c1 > 250000;
      set autotrace off

      drop index gtt_ix;

      set autotrace traceonly explain
      select c1 x from gtt;
      select c1 from gtt where c1 > 1;
      select * from gtt where c1 > 249000;
      set autotrace off
