Something I’ve been seeing lately is a strategy of not collecting statistics on global temporary tables.
The philosophy makes sense: global temporary tables (GTTs) are typically populated with an unknown or arbitrary amount of data, so rather than having a fixed set of statistics on them, you just rely on dynamic sampling to take a look at the data in the table whilst your statements are being optimised.
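As a side note, the depth of that sampling can be nudged on a per-statement basis with the DYNAMIC_SAMPLING hint. The snippet below is just an illustrative sketch (level 2 is the default in recent versions):

```sql
-- Illustrative only: ask the optimizer to sample this table more aggressively
-- for this one statement. Levels range from 0 (off) to 10 (sample all blocks).
select /*+ dynamic_sampling(gtt 4) */ *
from   gtt;
```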
Here’s a quick example of that. We load in 50 rows, and via dynamic sampling, the optimizer comes up with a suitable estimate for the size of the data.
SQL> create global temporary table gtt
2 ( x int, y int )
3 on commit preserve rows;
Table created.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50;
50 rows created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;
Execution Plan
----------------------------------------------------------
Plan hash value: 917624683
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 1300 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GTT | 50 | 1300 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL> set autotrace off
For lots of use cases, that is going to be perfectly fine. It is pretty common for GTTs to be a “one-and-done” affair. They are loaded, consumed and discarded as part of a single task.
However, you do need to take a little more care if your GTT is going to be repeatedly changed or reloaded during its lifetime in a database session. For example, if you load the table with 100 rows, run some queries, and then load another 100 rows into that table, dynamic sampling might not be the best approach.
Unfortunately, if you are unaware of the important distinction between the plan you get from an EXPLAIN PLAN command and the true runtime execution plan for a statement, it is easy to be lulled into a false sense of security with dynamic sampling.
In the following example, I’ll:
- load 50 rows,
- check my statement plan,
- load some more rows,
- check my statement plan.
SQL> create global temporary table gtt
2 ( x int, y int )
3 on commit preserve rows;
Table created.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50;
50 rows created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;
Execution Plan
----------------------------------------------------------
Plan hash value: 917624683
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 1300 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GTT | 50 | 1300 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL> set autotrace off
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50000;
50000 rows created.
SQL>
SQL> set autotrace traceonly explain
SQL> select * from gtt;
Execution Plan
----------------------------------------------------------
Plan hash value: 917624683
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 51885 | 1317K| 31 (4)| 00:00:01 |
| 1 | TABLE ACCESS FULL| GTT | 51885 | 1317K| 31 (4)| 00:00:01 |
--------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
SQL> set autotrace off
That all looks great, right? Dynamic sampling came along and worked out some suitable numbers in each case.
Not…so…fast 🙂
When you use SET AUTOTRACE, you are in effect running an EXPLAIN PLAN command for each statement, which means a fresh parse and hence a fresh set of dynamically sampled statistics. That is why the data looks good above.
But when our poor developer, instilled with confidence from the above results, pushes their code into Production, they may be in for a shock when we look at the true execution plans.
SQL> create global temporary table gtt
2 ( x int, y int )
3 on commit preserve rows;
Table created.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50;
50 rows created.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
50
1 row selected.
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | TABLE ACCESS FULL| GTT | 50 | 650 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
18 rows selected.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50000;
50000 rows created.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
50000
1 row selected.
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | TABLE ACCESS FULL| GTT | 50 | 650 | 2 (0)| 00:00:01 | --- UH OH !
---------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
18 rows selected.
Notice that on the second execution, because we already had a plan in the library cache, we simply re-used it, and our size estimate for the GTT was wildly off. That could be catastrophic for runtime performance when the query is not as simple as a full scan of a single table.
This is why I’m starting to lean toward a call to DBMS_STATS when I’m using GTTs, even when the GTT is being used just once in a session (because you never know when the code will change to have that GTT used multiple times).
I’ll repeat the above, but this time with an appropriate call to gather some fresh statistics.
SQL> create global temporary table gtt
2 ( x int, y int )
3 on commit preserve rows;
Table created.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50;
50 rows created.
SQL> exec dbms_stats.gather_table_stats('','GTT');
PL/SQL procedure successfully completed.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
50
1 row selected.
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
| 2 | TABLE ACCESS FULL| GTT | 50 | 150 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Note
-----
- Global temporary table session private statistics used
18 rows selected.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 50000;
50000 rows created.
SQL> exec dbms_stats.gather_table_stats('','GTT');
PL/SQL procedure successfully completed.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
50000
1 row selected.
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 0
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 31 (100)| |
| 1 | SORT AGGREGATE | | 1 | 5 | | |
| 2 | TABLE ACCESS FULL| GTT | 50050 | 244K| 31 (4)| 00:00:01 |
---------------------------------------------------------------------------
Note
-----
- Global temporary table session private statistics used
18 rows selected.
SQL>
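One thing worth calling out in that demo (this is my understanding of the mechanics, not something shown explicitly above): the second query picked up the new numbers because gathering statistics invalidated the dependent cursor. DBMS_STATS exposes this via its NO_INVALIDATE parameter, so if you want to be explicit about it, something like this sketch will do:

```sql
-- Sketch only: no_invalidate => false forces immediate invalidation of
-- cursors that depend on GTT, so the next parse sees the fresh statistics.
exec dbms_stats.gather_table_stats('', 'GTT', no_invalidate => false);
```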
Also notice the commentary about the statistics being private. One of the cool things about statistics on GTTs since version 12c is that the statistics you gather in one session have no impact on the same GTT being used in other sessions. Below is a brand new session, with a different number of rows. Notice that we did not use the statistics from the first session (we used dynamic sampling, because we don’t have any statistics in this session yet), and once we gather statistics, we then use the statistics collected in this session.
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 80;
80 rows created.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
80
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 1
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | TABLE ACCESS FULL| GTT | 80 | 1040 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
18 rows selected.
SQL>
SQL> insert into gtt
2 select rownum ,rownum
3 from dual
4 connect by level <= 400;
400 rows created.
SQL> exec dbms_stats.gather_table_stats('','GTT');
PL/SQL procedure successfully completed.
SQL>
SQL> select max(x) from gtt;
MAX(X)
----------
400
SQL> select * from dbms_xplan.display_cursor();
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 64542q4x4hxdd, child number 1
-------------------------------------
select max(x) from gtt
Plan hash value: 3344941513
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | TABLE ACCESS FULL| GTT | 480 | 1920 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------
Note
-----
- Global temporary table session private statistics used
18 rows selected.
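If you ever want the opposite behaviour, 12c also lets you control this via a DBMS_STATS preference. The snippet below is a sketch based on my reading of the documentation: the GLOBAL_TEMP_TABLE_STATS preference can be set to SHARED to fall back to the pre-12c behaviour, where a single set of statistics is visible to all sessions.

```sql
-- Sketch: revert GTT to shared (pre-12c style) statistics. The default
-- preference value is SESSION, i.e. the session-private behaviour shown above.
exec dbms_stats.set_table_prefs(null, 'GTT', 'GLOBAL_TEMP_TABLE_STATS', 'SHARED');
```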
So I think I’ve changed my mind on GTTs and dynamic sampling. I now prefer collecting statistics, even if it might be a little bit of overkill from time to time.
Let me know your preferences on GTTs.