A better way of calculating Database space usage

Posted by

It seems easy enough. An email pops through from your manager and he’s concerned about the space utilization in the database. Then comes the question:

“So what exactly is using up most of the space?”

For the sake of simplicity, lets assume that you already know that most of the space comes from a single database schema, so you connect to that schema and throw a query against USER_SEGMENTS. And that is when the frustration starts..

USER_SEGMENTS is about segments

That statement might seem right up there in a the list of all-time most obvious statements, but it is important to understand that strictly speaking, it is not tables and indexes and the like that consume database space. They can be thought of as a logical construct. The only thing that consumes database space is a segment, and even that really is just a logical grouping of extents. The Database Concepts guide contains an thorough explanation of these terms, but in a nutshell, a database table could consist of no segment (ie, it does not yet contain any data), a single segment, or even multiple segments (for partitioned table).  The same goes for indexes. Thus if you query USER_TABLES or USER_INDEXES to find the largest space consumer you are not getting a true indicator of the space usage on the database.  You can only get that by looking at segments. But when you query USER_SEGMENTS, you might end up with a result like this:

SQL> select segment_name, bytes
  2  from user_segments
  3  order by bytes desc
  4  fetch first 1 row only;

SEGMENT_NAME                        BYTES
------------------------------ ----------
SYS_LOB0000089108C00006$       268435456

It doesn’t take a lot of imagination to see how that will play out with your manager:

  • You: “Yes, I’ve taken a look at the database as you requested”
  • Manager: “So what is the the biggest consumer? Is it our SALES table? Our CUSTOMERS table?”
  • You: “It is SYS_LOB0000089108C00006$$”
  • Manager: “The….what?!?!?!”

Suitably chastened you head back to the database. Since this looks like a LOB segment, you dive into USER_LOBS to see what the table is for that particular LOB segment.

SQL> select table_name
  2  from   user_lobs
  3  where  segment_name = 'SYS_LOB0000089108C00006

Now you can go back to the manager.

  • You: “Whoops…sorry about that. My mistake. That is a LOB segment – it is part of one our tables”
  • Manager: “No problems. So the biggest table is?”
  • You: “It is DR$TEXT_COL_IDX$X”
  • Manager: “The….what?!?!?!”

Segment Hierarchy

Whilst the database is quite happy to manage and report segments in this way, as humans (and particular managers Smile) we have an expectation that everything that consume space in the database, ultimately must roll up to the tables that we hold our true data in. The indexes, the LOBs, the LOB indexes, the tables automatically generated to support things like Text indexes, all really just contribute to our understanding of the space a table consumes.

With that in mind, I’ve put together a script which hopefully will assist. It will attribute space to a table in the database based on:

  • the data for the table itself,
  • any partitions for that table,
  • any indexes for that table,
  • any index partitions for that table,
  • any LOB segments,
  • any secondary tables,
  • any indexes on those secondary tables,
  • any LOB segments on those secondary tables

For example, the AskTOM schema contains tables, partitioned tables, text indexes, LOBs and the like. A straight query from USER_SEGMENTS looks like this:

SQL> select segment_name, bytes
  2  from user_segments
  3  order by bytes desc;

SEGMENT_NAME                        BYTES
------------------------------ ----------
ATE_QUESTION_REVIEWS            268435456
SYS_LOB0000089108C00006$       151191552
DR$ATE_SEARCH_IDX$I             134217728
SYS_LOB0000089079C00013$        67305472
BIN$VuDAwtUUQHSNPYBK64buUQ==$0   54525952
SYS_LOB0000089079C00014$        50528256
SYS_LOB0000089108C00014$        50528256
DR$ATE_SEARCH_IDX$X              30408704

but with my space distribution script, we get a much better idea of where the true space consumption lies

SEG                            SZ
------------------------------ ------------
ATE_QUESTION_REVIEWS               474M
$RECYCLE_BIN$                       53M
ATE_ERROR_CAPTURE                   40M
ATE_QUESTION_VIEWS                  17M
ATE_PRESENTATIONS                    6M
ATE_SUPPORTING_FILES                 4M
SOLVER                               2M
ATE_TRIBUTE                          2M
ATE_ADMINS                           2M

The script is below and you can also get from my github miscellaneous scrips repos. Hope you find it useful

indexes_with_secondary as
( --
-- normal indexes
select index_name, table_name
from user_indexes
where table_name not in ( select secondary_object_name from user_secondary_objects)
union all
-- secondary tables
select uso.secondary_object_name, ui.table_name
from user_secondary_objects uso,
user_tables ut,
user_indexes ui
where ut.table_name = uso.secondary_object_name
and uso.index_name = ui.index_name
union all
-- indexes on secondary tables
select ui.index_name, ui_parent.table_name
from user_indexes ui,
user_secondary_objects uso,
user_tables ut,
user_indexes ui_parent
where ut.table_name = uso.secondary_object_name
and ut.table_name = ui.table_name
and uso.index_name = ui_parent.index_name
lobs_with_secondary as
( --
-- normal lobs
select segment_name, table_name
from user_lobs
where table_name not in ( select secondary_object_name from user_secondary_objects)
union all
-- secondary tables
select ul.segment_name, ut.table_name
from user_lobs ul,
user_secondary_objects uso,
user_tables ut,
user_indexes ui
where ul.table_name = uso.secondary_object_name
and uso.index_name = ui.index_name
and ui.table_name = ut.table_name
seg_space as
case when s.segment_name like 'BIN$%' then '$RECYCLE_BIN$' else s.segment_name end) seg, 
sum(s.bytes) byt
from user_segments s,
indexes_with_secondary i,
lobs_with_secondary l
where s.segment_name = i.index_name(+)
and s.segment_name = l.segment_name(+) 
group by coalesce(i.table_name,l.table_name, 
case when s.segment_name like 'BIN$%' then '$RECYCLE_BIN$' else s.segment_name end)
select seg, 
when byt > 1024*1024*1024 then round(byt/1024/1024/1024)||'G'
when byt > 1024*1024 then round(byt/1024/1024)||'M'
else round(byt/1024)||'K'
end,8) sz
from seg_space s
order by byt desc;

Got some thoughts? Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.