The SMARTDBA of 2002
SMARTDBA of 2002 :
the winner of our annual “essay” question is Connor McDonald.
Connor gets a free trip to OracleWorld and will be recognized at both
Oracle and BMC Software events. Connor also becomes an automatic member
of the SMARTTEAM. Connor is an Australian
with 10 years experience with Oracle from version 7 to 9.2. He’s spent
the last 3 years working as an independent consultant at various
locations in the UK. He has presented at the UK User Groups meetings and
at UK annual conference this year, and has contributed articles to Oracle
Scene (UK Magazine).
Connor maintains http://connor-mcdonald.com,
and is a member of the OakTable network–
a collection of professionals around the world who specialize in Oracle
functionality, which includes members such as Steve Adams, Jonathan
Lewis (also a DBAzine contributor), Howard
Rogers, and Mogens Nørgaard.
Your manager wants you to find out, through SQL issued from a client,
which database segments are attracting the greatest amount of physical
I/O. Write him a report explaining the problems of doing this under Oracle8i
and how accurate you would expect your results to be. (You
must keep it under 1,500 words! See contest rules.)
word count= 1,498; he was not penalized for writing in English-proper!
The task appears straightforward,
but we must clarify some definitions before proceeding. Firstly, a “physical
I/O” must be properly defined. Within the Oracle database, the
distinction between a logical I/O (a buffer cache operation) and a physical
I/O (a miss or deliberate avoidance of the buffer cache) is recorded,
but Oracle’s perception of a physical I/O does not necessarily imply
a physical operation on the underlying disks.
Modern storage infrastructure
typically has a number of mechanisms for optimizing the characteristics
of disk I/O. An I/O request might be serviced without performing
physical I/O (for example, a hardware cache), or conversely, it might
be serviced by performing additional I/O (for example, read ahead).
further cloud the issue. Many databases are built upon file systems,
not the underlying raw volumes, and operating systems will typically
cache file system data to avoid physical I/O. Conversely, a single I/O
request from a database may result in multiple I/O requests in the underlying
OS layer, for example, when there is a block size mismatch between the
file system and database. And, for the disk infrastructure reasons mentioned
earlier – some, all, or even none of these requests may result in a
true physical I/O operation on the disks
Can we quantify the
disparity between Oracle’s physical I/O statistics and true physical
disk operations? Not from within the database, and possibly, not even
the OS layer. Whilst we can hypothesize whether true I/O is occurring
using average I/O times within Oracle (via V$FILESTAT), such an approach
is dubious at best, due to factors such as asynchronous I/O, and the
accuracy of timing data in V$FILESTAT has a poor reputation on many
Oracle versions. For this report, we will define an I/O operation to
be “physical” as per the database perception. The relevance
of an “Oracle physical I/O” to true physical I/O cannot be
accurately quantified without a significant amount of database-external
measurement, which is outside this report’s scope.
Secondly, a more precise
definition of “segment” is required. The distinction between
the various types of segments is important because it changes some of
the strategy used for monitoring I/O. Broadly speaking, segments can
be classed into four types: temporary, application, undo and system.
are those used on a transient basis, for example, in sorting and global
temporary table operations. Unfortunately, the evolution of Oracle tablespaces
used for temporary data has led to several variants of temporary segments.
These variants change the method and accuracy via which we can measure
their I/O. Temporary segments also raise the philosophical debate of
where to attribute their I/O. For example, if 4 tables are joined and
the sorted result set is large, then the I/O against the 4 source tables
may be small, but the sorting I/O huge. Are the 4 tables “responsible”
for this large I/O or not? In previous versions of Oracle, it was a
reasonable assumption that all temporary segment I/O was caused by an
underlying operation on a permanent segment, but with the advent of
global temporary tables this assumption is no longer fully valid.
are the actual segments defined for our applications, namely, tables,
Undo segments are
the segments used for transaction processing and read consistency in
the database, and can also contribute large amounts of I/O. Undo segments
raise a similar debate as for temporary segments. Does the undo generated
for (say) a large delete operation count as I/O against the table being
modified or not?
Finally we have system
segments, the segments that form the database catalog. This of course
does not include application segments that happen to reside in
the SYSTEM tablespace (a behavior some databases are prone to exhibit).
Whilst the kernel’s use of system segments is rarely a cause of significant
physical I/O load (courtesy of the dictionary cache), other processes,
such as over exuberant monitoring software can be a source of excessive
I/O. On this premise, system segments can be viewed as application segments
for the purposes of I/O monitoring.
For the remainder
of this report, we will note where necessary any special treatment that
temporary and/or undo segments mandate, but the emphasis will be on
application segment I/O whether it occurs directly or indirectly via
temporary or undo segments.
I/O on segments is
of course driven by SQL against those segments. However, Oracle does
not expose any statistics directly between segments and physical I/O
(if it did, this report would be a lot shorter!). Definitive physical
I/O statistics within Oracle are exposed only at the file level and
this represents the challenge to deriving accurate results. A single
file will typically contains many segments, and conversely, a single
segment may span many files.
Additional I/O information
can be obtained at the statement (SQL) level but clearly a single statement
can affect multiple database segments, and unlike the file I/O statistics,
this information is transient, being available only whilst it remains
in the library cache.
Hence capturing “all”
of the statement level I/O data means a polling solution. Thus a more
suitable avenue of investigation is to start with the file information,
and having narrowed our scope to the busiest files, then attempt to
obtain some convergence to the statements that must have caused that
I/O. This should yield a list of segments that are contributing the
greatest I/O load. Such a list may be sufficient for our reporting requirement,
but if not, we can poll the database more aggressively for information
(for example, via V$SESSION_WAIT or V$BH). Even if we must resort to
this method, then by starting with the file level statistics, it will
be far less intrusive when only considering a small list of candidate
segments as opposed to all segments in the database.
For permanent segments,
file read and write counts are measured as PHYRDS and PHYWRTS respectively
in the V$FILESTAT view (joining to V$DATAFILE on FILE# if file names
is required). As with all of the performance views, the data recorded
is from instance startup, so we can observe just the absolutes, or take
regular snapshots and report the deltas, dependent on the requirement.
For genuine temporary tablespaces the same information is recorded in
V$TEMPSTAT (with file name in V$TEMPFILE).
There is more than
just the quantity of I/O that we can glean from the file statistics.
Obviously we can distinguish between read and write activity, but we
can also get an indication of the type of I/O. If (on the busy files)
the ratio of blocks read to physical reads is close to the multi block
read count, then we can infer that table scanning is probably dominant,
whereas if the ratio is closer to 1:1, single block I/O is probably
the significant activity.
a file or some files of interest, we need to identify those segments
(or portions thereof) that are contained within those files. For permanent
segments, DBA_EXTENTS shows the file (FILE_ID) for each segment extent
in the database. If these are application segments, then we can focus
further attention upon these.
If the segments identified
are undo and/or temporary segments, it is probably more important to
identify the application segments that catalyzed the undo or temporary
segment I/O. We will treat these as special cases when we move onto
statement processing below.
We will now have identified
a set of candidate segments, some or all of which are responsible for
the majority of physical I/O. Unfortunately, it is also possible that
these segments also have extents spread in multiple datafiles, which
may distort our conclusions.
To prune our segment
list further, we examine the transient data in V$SQL for large DISK_READS
values. (Using V$SQL is less intrusive than V$SQLREA). For a suitably
sized shared pool, we should not need to poll too frequently, but there
will always exist the possibility that a rogue statement has been flushed
and therefore goes undetected. Joining on ADDRESS to V$OBJECT_DEPENDENCY
gives a list of objects used in each SQL, which we can then compare
to our candidate segment set.
If our file I/O is
predominantly temporary or undo, we must modify this approach slightly.
For temporary I/O, hopefully make some inferences by examining the SORTS
column on V$SQL as well as high disk reads. Similarly, for high undo
activity, we can focus on DML operations, and use the ROWS_PROCESSED
column in V$SQL as an indicator of the amount of undo being generated.
Finally, by generating
an optimizer plan on relevant statements we may be able to determine
the type of I/O for each segment, and relate this back to the type of
I/O observed in V$FILESTAT. This is somewhat prone to inaccuracy since
the plan generated may be different from the one that was actually used.
To summarise, by starting
with file level I/O statistics to generate a list of candidate segments,
and only then, looking at the transient statement level I/O, we can
determine high I/O segments in an unobtrusive manner but still have
a reasonable level of confidence in the results.
that we set, a request from a manager for a report listing the objects
that were attracting the greatest physical I/O, was deliberately designed
to stimulate debate as to whether this was a useful report. We were gratified
by the way in which a good proportion of the entrants rose to that part
of the challenge, and prefaced their specific solution with a general
discussion of the underlying issues. We were, however, less impressed
by those who submitted a general essay on Oracle tuning without detailing
any solution to the problem we had set. One of these essays was of a very
high standard as an introduction to Oracle performance tuning, but it
completely failed to address the problem and was discarded at an early
stage of the judging.
Also a number
of competitors had either not understood the question and the rules or
were hoping that the members of SmartTEAM who were reading each entry
would not notice that they were avoiding the real issues. We knew when
we picked the question that allocating each application segment (table
or index) to a dedicated tablespace made it easy to solve the problem.
Those entries that relied on this approach were quickly set aside, as
were those that simply recommended that the application be upgraded to
Oracle9i Release 2 where the answer can be reported directly from V$SEGMENT_STATISTICS.
The question specifically asked for a discussion of the problems of providing
the report under Oracle8i.
that made it to our final judging stage all relied on mining one or more
of V$SESSION_WAIT, V$BH, or V$SQL.
solution would have been to poll V$SESSION_WAIT looking for sessions that
are waiting for I/O because during these waits the columns P1 and P2 give
the FILE# and BLOCK# respectively, and these can be mapped back to the
segment from DBA_USED_EXTENTS though some care is needed in order to make
this an efficient operation. Sadly very few entries mentioned V$SESSION_WAIT,
and none gave a complete analysis of why it is so appropriate to the problem.
One great advantage of V$SESSION_WAIT is that the view is relatively short,
having no more rows than there are sessions. The CPU impact of repeated
visits is therefore rather less than that incurred for V$BH, discussed
of entrants correctly pointed out that physical I/O as seen by Oracle
is not necessarily physical I/O at the device level. Caching within any
or all of the operating system, the storage controller (or storage array)
and the storage device may mean that a read requested by Oracle can be
achieved from solid state memory. If the operation is very fast then the
resulting wait is much less likely to be picked up by a monitor polling
V$SESSION_WAIT every few seconds.
of entries correctly identified that it is possible to infer from repeated
pools of V$BH when a database block has been read or written. Very few
people pointed out that V$BH contains the column OBJD (object number)
and that this makes it trivial to relate changes in buffer content to
a particular segment. None of the short-listed entries discussed the key
downside of using V$BH, which is that many instances are configured with
over 100,000 buffers and Oracle instances with over 1,000,000 buffers
are not unknown. The processing time for each poll therefore requires
Lastly a number of entries identified that a number of inferences can
be drawn from V$SQL. A handful of entries also pointed out the importance
of using V$SQL rather than V$SQLAREA; there are several reasons for this,
but the most important is that V$SQLAREA effectively ignores “child
cursors” or cases where the same SQL statement has many different
execution plans and may even access a different set of tables (because
it was issued by a different user). We were impressed by entries that
showed an understanding of how to find information gleaned by the Oracle
parser, and our winner in particular pointed out that V$OBJECT_DEPENDENCY
will tell you exactly which tables are referenced by any given SQL statement.
At least one entry proposed to examine the SQL_TEXT column in V$SQL to
find out whether a statement whether a statement was a select, insert,
update, or delete. For various syntactic reasons this is nothing like
as easy as it sounds, and in any event V$SQL has a column COMMAND_TYPE
that gives, unsurprisingly, the command type.
driven from V$SQL cannot afford a complete solution except in query only
applications that never perform joins. There are no statistics on write
I/O and it is impossible to get any accuracy at all into estimates of
the physical I/O across the objects accessed. However, as Connor (our
winner) observed, careful analysis based on V$SQL can really help to identify
the prime suspects.
When entering, please include full name, title, company, address, work
phone and email where you can be contacted.
2. All entries must be received no later October 11th, 2002, midnight.
One entry per person. The answers become the property of BMC Software,
Inc. upon submission.
3. The winner with be the Guest of Honor of BMC at the 2002 OracleWorld
in San Francisco, CA. Winner will be awarded round trip air fare, hotel
(5 nights), $50 meal allowance per day and entry to the 2002 OracleWorld
in San Francisco, CA. BMC will be allowed to use the winner’s names in
publications. Winner will be required to be present at all BMC and Oracle
events related to SmartDBA Challenge that will occur at 2002 OracleWorld
in San Francisco, CA.
4. Eligibility is open to anyone working in an IT organization at any
company excluding those working for a competitor of BMC Software or ORACLE
or any of their partners/affiliates. Not eligible to enter are employees
and their immediate families, including household members, of Official
Sponsor Companies, their affiliates, subsidiaries, divisions and/or advertising
and promotion agencies. This promotion is open to all entrants who are
18 years of age or older at time of entry. This promotion is subject to
all federal, state and local laws. Void where prohibited. Winner agrees
that the sponsors, their affiliates and their agencies and employees shall
not be liable for injury, loss or damage of any kind resulting from participating
in this promotion or from the acceptance or use of any prize awarded.
Sponsors reserve the right to verify eligibility qualifications of any
winner. U.S. law governs this promotion.
5. We are looking for answers in essay form, written in English, with
a maximum length of 1,500 words.
a) If your answer is more than 1,500 words long we will only read and
mark the first 1,500 words.
b) Answers must be submitted as either pure text files, rtf files or Word
c) No marks will be deducted specifically for spelling mistakes or grammatical
errors but our script markers are only human so the easier your essay
is to understand the more likely you are to get high marks.
d) Any assertion that is not derived directly from the Oracle 8.1.7 documentation
should be supported in some way. Published works may be cited as authority
and such references need not form part of the word count however you will
not necessarily gain marks just for knowing what other people have said
on a topic.
Any answer in the form of SQL scripts will not be marked, and you are
discouraged from putting complete SQL statements into your answer. You
are encouraged to use the names of database objects (and columns) created
as part of normal database creation.
7. No part of any answer may rely on direct memory access to a memory
region managed by Oracle, connecting as SYS or on the existence of any
SYS object other than those installed as part of normal database creation.
Answers that depend on storage configurations that are unusual in Oracle
(such as dedicating a tablespace to a specific segment) will be heavily
We want you to demonstrate each of the following, and the percentages
quoted are the percentage of the final mark that will be derived from
a) An understanding of the data sources available within the Oracle server
for suitably privileged users to use SQL to retrieve performance and resource
utilization data (15%)
b) An understanding of the problems associated with tracking both input
and output to specific storage segments (15%)
c) An understanding of the distinction between objects and segments (5%)
d) The effectiveness of any proposed solution (30%)
e) A clear understanding of the likely accuracy of any proposed solution,
and in particular the factors that would affect its accuracy (15%)
f) The ease of operation of any proposed solution (10%)
g) An understanding of the likely performance impact of any proposed solution
and the steps taken to keep this within acceptable limits (10%)
In the event of two or more answers getting exactly the same mark, we
will use the received date then word count (if received on the same date)
to resolve the tie in which case the shortest entry or entries will win.
purchase is necessary. By entering, entrants acknowledge compliance with
these official rules, including all eligibility requirements. All entries
become the property of dbazine.com. None of the Official Sponsor Companies,
dbazine.com, their affiliates, subsidiary divisions or advertising and
promotion agencies is responsible for incorrect or inaccurate transcription
of entry information, human error or for any technical malfunctions. By
entering this promotion, entrants will automatically be registered as
members of dbazine.com- we will not rent or sell your name to anyone,
taking the SMARTDBA Challenge
you agree to abide by the Contest Rules.