Smart DBA

This will look a little messy – I’ve tried to preserve the original layout from way way back in 2002 🙂 But this is just a little gloating on my part. I got a free trip to OpenWorld for this one!

Results: The SMARTDBA of 2002


The SMARTDBA of 2002 : the winner of our annual “essay” question is Connor McDonald. Connor gets a free trip to OracleWorld and will be recognized at both Oracle and BMC Software events. Connor also becomes an automatic member of the SMARTTEAM. Connor is an Australian with 10 years experience with Oracle from version 7 to 9.2. He’s spent the last 3 years working as an independent consultant at various locations in the UK. He has presented at the UK User Groups meetings and at UK annual conference this year, and has contributed articles to Oracle Scene (UK Magazine).

Connor maintains, and is a member of the OakTable network– a collection of professionals around the world who specialize in Oracle functionality, which includes members such as Steve Adams, Jonathan Lewis (also a DBAzine contributor), Howard Rogers, and Mogens Nørgaard.

The Question:
Your manager wants you to find out, through SQL issued from a client, which database segments are attracting the greatest amount of physical I/O. Write him a report explaining the problems of doing this under Oracle8i and how accurate you would expect your results to be.
(You must keep it under 1,500 words! See contest rules.)

Connor’s winning Answer:
word count= 1,498; he was not penalized for writing in English-proper! 🙂

The task appears straightforward, but we must clarify some definitions before proceeding. Firstly, a “physical I/O” must be properly defined. Within the Oracle database, the distinction between a logical I/O (a buffer cache operation) and a physical I/O (a miss or deliberate avoidance of the buffer cache) is recorded, but Oracle’s perception of a physical I/O does not necessarily imply a physical operation on the underlying disks.

Modern storage infrastructure typically has a number of mechanisms for optimizing the characteristics of disk I/O. An I/O request might be serviced without performing physical I/O (for example, a hardware cache), or conversely, it might be serviced by performing additional I/O (for example, read ahead).

Operating systems further cloud the issue. Many databases are built upon file systems, not the underlying raw volumes, and operating systems will typically cache file system data to avoid physical I/O. Conversely, a single I/O request from a database may result in multiple I/O requests in the underlying OS layer, for example, when there is a block size mismatch between the file system and database. And, for the disk infrastructure reasons mentioned earlier – some, all, or even none of these requests may result in a true physical I/O operation on the disks

Can we quantify the disparity between Oracle’s physical I/O statistics and true physical disk operations? Not from within the database, and possibly, not even the OS layer. Whilst we can hypothesize whether true I/O is occurring using average I/O times within Oracle (via V$FILESTAT), such an approach is dubious at best, due to factors such as asynchronous I/O, and the accuracy of timing data in V$FILESTAT has a poor reputation on many Oracle versions. For this report, we will define an I/O operation to be “physical” as per the database perception. The relevance of an “Oracle physical I/O” to true physical I/O cannot be accurately quantified without a significant amount of database-external measurement, which is outside this report’s scope.

Secondly, a more precise definition of “segment” is required. The distinction between the various types of segments is important because it changes some of the strategy used for monitoring I/O. Broadly speaking, segments can be classed into four types: temporary, application, undo and system.

Temporary segments are those used on a transient basis, for example, in sorting and global temporary table operations. Unfortunately, the evolution of Oracle tablespaces used for temporary data has led to several variants of temporary segments. These variants change the method and accuracy via which we can measure their I/O. Temporary segments also raise the philosophical debate of where to attribute their I/O. For example, if 4 tables are joined and the sorted result set is large, then the I/O against the 4 source tables may be small, but the sorting I/O huge. Are the 4 tables “responsible” for this large I/O or not? In previous versions of Oracle, it was a reasonable assumption that all temporary segment I/O was caused by an underlying operation on a permanent segment, but with the advent of global temporary tables this assumption is no longer fully valid.

Application segments are the actual segments defined for our applications, namely, tables, indexes, etc.

Undo segments are the segments used for transaction processing and read consistency in the database, and can also contribute large amounts of I/O. Undo segments raise a similar debate as for temporary segments. Does the undo generated for (say) a large delete operation count as I/O against the table being modified or not?

Finally we have system segments, the segments that form the database catalog. This of course does not include application segments that happen to reside in the SYSTEM tablespace (a behavior some databases are prone to exhibit). Whilst the kernel’s use of system segments is rarely a cause of significant physical I/O load (courtesy of the dictionary cache), other processes, such as over exuberant monitoring software can be a source of excessive I/O. On this premise, system segments can be viewed as application segments for the purposes of I/O monitoring.

For the remainder of this report, we will note where necessary any special treatment that temporary and/or undo segments mandate, but the emphasis will be on application segment I/O whether it occurs directly or indirectly via temporary or undo segments.

I/O on segments is of course driven by SQL against those segments. However, Oracle does not expose any statistics directly between segments and physical I/O (if it did, this report would be a lot shorter!). Definitive physical I/O statistics within Oracle are exposed only at the file level and this represents the challenge to deriving accurate results. A single file will typically contains many segments, and conversely, a single segment may span many files.

Additional I/O information can be obtained at the statement (SQL) level but clearly a single statement can affect multiple database segments, and unlike the file I/O statistics, this information is transient, being available only whilst it remains in the library cache.

Hence capturing “all” of the statement level I/O data means a polling solution. Thus a more suitable avenue of investigation is to start with the file information, and having narrowed our scope to the busiest files, then attempt to obtain some convergence to the statements that must have caused that I/O. This should yield a list of segments that are contributing the greatest I/O load. Such a list may be sufficient for our reporting requirement, but if not, we can poll the database more aggressively for information (for example, via V$SESSION_WAIT or V$BH). Even if we must resort to this method, then by starting with the file level statistics, it will be far less intrusive when only considering a small list of candidate segments as opposed to all segments in the database.

For permanent segments, file read and write counts are measured as PHYRDS and PHYWRTS respectively in the V$FILESTAT view (joining to V$DATAFILE on FILE# if file names is required). As with all of the performance views, the data recorded is from instance startup, so we can observe just the absolutes, or take regular snapshots and report the deltas, dependent on the requirement. For genuine temporary tablespaces the same information is recorded in V$TEMPSTAT (with file name in V$TEMPFILE).

There is more than just the quantity of I/O that we can glean from the file statistics. Obviously we can distinguish between read and write activity, but we can also get an indication of the type of I/O. If (on the busy files) the ratio of blocks read to physical reads is close to the multi block read count, then we can infer that table scanning is probably dominant, whereas if the ratio is closer to 1:1, single block I/O is probably the significant activity.

Having identified a file or some files of interest, we need to identify those segments (or portions thereof) that are contained within those files. For permanent segments, DBA_EXTENTS shows the file (FILE_ID) for each segment extent in the database. If these are application segments, then we can focus further attention upon these.

If the segments identified are undo and/or temporary segments, it is probably more important to identify the application segments that catalyzed the undo or temporary segment I/O. We will treat these as special cases when we move onto statement processing below.

We will now have identified a set of candidate segments, some or all of which are responsible for the majority of physical I/O. Unfortunately, it is also possible that these segments also have extents spread in multiple datafiles, which may distort our conclusions.

To prune our segment list further, we examine the transient data in V$SQL for large DISK_READS values. (Using V$SQL is less intrusive than V$SQLREA). For a suitably sized shared pool, we should not need to poll too frequently, but there will always exist the possibility that a rogue statement has been flushed and therefore goes undetected. Joining on ADDRESS to V$OBJECT_DEPENDENCY gives a list of objects used in each SQL, which we can then compare to our candidate segment set.

If our file I/O is predominantly temporary or undo, we must modify this approach slightly. For temporary I/O, hopefully make some inferences by examining the SORTS column on V$SQL as well as high disk reads. Similarly, for high undo activity, we can focus on DML operations, and use the ROWS_PROCESSED column in V$SQL as an indicator of the amount of undo being generated.

Finally, by generating an optimizer plan on relevant statements we may be able to determine the type of I/O for each segment, and relate this back to the type of I/O observed in V$FILESTAT. This is somewhat prone to inaccuracy since the plan generated may be different from the one that was actually used.

To summarise, by starting with file level I/O statistics to generate a list of candidate segments, and only then, looking at the transient statement level I/O, we can determine high I/O segments in an unobtrusive manner but still have a reasonable level of confidence in the results.

SmartDBA Comments:

The problem that we set, a request from a manager for a report listing the objects that were attracting the greatest physical I/O, was deliberately designed to stimulate debate as to whether this was a useful report. We were gratified by the way in which a good proportion of the entrants rose to that part of the challenge, and prefaced their specific solution with a general discussion of the underlying issues. We were, however, less impressed by those who submitted a general essay on Oracle tuning without detailing any solution to the problem we had set. One of these essays was of a very high standard as an introduction to Oracle performance tuning, but it completely failed to address the problem and was discarded at an early stage of the judging.

Also a number of competitors had either not understood the question and the rules or were hoping that the members of SmartTEAM who were reading each entry would not notice that they were avoiding the real issues. We knew when we picked the question that allocating each application segment (table or index) to a dedicated tablespace made it easy to solve the problem. Those entries that relied on this approach were quickly set aside, as were those that simply recommended that the application be upgraded to Oracle9i Release 2 where the answer can be reported directly from V$SEGMENT_STATISTICS. The question specifically asked for a discussion of the problems of providing the report under Oracle8i.

The entries that made it to our final judging stage all relied on mining one or more of V$SESSION_WAIT, V$BH, or V$SQL.

The preferred solution would have been to poll V$SESSION_WAIT looking for sessions that are waiting for I/O because during these waits the columns P1 and P2 give the FILE# and BLOCK# respectively, and these can be mapped back to the segment from DBA_USED_EXTENTS though some care is needed in order to make this an efficient operation. Sadly very few entries mentioned V$SESSION_WAIT, and none gave a complete analysis of why it is so appropriate to the problem. One great advantage of V$SESSION_WAIT is that the view is relatively short, having no more rows than there are sessions. The CPU impact of repeated visits is therefore rather less than that incurred for V$BH, discussed below.

A number of entrants correctly pointed out that physical I/O as seen by Oracle is not necessarily physical I/O at the device level. Caching within any or all of the operating system, the storage controller (or storage array) and the storage device may mean that a read requested by Oracle can be achieved from solid state memory. If the operation is very fast then the resulting wait is much less likely to be picked up by a monitor polling V$SESSION_WAIT every few seconds.

A number of entries correctly identified that it is possible to infer from repeated pools of V$BH when a database block has been read or written. Very few people pointed out that V$BH contains the column OBJD (object number) and that this makes it trivial to relate changes in buffer content to a particular segment. None of the short-listed entries discussed the key downside of using V$BH, which is that many instances are configured with over 100,000 buffers and Oracle instances with over 1,000,000 buffers are not unknown. The processing time for each poll therefore requires careful assessment.
Lastly a number of entries identified that a number of inferences can be drawn from V$SQL. A handful of entries also pointed out the importance of using V$SQL rather than V$SQLAREA; there are several reasons for this, but the most important is that V$SQLAREA effectively ignores “child cursors” or cases where the same SQL statement has many different execution plans and may even access a different set of tables (because it was issued by a different user). We were impressed by entries that showed an understanding of how to find information gleaned by the Oracle parser, and our winner in particular pointed out that V$OBJECT_DEPENDENCY will tell you exactly which tables are referenced by any given SQL statement. At least one entry proposed to examine the SQL_TEXT column in V$SQL to find out whether a statement whether a statement was a select, insert, update, or delete. For various syntactic reasons this is nothing like as easy as it sounds, and in any event V$SQL has a column COMMAND_TYPE that gives, unsurprisingly, the command type.

Analysis driven from V$SQL cannot afford a complete solution except in query only applications that never perform joins. There are no statistics on write I/O and it is impossible to get any accuracy at all into estimates of the physical I/O across the objects accessed. However, as Connor (our winner) observed, careful analysis based on V$SQL can really help to identify the prime suspects.

The Rules

1. When entering, please include full name, title, company, address, work phone and email where you can be contacted.

2. All entries must be received no later October 11th, 2002, midnight. One entry per person. The answers become the property of BMC Software, Inc. upon submission.

3. The winner with be the Guest of Honor of BMC at the 2002 OracleWorld in San Francisco, CA. Winner will be awarded round trip air fare, hotel (5 nights), $50 meal allowance per day and entry to the 2002 OracleWorld in San Francisco, CA. BMC will be allowed to use the winner’s names in publications. Winner will be required to be present at all BMC and Oracle events related to SmartDBA Challenge that will occur at 2002 OracleWorld in San Francisco, CA.

4. Eligibility is open to anyone working in an IT organization at any company excluding those working for a competitor of BMC Software or ORACLE or any of their partners/affiliates. Not eligible to enter are employees and their immediate families, including household members, of Official Sponsor Companies, their affiliates, subsidiaries, divisions and/or advertising and promotion agencies. This promotion is open to all entrants who are 18 years of age or older at time of entry. This promotion is subject to all federal, state and local laws. Void where prohibited. Winner agrees that the sponsors, their affiliates and their agencies and employees shall not be liable for injury, loss or damage of any kind resulting from participating in this promotion or from the acceptance or use of any prize awarded. Sponsors reserve the right to verify eligibility qualifications of any winner. U.S. law governs this promotion.

5. We are looking for answers in essay form, written in English, with a maximum length of 1,500 words.
a) If your answer is more than 1,500 words long we will only read and mark the first 1,500 words.
b) Answers must be submitted as either pure text files, rtf files or Word documents.
c) No marks will be deducted specifically for spelling mistakes or grammatical errors but our script markers are only human so the easier your essay is to understand the more likely you are to get high marks.
d) Any assertion that is not derived directly from the Oracle 8.1.7 documentation should be supported in some way. Published works may be cited as authority and such references need not form part of the word count however you will not necessarily gain marks just for knowing what other people have said on a topic.

6. Any answer in the form of SQL scripts will not be marked, and you are discouraged from putting complete SQL statements into your answer. You are encouraged to use the names of database objects (and columns) created as part of normal database creation.

7. No part of any answer may rely on direct memory access to a memory region managed by Oracle, connecting as SYS or on the existence of any SYS object other than those installed as part of normal database creation.

8. Answers that depend on storage configurations that are unusual in Oracle (such as dedicating a tablespace to a specific segment) will be heavily marked down.

9. We want you to demonstrate each of the following, and the percentages quoted are the percentage of the final mark that will be derived from each heading:
a) An understanding of the data sources available within the Oracle server for suitably privileged users to use SQL to retrieve performance and resource utilization data (15%)
b) An understanding of the problems associated with tracking both input and output to specific storage segments (15%)
c) An understanding of the distinction between objects and segments (5%)
d) The effectiveness of any proposed solution (30%)
e) A clear understanding of the likely accuracy of any proposed solution, and in particular the factors that would affect its accuracy (15%)
f) The ease of operation of any proposed solution (10%)
g) An understanding of the likely performance impact of any proposed solution and the steps taken to keep this within acceptable limits (10%)

10. In the event of two or more answers getting exactly the same mark, we will use the received date then word count (if received on the same date) to resolve the tie in which case the shortest entry or entries will win.

11.No purchase is necessary. By entering, entrants acknowledge compliance with these official rules, including all eligibility requirements. All entries become the property of None of the Official Sponsor Companies,, their affiliates, subsidiary divisions or advertising and promotion agencies is responsible for incorrect or inaccurate transcription of entry information, human error or for any technical malfunctions. By entering this promotion, entrants will automatically be registered as members of we will not rent or sell your name to anyone, not even our sponsors (see our legal and privacy policy.

By taking the SMARTDBA Challenge you agree to abide by the Contest Rules.

Take the for ORACLE!