
Using Oracle8


Chapter 16

Using Optimizers and the Analytic and Diagnostic Tools


Functions of the Optimizer

The Oracle optimizer is responsible for identifying the best means to execute any SQL statement that needs to be processed. This includes a number of tasks, each covered in the sections that follow: simplifying and transforming the statement, choosing an optimization approach, selecting an access path to each table's rows, and selecting the method and order for any table joins.

Why the Optimizer simplifies expressions and statements
You can write a SQL statement to perform a specific task in many ways, and the variety increases as the number of conditions and tables grows. However, there is only a limited number of retrieval paths to a table's rows and of methods for joining multiple tables. The number of possibilities the optimizer must evaluate is reduced if statements can first be converted to a common set of structures.

Simplifying and Transforming Statements

The optimizer automatically simplifies certain common constructs in SQL statements if the result will simplify the execution of the statement. Such conversions can range from the very simple, such as simplifying the expression 2000/10 to the integer 200, to transforming a statement with an OR operation into a compound query with two component queries. The former type of simplification will always be done; the latter will depend on whether there are indexes on the columns in the original WHERE clause and which optimizer approach is being used.
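As a rough sketch of the second kind of transformation (the EMPLOYEES table and its indexed DEPTNO and JOB columns are hypothetical names, not taken from this chapter), the optimizer can expand a statement like the first one below into a compound query along the lines of the second, letting each branch use its own index:

SELECT empno, ename
FROM employees
WHERE deptno = 10
OR job = 'CLERK';

-- Conceptually equivalent compound query the optimizer may build internally;
-- the extra condition keeps a row from being returned by both branches
SELECT empno, ename
FROM employees
WHERE deptno = 10
UNION ALL
SELECT empno, ename
FROM employees
WHERE job = 'CLERK'
AND (deptno <> 10 OR deptno IS NULL);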

Many different types of transformations can occur, including, but not limited to, the following: expanding an OR condition into a compound query, merging a query against a view with the view's own defining query, and converting certain statements that contain subqueries into equivalent join statements.

The Oracle8 Server Concepts Manual contains detailed explanations of these and all other possible statement simplifications and transformations. You should be aware of them mainly to understand why the optimizer may sometimes choose an execution plan that doesn't appear to be appropriate for the structure of the statement being processed.

Choosing a Rule-Based or Cost-Based Optimization Approach

Oracle chooses an optimization approach for each statement based on several criteria:

  1. Referenced objects having a defined degree of parallelism
  2. Hints in the statement
  3. Session setting of OPTIMIZER_GOAL
  4. Value of initialization parameter OPTIMIZER_MODE
  5. Existence of statistics on referenced objects
SEE ALSO
Find out more about database parameters and the initialization file,

Figure 16.1 shows how the optimizer uses these criteria to determine whether it uses rule-based or cost-based optimization. The first factor in choosing the optimization approach, when examining a statement to be processed, is to see if it can be executed in parallel. This includes determining if there will be at least one full table scan and if any of the objects referenced by the statement were defined with the PARALLEL option. If both conditions are true, Oracle will use cost-based optimization to create an execution plan containing parallel steps.

Figure 16.1 : Oracle examines many factors to determine whether the optimization approach should be rule-based (RBO) or cost-based (CBO).

Optimization for parallel processing and statements with hints
Oracle must use cost-based optimization to develop execution plans for statements that require parallel server processes, because the rule-based optimization approach has never been upgraded to handle parallel execution. Similarly, the only optimization option that can interpret hints is cost-based. Unless the only hint is RULE, Oracle uses cost-based optimization for any statement with one or more hints.

For statements that can't be executed, at least partially, in parallel, Oracle looks to see whether the statement contains a hint. If any hint exists (other than the RULE hint), cost-based optimization will be used to process the statement.

For statements that don't require cost-based optimization because of parallel execution steps or the presence of hints, Oracle checks to see if the session has defined an optimization choice for itself. The command

ALTER SESSION SET OPTIMIZER_GOAL = goal
Where you can use the RULE keyword
RULE is the option name (as well as the name of the optimizer approach) used in three different types of syntax to denote the type of optimization required. It's used in hints, in the ALTER SESSION command, and in the database initialization file. It's the same RULE in each case, but in the first situation, it is known as a RULE hint; in the second, it's a RULE goal (because the command sets the OPTIMIZER_GOAL parameter), and in the last case, it's the RULE mode (the parameter name is OPTIMIZER_MODE).

allows you to choose one of the four optimization goals: FIRST_ROWS, ALL_ROWS, RULE, and CHOOSE. The first two options cause Oracle to use cost-based optimization: FIRST_ROWS makes it find the execution plan that returns the first row as quickly as possible, and ALL_ROWS optimizes the overall response time of the statement. The RULE goal causes rule-based optimization to be used for the session. The final option, CHOOSE, as you can see from Figure 16.1, can cause rule-based or cost-based optimization to be selected, based on the existence of statistics.
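For example, here is a minimal sketch of how different sessions might set their goals (the choice depends entirely on your workload):

-- Interactive session: return the first row as quickly as possible
ALTER SESSION SET OPTIMIZER_GOAL = FIRST_ROWS;

-- Batch session: minimize the total elapsed time of each statement
ALTER SESSION SET OPTIMIZER_GOAL = ALL_ROWS;

-- Let the presence or absence of statistics decide
ALTER SESSION SET OPTIMIZER_GOAL = CHOOSE;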

The statistics used to decide which optimization approach will be selected are collected with the ANALYZE command (discussed in the "Collecting Statistics for the Cost-Based Optimizer" section later in this chapter). When Oracle needs to optimize a statement running in a session with its optimizer goal set to CHOOSE, it looks in the data dictionary to see whether any of the segments referenced in the statement has statistics. If statistics are found, the statement is processed with cost-based optimization, which uses them to help develop its execution plan. If none of the referenced segments has statistics, rule-based optimization is used.
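If you want to see which of your tables already have statistics (and so would be optimized with the cost-based approach under CHOOSE), a quick data dictionary query is enough; this sketch assumes you're connected as the owner of the tables:

-- NUM_ROWS and LAST_ANALYZED are NULL for tables that have never been analyzed
SELECT table_name, num_rows, last_analyzed
FROM user_tables
ORDER BY table_name;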

When Oracle has no other indication as to which optimization approach to take, it uses the value assigned to the initialization parameter OPTIMIZER_MODE. The same four values (FIRST_ROWS, ALL_ROWS, RULE, and CHOOSE) can be assigned to this parameter, and they act in the same way as described for the ALTER SESSION command. By default, the parameter is set to CHOOSE, which means that the optimization approach chosen for any given statement depends on whether statistics exist in the data dictionary for the segments processed by the statement. It also means that the optimization approach can change if statistics are added or dropped over time.

Choosing FIRST_ROWS versus ALL_ROWS
The FIRST_ROWS option is best used for statements executed in interactive applications, because users of such applications typically wait for a response as soon as they initiate a process. Even if the overall execution time isn't minimized, a user can probably begin doing useful work with the first row of data returned, so the delay while the remainder of the data is processed isn't detrimental. ALL_ROWS should be used when the statement as a whole needs to complete as quickly as possible. You should choose this option when initializing the optimizer for a batch program or for any program whose total run time would otherwise be unacceptably long.

Data Access Paths

Table 16.1 lists the access paths available to reach the required rows in a table. The rank shown with each path is included for a later discussion about rule-based optimization.

Table 16.1  Optional data access paths to be evaluated by the optimizer

Bitmap Index Scan (Not ranked): Accesses rows via a bitmap index entry
Fast Full Index Scan (Not ranked): Performs a full scan on the index entries rather than on the table
Single Row by ROWID (rank 1): Uses the rowid as provided by a current cursor or a WHERE clause with a rowid value
Single Row Cluster Join (rank 2): Returns only a single row from two or more tables in a cluster with a join condition on the cluster key
Single Row Hash Cluster (rank 3): Returns a single row from a hash cluster when the WHERE clause identifies the complete hash key, which is also a unique or primary key
Single Row by Key (rank 4): Returns a single row from a table when the WHERE clause identifies all columns in a unique or primary key
Clustered Join (rank 5): Returns one or more rows from two or more tables in a cluster with a join condition on the cluster key
Hash Cluster Key (rank 6): Returns one or more rows via the cluster-key value, using the cluster's hash function
Index Cluster Key (rank 7): Returns one or more rows via the cluster-key value, using the cluster index
Composite Index (rank 8): Returns one or more rows when all columns of a composite index are referenced
Single-Column Index(es) (rank 9): Uses one or more single-column indexes
Bounded Range Index Search (rank 10): Uses a single-column index, or the leading column(s) of a composite index, to find values in a bounded range (with a lower and an upper value)
Unbounded Range Index Search (rank 11): Uses a single-column index, or the leading column(s) of a composite index, to find values in an unbounded range (with a lower or an upper value, but not both)
Sort-Merge Join (rank 12): Joins two tables via a join column when the tables aren't clustered together
MAX or MIN of Indexed Column (rank 13): Returns the column's maximum or minimum value from an index if the column is indexed by itself or is the leading column of a composite index, if the query has no WHERE clause, and if no other column is named in the SELECT clause
ORDER BY on Indexed Column (rank 14): Uses a single-column index or the leading column of a composite index to find rowids of table rows in order when the column is guaranteed not to contain NULLs
Full Table Scan (rank 15): Reads rows directly from the table

Rank column used in rule-based optimization only
The two access paths that show the value "Not ranked" can be used only by the cost-based optimization approach; therefore, they have no rank value for rule-based optimization.

Using an index to find column minimum or maximum values
An index provides a convenient access path for a maximum or minimum value of a column because the entries are sorted from least (the first entry is the minimum) to greatest (the last entry is the maximum). If the query needs other columns or has other restrictions on which rows are required, this retrieval path is inappropriate because it can't identify any other rows that must be considered. Also, the index can be used only if the column is its leading column or its only column.
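As a brief sketch (the table and column names are hypothetical), the first query below can be satisfied entirely from an index on SALARY, while the second can't use this access path because it names another column and restricts the rows:

-- Can be answered from an index on SALARY alone
SELECT MAX(salary)
FROM employees;

-- Can't use the MAX/MIN index path: another column appears and a WHERE clause restricts the rows
SELECT MAX(salary), MIN(hire_date)
FROM employees
WHERE deptno = 10;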

Table Join Options

A table join occurs when the rows from a table are combined with the rows from another table, or even with the rows from the same table. The latter, known as a self-join, is used when a value in one column of a table needs to be matched with a value in another column of that same table. Joins typically involve matching a value in a column, or set of columns, in one table with a corresponding value, or set of values, in the other table. When a value in each table must match, the resulting join is known as an equijoin. If the join condition is based on an inequality between the columns, the join is called a non-equijoin. Other join options are Cartesian products, which occur when there's no controlling condition and every row in one table is joined to every row in the other table, and outer joins, which include all rows from one table even if there's no matching value on the join column(s).
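The sketch below (CUSTOMERS and ORDERS are hypothetical tables) shows an equijoin and, using Oracle's (+) operator, an outer join that keeps customers with no orders:

-- Equijoin: only customers that have at least one order are returned
SELECT c.customer_id, c.name, o.order_id
FROM customers c, orders o
WHERE c.customer_id = o.customer_id;

-- Outer join: every customer appears, with NULL order columns where no match exists
SELECT c.customer_id, c.name, o.order_id
FROM customers c, orders o
WHERE c.customer_id = o.customer_id (+);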

Consider a table in a manufacturing application. The application records information about individually manufactured parts as well as assemblies, that is, combinations of parts. For example, this book you're reading is an assembly of pages (one type of part) and a set of covers (another type of part). The series of which this book is a part is an assembly of the individual books that comprise it. A database table containing this information might include, among others, the columns PART_NUMBER and PART_NAME. For a row describing an individual part, PART_NUMBER holds the part's identification code and PART_NAME its descriptive name; for a row describing an assembly, PART_NUMBER holds the assembly's code and PART_NAME holds the identification code of one of its component parts. If we selected the rows for assembly ID 10-1-AA, we might see the following:

PART_NUMBER   PART_NAME
10-1-AA       12-8HY-U-87
10-1-AA       9JD7-RT-9
10-1-AA       LK-LG-55624
Left and right joins
Outer joins, in which one table's rows are included in the result set even when the other table has no matching data, are sometimes referred to as left and right joins. Depending on whether the join condition lists the column of the non-matched table on the left or right side of the WHERE condition, the join is considered a left or a right join. Although Oracle doesn't allow left and right joins in a single statement, it will allow a view based on a left join to be included in a query with a right join, and vice versa.

If we really wanted to see the part names as well as the part numbers for the parts that comprise assembly 10-1-AA, we would need to code a self-join:

SELECT a.part_number, p.part_number, p.part_name
FROM assemblies a, assemblies p
WHERE a.part_number = '10-1-AA'
AND a.part_name = p.part_number;

Oracle performs joins in a number of different ways, as summarized in Table 16.2.

Table 16.2  Oracle chooses a join method from among a number of options

Nested Loops: For each row retrieved from the driving table, looks for rows in the driven table
Sort-Merge: Sorts the rows from both tables in order of the join column values and merges the resulting sorted sets
Cluster Join: For each row retrieved from the driving table, looks for matching rows in the driven table on the same block
Hash Join (1): Builds a hash table from the rows in the driving table and uses the same hash formula on each row of the driven table to find matches
Star Query (1)(2): Creates a Cartesian product of the dimension tables and merges the result set with the fact table
Star Transformation (1)(2): Uses bitmap indexes on the dimension tables to build a bitmap index access to the fact table

(1) Method available only when using cost-based optimization.
(2) Any of the other options can be used to join the dimension tables and join that result to the fact table.

When two tables need to be joined, the optimizer evaluates the methods as well as the order in which the tables should be joined. The table accessed first is the driving table; the one accessed next is the driven table. For joins involving multiple tables, there's a primary driving table, and the remaining tables are driven by the results obtained from the previous join results. Two situations will always cause the optimizer to select a specific table order when performing table joins:

Using Rule-Based Optimization

Although Oracle8 still supports the rule-based optimizer, Oracle Corporation strongly encourages you to migrate to cost-based optimization. The rule-based optimizer won't be included in later releases of the database, although it isn't clear just when it will be dropped. Support is being maintained to allow customers time to complete the transfer, tuning, and implementation of their applications, both in-house and third party, to cost-based optimization.

Features not available to the rule-based optimizer
Some features introduced in recent releases can't be used by the rule-based optimizer. These features include partitioned tables, index-only tables, reverse-key indexes, bitmap indexes, parallel queries, hash joins, star joins, star transformations, histograms, and fast full index scans. When the feature is an aspect of an object, such as a reverse-key index, the rule-based optimizer will act as though the object weren't available for use. In cases where there's no choice but to use the feature (such as an index-only table), Oracle will automatically use the cost-based optimizer. Other optional features, such as having a default degree of parallelism on a table, will also cause Oracle to use the cost-based optimizer to take advantage of the feature.

If you're still using rule-based optimization, you should be planning your strategy to convert to the cost-based approach. You can't take advantage of many new features of the database while using the rule-based optimizer, and you may find that you can improve performance by using the newer optimizer without having to implement these new features, should you not have the resources to investigate them. Beginning, or at least anticipating, the conversion now, before there's a concrete deadline you have to meet, should help you realize a better product overall.

If you haven't set the value of the OPTIMIZER_MODE parameter in your initialization file and haven't executed the ANALYZE command to collect statistics for any of the tables, indexes, or clusters used in applications, your applications are probably running under the rule-based optimizer. However, you can't guarantee this, because a hint in a statement or a default degree of parallelism on a referenced object will still cause cost-based optimization to be used for that statement.

The rule-based optimization approach is so named because it follows a standard set of tests when determining what access path to use to obtain the rows required for each step of a statement's execution. Table 16.1 earlier in this chapter shows the possible access paths to a table and includes a rank number to show which approaches are preferred. During rule-based optimization, the table is tested to see if it can be accessed by each access path in turn, beginning with the rank 1 option, and the first possible path is chosen as the access path.

A good reason to continue using rule-based optimization
Although cost-based optimization is becoming the preferred approach, you shouldn't abandon the rule-based approach, if that is what you have been using, without due consideration. Poor performance can result if your database is using cost-based optimization without any statistical information from which to derive good execution plans. Statistics that are no longer current can also be a detriment to cost-based optimization.

If the statement requires a table join, the rule-based approach uses an algorithm to determine the two key elements of the join: first, which will be the driving table and which the driven table; and second, which join method will be used. The rules of this algorithm are as follows:

Changing the execution plan under rule-based optimization
If you don't think the rules will generate the best execution plan for a given statement under rule-based optimization, you can try to influence it. For instance, to stop the optimizer from using an index, you can modify the reference to the column in the WHERE clause by concatenating a NULL or zero-length string for character columns (such as USERNAME || ''), or by adding zero for numeric columns (such as CUSTOMER_ID + 0). This won't change the results returned but will prevent use of the index. To force the use of an index that's being ignored, you may have to rewrite the statement so that the indexed column reference isn't modified, for example by removing a function such as UPPER.
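A minimal sketch of both techniques, using a hypothetical ACCOUNTS table:

-- Suppress an index on a character column by concatenating a zero-length string
SELECT account_id
FROM accounts
WHERE username || '' = 'JSMITH';

-- Suppress an index on a numeric column by adding zero
SELECT account_id
FROM accounts
WHERE customer_id + 0 = 1234;

-- Allow an index to be used by leaving the indexed column unmodified
-- (compare against an upper-case literal instead of wrapping the column in UPPER)
SELECT account_id
FROM accounts
WHERE username = 'JSMITH';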

Using Cost-Based Optimization

As mentioned in the previous section, the cost-based approach will eventually be the only optimization approach available. Meanwhile, it will continue to support new database features that the rule-based approach can't handle. You should plan to convert your applications-if you haven't already-to run under cost-based optimization.

In Figure 16.1, you can see that you can invoke cost-based optimization in a number of ways. You can do it directly with the FIRST_ROWS or ALL_ROWS setting in the initialization file or in an ALTER SESSION command; you can do it less directly by defining a default degree of parallelism on a segment accessed by a statement, or by including a hint (other than RULE) in a statement. If you allow Oracle to use its default behavior to select an optimization approach (which means using the CHOOSE option for the OPTIMIZER_MODE parameter), the choice will be based on whether statistics exist for the segments referenced in the statement.

The reason the choice of optimizer approach depends on statistics is straightforward. Cost-based optimization uses these statistics to compute the relative costs of different execution plans in order to choose the most efficient one. If no statistics are stored, the likelihood that the optimizer will choose a good plan drops significantly, so Oracle prefers to fall back on the rule-based approach.

Of course, if you force the optimizer approach to be cost-based (with a hint, for example), it will work, albeit poorly, in the absence of statistics. It may also perform less than optimally if the statistics it uses are stale and no longer reflect the true nature of the segments they're meant to describe. It's therefore essential that you be prepared to maintain current statistics on the database segments if you want to use cost-based optimization. If you fail to do that, your application developers, and even end users who access the database directly, will have to include hints in many of their statements to ensure that a reasonable execution plan is used.

Statistics are generated and maintained with the ANALYZE command. The syntax for the options related to cost-based statistics is as follows; the other options are covered in other chapters where they are relevant.

ANALYZE TABLE|INDEX|CLUSTER
    [schema.]table_name|index_name|cluster_name
    [PARTITION (partition_name)]
    COMPUTE|ESTIMATE|DELETE STATISTICS
    [table_clause][,...]
    [SAMPLE integer [ROWS|PERCENT]]
What you should analyze
The COMPUTE option of the ANALYZE command generally consumes more resources than the ESTIMATE option and, consequently, can have a greater impact on your application performance. You should compare the results from both options, using different estimated sample sizes, to decide if you really need to incur the extra overhead of computing exact statistics. When analyzing tables with associated indexes, you can also reduce the work performed by the database by estimating the table statistics and then computing exact values on the indexes individually. This will provide the optimizer with the best statistics when accessing the data via the indexes, which is how the data should be retrieved if the indexes are doing their job.

where the parts of the table_clause are as follows:

FOR TABLE: Specifies that the command will create table statistics only; no column or index statistics will be generated
FOR ALL COLUMNS: Specifies that the command will create histogram statistics on every column
SIZE integer: Specifies the maximum number of buckets in the histogram; the default value is 75 if the option isn't included
FOR ALL INDEXED COLUMNS: Specifies that the command will create histogram statistics only on indexed columns
FOR COLUMNS column [,...]: Specifies that the command will create histogram statistics on the named column(s) or object scalar type(s)
FOR ALL INDEXES: Specifies that the command will create statistics on every index of the table, but not on the table itself
FOR ALL LOCAL INDEXES: Specifies that the command will create statistics on every local index partition; must be included if the FOR ALL INDEXES and PARTITION options are both specified
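A few representative commands (the table and index names are made up), following the sidebar's suggestion of estimating table statistics while computing exact index statistics:

-- Estimate table statistics from a 20 percent sample
ANALYZE TABLE orders ESTIMATE STATISTICS SAMPLE 20 PERCENT;

-- Compute exact statistics on one of the table's indexes
ANALYZE INDEX orders_pk COMPUTE STATISTICS;

-- Remove the table's statistics
ANALYZE TABLE orders DELETE STATISTICS;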

The table_clause options that create histograms should be used if your table has a very uneven distribution of values in columns used for retrieval. When different values are stored in a column, the optimizer assumes that each of them appears about the same number of times. If some values occur only rarely and one or two others occur in a large proportion of the records, this assumption may not lead to a good execution plan. The frequently occurring values should be accessed by a full table scan, whereas the infrequently appearing values would be best retrieved via an index.

By building a histogram, you provide the optimizer with the information it needs to distinguish between these two types of values and assist it in building a good execution plan. The number of buckets, or partitions, in the histogram determines how finely the different values are distinguished. The more buckets, the greater the chance that the histogram will show the frequency of occurrence of any specific value in the column. If you need to isolate only one or two disproportionately occurring values, however, you need fewer buckets.
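For instance (hypothetical names again), a skewed STATUS column might be given a small histogram, or every indexed column could get one with the default number of buckets:

-- Histogram on one skewed column, using 10 buckets
ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS status SIZE 10;

-- Histograms on every indexed column, with the default 75 buckets
ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;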

You can use the ANALYZE command to recalculate statistics any time you want without having to delete the old ones first. You should plan to perform re-analysis on a regular basis if the segment changes frequently.

Keeping statistics current
You should monitor the statistics on your database segments to make sure that they stay current. I recommend that you begin by executing the ANALYZE command and recording the statistics from the related view: DBA_TABLES, DBA_INDEXES, or DBA_CLUSTERS. Re-execute the ANALYZE command a month later and compare the new statistical values; if they're close to the previous month's, you shouldn't need to perform another analysis for a few more months. If the statistics are very different, you may need to check again in a week. If they're somewhat different, you should plan to re-analyze the table every month. Over time, you should develop a sense of how frequently each segment needs to be analyzed. You may need to run a program once a week, once a month, or at some other fixed interval. Your program may analyze just a few segments each time it's run, with additional segments every other time, more every third or fourth time, and so on.
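One way to follow this approach (a sketch only; the STATS_HISTORY table name and APPOWNER schema are made up) is to snapshot the dictionary values each time you analyze and compare them later:

-- Create the history table the first time
CREATE TABLE stats_history AS
SELECT SYSDATE snapshot_date, owner, table_name, num_rows, blocks, avg_row_len
FROM dba_tables
WHERE owner = 'APPOWNER';

-- On later runs, append the current values for comparison
INSERT INTO stats_history
SELECT SYSDATE, owner, table_name, num_rows, blocks, avg_row_len
FROM dba_tables
WHERE owner = 'APPOWNER';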

When a statement is processed with cost-based optimization, the execution plan will include the table selection access paths and join methods based on the lowest estimated costs. These costs take into account the number of Oracle blocks that have to be manipulated, the number of reads that may need to occur to retrieve these blocks from disk into memory, the amount of additional memory that may be needed to process the data (such as space to complete sorts or hash joins), and the cost of moving data across any networks.

If you've built your database objects with application schemas-that is, where all the objects belonging to an application are owned by the same user-you can simplify the task of collecting statistics for cost-based optimization. Oracle provides a procedure, ANALYZE_SCHEMA, in its DBMS_UTILITY package, which will run the ANALYZE command for you against every segment in a named schema. If you haven't already done so, you need to execute the CATPROC.SQL script, which you can find in the admin subdirectory of your ORACLE_HOME directory, as SYS to build the necessary PL/SQL structures. You can then execute the required procedure by using SQL*Plus's EXECUTE command or by creating your own PL/SQL routine to run the procedure. The SQL*Plus EXECUTE command would look like this:

EXECUTE dbms_utility.analyze_schema('&username','&option',&rows,&pct)

SEE ALSO
Information about the various Oracle-supplied SQL scripts mentioned in this chapter,

You would substitute the name of the schema holding the segments you want to analyze at the username prompt; the COMPUTE, ESTIMATE, or DELETE keyword at the option prompt; and a number, the keyword NULL, or an empty string ('') for the rows and pct prompts. The last two options are relevant only for the ESTIMATE option, and any values provided are ignored for other options. They indicate the number of rows or the proportion of the table to be included in the sample respectively. If you don't provide a number for either, or set both to zero, the sample uses the default number of rows (1,064). If you provide a number for both, the value for rows is used unless it's zero, in which case the percentage sample size is used.
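For example (the APPOWNER schema name is hypothetical), to estimate statistics for every segment in one schema from a 20 percent sample:

EXECUTE dbms_utility.analyze_schema('APPOWNER','ESTIMATE',NULL,20)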

The statistics collected with the ANALYZE command are used in computing these costs. When the cost-based optimizer is used for a statement that references one or more (or even all) segments that have no statistics available, it still has to evaluate the potential costs of different execution plans. To do this, it uses basic information from the data dictionary and estimates the missing values. Naturally, the results aren't as accurate as they would be with current statistics collected with the ANALYZE command.

Using Hints to Influence Execution Plans

To overcome poor execution plans, whether due to missing or out-of-date statistics or to an unusual distribution of data in a table or index that the optimizer doesn't anticipate, you can include hints in a statement. Hints are similar to the "tweaks" I suggested you can use to try to modify the behavior of rule-based execution plans, but they're more sophisticated and give you a much wider range of options.

Table names in hints
If you use a table alias in the FROM clause of a statement, you must also use that alias in the hint string when referencing the table. Your statement won't fail if you forget to do this, but the hint will be treated as ordinary comment text and won't be acted on as you expected.

Oracle publishes a complete list of available hints with descriptions of what they do and how to use them in the Oracle8 Server Tuning manual, so I won't reproduce that data here. I do include the details required to include a hint in a statement, as this can be confusing:
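In brief (a sketch; the table, alias, and index names are hypothetical): a hint is written as a comment whose first character after the comment opener is a plus sign, and the comment must immediately follow the SELECT, UPDATE, or DELETE keyword. The table alias, not the table name, appears inside the hint when an alias is used in the FROM clause:

-- Multi-line comment style; the /*+ must directly follow the statement keyword
SELECT /*+ INDEX(e emp_dept_idx) */ e.empno, e.ename
FROM employees e
WHERE e.deptno = 10;

-- The same hint using the single-line comment style
SELECT --+ INDEX(e emp_dept_idx)
e.empno, e.ename
FROM employees e
WHERE e.deptno = 10;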