
Using Oracle8


Chapter 16

Using Optimizers and the Analytic and Diagnostic Tools


Functions of the Optimizer

The Oracle optimizer is responsible for identifying the best means to execute any SQL statement that needs to be processed. This includes a number of tasks, each covered in the sections that follow: simplifying and transforming the statement, choosing an optimization approach, selecting an access path to each table's rows, and selecting the method and order for any table joins.

Why the Optimizer simplifies expressions and statements
You can write a SQL statement to perform a specific task in many ways, and the variety increases as the number of conditions and tables grows. However, there is only a limited number of retrieval paths to a table's rows and of methods for joining multiple tables. The number of possibilities the optimizer must evaluate is reduced if statements can first be converted to a common set of structures.

Simplifying and Transforming Statements

The optimizer automatically simplifies certain common constructs in SQL statements if the result will simplify the execution of the statement. Such conversions can range from the very simple, such as simplifying the expression 2000/10 to the integer 200, to transforming a statement with an OR operation into a compound query with two component queries. The former type of simplification will always be done; the latter will depend on whether there are indexes on the columns in the original WHERE clause and which optimizer approach is being used.
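As a rough sketch of the second kind of transformation (the EMPLOYEES table and its indexed DEPTNO and JOB columns are hypothetical names, not taken from this chapter), the optimizer can expand a statement like the first one below into a compound query along the lines of the second, letting each branch use its own index:

SELECT empno, ename
FROM employees
WHERE deptno = 10
OR job = 'CLERK';

-- Conceptually equivalent compound query the optimizer may build internally;
-- the extra condition keeps a row from being returned by both branches
SELECT empno, ename
FROM employees
WHERE deptno = 10
UNION ALL
SELECT empno, ename
FROM employees
WHERE job = 'CLERK'
AND (deptno <> 10 OR deptno IS NULL);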

Many different types of transformations can occur, including, but not limited to, the following: expanding an OR condition into a compound query, merging a query against a view with the view's own defining query, and converting certain statements that contain subqueries into equivalent join statements.

The Oracle8 Server Concepts Manual contains detailed explanations of these and all other possible statement simplifications and transformations. You should be aware of them mainly to understand why the optimizer may sometimes choose an execution plan that doesn't appear to be appropriate for the structure of the statement being processed.

Choosing a Rule-Based or Cost-Based Optimization Approach

Oracle chooses an optimization approach for each statement based on several criteria:

  1. Referenced objects having a defined degree of parallelism
  2. Hints in the statement
  3. Session setting of OPTIMIZER_GOAL
  4. Value of initialization parameter OPTIMIZER_MODE
  5. Existence of statistics on referenced objects
SEE ALSO
Find out more about database parameters and the initialization file,

Figure 16.1 shows how the optimizer uses these criteria to determine whether it uses rule-based or cost-based optimization. The first factor in choosing the optimization approach, when examining a statement to be processed, is to see if it can be executed in parallel. This includes determining if there will be at least one full table scan and if any of the objects referenced by the statement were defined with the PARALLEL option. If both conditions are true, Oracle will use cost-based optimization to create an execution plan containing parallel steps.

Figure 16.1 : Oracle examines many factors to determine whether the optimization approach should be rule-based (RBO) or cost-based (CBO).

Optimization for parallel processing and statements with hints
Oracle must use cost-based optimization to develop execution plans for statements that require parallel server processes, because the rule-based optimization approach has never been upgraded to handle parallel execution. Similarly, the only optimization option that can interpret hints is cost-based. Unless the only hint is RULE, Oracle uses cost-based optimization for any statement with one or more hints.

For statements that can't be executed, at least partially, in parallel, Oracle looks to see whether the statement contains a hint. If any hint exists (other than the RULE hint), cost-based optimization will be used to process the statement.

For statements that don't require cost-based optimization because of parallel execution steps or the presence of hints, Oracle checks to see if the session has defined an optimization choice for itself. The command

ALTER SESSION SET OPTIMIZER_GOAL = goal
Where you can use the RULE keyword
RULE is the option name (as well as the name of the optimizer approach) used in three different types of syntax to denote the type of optimization required. It's used in hints, in the ALTER SESSION command, and in the database initialization file. It's the same RULE in each case, but in the first situation, it is known as a RULE hint; in the second, it's a RULE goal (because the command sets the OPTIMIZER_GOAL parameter), and in the last case, it's the RULE mode (the parameter name is OPTIMIZER_MODE).

allows you to choose one of the four optimization goals: FIRST_ROWS, ALL_ROWS, RULE, and CHOOSE. The first two options cause Oracle to use cost-based optimization: FIRST_ROWS makes it find the execution plan that returns the first row as quickly as possible, and ALL_ROWS optimizes the overall response time of the statement. The RULE goal causes rule-based optimization to be used for the session. The final option, CHOOSE, as you can see from Figure 16.1, can cause rule-based or cost-based optimization to be selected, based on the existence of statistics.
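For example, here is a minimal sketch of how different sessions might set their goals (the choice depends entirely on your workload):

-- Interactive session: return the first row as quickly as possible
ALTER SESSION SET OPTIMIZER_GOAL = FIRST_ROWS;

-- Batch session: minimize the total elapsed time of each statement
ALTER SESSION SET OPTIMIZER_GOAL = ALL_ROWS;

-- Let the presence or absence of statistics decide
ALTER SESSION SET OPTIMIZER_GOAL = CHOOSE;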

The statistics used to decide which optimization approach will be selected are collected with the ANALYZE command (discussed in the "Collecting Statistics for the Cost-Based Optimizer" section later in this chapter). When Oracle needs to optimize a statement running in a session with its optimizer goal set to CHOOSE, it looks in the data dictionary to see whether any of the segments referenced in the statement has statistics. If statistics are found, the statement is processed with cost-based optimization, which uses them to help develop its execution plan. If none of the referenced segments has statistics, rule-based optimization is used.
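If you want to see which of your tables already have statistics (and so would be optimized with the cost-based approach under CHOOSE), a quick data dictionary query is enough; this sketch assumes you're connected as the owner of the tables:

-- NUM_ROWS and LAST_ANALYZED are NULL for tables that have never been analyzed
SELECT table_name, num_rows, last_analyzed
FROM user_tables
ORDER BY table_name;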

When Oracle has no other indication as to which optimization approach to take, it uses the value assigned to the initialization parameter OPTIMIZER_MODE. The same four values (FIRST_ROWS, ALL_ROWS, RULE, and CHOOSE) can be assigned to this parameter, and they act in the same way as described for the ALTER SESSION command. By default, the parameter is set to CHOOSE, which means that the optimization approach chosen for any given statement depends on whether statistics exist in the data dictionary for the segments processed by the statement. It also means that the optimization approach can change if statistics are added or dropped over time.

Choosing FIRST_ROWS versus ALL_ROWS
The FIRST_ROWS option is best used for statements executed in interactive applications, because users of such applications typically wait for a response as soon as they initiate a process. Even if the overall execution time isn't minimized, a user can probably begin doing useful work with the first row of data returned, so the delay while the remainder of the data is processed isn't detrimental. ALL_ROWS should be used when the statement as a whole needs to complete as quickly as possible. You should choose this option when initializing the optimizer for a batch program or for any program whose total run time would otherwise be unacceptably long.

Data Access Paths

Table 16.1 lists the access paths available to reach the required rows in a table. The rank shown with each path is included for a later discussion about rule-based optimization.

Table 16.1  Optional data access paths to be evaluated by the optimizer

Bitmap Index Scan (Not ranked): Accesses rows via a bitmap index entry
Fast Full Index Scan (Not ranked): Performs a full scan on the index entries rather than on the table
Single Row by ROWID (rank 1): Uses the rowid as provided by a current cursor or a WHERE clause with a rowid value
Single Row Cluster Join (rank 2): Returns only a single row from two or more tables in a cluster with a join condition on the cluster key
Single Row Hash Cluster (rank 3): Returns a single row from a hash cluster when the WHERE clause identifies the complete hash key, which is also a unique or primary key
Single Row by Key (rank 4): Returns a single row from a table when the WHERE clause identifies all columns in a unique or primary key
Clustered Join (rank 5): Returns one or more rows from two or more tables in a cluster with a join condition on the cluster key
Hash Cluster Key (rank 6): Returns one or more rows via the cluster-key value, using the cluster's hash function
Index Cluster Key (rank 7): Returns one or more rows via the cluster-key value, using the cluster index
Composite Index (rank 8): Returns one or more rows when all columns of a composite index are referenced
Single-Column Index(es) (rank 9): Uses one or more single-column indexes
Bounded Range Index Search (rank 10): Uses a single-column index, or the leading column(s) of a composite index, to find values in a bounded range (with a lower and an upper value)
Unbounded Range Index Search (rank 11): Uses a single-column index, or the leading column(s) of a composite index, to find values in an unbounded range (with a lower or an upper value, but not both)
Sort-Merge Join (rank 12): Joins two tables via a join column when the tables aren't clustered together
MAX or MIN of Indexed Column (rank 13): Returns the column's maximum or minimum value from an index if the column is indexed by itself or is the leading column of a composite index, if the query has no WHERE clause, and if no other column is named in the SELECT clause
ORDER BY on Indexed Column (rank 14): Uses a single-column index or the leading column of a composite index to find rowids of table rows in order when the column is guaranteed not to contain NULLs
Full Table Scan (rank 15): Reads rows directly from the table

Rank column used in rule-based optimization only
The two access paths that show the value "Not ranked" can be used only by the cost-based optimization approach; therefore, they have no rank value for rule-based optimization.

Using an index to find column minimum or maximum values
An index provides a convenient access path for a maximum or minimum value of a column because the entries are sorted from least (the first entry is the minimum) to greatest (the last entry is the maximum). If the query needs other columns or has other restrictions on which rows are required, this retrieval path is inappropriate because it can't identify any other rows that must be considered. Also, the index can be used only if the column is its leading column or its only column.
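As a brief sketch (the table and column names are hypothetical), the first query below can be satisfied entirely from an index on SALARY, while the second can't use this access path because it names another column and restricts the rows:

-- Can be answered from an index on SALARY alone
SELECT MAX(salary)
FROM employees;

-- Can't use the MAX/MIN index path: another column appears and a WHERE clause restricts the rows
SELECT MAX(salary), MIN(hire_date)
FROM employees
WHERE deptno = 10;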

Table Join Options

A table join occurs when the rows from a table are combined with the rows from another table, or even with the rows from the same table. The latter, known as a self-join, is used when a value in one column of a table needs to be matched with a value in another column of that same table. Joins typically involve matching a value in a column, or set of columns, in one table with a corresponding value, or set of values, in the other table. When a value in each table must match, the resulting join is known as an equijoin. If the join condition is based on an inequality between the columns, the join is called a non-equijoin. Other join options are Cartesian products, which occur when there's no controlling condition and every row in one table is joined to every row in the other table, and outer joins, which include all rows from one table even if there's no matching value on the join column(s).
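The sketch below (CUSTOMERS and ORDERS are hypothetical tables) shows an equijoin and, using Oracle's (+) operator, an outer join that keeps customers with no orders:

-- Equijoin: only customers that have at least one order are returned
SELECT c.customer_id, c.name, o.order_id
FROM customers c, orders o
WHERE c.customer_id = o.customer_id;

-- Outer join: every customer appears, with NULL order columns where no match exists
SELECT c.customer_id, c.name, o.order_id
FROM customers c, orders o
WHERE c.customer_id = o.customer_id (+);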

Consider a table in a manufacturing application. The application records information about individually manufactured parts as well as assemblies, that is, combinations of parts. For example, this book you're reading is an assembly of pages (one type of part) and a set of covers (another type of part). The series of which this book is a part is an assembly of the individual books that comprise it. A database table containing this information might include, among others, the columns PART_NUMBER and PART_NAME. For a row describing an individual part, PART_NUMBER holds the part's identification code and PART_NAME its descriptive name; for a row describing an assembly, PART_NUMBER holds the assembly's code and PART_NAME holds the identification code of one of its component parts. If we selected the rows for assembly ID 10-1-AA, we might see the following:

PART_NUMBER   PART_NAME
10-1-AA       12-8HY-U-87
10-1-AA       9JD7-RT-9
10-1-AA       LK-LG-55624
Left and right joins
Outer joins, in which one table's rows are included in the result set even when the other table has no matching data, are sometimes referred to as left and right joins. Depending on whether the join condition lists the column of the non-matched table on the left or right side of the WHERE condition, the join is considered a left or a right join. Although Oracle doesn't allow left and right joins in a single statement, it will allow a view based on a left join to be included in a query with a right join, and vice versa.

If we really wanted to see the part names as well as the part numbers for the parts that comprise assembly 10-1-AA, we would need to code a self-join:

SELECT a.part_number, p.part_number, p.part_name
FROM assemblies a, assemblies p
WHERE a.part_number = '10-1-AA'
AND a.part_name = p.part_number;

Oracle performs joins in a number of different ways, as summarized in Table 16.2.

Table 16.2  Oracle chooses a join method from among a number of options

Nested Loops: For each row retrieved from the driving table, looks for rows in the driven table
Sort-Merge: Sorts the rows from both tables in order of the join column values and merges the resulting sorted sets
Cluster Join: For each row retrieved from the driving table, looks for matching rows in the driven table on the same block
Hash Join (1): Builds a hash table from the rows in the driving table and uses the same hash formula on each row of the driven table to find matches
Star Query (1)(2): Creates a Cartesian product of the dimension tables and merges the result set with the fact table
Star Transformation (1)(2): Uses bitmap indexes on the dimension tables to build a bitmap index access to the fact table

(1) Method available only when using cost-based optimization.
(2) Any of the other options can be used to join the dimension tables and join that result to the fact table.

When two tables need to be joined, the optimizer evaluates the methods as well as the order in which the tables should be joined. The table accessed first is the driving table; the one accessed next is the driven table. For joins involving multiple tables, there's a primary driving table, and the remaining tables are driven by the results obtained from the previous join results. Two situations will always cause the optimizer to select a specific table order when performing table joins:

Using Rule-Based Optimization

Although Oracle8 still supports the rule-based optimizer, Oracle Corporation strongly encourages you to migrate to cost-based optimization. The rule-based optimizer won't be included in later releases of the database, although it isn't clear just when it will be dropped. Support is being maintained to allow customers time to complete the transfer, tuning, and implementation of their applications, both in-house and third party, to cost-based optimization.

Features not available to the rule-based optimizer
Some features introduced in recent releases can't be used by the rule-based optimizer. These features include partitioned tables, index-only tables, reverse-key indexes, bitmap indexes, parallel queries, hash joins, star joins, star transformations, histograms, and fast full index scans. When the feature is an aspect of an object, such as a reverse-key index, the rule-based optimizer will act as though the object weren't available for use. In cases where there's no choice but to use the feature (such as an index-only table), Oracle will automatically use the cost-based optimizer. Other optional features, such as having a default degree of parallelism on a table, will also cause Oracle to use the cost-based optimizer to take advantage of the feature.

If you're still using rule-based optimization, you should be planning your strategy to convert to the cost-based approach. You can't take advantage of many new features of the database while using the rule-based optimizer, and you may find that you can improve performance by using the newer optimizer without having to implement these new features, should you not have the resources to investigate them. Beginning, or at least anticipating, the conversion now, before there's a concrete deadline you have to meet, should help you realize a better product overall.

If you haven't set the value of the OPTIMIZER_MODE parameter in your initialization file and haven't executed the ANALYZE command to collect statistics for any of the tables, indexes, or clusters used in applications, your applications are probably running under the rule-based optimizer. However, you can't guarantee this, because a hint in a statement or a default degree of parallelism on a referenced object will still cause cost-based optimization to be used for that statement.

The rule-based optimization approach is so named because it follows a standard set of tests when determining what access path to use to obtain the rows required for each step of a statement's execution. Table 16.1 earlier in this chapter shows the possible access paths to a table and includes a rank number to show which approaches are preferred. During rule-based optimization, the table is tested to see if it can be accessed by each access path in turn, beginning with the rank 1 option, and the first possible path is chosen as the access path.

A good reason to continue using rule-based optimization
Although cost-based optimization is becoming the preferred approach, you shouldn't abandon the rule-based approach, if that is what you have been using, without due consideration. Poor performance can result if your database is using cost-based optimization without any statistical information from which to derive good execution plans. Statistics that are no longer current can also be a detriment to cost-based optimization.

If the statement requires a table join, the rule-based approach uses an algorithm to determine the two key elements of the join: first, which will be the driving table and which the driven table; and second, which join method will be used. The rules of this algorithm are as follows:

Changing the execution plan under rule-based optimization
If you don't think the rules will generate the best execution plan for a given statement under rule-based optimization, you can try to influence it. For instance, to stop the optimizer from using an index, you can modify the reference to the column in the WHERE clause by concatenating a NULL or zero-length string for character columns (such as USERNAME || ''), or by adding zero for numeric columns (such as CUSTOMER_ID + 0). This won't change the results returned but will prevent use of the index. To force the use of an index that's being ignored, you may have to rewrite the statement so that the indexed column reference isn't modified, for example by removing a function such as UPPER.
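A minimal sketch of both techniques, using a hypothetical ACCOUNTS table:

-- Suppress an index on a character column by concatenating a zero-length string
SELECT account_id
FROM accounts
WHERE username || '' = 'JSMITH';

-- Suppress an index on a numeric column by adding zero
SELECT account_id
FROM accounts
WHERE customer_id + 0 = 1234;

-- Allow an index to be used by leaving the indexed column unmodified
-- (compare against an upper-case literal instead of wrapping the column in UPPER)
SELECT account_id
FROM accounts
WHERE username = 'JSMITH';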

Using Cost-Based Optimization

As mentioned in the previous section, the cost-based approach will eventually be the only optimization approach available. Meanwhile, it will continue to support new database features that the rule-based approach can't handle. You should plan to convert your applications-if you haven't already-to run under cost-based optimization.

In Figure 16.1, you can see that you can invoke cost-based optimization in a number of ways. You can do it directly with the FIRST_ROWS or ALL_ROWS setting in the initialization file or in an ALTER SESSION command; you can do it less directly by defining a default degree of parallelism on a segment accessed by a statement, or by including a hint (other than RULE) in a statement. If you allow Oracle to use its default behavior to select an optimization approach (which means using the CHOOSE option for the OPTIMIZER_MODE parameter), the choice will be based on whether statistics exist for the segments referenced in the statement.

The reason the choice of optimizer approach depends on statistics is straightforward. Cost-based optimization uses these statistics to compute the relative costs of different execution plans in order to choose the most efficient one. If no statistics are stored, the likelihood that the optimizer will choose a good plan drops significantly, so Oracle prefers to fall back on the rule-based approach.

Of course, if you force the optimizer approach to be cost-based (with a hint, for example), it will work, albeit poorly, in the absence of statistics. It may also perform less than optimally if the statistics it uses are stale and no longer reflect the true nature of the segments they're meant to describe. It's therefore essential that you be prepared to maintain current statistics on the database segments if you want to use cost-based optimization. If you fail to do that, your application developers, and even end users who access the database directly, will have to include hints in many of their statements to ensure that a reasonable execution plan is used.

Statistics are generated and maintained with the ANALYZE command. The syntax for the options related to cost-based statistics is as follows; the other options are covered in other chapters where they are relevant.

ANALYZE TABLE|INDEX|CLUSTER
    [schema.]table_name|index_name|cluster_name
    [PARTITION (partition_name)]
    COMPUTE|ESTIMATE|DELETE STATISTICS
    [table_clause][,...]
    [SAMPLE integer [ROWS|PERCENT]]
What you should analyze
The COMPUTE option of the ANALYZE command generally consumes more resources than the ESTIMATE option and, consequently, can have a greater impact on your application performance. You should compare the results from both options, using different estimated sample sizes, to decide if you really need to incur the extra overhead of computing exact statistics. When analyzing tables with associated indexes, you can also reduce the work performed by the database by estimating the table statistics and then computing exact values on the indexes individually. This will provide the optimizer with the best statistics when accessing the data via the indexes, which is how the data should be retrieved if the indexes are doing their job.

where the parts of the table_clause are as follows:

FOR TABLE: Specifies that the command will create table statistics only; no column or index statistics will be generated
FOR ALL COLUMNS: Specifies that the command will create histogram statistics on every column
SIZE integer: Specifies the maximum number of buckets in the histogram; the default value is 75 if the option isn't included
FOR ALL INDEXED COLUMNS: Specifies that the command will create histogram statistics only on indexed columns
FOR COLUMNS column [,...]: Specifies that the command will create histogram statistics on the named column(s) or object scalar type(s)
FOR ALL INDEXES: Specifies that the command will create statistics on every index of the table, but not on the table itself
FOR ALL LOCAL INDEXES: Specifies that the command will create statistics on every local index partition; must be included if the FOR ALL INDEXES and PARTITION options are both specified
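A few representative commands (the table and index names are made up), following the sidebar's suggestion of estimating table statistics while computing exact index statistics:

-- Estimate table statistics from a 20 percent sample
ANALYZE TABLE orders ESTIMATE STATISTICS SAMPLE 20 PERCENT;

-- Compute exact statistics on one of the table's indexes
ANALYZE INDEX orders_pk COMPUTE STATISTICS;

-- Remove the table's statistics
ANALYZE TABLE orders DELETE STATISTICS;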

The table_clause options that create histograms should be used if your table has a very uneven distribution of values in columns used for retrieval. When different values are stored in a column, the optimizer assumes that each of them appears about the same number of times. If some values occur only rarely and one or two others occur in a large proportion of the records, this assumption may not lead to a good execution plan. The frequently occurring values should be accessed by a full table scan, whereas the infrequently appearing values would be best retrieved via an index.

By building a histogram, you provide the optimizer with the information it needs to distinguish between these two types of values and assist it in building a good execution plan. The number of buckets, or partitions, in the histogram determines how finely the different values are distinguished. The more buckets, the greater the chance that the histogram will show the frequency of occurrence of any specific value in the column. If you need to isolate only one or two disproportionately occurring values, however, you need fewer buckets.
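For instance (hypothetical names again), a skewed STATUS column might be given a small histogram, or every indexed column could get one with the default number of buckets:

-- Histogram on one skewed column, using 10 buckets
ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS status SIZE 10;

-- Histograms on every indexed column, with the default 75 buckets
ANALYZE TABLE orders COMPUTE STATISTICS FOR ALL INDEXED COLUMNS;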

You can use the ANALYZE command to recalculate statistics any time you want without having to delete the old ones first. You should plan to perform re-analysis on a regular basis if the segment changes frequently.

Keeping statistics current
You should monitor the statistics on your database segments to make sure that they stay current. I recommend that you begin by executing the ANALYZE command and recording the statistics from the related view: DBA_TABLES, DBA_INDEXES, or DBA_CLUSTERS. Re-execute the ANALYZE command a month later and compare the new statistical values; if they're close to the previous month's, you shouldn't need to perform another analysis for a few more months. If the statistics are very different, you may need to check again in a week. If they're somewhat different, you should plan to re-analyze the table every month. Over time, you should develop a sense of how frequently each segment needs to be analyzed. You may need to run a program once a week, once a month, or at some other fixed interval. Your program may analyze just a few segments each time it's run, with additional segments every other time, more every third or fourth time, and so on.
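One way to follow this approach (a sketch only; the STATS_HISTORY table name and APPOWNER schema are made up) is to snapshot the dictionary values each time you analyze and compare them later:

-- Create the history table the first time
CREATE TABLE stats_history AS
SELECT SYSDATE snapshot_date, owner, table_name, num_rows, blocks, avg_row_len
FROM dba_tables
WHERE owner = 'APPOWNER';

-- On later runs, append the current values for comparison
INSERT INTO stats_history
SELECT SYSDATE, owner, table_name, num_rows, blocks, avg_row_len
FROM dba_tables
WHERE owner = 'APPOWNER';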

When a statement is processed with cost-based optimization, the execution plan will include the table selection access paths and join methods based on the lowest estimated costs. These costs take into account the number of Oracle blocks that have to be manipulated, the number of reads that may need to occur to retrieve these blocks from disk into memory, the amount of additional memory that may be needed to process the data (such as space to complete sorts or hash joins), and the cost of moving data across any networks.

If you've built your database objects with application schemas-that is, where all the objects belonging to an application are owned by the same user-you can simplify the task of collecting statistics for cost-based optimization. Oracle provides a procedure, ANALYZE_SCHEMA, in its DBMS_UTILITY package, which will run the ANALYZE command for you against every segment in a named schema. If you haven't already done so, you need to execute the CATPROC.SQL script, which you can find in the admin subdirectory of your ORACLE_HOME directory, as SYS to build the necessary PL/SQL structures. You can then execute the required procedure by using SQL*Plus's EXECUTE command or by creating your own PL/SQL routine to run the procedure. The SQL*Plus EXECUTE command would look like this:

EXECUTE dbms_utility.analyze_schema('&username','&option',&rows,&pct)

SEE ALSO
Information about the various Oracle-supplied SQL scripts mentioned in this chapter,

You would substitute the name of the schema holding the segments you want to analyze at the username prompt; the COMPUTE, ESTIMATE, or DELETE keyword at the option prompt; and a number, the keyword NULL, or an empty string ('') for the rows and pct prompts. The last two options are relevant only for the ESTIMATE option, and any values provided are ignored for other options. They indicate the number of rows or the proportion of the table to be included in the sample respectively. If you don't provide a number for either, or set both to zero, the sample uses the default number of rows (1,064). If you provide a number for both, the value for rows is used unless it's zero, in which case the percentage sample size is used.
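For example (the APPOWNER schema name is hypothetical), to estimate statistics for every segment in one schema from a 20 percent sample:

EXECUTE dbms_utility.analyze_schema('APPOWNER','ESTIMATE',NULL,20)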

The statistics collected with the ANALYZE command are used in computing these costs. When the cost-based optimizer is used for a statement that references one or more (or even all) segments that have no statistics available, it still has to evaluate the potential costs of different execution plans. To do this, it uses basic information from the data dictionary and estimates the missing values. Naturally, the results aren't as accurate as they would be with current statistics collected with the ANALYZE command.

Using Hints to Influence Execution Plans

To overcome poor execution plans, whether due to missing or out-of-date statistics or to an unusual distribution of data in a table or index that the optimizer doesn't anticipate, you can include hints in a statement. Hints are similar to the "tweaks" I suggested you can use to try to modify the behavior of rule-based execution plans, but they're more sophisticated and give you a much wider range of options.

Table names in hints
If you use a table alias in the FROM clause of a statement, you must also use that alias in the hint string when referencing the table. Your statement won't fail if you forget to do this, but the hint will be treated as ordinary comment text and won't be acted on as you expected.

Oracle publishes a complete list of available hints with descriptions of what they do and how to use them in the Oracle8 Server Tuning manual, so I won't reproduce that data here. I do include the details required to include a hint in a statement, as this can be confusing:
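In brief (a sketch; the table, alias, and index names are hypothetical): a hint is written as a comment whose first character after the comment opener is a plus sign, and the comment must immediately follow the SELECT, UPDATE, or DELETE keyword. The table alias, not the table name, appears inside the hint when an alias is used in the FROM clause:

-- Multi-line comment style; the /*+ must directly follow the statement keyword
SELECT /*+ INDEX(e emp_dept_idx) */ e.empno, e.ename
FROM employees e
WHERE e.deptno = 10;

-- The same hint using the single-line comment style
SELECT --+ INDEX(e emp_dept_idx)
e.empno, e.ename
FROM employees e
WHERE e.deptno = 10;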