









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Community
Ask the community for help and clear up your study doubts
Discover the best universities in your country according to Docsity users
Free resources
Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors
Material Type: Paper; Class: Database Systems; Subject: InfSystemsTechnologyManagement; University: George Washington University; Term: Spring 2007;
Typology: Papers
1 / 16
This page cannot be seen from the preview
Don't miss anything!
An Oracle Technical White Paper
June 1997
Oracle provides multiple indexing schemes to provide efficient and quick access to data. Oracle has developed different types of indexes to provide the optimal performance when accessing data in tables. Developers or database administrators must select the indexing strategies which are best for the table they are using in an application. It is common to have more than one index on a table so that performance can be optimized for a variety of queries or access paths.
Oracle offers several types of indexing and storage options to provide optimal performance, manageability, and availability benefits. This paper is intended to be a guide to determine which index strategies are appropriate for particular requirements.
General requirements of systems vary by application. In data warehouse applications, query performance and the management of large amounts of data is very important. In on-line transactional processing (OLTP) applications, transaction and query performance are important, along with high availability of the application. In applications that manage unstructured data, such as information retrieval systems, the ability to manage and quickly access this complex, unstructured data is important. Oracle8 offers a variety of indexing strategies that support these requirements in existing and emerging applications.
Oracle8 provides the richest set of indexing functionality of any DBMS on the market. By fully integrating this technology with the optimizer in Oracle8, very fast access to data is transparently assured in nearly all application situations. The optimizer is aware of the various indexing methods used on a table, and will select the index that is most appropriate and efficient for selecting the data requested.
Oracle8 offers several different types of indexes and data storage options that provide performance, availability, and manageability benefits. In this paper, the following topics are discussed:
B-tree indexes and storage alternatives to maximize performance Bitmapped indexes to improve data warehouse query performance Partitioning indexes to improve index availability, performance, and
manageability Index-organized tables for high performance and reduced storage
Indexes are optional database structures associated with tables and are created to speed access to data within a table. Just as an index in the back of a book helps a reader locate information faster than looking through each page of the book, an index on a table provides a faster access path to table data. In addition to providing fast access to table data, indexes can also be used to enforce uniqueness of column values in a table, such as an order identifier.
Indexes are transparent to applications and users from the standpoint that the presence, or absence, of an index does not require a change in the wording of any SQL statement. An index is merely a fast-access path to the data; it affects only the speed of execution. Given a data value that has been indexed, the index points directly to the location of the rows containing that value.
Since applications and users access data in a variety of ways, multiple indexes can be created on a table. The Oracle optimizer will choose the "best" index to use in the case where more than one index is available for accessing requested data.
Oracle automatically maintains and uses indexes once they are created. Oracle automatically reflects changes to data, such as adding new rows, updating rows, or deleting rows, in all relevant indexes with no additional action required by users.
Retrieval performance of indexed data remains almost constant, even as new rows are inserted. However, the presence of many indexes on a table decreases the performance of updates, deletes, and inserts because Oracle must also update the indexes associated with the table.
Index Types
From a user or application view, indexes transparently provide fast access to table data. From a DBAs view, indexes are physical objects that provide the mechanism to access data quickly and also provide database integrity. Oracle offers DBAs two choices of index types, each physically stored in different structures and each targeted to benefit certain application requirements. These two types are B-tree indexes and bitmapped indexes. A table can also be created as an index-organized table, or as a clustered table. These are not indexes, but provide some of the functionality of indexes. Both are discussed later in this document.
B-tree indexes are the most common index type used in Oracle applications and provide excellent levels of functionality and performance. Used in both OLTP and data warehouse applications, they speed access to table data when users execute queries with varying criteria, such as equality conditions and range conditions.
Bitmapped indexes are specifically designed to improve performance of queries in data warehouse applications where there are large amounts of data but a low level of on-line transaction activity.
When to Create Indexes
B-tree indexes improve the performance of queries that select a small percentage of rows from a table. As a general guideline, an index should be created on tables that are often queried for less than a few percent (i.e. between two and four is a general guideline). This value may be higher in situations where all data can be retrieved from an index, or where the indexed columns can be used for joining to other tables.
A B-tree index can dramatically improve response times when an application or user selects data where:
A key has a specific value (^) Values are unique or close to unique Rows are within a range of values, such as a date range
There are certain instances when a B-tree index is not appropriate and will not improve performance of queries. In many of these instances, such as a column in a data warehouse with relatively few distinct values, a bitmapped index can be created to dramatically improve performance. The benefits of a B-tree index and examples are discussed later in this document.
Composite Indexes
A composite index (also called a concatenated index) is an index that is created on multiple columns in a table. Columns in a composite index can appear in any order, although the order does impact on whether or not the complete index is used in queries.
Composite indexes can speed retrieval of data in queries in which the WHERE clause references all or the leading portion of the columns in the composite index. Therefore, it is important to give some thought to the order of the columns used in the definition; generally, the most commonly accessed or most selective columns should be ordered first, or as the "leading edge" of the index.
A typical example of a composite index would be an index on a SALES_LINEITEM table, where the columns for the VENDOR_ID and the PART_NO number are indexed. The index would be fully utilized if the query criteria contained a VENDOR_ID and a PART_NO. If a query was issued that only contained the VENDOR_ID as part of the criteria, the index will still be used to quickly find matching vendors.
How to Choose Columns to Index
Follow these guidelines for choosing columns to index:
Consider indexing columns that are used frequently in WHERE clauses. Consider indexing columns that are used frequently to join tables in SQL statements. Only index columns with good selectivity. The selectivity of an index is the percentage of rows in a table having the same value for the indexed column. An
index’s selectivity is good if few rows have the same value.
As stated above, if a query requests a data value that is indexed, and that index is used in the access plan, the query can be satisfied via the index, without having to access the table. This should be considered when designing an index. In certain instances, it may be beneficial to index an additional column that is frequently returned, even though it is not used in the WHERE criteria. By indexing this column, the query can be satisfied solely from the index access, saving the performance overhead of following the ROWID pointer back to the table. In this case, the additional storage overhead may be worth the performance improvements in certain queries.
It is important to note that standard B-tree indexes should not be created on columns with few distinct values, frequently updated columns, or columns that appear in WHERE clauses with functions operating on them. Reasons for this are:
Columns with few distinct values have poor selectivity and will degrade performance since the index will be accessed very often and will point back to the table to retrieve the data. In this case, a full table scan, or access without an index, will be much faster since the full table scan can be done with multi-block reads, and without the overhead of reading the index and then reading the table data. For example, in a column with low selectivity, such as a column with a ‘yes’ or ‘no’ indicator, a standard B-tree index would not be efficient for the above reason, but a bitmap index, discussed later, may be very effective. Columns that are frequently modified should not be indexed. UPDATE statements that modify indexed columns and INSERT and DELETE statements that modify indexed tables take longer than if there were no index. Such SQL statements must modify data in indexes as well as data in tables. Columns that only appear in WHERE clauses with functions or operators should not be indexed. A WHERE clause that uses a function (other than MIN or MAX) or an operator with an indexed column will prevent an index from being accessed. (^) Unique and Non-Unique Indexes.
Indexes can be created as unique or non-unique. Unique indexes guarantee that no two rows of a table have duplicate values in the columns that define the index. Non- unique indexes do not impose this restriction on the column values.
Oracle recommends that unique indexes not be explicitly defined on tables; uniqueness is strictly a logical concept and should be associated with the definition of a table. Instead, define UNIQUE integrity constraints on the desired columns. Oracle enforces UNIQUE integrity constraints by automatically defining a unique index on the unique key.
A non-unique index can be used to enforce uniqueness. This is beneficial when a non- unique index already exists for a table and a primary key constraint is later added with the same columns of the existing non-unique index. In this case, the non-unique index can be used to enforce the uniqueness, rather than having to drop and rebuild the index with the UNIQUE parameter. This is especially useful in data warehouse environments where very large amounts of data exist in tables, making dropping and recreating an index impractical.
Before an index can be used to improve query performance, the index needs to be created, which in most instances is a significant and resource-intensive operation. Oracle provides the functionality to create, or if the need arises, re-create, an index as efficiently as possible. To create or re-create an index, the operation can be run in parallel, meaning that multiple processes can work simultaneously to complete the operation. Since the table must be scanned to initially create the index and sometimes to re-create an index, Oracle’s parallel query functionality is utilized to quickly scan the table data and create the index.
Oracle8 offers the functionality to partition an index. There are several advantages to partitioning an index, all of which are described later in this document. One such advantage is the ability to maintain index partitions independent of the other index partitions. This allows DBAs to run maintenance operations, such as create or drop, at the partition level, improving the performance and availability of indexes.
DBAs may wish to recreate an index in order to compact it and clean up fragmented space, or to change the index’s storage characteristics. When creating a new index which is a subset of an existing index, or when rebuilding an existing index with new storage characteristics, Oracle can use the existing index instead of the base table to improve performance.
Consider, for example, a table named CUST with columns NAME, CUSTID, PHONE, ADDR, BALANCE, and an index named I_CUST_CUSTINFO on columns NAME, CUSTID and BALANCE of the table. To create a new index named I_CUST_CUSTNO on columns NAME and CUSTID, the following command would be entered:
CREATE INDEX I_CUST_CUSTNO on CUST(NAME,CUSTID)
Oracle will then automatically use the existing index (I_CUST_CUSTINFO) to create the new index, rather than accessing the entire table. Note that the syntax used is the same as if the index I_CUST_CUSTINFO did not exist.
Similarly, if you have an index on the EMPNO and MGR columns of the EMP table and you want to change the storage characteristics of that composite index, Oracle can use the existing index to create the new index.
You would use the ALTER INDEX REBUILD command to reorganize or compact an existing index, or to change its storage characteristics. The REBUILD command uses the existing index as the basis for the new index. All index storage commands are supported, such as STORAGE (for extent allocation), TABLESPACE (to move the index to a new tablespace), and INITRANS (to change the initial number of entries).
By performing these index operations using the above functionality, significant time savings can be realized, especially in data warehouse applications where table sizes are very large. These highly efficient operations are also important to a mission- critical OLTP application where availability is a critical requirement, since the index is usable and accessible much quicker.
Bitmapped indexing is an alternative indexing scheme that provides substantial performance benefits and storage savings. Bitmapped indexes are particularly appropriate for data warehousing environments, where ad-hoc queries are more common and data is updated less frequently. When used in conjunction with conventional indexing schemes, such as B-tree indexes, and parallel processing techniques, bitmapped indexes provide significant performance benefits to a data warehouse.
Oracle8 offers several index types and storage options when creating an index. B-tree indexes, as described earlier, are most effective for high-cardinality data, which is when a column has many possible values, such as customer name or phone number. A bitmapped index is most appropriate and effective for low-cardinality data, where a column has relatively few distinct values.
An example of low-cardinality data is a GENDER column, which contains only two possible values: MALE and FEMALE. In this example, it is obvious that the GENDER column is a low-cardinality column. Other columns, with upwards of thousands of distinct values may still be considered low-cardinality. Low-cardinality is not a fixed number, but is defined as a proportion of distinct values to the number of total rows in a table. In a table of one million rows, a column with 10,000 values may be considered low-cardinality.
Creating a Bitmap Index
When a bitmap index is created on a column, a bit stream (ones and zeros) is created for each distinct value in the indexed column. A one or zero represents whether or not a record in the table is equal to the bit-streamed value.
For example, the figure below shows data in a customer table:
Since marital_status, region, gender, and income_level are all low-cardinality columns, these columns are candidates for bitmap indexes. The customer# column is a high-cardinality column, so a B-tree index should be used to index this column.
The figure below shows how a bitmap index of the region column would be built, considering the data in the above table:
customer# marital_status region gender (^) income_level 101 single east male (^) bracket 1 102 married central female (^) bracket 4 103 married west female (^) bracket 2 104 divorced west male (^) bracket 4 105 single central female (^) bracket 2 106 married central female (^) bracket 3
region=east region=central (^) region=west (^1 0 ) (^0 1 )
storage savings over B-tree indexes. A bitmapped index may require 100 times less space than a B- tree index for a low-cardinality column.
However, a strict comparison of the relative sizes of B-tree and bitmapped indexes is not an accurate measure. Because of the performance characteristics of bitmapped indexes and B-tree indexes, users should continue to maintain B-tree indexes for their high-cardinality data, while creating bitmapped indexes for their low-cardinality data.
With bitmapped indexes, the Oracle8 server solves the dilemma associated with multiple-column B-tree indexes. Because bitmapped indexes can be efficiently combined during query execution, three small single-column bitmapped indexes are a sufficient functional replacement for six three-column B-tree indexes. While the bitmapped indexes may not be quite as efficient during execution as the appropriate concatenated B-tree indexes, the space savings provided by bitmapped indexes more than justifies their utilization.
An area where DBAs will see significant storage and performance gains when using bitmapped indexes is when the Oracle8 star query optimization algorithm is used. The Oracle8 star query optimization uses bitmapped indexes on the foreign key columns of the large fact table. By using single-column bitmapped indexes, significant storage
savings are achieved over the Oracle7™^ star query optimization where concatenated column B-tree indexes were required. Bitmapped indexes also provide the needed functionality to support efficient processing of star queries with sparse fact tables because rows matching the query criteria can be quickly located and processed.
Benefits of Oracle’s Bitmap Indexes
Oracle’s implementation of bitmapped indexes provide unique functionality and performance benefits that make them a integral component of most application indexing strategies.
Integration
By having bitmapped indexes integrated into the server, Oracle users will be able to reap the benefits of bitmapped indexes without sacrificing any other functionality. Users can fully exploit all of Oracle’s features, including distributed and parallel functionality, integrity constraints, database triggers, database views, partitioning, and security features. Utilizing bitmapped queries requires no changes to SQL queries, standard programmatic interfaces, or any Oracle tools.
Since bitmapped indexes are an integrated part of the Oracle server, the optimizer is fully aware of the index as soon as it is created, and can then be immediately used for resolving queries. Unlike other DBMSs, where bitmapped indexes are a separate server with a different architecture and complex administrative environment, Oracle bitmapped indexes are just another built-in indexing technique that is available to the DBA to provide efficient access to data.
B-Tree indexes and bitmapped indexes can not only be created on a single table, but they can be used together to satisfy queries. For example, if a query requests data
using criteria on a column that has a B-tree index on it, such as VENDOR_ID, and a column with a bitmap index on it, such as COLOR, both indexes can be used to quickly satisfy the query.
Parallel Bitmap Index Creation
Since users will often be creating bitmapped indexes on very large tables, Oracle provides parallel bitmapped index creation, which significantly reduces the time to create the index.
Compression
Oracle utilizes patented compression technology that not only offers superior storage savings, but also improves query execution performance by providing the ability to operate upon compressed bitmaps during query processing.
Insert/Update/Delete Support
Oracle’s bitmapped indexes support insert, update, and delete operations. This functionality offers users the freedom to make small updates to their data without needing to recreate all of their bitmapped indexes.
Parallel Index Access
For very large tables, users may require performance beyond that of bitmapped indexes. In these cases, Oracle provides parallel execution of bitmapped index access.
Multi-Column Bitmapped Indexes
Oracle provides users with the ability to create multi-column bitmapped indexes. Such indexes could be useful, for example, when two low-cardinality columns are always accessed together.
Flexibility
While other products require users to build bitmapped indexes on all columns (including raw data columns, such as price and quantity, and text columns, such as comments and descriptions), Oracle users only need to index those columns that result in improved query performance. Additionally, Oracle allows users to create B-tree, bitmapped, and clustered indexes on the same table.
Oracle8 allows tables and indexes to be partitioned or broken up into smaller pieces based on a range of key values. Partitioning provides significant benefits for availability, performance, and manageability, especially for large tables and indexes.
Oralce8 offers several alternatives when partitioning indexes. Partitioning provides the functionality to satisfy the varied requirements of different types of applications, whether it is availability for a mission-critical OLTP application, or manageability in
in certain data design implementations, where data is only accessed in one way. This type of table, the index-organized table, is useful in information retrieval applications (such as those using the Oracle8 ConText Cartridge) and in some data warehouse applications. An index-organized table is structurally similar to a B-tree index; however all table data is stored in the B-tree structure itself.
In a B-tree index, the table data is stored in a database object, and the index data, the B-tree, is stored in another database object. The index data object includes the indexed column values and a ROWID pointing into the data table. In other words, the indexed values are stored twice, once in the table, and once in the index.
In many situations, the indexed columns represent just a small fraction of the overall data in the row. In these cases, the index is much small than the table. However, in other situations, an index is comprised of a majority, if not all the columns of the table. In these cases, there is a tremendous amount of redundant data stored. This is the type of table design that would benefit from an index-organized table, which reduces storage significantly.
In an index-organized table, only one database object is used to store both the data and the index. In this case, everything is stored in a B-tree index structure—there is no database object used to only store the data. The B-tree index structure holds both the indexed column values and the remaining column values. No ROWIDs are needed or stored since ROWIDs are used to point from an index to a data table row, which is unnecessary in this type of structure.
To take advantage of this feature, a table is created using the option INDEX ORGANIZED. When a table is created as an index-organized table it is transparent to most applications and users. Applications manipulate the index-organized table just like a regular table, using standard SQL statements for query, insert, update, or delete operations. Some restrictions exist when implementing index-organized tables, such as no secondary indexes can be created on the table. Also, in some cases, an updated row may force the entire row to be moved.
Figure 1: Differences between a table and traditional B-tree index, and an index- organized table.
Because all data, the key columns and the remaining columns, is stored in the B-tree structure, index-organized tables provide a faster, key-based access to table data for queries involving exact match and/or range search. Once the search has located the key values, the remaining data is present at that location, so there is no need to follow a ROWID back to the table data, as would be the case with a table and B-tree index structure. This eliminates one I/O, namely the read of the table.
The storage requirements are reduced as key columns and are not duplicated in the table and the index. Also, since ROWIDs are not necessary, this storage amount is saved for each row.
Clusters and Hash Clusters
Clusters
Clusters are another optional method of storing table data. A cluster is one or more tables that share the same data blocks because they share common columns and are often used together.
For example, the EMP and DEPT table share the DEPTNO column. When you cluster the EMP and DEPT, Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.
Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer two primary benefits:
Disk I/O is reduced and access time improves for joins of clustered tables. In a cluster, a cluster key value is the value of the cluster key columns for a
Oracle Corporation World Headquarters 500 Oracle Parkway Redwood Shores, CA 94065 U.S.A.
Worldwide Inquiries: +1.415.506. Fax +1.415.506. http://www.oracle.com/
Copyright © Oracle Corporation 1997 All Rights Reserved
This document is provided for informational purposes only, and the information herein is subject to change without notice. Please report any errors herein to Oracle Corporation. Oracle Corporation does not provide any warranties covering and specifically disclaims any liability in connection with this document.
Oracle and Oracle Parallel Server are registered trademarks and Enabling the Information Age and Oracle8 are trademarks of Oracle Corporation.
All other company and product names mentioned are used for identification purposes only and may be trademarks of their respective owners.