Oracle9i Database Administrator's Guide Release 2 (9.2) Part Number A96521-01 |
|
This chapter describes how to manage hash clusters, and contains the following topics:
See Also:
Chapter 14, "Managing Space for Schema Objects" is recommended reading before attempting tasks described in this chapter. |
Storing a table in a hash cluster is an optional way to improve the performance of data retrieval. A hash cluster provides an alternative to a nonclustered table with an index or an index cluster. With an indexed table or index cluster, Oracle locates the rows in a table using key values that Oracle stores in a separate index. To use hashing, you create a hash cluster and load tables into it. Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function.
Oracle uses a hash function to generate a distribution of numeric values, called hash values, that are based on specific cluster key values. The key of a hash cluster, like the key of an index cluster, can be a single column or composite key (multiple column key). To find or store a row in a hash cluster, Oracle applies the hash function to the row's cluster key value. The resulting hash value corresponds to a data block in the cluster, which Oracle then reads or writes on behalf of the issued statement.
To find or store a row in an indexed table or cluster, a minimum of two (there are usually more) I/Os must be performed:
In contrast, Oracle uses a hash function to locate a row in a hash cluster; no I/O is required. As a result, a minimum of one I/O operation is necessary to read or write a row in a hash cluster.
This section helps you decide when to use hash clusters by contrasting situations where hashing is most useful against situations where there is no advantage. If you find your decision is to use indexing rather than hashing, then you should consider whether to store a table individually or as part of a cluster.
Note: Even if you decide to use hashing, a table can still have separate indexes on any columns, including the cluster key. |
See Also:
|
Hashing is useful when you have the following conditions:
SELECT ... WHERE cluster_key = ...;
In such cases, the cluster key in the equality condition is hashed, and the corresponding hash key is usually found with a single read. In comparison, for an indexed table the key value must first be found in the index (usually several reads), and then the row is read from the table (another read).
Hashing is not advantageous in the following situations:
SELECT . . . WHERE cluster_key < . . . ;
With an index, key values are ordered in the index, so cluster key values that satisfy the WHERE
clause of a query can be found with relatively few I/Os.
A hash cluster is created using a CREATE CLUSTER
statement, but you specify a HASHKEYS
clause. The following example contains a statement to create a cluster named trial_cluster
that stores the trial
table, clustered by the trialno
column (the cluster key); and another statement creating a table in the cluster.
CREATE CLUSTER trial_cluster (trialno NUMBER(5,0)) PCTUSED 80 PCTFREE 5 TABLESPACE users STORAGE (INITIAL 250K NEXT 50K MINEXTENTS 1 MAXEXTENTS 3 PCTINCREASE 0) HASH IS trialno HASHKEYS 150; CREATE TABLE trial ( trialno NUMBER(5,0) PRIMARY KEY, ...) CLUSTER trial_cluster (trialno);
As with index clusters, the key of a hash cluster can be a single column or a composite key (multiple column key). In this example, it is a single column.
The HASHKEYS
value, in this case 150, specifies and limits the number of unique hash values that can be generated by the hash function used by the cluster. Oracle rounds the number specified to the nearest prime number.
If no HASH IS
clause is specified, Oracle uses an internal hash function. If the cluster key is already a unique identifier that is uniformly distributed over its range, you can bypass the internal hash function and specify the cluster key as the hash value, as is the case in the above example. You can also use the HASH IS
clause to specify a user-defined hash function.
You cannot create a cluster index on a hash cluster, and you need not create an index on a hash cluster key.
For additional information about creating tables in a cluster, guidelines for setting parameters of the CREATE CLUSTER
statement common to index and hash clusters, and the privileges required to create any cluster, see Chapter 18, "Managing Clusters". The following sections explain and provide guidelines for setting the parameters of the CREATE CLUSTER
statement specific to hash clusters:
See Also:
|
You can also create a single-table hash cluster, which provides fast access to rows in a table. However, this table must be the only table in the hash cluster. Essentially, there must be a one-to-one mapping between hash keys and data rows. The following statement creates a single-table hash cluster named peanut
with the cluster key variety
:
CREATE CLUSTER peanut (variety NUMBER) SIZE 512 SINGLE TABLE HASHKEYS 500;
Oracle rounds the HASHKEY
value up to the nearest prime number, so this cluster has a maximum of 503 hash key values, each of size 512 bytes.
When creating a hash cluster, it is important to choose the cluster key correctly and set the HASH IS
, SIZE
, and HASHKEYS
parameters so that performance and space use are optimal. The following guidelines describe how to set these parameters.
Choosing the correct cluster key is dependent on the most common types of queries issued against the clustered tables. For example, consider the emp
table in a hash cluster. If queries often select rows by employee number, the empno
column should be the cluster key. If queries often select rows by department number, the deptno
column should be the cluster key. For hash clusters that contain a single table, the cluster key is typically the entire primary key of the contained table.
The key of a hash cluster, like that of an index cluster, can be a single column or a composite key (multiple column key). A hash cluster with a composite key must use Oracle's internal hash function.
Specify the HASH IS
parameter only if the cluster key is a single column of the NUMBER
datatype, and contains uniformly distributed integers. If the above conditions apply, you can distribute rows in the cluster so that each unique cluster key value hashes, with no collisions (two cluster key values having the same hash value), to a unique hash value. If these conditions do not apply, omit this option so that you use the internal hash function.
SIZE
should be set to the average amount of space required to hold all rows for any given hash key. Therefore, to properly determine SIZE
, you must be aware of the characteristics of your data:
SIZE
can be set to the average row size in the cluster.SIZE
can be set to the average amount of space required to hold all rows associated with a representative hash value.Further, once you have determined a (preliminary) value for SIZE
, consider the following. If the SIZE
value is small (more than four hash keys can be assigned for each data block) you can use this value for SIZE
in the CREATE CLUSTER
statement. However, if the value of SIZE
is large (four or fewer hash keys can be assigned for each data block), then you should also consider the expected frequency of collisions and whether performance of data retrieval or efficiency of space usage is more important to you.
HASH IS
) and you expect few or no collisions, you can use your preliminary value of SIZE
. No collisions occur and space is used as efficiently as possible.SIZE
as shown in the following chart.
Available Space for each Block/Calculated SIZE | Setting for SIZE |
---|---|
1 |
|
2 |
|
3 |
|
4 |
|
>4 |
|
Overestimating the value of SIZE
increases the amount of unused space in the cluster. If space efficiency is more important than the performance of data retrieval, disregard the above adjustments and use the original value for SIZE
.
For maximum distribution of rows in a hash cluster, Oracle rounds the HASHKEYS
value up to the nearest prime number.
The following examples show how to correctly choose the cluster key and set the HASH IS
, SIZE
, and HASHKEYS
parameters. For all examples, assume that the data block size is 2K and that on average, 1950 bytes of each block is available data space (block size minus overhead).
You decide to load the emp
table into a hash cluster. Most queries retrieve employee records by their employee number. You estimate that the maximum number of rows in the emp
table at any given time is 10000 and that the average row size is 55 bytes.
In this case, empno
should be the cluster key. Since this column contains integers that are unique, the internal hash function can be bypassed. SIZE
can be set to the average row size, 55 bytes. Note that 34 hash keys are assigned for each data block. HASHKEYS
can be set to the number of rows in the table, 10000. Oracle rounds this value up to the next highest prime number: 10007.
CRE
ATE CLUSTER emp_cluster (empno NUMBER) . . . SIZE 55 HASH ISempno HASHKEYS 10000;
Conditions similar to the previous example exist. In this case, however, rows are usually retrieved by department number. At most, there are 1000 departments with an average of 10 employees for each department. Department numbers increment by 10 (0, 10, 20, 30, . . . ).
In this case, deptno
should be the cluster key. Since this column contains integers that are uniformly distributed, the internal hash function can be bypassed. A preliminary value of SIZE
(the average amount of space required to hold all rows for each department) is 55 bytes * 10, or 550 bytes. Using this value for SIZE
, only three hash keys can be assigned for each data block. If you expect some collisions and want maximum performance of data retrieval, slightly alter your estimated SIZE
to prevent collisions from requiring overflow blocks. By adjusting SIZE
by 12%, to 620 bytes (refer to "Setting SIZE"), there is more space for rows from expected collisions.
HASHKEYS
can be set to the number of unique department numbers, 1000. Oracle rounds this value up to the next highest prime number: 1009.
CREATE CLU
STER emp_cluster (deptno NUMBER) . . . SIZE 620 HASH IS deptno HASHKEYS 1000;
As with index clusters, it is important to estimate the storage required for the data in a hash cluster.
Oracle guarantees that the initial allocation of space is sufficient to store the hash table according to the settings SIZE
and HASHKEYS
. If settings for the storage parameters INITIAL
, NEXT
, and MINEXTENTS
do not account for the hash table size, incremental (additional) extents are allocated until at least SIZE*HASHKEYS
is reached. For example, assume that the data block size is 2K, the available data space for each block is approximately 1900 bytes (data block size minus overhead), and that the STORAGE
and HASH
parameters are specified in the CREATE CLUSTER
statement as follows:
STORAGE (INITIAL 100K NEXT 150K MINEXTENTS 1 PCTINCREASE 0) SIZE 1500 HASHKEYS 100
In this example, only one hash key can be assigned for each data block. Therefore, the initial space required for the hash cluster is at least 100*2K or 200K. The settings for the storage parameters do not account for this requirement. Therefore, an initial extent of 100K and a second extent of 150K are allocated to the hash cluster.
Alternatively, assume the HASH
parameters are specified as follows:
SIZE 500 HASHKEYS 100
In this case, three hash keys are assigned to each data block. Therefore, the initial space required for the hash cluster is at least 34*2K or 68K. The initial settings for the storage parameters are sufficient for this requirement (an initial extent of 100K is allocated to the hash cluster).
You can alter a hash cluster with the ALTER CLUSTER
statement:
ALTER CLUSTER emp_dept . . . ;
The implications for altering a hash cluster are identical to those for altering an index cluster, described in "Altering Clusters". However, the SIZE
, HASHKEYS
, and HASH IS
parameters cannot be specified in an ALTER CLUSTER
statement. To change these parameters, you must re-create the cluster, then copy the data from the original cluster.
You can drop a hash cluster using the DROP CLUSTER
statement:
DROP CLUSTER emp_dept;
A table in a hash cluster is dropped using the DROP TABLE
statement. The implications of dropping hash clusters and tables in hash clusters are the same as those for dropping index clusters.
The following views display information about hash clusters:
See Also:
Oracle9i Database Reference for complete descriptions of these views |
|
Copyright © 2001, 2002 Oracle Corporation. All Rights Reserved. |
|