Oracle9i OLAP Developer's Guide to the OLAP DML Release 2 (9.2) Part Number A95298-01 |
|
Defining Data Objects, 8 of 11
A variable with sparse data is one in which a relatively high percentage of the cells of the variable do not contain actual data. Such "empty," or NA, values take up storage space in the analytic workspace.
There are two types of sparsity:
month
for which you do not have data for past months. The cells exist because you have defined past months in the month
dimension, but the cells are empty.A composite is an internal object that is used to store sparse data compactly in a variable. A composite is a list of dimension-value combinations in which one value is taken from each of the dimensions on which the composite is based.
Composites can be named or unnamed:
SPARSE
keyword to request that an unnamed composite is automatically created.DEFINE
COMPOSITE
command. Later, when you are defining or accessing a variable, you can specify this composite by name along with the names of other dimensions.Because the values in composites are maintained automatically, using composites is the recommended way of handling sparsity in your analytic workspace.
Using composites is one of the most important steps you can take to manage sparsity, which contributes to keeping analytic workspace size to a minimum and promoting good performance.
Using named composites makes it easier to track which variables share the same composite. A named composite in the dimension list of a variable tells Oracle OLAP that the dimensions in the named composite are sparse in this variable, and that this composite is shared only with other variables that have the same sparsity pattern.
In contrast, all variables defined with an unnamed composite that have exactly the same dimensions in the same order will automatically share that unnamed composite. If these variables have different sparsity patterns, performance will suffer.
You can also manage sparsity by using a conjoint dimension to hold dimension-value combinations for which a given variable has data. However, because the values in composites are automatically maintained, using composites is the recommended way of handling sparsity in your analytic workspace.
When you define a multidimensional variable, you can specify a composite in the list of dimensions.
First, define a named composite by using the DEFINE COMPOSITE
command. Then, define the variables by using the following syntax to include a named composite in the dimension list of each variable.
composite-name
<dims
>
For example, suppose you define a composite named proddist
, whose dimensions include product
and district
, as shown in the following command.
DEFINEproddist
COMPOSITE <product
district
>
Now, suppose you want to define a sales
variable in which time
will be the fastest-varying dimension and the proddist
composite will be the slowest-varying dimension, as shown in the following command.
DEFINE sales <time proddist<product district>>
Note that you should never use the SPARSE
keyword with a composite. Essentially, you use the name of the composite instead of the SPARSE
keyword.
See Also:
|
You can use the RENAME
command to:
When you define a multidimensional variable, you can specify that a composite is used instead of its base dimensions to dimension the data. Later, as you add values to the dimensions of the variable for which you defined a composite, the following actions are taken:
For a variable that uses a composite, cells are created for only those dimension values that are used in the composite dimension-value combinations; it does not create a variable cell for every value in the base dimensions. Data for a variable is stored in order, cell by cell, for each combination of dimension values. From the perspective of data storage, each combination of base dimension values in a composite is treated like the value of a regular dimension. This means that if you define a variable with one regular dimension and one composite, then it is stored like a two-dimensional variable.
If your company does promotional marketing for certain products in some but not all districts, then your variable data will be sparse along the product
and district
dimensions. Therefore, suppose you define a composite named proddist
, whose base dimensions are product
and district
. There are dimension-value combinations in the composite only for those values that have data. For example, if you run a promotion for tents but not canoes, then the composite includes the tents and city combinations, but not the canoes and city combinations.
The following command creates a variable called promo
that is dimensioned by month
and a composite named proddist
, whose base dimensions are product
and district
.
DEFINE promo INTEGER <month proddist<product district>>
The following conceptual figure illustrates the promo
variable that is created by this command, the month
, product
and district
base dimensions, a named composite (proddist
) created from the product
and district
base dimensions, and the internal relation that is created between the product
and district
base dimensions and the proddist
composite.
The following table is an example of the sequence in which the data for the promo
variable might be stored.
TENTS BOSTON JAN95 | TENTS BOSTON FEB95 |
TENTS BOSTON MAR95 |
. . . . . . . . . |
RACQUETS CHICAGO JAN95 |
RACQUETS CHICAGO FEB95 |
. . . . . . . . . |
---|---|---|---|---|---|---|
257 |
379 |
428 |
. . . |
635 |
192 |
. . . |
When you specify a composite for just one dimension in a variable definition, a single-dimension composite is created. The values of this composite will be a subset of the values in its base dimension.
It is a good idea to use single dimension composites when a variable will share the same dimensions as some other variables, but for a particular single dimension, the variable will only have data for some of the values of the dimension.
Suppose you have already defined a variable called actual
with the dimensions line
, division
, and month
. The actual
variable does not contain any NA values. You need to define a variable called budget
, which requires much less detail than actual
. For example, budget
only needs 10 percent of the line
dimension values, while actual
needs all of them.
If you define budget
without setting sparsity, then all of the line
dimension values are present for every month
and division
, but 90 percent of the line
dimension cells will have NA values.
To handle sparse data in this case, you define budget
with an unnamed composite for only the line
dimension as shown below.
DEFINE budget DECIMAL <SPARSE <line> division month>
|
Copyright © 2001, 2002 Oracle Corporation. All Rights Reserved. |
|