Oracle® Database 2 Day + Data Warehousing Guide 11g Release 1 (11.1) Part Number B28314-01 |
|
|
View PDF |
Using Oracle Warehouse Builder, you can design a warehouse that is either relational or dimensional.
Warehouse Builder explicitly separates dimensional design from physical implementation. You can choose either a relational implementation or a multidimensional implementation for the dimensional objects using a simple click operation. Therefore, you can implement the same dimensional object as a relational target warehouse or a multidimensional warehouse.This section shows you how to design a dimensional model implemented as a relational target warehouse. You model a small data warehouse consisting of a cube and two dimensions. Although you can use Warehouse Builder to model complex snowflake schemas, for the purposes of this demonstration, you model a simple star schema consisting of a cube with foreign key references to the two dimensions.
This section includes the following topics:
To define a relational target warehouse, complete the following steps:
Designate a schema as the warehouse target schema as described in "Identifying the Warehouse Target Schema".
Define or import source and target objects into the warehouse target module.
In general, you can right-click on any node in the warehouse target module and select either New or Import. Warehouse Builder launches the appropriate wizard to guide you. Click Help for additional information.
The types of objects you add to the warehouse target module depend on the nature of your source data and what it is you want to accomplish with the data warehouse.
To continue with the exercises presented in this guide, see "Exercise: Adding External Tables to the Target Module" and "Exercise: Understanding Dimensions".
Configure the source and target objects as necessary.
Some objects require additional configuration. After you import or define an object in the warehouse module, right-click and select Configure to review the settings and make changes as necessary.
In a traditional data warehousing implementation, there is typically only one target schema, which is the data warehouse target. In Warehouse Builder, however, any user that you define has the potential to become a target schema. Therefore, you need to indicate which of the schemas is to be the data warehouse target schema.
To designate a schema as the data warehouse target schema:
Register the schema in Warehouse Builder.
In the Global Explorer panel, expand the Security node. Right-click the Users node and select New.
On the Create User dialog box, select Create DB User... and follow the prompts. Click Help if you need more information.
For the purposes of the demonstration, create a new schema and call it EXPENSE_WH.
Specify the location information for the new schema.
In the Connection Explorer, right click and select New from Locations under the Oracle node.
Create a location named EXPENSE_WH_LOCATION.
Select the option to test the connection.
In the Project Explorer, associate a module with the schema location.
Recall that in "Example: Importing Metadata from Flat Files", you created a module to correspond to a location from which you import metadata. In a similar fashion, you need to create a module to correspond to the location for the target schema.
In the OWB_DEMO
project, expand the Databases node, right-click the Oracle node, and select New. Follow the prompts in the Create Module wizard. Be sure to designate the module status as Warehouse Target.
For the purposes of the demonstration, name the module EXPENSE_WH.
Familiarize yourself with the new data warehouse target schema.
In the Project Explorer, expand the node for the newly defined warehouse target module. Notice the various types of objects listed under the node. These are the types of objects that you can either define in or import into the module.
The types of objects you add to the target module have implications on the ETL logic you subsequently design. If your source data originates from flat files, you can choose to generate either SQL*Loader code or SQL code. Each type of code has its own advantages. Stated concisely, SQL*Loader handles large volumes of data better while SQL enables a broader range of complex joins and transformations.
To utilize SQL*Loader in Warehouse Builder, import the flat files as described in "Example: Importing Metadata from Flat Files". To utilize SQL, however, you must define an external table in the warehouse module as described in "Exercise: Adding External Tables to the Target Module".
External tables are database objects in the Oracle Database, versions 9i and higher. You cannot use external tables with any other database type or any Oracle Database previous to the 9i release.
External tables are tables that represent data from flat files in a relational format. They are read-only tables that act like regular source tables in Warehouse Builder. Each external table you create corresponds to a single record type in an existing flat file.
Our objective in this exercise is to create the necessary external tables for the two flat files we previously imported. Since both files have a single record type, we need to create only one external table for each file.
To add external tables to the target warehouse module:
In the Project Explorer, expand the Databases node and then the Oracle node.
Expand the target module where you want to create the external table.
That is, expand the EXPENSE_WH
module.
Right-click the External Tables node and select New.
Warehouse Builder displays the Create External Table wizard. Follow the prompts.
Name the external table EXPENSE_CATEGORIES.
When prompted to select a flat file, select EXPENSE_CATEGORIES_CSV.
Repeat the previous step to create an external table called EXPENSE_DATA to represent EXPORT_CSV.
Configure the physical file system details for the two external tables.
Right-click an external table from the module and select Configure. On the DataFiles node, right-click and select Create. Accept the default name, NEW_DATAFILE_1.
Enter the name of the flat file from which the external table inherits data. Therefore, specify the data file name as expense_categories.csv
for one external table and export.csv
for the other.
A dimension is a structure that organizes data. Examples of commonly used dimensions are Customers, Time, and Products.
For relational dimensions, using dimensions improves query performance because users often analyze data by drilling down on known hierarchies. An example of a hierarchy is the Time hierarchy of year, quarter, month, day. The Oracle Database uses these defined hierarchies by rewriting queries that retrieve data from materialized views rather than detail tables.
Typical relational dimension tables have the following characteristics:
A single column primary key populated with values called warehouse keys.
Warehouse keys that provide administrative control over the dimension, support techniques that preserve dimension history, and reduce the size of cubes.
One or more hierarchies that are explicitly defined as dimension objects. Hierarchies maximize the number of query rewrites by the Oracle server.
Dimensions are the primary organizational unit of data in a star schema. Examples of some commonly used dimensions are Customer, Product, and Time.
A dimension consists of a set of levels and a set of hierarchies defined over these levels. When you create a dimension, you define the following:
Dimension Attributes: A descriptive characteristic of a dimension member. It has a name and a data type.
Levels: Defines the level of aggregation of data. For example, the dimension Products can have the following levels: Total, Groups, and Product.
Level attributes: A descriptive characteristic of a level member. Each level in the dimension has a set of level attributes.
Hierarchies: A logical structure that uses ordered levels or a set of data values (for a value-based hierarchy) as a means of organizing data. A hierarchy describes parent-child relationships among a set of levels
To understand the basic concepts and design of a dimension, in this exercise you examine a pre-defined dimension.
To become familiar with the dimensions:
Open the PRODUCTS dimension in the Data Object Editor.
In the Project Explorer panel, navigate to OWB_DEMO,
Databases, Oracle, SALES_WH,
and then expand Dimensions. Double-click PRODUCTS.
Warehouse Builder launches the Data Object Editor. The Data Object Editor is the single interface where you can easily design, create, and manage a variety of database or dimensional objects.
Observe the dimension attributes.
A dimension attribute is a descriptive characteristic of a dimension member. It has a name and a data type. A dimension attribute is applicable to one or more levels in the dimension. They are implemented as level attributes to store data.
The list of dimension attributes must include all the attributes that you may need for any of the levels in the dimension.
For example, the Products dimension has a dimension attribute called Description. This attribute is applicable to all the levels Total, Groups, and Products and stores the description for each of the members of these levels.
Observe the levels.
The levels in a dimension represent the level of aggregation of data. A dimension must contain at least one level, except in the case of a dimension that contains a value-based hierarchy. Every level must have level attributes and a level identifier.
For example, the dimension Products can have the following levels: Total, Groups, and Product.
For additional exercises, you can create a time dimension and a regular dimension by following the Oracle By Example at
http://www.oracle.com/technology/obe/10gr2_owb/owb10gr2_gs/owb/lesson3/relational-wh-03.htm
Every level must have two identifiers: a surrogate identifier and a business identifier. When you create a dimension, each level must implement the dimension attributes marked as the surrogate identifier and business identifier (attributes, in the case of a composite business identifier) of the dimension.
A surrogate identifier uniquely identifies each level record across all the levels of the dimension. It must be composed of a single attribute. Surrogate identifiers enable you to hook facts to any dimension level as opposed to the lowest dimension level only.
For a dimension that has a relational implementation, the surrogate identifier should be of the data type NUMBER
. Because the value of the surrogate identifier must be unique across all dimension levels, you use the same sequence to generate the surrogate identifier of all the dimension levels.
For a relational implementation, the surrogate identifier serves the following purposes:
If a child level is stored in a different table from the parent level, each child level record stores the surrogate identifier of the parent record.
In a fact table, each cube record stores only the surrogate identifier of the dimension record to which it refers. By storing the surrogate identifier, the size of the fact table that implements the cube is reduced.
A business identifier consists of a user-selected list of attributes. The business identifier must be unique across the level and is always derived from the natural key of the data source. The business identifier uniquely identifies the member. For example, the business identifier of a Product level can be its Universal Product Code (UPC), which is a unique code for each product.
The business identifier does the following:
Identifies a record in business terms.
Provides a logical link between the fact and the dimension or between two levels.
Enables the lookup of a surrogate key.
When you populate a child level in a dimension, you must specify the business identifier of its parent level. When you populate a cube, you must specify the business identifier of the dimension level to which the cube refers.
A parent identifier is used to annotate the parent reference in a value-based hierarchy.
For example, an EMPLOYEE dimension with a value-based hierarchy, has the following dimension attributes: ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE, JOB_ID, HIRE_DATE, and MANAGER_ID. In this dimension, ID is the surrogate identifier and MANAGER_ID is the parent identifier.
A level attribute is a descriptive characteristic of a level member. Each level in the dimension has a set of level attributes. To define level attributes, you just select the dimension attributes that the level will implement. A level attribute has a distinct name and a data type. The data type is inherited from the dimension attribute that the level attribute implements. The name of the level attribute can be modified to be different from that of the dimension attribute that it implements.
Every level must implement the attribute marked as the surrogate identifier and the business identifier in the set of the dimension attributes.
A dimension hierarchy is a logical structure that uses ordered levels or a set of data values (for a value-based hierarchy) as a means of organizing data. A hierarchy describes parent-child relationships among a set of levels. A level-based hierarchy must have at least one level. A level can be part of more than one hierarchy.
For example, the Time dimension can have the following two hierarchies:
Fiscal Hierarchy: Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
Calendar Hierarchy: Calendar Year > Calendar Quarter > Calendar Month > Day
All hierarchies must be strict 1:n relationships. One record in a parent level corresponds to multiple records in a child level. But one record in a child level corresponds to only one parent record within a hierarchy.
A dimension role is an alias for a dimension. In a data warehouse, a cube can refer to the same dimension multiple times, without requiring the dimension to be stored multiple times. Multiple references to the same dimension may cause confusion. So you create an alias for each reference to the dimension, thus allowing the joins to be instantly understandable. In such cases, the same dimension performs different dimension roles in the cube.
For example, a sales record can have the following three time values:
Time the order is booked
Time the order is shipped
Time the order is fulfilled
Instead of creating three time dimensions and populating them with data, you can use dimension roles. Model one time dimension and create the following three roles for the time dimension: order booked time, order shipped time, and order fulfillment time. The sales cube can refer to the order time, ship time, and fulfillment time dimensions.
When the dimension is stored in the database, only one dimension is created and each dimension role references this dimension. But when the dimension is stored in the OLAP catalog, Warehouse Builder creates a dimension for each dimension role. Thus, if a time dimension has three roles, three dimensions are created in the OLAP catalog. However, all three dimensions are mapped to the same underlying table. This is a workaround because the OLAP catalog does not support dimension roles.
Note:
Dimension roles can be created for dimensions that have a relational implementation only.A level relationship is an association between levels in a dimension hierarchy. Level relationships are implemented using level attributes that store the reference to the parent level in the hierarchy.
For example, the Products dimension has the following hierarchy: Total > Groups > Product. Warehouse Builder creates two level relationships: Product to Groups and Groups to Total. Two new attributes implement this level relationship: one in the Product level and one in the Groups level. These attributes store the surrogate ID of the parent level.
An example of a dimension is the Products dimension that you use to organize product data. Table 4-1 lists the levels in the Products dimension and the surrogate identifier and business identifier for each of the levels in the dimension.
Table 4-1 Products Dimension Level Details
Level | Attribute Name | Identifier |
---|---|---|
Total |
ID |
Surrogate |
Name |
Business |
|
Description |
||
Groups |
ID |
Surrogate |
Name |
Business |
|
Description |
||
Product |
ID |
Surrogate |
UPC |
Business |
|
Name |
||
Description |
||
Package Type |
||
Package Size |
The Products dimension contains the following hierarchy:
Hierarchy 1: Total > Groups > Product
Warehouse Builder creates control rows that enable you to link fact data to a dimension at any level. For example, you may want to reuse a Time dimension in two different cubes to record the budget data at the month level and the actual data at the day level. Because of the way dimensions are loaded with control rows, you can perform this without any additional definitions. Each member in a dimension hierarchy is represented using a single record.
All control rows have negative dimension key values starting from -2. For each level value of higher levels, a row is generated that can act as a unique linking row to the fact table. All the lower levels in this linking or control rows are nulled out.
Consider the Products dimension described in Dimension Example. You load data into this dimension from a table that contains four categories of products. Warehouse Builder inserts control rows in the dimension as shown in Table 4-2. These rows enable you to link to a cube at any dimension level. Note that the table does not contain all the dimension attribute values.
Table 4-2 Control Rows Created for the Products Dimension
Dimension Key | Total Name | Categories Name | Product Name |
---|---|---|---|
-3 |
TOTAL |
||
-9 |
TOTAL |
Hardware |
|
-10 |
TOTAL |
Software |
|
-11 |
TOTAL |
Electronics |
|
-12 |
TOTAL |
Peripherals |
To obtain the real number of rows in a dimension, count the number of rows by including a WHERE clause that excludes the NULL rows. For example, to obtain a count on Products, count the number of rows including a WHERE clause to exclude NULL rows in Product.
Implementing a dimension consists of specifying how the dimension and its data are physically stored. Warehouse Builder enables several types of implementations for dimensional objects, including multi-dimensional implementations. However, for the purposes of this guide, we discuss a relational implementation only.
In a star schema implementation, Warehouse Builder stores the dimension data in a single table. Because the same table or view stores data for more than one dimension level, you must specify a dimension key column in the table. The dimension key column is the primary key for the dimension. This column also forms the foreign key reference to the cube.
Each level implements a subset of dimension attributes. By default, the level attribute name is the same as the dimension attribute name. To avoid name conflicts caused by all level data being stored in the same table, Warehouse Builder uses the following guidelines for naming in a star table:
If the level attribute name is not unique, Warehouse Builder prefixes it with the name of the level.
If the level attribute name is unique, Warehouse Builder does not use any prefix.
Note:
To ensure that no prefixes are used, you must explicitly change the level attribute name in the Create Dimension wizard or the Data Object Editor.For example, if you implement the Products dimension using a star schema, Warehouse Builder uses a single table to implement all the levels in the dimension.
Figure 4-1 Star Schema Implementation of Products Dimension
When you perform binding, you specify the database columns that will store the data of each attribute and level relationship in the dimension. You can perform either auto binding or manual binding for a dimension.
Auto Binding When you perform auto binding, Warehouse Builder binds the dimension object attributes to the database columns that store their data. When you perform auto binding for the first time, Warehouse Builder also creates the tables that are used to store the dimension data.
When you perform auto binding on a dimension that is already bound, Warehouse Builder uses the following rules:
If the implementation method of the dimension remains the same, Warehouse Builder rebinds the dimensional object to the existing implementation objects.
For example, you create a Products dimension using the star schema implementation method and perform auto binding. The dimension data is stored in a table called Products. You modify the dimension definition at a later date but retain the implementation method as star. When you now auto bind the Products dimension, Warehouse Builder rebinds the Products dimension attributes to the same implementation tables.
If the implementation method of a dimension is changed, Warehouse Builder deletes the old implementation objects and creates a new set of implementation tables. If you want to retain the old implementation objects, you must first unbind the dimensional object and then perform auto binding. For more information on implementation methods, see "Star Schema".
For example, you create a Products dimension using the star schema implementation method and bind it to the implementation table. You now edit this dimension and change its implementation method to snowflake. When you now perform auto binding for the modified Products dimension, Warehouse Builder deletes the table that stores the dimension data, creates new implementation tables, and binds the dimension attributes and relationships to the new implementation tables.
To perform auto binding:
In the Project Explorer, right-click the dimension and select Open Editor.
The Data Object Editor for this dimension is displayed.
On the Dimensional tab, right-click the Dimension node and select Bind.
Alternatively, select the dimension node on the canvas and from the Object menu choose Bind.
If the Bind option is not enabled, check if the dimension is a relational dimension and that the Manual options is not set in the Implementation section of the Storage tab.
Auto binding uses the implementation settings described in Star Schema.
Manual Binding You would typically use manual binding to bind existing tables to a dimension. Use manual binding if no auto binding or rebinding is required.
To perform manual binding for a dimension:
Create the implementation objects (tables or views) that you will use to store the dimension data.
In the case of relational dimensions, create the sequence used to load the surrogate identifier of the dimension. You can also choose to use an existing sequence.
In the Project Explorer, right-click the dimension and select Open Editor.
The Data Object Editor for the dimension opens. On the canvas, the Dimensional tab is active.
Right-click the dimension and select Detail View.
Warehouse Builder opens a new tab that has the same name as the dimension.
From the Palette, drag and drop the operator that represents the implementation object onto the canvas.
Warehouse Builder displays the Add a New or Existing <Object> dialog box. For example, if the dimension data is stored in a table, drag a Table operator from the Palette and drop it onto the canvas. The Add a New or Existing Table dialog box is displayed.
Choose the Select an existing <Object> option and then select the data object from the list of objects displayed in the selection tree.
Click OK.
A node representing the object that you just added is displayed on the canvas.
If more than one data object is used to store the dimension data, perform steps 4 to 6 for each data object.
Map the attributes in each level of the dimension to the columns that store their data. To do this, hold down your mouse on the dimension attribute, drag, and then drop on the column that stores the attribute value.
Also map the level relationships to the database column that store their data.
For example, for the Products dimension described in "Dimension Example", the attribute Name in the Groups level of the Products dimension is stored in the Group_name attribute of the Products_tab table. Hold down the mouse on the Name attribute, drag, and drop on the Group_name attribute of the Products_tab table.
Cubes contain measures and link to one or more dimensions. The axes of a cube contain dimension members and the body of the cube contains measure values. Most measures are additive. For example, sales data can be organized into a cube whose edges contain values for Time, Products, and Customers dimensions and whose body contains values from the measures Value sales, and Dollar sales.
A cube is linked to dimension tables over foreign key constraints. Since data integrity is vital, these constraints are critical in a data warehousing environment. The constraints enforce referential integrity during the daily operations of the data warehouse.
Data analysis applications typically aggregate data across many dimensions. This enables them to look for anomalies or unusual patterns in the data. Using cubes is the most efficient way of performing these type of operations. In a relational implementation, when you design dimensions with warehouse keys, the cube row length is usually reduced. This is because warehouse keys are shorter than their natural counterparts. This results is lesser amount of storage space needed for the cube data.
A typical cube contains:
A primary key defined on a set of foreign key reference columns or, in the case of a data list, on an artificial key or a set of warehouse key columns. When the cube is a data list, the foreign key reference columns do not uniquely identify each row in the cube.
A set of foreign key reference columns that link the table with its dimensions.
A cube consists of the set of measures defined over a set of dimensions. To create a cube, you must define the following:
A measure is data, usually numeric and additive, that can be examined and analyzed. Examples of measures include sales, cost, and profit. A cube must have one or more measures. You can also perform aggregation of measures. Only numeric measures can be aggregated.
A cube is defined by a set of dimensions. A cube can refer to a level that is not the lowest level in a dimension.
For cubes that use a pure relational implementation, you can reuse the same dimension multiple times with the help of dimension roles. For more information on dimension roles, see "Dimension Roles".
Before you validate a cube, ensure that all the dimensions that the cube references are valid.
To define a dimension reference, specify the following:
The dimension and the level within the dimension to which the cube refers.
For a cube that uses a relational implementation, you can refer to intermediate levels in a dimension. Warehouse Builder supports a reference to the non surrogate identifier of a level, for example, the business keys.
For dimensions that use a relational implementation, a dimension role for each dimension to indicate what role the dimension reference is performing in the cube. Specifying the dimension role is optional.
The Sales cube stores aggregated sales data. It contains the following two measures: Value_sales and Dollar_sales.
Value_sales: Stores the amount of the sale in terms of the quantity sold.
Dollar_sales: Stores the amount of the sale.
Table 4-3 describes the dimensionality of the Sales cube. It lists the name of the dimension and the dimension level that the cube references.
When you implement a cube, you specify the physical storage details for the cube. As with dimensions, Warehouse Builder enables you to implement cubes in relational or multidimensional form. We address the relational implementation in this guide.
The database object used to store the cube data is called a fact table. A cube must be implemented using only one fact table. The fact table contains columns for the cube measures and dimension references.
To implement a cube:
Select a table or materialized view that will store the cube data.
For each measure, select a column that will store the measure data.
For each dimension reference, select a column that will store the dimension reference.
Each dimension reference corresponds to a column on the fact table and optionally a foreign key from the fact table to dimension table. The 1:n relationships from the fact tables to the dimension tables must be enforced.
Figure 4-2 Implementation of the Sales Cube
When you perform binding, you specify the database columns that will store the data of each measure and dimension reference of the cube. You can perform auto binding or manual binding for a cube.
Auto Binding When you perform auto binding, Warehouse Builder creates the table that stores the cube data and then binds the cube measures and references to the database columns. For detailed steps on performing auto binding, see "Auto Binding".
When you perform auto binding for a cube, ensure that you auto bind the dimensions that a cube references before you auto bind the cube. You will not be able to deploy the cube if any dimension that the cube references has been auto bound after the cube was last auto bound.
For example, you create the SALES cube that references the TIMES and PRODUCTS dimensions and perform auto binding for the cube. You later modify the definition of the PRODUCTS dimension. If you now attempt to auto bind the SALES cube again, Warehouse Builder generates an error. You must first auto bind the PRODUCTS dimensions and then auto bind the cube.
Manual Binding In manual binding, you must first create the table or view that stores the cube data and then map the cube references and measures to the database columns that store their data. Alternatively, you can use an existing database table or view to store the cube data.
To perform manual binding for a cube:
Create the table or view that stores the cube data.
In the Project Explorer, right-click the cube and select Open Editor.
The Data Object Editor for the cube opens. On the canvas, the Dimensional tab is active.
Right-click the cube and select Detail View.
Warehouse Builder opens a new tab that has the same name as the cube.
From the Palette, drag and drop the operator that represents the implementation object onto the canvas.
Warehouse Builder displays the Add a New or Existing <Object> dialog box. For example, if the cube data is stored in a table, drag a Table operator from the Palette and drop it onto the canvas. The Add a New or Existing Table dialog box is displayed.
Choose the Select an existing <object> option and then select the data object from the list of objects displayed in the selection tree.
Click OK.
A node representing the object that you just added is displayed on the canvas.
Map the measures and dimension references of the cube to the columns that store the cube data. To do this, hold down your mouse on the measure or dimension reference, drag, and then drop on the data object attribute that stores the measure or dimension reference.