Oracle9i Data Warehousing Guide Release 2 (9.2) Part Number A96520-01 |
|
Change Data Capture efficiently identifies and captures data that has been added to, updated, or removed from, Oracle relational tables, and makes the change data available for use by applications. Change Data Capture is provided as an Oracle database server component with Oracle9i.
This chapter introduces Change Data Capture in the following sections:
See Also:
Oracle9i Supplied PL/SQL Packages and Types Reference for more information about the Change Data Capture publish and subscribe PL/SQL packages. |
Oftentimes, data warehousing involves the extraction and transportation of relational data from one or more source databases, into the data warehouse for analysis. Change Data Capture quickly identifies and processes only the data that has changed, not entire tables, and makes the change data available for further use.
Without Change Data Capture, database extraction is a cumbersome process in which you move the entire contents of tables into flat files, and then load the files into the data warehouse. This ad hoc approach is expensive in a number of ways.
Change Data Capture does not depend on intermediate flat files to stage the data outside of the relational database. It captures the change data resulting from INSERT
, UPDATE
, and DELETE
operations made to user tables. The change data is then stored in a database object called a change table, and the change data is made available to applications in a controlled way.
Table 15-1 describes the advantages of performing database extraction with Change Data Capture.
A Change Data Capture system is based on the interaction of a publisher and subscribers to capture and distribute change data, as described in the next section.
Most Change Data Capture systems have one publisher that captures and publishes change data for any number of Oracle source tables. There can be multiple subscribers accessing the change data. Change Data Capture provides PL/SQL packages to accomplish the publish and subscribe tasks.
The publisher is usually a database administrator (DBA) who is in charge of creating and maintaining schema objects that make up the Change Data Capture system. The publisher performs these tasks:
DBMS_LOGMNR_CDC_PUBLISH
, to set up the system to capture data from one or more source tables.GRANT
and REVOKE
statements to grant and revoke the SELECT
privilege on change tables for users and roles.The subscribers, usually applications, are consumers of the published change data. Subscribers subscribe to one or more sets of columns in source tables. Subscribers perform the following tasks:
DBMS_LOGMNR_CDC_SUBSCRIBE
, to subscribe to source tables for controlled access to the published change data for analysis.SELECT
statements to retrieve change data from the subscriber views.The Change Data Capture system captures the effects of DML statements, including INSERT
, DELETE
, and UPDATE
, when they are performed on the source table. As these operations are performed, the change data is captured and published to corresponding change tables.
To capture change data, the publisher creates and administers change tables, which are special database tables that capture change data from a source table.
For example, for each source table for which you want to capture data, the publisher creates a corresponding change table. Change Data Capture ensures that none of the updates are missed or duplicated.
Each subscriber has its own view of the change data. This makes it possible for multiple subscribers to simultaneously subscribe to the same change table without interfering with one another.
Figure 15-1 shows the publish and subscribe model in a Change Data Capture system.
Text description of the illustration syncfig1.gif
For example, assume that the change tables in Figure 15-1 contains all of the changes that occurred between Monday and Friday, and also assume that:
Subscribers 1 and 2 each have a unique subscription window that contains a block of transactions. Change Data Capture manages the subscription window for each subscriber by creating a subscriber view that returns a range of transactions of interest to that subscriber. The subscriber accesses the change data by performing SELECT
statements on the subscriber view that was generated by Change Data Capture.
When a subscriber needs to read additional change data, the subscriber makes procedure calls to extend the window and to create a new subscriber view. Each subscriber can walk through the data at its own pace, while Change Data Capture manages the data storage. As each subscriber finishes processing the data in its subscription window, it calls procedures to drop the subscriber view and purge the contents of the subscription window. Extending and purging windows is necessary to prevent the change table from growing indefinitely, and to prevent the subscriber from seeing the same data again.
Thus, Change Data Capture provides the following benefits for subscribers:
This section describes the Change Data Capture components shown in Figure 15-2. The publisher is responsible for all of the components shown in Figure 15-2, except for the subscriber views. The publisher creates and maintains all of the schema objects that make up the Change Data Capture system, and publishes change data so that subscribers can use it.
Subscribers are the consumers of change data and are granted controlled access to the change data by the publisher. Subscribers subscribe to one or more columns in source tables.
With synchronous data capture, the change data is generated as data manipulation language (DML) operations are made to the source table. Every time a DML operation occurs on a source table, a record of that operation is written to the change table.
The following subsections describe Change Data Capture components in more detail.
A source system is a production database that contains source tables for which Change Data Capture will capture changes.
A source table is a database table that resides on the source system that contains the data you want to capture. Changes made to the source table are immediately reflected in the change table.
A change source represents a source system. There is a system-generated change source named SYNC_SOURCE
.
A change set represents the collection of change tables. There is a system-generated change set named SYNC_SET
.
A change table contains the change data resulting from DML statements made to a single source table. A change table consists of two things: the change data itself, which is stored in a database table, and the system metadata necessary to maintain the change table. A given change table can capture changes from only one source table. In addition to published columns, the change table contains control columns that are managed by Change Data Capture. See "Columns in a Change Table" for more information.
A publication provides a way for publishers to publish multiple change tables on the same source table, and control subscriber access to the published change data. For example, Publication A consists of a change table that contains all the columns from the EMPLOYEE
source table, while Publication B contains all the columns except the salary column from the EMPLOYEE
source table. Because each change table is a separate publication, the publisher can implement security on the salary column by allowing only selected subscribers to access Publication A.
A subscriber view is a view created by Change Data Capture that returns all of the rows in the subscription window. In Figure 15-2, the subscribers have created two views: one on columns 7 and 8 of Source Table 3 and one on columns 4, 6, and 8 of Source Table 4 The columns included in the view are based on the actual columns that the subscribers subscribed to in the source table.
A subscription window defines the time range of change rows that the subscriber can currently see. The oldest row in the window is the low watermark; the newest row in the window is the high watermark. Each subscriber has a subscription window.
Change Data Capture comes pre-packaged with the appropriate Oracle9i drivers already installed with which you can implement synchronous data capture.
In addition, note that Change Data Capture uses Java. Therefore, when you install the Oracle9i database server, ensure that Java is enabled.
Change Data Capture installs systemwide triggers on the CREATE
TABLE
, ALTER
TABLE
, and DROP
TABLE
statements. If system triggers are disabled on the database instance, Change Data Capture will not function correctly. Therefore, you should never disable system triggers.
To remove Change Data Capture from the database, the SQL script rmcdc.sql
is provided in the admin
directory. This will remove the system triggers that CDC installs on the CREATE
TABLE
, ALTER
TABLE
and DROP
table statements. In addition, rmcdc.sql
removes all Java classes used by Change Data Capture. Note that after rmcdc.sql
is called, CDC will no longer operate on the system. If the system administrator decides to remove the Java Virtual Machine from a database instance, rmcdc.sql
must be called before rmjvm
is called.
To re-install Change Data Capture, the SQL script initcdc.sql
is provided in the admin
directory. It creates the CDC system triggers and Java classes that are required by Change Data Capture.
Change Data Capture does not support the direct-path INSERT
statement (and, by association, the multi_table_insert
statement) feature in parallel DML mode.
When you create a change table, Change Data Capture creates triggers on the source table. Because a direct-path INSERT
disables all database triggers, any rows inserted into the source table using the SQL statement for direct-path INSERT
in parallel DML mode will not be captured in the change table.
Similarly, Change Data Capture cannot capture the inserted rows from multitable insert operations because the SQL multi_table_insert
statement in parallel DML mode uses direct-path INSERT
. Also, note that the multitable insert operation does not return an error message to indicate that the triggers used by Change Data Capture did not fire.
See Also:
Oracle9i SQL Reference for more information regarding multitable inserts, direct-path |
You grant privileges for a change table separately from the privileges you grant for a source table. For example, a subscriber that has privileges to perform a SELECT
operation on a source table might not have privileges to perform a SELECT
operation on a change table.
The publisher controls subscribers' access to change data by using the SQL GRANT
and REVOKE
statements to grant and revoke the SELECT
privilege on change tables for users and roles. The publisher must grant the SELECT
privilege before a user or application can subscribe to the change table.
The publisher must not grant any DML access (using either the INSERT
, UPDATE
, or DELETE
statements) to the subscribers on the change tables because of the risk that a subscriber might inadvertently change the data in the change table, making it inconsistent with its source. Furthermore, the publisher should avoid creating change tables in schemas to which users have DML access.
A change table contains the change data resulting from DML statements. A change table consists of two things: the change data itself, which is stored in a database table and the system metadata necessary to maintain the change table.
The change table contains control columns that are managed by Change Data Capture. Table 15-2 describes the contents of a change table.
Information about the Change Data Capture environment is provided in the views described in Table 15-3.
Note: See also Oracle9i Database Reference for complete information about views. |
Synchronous data capture provides up-to-the-second accuracy because the changes are being captured continuously and in real time on the production system. The change tables are populated after DML operations occur on the source table.
While synchronous mode data capture adds overhead to the system at capture time, it can reduce cost by simplifying the extraction of change data.
This section provides step-by-step instructions for setting up a Change Data Capture system to capture and publish data from one or more Oracle relational source tables. Change Data Capture captures and publishes only committed data.
You need to decide which Oracle instance will be the source system that will provide the change data. The publisher needs to gather requirements from the subscribers and determine which source system contains the relevant source tables.
You need to create the change tables that will contain the changes to individual source tables. Use the DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE
procedure to create change tables.
Create a change table for each source table to be published, and decide which columns should be included. For update operations, decide whether to capture old values, new values, or both.
The publisher can set the options_string
field of the DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE
procedure to have more control over the physical properties and tablespace properties of the change tables. The options_string
field can contain any option available on the CREATE TABLE
DDL
statement.
The following example creates a change table that captures changes that happen to a source table. The example uses the sample table SCOTT.EMP
.
EXECUTE DBMS_LOGMNR_CDC_PUBLISH.CREATE_CHANGE_TABLE (OWNER => 'cdc',\ CHANGE_TABLE_NAME => 'emp_ct', \ CHANGE_SET_NAME => 'SYNC_SET', \ SOURCE_SCHEMA => 'scott', \ SOURCE_TABLE => 'emp',\ COLUMN_TYPE_LIST =>. 'empno number, ename varchar2(10), job varchar2(9), mgr number, hiredate date, deptno number', \ CAPTURE_VALUES => 'both', \ RS_ID => 'y' \ ROW_ID => 'n', \ USER_ID => 'n', \ TIMESTAMP => 'n', \ OBJECT_ID => 'n', \ SOURCE_COLMAP => 'y', \ TARGET_COLMAP => 'y', \ OPTIONS_STRING => null);
This statement creates a change table named emp_ct
within the change set SYNC_SET
. The column_type_list
parameter identifies the columns captured by the change table. The source_schema
and source_table
parameters identify the schema and source table that reside on the production system.
The capture_values
setting in the example indicates that for UPDATE
operations, the change data will contain two separate rows for each row that changed: one row will contain the row values before the update occurred, and the other row will contain the row values after the update occurred.
This section describes storage management and how the publisher is able to manage change tables and subscriptions.
To ensure that the size of change tables does not grow without limit, Change Data Capture manages the data in change tables and automatically purges change data that is no longer needed. The DBMS_CDC_PUBLISH.PURGE
procedure should be called periodically to removed data from change tables that is no longer required. PURGE
looks at all active subscription windows to determine which change data is still in use. It will not purge any data as long as subscribers have active subscription windows that reference the change data.
Subscribers must call DBMS_CDC_SUBSCRIBE.PURGE_WINDOW
when they are finished using change data. This indicates to CDC that the change data is no longer needed, and that PURGE
may safely remove the unneeded rows. Conversely, until all subscribers have called PURGE_WINDOW
on their subscription windows, the change data is considered still in use: PURGE
will not remove those rows from the change table.
It is possible that a subscriber could fail to call PURGE_WINDOW
, with the end result being that a change table would not be purged. The DBA_SUBSCRIPTIONS
view helps the publisher determine if this is happening. In extreme circumstances a publisher may decide to drop an active subscription so that space can be reclaimed. An example might be that the subscriber is an applications program that is not calling PURGE_WINDOW
as needed. The DBA_CDC_PUBLISH.DROP_SUBSCRIPTION
procedure lets the publisher drop active subscriptions if circumstances require it: however, the publisher should first consider that subscribers may still be using the change data. You must use DBMS_CDC_PUBLISH.DROP_SUBSCRIBER_VIEW
to drop any subscriber views prior to dropping a subscription using the DBMS_CDC_PUBLISH.DROP_SUBSCRIPTION
procedure.
The PURGE
procedure normally runs in a job queue, therefore it runs automatically. The publisher can execute PURGE
manually at any time however.
Note that it is not possible to drop change tables by using the conventional DROP
TABLE
statement. If it is necessary to drop a change table, the procedure DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE
must be called. This procedure ensures that both the change table itself as well as the CDC metadata for it are both dropped. If you try to use DROP
TABLE
on a change table, it will raise the error:
ORA-31496 must use DBMS_CDC_PUBLISH.DROP_CHANGE_TABLE
to drop change tables
DROP_CHANGE_TABLE
procedure also safeguards the publisher from inadvertently dropping a change table while there are active subscribers that are using the change table. If DROP_CHANGE_TABLE
is dropped while subscriptions are active, the procedure will fail with the Oracle error:
ORA-31424 change table has active subscriptions
If the publisher really wants to drop the change table in spite of active subscriptions, DROP_CHANGE_TABLE
procedure must be called using the parameter FORCE => 'Y'
. This tells CDC to override its normal safeguards and allow the change table to be dropped despite active subscriptions. The subscriptions will no longer be valid, and subscribers will lose access to the change data.
The subscribers, typically applications, register their interest in one or more source tables, and obtain subscriptions to these tables. Assuming sufficient access privileges, the subscribers may subscribe to any source tables that the publisher has published.
The primary role of the subscriber is to access and use the change data. To do this, the subscriber must first determine which source tables are of interest, and then call the procedures in the DBMS_LOGMNR_CDC_SUBSCRIBE
package to access them.
Query the ALL_SOURCE_TABLES
view to see all of the published source tables for which the subscriber has access privileges.
Call the DBMS_LOGMNR_CDC_SUBSCRIBE
.GET_SUBSCRIPTION_HANDLE
procedure to create a subscription.
The following example shows how the subscriber first names the change set of interest (SYNC_SET
), and then returns a unique subscription handle that will be used throughout the session.
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.GET_SUBSCRIPTION_HANDLE ( \ CHANGE_SET => 'SYNC_SET',\ DESCRIPTION => 'Change data for emp',\ SUBSCRIPTION_HANDLE => :subhandle);
Use the DBMS_LOGMNR_CDC_SUBSCRIBE
.SUBSCRIBE
procedure to specify which columns of the source tables are of interest to the subscriber and are to be captured.
The subscriber identifies the columns of the source table that are of interest. A subscription can contain one source table or multiple tables from the same change set. To see all of the published source table columns for which the subscriber has privileges, query the ALL_PUBLISHED_COLUMNS
view.
In the following example, the subscriber wants to see only one source table.
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.SUBSCRIBE (\ SUBSCRIPTION_HANDLE => :subhandle, \ SOURCE_SCHEMA => 'scott', \ SOURCE_TABLE => 'emp', \ COLUMN_LIST => 'empno, ename, hiredate');
Use the DBMS_LOGMNR_CDC_SUBSCRIBE
.ACTIVATE_SUBSCRIPTION
procedure to activate the subscription.
Subscribers call this procedure when they are finished subscribing to source tables, and are ready to receive change data. Whether subscribing to one or multiple source tables, the subscriber needs to call the ACTIVATE_SUBSCRIPTION
procedure only once.
In the following example, the ACTIVATE_SUBSCRIPTION
procedure sets the subscription window to empty. At this point, no additional source tables can be added to the subscription.
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION ( \ SUBSCRIPTION_HANDLE => :subhandle);
Call the DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW
procedure to set the upper boundary (called a high-water mark) for a subscription window.
For example:
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.EXTEND_WINDOW (\ SUBSCRIPTION_HANDLE => :subhandle);
At this point, the subscriber has created a new window that begins where the previous window ends. The new window contains any data that was added to the change table. If no new data has been added, the EXTEND_WINDOW
procedure has no effect. To access the new change data, the subscriber must call the CREATE_SUBSCRIBER_VIEW
procedure, and select from the new subscriber view that is generated by Change Data Capture.
Use the DBMS_LOGMNR_CDC_SUBSCRIBE.PREPARE_SUBSCRIBER_VIEW
procedure to create and prepare a subscriber view. (You must do this for each change table in the subscription.)
Subscribers do not access data directly from a change table; subscribers see the change data through subscriber views and perform SELECT
operations against them. The reason for this is because Change Data Capture generates a view that restricts the data to only the columns to which the application has subscribed, and returns only the rows that the application has not viewed previously. The contents of the subscriber view will not change.
The following example shows how to prepare a subscriber view:
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.PREPARE_SUBSCRIBER_VIEW ( \ SUBSCRIPTION_HANDLE => :subhandle, \ SOURCE_SCHEMA => 'scott',\ SOURCE_TABLE => 'emp', \ VIEW_NAME => :viewname);
Use the SQL SELECT
statement on the subscriber view to read and query the contents of change tables (within the boundaries of the subscription window). You must do this for each change table in the subscription. For example:
SELECT * FROM CDC#CV$119490;
The subscriber view name, CDC#CV$119490
, is a generated name.
Use the DBMS_LOGMNR_CDC_SUBSCRIBE
.DROP_SUBSCRIBER_VIEW
procedure to drop the subscriber views.
Change Data Capture guarantees not to change the subscriber view, even if new data has been added. Subscribers continue to have access to a subscriber view until calling the DROP_SUBSCRIBER_VIEW
procedure, which indicates the subscriber is finished using the view. For example:
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.DROP_SUBSCRIBER_VIEW (\ SUBSCRIPTION_HANDLE => :subhandle, \ SOURCE_SCHEMA => 'scott', \ SOURCE_TABLE => 'emp');
Use the DBMS_LOGMNR_CDC_SUBSCRIBE
.PURGE_WINDOW
procedure to let the Change Data Capture software know that the subscriber no longer needs the data in the current subscription window.
For example:
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.PURGE_WINDOW (\ SUBSCRIPTION_HANDLE => :subhandle);
Repeat steps 5 though 9 as long as you are interested in additional change data.
Use the DBMS_LOGMNR_CDC_SUBSCRIBE
.DROP_SUBSCRIPTION
procedure to end the subscription. This is necessary to prevent the change tables from growing without bound. For example:
EXECUTE SYS.DBMS_LOGMNR_CDC_SUBSCRIBE.DROP_SUBSCRIPTION (\ SUBSCRIPTION_HANDLE => :subhandle);
The Change Data Capture environment is dynamic in nature. The publisher can add and drop change tables at any time. The publisher can also add to and drop columns from existing change tables at any time. The following list describes how changes to the Change Data Capture environment affect subscriptions:
PREPARE_SUBSCRIBER_VIEW
procedure, an appropriate exception is generated. This becomes the notification mechanism.NULL
for the newly added control column field.DBMS_LOGMNR_CDC_PUBLISH.EXTEND_WINDOW
procedure) such that the low watermark for the window crosses over the point when the control column was added.When exporting or importing change tables for Change Data Capture, consider the following information:
DBMS_CDC_PUBLISH.PURGE
procedure). If a change table is imported, but no subscriptions are taken out before the purge job runs (24 hours later, by default), then all rows in the table will be purged.
Choose one of the following methods to prevent the purging of data from a change table:
DBMS_JOB
package to either disable the job (using the BROKEN
procedure) or execute the job sometime in the future when there are subscriptions (using the NEXT_DATE
procedure).
Assume that you have a source table Employees
that has an associated change table CT_Employees
. When you import data into Employees
, that data is also recorded in CT_Employees
.
ROW_ID
column, the ROW_ID
columns stored in the change table have meaning only if the associated source table has not been imported. If a source table is re-created or imported, each row will have a new ROW_ID
that is unrelated to the ROW_ID
that was previously recorded in a change table.
|
Copyright © 1996, 2002 Oracle Corporation. All Rights Reserved. |
|