This chapter explains how to perform backup and recovery in Real Application Clusters, including how to use Recovery Manager (RMAN) for backup and recovery processing in Real Application Clusters environments.
The procedures for executing RMAN backups in Real Application Clusters environments do not differ substantially from the RMAN backup procedures for single-instance environments. However, the following topics describe a few issues that are specific to Real Application Clusters:
See Also:
Oracle9i Recovery Manager User's Guide for more information about single-instance RMAN backup procedures
When you start RMAN and connect to the target database, RMAN can only connect to one instance in a Real Application Clusters database at a time. Note that this connection is a utility connection that does not perform any backups or restores, and applies only to the connection made from the RMAN command line.
Assume that node1, node2, and node3 are net service names for three instances in a Real Application Clusters configuration. In this case, connect to the target database with only one of these net service names. For example, you can connect as follows:
% rman TARGET SYS/oracle@node2 CATALOG rman/cat@catdb
In any RMAN connection made through a net service name, each net service name must specify one and only one instance. This rule applies to all RMAN connections, whether they are made from the command line or through the CONNECT clause in ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. Therefore, you cannot specify a net service name that uses Oracle Net features to distribute RMAN connections to more than one instance.
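For example, a net service name used for RMAN connections should resolve to exactly one instance. The following tnsnames.ora entry is a minimal sketch; the host, service name, and instance name are illustrative, and the entry deliberately omits load-balancing and failover options:

node1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-host)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = racdb.example.com)
      (INSTANCE_NAME = racdb1)
    )
  )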
When making backups in a Real Application Clusters configuration, each allocated channel can connect to a different instance in the cluster, and each channel connection must resolve to one and only one instance. For example, configure automatic channels as follows:
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT = 'SYS/oracle@node3';
If the instance to which one of the channels is connected does not have the database open, then the database must not be open by any instance. In other words, either all channels must be connected to open instances, or all channels must be connected to instances that are not open. For example, if the node1 instance has the database mounted while the node2 and node3 instances have the database open, then the backup fails.
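Before starting the backup, you can verify that all instances are in the same state by querying the GV$INSTANCE dynamic performance view from any instance; the following query is a minimal sketch:

SELECT inst_id, instance_name, status FROM gv$instance;

If some instances report OPEN while others report MOUNTED or STARTED, then the backup fails as described above.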
In some cluster database configurations, some nodes of the cluster have faster access to some datafiles than to other datafiles. RMAN automatically detects this affinity; this capability is known as node affinity awareness.
When deciding which channel to use to back up a particular datafile, RMAN gives preference to channels allocated at the nodes that have affinity to the datafiles you want to back up. For example, if you have a three-node cluster, and if node 1 has faster read/write access to datafiles 7, 8, and 9 than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3.
To use node affinity, configure RMAN channels on the nodes of the cluster that have affinity to the datafiles you want to back up. For example, use the syntax:
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
You can manually override the automatic node affinity by specifying which channels should back up which datafiles. For example:
BACKUP
  # channel 1 gets datafile 1
  (DATAFILE 1 CHANNEL ORA_SBT_TAPE_1)
  # channel 2 gets datafiles 2-4
  (DATAFILE 2,3,4 CHANNEL ORA_SBT_TAPE_2)
  # channel 3 gets datafiles 5-10
  (DATAFILE 5,6,7,8,9,10 CHANNEL ORA_SBT_TAPE_3);
See Also:
Oracle9i Recovery Manager User's Guide for more information
Other important considerations for performing RMAN backups in a cluster are discussed under the following headings:
The node performing the backup must be able to read all of the files specified in the BACKUP command. For example, assume that you run the following command on node 1 of a three-node cluster:
BACKUP DATABASE PLUS ARCHIVELOG;
In this case, RMAN attempts to back up all datafiles and archived logs. Because the datafiles are either cluster file system files or files on a shared disk, RMAN can read them. However, RMAN cannot back up any of the logs that the local node cannot read. The archiving schemes in "RMAN Archiving Configuration Schemes" explain how to configure the environment so that all logs are accessible by the node performing the backup.
The BACKUP command must be able to delete the archived logs from disk after backing them up. For example, the DELETE INPUT clause specifies that RMAN should delete only the specific log that it backed up, whereas the DELETE ALL INPUT clause specifies that RMAN should delete all logs that have the same thread and sequence number as the log that was backed up. If you are using the "Cluster File System Archiving Scheme", then you can specify either DELETE INPUT or DELETE ALL INPUT.
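For example, the following hedged sketch shows both forms of the clause on an archived log backup, using the sbt channels configured earlier in this chapter:

BACKUP DEVICE TYPE sbt ARCHIVELOG ALL DELETE INPUT;     # deletes only the log copies that were backed up
BACKUP DEVICE TYPE sbt ARCHIVELOG ALL DELETE ALL INPUT; # also deletes other copies with the same thread and sequence number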
If you are using the "Non-CFS Local Archiving Scheme", in which each node has read/write access to its local archiving directory and either no access to the other directories (when NFS is not set up) or read-only access to them (when the remote directories are NFS-mounted read-only), then the best practice is not to specify the DELETE ALL INPUT or DELETE INPUT clause on the BACKUP command. Instead, use the DELETE command.
The following script is an example of one method for deleting the archived logs from each node after backing them up:
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node1';
DELETE ARCHIVELOG LIKE '%arc_dest_1%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node2';
DELETE ARCHIVELOG LIKE '%arc_dest_2%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
ALLOCATE CHANNEL FOR MAINTENANCE DEVICE TYPE DISK CONNECT 'SYS/oracle@node3';
DELETE ARCHIVELOG LIKE '%arc_dest_3%' BACKED UP 1 TIMES TO DEVICE TYPE sbt;
RELEASE CHANNEL;
When configuring the backup media in a Real Application Clusters configuration, you have two options:
If only one node has a tape drive attached, then this node must be able to read all datafiles and archived logs. Both the "Cluster File System Archiving Scheme" and the "Non-CFS Local Archiving Scheme" describe scenarios in which one node in the cluster can back up all the files. However, the non-CFS scheme requires you to configure NFS so that one node can back up all of the logs. For this reason, Oracle Corporation does not recommend that you use the non-CFS archiving scheme if you have only one local drive.
Alternatively, you can use a multiple-drive scheme in which each node can write to its own local tape drive. In the CFS scheme, any node can back up all datafiles and archived logs. In the non-CFS scheme, you must write the backup script so that the backup is distributed. In a distributed backup, different files are sent to the drive attached to each node. For example, node 1 can back up the logs whose path names begin with /arc_dest_1, node 2 can back up the logs whose path names begin with /arc_dest_2, and node 3 can back up the logs whose path names begin with /arc_dest_3.
RMAN automatically performs autolocation of all files that it needs to back up or restore. This feature is automatically enabled whenever the allocated channels use different CONNECT or PARMS settings.
The autolocation feature is important for backups of archived redo logs. If you use the "Non-CFS Local Archiving Scheme", then each node can read only a subset of all of the logs that were generated. For example, node 1 can only read logs whose path names begin with /arc_dest_1, node 2 can only read logs whose path names begin with /arc_dest_2, and node 3 can only read logs whose path names begin with /arc_dest_3. RMAN never attempts to back up logs on a channel unless RMAN can read the logs by using that channel. Each channel restricts its archived log backup to the logs that it is able to read.
During a restore operation, RMAN automatically performs the autolocation of backups. In other words, a channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that log sequence 1001 is backed up to the drive attached to node 1, while log 1002 is backed up to the drive attached to node 2. If you then allocate channels that connect to each node, then the channel connected to node 1 can restore log 1001 (but not 1002), and the channel connected to node 2 can restore log 1002 (but not 1001).
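For example, assuming both logs belong to thread 1, the following sketch allocates one channel for each node so that autolocation can route each restore request to the node that holds the corresponding backup (the connect strings are illustrative):

RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt CONNECT 'SYS/oracle@node1';
  ALLOCATE CHANNEL c2 DEVICE TYPE sbt CONNECT 'SYS/oracle@node2';
  # channel c1 restores log 1001 and channel c2 restores log 1002
  RESTORE ARCHIVELOG FROM LOGSEQ 1001 UNTIL LOGSEQ 1002 THREAD 1;
}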
This section describes the following backup schemes:
This scheme refers to the archiving scenario described under the heading "Cluster File System Archiving Scheme". In a CFS backup scheme, each node in the cluster has read access to all the datafiles and archived redo logs.
This scheme assumes that only one node in the cluster has a local tape drive. In this case, execute the following one-time configuration commands:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Because any node performing the backup has read/write access to the archived logs written by the other nodes, the backup script for any node is simple:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this case, the tape drive receives all datafiles and archived logs.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration so that one channel is configured for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK.
Note: As mentioned, this is a one-time configuration step: you do not need to issue these configuration commands for every backup.
The following backup script, which you can run from any node in the cluster, distributes the datafile and archived log backups among the tape drives:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
For example, if the database contains 10 datafiles and 100 archived logs are on disk, then the node 1 tape drive can back up datafiles 1, 3, and 7 and logs 1 through 33; node 2 can back up datafiles 2, 5, and 10 and logs 34 through 66; and the node 3 tape drive can back up datafiles 4, 6, 8, and 9 as well as logs 67 through 100.
This scheme refers to the archiving scenario described under the heading "Non-CFS Local Archiving Scheme". In a non-CFS backup scheme, the datafiles are on shared disk and are accessible by all the nodes in your cluster database. Therefore, any node can back up all the datafiles. In contrast, in a non-CFS environment, each node can back up only its own local logs, that is, node 1 cannot access the logs on node 2 or node 3 unless you configure NFS for remote access.
Therefore, you have two options: distribute the backup to multiple drives, in which case you do not need to configure NFS for backups, or configure NFS for backups, in which case you can back up to one drive.
This scheme assumes that each node in the cluster has its own local tape drive. Perform the following one-time configuration to configure one channel for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
Similarly, you can perform this configuration for a device type of DISK.
Note: As mentioned, this is a one-time configuration step: you do not need to issue these configuration commands for every backup.
Develop a production backup script for whole database backups that you can run from any node. The RMAN autolocation feature ensures that the channel allocated on each node only backs up logs that are located on that node. The following example uses automatic channels to make a database and archived log backup:
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
In this example, the datafile backups and the archived log backups are distributed among the different tape drives. However, because node 1 can only read the logs archived locally in the /arc_dest_1 directory, the autolocation feature restricts channel 1 to backing up only the logs in /arc_dest_1. Similarly, because node 2 can only read files in the /arc_dest_2 directory, channel 2 can only back up the logs in /arc_dest_2, and so on. The important point is that all logs are backed up, but they are distributed among the different drives.
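To confirm afterward that every archived log was backed up by one of the channels, you can query the RMAN repository; the following command is a minimal sketch:

LIST BACKUP OF ARCHIVELOG ALL;  # show the archived log backup sets recorded in the repository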
This scheme assumes that only one node in the cluster has a local tape drive. To make backups in this scheme, you must configure NFS so that the backup node has read access to the logs archived locally on the other nodes. For this reason, Oracle Corporation does not recommend that you back up to one local drive in a non-CFS archiving scheme. In this case, you can execute the following one-time configuration commands:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Because the node making the backup can read the logs archived by all of the nodes through NFS, the backup scripts do not differ from the scripts for a single-instance database, and you can run the same script no matter which node performs the backup. However, the backup node has read/write access only to its own archiving directory; it has read-only access to the directories it mounts through NFS. Therefore, you cannot specify DELETE ALL INPUT or DELETE INPUT. Instead, you must execute DELETE commands on each node to delete the redundant logs from disk.
For example, a production script for whole database and archived log backups from the backup node is:
BACKUP DATABASE PLUS ARCHIVELOG; # do not specify DELETE ... INPUT
To back up only the archived logs, you can run the following script:
BACKUP ARCHIVELOG ALL; # do not specify DELETE ... INPUT
An instance failure occurs when software or hardware problems disable an instance. After instance failure, Oracle automatically uses the online redo log file to perform database recovery as described in the following sections:
See Also:
Oracle9i Backup and Recovery Concepts for a general explanation of instance failure and recovery
Instances in Real Application Clusters perform recovery through the SMON processes of the surviving instances. Instance recovery does not include restarting the failed instance or the recovery of applications that were running on the failed instance. Applications that were running can continue by using failover as described in Oracle9i Real Application Clusters Setup and Configuration.
When one instance performs recovery for another instance, the surviving instance reads redo log entries generated by the failed instance and uses that information to ensure that committed transactions are recorded in the database. Thus, data from committed transactions is not lost. The instance performing recovery rolls back transactions that were active at the time of the failure and releases resources used by those transactions.
Note: All online redo logs must be accessible for recovery. Therefore, Oracle Corporation recommends that you mirror your online logs.
See Also:
Oracle9i Real Application Clusters Concepts for conceptual information about application failover and high availability
After multiple node failures, as long as one instance survives, its SMON process performs instance recovery for any other instances that fail. If all instances of a Real Application Clusters database fail, then Oracle performs failure recovery automatically the next time an instance opens the database.
The instance performing recovery does not have to be one of the instances that failed. In addition, the instance performing recovery can mount the database in either shared or exclusive mode from any node of a Real Application Clusters database. This recovery procedure is the same for Oracle running in shared mode as it is for Oracle running in exclusive mode, except that one instance performs instance recovery for all the failed instances.
If you use a recovery catalog, then RMAN uses it to recover the control file of the failed instance. If you do not use a recovery catalog, then RMAN uses a copy of the control file on the recovering instance to recover the control file of the failed instance.
An instance performing recovery for a failed instance must also access all online datafiles that the failed instance accessed. When instance recovery fails because a datafile fails verification, Oracle writes a message to the alert log. After you correct the problem that prevented access to the file, use the SQL statement ALTER SYSTEM CHECK DATAFILES to verify the datafiles and make them available to the instance.
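For example, after the underlying storage problem is corrected, you might issue the statement from an open instance (a minimal sketch; see the SQL Reference for the LOCAL and GLOBAL options of this statement):

ALTER SYSTEM CHECK DATAFILES;   -- verify access to the datafiles and make them available to this instance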
See Also:
Oracle9i SQL Reference for more information about the ALTER SYSTEM statement
Figure 7-1 and the narrative steps that follow illustrate the degree of database availability during each step of Oracle instance recovery.
The steps in recovery are:
Note: The Global Cache Service Processes (LMSn) only re-master resources that lose their masters.
Media failures occur when Oracle file storage media are damaged. Typically, a media failure prevents Oracle from reading or writing data, resulting in the loss of one or more database files. Media recovery must be user-initiated through a client application, whereas instance recovery is automatically performed by the database.
In these situations, use Recovery Manager (RMAN) to restore backups of the datafiles and then recover the database. The procedures for RMAN media recovery in Real Application Clusters environments do not differ substantially from the media recovery procedures for single-instance environments.
The issues for media recovery are the same as the issues described under the heading "Accessibility of Files and Backup Media". The node that performs the recovery must be able to restore all the required datafiles. That node must also be able to either read all the required archived logs on disk or be able to restore them from backups.
This section describes the following restore schemes:
If you made backups in a CFS scheme, then the restore and recovery procedures are simple and do not differ substantially from a single-instance scenario.
Assume that you use the nondistributed backup scheme described under the heading "Backing Up to One Local Drive in the CFS Archiving Scheme". This example requires the following channel configuration:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Assume that node 3 performs the backups. If node 3 is available for the restore and recovery processing, and if all the existing logs have been backed up or are on disk, then run the following commands to perform complete recovery:
RESTORE DATABASE; RECOVER DATABASE;
If node 3 performed the backups but is unavailable, then configure a media management device for one of the remaining nodes and make the tapes from node 3 available to this device.
Assume that you use the distributed backup scheme described under the heading "Backing Up to Multiple Drives in the CFS Archiving Scheme". Perform the following one-time configuration so that one channel is configured for each node in the cluster. For example, enter the following at the RMAN prompt:
CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/password1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/password2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/password3@node3';
If all existing logs have been backed up or are on disk, then run the following commands for complete recovery from any node in the cluster:
RESTORE DATABASE; RECOVER DATABASE;
Because RMAN autolocates the backups before restoring them, the channel connected to each node only restores the files that were backed up to the tape drive attached to the node.
In this scheme, each node archives locally to a different directory. For example, node 1 archives to /arc_dest_1, node 2 archives to /arc_dest_2, and node 3 archives to /arc_dest_3. You must configure NFS so that the recovery node can read the archiving directories on the remaining nodes. The restore and recovery procedure depends on whether the backups are distributed or nondistributed.
Assume that you use the distributed backup scheme described under the heading "Backing Up to Multiple Drives in a Non-CFS Backup Scheme". If all nodes are available and if all archived logs have been backed up, then you can perform a complete restore and recovery by mounting the database and executing the following commands from any node:
RESTORE DATABASE; RECOVER DATABASE;
The recovery node begins a server session on each node in the cluster. Because this example assumes that database backups are distributed, the server sessions restore the backup datafiles from the tape drives attached to each node. Because the NFS configuration enables each node read access to the other nodes, the recovery node can read and apply the archived logs located on the local and remote disks. No manual transfer of logs is required.
Assume that you use the nondistributed backup scheme described under the heading "Backing Up to One Local Drive in a Non-CFS Archiving Scheme". You have the following channel configuration:
CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
Assume that node 3 performs the backups. If node 3 is available for the restore and recovery operation, and if the NFS mount points for the remote nodes are accessible, then run the following commands for complete recovery:
RESTORE DATABASE; RECOVER DATABASE;
Note that if some of the nodes are down and you cannot access their logs through NFS, and if you do not have backups of the logs required for a complete recovery, then you must perform an incomplete recovery up to the point of the first missing log after the whole database backup, as in the following example:
RUN
{
  # in this example, sequence 1234 is the first missing log
  SET UNTIL LOG SEQUENCE 1234 THREAD 3;
  RESTORE DATABASE;
  RECOVER DATABASE;
}
ALTER DATABASE OPEN RESETLOGS;
Parallel recovery uses multiple CPUs and I/O parallelism to reduce the time required to perform thread or media recovery. Parallel recovery is most effective at reducing recovery time while concurrently recovering several datafiles on several disks. You can use parallel instance recovery, parallel failure recovery, and parallel media recovery in Real Application Clusters databases.
See Also:
Oracle9i User-Managed Backup and Recovery Guide for more information on these topics
With RMAN's RESTORE and RECOVER commands, Oracle automatically parallelizes the following three stages of recovery, as described in this section:
When restoring datafiles, the number of channels you allocate in the RMAN recover script effectively sets the parallelism RMAN uses. For example, if you allocate five channels, you can have up to five parallel streams restoring datafiles.
Similarly, when you are applying incremental backups, the number of channels you allocate determines the potential parallelism.
RMAN applies redo logs using a specific number of parallel processes as determined by the setting for the RECOVERY_PARALLELISM initialization parameter.
If you employ user-managed methods to back up and recover your database, then you can parallelize instance and media recovery using either of the following procedures:
Real Application Clusters can use one process to read the log files sequentially and dispatch redo information to several recovery processes to apply the changes from the log files to the datafiles. Oracle automatically starts the recovery processes, so you do not need to use more than one session to perform recovery.
The RECOVERY_PARALLELISM initialization parameter specifies the number of redo application server processes that participate in instance or media recovery. One process reads the log files sequentially and dispatches redo information to several recovery processes. The recovery processes then apply the changes from the log files to the datafiles. A value of 0 or 1 indicates that Oracle performs recovery serially by one process. The value of this parameter cannot exceed the value of the PARALLEL_MAX_SERVERS parameter.
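For example, an initialization parameter file entry along the following lines (the values are illustrative) enables parallel recovery with four recovery processes:

# illustrative values; RECOVERY_PARALLELISM cannot exceed PARALLEL_MAX_SERVERS
RECOVERY_PARALLELISM = 4
PARALLEL_MAX_SERVERS = 5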
When you use the RECOVER statement to parallelize instance and media recovery, the allocation of recovery processes to instances is operating system-specific. The DEGREE keyword of the PARALLEL clause can either signify the number of processes on each instance of a Real Application Clusters database or the number of processes to distribute across all instances.
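For example, a user-managed recovery session might request parallel recovery with the DEGREE and INSTANCES keywords; this sketch assumes that this form of the PARALLEL clause is accepted by your release, so verify the exact syntax in the SQL Reference:

-- request four recovery processes; how they are distributed across instances is platform-specific
RECOVER DATABASE PARALLEL (DEGREE 4 INSTANCES 2);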
Copyright © 1998, 2002 Oracle Corporation. All Rights Reserved.