Oracle9i Real Application Clusters Real Application Clusters Guard I - Concepts and Administration Release 2 (9.2) Part Number A96601-01
This chapter describes how to troubleshoot an Oracle Real Application Clusters Guard system. It includes the following topics:
Oracle Real Application Clusters Guard provides detailed error messages that can help in troubleshooting. Error messages from the Oracle database server and from third-party media vendors also provide useful troubleshooting output. This section contains the following topics:
Table 8-1 shows the types of message output that are useful for troubleshooting Oracle Real Application Clusters Guard.
Type of Output | Produced By | Location | Description |
---|---|---|---|
PFSCTL messages | | OFA: Non-OFA: | - |
PFS messages | Packs and monitors | OFA: Non-OFA: | Contains a chronological log of actions that are relevant to Oracle Real Application Clusters Guard, error messages that are generated by Oracle Real Application Clusters Guard and the Oracle database server, and administrative operations |
PFS debug file | | OFA: Non-OFA: | Contains detailed output generated by Oracle Real Application Clusters Guard processes. This file is appended when |
Monitor log files | Heartbeat monitor | OFA: Non-OFA: | Contains information about the functioning of the heartbeat monitor |
PFS trace file | Heartbeat monitor | The directory specified by the Default: | Contains SQL*Trace output, including wait and bind data. This file is created when See Also: "Making Online Changes to the ORAPING_CONFIG Table" |
Fault data capture | Listener monitor | | Contains output generated by the |
Fault data capture | Pack | The directory specified by the Default: | Contains output generated by Oracle system state dump and |
Alert log | Oracle database server | The directory specified by the Default: | Contains a chronological log of errors, initialization parameter file settings, and administrative operations |
Oracle trace file | Oracle database server | The directory specified by the Default: | Contains detailed output generated by Oracle server processes |
System logs | Operating system | For Sun: | - |
The following types of error codes are found in the Oracle Real Application Clusters Guard logs and trace files:
Table 8-2 shows the error ranges for Oracle Real Application Clusters Guard error messages. The prefix is PFS.
Note the following suggestions for identifying useful messages in the Oracle Real Application Clusters Guard log files:
Warning. This indicates that a pack or a monitor has incurred a problem but is continuing to operate. It is an indication that actions may need to be taken before an outage occurs. The problem is described in the text of the message.
Alert. This indicates the problem that the pack or monitor incurred.
The following is an example of messages from the Oracle Real Application Clusters Guard log file, pfs_SALES_hostA.log:
Wed Jan 10 11:57:14 2001 PFS-6014: Info: Routine connecting to instance.
Wed Jan 10 11:57:14 2001 ERROR: Encountered Oracle error while executing CONNECT --!
Wed Jan 10 11:57:14 2001 ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory
Wed Jan 10 11:57:14 2001 PFS-6016: Alert: Routine failed to connect to instance.
Wed Jan 10 11:57:14 2001 PFS-6003: Warning: Routine 1 exits. Attempt to stop routine 0.
Wed Jan 10 11:57:14 2001 PFS-6006: Alert: ORACLE instance is not available. Instance monitor exits.
Read the log in chronological order. The first alert message is:
PFS-6016: Alert: Routine failed to connect to instance.
The Oracle Real Application Clusters Guard error number is PFS-6016, which means that the problem concerns the instance monitor. The messages before the alert contain an ORA-01034 error:
ORA-01034: ORACLE not available
You can conclude that the pack was halted because the Oracle instance or the database is down.
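The reading order described above (scan chronologically, stop at the first alert, then look at the Oracle errors that precede it) can be sketched as a small script. This is an illustrative sketch only: the function name `first_alert_with_context` and the regular expressions are assumptions of this example, not part of Oracle Real Application Clusters Guard, and the sample text is copied from the log excerpt above.

```python
import re

# Sample lines copied from the pfs_SALES_hostA.log excerpt in this section.
LOG = """\
Wed Jan 10 11:57:14 2001 PFS-6014: Info: Routine connecting to instance.
Wed Jan 10 11:57:14 2001 ERROR: Encountered Oracle error while executing CONNECT --!
Wed Jan 10 11:57:14 2001 ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
SVR4 Error: 2: No such file or directory
Wed Jan 10 11:57:14 2001 PFS-6016: Alert: Routine failed to connect to instance.
Wed Jan 10 11:57:14 2001 PFS-6003: Warning: Routine 1 exits. Attempt to stop routine 0.
Wed Jan 10 11:57:14 2001 PFS-6006: Alert: ORACLE instance is not available. Instance monitor exits.
"""

# Assumed message shapes, inferred from the excerpt (hypothetical patterns).
PFS_RE = re.compile(r"(PFS-\d+): (Info|Warning|Alert): (.*)")
ORA_RE = re.compile(r"ORA-\d{5}: .*")


def first_alert_with_context(log_text):
    """Return (code, message, ora_errors) for the first Alert in the log.

    ora_errors lists every ORA- message seen before that alert, in order.
    If no alert is found, code and message are None.
    """
    ora_errors = []
    for line in log_text.splitlines():
        ora = ORA_RE.search(line)
        if ora:
            ora_errors.append(ora.group(0))
        pfs = PFS_RE.search(line)
        if pfs and pfs.group(2) == "Alert":
            return pfs.group(1), pfs.group(3), ora_errors
    return None, None, ora_errors


code, message, ora = first_alert_with_context(LOG)
print(code, message)
for err in ora:
    print("  preceded by:", err)
```

Stopping at the first alert mirrors the advice to read the log chronologically: later alerts, such as PFS-6006 here, are usually consequences of the first one.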
If the standard Oracle Real Application Clusters Guard logging is not generating enough information, then the Oracle Real Application Clusters Guard debugging option can be used to generate more extensive output. Enable the Oracle Real Application Clusters Guard debugging option by setting the PFS_DEBUGGING parameter to $PFS_TRUE.
Use debugging for the following purposes:
The output is redirected to a separate trace file to prevent overloading the Oracle Real Application Clusters Guard log file. The debugging output contains the following information:
See Also:
"Changing Oracle Real Application Clusters Guard Configuration Parameters" for more information about changing parameter values
Use the pfsboot command to start the packs. The steps of the pfsboot command are as follows:
PFSBOOT command. These conditions cannot exist:
If the pfsboot command fails, then check the following items:
Oracle Corporation recommends setting up the call-home function to alert the user when the pfsboot command fails during normal processing.
The Oracle Real Application Clusters Guard logs should clearly describe why the pfsboot command failed. You may need to stop the database manually before reissuing the pfsboot command. The pfsboot command may also fail if the packs are running in foreign mode or if the monitors do not start successfully.
This section contains the following examples:
When you enter the pfsboot command, the following message may appear at the command line:
Alert: pfsboot command failed.
The following output appears in the Oracle Real Application Clusters Guard log on hostA (pfs_SALES_hostA.log):
Fri Jan 12 16:15:07 2001 PFS-5014: Processing command pfsboot.
Fri Jan 12 16:15:08 2001 PFS-5074: Alert: System is not clear. Pack PFS_SALES_hostA is running. Use PFSCTL PFSHALT first.
Fri Jan 12 16:15:09 2001 PFS-5080: Alert: pfsboot command failed
The first alert message is:
PFS-5074: Alert: System is not clear. Pack PFS_SALES_hostA is running. Use PFSCTL PFSHALT first.
The message number indicates that the problem is in the PFSCTL command line. The text of the message indicates that the PFS_SALES_hostA pack is already running. Enter the STATUS command to find out the exact state of the packs:
PFSCTL> status
The following output results:
Info: Pack PFS_SALES_hostA started.
hostB
Info: Pack PFS_SALES_hostB started.
Info: Local database instance is up.
Info: Remote database instance is up.
Info: Running primary role locally.
Info: Running secondary role on remote node.
Info: Cluster is up.
Info: Local node part of the cluster.
Info: Remote node part of the cluster.
Info: No internal process is running locally.
Info: No internal process is running remotely.
status command succeeded.
The status command shows that both packs are running. If you want to restart the packs, then:
Enter the pfshalt command:
PFSCTL> pfshalt
Enter the pfsboot command:
PFSCTL> pfsboot
When you enter the pfsboot command, the following message may appear at the command line:
Alert: pfsboot command failed.
The following output appears in the Oracle Real Application Clusters Guard log on hostA (pfs_SALES_hostA.log):
Mon Jan 15 10:02:57 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 10:02:58 2001 PFS-5555: SALES hostA unknown planned_up 2001/01/15-10:02:57
Mon Jan 15 10:02:58 2001 PFS-2021: Info: Calling user provided role change notification script: /mnt1/oracle/admin/sales/pfs/user/pfs_SALES_notifyrole.sh
Mon Jan 15 10:02:59 2001 PFS-2012: Info: User role notification script succeeded
Mon Jan 15 10:03:08 2001 PFS-4005: Info: Pack PFS_SALES_hostA starting on home node.
Mon Jan 15 10:03:09 2001 PFS-4010: Info: Attempt to initialize all variables.
Mon Jan 15 10:03:10 2001 PFS-4011: Info: Attempt to enable IP address.
Mon Jan 15 10:03:11 2001 PFS-4012: Info: Attempt to acquire disk storage.
Mon Jan 15 10:03:11 2001 PFS-4013: Info: Attempt to start public listener monitor and public listener SALES_hostA_LSNR.
Mon Jan 15 10:03:12 2001 PFS-7001: Info: Attempt to start private listener monitor and private listener SALES_hostA_PRIVLSNR.
Mon Jan 15 10:03:13 2001 PFS-2020: Info: Start monitor avmlprog SALES_hostA_LSNR 12432
Mon Jan 15 10:03:14 2001 PFS-4014: Info: Attempt to start database instance.
Mon Jan 15 10:03:14 2001 PFS-2020: Info: Start monitor avmlprog SALES_hostA_PRIVLSNR 12540
Mon Jan 15 10:03:16 2001 PFS-1000: Alert: Attempt to start Oracle instance failed.
Mon Jan 15 10:03:21 2001 PFS-5050: Alert: PFSCTL BOOTONE failed.
. . .
Mon Jan 15 10:03:50 2001 PFS-5064: Alert: Attempt to start primary failed.
Mon Jan 15 10:03:51 2001 PFS-5080: Alert: pfsboot command failed.
The first alert message is:
PFS-1000: Alert: Attempt to start Oracle instance failed.
The message number indicates that the problem was reported from the Oracle Real Application Clusters Guard main layer. The text of the message reports a problem with starting the Oracle instance.
The alert log (alertSALES1.log) does not show an entry for instance startup.
Try to start the database manually outside of the packs. Enter the following commands:
$ sqlplus /nolog
SQL*Plus: release 9.2.0.0 - Production on Mon Jan 15 10:26:11 2001
© Copyright 2001 Oracle Corporation. All rights reserved.
SQL> connect / as sysdba
Connected to an idle instance.
SQL> startup pfile=init_SALES1_hostA.ora
LRM-00101: unknown parameter name `service_name'
ORA-01078: failure in processing system parameters
The Oracle errors indicate that there is a problem with the SERVICE_NAME initialization parameter.
Correct the problem with the initialization parameter. Restart the packs:
PFSCTL> pfsboot
When you enter the pfsboot command, the resulting output shows that the command succeeded:
PFSCTL> pfsboot
pfsboot command succeeded.
When you enter the status command, the following output may result:
hostA
Info: Pack PFS_SALES_hostA started.
hostB
Info: Pack PFS_SALES_hostB started.
Info: Local database instance is up.
Info: Remote database instance is up.
Info: Running primary role locally.
Info: Running secondary role on remote node.
Info: Cluster is up.
Info: Local node part of the cluster.
Info: Remote node part of the cluster.
Info: No internal process is running locally.
Info: No internal process is running remotely.
status command succeeded.
The output shows that although the pfsboot command started the instances, it shut down before starting other processes.
If the packs start successfully and then shut down, then the following scenarios are possible:
Examine the Oracle Real Application Clusters Guard log, the database log, and the trace files for errors. The following output is from the Oracle Real Application Clusters Guard log:
Mon Jan 15 14:37:15 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 14:37:26 2001 PFS-4005: Info: Pack PFS_SALES_hostA starting on home node.
Mon Jan 15 14:37:27 2001 PFS-4010: Info: Attempt to initialize all variables.
Mon Jan 15 14:37:28 2001 PFS-4011: Info: Attempt to enable IP address.
Mon Jan 15 14:37:28 2001 PFS-4012: Info: Attempt to acquire disk storage.
Mon Jan 15 14:37:29 2001 PFS-4013: Info: Attempt to start public listener monitor and public listener SALES_hostA_LSNR.
Mon Jan 15 14:37:30 2001 PFS-7001: Info: Attempt to start private listener monitor and private listener SALES_hostA_PRIVLSNR.
Mon Jan 15 14:37:30 2001 PFS-2020: Info: Start monitor avmlprog SALES_hostA_LSNR 8964
Mon Jan 15 14:37:32 2001 PFS-4014: Info: Attempt to start database instance.
Mon Jan 15 14:37:32 2001 PFS-2020: Info: Start monitor avmlprog SALES_hostA_PRIVLSNR 9069
Mon Jan 15 14:37:57 2001 PFS-4032: Info: Check if ACTIVE_INSTANCE_COUNT is set to 1.
Mon Jan 15 14:37:58 2001 PFS-4015: Info: Attempt to start instance monitor.
Mon Jan 15 14:37:59 2001 PFS-4016: Info: Attempt to check INSTANCE_ROLE.
Mon Jan 15 14:38:00 2001 PFS-2020: Info: Start monitor avmuprog SALES 9557
Mon Jan 15 14:38:01 2001 PFS-1001: Info: INSTANCE_ROLE is primary_instance.
Mon Jan 15 14:38:02 2001 PFS-4017: Info: Attempt to start ORACLE_PING.
Mon Jan 15 14:38:03 2001 PFS-2020: Info: Start monitor avmpprog SALES 9745
Mon Jan 15 14:38:03 2001 PFS-4018: Info: Attempt to enable pack switching.
Mon Jan 15 14:38:04 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 14:38:04 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:38:05 2001 PFS-5555: SALES hostA primary up 2001/01/15-14:38:04
Mon Jan 15 14:38:05 2001 PFS-2021: Info: Calling user provided role change notification script: /mnt1/oracle/admin/sales/pfs/user/pfs_SALES_notifyrole.sh
Mon Jan 15 14:38:06 2001 PFS-2012: Info: User role notification script succeeded
Mon Jan 15 14:38:06 2001 PFS-9900: Info: Attempt to start role change notification.
Mon Jan 15 14:38:08 2001 PFS-4020: Info: Attempt to start DBMS_JOBS.
Mon Jan 15 14:38:08 2001 PFS-2020: Info: Start monitor avmrprog SALES 10010
Mon Jan 15 14:38:09 2001 PFS-4004: Info: Run method on home node completed.
Mon Jan 15 14:38:13 2001 PFS-5002: PFSCTL BOOTONE succeeded.
Mon Jan 15 14:38:15 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 14:38:15 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:38:26 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:38:36 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:38:47 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:38:58 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:39:09 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:39:10 2001 PFS-5002: PFSCTL BOOTONE succeeded.
Mon Jan 15 14:39:11 2001 PFS-5007: PFSCTL PFSBOOT succeeded.
Mon Jan 15 14:39:12 2001 PFS-3000: Info: Pack PFS_SALES_hostA started.
Mon Jan 15 14:39:14 2001 PFS-3000: Info: Pack PFS_SALES_hostB started.
Mon Jan 15 14:39:15 2001 PFS-3002: Info: Local database instance is up.
Mon Jan 15 14:39:17 2001 PFS-3004: Info: Remote database instance is up.
Mon Jan 15 14:39:19 2001 PFS-3006: Info: Running primary role locally.
Mon Jan 15 14:39:19 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:39:22 2001 PFS-3010: Info: Running secondary role on remote node.
Mon Jan 15 14:39:26 2001 PFS-3012: Info: Cluster is up.
Mon Jan 15 14:39:29 2001 PFS-3013: Info: Local node part of the cluster.
Mon Jan 15 14:39:30 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:39:31 2001 PFS-3014: Info: Remote node part of the cluster.
Mon Jan 15 14:39:32 2001 PFS-3072: Info: No internal process is running locally.
Mon Jan 15 14:39:33 2001 PFS-3073: Info: No internal process is running remotely
Mon Jan 15 14:39:34 2001 PFS-5015: pfsboot command succeeded.
Mon Jan 15 14:39:41 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:39:51 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:40:02 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:40:13 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
Mon Jan 15 14:40:23 2001 PFS-8001: Alert: Shared server service or instance sales not registered with SALES_hostA_LSNR in 120 seconds. Exit.
Mon Jan 15 14:40:25 2001 PFS-2019: Info: Real Application Clusters Guard callhome with Oraping_monitor_exits now.
Mon Jan 15 14:40:25 2001 PFS-2019: Info: Real Application Clusters Guard callhome with Oraping_for_SALES_error_Will_failover now.
Mon Jan 15 14:40:26 2001 PFS-2019: Info: Real Application Clusters Guard callhome with Failing_over_service_SALES now.
Mon Jan 15 14:40:30 2001 PFS-4007: Info: Pack PFS_SALES_hostA stopping on home node.
Mon Jan 15 14:40:31 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 14:40:31 2001 PFS-5555: SALES hostA primary down 2001/01/15-14:40:31
Mon Jan 15 14:40:32 2001 PFS-2021: Info: Calling user provided role change notification script: /mnt1/oracle/admin/sales/pfs/user/pfs_SALES_notifyrole.sh
Mon Jan 15 14:40:32 2001 PFS-2012: Info: User role notification script succeeded
Mon Jan 15 14:40:33 2001 PFS-4028: Info: Attempt to halt instance monitor.
Mon Jan 15 14:40:34 2001 PFS-4029: Info: Attempt to halt ORACLE_PING.
Mon Jan 15 14:40:35 2001 PFS-3064: Info: Service PFS_SALES_Ping_hostA has already been stopped.
Mon Jan 15 14:40:35 2001 PFS-9902: Info: Attempt to stop role change notification
Mon Jan 15 14:40:36 2001 PFS-4027: Info: Attempt to archive, checkpoint, and dump database.
Mon Jan 15 14:40:45 2001 PFS-1012: Info: Local and remote ORACLE systemstates dumped to USER_DUMP_DEST.
Mon Jan 15 14:40:45 2001 PFS-4026: Info: Attempt to abort database.
Mon Jan 15 14:40:46 2001 PFS-4019: Info: Attempt to send notification that instance role has changed.
Mon Jan 15 14:40:47 2001 PFS-5555: SALES hostA primary cleanup 2001/01/15-14:40:46
Mon Jan 15 14:40:47 2001 PFS-2021: Info: Calling user provided role change notification script: /mnt1/oracle/admin/sales/pfs/user/pfs_SALES_notifyrole.sh
Mon Jan 15 14:40:48 2001 PFS-2012: Info: User role notification script succeeded
Mon Jan 15 14:40:49 2001 PFS-2003: Info: Attempt to start internal Real Application Clusters Guard process on primary instance.
Mon Jan 15 14:40:49 2001 PFS-4025: Info: Attempt to halt public listener monitor and public listener SALES_hostA_LSNR.
Mon Jan 15 14:40:50 2001 PFS-7003: Info: Attempt to halt private listener monitor and private listener SALES_hostA_PRIVLSNR.
Mon Jan 15 14:40:51 2001 PFS-2017: Info: Start to clean up Real Application Clusters Guard processes.
Mon Jan 15 14:40:52 2001 PFS-4024: Info: Attempt to release disk storage.
Mon Jan 15 14:40:52 2001 PFS-2015: Info: Stop process ./avmlmon.sh_SALES_hostA_LSNR succeeded.
Mon Jan 15 14:40:52 2001 PFS-4022: Info: Attempt to disable IP address.
Mon Jan 15 14:40:53 2001 PFS-2015: Info: Stop process SALES_hostA_LSNR succeeded
Mon Jan 15 14:40:53 2001 PFS-4030: Info: Halt method on home node completed.
Mon Jan 15 14:40:54 2001 PFS-2015: Info: Stop process ./avmlmon.sh_SALES_hostA_PRIVLSNR succeeded.
Mon Jan 15 14:40:54 2001 PFS-2015: Info: Stop process SALES_hostA_PRIVLSNR succeeded.
The first warning is:
Mon Jan 15 14:38:04 2001 PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR.
The first alert is:
Mon Jan 15 14:40:23 2001 PFS-8001: Alert: Shared server service or instance sales not registered with SALES_hostA_LSNR in 120 seconds. Exit.
The message numbers are in the 8000 range, so the problem has been reported from the heartbeat monitor. The message text indicates that there is a problem with service registration. The instance failed to register with the listener within 120 seconds.
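The worked examples in this chapter each tie a message number to the component that reported it. The following sketch collects those pairings into a lookup. It assumes that each component owns a full thousand-wide range, which the text states only for the 8000 range; Table 8-2 remains the authoritative source, and the names `COMPONENT_BY_THOUSAND` and `component_for` are hypothetical.

```python
import re

# Pairings taken from this chapter's examples. Extending each one to a
# thousand-wide range is an assumption of this sketch; see Table 8-2.
COMPONENT_BY_THOUSAND = {
    1: "main layer",           # PFS-1000 example
    5: "PFSCTL command line",  # PFS-5074 example
    6: "instance monitor",     # PFS-6016 example
    8: "heartbeat monitor",    # "8000 range", PFS-8002 example
}


def component_for(message):
    """Return the reporting component for a PFS message, if known.

    Returns None when the text contains no PFS-nnnn number, and a
    placeholder string when the range is not covered by the examples.
    """
    match = re.search(r"PFS-(\d+)", message)
    if match is None:
        return None
    thousand = int(match.group(1)) // 1000
    return COMPONENT_BY_THOUSAND.get(thousand, "unknown (see Table 8-2)")


print(component_for(
    "PFS-8002: Warning: Instance sales1 is not registered with SALES_hostA_LSNR."
))
print(component_for("PFS-1000: Alert: Attempt to start Oracle instance failed."))
```

Ranges that the examples do not mention, such as the 4000 range seen in the startup messages, deliberately fall through to the placeholder rather than being guessed.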
For example, suppose a dedicated configuration has the following characteristics:
Suppose that LOCAL_LISTENER is defined in the SALES_config.hostA.ded.pfs file as follows:
LOCAL_LISTENER=listener_SALES_hostA
Then listener_SALES_hostA must be resolved properly in the tnsnames.ora file:
listener_SALES_hostA= (ADDRESS=(PROTOCOL=TCP)(HOST=144.25.28.74)(PORT=1524))
There are several causes of failed service registration. The best practice is to look for the simplest solutions first. For example, it is common for service registration to fail because the LOCAL_LISTENER parameter is not set correctly. Ensure that the value of the LOCAL_LISTENER parameter in the initialization parameter file (init.ora) matches the entry in the tnsnames.ora file.
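The check described above can be automated before restarting the packs. The following is a minimal sketch, assuming the parameter file contains a plain `LOCAL_LISTENER=<alias>` line and that tnsnames.ora entries begin at the start of a line, as in the example in this section. The function names are hypothetical, and comparing aliases case-insensitively is an assumption of this sketch.

```python
import re


def local_listener_alias(param_text):
    """Return the LOCAL_LISTENER value from parameter-file text, or None."""
    match = re.search(
        r"^\s*LOCAL_LISTENER\s*=\s*(\S+)",
        param_text,
        re.IGNORECASE | re.MULTILINE,
    )
    if match is None:
        return None
    # Drop optional surrounding quotes around the alias.
    return match.group(1).strip("'\"")


def tns_aliases(tns_text):
    """Return the lowercased aliases defined at the start of a line."""
    pattern = re.compile(r"^([A-Za-z0-9_.\-]+)\s*=", re.MULTILINE)
    return {m.group(1).lower() for m in pattern.finditer(tns_text)}


def listener_resolves(param_text, tns_text):
    """True if the LOCAL_LISTENER alias has a matching tnsnames.ora entry."""
    alias = local_listener_alias(param_text)
    return alias is not None and alias.lower() in tns_aliases(tns_text)


# Sample text copied from the example in this section.
params = "LOCAL_LISTENER=listener_SALES_hostA\n"
tns = "listener_SALES_hostA= (ADDRESS=(PROTOCOL=TCP)(HOST=144.25.28.74)(PORT=1524))\n"

print(listener_resolves(params, tns))
print(listener_resolves("LOCAL_LISTENER=listener_SALES_hostB\n", tns))
```

The sketch only confirms that an entry with the same name exists. It does not verify that the host and port in the entry match a running listener.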
If you cannot invoke the PFSCTL command line, then check the following conditions:
This section contains the following topics:
If the heartbeat monitor is not operating properly, then check the following items:
If the instance monitor is not operating properly, then check the following items:
If the listener monitor is not operating properly, then check the following items:
The packs cannot solve underlying performance or stability problems in the system. If such problems exist, then you must solve them outside of the packs. To troubleshoot outside of the packs, follow these steps:
PFSCTL> pfshalt
PFSCTL> pfsboot
Table 8-3 shows how to enable and disable IP addresses and storage groups on the HP and Sun platforms.
To enable 195.1.1.150 as a relocatable IP address, enter the following command:
# cmmodnet -a -i 195.1.1.150 195.1.1.0
Display the IP address by entering the following command:
$ netstat -in
You should see output similar to the following:
Name    Mtu   Network        Address         Ipkts  Opkts
lan2    1500  192.1.1.0      192.1.1.3       81859  40987
lan5:1  1500  195.1.1.0      195.1.1.150     0      0
lan0    1500  139.185.141.0  139.185.151.34  22782  23614
lo0     4136  127.0.0.0      127.0.0.1       30084  30084
lan5    1500  195.1.1.0      195.1.1.3       81855  40984
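To confirm from a script that the relocatable address is enabled, the `netstat -in` output can be searched for the address. This is a rough sketch that assumes the six-column layout shown in the sample output above; the function name `interface_for` is hypothetical, and the sample text is copied from this section rather than captured from a live system.

```python
# Sample `netstat -in` output copied from this section.
NETSTAT = """\
Name    Mtu   Network        Address         Ipkts  Opkts
lan2    1500  192.1.1.0      192.1.1.3       81859  40987
lan5:1  1500  195.1.1.0      195.1.1.150     0      0
lan0    1500  139.185.141.0  139.185.151.34  22782  23614
lo0     4136  127.0.0.0      127.0.0.1       30084  30084
lan5    1500  195.1.1.0      195.1.1.3       81855  40984
"""


def interface_for(address, netstat_text):
    """Return the interface name that carries `address`, or None.

    Assumes the Address value is the fourth whitespace-separated
    column and that the first line is a header.
    """
    for line in netstat_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4 and fields[3] == address:
            return fields[0]
    return None


print(interface_for("195.1.1.150", NETSTAT))
```

A result of None means that the address does not appear in the output, which suggests that the relocatable IP address is not enabled on this node.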
To enable 144.25.28.74 as a relocatable IP address, enter the following command:
# ifconfig hme0:1 144.25.28.74 up
Display the IP addresses by entering the following command:
# ifconfig -a
You should see output similar to the following:
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 144.25.28.70 netmask fffffc00 broadcast 144.25.31.255
hme0:1: flags=1000862<BROADCAST,NOTRAILERS,RUNNING,MULTICAST,IPv4> mtu 1500 ind2
        inet 144.25.28.74 netmask fffffc00 broadcast 144.25.31.255
hme1: flags=1008863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,PRIVATE,IPv4> mtu3
        inet 204.152.65.1 netmask fffffff0 broadcast 204.152.65.15
hme1:1: flags=1008843<UP,BROADCAST,RUNNING,MULTICAST,PRIVATE,IPv4> mtu 1500 ind3
        inet 204.152.65.33 netmask fffffff0 broadcast 204.152.65.47
Copyright © 2001, 2002 Oracle Corporation. All Rights Reserved.