Product: Cluster Server Guides
Manual: Cluster Server 4.1 Installation Guide
Troubleshooting I/O Fencing

The following troubleshooting topics have headings that indicate likely symptoms or that indicate procedures required for a solution.

vxfentsthdw Fails When SCSI TEST UNIT READY Command Fails

If you see a message resembling:

    Issuing SCSI TEST UNIT READY to disk reserved by other node FAILED.
    Contact the storage provider to have the hardware configuration fixed.

The disk array does not support returning success for a SCSI TEST UNIT READY command when another host has the disk reserved using SCSI-III persistent group reservations. This happens with Hitachi Data Systems 99XX arrays if bit 186 of the system mode option is not enabled.

vxfentsthdw Fails When Prior Registration Key Exists on Disk

Although unlikely, you may attempt to use the vxfentsthdw utility to test a disk that has a registration key already set. If you suspect a key exists on the disk you plan to test, use the vxfenadm -g command to display it.

    # vxfenadm -g diskname
    Reading SCSI Registration Keys...
    Device Name: <diskname>
    Total Number Of Keys: 0
    No keys ...

Proceed to test the disk using the vxfentsthdw utility.

Node is Unable to Join Cluster While Another Node is Being Ejected

A cluster that is currently fencing out (ejecting) a node prevents a new node from joining until the fencing operation is completed. The following are example messages that appear on the console for the new node:

    ...VCS FEN ERROR V-11-1-25 ... Unable to join running cluster
    ...VCS FEN ERROR V-11-1-25 ... since cluster is currently fencing
    ...VCS FEN ERROR V-11-1-25 ... a node out of the cluster.
    ...VCS GAB.. Port b closed

If you see these messages when the new node is booting, the startup script (/sbin/init.d/vxfen) on the node makes up to five attempts to join the cluster. If this is not sufficient to allow the node to join the cluster, reboot the new node or attempt to restart the vxfen driver with the command:

    # /sbin/init.d/vxfen start

Removing Existing Keys From Disks

To remove the registration and reservation keys created by another node from a disk, use the following procedure:
System Panics to Prevent Potential Data Corruption

When a system experiences a split brain condition and is ejected from the cluster, it panics and displays the following console message:

    VXFEN:vxfen_plat_panic: Local cluster node ejected from cluster to prevent potential data corruption.

How vxfen Driver Checks for Pre-existing Split Brain Condition

The vxfen driver prevents an ejected node from rejoining the cluster after the failure of the private network links and before the private network links are repaired.

For example, suppose the cluster of system 1 and system 2 is functioning normally when the private network links are broken. Also suppose system 1 is the ejected system. When system 1 reboots before the private network links are restored, its membership configuration does not show system 2; however, when it attempts to register with the coordinator disks, it discovers that system 2 is registered with them. Given this conflicting information about system 2, system 1 does not join the cluster and returns an error from vxfenconfig that resembles:

    vxfenconfig: ERROR: There exists the potential for a preexisting split-brain.
    The coordinator disks list no nodes which are in the current membership.
    However, they also list nodes which are not in the current membership.

    I/O Fencing Disabled!

Also, the following information is displayed on the console:

    <date> <system name> vxfen: WARNING: Potentially a preexisting
    <date> <system name> split-brain.
    <date> <system name> Dropping out of cluster.
    <date> <system name> Refer to user documentation for steps
    <date> <system name> required to clear preexisting split-brain.
    <date> <system name>
    <date> <system name> I/O Fencing DISABLED!
    <date> <system name>
    <date> <system name> gab: GAB:20032: Port b closed

However, the same error can occur when the private network links are working and both systems go down, system 1 reboots, and system 2 fails to come back up.
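The two conditions named in the vxfenconfig error message can be modeled as a small set comparison. The following is only an illustrative Python sketch of that logic, not the vxfen driver's implementation; the function name and the node names are hypothetical, and the inputs are assumed to be plain collections of node names.

```python
def potential_preexisting_split_brain(current_membership, coordinator_registrations):
    """Model the two conditions stated in the vxfenconfig error message.

    Both arguments are iterables of node names (hypothetical inputs used
    for illustration; the real driver works with registration keys).
    """
    members = set(current_membership)
    registered = set(coordinator_registrations)

    # Condition 1: the coordinator disks list no nodes that are in the
    # current membership as seen by the node that is starting up.
    none_in_membership = not (registered & members)

    # Condition 2: the coordinator disks do list nodes that are not in
    # the current membership.
    some_outside_membership = bool(registered - members)

    return none_in_membership and some_outside_membership


# Scenario from the text: system 1 reboots, sees only itself in the
# membership, but finds system 2 registered on the coordinator disks.
print(potential_preexisting_split_brain({"system1"}, {"system2"}))  # True
```

In this model the inputs are identical whether system 2 is still running behind broken links or has gone down and left its registrations behind, which is why the same error appears in both situations.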
From system 1's view of the cluster, system 2 may still have registrations on the coordinator disks.

Case 1: System 2 Up, System 1 Ejected (Actual Potential Split Brain)

Determine whether system 1 is up. If it is up and running, shut it down and repair the private network links to remove the split brain condition. Reboot system 1.

Case 2: System 2 Down, System 1 Ejected (Apparent Potential Split Brain)
Using vxfenclearpre Command to Clear Keys After Split Brain

When you have encountered a split brain condition, use the vxfenclearpre command to remove SCSI-III registrations and reservations on the coordinator disks as well as on the data disks in all shared disk groups.
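When several disks need to be checked for leftover keys, the "Total Number Of Keys" line in the vxfenadm -g output shown earlier in this section can be read programmatically. The following is a minimal Python sketch that assumes the output matches the sample in this section, with one field per line; the function name is hypothetical and the sample text is copied from the example output, not captured from a live system.

```python
import re


def total_keys(vxfenadm_output):
    """Return the key count reported in vxfenadm -g style output.

    Returns None if no "Total Number Of Keys" line is found, so that a
    failed or unexpected command output is not mistaken for zero keys.
    """
    match = re.search(r"Total Number Of Keys:\s*(\d+)", vxfenadm_output)
    return int(match.group(1)) if match else None


# Sample text modeled on the example output in this section
# (line breaks are an assumption).
sample = (
    "Reading SCSI Registration Keys...\n"
    "Device Name: <diskname>\n"
    "Total Number Of Keys: 0\n"
    "No keys ...\n"
)

print(total_keys(sample))  # 0
```

Returning None instead of 0 when the line is missing keeps an error case distinct from a disk that genuinely has no keys.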
VERITAS Software Corporation
www.veritas.com |