Product: Cluster Server Guides   
Manual: Cluster Server 4.1 User's Guide   

Detecting System Failure

When a system crashes or is powered off, it stops sending heartbeats to the other systems in the cluster. By default, those systems wait 21 seconds before declaring it dead: the 16-second default LLT peer inactive timeout plus the 5-second default GAB stable timeout. The peer inactive timeout can be modified in the /etc/llttab file. For example, to specify 12 seconds:


set-timer peerinact:1200

Note: After modifying the peer inactive timeout, you must unconfigure and then restart LLT before the change takes effect. To unconfigure LLT, type lltconfig -u. To restart LLT, type lltconfig -c.
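The example value of 1200 for 12 seconds implies that peerinact is expressed in hundredths of a second. The following Python sketch shows that conversion under this assumption. The function name llttab_peerinact_line is hypothetical and is not part of any VCS tool.

```python
def llttab_peerinact_line(seconds: float) -> str:
    """Build the /etc/llttab directive for a peer inactive timeout.

    Assumes peerinact is given in hundredths of a second, as the
    12-second example (peerinact:1200) suggests.
    """
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return f"set-timer peerinact:{round(seconds * 100)}"


print(llttab_peerinact_line(12))  # set-timer peerinact:1200
print(llttab_peerinact_line(16))  # set-timer peerinact:1600
```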

The GAB stable timeout can be changed by specifying:


gabconfig -t timeout_value_milliseconds

Although both values can be changed, we do not recommend changing either the LLT peer inactive timeout or the GAB stable timeout.
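The default detection window is simple arithmetic over the two timers. The sketch below works in milliseconds, the unit that gabconfig -t takes. The constant and function names are illustrative assumptions, not VCS identifiers.

```python
LLT_PEERINACT_DEFAULT_MS = 16_000  # 16 s default LLT peer inactive timeout
GAB_STABLE_DEFAULT_MS = 5_000      # 5 s default GAB stable timeout


def failure_detection_ms(peerinact_ms: int = LLT_PEERINACT_DEFAULT_MS,
                         gab_stable_ms: int = GAB_STABLE_DEFAULT_MS) -> int:
    """Time the other systems wait before declaring a silent peer dead."""
    return peerinact_ms + gab_stable_ms


print(failure_detection_ms() / 1000)                     # 21.0
print(failure_detection_ms(peerinact_ms=12_000) / 1000)  # 17.0
```

With the peer inactive timeout lowered to 12 seconds and the GAB stable timeout left at its default, the window would shrink to 17 seconds.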

If a system reboots, it becomes unavailable until the reboot is complete. The reboot process kills all processes, including HAD. When the VCS process (HAD) is killed, the other systems in the cluster mark all service groups that can go online on the rebooted system as autodisabled. The AutoDisabled flag is cleared when the system goes offline. As long as the system goes offline within the interval specified by the ShutdownTimeout attribute, VCS treats this as a system reboot. The default ShutdownTimeout value of 120 seconds can be changed by modifying the attribute. See System Attributes for details.
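The reboot rule above amounts to a single comparison. The sketch below assumes that ShutdownTimeout is in seconds, that the interval is measured from the moment HAD is killed, and that the boundary is inclusive. The name is_treated_as_reboot is hypothetical and is not a VCS API.

```python
SHUTDOWN_TIMEOUT_DEFAULT = 120  # default ShutdownTimeout, assumed to be seconds


def is_treated_as_reboot(seconds_until_offline: float,
                         shutdown_timeout: float = SHUTDOWN_TIMEOUT_DEFAULT) -> bool:
    """Return True if the system went offline within ShutdownTimeout.

    seconds_until_offline is the assumed elapsed time between HAD being
    killed and the system going offline.
    """
    return seconds_until_offline <= shutdown_timeout


print(is_treated_as_reboot(45))   # True
print(is_treated_as_reboot(300))  # False
```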

VERITAS Software Corporation
www.veritas.com