Cluster down for Cisco Nexus




Indeni will alert if a cluster is down or any of the members are inoperable.

Remediation Steps

Review the cause for one or more members being down or inoperable.
1. Verify the communication between the FHRP peers. Unless this alert appears during the initial installation, a random, momentary loss of data communication between the peers is the most common cause of continuous FHRP state changes (Active <-> Standby).
2. Check the CPU utilization with the "show processes cpu" NX-OS command. FHRP state changes are often caused by high CPU utilization.
3. Investigate the common causes of FHRP packet loss between the peers: physical layer problems, excessive network traffic caused by spanning tree issues, or excessive traffic within a particular VLAN.
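
To tell a continuous state change from a one-time failover, it can help to count how often the observed state changed within a recent time window. The following Python sketch illustrates the idea; it is not part of Indeni. The function names, the 300-second window, and the threshold of 3 changes are illustrative assumptions, and the samples are assumed to be (timestamp in seconds, state) pairs collected elsewhere and sorted by time.

```python
from typing import List, Tuple

Sample = Tuple[float, str]  # (timestamp in seconds, observed FHRP state)


def count_state_changes(samples: List[Sample], window_s: float) -> int:
    """Count state changes among the samples taken in the last window_s seconds.

    Assumes the samples are sorted by timestamp, oldest first.
    """
    if not samples:
        return 0
    latest = samples[-1][0]
    recent = [state for ts, state in samples if latest - ts <= window_s]
    # Compare each recent sample with the one that follows it.
    return sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)


def is_flapping(samples: List[Sample], window_s: float = 300.0, threshold: int = 3) -> bool:
    """Return True when the state changed at least `threshold` times in the window.

    The window and threshold defaults are illustrative, not vendor guidance.
    """
    return count_state_changes(samples, window_s) >= threshold
```

A single Active-to-Standby change stays below the threshold, while a peer that alternates between the two states on every poll is reported as flapping.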

In the case of a vPC problem, validate the following:
1. Check that STP bridge assurance is not enabled on the vPC links. Bridge assurance should only be enabled on the vPC peer link.
2. Compare the vPC domain IDs of the two switches and ensure that they match. Execute the "show vpc brief" NX-OS command on both vPC peer switches and compare the output.
3. Verify that both the source and destination IP addresses used for the peer-keepalive messages are reachable from the VRF associated with the vPC peer-keepalive link. Then execute the "show vpc peer-keepalive" NX-OS command and review the output from both switches.
4. Verify that the peer-keepalive link is up. Otherwise, the vPC peer link will not come up.
5. Review the vPC peer link configuration: execute the "show vpc brief" NX-OS command and review the output. Also verify that the vPC peer link is configured as a Layer 2 port channel trunk that allows only vPC VLANs.
6. Ensure that the type 1 consistency parameters match. If they do not match, the vPC is suspended. Type 2 parameters do not have to match on both Nexus switches for the vPC to be operational. Execute the "show vpc consistency-parameters" NX-OS command and review the output.
7. Verify that the vPC number that you assigned to the port channel that connects to the downstream device from the vPC peer device is identical on both vPC peer devices.
8. If you manually configured the system priority, verify that you assigned the same priority value on both vPC peer devices.
9. Verify that the primary vPC is the primary STP root and the secondary vPC is the secondary STP root.
10. Review the logs for relevant findings.
11. For more information, review the vPC troubleshooting guide.
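
Several of the vPC checks above come down to confirming that a value is identical on both peers: the domain ID, the type 1 consistency parameters, the vPC number per downstream device, and the system priority. As a rough illustration, the Python sketch below compares two dictionaries of values that were gathered by hand or by another tool. The key names ("domain_id", "system_priority", "type1_parameters", "vpc_numbers") and the sample values are hypothetical and do not correspond to NX-OS output fields.

```python
from typing import Dict, List, Sequence

# Hypothetical key names for values that must be identical on both vPC peers.
MUST_MATCH = ("domain_id", "system_priority", "type1_parameters", "vpc_numbers")


def find_mismatches(
    peer_a: Dict[str, object],
    peer_b: Dict[str, object],
    keys: Sequence[str] = MUST_MATCH,
) -> List[str]:
    """Return one description per key whose value differs between the peers.

    A key that is missing on one peer is compared as None, so it is
    reported as a mismatch unless it is missing on both.
    """
    problems = []
    for key in keys:
        a, b = peer_a.get(key), peer_b.get(key)
        if a != b:
            problems.append(f"{key}: {a!r} != {b!r}")
    return problems


# Illustrative values only, entered by hand for two peers.
switch_1 = {
    "domain_id": 10,
    "system_priority": 4096,
    "type1_parameters": {"stp_mode": "rapid-pvst"},
    "vpc_numbers": {"access-sw1": 20},
}
switch_2 = {
    "domain_id": 10,
    "system_priority": 8192,
    "type1_parameters": {"stp_mode": "rapid-pvst"},
    "vpc_numbers": {"access-sw1": 20},
}
mismatches = find_mismatches(switch_1, switch_2)
```

In this example only the system priority differs, so `mismatches` contains a single entry naming that key. An empty list means every compared value matched.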

How does this work?

This script logs in to the Cisco Nexus switch over SSH and retrieves the HSRP state using the "show hsrp" command. The output includes a complete report of the HSRP state across all configured interfaces.

Why is this important?

This check verifies that each configured HSRP (Hot Standby Router Protocol) group has at least one active member. If a group has no active member, traffic for that group cannot be routed.
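
The logic of this check can be sketched in a few lines of Python. This is a minimal illustration, not Indeni's actual implementation. It assumes the HSRP states have already been collected from each switch and reduced to (device, interface, group, state) records; the record layout, the function name, and the device names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, int, str]  # (device, interface, HSRP group number, state)


def groups_without_active(records: Iterable[Record]) -> List[Tuple[str, int]]:
    """Return the (interface, group) pairs for which no member is Active."""
    has_active: Dict[Tuple[str, int], bool] = {}
    for _device, interface, group, state in records:
        key = (interface, group)
        is_active = state.strip().lower() == "active"
        # A group counts as healthy as soon as any member reports Active.
        has_active[key] = has_active.get(key, False) or is_active
    return sorted(key for key, ok in has_active.items() if not ok)


# Hand-written sample records for two switches (illustrative values).
records = [
    ("nexus-1", "Vlan10", 10, "Active"),
    ("nexus-2", "Vlan10", 10, "Standby"),
    ("nexus-1", "Vlan20", 20, "Initial"),
    ("nexus-2", "Vlan20", 20, "Listen"),
]
failed_groups = groups_without_active(records)
```

Here group 10 on Vlan10 has an active member, while group 20 on Vlan20 does not, so only the latter is returned.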

Without Indeni how would you find this?

You can poll this data through SNMP. However, only state transitions generate a syslog event; there is no explicit event for the last active member failing.
