Hardware element down for Cisco Nexus




Alert if any hardware elements are not operating correctly.

Remediation Steps

Troubleshoot the hardware element as soon as possible.
1. Use the "show environment [ fan | power | temperature ]" NX-OS command to display information about the hardware environment status.
2. For more information, review the Cisco Nexus hardware troubleshooting guide (fan/PS/Temp/Xbar/SUP): https://www.cisco.com/c/en/us/support/docs/switches/nexus-7000-series-switches/200148-Troubleshooting-N7K-HW-fan-PS-Temp-Xbar.html

If the issue is with a transceiver:
1. Use the "show interface transceiver details" NX-OS command to display detailed information for transceiver interfaces.
2. Use the "show interface transceiver calibrations" NX-OS command to display calibration information for transceiver interfaces.

How does this work?

This script logs into the Cisco Nexus switch over SSH and retrieves the output of the "show environment" command. The output includes a table of the device's hardware environment information: line card, power supply, and fan status, plus any available temperature sensors with their corresponding alarm thresholds. If FEX (Fabric Extender) devices are attached to the switch, the "show environment fex all" command is used to collect similar data for them.
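
As a rough illustration of the status check described above, the following Python sketch scans a status table and reports rows whose last column is not healthy. It is not the actual Indeni script: the function name find_failed_elements, the HEALTHY set, and the sample table (a simplified, made-up layout rather than real "show environment" output) are all assumptions for the example.

```python
# Hypothetical sketch: flag rows of a simplified status table whose
# final column is not a healthy value. Real NX-OS output differs by
# platform and release, so a production parser needs more care.
HEALTHY = {"ok"}  # assumed set of healthy status strings


def find_failed_elements(table_text):
    """Return (name, status) pairs for rows that are not healthy."""
    failed = []
    for line in table_text.splitlines():
        fields = line.split()
        # Skip blank lines, dashed separator rules, and the header row.
        if len(fields) < 2 or fields[-1] == "Status":
            continue
        name, status = fields[0], fields[-1]
        if status.lower() not in HEALTHY:
            failed.append((name, status))
    return failed


# Made-up sample data in a simplified layout (not captured from a device).
SAMPLE = """\
Fan             Model           Hw    Status
--------------------------------------------
Fan1            EXAMPLE-FAN     --    Ok
Fan2            EXAMPLE-FAN     --    Failure
PS-1            EXAMPLE-PSU     --    Ok
"""

print(find_failed_elements(SAMPLE))
```

Treating the last column as the status keeps the sketch independent of how many middle columns a given table has, at the cost of assuming the status is always a single word.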

Why is this important?

It is important to capture the status of hardware elements such as line cards, power supplies, and fans. The output includes each element's overall status as well as the available temperature sensor readings. This information is critical for monitoring system health. Any reported hardware element issue or out-of-range temperature triggers an alert.

Without Indeni how would you find this?

It is possible to poll this data through SNMP, but the temperature alarm status would have to be derived manually by comparing each sensor value against its threshold.
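
The manual correlation mentioned above could look roughly like the Python sketch below. It assumes the sensor values and thresholds have already been polled by some other means; the TempReading type, its field names, and the sample numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TempReading:
    """One polled temperature sensor (hypothetical structure)."""
    sensor: str
    current_c: float    # current temperature in degrees Celsius
    threshold_c: float  # alarm threshold in degrees Celsius


def over_threshold(readings):
    """Return the names of sensors at or above their alarm threshold."""
    return [r.sensor for r in readings if r.current_c >= r.threshold_c]


# Made-up readings; real values would come from the SNMP poll.
SAMPLE_READINGS = [
    TempReading("Front", 28.0, 70.0),
    TempReading("Back", 81.5, 80.0),
    TempReading("CPU", 55.0, 90.0),
]

print(over_threshold(SAMPLE_READINGS))
```

The sketch treats a reading equal to the threshold as an alarm; whether the boundary should be inclusive is a policy choice rather than something the source specifies.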

