Configuration changed on standby member-juniper-junos

discobot · November 22, 2018, 1:42pm

Vendor: juniper

OS: junos

Description:
Generally, making configuration changes to the standby member of a cluster is not recommended. Indeni will trigger an issue if this happens.

Remediation Steps:
Make the configuration changes to the active member of the cluster.
1. The chassis cluster synchronization feature automatically synchronizes the configuration from the primary node to the secondary node when the secondary joins the primary as a cluster.
2. Review the following article on Juniper tech support site: Understanding Automatic Chassis Cluster Synchronization Between Primary and Secondary Nodes

How does this work?
This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the “show chassis cluster status” command. The output includes the status of all redundancy groups across the cluster.
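
To make the parsing step concrete, here is a minimal Python sketch. It is not Indeni's actual AWK parser, which is not shown in this post. The sample output is illustrative and only follows the usual layout of the command. `parse_cluster_status`, `member_active` and the `local_node` parameter are hypothetical names, and treating a node as active only when it is primary in every redundancy group is an assumption.

```python
import re

# Illustrative sample only; real output varies by platform and Junos release.
SAMPLE_OUTPUT = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None
node1  1        secondary      no      no       None

Redundancy group: 1 , Failover count: 1
node0  100      primary        no      no       None
node1  1        secondary      no      no       None
"""

GROUP_RE = re.compile(r"^Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"^(node\d+)\s+(\d+)\s+(\S+)")


def parse_cluster_status(output):
    """Return {redundancy_group: {node_name: status}} from the CLI output."""
    groups = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        group = GROUP_RE.match(line)
        if group:
            current = int(group.group(1))
            groups[current] = {}
            continue
        node = NODE_RE.match(line)
        if node and current is not None:
            groups[current][node.group(1)] = node.group(3)
    return groups


def member_active(groups, local_node):
    """Assumption: 1.0 only if local_node is primary in every group it appears in."""
    statuses = [nodes[local_node] for nodes in groups.values() if local_node in nodes]
    return 1.0 if statuses and all(s == "primary" for s in statuses) else 0.0


groups = parse_cluster_status(SAMPLE_OUTPUT)
print(member_active(groups, "node0"), member_active(groups, "node1"))
```

The returned value mirrors the 1/0 convention that the rule further down checks for `cluster-member-active`.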

Why is this important?
Tracking the state of a cluster member is important. If a cluster member which used to be the active member of the cluster no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the firewall or another component in the network.

Without Indeni how would you find this?
The administrator has to run the “show chassis cluster status” command on the device to find out whether the cluster member is active or not.

junos-show-chassis-cluster-status

name: junos-show-chassis-cluster-status
description: JUNOS collect clustering status
type: monitoring
monitoring_interval: 1 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true
comments:
    cluster-member-active:
        why: |
            Tracking the state of a cluster member is important. If a cluster member which used to be the active member of the cluster no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the firewall or another component in the network.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: true
        can-with-syslog: true
    cluster-state:
        why: |
            Tracking the state of a cluster is important. If a cluster which used to be healthy no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the members of the cluster or another component in the network.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: true
        can-with-syslog: true
    cluster-preemption-enabled:
        why: |
            Preemption is a function in clustering which sets a primary member of the cluster to always strive to be the active member. The trouble with this is that if the active member that is set with preemption on has a critical failure and reboots, the cluster will fail over to the secondary and then immediately fail over back to the primary when it completes the reboot. This can result in another crash and the process would happen again and again in a loop.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: SSH
        file: show-chassis-cluster-status.remote.1.bash
    parse:
        type: AWK
        file: show-chassis-cluster-status.parser.1.awk
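
The `cluster-preemption-enabled` metric described above could be derived from the Preempt column of the same command. The following Python sketch shows one way to do that. It is an assumption about the approach, not the contents of the real `.awk` parser. The sample output is illustrative, and `preempt_enabled_groups` and `preemption_metric` are hypothetical names.

```python
import re

# Illustrative sample only: redundancy group 1 has preemption turned on.
SAMPLE_OUTPUT = """\
Redundancy group: 0 , Failover count: 1
node0  100      primary        no      no       None
node1  1        secondary      no      no       None

Redundancy group: 1 , Failover count: 3
node0  100      primary        yes     no       None
node1  1        secondary      yes     no       None
"""

GROUP_RE = re.compile(r"Redundancy group:\s*(\d+)")


def preempt_enabled_groups(output):
    """Return the redundancy group IDs where any node row shows Preempt = yes."""
    enabled = []
    current = None
    for line in output.splitlines():
        group = GROUP_RE.match(line.strip())
        if group:
            current = int(group.group(1))
            continue
        fields = line.split()
        # Node rows look like: name, priority, status, preempt, manual, failures
        is_node_row = len(fields) >= 4 and re.fullmatch(r"node\d+", fields[0])
        if is_node_row and current is not None and fields[3] == "yes":
            if current not in enabled:
                enabled.append(current)
    return enabled


def preemption_metric(output):
    """Assumption: report 1.0 if preemption is on for any redundancy group."""
    return 1.0 if preempt_enabled_groups(output) else 0.0
```

Reporting the affected group IDs alongside the metric makes it easier to see which redundancy group could flap back after a reboot.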

junos-show-chassis-cluster-information-configuration-synchronization

name: junos-show-chassis-cluster-information-configuration-synchronization
description: Get chassis cluster configuration synchronization status
type: monitoring
monitoring_interval: 10 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true
comments:
    cluster-config-synced:
        why: "A failure of configuration synchronization will cause misbehavior\
            \ when a cluster failover occurs. For example, an interface which\
            \ should be enabled is still in a disabled state, or the latest configuration\
            \ fails to apply to the new active node. \n"
        how: |
            The script runs the "show chassis cluster information configuration-synchronization" command via SSH and retrieves the configuration synchronization status.
        can-with-snmp: null
        can-with-syslog: null
steps:
-   run:
        type: SSH
        file: show-chassis-cluster-information-configuration-synchronization.remote.1.bash
    parse:
        type: AWK
        file: show-chassis-cluster-information-configuration-synchronization.parser.1.awk
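
As a rough idea of what the synchronization check involves, here is a Python sketch that pulls the last sync result per node and folds it into a single `cluster-config-synced` value. The sample output is illustrative and trimmed. The set of results treated as healthy is an assumption, and `last_sync_results` and `config_synced` are hypothetical names, not part of the Indeni script.

```python
import re

# Illustrative, trimmed sample; real output also includes an Events section.
SAMPLE_OUTPUT = """\
node0:
--------------------------------------------------------------------------

Configuration Synchronization:
    Status:
        Activation status: Enabled
        Last sync operation: Auto-Sync
        Last sync result: Not needed
        Last sync mgd messages:

node1:
--------------------------------------------------------------------------

Configuration Synchronization:
    Status:
        Activation status: Enabled
        Last sync operation: Auto-Sync
        Last sync result: Succeeded
        Last sync mgd messages:
"""

# Assumption: these two results mean the nodes are in sync.
HEALTHY_RESULTS = {"Succeeded", "Not needed"}


def last_sync_results(output):
    """Return {node_name: last sync result} from the CLI output."""
    results = {}
    node = None
    for raw in output.splitlines():
        line = raw.strip()
        if re.fullmatch(r"node\d+:", line):
            node = line[:-1]
        elif line.startswith("Last sync result:") and node is not None:
            results[node] = line.split(":", 1)[1].strip()
    return results


def config_synced(results):
    """1.0 if every node reported a healthy result, else 0.0."""
    healthy = results and all(r in HEALTHY_RESULTS for r in results.values())
    return 1.0 if healthy else 0.0
```

An empty result (for example, when the command is unsupported) maps to 0.0 here. Whether the real script reports that case as unsynced or skips the metric is not stated in the post.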

cross_vendor_config_change_on_standby

package com.indeni.server.rules.library.core
import com.indeni.apidata.time.TimeSpan
import com.indeni.ruleengine.expressions.conditions.{And, Equals, GreaterThanOrEqual}
import com.indeni.ruleengine.expressions.core._
import com.indeni.ruleengine.expressions.data.{SelectTagsExpression, SelectTimeSeriesExpression, TimeSeriesExpression}
import com.indeni.server.common.data.conditions.True
import com.indeni.server.rules.library.{ConditionalRemediationSteps, PerDeviceRule, RuleHelper}
import com.indeni.server.rules.{RuleContext, _}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity


case class ConfigChangeOnStandbyMemberRule() extends PerDeviceRule with RuleHelper {

  override val metadata: RuleMetadata = RuleMetadata.builder("cross_vendor_config_change_on_standby", "Configuration changed on standby member",
    "Generally, making configuration changes to the standby member of a device is not recommended. indeni will trigger an issue if this happens.",
    AlertSeverity.WARN, categories = Set(RuleCategory.HighAvailability), deviceCategory = DeviceCategory.ClusteredDevices).interval(TimeSpan.fromMinutes(5)).build()

  override def expressionTree(context: RuleContext): StatusTreeExpression = {
    val configUnsavedValue = TimeSeriesExpression[Double]("config-unsaved").last
    val memberStateValue = TimeSeriesExpression[Double]("cluster-member-active").last
    val configSyncValue = TimeSeriesExpression[Double]("cluster-config-synced").last

    StatusTreeExpression(
      // Which objects to pull (normally, devices)
      SelectTagsExpression(context.metaDao, Set(DeviceKey), True),

      StatusTreeExpression(
        // The time-series we check the test condition against:
        SelectTimeSeriesExpression[Double](context.tsDao, Set("config-unsaved", "cluster-member-active", "cluster-config-synced"), denseOnly = false),

        // The condition which, if true, we have an issue. Checked against the time-series we've collected
        And(
          Equals(configUnsavedValue, ConstantExpression(Some(1.0))),
          Equals(memberStateValue, ConstantExpression(Some(0.0))),
          GreaterThanOrEqual(configSyncValue, ConstantExpression(Some(0.0))))
      ).withoutInfo().asCondition()

      // Details of the alert itself
    ).withRootInfo(
      getHeadline(),
      ConstantExpression("The configuration has been changed on this device, but it's not the active member of the cluster. Best practices recommend making changes to the active member of a cluster and then syncing to the standby."),
      ConditionalRemediationSteps("Make the configuration changes to the active member of the cluster.",
        RemediationStepCondition.VENDOR_CISCO ->
          """1. Save the configuration by executing the "copy running-config startup-config" command. Note: The network-admin role is required to execute this command.
            |2. Check that there are no unsaved configuration changes by running the "show running-config diff" command on the switches.
            |3. Consider creating snapshots of the configuration by utilizing the NX-OS checkpoint and rollback features. They are extremely useful, and a lifesaver in some cases, when a configuration change to a production system has caused unwanted effects or was incorrectly made or planned and you need to immediately return to an original, stable configuration.
            |4. For more information review the following article: <a target="_blank" href="http://www.firewall.cx/cisco-technical-knowledgebase/cisco-data-center/1202-cisco-nexus-checkpoint-rollback-feature.html">Guide to Nexus checkpoint & rollback feature</a>""".stripMargin,
        RemediationStepCondition.VENDOR_JUNIPER ->
          """|1. The chassis cluster synchronization feature automatically synchronizes the configuration from the primary node to the secondary node when the secondary joins the primary as a cluster.
             |2. Review the following article on Juniper tech support site: <a target="_blank" href="https://www.juniper.net/documentation/en_US/junos/topics/concept/chassis-cluster-backup-config-sync.html">Understanding Automatic Chassis Cluster Synchronization Between Primary and Secondary Nodes</a>""".stripMargin
      )
    )
  }
}
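
For readers who do not work with the rule engine, the condition in the `And(...)` block above can be restated in plain Python. This is a simplified sketch and does not use the Indeni API. `should_alert` is a hypothetical name, and returning no alert when a metric is missing is an assumption about how absent time-series are handled.

```python
from typing import Optional


def should_alert(
    config_unsaved: Optional[float],
    member_active: Optional[float],
    config_synced: Optional[float],
) -> bool:
    """Mirror the rule: unsaved config on a non-active member, with a sync metric present."""
    # Assumption: if any metric has no latest value, the rule does not fire.
    if config_unsaved is None or member_active is None or config_synced is None:
        return False
    return (
        config_unsaved == 1.0      # config-unsaved: changes are pending
        and member_active == 0.0   # cluster-member-active: this node is standby
        and config_synced >= 0.0   # cluster-config-synced: any reported value
    )
```

Since `cluster-config-synced` is compared with `>= 0`, both 0 and 1 satisfy it, so in this reading the third clause mainly limits the rule to devices that report the metric at all.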