Network Device Configuration Standardization – Thoughts on Ethan Banks’ post

Ethan Banks has an interesting newsletter called The Hot Aisle. Worth following if you’re not familiar with it, basically the thoughts of a very experienced network engineer.

In today’s post, Ethan covers the item at the top of his wishlist: network device configuration standardization. Ethan’s wishes are shared by many of the people I meet with on a regular basis. Many root cause analyses done over the years have pointed to a lack of config standardization as the root cause. Get that done well, and you get rid of many of the issues you run into regularly.

But how do you make sure it’s done well? In Ethan’s post he lists a few configurations he’d like standardized. I thought I’d add my $0.02 here on pitfalls people should watch out for. All of these I’ve learned through watching what issues our customers run into.

  1. NTP – it’s one thing to make sure all of your network devices use the same NTP servers; it’s another to make sure they can actually reach them. Don’t just rely on tools that tell you the NTP config is set to the right host, test it! Different devices have different ways of achieving this.
  2. External authentication – a very important and common best practice. Be careful of a situation we’ve received multiple reports of (but found nothing online so far, interestingly): if you configure too many external authentication servers and all connectivity is lost, you might lose local authentication as well. The reason is that by the time the authentication attempt has timed out against each external server in turn, the entire login process has timed out.
  3. SNMPv3 – good idea, just be careful of things like this.
  4. SSH instead of Telnet – undoubtedly a good idea. We do still see customers running older IOS versions that don’t support SSH.
  5. Hardening – very important, but be very careful about this. For example, hardening can result in GARP response packets being dropped, thereby breaking clustering of certain products. The breakage only shows up at the next failover, which means you might not discover the impact of the hardening until months later.
  6. Device configuration backup – VERY VERY important. Be sure to use a product that tests the backup was done correctly. You don’t want to try and restore a partial backup.
  7. OOB management – one of those things people wish for when they’ve locked themselves out of a network (or a number of networks). The most common solution we see with customers is a console/lights-out product that uses a separate DSL or cellular line for access. It’s usually expensive, much more so than just setting up a VRF, but usually foolproof (if you actually monitor these devices to ensure they are connected).
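The external authentication pitfall in item 2 comes down to simple arithmetic, so here is a minimal sketch of it in Python. The function names, the retry parameter and all of the numbers are assumptions made for illustration; real devices differ in how they count retries and where the overall login timer is set, so check your vendor’s documentation for the actual values.

```python
def worst_case_auth_delay(server_count: int,
                          per_server_timeout_s: float,
                          retries: int = 0) -> float:
    """Seconds spent waiting if every external server is unreachable.

    Assumes the device tries servers one after another and waits the
    full timeout on each attempt (an assumption, not vendor behavior).
    """
    attempts_per_server = 1 + retries
    return server_count * attempts_per_server * per_server_timeout_s


def local_fallback_reachable(server_count: int,
                             per_server_timeout_s: float,
                             login_timeout_s: float,
                             retries: int = 0) -> bool:
    """True if the device gets to local authentication before the
    overall login process times out."""
    delay = worst_case_auth_delay(server_count, per_server_timeout_s, retries)
    return delay < login_timeout_s


# Hypothetical numbers: four servers at 10 seconds each against a
# 30-second login timer never reach the local fallback.
print(local_fallback_reachable(4, 10, 30))  # False
print(local_fallback_reachable(2, 10, 30))  # True
```

Running a check like this against your standard config template is a cheap way to catch the problem before the day all connectivity is lost.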

If you’d like to receive this type of content via your email, just sign up for our newsletter. 


F5 Self IP and unicast failover IP are not identical

This is a real life sample alert from indeni for F5 Load Balancing Methods

Description:

The F5 unicast failover IP must be the same as the self IP for failover to work. Versions prior to 11.3.0 did not enforce this.

For more information read note #398067 in the 11.4.0 release notes.

Manual Remediation Steps:

Update the unicast failover IP to match the self IP.

How does this alert work?

indeni compares the unicast failover IP and the self IP to identify if there is a mismatch.
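The comparison itself is simple once both values have been read from the device. Below is a minimal sketch in Python, not indeni’s implementation. It assumes the addresses have already been collected as plain IP strings (no netmask or route domain suffix), and it treats a failover address as valid if it equals any of the device’s self IPs. The function name and sample addresses are hypothetical.

```python
import ipaddress
from typing import Iterable, List


def failover_ip_mismatches(self_ips: Iterable[str],
                           unicast_failover_ips: Iterable[str]) -> List[str]:
    """Return the unicast failover addresses that match no self IP."""
    # Parsing normalizes the text form, so equivalent IPv6 spellings compare equal.
    known = {ipaddress.ip_address(ip) for ip in self_ips}
    return [ip for ip in unicast_failover_ips
            if ipaddress.ip_address(ip) not in known]


# Hypothetical device: two self IPs, failover address pointing elsewhere.
print(failover_ip_mismatches(["10.0.0.1", "10.0.1.1"], ["10.0.0.2"]))
```

An empty list means there is nothing to alert on.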

 

SolarWinds© NPM© & NCM© vs indeni: Simple management vs actionable, automated root cause analysis

SolarWinds Network Performance Manager (NPM) and Network Configuration Manager (NCM) were two powerful tools when they were invented in 1998. Many times we find ourselves looking at SolarWinds and thinking “Wow, they’ve built something amazing. But is it still meeting engineers’ needs today?”

With a big portion of our customers being SolarWinds customers as well (present or past), we often get asked “How do you compare to SolarWinds?” The short answer is: SolarWinds is simple and wide. It gives you a 30,000 foot view of your entire environment and simplifies it. indeni, in contrast, dives deep into the devices in your environment to understand where issues lurk and what can become an unpleasant surprise.

indeni provides the true proactivity you are lacking.

Now, for the longer answer. Before we jump in, it’s important you know that in the process of preparing this document, we’ve used SolarWinds’ demo environment (http://oriondemo.solarwinds.com/) and indeni’s demo environment (https://demo.indeni.com). To make life easier, screenshots are included, too.

Let’s start with SolarWinds. It’s attractive and highly customizable, and relies mostly on SNMP GETs and traps. You can view your devices by vendor, across a map, by groups of devices and any other way you could imagine. Issues are made visible mostly through red and orange coloring, as well as by sending emails to NOC mailboxes.

In the screenshot to the right note that we’ve highlighted the F5© devices. In this document we wanted to use specific examples for certain devices. Usually people pick Cisco, but we believe that the networking world is more than just Cisco. So F5 it is.

If we zoom into the F5 devices, we get a nice view that lists everything that can possibly be pulled out of an F5© BIG-IP© device via SNMP: connection and traffic stats, device model, uptime and so on. Very informative.

The amount of data SolarWinds NPM provides you with is pretty significant. In addition, it lays it out in a way that makes it accessible when needed, and that’s a very strong element of SolarWinds NPM. Compared to its predecessors – IBM Tivoli, HP OpenView and CA Spectrum – it was a refreshing approach.

What SolarWinds NPM is lacking, and we’ll soon show indeni has, is an ability to tell you what your problems really are. Better yet, what problems you’re about to have. Will your upgrade fail because of deprecated configurations? Will your HA setup operate correctly during a failover? What is the cause of health monitor failures for certain pool members? None of this is shown by SolarWinds NPM, as it’s not easily accessible via SNMP and requires a lot of expertise to gather and interpret.

Expertise is what indeni is all about.

indeni’s main job is to provide clear, actionable information and instructions. Below are a few examples of alerts for F5. Go ahead, click on the screenshots.

See the level of information you’re given? Try getting that out of NPM.

So, to summarize, NPM was nothing short of amazing for its time. Now, you can go ahead and take the next step – set up indeni within 45 minutes.

Check Point Alert of the Week: VPN phase two lifetime mismatch with a VPN peer

This is a real life sample alert from indeni.

Description:

The phase two lifetime values used by this device and by some of its peers are different. This may cause VPN tunnels to fail some time after they were set up. It is important to ensure the lifetime values are equal at both VPN peers for this phase. Note that for the purpose of generating this alert, indeni analyzes the ikemonitor.snoop file.

Affected VPN Peers:

185.4.17.147:
The duration value for the number of SECONDS for which to wait before refreshing the tunnel in this phase is different between this device and its peer. The peer’s value is 86400 and this device’s value is 28800.

Manual Remediation Steps:

Make sure the configuration for the phase two lifetime is a match between this device and its peers.

Read Cisco’s document on this and a Check Point forums thread for more information.

How does this alert work?

indeni checks to see if the IKE traffic is being captured to the ikemonitor.snoop file (this is turned on with “vpn debug mon”). If it is, the snoop file is pulled from the device and analyzed. indeni will parse the packets for both sides of the VPN negotiation and will attempt to determine if a certain parameter in the configuration is causing a problem with the tunnel’s stability.
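Once the lifetimes proposed by each side have been extracted, the final comparison step is straightforward. The Python sketch below covers only that step and is not indeni’s implementation. Parsing the snoop file is out of scope here, and the function name and the second peer address are hypothetical. The 86400 and 28800 values are the ones from the sample alert above.

```python
from typing import Dict, Tuple


def lifetime_mismatches(local_lifetime_s: int,
                        peer_lifetimes_s: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each mismatching peer to (peer value, local value) in seconds."""
    return {
        peer: (peer_value, local_lifetime_s)
        for peer, peer_value in peer_lifetimes_s.items()
        if peer_value != local_lifetime_s
    }


peers = {
    "185.4.17.147": 86400,  # peer from the sample alert
    "192.0.2.10": 28800,    # hypothetical peer that matches the local value
}
for peer, (theirs, ours) in lifetime_mismatches(28800, peers).items():
    print(f"{peer}: the peer's value is {theirs} and this device's value is {ours}")
```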

Check Point Alert of the Week: Cluster member down due to NIC error

It’s so easy sometimes to disconnect the wrong cable.

This is a real life sample alert from indeni.

Description:

The cluster member is in a critical state due to one or more monitored interfaces being down or disconnected from the network.

NICs Down: eth7

Remediation Steps:

For eth7 (Bandwidth: 1000M/full, MAC Address: 00:1C:7F:30:9E:14): While the NIC itself appears to be active and working well, ClusterXL considers a NIC to be down if it is connected to a network where there are no other hosts, including no other cluster members. If the configuration is correct, lack of support for multicast at one of the switches being used could also stop ClusterXL from operating. In order NOT to monitor this interface, add a line containing only “eth7” to the list in the $FWDIR/conf/discntd.if file (create the file if it doesn’t exist) or use the ifdown command to remove the interface entirely from the system.
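If you choose to stop monitoring the interface, the file edit described above is easy to make idempotent. Here is a minimal Python sketch, assuming you pass in the path to discntd.if yourself (on a gateway that would be under $FWDIR/conf). The function name is hypothetical, and the sketch only edits the file; see SK30060 for anything else needed for the change to take effect.

```python
from pathlib import Path


def exclude_interface_from_monitoring(discntd_file: Path, interface: str) -> bool:
    """Ensure `interface` appears on a line of its own in the file.

    Creates the file if it doesn't exist. Returns True if the file was
    changed, False if the interface was already listed.
    """
    lines = discntd_file.read_text().splitlines() if discntd_file.exists() else []
    if interface in (line.strip() for line in lines):
        return False  # already listed, nothing to do
    lines.append(interface)
    discntd_file.write_text("\n".join(lines) + "\n")
    return True
```

On a gateway this might be called as exclude_interface_from_monitoring(Path(os.environ["FWDIR"]) / "conf" / "discntd.if", "eth7"), after importing os. Calling it twice leaves a single entry in the file.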

Read SK30060 for more information.

How does this alert work?

indeni continuously monitors “cphaprob -a if” and compares it to other networking data (such as “ifconfig -a” and “arp -an”) to determine the cause for the NIC being considered down by ClusterXL. Note that indeni does not rely on the log messages for this alert (known as “cluster_info: (ClusterXL) interface is down”).

Interested in learning more? Download our solution for Check Point here.

F5© Alert of the Week: iRules May Break in a Future Upgrade, or Today

This is a real life sample alert from indeni for F5 Load Balancing Methods

Description:

indeni has run a syntax check of the iRules currently configured on this device according to the syntax used in version 11.6.0 (this device is running 11.1.0). The following iRules did not pass the syntax checks. This may impact the behavior of the iRules today, or after an upgrade to a more recent version of TMOS.

Affected iRules:

/Common/Custom_iRule_1202: error: /Common/Custom_iRule_1202:3: error: use curly braces to avoid double substitution[0xsubstr $ip 6 2]

Manual Remediation Steps:

Review the syntax errors mentioned and correct the iRules.

How does this alert work?

With every release of indeni, we update our mechanism for doing syntax checks of iRules so it uses the logic represented by the most recent release of TMOS. Then indeni pulls the iRules configuration (in bigip.conf) and runs the syntax check based on that logic. If any errors are found an alert is issued.

Need more info on F5 load balancing methods? indeni can help you achieve F5 perfect balance.


 

F5© Alert of the Week: Device is not in the required trust domain

This is a real life sample alert from indeni

Description:

For sync to work across device groups, you must ensure all devices are in the same trust domain. For more information on this, read SOL13946.

Manual Remediation Steps:

Add this device to the correct trust domain. Under Device Management, look for Device Trust.

How does this alert work?

indeni automatically determines that a group of devices is set to sync. Then, it runs “show /cm device-group device_trust_group” on each of them and reviews the output to ensure trust is indeed established.


Monitored or Permanent VPN tunnels down: Check Point Firewalls Configuration Guide

Lincoln tunnel - another kind of permanent tunnel.

This is a real life sample alert from the indeni alert guide for Check Point Firewalls.

Some of the monitored/permanent VPN tunnels have been found to be in an inactive or unstable state.

For more information on permanent tunnels and how to set them up, read the VPN admin guide. For more information on how to monitor permanent tunnels within Check Point’s SmartView Monitor, read Monitoring Tunnels.

Possibly Affected Tunnels:

VPN Community XVPN, tunnel between CP1 and CP2 (1.1.1.1)

Manual Remediation Steps:

Review the network connectivity between the two sites. Normally, permanent tunnels do not run into configuration issues but do run into connectivity issues.

How does this alert work?

indeni loads the VPN Community configurations and then tests the VPN tunnels on each gateway using the ‘vpn tu’ command. For this alert, only the permanent tunnels are examined. Note that indeni does not need to use SmartView Monitor, or the rtmd process, to achieve this. If you are interested in manually checking the status of permanent tunnels, you may use Check Point’s SmartView Monitor. When a permanent tunnel goes down you may sometimes see “No valid SA”.

“Renewing indeni is a no brainer”

I just got off the phone with one of our customers, a multi-billion-dollar enterprise that I’m 100% certain every single US-based reader of this post will recognize. However, I can’t mention them by name.

They have been our customer for two years now and have just renewed their contract. For us, that’s a great show of belief in what we do and something I don’t take for granted.

We are the ultimate SaaS: our software grows on an ongoing basis. Many SaaS companies charge you a monthly or annual subscription even though their software changes very little during that time. We at indeni charge annual subscriptions because our software grows constantly, on a daily basis. That’s a real service.

So, renewals are as important to us as the first purchase a customer makes. These renewals help fund the growth of our software. Like many other high-growth startups, we invest every dime we make in growing. No profits, no dividends, just growth.

So I asked this customer: “Why did you renew?” His answer:

  • indeni delivers on its promise of identifying issues in his estate (mostly Check Point firewalls in his case).
  • The support and services we deliver are exceptional.
  • To do what indeni does, he’d need to hire 5 developers, and indeni is a fraction of that cost.
  • His company’s focus is on bringing more automation into IT so they can focus on business processes. “Less cleaning the drains and fixing the pipes. Moving from the slow ITIL approach to the rapid DevOps where possible.”

He summarized it with: “renewing indeni is a no brainer”.

And that, my friends, is why I’m doing what I do.