How to monitor F5 devices – SNMP vs API vs SSH

F5 offers several ways of interfacing with its products, and when writing monitoring scripts we had to research which one performs best. After all, monitoring should not harm the device it monitors. We compared iControl REST, SNMP and TMSH. See below for how the test was conducted and which one won.

The best way to monitor F5 – How the test was conducted

We ran each method continuously for about 20 minutes through command-runner. While the tests were running, we used the web interface to check that its responsiveness remained acceptable.

The commands to run each test

#REST
while true; do
command-runner.sh full-command --basic-authentication user,password rest-pool-statistics.ind 10.10.10.10
done
#tmsh
while true; do
command-runner.sh full-command --ssh user,password ./show-ltm-pool-detail-raw-recursive.ind 10.10.10.10
done
#SNMP
while true; do
command-runner.sh full-command --ssh user,password ./snmp-pool-statistics.ind 10.10.10.10
done
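
The loops above run until they are interrupted by hand. A minimal sketch of how a run could be capped at a fixed duration is shown here. `run_for` is a hypothetical helper, not part of command-runner, and it relies on bash's built-in `SECONDS` counter.

```shell
#!/usr/bin/env bash
# run_for DURATION COMMAND [ARGS...]
# Hypothetical helper (not part of command-runner): repeats COMMAND until
# DURATION seconds have elapsed, then prints how many iterations completed.
run_for() {
    local duration=$1
    shift
    local count=0
    local deadline=$(( SECONDS + duration ))
    while (( SECONDS < deadline )); do
        # Output is discarded and failures are ignored, as in a plain loop.
        "$@" >/dev/null 2>&1 || true
        count=$(( count + 1 ))
    done
    echo "$count"
}
```

For a 20-minute run, the body of any loop above could be passed as the command, for example `run_for 1200 command-runner.sh full-command ...` with the same arguments as in the loop. The printed iteration count gives a rough idea of how quickly each method completes.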

Results

The test started out with 283 pools (200 of them created just for this test). However, the tmsh command caused command-runner to time out, so we had to reduce to the original 83 pools and rerun the REST test to make the comparison fair.

  • Test 1: REST = 283 pools
  • Test 2: TMSH = 83 pools
  • Test 3: SNMP = 83 pools
  • Test 4: REST (take 2) = 83 pools

4 hour graph

24 hour graph for reference

REST

  • Did not produce any timeouts in the GUI in either of the two tests.
  • Always produced results.
  • The management interface became sluggish only once, during the second attempt, most likely because of the already high swap usage caused by the TMSH tests.

TMSH

TMSH produced these once in a while:

  • When that happened, gaps appear in the graph. It is unknown what caused the gap after the graph, because we were working on the SNMP metrics at that time.
  • TMSH also failed to give results sometimes.
  • We were forced to run it with fewer metrics than REST in order to get a result at all.

SNMP

  • Sometimes truncated the pool names. It is unclear why; this always happened to long names, but at different lengths.
  • Did not produce any timeouts in the GUI.
  • Always produced results.
  • Did not have as many metrics as REST, since the exact same metrics were not available in one command (pool state and availability are missing).
  • The management interface became a bit sluggish on and off.

Conclusion

Overall, REST won the test, with SNMP in second place. TMSH did not even qualify, as it uses very large amounts of memory and swap, which negatively affected the overall system.

Thank you to Patrik Jonsson for contributing this article.

How to select script monitoring authentication types

Considerations when selecting authentication types

Choosing an authentication method for monitoring your infrastructure devices might sound easy at first glance. After all, a monitoring script would only need read-only access, right? Wrong.

Monitoring with indeni goes beyond what normal monitoring tools do. The goal of indeni is to detect problems before they occur, saving you hours of troubleshooting and root cause analysis down the road. To get early detection, indeni needs access to the advanced shell. Let’s take a look at what this means on F5 devices.

Example: Selecting authentication types for F5 devices

On an F5, having access to the advanced shell means that the user in question must have administrator access. Also, iControl REST requires the user to be locally authenticated up until version 11.5.4. This means that on systems running versions up to 11.5.4 that use RADIUS for authentication, administrators will have to resort to the local admin account for REST calls.

On top of that, if a system has configured authentication and authorization using RADIUS, there is no way of setting the shell to advanced shell on any version. So yet again, administrators must resort to the local admin account in order to set the proper permissions.

We have gone above and beyond to avoid using local admin accounts by investing a lot of time in running monitoring commands via TMSH. However, this turned out to harm the system because TMSH uses far too much memory. So what does this mean? In order to get the most out of indeni, administrators will have to configure authentication according to the following table:

Version            | Authentication | Authorization | User
11.5.4 and earlier | Any            | Any           | Local admin (with SSH access)
11.6.0 and later   | Remote         | Remote        | Local admin (with SSH access)
11.6.0 and later   | Local          | Local         | Any account with role Administrator and shell set to Advanced Shell
11.6.0 and later   | Remote         | Local         | Any account with role Administrator and shell set to Advanced Shell
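
The table above can also be expressed as a small lookup, which may be handy when checking many devices. This is only a sketch: `version_le` and `required_account` are hypothetical helper names, the version comparison assumes a `sort` that supports `-V` (GNU coreutils), and versions or combinations that the table does not list are reported as not covered rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that encode the table above; not an indeni or F5 tool.

# version_le A B: succeeds when version A <= version B (needs sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# required_account VERSION AUTHENTICATION AUTHORIZATION
# AUTHENTICATION and AUTHORIZATION are "local" or "remote".
required_account() {
    local version=$1 authn=$2 authz=$3
    local admin="Local admin (with SSH access)"
    local any="Any account with role Administrator and shell set to Advanced Shell"

    if version_le "$version" "11.5.4"; then
        # 11.5.4 and earlier: any authentication, any authorization.
        echo "$admin"
    elif version_le "11.6.0" "$version"; then
        case "$authn/$authz" in
            remote/remote) echo "$admin" ;;
            local/local|remote/local) echo "$any" ;;
            *) echo "combination not covered by the table" >&2; return 1 ;;
        esac
    else
        echo "version not covered by the table" >&2
        return 1
    fi
}
```

For example, `required_account 11.6.0 remote local` prints the last row's user requirement, while `required_account 11.5.4 remote remote` prints the local admin requirement.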

Thank you to Patrik Jonsson for contributing this article.

How to Reset Device Trust – F5 LTM Load Balancing Methods Troubleshooting ConfigSync and Device Clustering

F5 LTM Load Balancing Methods: How to Reset Device Trust.

The official F5 SOL13946 provides information on troubleshooting device clustering and configuration sync for version 11.x F5 load balancers and other products; however, it is rather long-winded. This guide is designed as a quick reference when troubleshooting device clustering or config sync. An overview of the config sync process for version 9.x and 10.x units can be found in F5 SOL7024.

Version 11.x

Communication between machines occurs in the following manner to form a device cluster:

  • The mcpd process on the local machine connects to the tmm process on the local machine on port 6699.
  • The tmm process then contacts the peer’s config sync IP on port 4353.
  • Once the peer receives the connection, it uses tmm to contact mcpd over port 6699 on its own local device.
  • If this process fails, it is re-attempted every 5 seconds.
  • If this process succeeds, there is a mesh between peer mcpd processes.

* local machine here refers to the self IP configured for config sync. Check it under Device Management > Devices > click on device > Device Connectivity > Config Sync, for example.
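
When the mesh does not form, a first check is often whether the peer's config sync address accepts connections on port 4353 at all. The sketch below assumes that the connection is TCP and that bash (for its `/dev/tcp` redirection) and the coreutils `timeout` command are available. `port_open` is a hypothetical helper name and the peer address in the usage note is a placeholder.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened within 3 seconds. Uses bash's /dev/tcp redirection, so no extra
# tools such as nc or telnet are required.
port_open() {
    local host=$1 port=$2
    # The inner shell receives host and port as $0 and $1.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

A call such as `port_open 192.0.2.10 4353 && echo reachable || echo unreachable` (with the peer's config sync IP in place of the placeholder) only shows that something is listening. It does not prove that device trust is healthy.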

v9.X To 10.2.4 Upgrade Guide: F5 LTM Load Balancing Methods

F5 LTM Load Balancing Methods: Guide on How to Upgrade to 10.2.4

Unfortunately there are still a lot of F5® version 9.x installations out there, and there shouldn’t be. Not only is version 9 a long way out of support, it’s really not secure in this day and age. You’re also missing out on iHealth®, which can only be used from version 10.x onwards.

Please note that you may need AskF5™ login credentials to access some of the links within this article. If you do not have an AskF5 login you can request one here.

This guide assumes an upgrade to version 10.2.4 unless otherwise stated, although the same process applies to earlier 10.x versions. Note that 10.0.x releases are no longer supported. 10.2.4 is the recommended version for this branch and is supported to 31st December 2016 (see AskF5 SOL5903).

Prerequisites:
Before considering upgrading, ensure your hardware platform will support 10.2.4 code. The following platforms support it, as confirmed by AskF5 SOL9412:

You can only upgrade directly to 10.2.4 code from 9.3.x or 9.4.x releases.  If you are running an earlier 9.x release, you must upgrade to 9.3.x or 9.4.x before continuing.

Before you begin

Assuming you meet the prerequisite requirements, consider the following:

  • YOU MUST re-activate the unit license before upgrading.  See AskF5 SOL7727 for reasoning.
  • Back up EVERYTHING: /config/bigip.conf, bigip_base.conf, bigip.license, bigip_sys.conf (if it exists) and wideip.conf (if it exists). Take an SCF* (recommended, but it only backs up the LTM configuration) and a UCS (note that a UCS also saves the admin/root passwords and the license, so take it after license reactivation). Save these off box. Also save the UCS file to /shared/ (you might need to create this directory) and to /var/local/ucs. An extra 10 minutes spent here can save agony later on! If you are running ASM, also take a backup of the policy.

*to take an SCF, use the command:

# bigpipe export <filename.scf>

for example:

# bigpipe export mybackup.scf
  • You should still read the release notes for the version you are installing.
  • If you have an HA pair, you will, of course, be working on your standby unit.
  • It is recommended to disconnect any unnecessary cables from a unit whilst upgrading (only leave connected the cables you need, such as the management cable). This will prevent any issues with the unit attempting to process traffic whilst upgrading.
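
The file backup from the list above can be bundled into a single archive before it is copied off box. This is a minimal sketch, not an F5-provided procedure: `backup_config` is a hypothetical helper, it only covers the individual configuration files named above (not the SCF or UCS), and it skips the optional files when they do not exist.

```shell
#!/usr/bin/env bash
# backup_config SRC_DIR ARCHIVE
# Hypothetical helper: archives whichever of the listed configuration
# files exist in SRC_DIR into the gzip-compressed tar file ARCHIVE.
backup_config() {
    local src=$1 archive=$2 f
    local wanted=(bigip.conf bigip_base.conf bigip.license bigip_sys.conf wideip.conf)
    local present=()
    for f in "${wanted[@]}"; do
        # bigip_sys.conf and wideip.conf are optional, so skip missing files.
        if [ -f "$src/$f" ]; then
            present[${#present[@]}]=$f
        fi
    done
    if [ "${#present[@]}" -eq 0 ]; then
        echo "no config files found in $src" >&2
        return 1
    fi
    tar -czf "$archive" -C "$src" "${present[@]}"
}
```

On the unit this might be called as `backup_config /config /shared/config-backup.tar.gz` (the archive name is an assumption), after which the archive is copied off box with SCP together with the SCF and UCS files.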

Let’s upgrade

1)    Download the 10.2.4 ISO and copy* to /shared/images (create this folder if it does not already exist).
*SCP is the recommended method to copy files to/from the BIG-IP system. For a Windows client, check out WinSCP.

2)    Install the image2disk utility:

# im /shared/images/<filename.iso>

for example:

# im /shared/images/BIGIP-10.2.4.577.0.iso

3)    Begin the install. The full command syntax is

# image2disk --instslot=HD<slot_number> --format=<format_style> <downloaded_filename.iso>

BUT WAIT! Observe the following options first:

  • Assuming you only want version 10.x and above from now on (highly recommended!), we need to format for volumes, so use:
# image2disk --instslot=<slot_number> --format=volumes /shared/images/<filename.iso>
  • Assuming we wish to run version 10 and version 9 (not recommended, version 9.x is not supported), use:
# image2disk --instslot=<slot_number> --format=partitions /shared/images/<filename.iso>

Note that since we repartition, we will need to reinstall version 9 to a free partition after the version 10 installation completes.

  • To preserve your existing configuration, use
# image2disk --instslot=<slot_number> --nomoveconfig --format=volumes /shared/images/<filename.iso>

DO NOT use the ‘--nomoveconfig’ flag if you are running ASM. Use a backup UCS instead. Using ‘--nomoveconfig’ means: “The configuration of the target boot location (if any is present) is re-installed on the target boot location after re-imaging is complete.”

The ‘slot_number’ variable will be HD1.1 or HD1.2 for example (the inactive partition, NOT the one you are currently running in*)
* If you are not sure which partition you are currently running on, use 

# switchboot -l
  • If installing 10.2.x, the flag “--instslot=<slot_number>” is NOT permitted and should be omitted from the install command. Reason: 10.2.x will always install to HD1.1 after the disk is formatted.
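
Because the options above interact, it can help to assemble the command line in one place and review it before running anything. The sketch below only prints a command and never executes image2disk. `build_image2disk_cmd` is a hypothetical helper that uses just the flags described above, and it does not enforce the ASM warning about the nomoveconfig flag.

```shell
#!/usr/bin/env bash
# build_image2disk_cmd TARGET_VERSION FORMAT ISO [SLOT] [nomoveconfig]
# Hypothetical helper: prints an image2disk command line for review.
# It does not run image2disk.
build_image2disk_cmd() {
    local target=$1 format=$2 iso=$3 slot=$4 keep=$5
    local cmd="image2disk"
    case "$target" in
        10.2.*)
            # --instslot is not permitted when installing 10.2.x.
            ;;
        *)
            if [ -n "$slot" ]; then
                cmd="$cmd --instslot=$slot"
            fi
            ;;
    esac
    if [ "$keep" = "nomoveconfig" ]; then
        cmd="$cmd --nomoveconfig"
    fi
    echo "$cmd --format=$format $iso"
}
```

For a 10.2.4 install, `build_image2disk_cmd 10.2.4 volumes /shared/images/BIGIP-10.2.4.577.0.iso HD1.2` drops the slot and prints `image2disk --format=volumes /shared/images/BIGIP-10.2.4.577.0.iso`.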

4)    The unit should reboot and install the new software.

5)    Once finished, you may need to do

# switchboot

and select the newly upgraded partition, then

# full_box_reboot

This depends on the specific upgrade version so may not be applicable. Install hotfixes at this point if necessary. It’s always recommended to be running the latest available hotfix.

6)    Wait for the filesystem to build and the installation to finish. This first boot after upgrade can take a while!

7)    Once the unit is up and running, load any necessary config files.  These can either be a UCS (User Configuration Set, aka ‘Archive’) uploaded via System > Archives, or individual bigip.conf, bigip_base.conf, bigip.license (etc) files, which should be uploaded to the /config directory using SCP.  If you upload the individual configuration files, you’ll need to reboot again.  Check that the configuration is present and correct via the GUI.  Cable the unit back up and you should be good to go.  If you are running an HA pair, you should fail over during a change window and check that the newly upgraded unit processes traffic correctly.  Assuming it does, repeat the upgrade process on your second unit.

8)    Beginning with version 10.0.0 of the software, a redundant system configuration must contain failover peer management addresses for each unit. If you roll forward a redundant system configuration from 9.3.x or 9.4.x, the units start up correctly, but the system logs a message every ten minutes reminding you to configure the peer management addresses. To configure the failover peer management addresses, navigate to System > High Availability > Network Failover, then specify the management IP address of the peer unit in the Peer Management Address field. Then do the same on the other unit in the configuration. Once you specify both IP addresses, the system should operate as expected. For more information, see AskF5 SOL9947.

You’re all done!

Chris Spillane is a Senior Security Analyst at NTT Com Security. He has been working with F5 Load Balancers for more than seven years. If you want to contribute as well, click here.

Find more tips on how to manage your F5 load balancers with indeni.

F5 bigd process down

This is a real life sample alert from indeni

Description:

The F5 bigd process is down and has not restarted. Among its responsibilities, bigd runs the monitors for nodes, pool members and services. For more information, read SOL6967.

Manual Remediation Steps:

Review the logs to identify why the bigd process is down. indeni will attempt to determine the source of the issue automatically as well.

How does this alert work?

indeni tracks the status of all of the critical operating-system level processes and alerts if any of them crashes or shuts down unexpectedly.
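
How indeni implements this check is not described here. As a rough illustration of the idea only, the following sketch tests whether a process with a given name exists. `process_running` is a hypothetical helper, and it assumes a Linux system that exposes `/proc/<pid>/comm`.

```shell
#!/usr/bin/env bash
# process_running NAME
# Hypothetical helper: succeeds if a process whose command name is exactly
# NAME exists. Reads /proc directly, so it only works on Linux systems
# that provide /proc/<pid>/comm.
process_running() {
    local name=$1 f
    for f in /proc/[0-9]*/comm; do
        # A process may exit while we iterate, so read errors are ignored.
        if [ "$(cat "$f" 2>/dev/null)" = "$name" ]; then
            return 0
        fi
    done
    return 1
}
```

A periodic check could then be as simple as `process_running bigd || echo "bigd is down"`. Telling a crash apart from a planned shutdown would require more context than this sketch has.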

Configuration Management Tool Comparison: Multi-Vendor Deep Configuration Analysis: Cisco-Focused

NetMRI was originally developed and sold by Netcordia, founded in 2000 by the world’s first Cisco Certified Internetwork Expert (CCIE). It was created to help Cisco admins solve configuration issues in their network equipment by defining certain checks and ensuring that all switching and routing devices conform to the desired configuration.

In-depth configuration analysis for Cisco with NetMRI

NetMRI is a fantastic configuration management tool for Cisco admins – it’s got incredible visibility into Cisco configurations and the ability to dissect and analyze some of the most complicated setups of Cisco routers and switches. However, it falls quite short for other network devices, especially the non-Cisco ones and those for layers 4 and up. This includes Check Point, Fortinet, Juniper and Palo Alto Networks firewalls as well as F5 load balancers. For these, NetMRI supports pulling the configurations (see DSB list) but comes with no built-in configuration checks. As a user, you are required to teach NetMRI how to understand the configuration of these devices and what to look for. Quite a tedious task.

For example, consider the release notes for NetMRI 6.9.1 (released in 2014). The new features focus solely on switching and routing equipment. The same is true for other recent releases of NetMRI, such as 6.8.1. This is caused by Infoblox’s focus on DDI (DNS, DHCP and IP Address Management), which is the company’s core business. DDI is tightly integrated with switching and routing, hence NetMRI’s focus on those devices, as shown in a demo video of NetMRI.

Therefore, users who run a Cisco shop should consider investing in NetMRI and using that as their go-to tool for analyzing the configuration of their routers and switches. Even those running a mixed environment with a heavy investment in Cisco routing and switching gear should consider using NetMRI to automate their IP address management and routing.

Users who require deep visibility into their Cisco AND non-Cisco devices, specifically identifying common misconfigurations as well as pointing out which devices are not compliant with the organization’s gold configuration, should take a look at indeni. With indeni you will be able to identify known configuration issues in your Check Point, F5, Fortinet, Juniper and Palo Alto Networks gear, as well as your Cisco equipment.

How To Find Out When Your SSL Certificate Expires on F5 BIG-IP DNS

Do you know when the SSL certificate expires on your F5 Load balancers?

Every single deployment of LTM® we’ve encountered has SSL termination included in it. Think about it – it makes sense, it’s one of the strongest advantages of the F5 hardware.

However, every single deployment we’ve encountered also had SSL certificates configured that have expired or were expiring in the next three months. Apparently, staying on top of your SSL certs isn’t as straightforward as you’d want it to be.

So, we thought we’d put in the effort to summarize in a short post how to get notified, ahead of time, when SSL certificates expire on your F5 BIG-IP DNS LTM:

  • Buy Enterprise Manager – it has a built-in feature for doing this.
  • Get BIG-IQ – it can be done there, too.
  • Write a script – read DevCentral and SOL15288.
  • Run indeni – you can get a limited license free and easy by going here. Within 45 minutes you can easily know which SSL certs need refresh, as well as hundreds of other possible issues lurking in your F5 configuration. You can even run it every 6 months or so, to make sure you’re in top shape.
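
For the "write a script" option, the core of such a script is turning a certificate's expiry date into a number of days. The sketch below is one possible approach and not the script from DevCentral or SOL15288. It assumes GNU `date` (for the `-d` option) and the `openssl` command line tool, and `days_until` and `cert_days_left` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# days_until NOT_AFTER [NOW_EPOCH]
# NOT_AFTER is a date as printed by "openssl x509 -enddate", for example
# "Nov 30 12:00:00 2016 GMT". NOW_EPOCH defaults to the current time.
# Prints the whole number of days left (negative once expired).
days_until() {
    local end_epoch now_epoch
    end_epoch=$(date -u -d "$1" +%s) || return 1
    now_epoch=${2:-$(date -u +%s)}
    echo $(( (end_epoch - now_epoch) / 86400 ))
}

# cert_days_left CERT_FILE
# Prints the days left before the PEM certificate in CERT_FILE expires.
cert_days_left() {
    local not_after
    not_after=$(openssl x509 -enddate -noout -in "$1" 2>/dev/null | cut -d= -f2)
    # An empty result means openssl could not read the certificate.
    [ -n "$not_after" ] || return 1
    days_until "$not_after"
}
```

A wrapper could loop over the certificate files on the unit (their location depends on the software version, so it is left as an argument here) and report every certificate for which `cert_days_left` returns fewer than, say, 90 days.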

For your information, this is what the alert would look like in indeni:

Description:

Some SSL certificates are about to expire or have expired.

Certificates expired or about to expire:

www.yoursite.com expires on November 30, 2016

Manual Remediation Steps:

Replace the SSL certificates with new ones.

For more information on how to manage certificates, refer to Managing SSL Certificates for Local Traffic in the F5 user guide.

How does this alert work?

indeni retrieves the SSL certificates configured on an F5 BIG-IP DNS device and analyzes them: checking their expiration date, their validity (are they self-signed or signed by an internal CA?), etc.

F5 Too many RST packets sent

This is a real life sample alert from indeni from our F5 Load Balancing Methods Library

Description:

This device is being hit with too many connections that appear to have already been closed or never opened. It is possible the device is under DDoS attack. indeni has found this log message:
May 18 12:49:43 JCNC-ADC1 warning tmm1[11241]: 011e0001:4: Limiting open port RST response from 251 to 250 packets/sec

Manual Remediation Steps:

Review SOL13151 and investigate the cause of this sudden increase in unexpected connections.

How does this alert work?

indeni cross-references information from the log files with the SOLs listed on f5.com to identify when certain log entries should receive attention.
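
The log side of such a check can be approximated by searching for the message ID shown in the sample above. This sketch is an illustration and not indeni's implementation. `count_rst_limit` is a hypothetical helper, and the log file path is passed in rather than assumed.

```shell
#!/usr/bin/env bash
# count_rst_limit LOG_FILE
# Hypothetical helper: prints how many lines in LOG_FILE carry message ID
# 011e0001 ("Limiting open port RST response"), as in the sample alert.
count_rst_limit() {
    # grep -c prints 0 and returns non-zero when nothing matches.
    grep -c '011e0001:' "$1"
}
```

On a BIG-IP this would typically be pointed at the LTM log, for example `count_rst_limit /var/log/ltm`. A count above zero is a prompt to read SOL13151 and is not proof of an attack.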

F5 IPSec Tunnel Causing Traffic Issues

This is a real life sample alert from indeni for F5 Load Balancing Methods

Description:

Some of the F5 IPsec tunnels have multiple security associations negotiated for them. This may result in traffic issues.

Affected Tunnels:

Tunnel to 165.160.15.20

Manual Remediation Steps:

Review SOL14646.

How does this alert work?

indeni uses the various “show /net ipsec” commands to track the IPsec tunnels.

F5 SIP might have issues in the current software version

This is a real life sample alert from indeni

Description:

indeni has determined that a SIP profile is being used with version 11.1.0 on this device.

Manual Remediation Steps:

Review SOL16411 and consider upgrading.

How does this alert work?

indeni analyzes the configuration to see if SIP profiles are used with the specific software versions that are affected by this issue.