How to monitor F5 devices – SNMP vs API vs SSH

F5 provides several ways of interfacing with its products, and when writing our monitoring we had to research which one is most suitable in terms of performance. After all, monitoring should not harm the device it monitors. When choosing a method we looked into iControl REST, SNMP and TMSH. See below for how the test was conducted and which one won.

The best way to monitor F5 – How the test was conducted

We ran each method continuously for about 20 minutes through command-runner. While the tests were running, we used the web interface to verify that its responsiveness remained acceptable.

The commands to run each test

#REST
while true; do
command-runner.sh full-command --basic-authentication user,password rest-pool-statistics.ind 10.10.10.10
done
#tmsh
while true; do
command-runner.sh full-command --ssh user,password ./show-ltm-pool-detail-raw-recursive.ind 10.10.10.10
done
#SNMP
while true; do
command-runner.sh full-command --ssh user,password ./snmp-pool-statistics.ind 10.10.10.10
done
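
For reference, the three .ind scripts above roughly correspond to the following raw queries against the BIG-IP. This is only an orientation sketch: the exact endpoints, TMSH options, MIB table and credentials shown here are our assumptions, not the contents of the scripts.

# iControl REST: pool statistics over HTTPS with basic authentication
curl -sku user:password https://10.10.10.10/mgmt/tm/ltm/pool/stats

# TMSH over SSH: detailed pool statistics across partitions
ssh user@10.10.10.10 'tmsh show ltm pool detail recursive'

# SNMP: walk the pool statistics table from the F5-BIGIP-LOCAL-MIB
snmpwalk -v2c -c public 10.10.10.10 F5-BIGIP-LOCAL-MIB::ltmPoolStatTable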

Results

The test started out with 283 pools (200 of which were created just for this test). However, when trying the TMSH command, command-runner timed out, so we had to reduce to the original 83 pools and rerun the REST test to keep the comparison fair.

  • Test 1: REST = 283 pools
  • Test 2: TMSH = 83 pools
  • Test 3: SNMP = 83 pools
  • Test 4: REST (take 2) = 83 pools

4 hour graph

24 hour graph for reference

REST

  • Did not produce any timeouts in the GUI in either of the two tests.
  • Always produced results.
  • The management interface only became sluggish once, during the second attempt, most likely because of the already high swap usage created by the TMSH tests.

TMSH

TMSH intermittently produced timeouts:

  • When that happened you can see the gaps in the graph. It is unknown what caused the gap at the end of the graph, as we were working on the SNMP metrics at that time.
  • TMSH also sometimes failed to return results.
  • It was forced to run with fewer metrics than REST in order to return a result at all.

SNMP

  • Sometimes truncated the pool names. It is unclear why; truncation always happened on long names, but at different lengths.
  • Did not produce any timeouts in the GUI.
  • Always produced results.
  • Did not provide as many metrics as REST, since the exact same metrics were not available in a single query (pool state and availability are missing).
  • The management interface became a bit sluggish on and off.

Conclusion

Overall, REST won the test, with SNMP second. TMSH did not even qualify, as it consumed very large amounts of memory and swap, which negatively affected the overall system.

Interested in learning more? Download the indeni technical whitepaper.

Thank you to Patrik Jonsson for contributing this article.

How to select script monitoring authentication types

Considerations when selecting authentication types

Choosing an authentication method for monitoring your infrastructure devices might sound easy at first glance. After all, a monitoring script would only need read-only access, right? Wrong.

Monitoring with indeni goes beyond what normal monitoring tools do. The goal of indeni is to detect problems before they occur, saving you hours of troubleshooting and root cause analysis down the road. To get early detection, indeni needs access to the advanced shell. Let’s take a look at what this means on F5 devices.

Example: Selecting authentication types for F5 devices

On an F5, having access to the advanced shell means that the user in question must have administrator access. Also, iControl REST requires the user to be locally authenticated up until version 11.5.4. This means that on systems running versions up to 11.5.4 that use RADIUS for authentication, administrators will have to resort to the local admin account for REST calls.

On top of that, if a system has configured authentication and authorization using RADIUS, there is no way of setting the shell to Advanced Shell on any version. So yet again, administrators must resort to the local admin account in order to set the proper permissions.

We have gone above and beyond to avoid using local admin accounts by investing a lot of time running monitoring commands via TMSH. However, this has turned out to harm the system due to TMSH using far too much memory. So what does this mean? In order to get the most out of indeni, administrators will have to configure authentication according to the following table:

Version            | Authentication | Authorization | User
11.5.4 and earlier | Any            | Any           | Local admin (with SSH access)
11.6.0 and later   | Remote         | Remote        | Local admin (with SSH access)
11.6.0 and later   | Local          | Local         | Any account with role Administrator and shell set to Advanced Shell
11.6.0 and later   | Remote         | Local         | Any account with role Administrator and shell set to Advanced Shell
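
A quick way to check whether a given account is accepted by iControl REST at all is a minimal read-only request with basic authentication. This is only a sketch; the hostname and credentials below are placeholders:

# Returns the software version as JSON if the account can use iControl REST;
# a 401 typically means the account cannot authenticate to REST (see the table above)
curl -sku 'monitoring-user:password' https://bigip.example.com/mgmt/tm/sys/version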

Interested in learning more about indeni? Download the indeni technical whitepaper.

Thank you to Patrik Jonsson for contributing this article.

Machine learning for logs: cut through the hype

 

Splunk recently announced new machine learning capabilities in its Splunk Cloud and Splunk Enterprise 6.5 release. Does everyone have machine learning capabilities now? What exactly is machine learning? See below for key considerations for this technology approach and how indeni’s machine learning differs from Splunk’s.

3 things IT needs to know about machine learning


  • Machine learning algorithms have been around for decades. Most of them, especially those that are mathematically based, are not new. For example, Arthur Samuel coined the term “machine learning” in 1959!
  • Machine learning works best with large sets of data. You need a substantial amount of information to determine trends, correlations, etc. Take the example of the NVIDIA self-driving car that was shown at CES this year. Only after 3000 miles of driving on highways, back roads and suburban roads was the car able to stop running over traffic cones and avoid parked cars.
  • If not constrained, machine learning will have a very high false positive rate. To continue the analogy from above, say you are monitoring multiple types of automobiles. Comparing the device data of a semi-truck to a Tesla would be interesting, but not actionable. Say one of your rules was to alert if the engine noise exceeded 100 decibels, as you believe this level of noise indicates there is an issue with the engine. A semi-truck would generate an alert every time it turned on, whereas a Tesla would hardly say a peep. Giving your machine learning constraints (e.g. comparing Tesla data only with other Teslas) yields far more accurate results.

Moral of the story: if a vendor pitches you on “machine learning,” it’s OK to be optimistic, but be cautious. Here are some questions you can ask to see if the machine learning will make your team more productive:

  • How does the vendor help the algorithm focus on the important elements? How do they help their technology understand the data to reach the right conclusions?
  • How do they avoid a high rate of false positives? For example, if their machine learning algorithms find “an anomaly” what are the chances it’s a true positive?
  • How does the vendor make its alerts or findings actionable?

4 ways indeni machine learning differs from Splunk


Now that we are on the same page for machine learning, here are four ways that indeni differs from SIEM and Log Management solutions such as Splunk.

#1 indeni ingests configuration data in addition to statistics and logs of devices.

Collecting greater depths of information on devices and the software running on them allows indeni to identify issues with greater accuracy.

#2 indeni has the largest database of device knowledge.

indeni has a growing repository of known infrastructure issues and resolution steps for the largest Enterprises. This information is gathered from our customers, indeni engineers and third party experts around the globe. How does this help on a daily basis?

  • Root cause analysis: Instead of coming up with a hypothesis and then building a query in Splunk so that you can schedule alerts when the same log or event occurs, indeni has the knowledge built into its core alerting system, no scripting or queries required.
  • Troubleshooting: When you receive an alert in indeni, in addition to telling you the affected device or error code, indeni provides a human readable description, the implication of not addressing the event and steps to resolution, helping network and security operations teams prioritize focus areas and shorten the mean time to resolution.

#3 indeni connects admins and engineers across the globe

In addition to applying machine learning to the data in your environment, indeni learns from other indeni customers and applies those learnings to your indeni instance. Our users subscribe to a service called “indeni Insight,” which sends data from their environment to our central repository. The data is sanitized and contains general device characteristics and behavior information. For example what model the device is, what software is running on it, which features are enabled, the status of licenses or contracts, running metrics (CPU, memory, etc.), system logs, active users and much more. The result for administrators and engineers? It’s like leveraging the expertise of thousands of your IT operations friends at Fortune 500 companies.

#4 indeni’s algorithmic model is based on the assumption that 99.9% of the time devices are working as expected.

Based on our experience as network and security professionals, we know a device malfunctions only 0.1% of the time. In addition, it is widely documented that 70% of network outages occur due to device misconfigurations. These two constraints are built into our machine learning algorithm which allow us to reduce false positives, saving our customers time and money.

At a glance: indeni vs. Splunk

Similarities

  • indeni and Splunk ingest data from a variety of devices
  • indeni and Splunk machine learning are based on algorithmic models
  • indeni and Splunk machine learning correlate data

Differences

  • indeni ingests configuration data in addition to statistics and logs of network devices
  • indeni has the largest database of device knowledge
  • indeni connects admins and engineers with each other
  • indeni’s algorithmic model is based on the assumption that 99.9% of the time devices are working as expected. We help you find the .1%

Conclusion


indeni is capable of identifying specific issues, which pertain to specific types of products and even specific software builds, at a level of accuracy and actionability never seen before. With indeni customers can find health and operational issues before they happen in their infrastructure, proactively handle them and have a better life. Interested in trying indeni in your environment? Contact us or engage with one of our registered partners.

How to Mitigate Vulnerabilities from SWEET32 in F5 Load Balancers

The SWEET32 vulnerability targets long-lived SSL sessions that use Triple DES in CBC mode. The attack targets the cipher itself, and thus there is and will be no hotfix for it. The only way to mitigate it is to either disable the 3DES-CBC ciphers or set a limit on the renegotiation size.

This guide will cover how to configure both in the load balancer, and also how to protect your management interface (only possible by changing the cipher string).

For the official F5 article, please refer to this link: https://support.f5.com/csp/#/article/K13167034

Configuring an SSL Renegotiation size limit

The SSL renegotiation options determine whether the BIG-IP will allow the client to renegotiate mid-stream or not. The Renegotiation Size option sets a limit, in megabytes, on how much data is allowed to pass through a connection before SSL renegotiation is disabled (on that specific connection). Since the attack targets long-lived sessions (or rather, a high amount of SSL data), this mitigates the attack. F5 recommends setting the limit to 1000 MB.

Possible impact of setting a renegotiation size limit

Clients with renegotiation support will need to be able to handle a failed SSL renegotiation and instead establish a new connection, completing the full SSL handshake. This should not pose a problem for the vast majority of applications.

SSL session settings are regulated by SSL profiles. While you can change the parent profiles, we do not recommend doing so without proper testing.

Procedure to set an SSL Renegotiation size limit via the GUI

  1. Log in to the F5 administration interface
  2. Navigate to Local Traffic -> Profiles -> SSL -> Client
  3. Click on the profile you wish to set the limit on
  4. In the settings, locate Renegotiation Size
  5. If it's disabled, click on the custom option checkbox to allow changes
  6. Change the value from Indefinite to Specify, then enter 1000
  7. Click on "Update"

Procedure to set an SSL Renegotiation size limit via TMSH

  1. Connect to your load balancer via an SSH client
  2. If not already in tmsh, enter it
  3. To modify the size limit of a profile, issue this command, replacing the profile name with the full path of the SSL profile you wish to modify:

modify /ltm profile client-ssl <profile_name> renegotiate-size 1000

Example:

modify /ltm profile client-ssl /Common/example.com renegotiate-size 1000

  4. Commit the configuration to disk by running the command "save sys config"
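
To verify that the change took effect, you can list the property back; it should now show 1000. This is a sketch; on some versions you may need to list the full profile rather than a single property:

list /ltm profile client-ssl /Common/example.com renegotiate-size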

 

Disabling 3DES-CBC ciphers in SSL Client profiles and the management interface

This method will remove the affected ciphers from the list of applicable ciphers during SSL connection negotiations. SSL cipher settings are regulated by SSL profiles. While you can change the parent profiles, we do not recommend doing so without proper testing.

 

Possible impact of disabling 3DES-CBC ciphers

While most modern browsers support a wide array of ciphers, this will still reduce the cipher support of the virtual servers using the modified SSL profile. Depending on your current cipher string, this may or may not pose a risk of clients and the virtual servers being unable to agree on a cipher.

Please note that the following guide is just meant to show how to disable these specific ciphers and uses the system default cipher list to do so. Using the default list is not recommended. To determine which cipher list you should use, please read up on ciphers.

 

This guide is a really good start:

https://f5.com/Portals/1/Premium/Architectures/RA-SSL-Everywhere-deployment-guide.pdf

Procedure to set an SSL cipher string via the GUI

  1. Log in to the F5 administration interface
  2. Navigate to Local Traffic -> Profiles -> SSL -> Client
  3. Click on the profile you wish to alter the cipher string on
  4. In the settings, locate Ciphers
  5. If it's disabled, click on the custom option checkbox to allow changes
  6. Change the value from the current value to <CURRENTLIST>:!DHE-RSA-DES-CBC3-SHA:!DES-CBC3-SHA:!ECDHE-RSA-DES-CBC3-SHA
  7. Click on "Update"

Procedure to set an SSL cipher string via TMSH

 

  1. Connect to your load balancer via an SSH client
  2. If not already in tmsh, enter it
  3. Show the current cipher string with the following command:

list /ltm profile client-ssl [profile name] ciphers

Example:

list ltm profile client-ssl /Common/example.com ciphers

ltm profile client-ssl Testpartition/testthemciphers {
    ciphers DEFAULT
}

  4. Note your current cipher string. In the example above it was "DEFAULT"
  5. Modify the cipher string with the following command:

modify ltm profile client-ssl [profile name] ciphers <CURRENTLIST>:!DHE-RSA-DES-CBC3-SHA:!DES-CBC3-SHA:!ECDHE-RSA-DES-CBC3-SHA

Example:

modify ltm profile client-ssl /Common/example.com ciphers DEFAULT:!DHE-RSA-DES-CBC3-SHA:!DES-CBC3-SHA:!ECDHE-RSA-DES-CBC3-SHA

  6. Commit the configuration to disk by running the command "save sys config"
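
Before or after committing, it can be useful to preview which ciphers the new string actually resolves to, so you can confirm that no 3DES/CBC3 entries remain. On BIG-IP this can be done from the advanced shell with the tmm utility; treat the exact invocation as an assumption, as it may vary between versions:

tmm --clientciphers 'DEFAULT:!DHE-RSA-DES-CBC3-SHA:!DES-CBC3-SHA:!ECDHE-RSA-DES-CBC3-SHA'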

Did you find this article informative? Imagine what other vulnerabilities indeni’s powerful machine learning technology can help mitigate. Click the link below!


Free Trial

Predictive Analytics and the Future of IT

In this world of infinite connectivity we are using data more and more to make sense of our environments. One such technology being incorporated into businesses is “Predictive Analytics”. We are already using mathematical formulas to predict certain events related to the stock market, weather, etc. With the processing power and technology available today, these algorithms have developed a fair degree of accuracy. Which leads to my next question: “Why not use ‘Predictive Analytics’ to predict IT systems and network failure?” How about being able to anticipate network failures days before they actually happen? If you are managing a complex IT set-up, you will want to get your hands on this technology.

What is Predictive Analytics for IT?

Predictive Analytics is a branch of data mining that uses mathematical algorithms, such as regression modeling techniques, to describe the relationships between the various variables that contribute to the functioning of a system. Through machine learning, these algorithms assess the behavior of the variables under normal circumstances and monitor their behavior continuously to find out if there are significant abnormalities. They can be set to watch for certain behavior patterns that precede major trouble-causing scenarios.

For example, predictive analytics can assign a risk score (probability) to each individual case. Institutions like insurance companies use predictive analytics to find out the relationship between various variables and the risks involved. They evaluate whether candidates with a certain age, marital status, credit history, employment profile, etc. are more prone to risky behavior than others, and then decide whether or not to issue policies. Can this technology be used in IT systems?

“Monitoring” IT systems is still done the old-fashioned way – in silos

Various monitoring systems are in place for organizations today:

  • Network monitoring software
  • Virtualization monitoring modules
  • Servers monitoring software
  • Databases/ Applications monitoring software
  • Storage systems monitoring software

If you work in a large, complex organization you need to continuously monitor all the above management modules individually. The biggest issue with this model is that as IT systems and networks grow in complexity, the possibility of human error increases and failures are only reported after they happen. The majority of IT professionals only discover issues after the help desk starts getting calls from angry users that something is not functioning. Worse, if your business is B2C, you could have angry customers showing their displeasure via social media and other channels.

Of course, redundancy can be set up and the system monitored for irregularities, yet these alerts are either ignored, or a network outage occurs due to a totally different parameter that was overlooked, or due to incorrect threshold settings. IT pros can easily be overwhelmed monitoring too many parameters.

 

How does Predictive Analytics help forecast issues before downtime occurs in IT systems?

When applied to an IT operations scenario, a predictive analytics system can go more in depth than existing monitoring tools to collect data about all the possible variables being monitored, such as cluster configurations, CPU utilization, log flows, and packet drop activity. Based on this, algorithms automatically determine the normal operating behavior of these variables and continuously analyze live data 24/7/365 to determine whether any of these variables deviate significantly from their normal behavior in a pattern that might indicate performance problems in the near future.

Predictive analytics accumulates as much data as possible from various sources and uses mathematical algorithms to understand the relationship between the variables in the current state. Based on this information, it can forecast what is likely to happen next, including any potential trouble causing situations. This way it tries to identify network downtime/ IT systems malfunction days before they actually occur.

The main advantage with predictive analytics is none of this data needs to be manually entered, nor is there a requirement to set manual thresholds.  Predictive Analytics systems claim to do this automatically.

Of course, the system needs to integrate with the current monitoring tools running in the organization. One way a predictive analytics system can be tested is by feeding it actual values of the variables (from a certain period in the past) and checking whether it is able to predict major faults that actually happened. This can, to an extent, indicate how well a predictive analytics system will integrate within a particular environment.

Predictive Analytics can also help to forecast IT systems capacity. For example, it can predict the number of servers needed for a cloud based data center/ large organizations based on the past/ present trends of application utilization.

Of course, Predictive Analytics can never be 100% accurate and tends to have some degree of false positives. But for companies with large data centers and geographically dispersed campuses, where even a small amount of IT systems downtime can cause huge financial or reputational losses, this technology might be worth a try. There is at least one company involved in developing Predictive Analytics for IT and network systems.

indeni Insight
indeni is an intelligent assistant that manages your network 24/7/365





Free Trial




How are Customers Using Check Point Firewalls Around the Globe

 

 

In order to conduct the in-depth analysis of configuration and stats on network devices we collect very large amounts of data. For our customers, this data is very useful in benchmarking their network versus other networks around the world. We call this service indeni Insight.

Below is an aggregation of some of the data we’ve collected through this service. We are providing it to help the wider community consider how their network behaves as well as their future plans.

If you are interested in benchmarking your own network within an hour’s work, try indeni today. Once the system is set up reach out to support@indeni.com and we’ll do everything else.

indeni is like a crystal ball for your Check Point Firewalls

Ready to get started?

Create a healthy network in minutes. Click here to begin your 15 day trial for Check Point Firewalls.


Free Trial


Announcing indeni 5.4: New rule engine, Check Point 61000/41000 support

Welcome 5.4!

In this release we’ve included phase one of our infrastructure operations platform and added new content, as well as Check Point 41k/61k support. In addition, specific feature requests and bugfixes were included. Please reach out to our support team to get the updated release.

IMPORTANT NOTE TO ALL USERS: Starting with 5.4, the licensing mechanism is attached to the indeni instance’s unique identifier (uid) and not the IP address. This allows customers to not only change the IP of their indeni instance, but also set up cold active/standby high-availability in case the primary indeni instance is down or is cut off from the network. To set up cold active/standby, please reach out to our support team.

New content: Continue reading

Announcing the future of infrastructure health

Today I’m excited to announce our platform for infrastructure health. Before I go into what we’ve just done, let me explain why.

What’s the current status of infrastructure health?

What exactly is broken in infrastructure operations? Why are enterprises around the world still grappling with downtime?

Our research, as well as that of others, points to the human element. Over 70% of all outages are caused by human error. This is baffling – the people responsible for running the infrastructure are some of the smartest people out there. I meet them regularly, they know their job well. Many of them have a decade or more of experience in what they do. Still, mistakes occur. Why is that?

Continue reading

Why do infrastructure operations still suck?

Last Friday, I met with an individual that leads a 300-person team, responsible for running the networking and computing infrastructure in 50 data centers around the globe. I asked him what he thought of his OSS stack – the set of tools his team uses to stay on top of what’s going on in their infrastructure.

He hates it.

As I want to keep this blog post PG-rated, I’ll refrain from using his adjectives, but I can tell you he’s not happy with it. It’s a cobbled-together mix of open source and commercial tools. The tools required a lot of customization and a variety of extensions written over the years. At the end of the day, though, it only gives him up/down monitoring and no ability to proactively avoid the next outage. Over 70% of outages occur due to human error and misconfigurations, and the tools available to him are incapable of identifying even one percent of that.

Continue reading

What We’ve Learned From Speaking With Our Customers

A month ago I shared some of our plans for 2016 and mentioned that I’d be speaking with our customers, asking them a few questions. The survey was very successful in my opinion – I spoke with dozens of customers for 30 minutes each and asked them 14 different questions. I would like to thank all of the survey participants for enduring my questions and sharing their honest feedback.

Continue reading