High inode usage, storage may malfunction soon

This is a real life sample alert from indeni. The alert applies to Check Point firewalls, F5 load balancers, and any other Unix-based device.

Description:

Some file systems have reached a high usage of inodes. To learn more about inodes, please read the Wikipedia entry. SOL12263 also has a reference to this (for F5©).

Affected File Systems:

/dev/mapper/vg--db--sda-set.1.root

73892 inodes used out of 75776.

Top directories using inodes are:
/var/tmp

Manual Remediation Steps:

Look into the directories listed above and consider removing some of the files.

How does this alert work?

indeni analyzes the output of df -i. Once a file system crosses the 80% inode usage threshold, indeni automatically uses the “find” command to locate the directories holding the most entries (for i in `find / -type d`; do echo `ls -a $i | wc -l` $i; done | sort -n, as shown here).
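The threshold check described above can be illustrated with a minimal Python sketch. It is not indeni's actual implementation: the function name parse_df_inodes, the sample device name, and the assumption that each file system occupies a single line of df -i output are all hypothetical. The inode counts reuse the numbers from the sample alert.

```python
def parse_df_inodes(df_output, threshold=0.80):
    """Return (filesystem, used, total, mount) for file systems at or above the threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete df row
        filesystem, total, used, mount = fields[0], fields[1], fields[2], fields[5]
        if not (total.isdigit() and used.isdigit()) or int(total) == 0:
            continue  # pseudo file systems may report "-" or 0 inodes
        if int(used) / int(total) >= threshold:
            flagged.append((filesystem, int(used), int(total), mount))
    return flagged


# Hypothetical sample of `df -i` output; the device name is made up.
SAMPLE = """\
Filesystem                Inodes  IUsed  IFree IUse% Mounted on
/dev/mapper/example-root   75776  73892   1884   98% /
tmpfs                     127000     10 126990    1% /dev/shm
"""

for fs, used, total, mount in parse_df_inodes(SAMPLE):
    print(f"{fs} ({mount}): {used} inodes used out of {total}")
```

Only the first file system is reported, since 73892 of 75776 inodes is well above 80%. A real collector would also have to handle long device names that df wraps onto a second line.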

Up Next: Crowd-sourced SaaS

Google released something today: a tool called The Customer Journey to Online Purchase. While this is just a standalone tool at the moment, I’m sure we’ll witness parts of its capabilities integrated into Google Analytics at some point in the near future.

This release, together with a few more startups dotting the landscape (like RelateIQ, as described by Scott Raney of Redpoint, or indeni, on whose blog this post appears), is heralding a new generation of SaaS: the crowd-sourced SaaS. Brian Ascher of Venrock calls it Data-Driven Applications, and so does Scott Raney. However, I believe that term is confusing. Data-Driven causes too many people to think about applications that let you access and visualize data, and that’s not what we’re talking about here.

The next generation of SaaS is when the service provided to customers keeps evolving over time based on the data that is stored within it. What would happen if Salesforce.com collected all the data its customers store in its service, analyzed it and used the results to provide more value to its own customers? indeni is a SFDC customer and I’m sure our data would greatly benefit from being analysed by an automated system that has seen what the data of other SaaS companies looks like. I’m not talking Jigsaw here. I’m talking actionable insights. The kind that derive from data being turned into knowledge.

You see, SaaS is no longer just about being cheaper and easier to use, which is what Salesforce.com started with and Brian Ascher details so well in his Forbes post. That revolution is done; it’s reality now. The question a lot of people are asking themselves is what the next revolution in software will be, and crowd-driven SaaS is my bet. My belief in it is so strong that I’ve bet my livelihood on it.

I’d be very interested to learn about others who are working on bringing us all to the next level. As with all revolutions that are just starting, it’s very difficult to find out on Google who is doing it. We’ll need to go old-school and actually use word-of-mouth. So reach me on Twitter: @yonadavl

F5 Active-Active in use, contrary to vendor recommendation

What will happen if one of the active members fails?

This is a real life sample alert from indeni.

Description:

This device is part of an Active-Active HA configuration without an additional standby device. This is contrary to F5’s recommendations on the matter.

Manual Remediation Steps:

Add a standby member to the HA group.

How does this alert work?

indeni automatically determines when certain devices are part of an HA group. It uses this determination to execute a large set of checks, such as comparing configurations and licenses. One of the checks verifies whether all devices in the group are set to Active and, if so, raises an alert.
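As a rough illustration of that last check, here is a small Python sketch. It assumes the HA state of each group member has already been collected into a dictionary; the function name, the device names, and the state strings are hypothetical and not taken from indeni or F5.

```python
def active_active_without_standby(member_states):
    """Return True when a group has at least two members and every one is Active."""
    states = [state.strip().lower() for state in member_states.values()]
    return len(states) >= 2 and all(state == "active" for state in states)


# Hypothetical HA groups: device name -> reported failover state.
group_a = {"bigip-1": "Active", "bigip-2": "Active"}
group_b = {"bigip-1": "Active", "bigip-2": "Active", "bigip-3": "Standby"}

for name, group in (("group_a", group_a), ("group_b", group_b)):
    if active_active_without_standby(group):
        print(f"{name}: Active-Active in use without a standby member")
```

Here only group_a would trigger the alert, because group_b includes a standby device.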

On-board NICs used on an open server: Check Point Firewalls Configuration Guide to Alerts.

This is a real life sample alert from the indeni Check Point Firewall Configuration Alert Guide.

Description:

indeni has identified that some of the on-board NICs of this open server are being used. Traffic on these on-board NICs may be unstable and some packets may be dropped or experience errors. Review the list of NICs below.

The recommendation to avoid on-board NICs is based on general feedback provided by customers as well as notes at the bottom of some of the pages on Check Point’s Hardware Compatibility List, such as this one.

Affected NICs:

eth0 (Bandwidth: 1000M/full, MAC Address: 90:E2:BA:3C:00:00, IP Address: 1.11.16.16/22)

Manual Remediation Steps:

Modify the network configuration of the device to avoid using the on-board NICs entirely.

How does this alert work?

indeni uses a series of tools, as well as data previously collected, to determine which NICs on an open server are the on-board ones. As some have noticed, this is not straightforward. If you believe indeni has labeled a certain NIC as on-board when it shouldn’t be, please let us know.

Check Point Alert of the Week: A NIC and fw_worker shouldn’t be assigned to the same core

This is a real life sample alert from indeni.

Description:

Having one or more NICs assigned to the same CPU core as an fw_worker will result in degraded performance.

Affected Cores:

Core 0 is assigned to both eth1 and fw_0

Manual Remediation Steps:

Update the affinity of cores. Follow the CoreXL documentation.

How does this alert work?

indeni runs “fw ctl affinity -l” and analyzes the output, looking for a situation where one of the fw lines (fw_0, fw_1, etc.) is associated with a core that is also associated with a NIC.
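The analysis described above could look roughly like the following Python sketch. It is an illustration only, not indeni's code: it assumes each relevant output line has the form "name: CPU n" (optionally prefixed with "Kernel"), and it treats any entry that is not named fw_N as a NIC. The function name find_shared_cores and the sample output are hypothetical.

```python
import re
from collections import defaultdict


def find_shared_cores(affinity_output):
    """Map each core used by both a NIC and an fw worker to (nic_names, worker_names)."""
    nics = defaultdict(list)
    workers = defaultdict(list)
    for line in affinity_output.splitlines():
        match = re.match(r"\s*(?:Kernel\s+)?(\S+):\s+CPU\s+(.+)$", line)
        if not match:
            continue  # skip lines that do not look like "name: CPU n"
        name, cpus = match.group(1), match.group(2)
        # Assumption: anything that is not fw_N is a network interface.
        target = workers if re.fullmatch(r"fw_\d+", name) else nics
        for cpu in re.findall(r"\d+", cpus):
            target[int(cpu)].append(name)
    return {core: (nics[core], workers[core]) for core in sorted(workers) if core in nics}


# Hypothetical output in the assumed format.
SAMPLE = """\
eth0: CPU 1
eth1: CPU 0
fw_0: CPU 0
fw_1: CPU 2
"""

for core, (nic_names, worker_names) in find_shared_cores(SAMPLE).items():
    print(f"Core {core} is assigned to both {', '.join(nic_names)} and {', '.join(worker_names)}")
```

With the sample above, only core 0 is reported, matching the "Affected Cores" entry in the alert.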

Questions May Return an Incorrect Value: F5© Alert of the Week: iRules using DNS

This is a real life sample alert from indeni for F5 Load Balancing Methods.

Description:

The DNS::question iRule command may return an incorrect value.

This issue occurs when all of the following conditions are met:
* An iRule for an LTM DNS event runs the DNS::question command, the iRule then runs a command that suspends the iRule, and after the iRule resumes, the DNS::question command is run again.
* While the iRule was suspended, a subsequent DNS query is processed, triggers the iRule event, and runs the DNS::question command for the subsequent query.

When the suspended iRule resumes and runs the DNS::question command, the value is read from the memory location that was written for the initial DNS::question command. However, because a subsequent DNS query was received, the system will have overwritten the memory location with the value of the subsequent query.

Affected iRules:

/Common/Custom_iRule_1202 uses DNS::question

Manual Remediation Steps:

This device is running an affected version, but the iRule referred to above needs to be examined closely to determine whether it is actually susceptible to the issue. Please read SOL15489.

How does this alert work?

indeni cross-checks the iRules’ actual content and the current software version used on the F5 device with known issues and alerts when a match is found.
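To give a feel for the content side of such a cross-check, the following Python sketch flags iRules that call DNS::question, then a command that may suspend the iRule, then DNS::question again. It is a simple token-based heuristic, not indeni's method. The list of suspending commands is an illustrative assumption and is not authoritative, and the function name and sample iRule are hypothetical.

```python
import re

# Illustrative assumption: a few commands that can suspend iRule processing.
SUSPENDING_COMMANDS = ("after", "table", "session", "RESOLV::lookup")


def may_reread_dns_question(irule_text):
    """Heuristic: True if DNS::question appears before and after a suspending command."""
    seen_question = False
    suspended_since_question = False
    for token in re.findall(r"[A-Za-z_:]+", irule_text):
        if token == "DNS::question":
            if suspended_since_question:
                return True
            seen_question = True
        elif token in SUSPENDING_COMMANDS and seen_question:
            suspended_since_question = True
    return False


# Hypothetical iRule used only to exercise the heuristic.
SAMPLE_IRULE = """
when DNS_REQUEST {
    set first [DNS::question name]
    table set "seen:$first" 1
    log local0. [DNS::question name]
}
"""

if may_reread_dns_question(SAMPLE_IRULE):
    print("iRule uses DNS::question after a possible suspension; review it manually")
```

Because it only looks at tokens, this sketch can produce false positives (for example, a variable named table), which is consistent with the alert's advice to examine the iRule closely.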

VPN peer not responding or unreachable

This is a real life sample alert from indeni.

Description:

Some of the device’s VPN peers are not responding to VPN traffic.

Affected VPN Peers

VPN peer 191.119.141.40 is currently not responding

Manual Remediation Steps:

The VPN peer is not responding to VPN traffic. Please check network connectivity and contact the administrator of the VPN peer to ensure VPN is still enabled.

Note that many VPN peers do not respond to ICMP pings and will only respond to VPN traffic (such as UDP port 500, IP protocol 50, etc.). To test that the VPN peer responds to VPN traffic, use Nmap’s ike-version script.

How does this alert work?

indeni uses device-specific commands and logs to determine if the VPN peer is not responding.

Announcing indeni 5.1: F5© BIG-IP© support, many improvements

We’re excited to announce version 5.1. While this version has been generally available for a few months now, it has had improvements added to it over the past two months.

New product versions supported:

  • F5© BIG-IP© 11.x

New signatures:

  • The following are some of the F5-related signatures included in this release:
    • Identify node availability issues
    • Pool member connection limit nearing or reached
    • Load balancer connection limit nearing or reached
    • Number of active members in a pool lower than threshold
    • Number of SSL Transactions per Second nearing license limit
    • ConfigSync state not OK
    • Reaper process started
    • Cross check certain log lines with AskF5.com

Bugs fixed and minor improvements:

  • WC-2051: Network Health left-side widgets empty in some cases
  • IS-1365: Discovery of analyzed devices was sometimes slow due to a behavior issue with CentOS’s /dev/random
  • IS-1363: NIC details were not indexed by the Search feature in certain cases
  • IS-1346: Prevent “service indeni4it start” from starting the application more than one time
  • IS-1087: RADIUS authentication with one-time tokens resulted in lockouts
  • IK-1951: VPN debug messages contain partial information
  • IK-1924: “Coredumping setting not as desired” Profile Check – FP
  • IK-1914: “Some members of the same cluster are not being monitored” FP
  • IK-1856: Hardware alert FPs on Check Point Open Servers
  • IK-1626: SNMP monitoring – “Device clock appears to be set incorrectly” FP
  • IK-1919: “SecureXL templates are partially disabled” FP
  • IK-1871: “HSRP cluster members differ in VLAN configuration” FP
  • IK-1670: Live Configuration – all NICs are showing as Down
  • IK-1852: indeni server’s disk filled up without any storage alerts
  • IK-1847: Failed to Communicate alert: wrong details when Check Point shell is not bash
  • WC-1800: Performance of rendering of the list of devices has been improved
  • WC-2061: Network Health – scrolling alerts show acknowledged alerts
  • IS-922: Ignored items list was sometimes cleared instead of stored
  • IS-1371: Full text search improved to increase coverage and improve result sorting
  • IS-1357: “fwaccel stat” added to debug report for Check Point firewalls
  • IS-1088: Improvement to the performance of the generation of inventory reports
  • IK-1901: “RX traffic drastically reduced post fail over, possible ARP issue” add specific interface details