Technical Overview
The Indeni knowledge platform is software that collects data from IT infrastructure components and analyzes it against known best practices, common issues, misconfigurations and product bugs. The knowledge base used by the platform is community-generated, with people from around the globe contributing their knowledge to form a vast library.

Indeni’s goal

The software, called “Indeni” after the company that builds it, has one main goal: to identify what could go wrong in IT infrastructure components and tell the user about it before things actually break. The main output of the product is preemptive alerts, each containing information about what the issue is going to be, why Indeni considers it an issue and how to fix it.

The result is a reduction of up to 90% in critical (severity one) issues. The product is designed to be used by the engineering and operations teams responsible for maintaining a wide range of IT infrastructure components, including switches, routers, firewalls, load balancers, proxies, server virtualization environments, private clouds, public clouds, storage networks and devices, and more.

At the base of all of this is the platform – a set of software packages capable of retrieving data from running infrastructure components, storing that data and then analyzing it with the knowledge base in mind.

How Indeni Works

Collector

The first step in the platform’s operation is collection, performed by the collector. The collector is responsible for connecting to a variety of infrastructure components and retrieving data from them using a variety of protocols. Some components are communicated with via vendor-supplied APIs, some via a command line interface (CLI) over SSH, some via SNMP, and some via other means.

The collector parses the data it retrieves and converts it into metrics. Those metrics are then sent to the backend server for storage in its databases.

Server

The server contains the databases in which data retrieved from devices is stored, and it also runs the rule engine. The rule engine is the component responsible for reviewing the data and looking for potential issues. If a rule finds a possible issue, it informs the alerting component, together with the actual text of the alert.

Contributing knowledge

If you’re interested in contributing your own knowledge, you will need to learn how to write rules for the rule engine to execute. If your rule requires certain pieces of data that are not yet available, you will also need to learn how to write collection scripts for the collector to execute. The separation between the rules and collector scripts allows for an easier development process, reusability and testability. Testing is a crucial element of knowledge contribution and we will discuss it in depth as well. Learn more here.

Example ACME

This is a made-up product we use for the sake of examples throughout the documentation. On this page, we describe the device itself to give the examples more depth.

This device supports two main protocols through which you can communicate with it:

  1. A REST API devised by ACME. This REST API is available over HTTP and HTTPS. When using the REST API, you must provide a username and password. Through this API you can retrieve many pieces of information – such as the current health of hardware components (the humidifier’s water level, for example) and software parameters (such as the desired humidity level). Note that REST API responses are in JSON.
  2. A command line interface (CLI) over SSH that allows you to run some of the commands available in the REST API, as well as others that are not available through it.

Some additional notes:

  • The ACME Humidifier 2140 is running AOS (ACME OS).
  • In the 2140 there are two water tanks, one filter and one motor.
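To make the REST API concrete, here is a minimal sketch of parsing a hypothetical response body with Python’s standard library. The endpoint shape and the field names (humidity, water_tanks, level_liters) are invented for illustration and are not part of any real ACME specification.

```python
import json

# Hypothetical JSON body returned by an ACME REST API hardware-health call.
# Field names are assumptions for illustration only.
response_body = (
    '{"humidity": 27.0, '
    '"water_tanks": [{"id": "1", "level_liters": 1.72}, '
    '{"id": "2", "level_liters": 0.95}]}'
)

data = json.loads(response_body)
print(data["humidity"])                      # current humidity percentage
for tank in data["water_tanks"]:
    print(tank["id"], tank["level_liters"])  # per-tank water level
```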
Collector

The collector receives messages from the server containing the details of the device it should connect to, including its credentials. As a first step, the collector interrogates the device to determine what it is. The result of this interrogation process is a set of tags describing the device. Once the interrogation phase is done, the collector will run all “monitoring” commands repeatedly, collecting data from the device and parsing it into metrics.

Interrogation

In the platform’s data model, each device has tags. These tags can include “device-id”, which is a unique identifier for a device, as well as others. The tags describe the device and give us extra information about it. The process of generating these tags occurs only during interrogation. If a device is changed later, for example by updating the software running on it, the collector will throw away everything it knows about the device and will restart interrogation.

Let us consider a device called ACME Humidifier 2140. This device is connected to the enterprise’s network, has an IP address and is accessible through both a REST API (available over HTTP and HTTPS) and an SSH connection. The collector was provided with the IP address, name and credentials for this device.

The collector has no idea that the device available at the IP address it received is the ACME Humidifier. So it needs to determine what’s actually there. It will first run all interrogation commands that have no requirements and collect the first set of tags. Based on those tags, it will run other interrogation commands whose requirements are now met. It will continue to do so until all interrogation commands are done and the tags are collected. See a sample of interrogation commands and their outputs for the ACME Humidifier.

If we lose connectivity to a device, we will restart the interrogation once we re-establish connectivity. We will essentially delete all the tags we have for the device and start over.
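The interrogation loop described above can be sketched in a few lines. The script and tag structures here are simplified stand-ins for illustration, not the collector’s actual internals.

```python
# Minimal sketch of the interrogation loop: run every script whose tag
# requirements are already met, merge the tags it produces, and repeat
# until no script makes progress. Script shapes are invented for
# illustration.
scripts = [
    {"name": "ssh-show-version", "requires": {},
     "tags": {"vendor": "acme", "product": "humidifier"}},
    {"name": "humidifier-show-model",
     "requires": {"vendor": "acme", "product": "humidifier"},
     "tags": {"model": "2140"}},
]

device_tags = {}
done = set()
progress = True
while progress:
    progress = False
    for script in scripts:
        if script["name"] in done:
            continue
        # A script is eligible once all of its required tags are present.
        if all(device_tags.get(k) == v for k, v in script["requires"].items()):
            device_tags.update(script["tags"])  # stand-in for running the script
            done.add(script["name"])
            progress = True

print(device_tags)  # {'vendor': 'acme', 'product': 'humidifier', 'model': '2140'}
```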

Monitoring

Once a device has been interrogated, the collector will load all the monitoring commands (those of type “monitoring”) whose requirements are met for the device (based on the “requires” directive of the monitoring script). See sample scripts here.

The scripts will be run repeatedly, according to their interval. The metrics generated will be sent off to the server for storage in the DB. Prior to sending to the server, the collector attaches the device-id and collection timestamp to each metric. In some cases, a single monitoring script can generate several thousand metrics.
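The stamping step can be sketched as follows; the metric field names mirror the samples later on this page, but the helper itself is hypothetical.

```python
import time

def stamp_metrics(metrics, device_id):
    """Attach the device-id and a collection timestamp to each metric
    before it is shipped to the server (illustrative sketch)."""
    now = int(time.time())
    return [{**m, "device-id": device_id, "timestamp": now} for m in metrics]

raw = [{"im.name": "humidity", "value": 27.0},
       {"im.name": "fan-speed", "value": 200.0}]
for m in stamp_metrics(raw, "humidifier-001"):
    print(m["im.name"], m["device-id"], m["timestamp"])
```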

Interrogation Script (sample)

All scripts on this page are used with the ACME Humidifier 2140.
Initial Script
This script has no requirements (see “requires” in its configuration). It will be run for any device responding to SSH, whether it’s the ACME Humidifier or not.
#! META
name: ssh-show-version-for-humidifier
description: run "show version" over SSH hoping to find the ACME Humidifier
type: interrogation

#! REMOTE::SSH
show version

#! PARSER::AWK
BEGIN {
}

# Sample line from the output of "show version" on the ACME Humidifier:
# This ACME Humidifier is running AOS 9.14.11
/ACME Humidifier/ {
    writeTag("vendor", "acme")
    writeTag("product", "humidifier")
    writeTag("os.name", "AOS")
    writeTag("os.version", $NF) # The last field on the line is the OS version
}

END {
}
The result of running the above script on the ACME Humidifier is this:

TAG:vendor=acme
TAG:product=humidifier
TAG:os.name=AOS
TAG:os.version=9.14.11
These tags are now associated with the device.
Followup Script
Now that we know it’s the ACME Humidifier, we can run other commands and get more information.
#! META
name: humidifier-show-model
description: run "show model" over SSH on the ACME Humidifier
type: interrogation
requires:
    vendor: acme
    product: humidifier

#! REMOTE::SSH
show model

#! PARSER::AWK

BEGIN {
}

# Sample line from the output of "show model" on the ACME Humidifier:
# Model name: ACME Humidifier 2140
/Model name/ {
    writeTag("model", $NF)
}

# Serial number: ACMEHUM2140AHO0519661
/Serial number/ {
    writeTag("serial", $NF)
}

END {
}
The result of running the above script on the ACME Humidifier is this:

TAG:model=2140
TAG:serial=ACMEHUM2140AHO0519661
These tags will also now be associated with the device.

Monitoring Script (sample)

All scripts on this page are used with the ACME Humidifier 2140. The scripts assume that interrogation was completed according to the interrogation scripts available here.
Hardware Monitoring Script

#! META
name: ssh-hardware-stats
description: fetch hardware stats for ACME Humidifier
type: monitoring
monitoring_interval: 5 minute
requires:
    vendor: acme
    product: humidifier

#! REMOTE::SSH
show hardware stats

#! PARSER::AWK

BEGIN {
}

# Sample lines from the output of "show hardware stats" on the ACME Humidifier:
# Current humidity: 27%
# Fan speed: 200 RPM
# Water remaining: 1.72 L

/Current humidity/ {
    # $NF means "give me the last field" in awk.
    percentage=$NF

    # Remove the "%" - a double metric can only be a number with a decimal
    # point, nothing around it.
    sub(/%/, "", percentage)

    # Write the humidity level metric, informing the DB that this is a gauge
    # updated every 5 minutes. We also include information for "live config".
    writeDoubleMetricWithLiveConfig("humidity", null, "gauge", "300", percentage, "Hardware – Stats", "percentage", "")
}

/Fan speed/ {
    # Get the one-before-last field - we don't need the "RPM" suffix
    speed=$(NF-1)

    # Write the fan speed metric, same parameters as humidity.
    writeDoubleMetricWithLiveConfig("fan-speed", null, "gauge", "300", speed, "Hardware – Stats", "number", "")
}

/Water remaining/ {
    water=$(NF-1)

    # Write the water remaining metric, same parameters as humidity.
    writeDoubleMetricWithLiveConfig("water-remaining", null, "gauge", "300", water, "Hardware – Stats", "number", "")
}

END {
}
The result of running the above script on the ACME Humidifier is this:

DOUBLE_METRIC:im.name=humidity,display-name=Hardware – Stats,live-config=true,im.dstype.displayType=percentage,im.identity-tags="",im.metric-type=ts,dsType=gauge,step=300,value=27
DOUBLE_METRIC:im.name=fan-speed,display-name=Hardware – Stats,live-config=true,im.dstype.displayType=number,im.identity-tags="",im.metric-type=ts,dsType=gauge,step=300,value=200
DOUBLE_METRIC:im.name=water-remaining,display-name=Hardware – Stats,live-config=true,im.dstype.displayType=number,im.identity-tags="",im.metric-type=ts,dsType=gauge,step=300,value=1.72
These three metrics are now sent by the collector to the server for storage in the db. The metrics will have the device-id and timestamp included as well (the collector adds them).

Metrics Analyzed

A metric is a measurement of a specific piece of data at a specific timestamp. For example, taking a person’s temperature produces a metric: it has a value (such as 103) and a time (such as today at 8:01 AM). Most metrics used in Indeni are numeric and are called double metrics (DOUBLE_METRIC). They are a number with a decimal point, such as 0.0, 1.0, 103.51, etc. Metrics that are not numeric, or are more complicated than just a number, are called complex metrics (COMPLEX_METRIC).

We use a DOUBLE_METRIC to describe a current state (like whether a BGP peer is up or down) or a numeric value that has a timestamp (like temperature). We use a COMPLEX_METRIC to describe a current configuration (human- or script-controlled) or, in some cases, a current state that requires more than just a 0 or 1 value.
A metric has a type (such as “under-the-tongue-temperature”), tags, and a few other aspects, all of which we’ll get into later.
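As a rough illustration of the two shapes, here is a sketch using field names similar to the samples elsewhere on this page; the exact dictionary layout is an assumption, not a spec.

```python
import json

# Illustrative sketch of the two metric shapes. A double metric carries a
# bare number; a complex metric carries a JSON document as its value.
double_metric = {
    "im.name": "under-the-tongue-temperature",
    "value": 103.0,             # always a number with a decimal point
    "timestamp": 1450228100,
}

complex_metric = {
    "im.name": "timezone",
    "value": json.dumps({"value": "America/Los_Angeles"}),  # JSON, not a bare number
    "timestamp": 1450228100,
}

print(double_metric["value"], json.loads(complex_metric["value"])["value"])
```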

Double Metrics (DOUBLE_METRIC)

Double metrics are numbers. In the case of our ACME Humidifier, one could imagine the following metrics:

  • humidity (a number from 0.0 to 100.0 representing the current level of humidity in the room)
  • water-remaining (the amount of water, in liters, remaining in a water tank)
  • fan-speed (the speed of the fan in RPM)
  • filter-installed (1.0 if the filter is installed, 0.0 if it’s removed)
  • filter-ok (1.0 if the filter is OK – that is, not jammed or dirty – and 0.0 if it’s not OK)
  • motor-ok (similar to filter-ok, just regarding the motor)

A few things to notice about the above metrics:

  • They are all lower case.
  • They use dashes to connect words (metric names cannot contain spaces or underscores).
  • When you want a YES/NO answer, use 1.0 for YES and 0.0 for NO.
  • The unit of measurement (liters, RPM, etc.) is of no consequence from a metric perspective. Just save the number.

Double metrics are easy to put on a graph – imagine taking the values that are measured for a given metric and plotting on a graph. Since most data we need from infrastructure components is numeric, these metrics are the most commonly used in the platform.

Complex Metrics (COMPLEX_METRIC)

A complex metric is usually something that is not a number (such as a serial number, or a list of files). Sometimes it may actually be a number but it represents a configuration element and not something to be graphed over time. Currently, complex metrics have a very short history in our database – we only keep the last three values of each metric. The complex metrics are used to check the current configuration of a device or the status of something “complicated”. The system also knows how to alert when a complex metric has changed in value.

There are two types of complex metrics:

Simple string – for example, the timezone that is set on a device (“America/Los_Angeles”). This is stored in JSON format, looking like this:
{"value": "America/Los_Angeles"}
(the key will always be called “value” here)

List of items – this is a JSON array, a collection of items. Each item has key-value pairs, all items must have the same keys, and an array cannot have two items which are exactly the same. An example would be a list of core dump files:
[{"path": "/somepath1", "created": "14502281"}, {"path": "/somepath2", "created": "14502351"}] (this is an array with two items, and each item has a “path” key and a “created” key).
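The constraints above (the same keys in every item, no two identical items) can be checked with a small helper. This is an illustrative sketch, not part of the platform.

```python
def valid_item_list(items):
    """Check the list-of-items constraints for a complex metric:
    every item has the same keys, and no two items are identical.
    (Illustrative helper only.)"""
    if not items:
        return True
    keys = set(items[0])
    if any(set(item) != keys for item in items):
        return False  # all items must share the same keys
    seen = []
    for item in items:
        if item in seen:
            return False  # no two identical items allowed
        seen.append(item)
    return True

core_dumps = [{"path": "/somepath1", "created": "14502281"},
              {"path": "/somepath2", "created": "14502351"}]
print(valid_item_list(core_dumps))                    # True
print(valid_item_list(core_dumps + [core_dumps[0]]))  # False: duplicate item
```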

Sometimes you may encounter a numeric value as a complex metric. For example, the MTU of a network interface. The reason we consider it as a complex metric is that it is a configuration value that you generally would not be looking to graph over time. Some people may want to track it just to see if it was changed by an admin (a future feature we are planning).

Tags

Tags are a way of differentiating metrics of the same type on the same device. For example, consider the “water-remaining” metric. What do you do if you have two water tanks? You can’t just write double metrics for each of them without noting which value belongs to which tank; that would be confusing!

Hence the need for tags. You can add as many tags as you want, but you must make sure these tags are identifiers for the metric. They are not just “auxiliary” information; they actually describe the object holding the metric in a way that shouldn’t change for that object.

For example, this is OK:
Issuing a metric with a tag called “id” and a value of “1” for water tank 1 and “2” for water tank 2. This means two metrics are issued, differentiated by the value of the “id” tag associated with each metric.

This is not OK:
Issuing a metric with a tag called “full” and a value of “yes” if the tank is full and “no” if it isn’t. This is because tags cannot change for a given object over time.
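The “OK” pattern above can be sketched like this, with hypothetical metric dictionaries:

```python
# Sketch of the acceptable pattern: one water-remaining metric per tank,
# differentiated by an "id" identity tag that never changes for a tank.
def tank_metrics(levels_by_tank):
    return [{"im.name": "water-remaining", "tags": {"id": tank_id}, "value": liters}
            for tank_id, liters in levels_by_tank.items()]

metrics = tank_metrics({"1": 1.72, "2": 0.95})
for m in metrics:
    print(m["tags"]["id"], m["value"])
```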

Are you a contributor? Learn more about Tags here.

Indeni Insight FAQ

Indeni’s unique technology is machine-learning based and accumulates enormous amounts of data on each device it runs checks on, 24/7/365. This is helpful to IT pros looking to improve the stability of their mission-critical environments.

The knowledge collected, Indeni Insight, is a massive database of all checks, misconfigurations and alerts created from networks all over the world. The data is anonymized to protect the privacy of our clients. As more devices are added, Indeni continues to learn, predicting and preventing issues and giving our users comprehensive, remedial solutions to common problems they did not know they had. See below for the most frequently asked questions and contact us if you didn’t find the information you need.

What is Indeni?

Indeni is an innovative IT operations platform that uses predictive analytics to help IT teams identify issues before they become major events such as outages or network failures. See how Indeni works here.

What data is being sent?

  1. Information of the Indeni server itself (version, memory usage, disk space usage, etc.).
  2. Current configuration of the Indeni server (devices connected to, users defined, etc.). Note that this configuration does not contain passwords or keys.
  3. Alerts issued by Indeni, excluding IP addresses and confidential information Indeni deems sensitive.
  4. A copy of the inventory report (similar to what’s issued via Indeni’s web dashboard), with IP addresses and sensitive information removed.
  5. Trends of critical metrics on devices Indeni is connected to – such as CPU, memory, disk space, NIC utilization.
  6. Samples of logs from devices Indeni is connected to with IP addresses and sensitive information removed.

Why is this data being sent?

  1. The data is used to collect industry-wide statistics on how devices are being used, what issues are occurring, the performance of the devices, etc.
  2. Specific device data, such as the logs, are used to improve our algorithms for expanding our knowledge automatically. For example, certain logs that appear across multiple devices in multiple locations help us discern their importance.

How is the data secured?

Before the data leaves the Indeni server, it is cleaned (IP addresses and sensitive information removed), archived and encrypted. It is then sent to our file store on Amazon Web Services (S3) using HTTPS. The file is then pulled by a separate system, decrypted and analyzed. The results from the analysis are stored in several different databases that are only accessible using two-factor-based credentials available to approved Indeni employees. The data stored in these databases is separated to ensure that it cannot be used in a manner that may compromise the safety of any of our customers.
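As a rough illustration of the cleaning step, here is a sketch that redacts IPv4 addresses from a log sample. The real pipeline is more involved; this regex-based helper is only an assumption of the general idea, not Indeni’s actual implementation.

```python
import re

# Redact anything that looks like an IPv4 address before the sample
# leaves the server (illustrative sketch only).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def scrub(text):
    return IPV4.sub("x.x.x.x", text)

print(scrub("Peer 10.1.2.3 dropped connection to 192.168.0.1"))
# Peer x.x.x.x dropped connection to x.x.x.x
```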

Who has access to the data?

Approved Indeni employees have access to the data for the sake of analyzing it and improving our knowledge. In addition, they utilize the data to construct reports on trends in the industry, as well as customized reports for customers, helping them compare themselves to similar organizations around the world.

Will Indeni sign an NDA?

Indeni is accustomed to signing NDAs with users to help respond to legal needs. Please contact your Indeni account executive to get this done quickly and easily.

Is the data being shared with Indeni posing a risk to my company’s infrastructure?

No, it is not. We have gone to great lengths to ensure this is the case. Our founding team is composed of cyber security experts, and we have put the security of our customers and our own assets at the top of our list of priorities.

Supported Products

Please note this page is updated on the first of every month. Contact us for any questions related to device support.

 

Check Point

Model Series

  • 600
  • 700
  • 1100
  • 1200R
  • 1400
  • 2200
  • 3000
  • 4000
  • 5000
  • 12000
  • 13000
  • 15000
  • 21000
  • 23000

Operating Systems

  • GAiA – R75.40 – R80
  • SPLAT – R70 and up
  • IPSO – R70 and up

F5

Hardware Appliance

  • 5200v
  • 5250v
  • i5800
  • 7200v
  • 7250v/7255v
  • i7800
  • 10200v/10250v/10255v
  • 10350v-F/10350v-N/10350v
  • i10800
  • i12250v

VIPRION Chassis & Blades

  • 2200/D114
  • 2400/F100
  • 4400/J100
  • 4480/J102
  • 4800/S100
  • Platforms: BIG-IP, BIG-IP VE
  • Operating System: TMOS 11.6 → current
  • Software modules: LTM, GTM (DNS), BIG-IQ

Cisco

Routers

  • 800
  • 1800
  • 1900
  • 2800
  • 2900
  • 3800
  • 3900
  • 4000

Industrial Routers (IR)

  • IR 800
  • IR 900
  • IR 1000
  • IR 2000
  • VXR 7200
  • VXR 7300

Switches Hardware Appliance

  • 2960-X
  • 2960-CX
  • 3650
  • 3850
  • 4500E
  • 4900
  • 4500-X

Switch Chassis

  • 6500
  • 6800

Operating Systems

  • IOS
  • NX-OS

Palo Alto Networks

Firewall Hardware

  • PA-200
  • PA-500
  • PA-2020
  • PA-2050
  • PA-3020
  • PA-3050
  • PA-3060
  • PA-4020
  • PA-4050
  • PA-4060
  • PA-5020
  • PA-5050
  • PA-5060

Firewall Chassis

  • PA-7000
  • PA-7050
  • PA-7080

VM Series

  • VM-50
  • VM-100
  • VM-200
  • VM-300
  • VM-1000HV

Operating System

  • PAN-OS

Supported Hypervisors

  • VMware ESXi
  • Linux KVM
  • Microsoft Hyper-V