3.7 Aggregate Policy

The Aggregate policy groups multiple related issues into a single event. Instead of alerting on each metric separately, the system generates one alert for a group of metrics associated with a related event, such as a “cluster switchover”.

In 7.3, to enable the Aggregate policy for the cluster switchover event, enter the following URL in the browser’s address bar:

https://<machine IP>/settings/application?agg_policy=true
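If you need this URL for several servers, you can build it in a script instead of typing it by hand. The following is a minimal sketch that only assembles the string. The function name `agg_policy_url` is hypothetical, and the sketch assumes the server is reached over HTTPS at its IP address, as in the URL above. It does not authenticate or send any request.

```python
from urllib.parse import urlencode, urlunsplit


def agg_policy_url(machine_ip: str) -> str:
    """Build the settings URL that enables the Aggregate policy (hypothetical helper)."""
    # Only agg_policy=true is documented; other values are not assumed here.
    query = urlencode({"agg_policy": "true"})
    return urlunsplit(("https", machine_ip, "/settings/application", query, ""))
```

For example, `agg_policy_url("10.0.0.5")` yields `https://10.0.0.5/settings/application?agg_policy=true`.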

When the first cluster switchover issue is detected, the system opens a single incident, and subsequent related issues aggregate into that incident. Issues continue to aggregate into the incident until all of them are resolved. The system never applies concurrent aggregate policies to the same cluster. All the issues are consolidated into a single email notification, and only one ServiceNow incident is opened.
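The incident lifecycle described above can be sketched as follows. This is an illustrative model, not Indeni's implementation. The classes `Incident` and `IncidentAggregator` are hypothetical, the `notifications` list stands in for the single email and ServiceNow ticket, and trigger rules and time windows are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One aggregated incident for a single cluster (hypothetical model)."""
    cluster: str
    open_issues: set[str] = field(default_factory=set)


class IncidentAggregator:
    """Keeps at most one open incident per cluster."""

    def __init__(self) -> None:
        self._open: dict[str, Incident] = {}
        # Stand-in for the single email notification / ServiceNow ticket.
        self.notifications: list[str] = []

    def report(self, cluster: str, issue: str) -> Incident:
        incident = self._open.get(cluster)
        if incident is None:
            # First issue for this cluster: open an incident and notify once.
            incident = Incident(cluster)
            self._open[cluster] = incident
            self.notifications.append(cluster)
        # Every related issue joins the same incident.
        incident.open_issues.add(issue)
        return incident

    def resolve(self, cluster: str, issue: str) -> None:
        incident = self._open.get(cluster)
        if incident is None:
            return
        incident.open_issues.discard(issue)
        # The incident closes only once every aggregated issue is resolved.
        if not incident.open_issues:
            del self._open[cluster]

    def is_open(self, cluster: str) -> bool:
        return cluster in self._open
```

Because open incidents are keyed by cluster name, the sketch can never run two aggregations on the same cluster at once.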

How does it work?

The system predefines a set of High Availability rules, any of which can mark the beginning of an aggregation:

  • Cluster down
  • Cluster member no longer active
  • (Clustered Virtual Devices) Cluster down
  • Firewall cluster problem
  • Management high-availability sync down
  • Virtual Firewall cluster member no longer sync down
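As a rough illustration, the trigger check amounts to a set lookup. This sketch assumes rules are matched by their exact display names as listed above. The names `HA_TRIGGER_RULES` and `starts_aggregation` are hypothetical, and the `aggregation_open` flag models the rule that a cluster never has two concurrent aggregations.

```python
# Trigger rule names copied from the list above; exact-name matching is an assumption.
HA_TRIGGER_RULES = frozenset({
    "Cluster down",
    "Cluster member no longer active",
    "(Clustered Virtual Devices) Cluster down",
    "Firewall cluster problem",
    "Management high-availability sync down",
    "Virtual Firewall cluster member no longer sync down",
})


def starts_aggregation(rule_name: str, aggregation_open: bool) -> bool:
    """Return True if this rule should open a new aggregation for its cluster."""
    # A trigger only starts an aggregation if none is already open for the cluster.
    return rule_name in HA_TRIGGER_RULES and not aggregation_open
```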

When one of these rules is triggered, aggregation begins. The system then aggregates subsequent issues observed on any device belonging to the same cluster within a 30-minute window.
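The window check can be sketched as follows. The text does not say exactly where the 30 minutes are measured from, so this sketch assumes the window starts at the triggering issue and includes both endpoints. The function `in_window` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

AGGREGATION_WINDOW = timedelta(minutes=30)


def in_window(trigger_time: datetime, issue_time: datetime,
              trigger_cluster: str, issue_cluster: str) -> bool:
    """Return True if an issue falls into the aggregation started by the trigger."""
    # Assumption: window is [trigger_time, trigger_time + 30 min], inclusive.
    return (issue_cluster == trigger_cluster
            and trigger_time <= issue_time <= trigger_time + AGGREGATION_WINDOW)
```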

The following rules are included in the aggregation:

  • All the High Availability issues
  • Two Indeni System rules: 
    • Device Not Responding 
    • Device Temporarily Suspended
  • Device restarted (uptime low)
  • Debug mode enabled
  • SecureXL disabled
  • Communication between management server and specific devices not working
  • Core dump files found
  • Critical process(es) down
  • Critical process(es) down (per VS)
  • Network port(s) down
  • Next hop inaccessible
  • Required interface(s) down
  • pnote(s) down
  • Device is logging locally
  • Repeated failed login attempts by a user
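One way to picture the inclusion test is shown below. This sketch assumes each issue carries a category (so that "All the High Availability issues" can match by category) and that the remaining rules match by exact name. The `Issue` type, its `category` field, and `is_aggregated` are hypothetical.

```python
from dataclasses import dataclass

# Individually listed rules, copied from the list above.
AGGREGATED_RULES = frozenset({
    "Device Not Responding",
    "Device Temporarily Suspended",
    "Device restarted (uptime low)",
    "Debug mode enabled",
    "SecureXL disabled",
    "Communication between management server and specific devices not working",
    "Core dump files found",
    "Critical process(es) down",
    "Critical process(es) down (per VS)",
    "Network port(s) down",
    "Next hop inaccessible",
    "Required interface(s) down",
    "pnote(s) down",
    "Device is logging locally",
    "Repeated failed login attempts by a user",
})


@dataclass(frozen=True)
class Issue:
    rule: str
    category: str  # hypothetical field, e.g. "High Availability"


def is_aggregated(issue: Issue) -> bool:
    """Return True if the issue joins an open aggregation for its cluster."""
    # All High Availability issues qualify; other rules must be listed by name.
    return issue.category == "High Availability" or issue.rule in AGGREGATED_RULES
```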