3.8 Anomaly Detection
In 8.4, we introduced the following new components:
- An anomaly detection engine
- An embedded long-term time-series database
The anomaly detection engine uses machine learning models to identify outliers and unusual behaviors for several metrics. Upon detection of an anomaly, a warning alert is generated.
Awareness of such anomalies helps you identify early symptoms of emerging issues, allowing you to address them before they become bigger problems. In our implementation, we’re using the popular z-score method to detect anomalies. A z-score measures how many standard deviations above or below the mean a data point is.
For anomaly detection, the system evaluates several metrics based on a week’s worth of data points. For regular issues, by contrast, the system evaluates 90 minutes’ worth of data points. These data points are stored in the newly embedded time-series database.
When a new data point is collected, its z-score is calculated. An alert is generated if the z-score of that data point is greater than 3 or less than -3. The alert remains active for 10 minutes. If no other anomalies are detected during that time, the alert resolves itself and goes into the cooldown state. If another data point has a z-score greater than 3 or less than -3, the alert remains active for another 10 minutes.
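The z-score check and the alert timing described above can be illustrated with a minimal Python sketch. This is not the product's implementation. The names z_score and AnomalyAlert, the state labels, the use of the population standard deviation, and the choice to reactivate an alert directly from cooldown are all assumptions made for illustration. Only the threshold of 3 and the 10-minute active window come from the description above.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence

Z_THRESHOLD = 3.0            # alert when the z-score is above 3 or below -3
ACTIVE_WINDOW_SECONDS = 600  # an alert stays active for 10 minutes


def z_score(history: Sequence[float], value: float) -> Optional[float]:
    """Return how many standard deviations `value` lies from the mean of `history`.

    Returns None when the z-score is undefined (fewer than two points or zero spread).
    Assumption: the population standard deviation is used.
    """
    if len(history) < 2:
        return None
    sigma = pstdev(history)
    if sigma == 0:
        return None
    return (value - mean(history)) / sigma


@dataclass
class AnomalyAlert:
    """Hypothetical alert with states 'inactive', 'active', and 'cooldown'."""
    state: str = "inactive"
    expires_at: float = 0.0

    def observe(self, history: Sequence[float], value: float, now: float) -> str:
        # Resolve an active alert once 10 minutes pass without a new anomaly.
        if self.state == "active" and now >= self.expires_at:
            self.state = "cooldown"
        z = z_score(history, value)
        if z is not None and abs(z) > Z_THRESHOLD:
            # A new anomaly starts the alert or extends it by another 10 minutes.
            # Assumption: an anomaly during cooldown reactivates the alert.
            self.state = "active"
            self.expires_at = now + ACTIVE_WINDOW_SECONDS
        return self.state


# Example with made-up samples; `now` is in seconds.
history = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 12.0, 9.0]
alert = AnomalyAlert()
print(alert.observe(history, 10.5, now=0))    # inactive
print(alert.observe(history, 25.0, now=60))   # active
print(alert.observe(history, 10.0, now=300))  # active
print(alert.observe(history, 10.0, now=700))  # cooldown
```

In this sketch the history window is passed in by the caller, so the same function could be fed a week's worth or 90 minutes' worth of points. How the product selects and stores that window is not shown here.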
Note: By default, the system does not write metrics to the long-term time-series database. In other words, this feature is disabled by default. To enable it, please create a ticket. Our support team will work with you to ensure you have the resources to support anomaly detection before the system starts writing metrics to the long-term time-series database.