Splash analytics and prediction

The analytics module detects splashes in the values of measured metrics and predicts metric values for a specified time interval.

To specify which checks the analytics module performs when new values arrive, create a new rule in the State change conditions section. A rule can be written as a JSON object or configured in the graphical interface.

Detecting splashes

An example of a rule that detects a value splash:

[
  {
    "condition": {
      "_splash": {
        "metric": "averageCpuLoad.oneMinuteAverageLoad",
        "value": {
          "_gt": "1"
        }
      }
    },
    "state": 5
  }
]

In this example, for each new value of the metric averageCpuLoad.oneMinuteAverageLoad of the current object, the analytics module calculates the difference between the received value and the value it expected to get. When this difference goes beyond the boundaries set by the user, the object changes to the specified state.

While collecting data for the metric, the analytics module analyzes how the metric values change and calculates the interval within which values are considered normal for that particular sequence of values. This interval is usually wide enough to avoid frequent state changes while still responding to truly abnormal splashes.

Here the rule "value": {"_gt": "1"} means that the object changes to the state given in the state parameter (in this case 5, ALARM) only when the value deviates from the boundaries of the confidence interval, upward or downward, by more than 1 (in the units of the specific metric).

In addition to "_gt" (strictly greater), the following operators can be used:

  • "_lt" (strictly less),

  • "_gte" (greater than or equal to),

  • "_lte" (less than or equal to).
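The comparison described above can be sketched in Python. This is only an illustration, not the module's implementation: the analytics module computes the confidence interval itself, whereas here the boundaries are passed in as the hypothetical arguments lower and upper, and the function names are invented for the example. Requiring every operator in the "value" block to hold is also an assumption.

```python
import operator

# Comparison operators named in the rule syntax.
OPERATORS = {
    "_gt": operator.gt,
    "_lt": operator.lt,
    "_gte": operator.ge,
    "_lte": operator.le,
}


def deviation_from_interval(value, lower, upper):
    """Distance from value to the nearest boundary; 0.0 inside the interval."""
    if value > upper:
        return value - upper
    if value < lower:
        return lower - value
    return 0.0


def splash_triggered(value, lower, upper, value_rule):
    """Apply a rule fragment such as {"_gt": "1"} to the deviation.

    Assumption: if several operators are given, all of them must hold.
    """
    deviation = deviation_from_interval(value, lower, upper)
    return all(
        OPERATORS[name](deviation, float(threshold))
        for name, threshold in value_rule.items()
    )


# Hypothetical confidence interval 0.5..2.0 for a load-average metric.
print(splash_triggered(3.5, 0.5, 2.0, {"_gt": "1"}))   # deviation 1.5 -> True
print(splash_triggered(2.4, 0.5, 2.0, {"_gt": "1"}))   # deviation 0.4 -> False
print(splash_triggered(-1.0, 0.5, 2.0, {"_gt": "1"}))  # 1.5 below -> True
```

A value inside the interval has a deviation of 0, so a rule with "_gt" and a positive threshold never fires for normal values.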

Predicting values

An example of a rule to predict metric values in the given interval:

[
  {
    "condition": {
      "_predict": {
        "metric": "MEM.bytesAvailable",
        "value": {
          "_lt": "200000000"
        },
        "history": "1d-ago",
        "period": "10"
      }
    },
    "state": 5
  }
]

In this example, for each new value of the metric MEM.bytesAvailable of the current object, the 10th future value of the metric is predicted (taking the current period into account). If the predicted value is less than 200000000, the state of the object changes to 5 (ALARM). When the model is built or rebuilt, the history of metric values for the last 24 hours is used.
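To make the idea of a "_predict" check concrete, here is a minimal Python sketch. The model the analytics module actually uses is not described here, so a least-squares linear trend stands in for it purely as an assumption; the function names and the sample history are hypothetical. The only details taken from this page are the meaning of period (the n-th future value), the "_lt" threshold, and the minimum of 10 history values.

```python
def linear_forecast(history, period):
    """Extrapolate a least-squares line `period` steps past the last value.

    Stand-in model only: the real analytics module may use something else.
    """
    n = len(history)
    if n < 10:
        raise ValueError("a predictive model needs at least 10 values")
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    covariance = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(history))
    variance = sum((i - x_mean) ** 2 for i in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + period)


def predict_triggered(history, period, threshold):
    """Mimic "value": {"_lt": threshold} applied to the forecast value."""
    return linear_forecast(history, period) < threshold


# Hypothetical history: available memory falling by 25 MB per sample.
history = [700_000_000 - 25_000_000 * i for i in range(15)]
print(history[-1])                                   # 350000000, still above the threshold
print(predict_triggered(history, 10, 200_000_000))   # True: forecast is about 100000000
```

The current value is still above the threshold, but the forecast ten periods ahead is not, which is the situation a "_predict" rule is meant to catch early.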

Fields description

The following fields are used in the condition settings:

| Field | Acceptable values | Description | Type |
|---|---|---|---|
| Check type | "_splash" or "_predict" (key value) | The name of the check. Defines the "{…}" block inside which the remaining fields (listed below) are described. | Required |
| metric | Any string | The name of the metric to analyze. | Required |
| value | Depends on the metric | The value of the analyzed metric. | Required |
| history * | Syntax corresponds to the OpenTSDB format | The history period (depth) of the series used to build the model (default is '1w-ago', one week ago). It is counted back in time from the moment the procedure of building or rebuilding the model starts. | Optional |
| period ** | Positive integer | The number of forecast periods (default 1). | Optional |
| refitEach *** | Positive integer | Rebuild the model every n values (default 1000). | Optional |

* The history period of the series (history) should cover at least one full season of the analyzed metric. If the metric has no pronounced seasonality, it is enough to choose a series length that allows its trend to be determined. A predictive model cannot be built on a set of fewer than 10 values.

** The period field sets the number of forecast periods. For example, "period": 5 means that the system forecasts the fifth value of the metric after the current one. To obtain the forecast horizon on the timeline, multiply the period value by the value specified in the Period field when configuring monitoring.

*** As updates accumulate, the model grows and updates gradually slow down. The refitEach parameter rebuilds the model periodically to restore its speed. By default its value is 1000, which means that the model is rebuilt after every 1000 processed values. The minimum acceptable value for this parameter is 10. However, the model build time must fit within the interval over which refitEach values are received; otherwise the algorithm enters a state of continuous model rebuilding.
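The arithmetic in the last two notes can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that exactly one metric value arrives per monitoring Period and that the Period is expressed as a duration; neither is stated explicitly above.

```python
from datetime import timedelta

MIN_REFIT_EACH = 10  # minimum refitEach value stated in the note above


def forecast_horizon(period, monitoring_period):
    """Time covered by `period` forecast steps (period x monitoring Period)."""
    return period * monitoring_period


def refit_window(refit_each, monitoring_period):
    """Time needed to receive `refit_each` values, one per monitoring Period."""
    if refit_each < MIN_REFIT_EACH:
        raise ValueError(f"refitEach must be at least {MIN_REFIT_EACH}")
    return refit_each * monitoring_period


def rebuild_fits(build_time, refit_each, monitoring_period):
    """True if a model build finishes before the next rebuild is due."""
    return build_time < refit_window(refit_each, monitoring_period)


# Hypothetical setup: one value per minute.
minute = timedelta(minutes=1)
print(forecast_horizon(5, minute))                       # 0:05:00
print(refit_window(1000, minute))                        # 16:40:00
print(rebuild_fits(timedelta(minutes=5), 1000, minute))  # True

# One value per second with refitEach = 10 leaves only 10 seconds per build.
print(rebuild_fits(timedelta(seconds=30), 10, timedelta(seconds=1)))  # False
```

In the last case a 30-second build cannot finish inside the 10-second window, which corresponds to the continuous rebuilding described above.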