Splash analytics and prediction
The analytics module detects splashes (abrupt deviations) in the values of measured metrics and predicts metric values over a specified time interval.
To specify which checks the analytics module performs when new values arrive, create a new rule in the State change conditions section. The rule can be specified as a JSON object or configured using the graphical interface.
Detecting splashes
An example of a rule that detects a value splash:
[
  {
    "condition": {
      "_splash": {
        "metric": "averageCpuLoad.oneMinuteAverageLoad",
        "value": {
          "_gt": "1"
        }
      }
    },
    "state": 5
  }
]
In this example, for each new value of the metric averageCpuLoad.oneMinuteAverageLoad of the current object, the analytics module calculates the difference between the received value and the value it expected to receive. When this difference goes beyond the boundaries set by the user, the object changes to the specified state.
While collecting data from the metric, the analytics module analyzes how the metric values change and calculates the interval within which values are considered normal for that particular sequence of values. This interval is usually sufficient to avoid frequent state changes while still responding to truly abnormal splashes.
The rule "value": {"_gt": "1"} here means that the object changes to the state given in the state parameter (in this case 5, ALARM) only when the value deviates from the boundaries of the confidence interval, upward or downward, by more than 1 (in the units of the specific metric).
In addition to "_gt" (strictly greater), the following operators can be used:
- "_lt" (strictly less),
- "_gte" (greater than or equal to),
- "_lte" (less than or equal to).
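The comparison described above can be sketched in Python. This is only an illustration under stated assumptions: how the module computes the expected value and the confidence interval is not described here, so the bounds `lower` and `upper` are simply passed in, the deviation is assumed to be the absolute distance outside the interval, and the function names `deviation_from_interval` and `splash_triggers` are hypothetical.

```python
import operator

# Comparison operators accepted in the "value" block of a rule.
OPERATORS = {
    "_gt": operator.gt,
    "_lt": operator.lt,
    "_gte": operator.ge,
    "_lte": operator.le,
}


def deviation_from_interval(value, lower, upper):
    """Distance by which value lies outside [lower, upper]; 0 if inside."""
    if value > upper:
        return value - upper
    if value < lower:
        return lower - value
    return 0.0


def splash_triggers(value, lower, upper, condition):
    """Apply a condition such as {"_gt": "1"} to the deviation (assumed semantics)."""
    deviation = deviation_from_interval(value, lower, upper)
    # Thresholds appear as strings in the rule examples, so convert them.
    return all(OPERATORS[op](deviation, float(threshold))
               for op, threshold in condition.items())
```

With an assumed normal interval of 0.5 to 2.0, a value of 3.5 deviates by 1.5 and satisfies {"_gt": "1"}, while a value of 2.5 deviates by only 0.5 and does not.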
Predicting values
An example of a rule that predicts metric values over a given interval:
[
  {
    "condition": {
      "_predict": {
        "metric": "MEM.bytesAvailable",
        "value": {
          "_lt": "200000000"
        },
        "history": "1d-ago",
        "period": "10"
      }
    },
    "state": 5
  }
]
In this example, for each new value of the metric MEM.bytesAvailable of the current object, the module predicts the 10th future value of the metric (counted in periods from the current one). If the predicted value is less than 200000000, the state of the object changes to 5 (ALARM). When the model is built or rebuilt, the history of metric values for the last 24 hours is used.
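For illustration, the same rule can also be assembled programmatically. The Python sketch below builds the JSON shown above; the helper name `make_predict_rule` is hypothetical and not part of the product, and numeric settings are serialized as strings only because the example rule writes them that way.

```python
import json


def make_predict_rule(metric, operator_key, threshold, state,
                      history="1w-ago", period=1):
    """Return a one-rule list in the shape of the example above."""
    if period < 1:
        raise ValueError("period must be a positive integer")
    return [{
        "condition": {
            "_predict": {
                "metric": metric,
                "value": {operator_key: str(threshold)},
                "history": history,
                "period": str(period),
            }
        },
        "state": state,
    }]


rule = make_predict_rule("MEM.bytesAvailable", "_lt", 200000000, 5,
                         history="1d-ago", period=10)
print(json.dumps(rule, indent=2))
```

The printed text can then be pasted into the rule editor as a JSON object.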
Fields description
The following fields are used in the condition settings:
Field | Acceptable values | Description | Type |
---|---|---|---|
Check type | "_splash" or "_predict" (key value) | The name of the check. Defines the "{…}" block, inside which the remaining fields are described (listed below). | Required |
metric | Any string | The name of the metric to analyze. | Required |
value | It depends on metric | The value of the analyzed metric. | Required |
history * | This field syntax corresponds to OpenTSDB format. | The history period (depth) of the series used to build the model (default is '1w-ago', one week ago). It is counted back in time from the moment the model build or rebuild starts. | Optional |
period ** | Positive integer | The number of forecast periods (default 1). | Optional |
refitEach *** | Positive integer | Rebuild the model every n values (by default 1000). | Optional |
* The history period of the series (history) should cover at least one full season of the analyzed metric. If the metric has no pronounced seasonality, it is enough to choose a series length that allows its trend to be determined. A predictive model cannot be built on a set of fewer than 10 values.
** The period field sets the number of forecast periods. For example, "period": 5 means that the system forecasts the fifth value of the metric after the current one. Thus, to obtain the forecast horizon on the timeline, multiply the period value by the value specified in the Period field when configuring monitoring.
*** As updates accumulate, the model grows heavier and updates gradually slow down. The refitEach parameter makes the module rebuild the model periodically, which restores its speed. The default value is 1000, meaning that the model is rebuilt after every 1000 processed values. The minimum acceptable value for this parameter is 10. However, make sure that the model build time is shorter than the time it takes to receive refitEach values; otherwise the algorithm enters a state of continuous model rebuilding.
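The arithmetic in the last two notes can be sketched in Python. The function names and the sample figures (a 60-second monitoring period and a 30-second model build time) are illustrative assumptions, not values taken from the product, and the sketch assumes that one metric value arrives per monitoring period.

```python
def forecast_horizon_seconds(period, monitoring_period_seconds):
    """Forecast horizon = number of forecast periods x monitoring period."""
    return period * monitoring_period_seconds


def refit_keeps_up(refit_each, monitoring_period_seconds, build_time_seconds):
    """True if a model build finishes before the next refit_each values arrive."""
    if refit_each < 10:
        raise ValueError("refitEach must be at least 10")
    return build_time_seconds < refit_each * monitoring_period_seconds


print(forecast_horizon_seconds(10, 60))  # 600 seconds, i.e. 10 minutes ahead
print(refit_keeps_up(10, 60, 30))        # True: 30 s build vs. 600 s of data
```

If `refit_keeps_up` returns False for the chosen settings, a larger refitEach value gives the build more time to complete before the next rebuild is due.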