CN111506637A - Multi-dimensional anomaly detection method and device based on KPI (Key Performance Indicator) and storage medium

Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium

Publication number
CN111506637A
Authority
CN
China
Prior art keywords
abnormal
anomaly detection
kpi
dimension
anomaly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010551259.0A
Other languages
Chinese (zh)
Other versions
CN111506637B (en)
Inventor
程博
成逸然
张文池
李则言
隋楷心
刘大鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Bishi Technology Co ltd
Original Assignee
Beijing Bishi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Bishi Technology Co ltd
Priority to CN202010551259.0A
Publication of CN111506637A
Application granted
Publication of CN111506637B
Active
Anticipated expiration

Abstract

The invention relates to the technical field of computers and discloses a multi-dimensional anomaly detection method, device and storage medium based on KPIs (Key Performance Indicators). The method comprises the following steps: acquiring the transaction data for the P + Q minutes before and after an alarm; filling missing values for the dimension combinations over those P + Q minutes according to the dimension combinations present at the time of the alarm, and evaluating the data scale; and obtaining the anomaly contributions of all dimension combinations by partial anomaly detection or global anomaly detection, chosen according to the evaluated data scale. Partial anomaly detection detects the anomaly contribution of leaf nodes only, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; global anomaly detection detects the anomaly contributions of all dimension combinations. The method is independent of the meaning of the indicators, fully takes the influence of derived measures into account, gives a unified anomaly score when several indicators are abnormal at the same time, supports more than 10 dimensions, and is practical.

Description

Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium
Technical Field
The invention relates to the technical field of computers, and in particular to a multi-dimensional anomaly detection method and device based on KPIs (Key Performance Indicators), and a storage medium.
Background
KPIs (transaction amount, transaction success rate, web page visits, etc.) and their multi-dimensional attributes (such as source system, transaction type and transaction channel) are common and important business monitoring indicators in the financial industry. When the overall value of an indicator becomes abnormal, operation and maintenance personnel want to locate the root-cause attribute combination quickly and accurately in a huge multi-dimensional search space, which is a great challenge for traditional operation and maintenance. Although some algorithms and systems perform this localization by machine learning, they are often neither general nor reliable: they rely on unrealistic root-cause assumptions and prune too aggressively; or they handle only basic indicators (transaction amount, etc.) and not derived measures (success rate, etc.); in addition, most existing methods require manual fine-tuning of parameters or are too slow.
At present, the main algorithms (systems) for multi-dimensional analysis of business indicators include Adtributor, iDice, HotSpot and Squeeze. Most of these methods are derived mainly from theory and remain some distance from practical deployment.
HotSpot and Squeeze assume that the predicted values are accurate before carrying out the subsequent search steps. This is difficult to achieve in practice, and the accuracy of prediction and anomaly detection directly determines the result of the subsequent root cause analysis.
Adtributor assumes that the root cause is one-dimensional, which does not suit today's complex micro-service systems. Its result simply keeps the simplest explanation according to Occam's razor.
iDice targets root cause localization on a time series where the time point of the anomaly is not known in advance. This differs from the present scenario and brings extra time cost. iDice also adopts a very aggressive pruning strategy to reduce the search space and uses GLR (Generalized Likelihood Ratio) for anomaly detection. For example, nodes whose support is below a certain threshold are removed directly, so the pruning affects the root cause judgment of upper-layer nodes.
Although Adtributor and Squeeze can perform root cause localization on derived indicators, they cannot rank root causes across indicators.
In real application scenarios, resource usage is affected by changes in the dimensions, in the number of values and in the composition of the data. Existing algorithms do not treat data of different orders of magnitude differently, so problems such as memory overflow easily occur when the data volume is too large.
Disclosure of Invention
The invention aims to solve the above problems and provides an automatic multi-dimensional anomaly detection method with appropriate pruning. The technical solution provided by the invention is a multi-dimensional anomaly detection method based on KPI indicators, comprising the following steps: acquiring the transaction data for the P + Q minutes before and after an alarm; filling missing values for the dimension combinations over those P + Q minutes according to the dimension combinations present at the time of the alarm, and evaluating the data scale; and applying partial anomaly detection or global anomaly detection according to the evaluated data scale. Partial anomaly detection performs anomaly detection on leaf nodes only, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; global anomaly detection detects the anomaly contributions of all dimension combinations.
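The aggregation rule of partial anomaly detection described above (an upper-layer node's contribution is the sum of the contributions of the nodes below it) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the example dimension names are assumptions made for illustration; they do not come from the patent, and the leaf scores are made-up numbers.

```python
from collections import defaultdict
from itertools import combinations

def aggregate_contributions(leaf_scores, dims):
    """Sum leaf-level anomaly contributions up to every coarser dimension combination.

    leaf_scores: dict mapping a full tuple of dimension values to a leaf score.
    dims: tuple of dimension names, in the same order as the leaf tuples.
    Returns a dict keyed by a tuple of (dimension, value) pairs.
    """
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        pairs = tuple(zip(dims, leaf))
        # Every non-empty subset of (dimension, value) pairs is either the leaf
        # itself or one of its upper-layer nodes; each receives the leaf's score.
        for r in range(1, len(pairs) + 1):
            for combo in combinations(pairs, r):
                totals[combo] += score
    return dict(totals)

# Hypothetical example: two dimensions, three leaf combinations.
dims = ("channel", "type")
leaf_scores = {("web", "pay"): 0.7, ("web", "refund"): 0.1, ("app", "pay"): 0.0}
totals = aggregate_contributions(leaf_scores, dims)
```

Only the leaves are scored by the detector; every coarser combination is obtained by addition, which is what makes this path cheaper than scoring all dimension combinations.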
Preferably, the global anomaly detection includes the following steps:
S101, defining the feature types of a single KPI indicator;
S102, extracting the KPI feature values of every point of all dimension combinations over the P + Q minutes of the single indicator to form a training set X; each time a feature q is specified, one binary tree is formed by cutting and splitting, and after the specified feature has traversed every feature type, t_1 binary trees are generated, denoted T_1, where P is the period of time before the alarm and Q is the period of time after the alarm;
S103, extracting the KPI feature set excluding the current dimension combination, and cutting and splitting as in S102 to form t_2 binary trees, denoted T_2;
S104, calculating the average heights c_1 and c_2 of the child nodes of all dimension combinations under the indicator in T_1 and T_2;
S105, calculating the anomaly contribution of any dimension combination to the single indicator;
S106, when several indicators are abnormal, repeating S101 to S105 and calculating the anomaly contribution of any dimension combination to the several indicators.
Preferably, the feature types of the single KPI indicator in S101 include at least one of the following: mean, standard deviation, extreme value, occurrence frequency of the current dimension, inverse document frequency of the current dimension, first-order autocorrelation coefficient, linearity strength, curvature strength, spectral entropy, standard deviation of residual variation, number of crossing points, difference from the previous point, trend, periodicity and clutter.
Preferably, the cutting and splitting of S102 is carried out as follows:
S1021, extracting the KPI feature values of every point of all dimension combinations over the P + Q minutes of the single indicator to form a training set X;
S1022, randomly drawing k sample points from the training set X to form a subset X_k of X;
S1023, each time, specifying a feature q from X_k at random and randomly generating a cutting point p;
S1024, placing sample points whose value of feature q is smaller than p into the left child node, and sample points whose value of feature q is greater than or equal to p into the right child node;
S1025, repeating S1024 on the left and right child nodes, and stopping when every leaf node holds only one sample point or the specified number of layers is reached, which generates one binary tree;
S1026, after the specified feature q has traversed every feature type, t_1 binary trees have been generated, denoted T_1.
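Steps S1021 to S1026 resemble the construction of an isolation-forest-style tree, and the following Python sketch shows one possible reading of them: one tree per feature, each grown on a random subset of size k. The function names, the dictionary-based tree layout, the parameter values and the synthetic data are all assumptions for illustration. The sketch stops at the average height and does not compute the patent's I_a, whose formula is not available in this text.

```python
import random

def build_tree(points, feature, depth, max_depth, rng):
    """Split points on one feature at random cut points (S1023 to S1025)."""
    values = [pt[feature] for pt in points]
    lo, hi = min(values), max(values)
    if len(points) <= 1 or depth >= max_depth or lo == hi:
        return {"size": len(points)}                  # leaf: stop splitting
    cut = rng.uniform(lo, hi)                         # random cutting point p
    left = [pt for pt in points if pt[feature] < cut]
    right = [pt for pt in points if pt[feature] >= cut]
    if not left or not right:                         # degenerate cut: keep as leaf
        return {"size": len(points)}
    return {
        "feature": feature,
        "cut": cut,
        "left": build_tree(left, feature, depth + 1, max_depth, rng),
        "right": build_tree(right, feature, depth + 1, max_depth, rng),
    }

def build_forest(X, k, max_depth, seed=0):
    """One tree per feature index, each on a random subset X_k (S1022, S1026)."""
    rng = random.Random(seed)
    forest = []
    for q in range(len(X[0])):
        subset = rng.sample(X, min(k, len(X)))
        forest.append(build_tree(subset, q, 0, max_depth, rng))
    return forest

def path_length(tree, x):
    """Number of splits a sample passes through before reaching a leaf."""
    depth = 0
    while "cut" in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["cut"] else tree["right"]
        depth += 1
    return depth

def average_height(forest, x):
    """Average height of sample x over all trees in the forest."""
    return sum(path_length(t, x) for t in forest) / len(forest)

# Synthetic two-feature data with one far-away point (illustrative only).
data_rng = random.Random(1)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
forest = build_forest(X, k=64, max_depth=8, seed=0)
h_outlier = average_height(forest, (8.0, 8.0))
h_typical = average_height(forest, (0.0, 0.0))
```

In isolation-style trees, points that are easy to separate tend to reach a leaf after fewer splits, so a small average height is usually read as a sign of abnormality.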
Preferably, in S105, the feature vectors of the child nodes of any dimension combination are substituted into T_1 and T_2 respectively, the average heights h_1 and h_2 of each child node in T_1 and T_2 are calculated and, combined with the average heights c_1 and c_2 from S104, the anomaly contribution I_a of the dimension combination to the single indicator is defined as:
[Formula image: definition of I_a, not reproduced in this text]
Preferably, the partial anomaly detection methods include LightGBM and extreme value theory.
Preferably, the input source of the transaction data comprises Elasticsearch, Kafka, or a CSV file in a specified format.
Based on the same inventive concept, the invention further provides a multi-dimensional anomaly detection device based on KPIs, comprising:
an alarm module, configured to raise an alarm when the current KPI indicator is abnormal;
a data reading module, configured to read the transaction data for the P + Q minutes before and after the alarm;
a data preprocessing module, configured to fill missing values for the dimension combinations over the P + Q minutes according to the dimension combinations present at the time of the alarm;
a data evaluation module, configured to evaluate the current data scale;
an anomaly detection module, comprising a partial anomaly detection module and a global anomaly detection module, wherein the partial anomaly detection module performs anomaly detection on leaf nodes only, the anomaly contribution of an upper-layer node being obtained by summing the anomaly contributions of its lower-layer nodes, and the global anomaly detection module detects the anomaly contributions of all dimension combinations.
The invention further provides a computer-readable storage medium storing a computer program for executing any one of the above multi-dimensional anomaly detection methods based on KPI indicators.
The beneficial effects of the invention are as follows:
(1) The method supports more than 10 dimensions, and typical analysis results span more than 3 dimensions. It is practical, requires no manual parameter tuning, and is fast.
(2) The invention is an anomaly detection method independent of the meaning of the indicators. It gives a unified anomaly score when several indicators, such as transaction amount, success rate and response time, are abnormal at the same time.
(3) The invention fully considers the influence of derived measures such as success rate, so the result is more accurate.
Drawings
FIG. 1 is a schematic diagram of the partial anomaly detection method of the present invention;
FIG. 2 and FIG. 3 are schematic diagrams of partial anomaly detection using extreme value theory according to the present invention;
FIG. 4 is a schematic diagram of one tree generated by the global anomaly detection method of the present invention;
FIG. 5 is a clustering diagram of the root cause localization method provided in Embodiment 3 of the present invention;
FIG. 6 is a diagram illustrating explanatory power in the information entropy search rule provided in Embodiment 3 of the present invention;
FIG. 7 is a diagram illustrating surprise in the information entropy search rule provided in Embodiment 3 of the present invention;
FIG. 8 is a schematic diagram of MCTS pruning according to Embodiment 3 of the present invention;
FIG. 9 is a flowchart of the multi-dimensional anomaly detection method according to Embodiment 3 of the present invention;
FIG. 10 is a flowchart of the multi-dimensional anomaly detection method provided by the present invention;
FIG. 11 is a flowchart of the global anomaly detection steps of the multi-dimensional anomaly detection method provided by the present invention;
FIG. 12 is a flowchart of the cutting and splitting steps of the multi-dimensional anomaly detection method provided by the present invention.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While specific embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The invention provides a multi-dimensional anomaly detection method based on KPIs (Key Performance Indicators). As shown in FIG. 10, the method comprises the following steps: acquiring the transaction data for the P + Q minutes before and after an alarm; filling missing values for the dimension combinations over those P + Q minutes according to the dimension combinations present at the time of the alarm, and evaluating the data scale; and applying partial anomaly detection or global anomaly detection according to the evaluated data scale. Partial anomaly detection performs anomaly detection on leaf nodes only, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; global anomaly detection detects the anomaly contributions of all dimension combinations.
In some alternative embodiments, the input source of the transaction data may include, but is not limited to, Elasticsearch, Kafka, or a CSV file in a specified format.
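The missing-value filling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: records are held in a dictionary keyed by (minute, dimension combination), the window runs from P minutes before the alarm to Q minutes after it and includes the alarm minute, and gaps are filled with 0. The patent does not state the fill value or the data layout, so both are hypothetical, as are the function and variable names.

```python
def fill_missing(records, alarm_minute, p, q, fill_value=0.0):
    """Ensure every dimension combination seen at the alarm minute has a value
    for every minute of the window around the alarm.

    records: dict mapping (minute, dimension_combo) -> KPI value.
    """
    # Dimension combinations present at the time of the alarm.
    combos = {combo for (minute, combo) in records if minute == alarm_minute}
    filled = {}
    for minute in range(alarm_minute - p, alarm_minute + q + 1):
        for combo in combos:
            # Keep the observed value if there is one, otherwise fill the gap.
            filled[(minute, combo)] = records.get((minute, combo), fill_value)
    return filled

# Hypothetical records: alarm at minute 10, P = 2, Q = 1.
records = {
    (10, ("web", "pay")): 5.0,
    (10, ("app", "pay")): 3.0,
    (9, ("web", "pay")): 4.0,
}
filled = fill_missing(records, alarm_minute=10, p=2, q=1)
```

After filling, every retained dimension combination has a time series of the same length, which is what the later feature extraction over the window needs.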
As shown in FIG. 11, the global anomaly detection can be implemented by, but is not limited to, the following steps:
S101, defining the feature types of a single KPI indicator. The feature types include, but are not limited to, mean, standard deviation, extreme value, occurrence frequency of the current dimension, inverse document frequency of the current dimension, first-order autocorrelation coefficient, linearity strength, curvature strength, spectral entropy, standard deviation of residual variation, number of crossing points, difference from the previous point, trend, periodicity and clutter.
S102, extracting the KPI feature value set X of every point of all dimension combinations over the P + Q minutes of the single indicator; each time a feature q is specified, one binary tree is formed by cutting and splitting, and after the specified feature has traversed every feature type, t_1 binary trees are generated, denoted T_1.
As shown in FIG. 12, the cutting and splitting step can be realized by the following process:
S1021, extracting the KPI feature values of every point of all dimension combinations over the P + Q minutes of the single indicator to form a training set X;
S1022, randomly drawing k sample points from the training set X to form a subset X_k of X;
S1023, each time, specifying a feature q from X_k at random and randomly generating a cutting point p;
S1024, placing sample points whose value of feature q is smaller than p into the left child node, and sample points whose value of feature q is greater than or equal to p into the right child node;
S1025, repeating S1024 on the left and right child nodes, and stopping when every leaf node holds only one sample point or the specified number of layers is reached, which generates one binary tree;
S1026, after the specified feature q has traversed every feature type, t_1 binary trees have been generated, denoted T_1.
S103, extracting the KPI feature set excluding the current dimension combination, and cutting and splitting as in S102 to form t_2 binary trees, denoted T_2.
S104, calculating the average heights c_1 and c_2 of all dimension combinations under the indicator in T_1 and T_2.
S105, calculating the anomaly contribution of any dimension combination to the single indicator.
Preferably, in S105, the feature vectors of the child nodes of any dimension combination are substituted into T_1 and T_2 respectively, the average heights h_1 and h_2 of each child node in T_1 and T_2 are calculated and, combined with the average heights c_1 and c_2 from S104, the anomaly contribution I_a of the dimension combination to the single indicator is defined as:
[Formula image: definition of I_a, not reproduced in this text]
S106, when several indicators are abnormal, repeating S101 to S105 and calculating the anomaly contribution of any dimension combination to the several indicators.
Based on the same inventive concept, the invention further provides a multi-dimensional anomaly detection device based on KPIs, comprising:
an alarm module, configured to raise an alarm when the current KPI indicator is abnormal;
a data reading module, configured to read the transaction data for the P + Q minutes before and after the alarm;
a data preprocessing module, configured to fill missing values for the dimension combinations over the P + Q minutes according to the dimension combinations present at the time of the alarm;
a data evaluation module, configured to evaluate the current data scale;
an anomaly detection module, comprising a partial anomaly detection module and a global anomaly detection module, wherein the partial anomaly detection module performs anomaly detection on leaf nodes only, the anomaly contribution of an upper-layer node being obtained by summing the anomaly contributions of its lower-layer nodes, and the global anomaly detection module detects the anomaly contributions of all dimension combinations.
Embodiment 1. This embodiment provides a LightGBM anomaly detection method based on KPI indicators.
The method comprises the following steps: acquiring the transaction data for the P + Q minutes before and after an alarm; filling missing values for the dimension combinations over those P + Q minutes according to the dimension combinations present at the time of the alarm, and evaluating the data scale; and applying partial anomaly detection according to the evaluated data scale. Partial anomaly detection performs anomaly detection on leaf nodes only, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; global anomaly detection detects the anomaly contributions of all dimension combinations.
The anomaly detection strategy differs for data of different scales. When there are few dimensions and features, as shown in FIG. 1, partial anomaly detection is used for acceleration: anomaly detection is performed only on the leaf nodes (outer-layer nodes), and the anomaly contribution score of an upper-layer node (inner-layer node) is obtained by summing the anomaly contribution scores of its lower-layer nodes. When the training data is sufficient and relatively stable, machine resources are abundant, a longer window of historical time series can be referenced, and the algorithm is not required to be fully interpretable, the partial anomaly detection method is LightGBM (Light Gradient Boosting Machine). LightGBM uses a histogram algorithm. The idea is to discretize continuous floating-point features into m discrete values and construct a histogram of width m. The training data is traversed once and the statistics of each discrete value are accumulated in the histogram. When a feature is selected, only the m discrete values of the histogram need to be traversed to find the optimal split point. This gives faster traversal, lower memory consumption and better support for distributed training.
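The histogram idea described above can be illustrated with a simplified Python sketch: feature values are discretized into m equal-width bins, per-bin statistics are accumulated in one pass, and only the m - 1 bin boundaries are scanned for a split. This is not LightGBM's actual implementation, which accumulates gradient statistics and uses its own binning; the function name, the equal-width binning, the variance-style gain and the example data are all assumptions for illustration.

```python
def best_histogram_split(values, targets, m):
    """Find the best split of one feature by scanning m histogram bins.

    Returns (bin_index, threshold); the split sends bins 0..bin_index left.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / m or 1.0                  # guard against a constant feature
    counts = [0] * m
    sums = [0.0] * m
    for v, t in zip(values, targets):             # single pass over the data
        b = min(int((v - lo) / width), m - 1)     # discretize into a bin index
        counts[b] += 1
        sums[b] += t
    total_n, total_s = sum(counts), sum(sums)
    best_gain, best_bin = 0.0, None
    left_n, left_s = 0, 0.0
    for b in range(m - 1):                        # scan bin boundaries only
        left_n += counts[b]
        left_s += sums[b]
        right_n, right_s = total_n - left_n, total_s - left_s
        if left_n == 0 or right_n == 0:
            continue
        # Reduction in squared error when splitting after bin b.
        gain = left_s ** 2 / left_n + right_s ** 2 / right_n - total_s ** 2 / total_n
        if gain > best_gain:
            best_gain, best_bin = gain, b
    threshold = None if best_bin is None else lo + (best_bin + 1) * width
    return best_bin, threshold

# Illustrative data: low values have target 0, high values have target 1.
values = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
targets = [0, 0, 0, 0, 1, 1, 1, 1]
best_bin, threshold = best_histogram_split(values, targets, m=4)
```

The scan costs O(m) per feature instead of one candidate per distinct value, which is where the speed and memory savings mentioned above come from.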
Embodiment 2. This embodiment provides an extreme value theory anomaly detection method based on KPI indicators.
The method comprises the following steps: acquiring the transaction data for the P + Q minutes before and after an alarm; filling missing values for the dimension combinations over those P + Q minutes according to the dimension combinations present at the time of the alarm, and evaluating the data scale; and applying partial anomaly detection according to the evaluated data scale. Partial anomaly detection performs anomaly detection on leaf nodes only, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; global anomaly detection detects the anomaly contributions of all dimension combinations.
The anomaly detection strategy differs for data of different scales. When there are few dimensions and features, as shown in FIG. 1, partial anomaly detection is used for acceleration: anomaly detection is performed only on the leaf nodes (outer-layer nodes), and the anomaly contribution score of an upper-layer node (inner-layer node) is obtained by summing the anomaly contribution scores of its lower-layer nodes. For time series with no visible regularity, as shown in FIG. 2 and FIG. 3, a fixed threshold works better, and this fixed threshold is calculated dynamically using extreme value theory.
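The text above does not say how extreme value theory is used to compute the threshold. One common approach, shown here only as an assumed sketch, is peaks-over-threshold: take the excesses over a high initial quantile, fit a generalized Pareto distribution (here by the method of moments) and derive the threshold for a target exceedance probability. The function name, the initial quantile, the risk level and the synthetic data are all hypothetical choices.

```python
import math
import random

def pot_threshold(data, init_quantile=0.98, risk=1e-3):
    """Peaks-over-threshold estimate of an anomaly threshold.

    risk is the target probability that a normal point exceeds the threshold.
    """
    xs = sorted(data)
    n = len(xs)
    t = xs[int(init_quantile * n)]                # initial high threshold
    excesses = [x - t for x in xs if x > t]
    n_t = len(excesses)
    if n_t < 2:
        raise ValueError("not enough points above the initial threshold")
    mean = sum(excesses) / n_t
    var = sum((e - mean) ** 2 for e in excesses) / (n_t - 1)
    if var == 0:
        raise ValueError("excesses have zero variance")
    ratio = mean * mean / var
    shape = 0.5 * (1 - ratio)                     # method-of-moments GPD shape
    scale = 0.5 * mean * (1 + ratio)              # method-of-moments GPD scale
    r = risk * n / n_t
    if abs(shape) < 1e-9:                         # exponential-tail limit
        return t - scale * math.log(r)
    return t + scale / shape * (r ** (-shape) - 1)

# Synthetic series for illustration only.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(2000)]
threshold = pot_threshold(data)
```

Points above the returned threshold would be flagged as anomalous; lowering risk pushes the threshold further into the tail.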
Embodiment 3. This embodiment provides a global anomaly detection method based on KPI indicators.
When an alarm occurs in the financial system, the transaction detail data for the P + Q minutes before and after the alarm is read as input data. The data input source may be Elasticsearch, Kafka, or a CSV file in a specified format. Missing values at other times are then filled according to the dimension combinations present at the time of the alarm, and the current data scale is evaluated.
For multi-indicator data, as the number of dimensions and dimension values increases, leaf nodes hold less and less data, in extreme cases only 0 or 1 data points, and anomaly detection in such cases is extremely inaccurate. Therefore, an in-house algorithm based on "influence" is adopted, which allows comparison across different KPIs. The global anomaly detection algorithm is described in detail below.
As shown in FIG. 9, the following features are extracted from the historical data of a single indicator. Only some of the common features are listed below; further features such as trend, periodicity and clutter are added to the common KPI features for different single indicators.
TABLE 1 KPI common features
[Table image not reproduced in this text]
A sliding window is used to extract all current detail data of an indicator, that is, the features of every point on the time series (P + Q) of all dimension combinations, which are recorded as
[Formula image not reproduced in this text]
For a given training set X, k sample points are drawn at random to form a subset X_k of X. Each time, a feature q is specified at random from X_k and a cutting point p is generated at random. The cutting point p defines a hyperplane that divides the current data space into two subspaces: sample points whose value of the specified feature is smaller than p are placed in the left child node, and sample points whose value is greater than or equal to p are placed in the right child node. Splitting stops when every leaf node holds only one sample point or the specified number of layers is reached. This generates t_1 binary trees, denoted T_1. FIG. 4 shows an example of a generated tree.
Then the feature set X - Y of the detail data other than the current dimension combination Y is extracted, and the training steps are repeated to obtain T_2. For a dimension combination that requires anomaly detection, the feature vectors of its child nodes are substituted into T_1 and T_2 respectively, and the average heights h_1 and h_2 of each child node x_i in T_1 and T_2 are calculated. The height is the depth of the node in the tree, which may also be referred to as the shortest path. The average heights of all child nodes in T_1 and T_2 are denoted c_1 and c_2, obtained as the weighted average of the average heights of each child node or leaf node in T_1 and T_2.
The global influence, or anomaly contribution score, I_a of an anomaly under indicator a on indicator a is defined as:
[Formula image: definition of I_a, not reproduced in this text]
When an incident occurs and several related indicators are abnormal, the influences on those related indicators are averaged to give the final anomaly contribution score of each dimension combination.
As shown in FIG. 5, the PDF (probability density function) plot of the anomaly contribution scores is clustered to determine the order of the subsequent search and the selection of root causes. Dimension combinations with different anomaly contributions fall into different clusters, and each solid line represents a cluster center. The clustering method finds all maxima and minima of the anomaly score PDF; the range bounded by the two minima adjacent to each maximum forms one cluster.
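The extrema-based clustering described above can be sketched in Python as follows. The sketch assumes the PDF has already been estimated on a grid of score values (how the patent estimates the density is not stated); the function name and the example density values are hypothetical.

```python
def cluster_by_extrema(grid, density):
    """Split a 1-D density into clusters bounded by its local minima.

    grid: score values at which the density was evaluated (ascending).
    density: estimated PDF value at each grid point.
    Returns a list of (center, low, high), one per local maximum.
    """
    n = len(density)
    maxima = [
        i for i in range(n)
        if (i == 0 or density[i] > density[i - 1])
        and (i == n - 1 or density[i] >= density[i + 1])
    ]
    minima = [
        i for i in range(1, n - 1)
        if density[i] < density[i - 1] and density[i] <= density[i + 1]
    ]
    clusters = []
    for peak in maxima:
        # Nearest minimum on each side; fall back to the ends of the grid.
        left = max((m for m in minima if m < peak), default=0)
        right = min((m for m in minima if m > peak), default=n - 1)
        clusters.append((grid[peak], grid[left], grid[right]))
    return clusters

# Illustrative density with two peaks.
grid = [i / 10 for i in range(8)]
density = [1, 3, 1, 0, 0, 2, 4, 1]
clusters = cluster_by_extrema(grid, density)
# The search would start in the cluster whose center is largest.
top = max(clusters, key=lambda c: c[0])
```

Each tuple gives the cluster center (the maximum) and the score range between the neighbouring minima, so a dimension combination can be assigned to a cluster by checking which range its score falls in.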
The algorithm searches for root causes in the cluster with the largest cluster center and uses a calculation modelled on information entropy to define candidate root causes. When a dimension combination is a root cause, it behaves as follows: its information entropy is markedly larger than that of the other dimension combinations on the same layer, and larger than that of its upper-layer node and of all its child nodes. This is also part of the pruning: when a dimension combination is found to satisfy these conditions, the algorithm no longer takes any of its child nodes into the candidate root cause set. The algorithm also considers both explanatory power and surprise, that is, whether the dimension combination can explain the change in the current overall KPI and whether that change is "surprising". As shown in FIG. 6, combination 1 has higher explanatory power than combination 2, so combination 1 is more likely to be the root cause. As shown in FIG. 7, combination 2 has higher surprise than combination 1, so combination 2 is more likely to be the root cause. The above process is repeated to find all candidate root cause sets.
Pruning in Volcano uses an improved MCTS (Monte Carlo Tree Search) as its main framework and runs several pruning strategies in parallel.
Pre-pruning: since the anomaly scores calculated by the anomaly detection algorithm built into Volcano are all additive, a node whose anomaly score equals 0 cannot be a root cause. Pre-pruning the search tree in this way can generally remove more than 50% of the nodes.
Cluster pruning: the clustering algorithm clusters the nodes by the maxima and minima of the PDF of their anomaly scores, and each cluster is searched independently. Volcano can be configured, according to user requirements, with the number of clusters to search and an upper limit on the number of root causes in each cluster, which achieves pruning.
MCTS pruning: the search is simulated by sampling, the "reward" of each node is then updated by back-propagation, and the node with the largest "reward" is selected to continue the search until the root cause is found. As shown in FIG. 8, the dark dots represent nodes that have already been searched, and the light dots are the candidate nodes for the next search.
Two parameters, N and Q, are defined for each node. N is the number of times the node has been visited in simulation, and Q is the sum of the simulated rewards of the node, where the calculated anomaly detection score is used as the simulated reward. Finally, the UCT(v_i, v) value (UCT: Upper Confidence bounds applied to Trees) of each candidate node is calculated, and the node with the largest UCT(v_i, v) value is selected as the next search path; the other nodes are pruned. UCT(v_i, v) is calculated as follows:
[Formula image: definition of UCT(v_i, v), not reproduced in this text]
Post-pruning: after a candidate root cause has been found, its child nodes are pruned and are no longer treated as root causes. To cope with real situations, some special optimizations are also applied to post-pruning. For example, if the value of the current node is null, the search continues downward; if the current node has only one direct child (1 to 1), the search also continues downward.
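The UCT-based selection used in the MCTS pruning step above can be sketched as follows. Because the patent's formula is only available as an image, this sketch uses the standard UCT form, Q(v_i)/N(v_i) + c * sqrt(ln N(v) / N(v_i)), which is an assumption and may differ from the patent's exact expression. The exploration constant c, the function names and the example numbers are hypothetical.

```python
import math

def uct(q_child, n_child, n_parent, c=1.4):
    """Standard UCT value of a child node v_i under its parent v."""
    if n_child == 0:
        return math.inf                      # unvisited nodes are tried first
    exploit = q_child / n_child              # average simulated reward
    explore = c * math.sqrt(math.log(n_parent) / n_child)
    return exploit + explore

def select_child(children, n_parent, c=1.4):
    """Pick the candidate with the largest UCT value; the rest are pruned."""
    return max(children, key=lambda ch: uct(ch["Q"], ch["N"], n_parent, c))

# Hypothetical candidates: Q is the summed anomaly score, N the visit count.
children = [
    {"name": "A", "Q": 3.0, "N": 10},
    {"name": "B", "Q": 1.0, "N": 2},
    {"name": "C", "Q": 0.0, "N": 0},
]
chosen = select_child(children, n_parent=12)
chosen_visited = select_child(children[:2], n_parent=12)
```

The first term favours nodes with high average reward, and the second favours nodes that have been visited rarely, so the search balances exploiting likely root causes against exploring untested ones.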
After all candidate root cause sets have been found, distribution similarity is measured between the different dimension combinations, using a measure that depends on the KPI. For additive KPIs (transaction amount, failure count, response time, etc.), JS divergence is mainly used. For non-additive KPIs (success rate, response rate, etc.), similarity is measured with the Wasserstein distance. The purpose is to merge similar dimension combinations and simplify the result.
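The two similarity measures named above can be sketched in plain Python. The JS divergence here takes two discrete distributions over the same bins, and the Wasserstein distance is the one-dimensional form for two equal-size samples. How the patent builds the distributions from the KPI data is not stated, so the inputs below are illustrative assumptions, as are the function names.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in a contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance between two samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

# Illustrative inputs: normalized transaction shares and success-rate samples.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
shift = wasserstein_1d([0.9, 0.8, 0.95], [0.5, 0.4, 0.55])
```

A small value from either measure suggests that two dimension combinations behave alike and can be merged into one result; the merge threshold is left to the implementer.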
In a more preferred embodiment, a computer-readable storage medium is provided for storing a computer program for performing any of the above-described anomaly detection methods.
Based on the analysis of a large amount of financial data, this embodiment adopts a global anomaly detection strategy, which distinguishes it from most existing algorithms and devices. In most financial data, several indicators become abnormal at the same time, so this embodiment is an anomaly detection method independent of the meaning of the indicators. For searching, a scalable search scheme is used that switches flexibly between time efficiency and space efficiency to adapt to data of different sizes, and MCTS is introduced to prune and speed up the search. Unlike earlier "top-down" search modes, Volcano performs "bottom-up" clustering before searching, which makes the root cause search more effective and also serves as a pruning means that reduces the search space. Finally, Volcano performs a similarity test on the results, can merge results within an indicator, and can resolve cases where results from multiple indicators contain one another.
In a preferred embodiment, there is provided a KPI indicator-based multi-dimensional anomaly detection apparatus, comprising:
an alarm module, used for raising an alarm when the current KPI indicator is abnormal;
a data reading module, used for reading the transaction data of the P + Q minutes before and after the alarm, wherein the input source of the transaction data may include, but is not limited to, Elasticsearch, Kafka, or CSV files in a specified format;
a data preprocessing module, used for filling missing values in the dimension combinations over the P + Q minutes according to the dimension combinations present at the alarm occurrence time;
a data evaluation module, used for evaluating the current data scale;
an anomaly detection module, comprising a partial anomaly detection module and a global anomaly detection module; wherein the partial anomaly detection module performs anomaly detection only on the leaf nodes, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; the global anomaly detection module detects the anomaly contributions of all dimension combinations.
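The summation used by the partial anomaly detection module can be illustrated with a short sketch. It assumes that a leaf node is a fully specified dimension combination, represented here as a tuple of (dimension, value) pairs, and that every upper-layer node is a subset of those pairs; the representation and the function name `roll_up` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def roll_up(leaf_scores):
    """Sum leaf anomaly contributions into every upper-layer dimension combination.

    leaf_scores maps a leaf, given as a tuple of (dimension, value) pairs in a
    fixed dimension order, to its anomaly contribution.
    """
    totals = defaultdict(float)
    for leaf, score in leaf_scores.items():
        # every non-empty subset of the leaf's pairs is an ancestor (or the leaf itself)
        for r in range(1, len(leaf) + 1):
            for combo in combinations(leaf, r):
                totals[combo] += score
    return dict(totals)

leaf_scores = {
    (("channel", "app"), ("province", "BJ")): 0.6,
    (("channel", "app"), ("province", "SH")): 0.3,
    (("channel", "web"), ("province", "BJ")): 0.1,
}
upper = roll_up(leaf_scores)
```

Only the leaves need to be scored by a detector in this mode; the contributions of `channel=app` or `province=BJ` follow by addition.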
Wherein, the global anomaly detection module comprises:
a defining submodule, used for defining the feature types of a single KPI indicator;
a first extraction submodule, used for extracting the KPI feature value set of each point of all dimension combinations over the P + Q minutes of the single indicator, forming one binary tree by cutting and splitting each time a feature value q is specified, and generating t_1 binary trees, denoted T_1, after the specified feature value has traversed every feature type;
a second extraction submodule, used for extracting the KPI feature set excluding the current dimension combination, and forming t_2 binary trees, denoted T_2, by the cutting and splitting of the first extraction submodule;
a first calculation submodule, used for calculating the average heights c_1 and c_2 of all dimension combinations under the indicator in T_1 and T_2;
a second calculation submodule, used for calculating the anomaly contribution of any dimension combination to the single indicator;
and a repetition submodule, used for repeating the above submodules when a plurality of indicators are abnormal, and calculating the anomaly contribution of any dimension combination to the plurality of indicators.
In some optional embodiments, the feature types of the single KPI indicator defined in the defining submodule comprise at least one of the following features: mean, standard deviation, extreme value, occurrence frequency of the current dimension, inverse document frequency of the current dimension, first-order autocorrelation coefficient, linearity strength, curvature strength, spectral entropy, standard deviation of residual variation, number of crossing points, difference from the preceding point, trend, periodicity, and disorder.
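A few of the listed features can be computed from a per-minute KPI series as sketched below. The function name, the dictionary keys, and the choice of the last point as the "current" point are assumptions for illustration; the remaining features (spectral entropy, curvature strength, and so on) are omitted.

```python
import statistics

def basic_features(series):
    """Compute a small subset of the listed features for one KPI time series."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)            # population standard deviation
    var_sum = sum((x - mean) ** 2 for x in series)
    # first-order (lag-1) autocorrelation coefficient; 0.0 for a constant series
    acf1 = (
        sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1)) / var_sum
        if var_sum > 0 else 0.0
    )
    return {
        "mean": mean,
        "std": std,
        "max": max(series),
        "min": min(series),
        "acf1": acf1,
        "diff_prev": series[-1] - series[-2],  # difference from the preceding point
    }

features = basic_features([1.0, 2.0, 3.0, 4.0, 5.0])
```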
In some optional embodiments, the first extraction sub-module comprises:
extracting the KPI feature values x of each point of all dimension combinations over the P + Q minutes of the single indicator to form a training set X;
randomly extracting k sample points from the training set X to form a subset X_k of X;
a cutting point generating unit, used for randomly specifying a feature value q from X_k each time and randomly generating a cutting point p;
a feature processing unit, used for placing sample points whose specified feature value is smaller than p into the left child node, and sample points whose specified feature value is greater than or equal to p into the right child node;
repeating the feature processing unit at the left child node and the right child node, and stopping splitting when every leaf node has only one sample point or a specified number of layers is reached, thereby generating one binary tree;
after the specified feature value has traversed every feature type, generating t_1 binary trees, denoted T_1.
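The cutting and splitting procedure above closely resembles the construction of an isolation tree, and can be sketched as follows. This is a minimal sketch under assumptions: the feature to cut on is drawn at random at every split (the text can also be read as one tree per feature type), a node whose chosen feature is constant is turned into a leaf, and all names (`build_tree`, `build_forest`, `height`, `average_height`) are hypothetical.

```python
import random

def build_tree(samples, depth, max_depth, rng):
    """Recursively split samples (lists of feature values) into one binary tree."""
    if len(samples) <= 1 or depth >= max_depth:
        return {"leaf": True, "size": len(samples)}
    q = rng.randrange(len(samples[0]))      # randomly specified feature q
    lo = min(s[q] for s in samples)
    hi = max(s[q] for s in samples)
    if lo == hi:                            # assumption: stop on a constant feature
        return {"leaf": True, "size": len(samples)}
    p = rng.uniform(lo, hi)                 # randomly generated cutting point p
    left = [s for s in samples if s[q] < p]     # feature value smaller than p
    right = [s for s in samples if s[q] >= p]   # feature value greater than or equal to p
    return {
        "leaf": False, "q": q, "p": p,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def build_forest(X, n_trees, k, max_depth, seed=0):
    """Build n_trees trees, each from a random subset X_k of k sample points."""
    rng = random.Random(seed)
    return [
        build_tree(rng.sample(X, min(k, len(X))), 0, max_depth, rng)
        for _ in range(n_trees)
    ]

def height(tree, x):
    """Number of splits traversed before the feature vector x reaches a leaf."""
    depth = 0
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["q"]] < tree["p"] else tree["right"]
        depth += 1
    return depth

def average_height(forest, x):
    """Average height of x over all trees of the forest."""
    return sum(height(t, x) for t in forest) / len(forest)
```

A point that is easy to separate from the rest tends to reach a leaf after few cuts, so a small average height indicates an unusual feature vector. How the heights in T_1 and T_2 are combined into the anomaly contribution is given by the formula in the original publication and is not reproduced in this sketch.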
In some optional embodiments, the second calculation submodule specifically substitutes the feature vector of the child node of any dimension combination into T_1 and T_2 respectively, calculates the average heights h_1 and h_2 of the child node in T_1 and T_2, and, combined with the average heights c_1 and c_2 in S4, defines the anomaly contribution of any dimension combination to the single indicator as I_a:
[Formula for I_a: given as an image in the original publication; not reproduced in this text.]
The embodiments of the invention provide a multi-dimensional anomaly detection method and device based on KPI (Key Performance Indicator) with the following properties: more than 10 dimensions are supported, typical analysis results involve more than 3 dimensions, and the method is fully practical and has been verified in production; the method is independent of the meaning of the indicators and can give a unified anomaly score when multiple indicators, such as transaction volume, success rate, and response time, are abnormal simultaneously; the influence of derived measures such as success rate is fully considered, so the result is more accurate.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that although the present specification describes the embodiments, the above-mentioned embodiments are exemplary and not intended to limit the scope of the present invention, and any changes, modifications, substitutions and alterations made by those skilled in the art without departing from the principle and spirit of the present invention shall be included in the scope of the present invention.

Claims (9)

1. A multi-dimensional anomaly detection method based on KPI indexes comprises the following steps:
acquiring transaction data of the P + Q minutes before and after an alarm;
filling missing values in the dimension combinations over the P + Q minutes according to the dimension combinations present at the alarm occurrence time, and evaluating the data scale;
obtaining the anomaly contributions of all dimension combinations by adopting partial anomaly detection or global anomaly detection according to the data scale;
wherein the partial anomaly detection detects only the anomaly contributions of the leaf nodes, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; the global anomaly detection detects the anomaly contributions of all dimension combinations.
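The preprocessing and scale evaluation steps of claim 1 can be sketched as follows. Several points are assumptions that the text does not specify: the fill value (0.0 here), the measure of data scale (number of leaf combinations times number of minutes), the threshold, and the direction of the switch (partial detection for large data, global detection otherwise). All function names are hypothetical.

```python
def fill_missing(series_by_combo, alarm_combos, minutes, fill_value=0.0):
    """Give every dimension combination seen at alarm time a value for each minute.

    series_by_combo maps a combination to {minute: value}; minutes that are
    absent are filled with fill_value (assumption: 0.0, i.e. no transactions).
    """
    filled = {}
    for combo in alarm_combos:
        observed = series_by_combo.get(combo, {})
        filled[combo] = [observed.get(m, fill_value) for m in minutes]
    return filled

def choose_strategy(n_leaf_combos, n_minutes, threshold=100_000):
    """Pick a detection mode from a crude size estimate (assumed rule and threshold)."""
    return "partial" if n_leaf_combos * n_minutes > threshold else "global"

series = {("app", "BJ"): {0: 5.0, 2: 7.0}}
filled = fill_missing(series, [("app", "BJ"), ("web", "SH")], range(3))
mode = choose_strategy(len(filled), 3)
```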
2. A KPI indicator-based multi-dimensional anomaly detection method according to claim 1, characterized in that: the global anomaly detection comprises the following steps:
S101, defining the feature types of a single KPI indicator;
S102, extracting a KPI feature value training set X of each point of all dimension combinations over the P + Q minutes of the single indicator, forming one binary tree by cutting and splitting each time a feature value q is specified, and generating t_1 binary trees, denoted T_1, after the specified feature value has traversed every feature type;
S103, extracting the KPI feature set excluding the current dimension combination, and forming t_2 binary trees, denoted T_2, by the cutting and splitting of S102;
S104, calculating the average heights c_1 and c_2 of the child nodes of all dimension combinations X under the indicator in T_1 and T_2;
S105, calculating the anomaly contribution of any dimension combination to the single indicator;
S106, when a plurality of indicators are abnormal, repeating S101-S105, and calculating the anomaly contribution of any dimension combination to the plurality of indicators.
3. A KPI indicator-based multi-dimensional anomaly detection method according to claim 2, characterized in that: the feature types of the single KPI indicator in S101 comprise at least one of the following features: mean, standard deviation, extreme value, occurrence frequency of the current dimension, inverse document frequency of the current dimension, first-order autocorrelation coefficient, linearity strength, curvature strength, spectral entropy, standard deviation of residual variation, number of crossing points, difference from the preceding point, trend, periodicity, and disorder.
4. A KPI indicator-based multi-dimensional anomaly detection method according to claim 2, characterized in that: the specific cutting and splitting in S102 is as follows:
S1021, extracting the KPI feature values of each point of all dimension combinations over the P + Q minutes of the single indicator to form a training set X;
S1022, randomly extracting k sample points from the training set X to form a subset X_k of X;
S1023, each time randomly specifying a feature value q from X_k, and randomly generating a cutting point p;
S1024, placing sample points whose feature value is smaller than p into the left child node, and sample points whose feature value is greater than or equal to p into the right child node;
S1025, repeating S1024 at the left child node and the right child node, and stopping splitting when every leaf node has only one sample point or a specified number of layers is reached, thereby generating one binary tree;
S1026, after the specified feature value q has traversed every feature type, generating t_1 binary trees, denoted T_1.
5. A KPI indicator-based multi-dimensional anomaly detection method according to claim 2, characterized in that: in S105, the feature vectors of the child nodes of the dimension combination are substituted into T_1 and T_2 respectively, the average heights h_1 and h_2 of the child node in T_1 and T_2 are calculated, and, combined with the average heights c_1 and c_2 in S104, the anomaly contribution of any dimension combination to the single indicator is defined as I_a:
[Formula for I_a: given as an image in the original publication; not reproduced in this text.]
6. The method as claimed in claim 1, wherein the partial anomaly detection method includes LightGBM and extreme value theory.
7. A KPI indicator-based multi-dimensional anomaly detection method according to claim 1, characterized in that: the input source of the transaction data includes Elasticsearch, Kafka, or a CSV file in a specified format.
8. A multi-dimensional abnormality detection apparatus based on KPI indicators, comprising:
an alarm module, used for raising an alarm when the current KPI indicator is abnormal;
a data reading module, used for reading the transaction data of the P + Q minutes before and after the alarm;
a data preprocessing module, used for filling missing values in the dimension combinations over the P + Q minutes according to the dimension combinations present at the alarm occurrence time;
a data evaluation module, used for evaluating the current data scale;
an anomaly detection module, comprising a partial anomaly detection module and a global anomaly detection module; wherein the partial anomaly detection module performs anomaly detection only on the leaf nodes, and the anomaly contribution of an upper-layer node is obtained by summing the anomaly contributions of its lower-layer nodes; the global anomaly detection module detects the anomaly contributions of all dimension combinations.
9. A computer-readable storage medium for storing a computer program for executing the KPI indicator-based multi-dimensional abnormality detection method according to any one of claims 1 to 7.
Application CN202010551259.0A (priority date 2020-06-17, filing date 2020-06-17): Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium. Status: Active. Granted as CN111506637B (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010551259.0A, CN111506637B (en) | 2020-06-17 | 2020-06-17 | Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010551259.0A, CN111506637B (en) | 2020-06-17 | 2020-06-17 | Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium

Publications (2)

Publication Number | Publication Date
CN111506637A (en) | 2020-08-07
CN111506637B (en) | 2020-11-27

Family

ID=71870674

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010551259.0A (Active), CN111506637B (en) | Multi-dimensional anomaly detection method and device based on KPI (Key Performance indicator) and storage medium | 2020-06-17 | 2020-06-17

Country Status (1)

Country | Link
CN (1) | CN111506637B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108073497A (en) * | 2018-01-29 | 2018-05-25 | 上海洞识信息科技有限公司 | A kind of multi objective unusual fluctuation analysis method based on data center's data acquisition platform
CN109947760A (en) * | 2017-07-26 | 2019-06-28 | 华为技术有限公司 | A kind of method and device for mining KPI root cause
CN109992479A (en) * | 2019-03-31 | 2019-07-09 | 西安电子科技大学 | A method, device and computer equipment for locating abnormality in multi-dimensional KPI data
CN110427278A (en) * | 2019-07-31 | 2019-11-08 | 中国工商银行股份有限公司 | Method for detecting abnormality and device
CN110955575A (en) * | 2019-11-14 | 2020-04-03 | 国网浙江省电力有限公司信息通信分公司 | A business system fault location method based on correlation analysis model
CN111064614A (en) * | 2019-12-17 | 2020-04-24 | 腾讯科技(深圳)有限公司 | Fault root cause positioning method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yongqian Sun et al.: "HotSpot: Anomaly Localization for Additive KPIs", IEEE Access *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112446647A (en) * | 2020-12-14 | 2021-03-05 | 上海众源网络有限公司 | Abnormal element positioning method and device, electronic equipment and storage medium
CN112929363A (en) * | 2021-02-04 | 2021-06-08 | 北京字跳网络技术有限公司 | Root cause analysis method and equipment for video field performance parameter abnormity
CN112929363B (en) * | 2021-02-04 | 2022-05-17 | 北京字跳网络技术有限公司 | Root cause analysis method and equipment for video field performance parameter abnormity
CN113179179A (en) * | 2021-04-22 | 2021-07-27 | 南京大学 | Algorithm for positioning service calling success rate index abnormal clue
CN113179179B (en) * | 2021-04-22 | 2023-01-06 | 南京大学 | Method for positioning clue of abnormal success rate index of service call
CN113204590A (en) * | 2021-05-31 | 2021-08-03 | 中国人民解放军国防科技大学 | Unsupervised KPI (Key performance indicator) anomaly detection method based on serialization self-encoder
CN113204590B (en) * | 2021-05-31 | 2021-11-23 | 中国人民解放军国防科技大学 | Unsupervised KPI (Key performance indicator) anomaly detection method based on serialization self-encoder
CN113282876A (en) * | 2021-07-20 | 2021-08-20 | 中国人民解放军国防科技大学 | Method, device and equipment for generating one-dimensional time sequence data in anomaly detection
CN115439120A (en) * | 2022-09-06 | 2022-12-06 | 连通(杭州)技术服务有限公司 | Method and equipment for checking abnormal reason of transaction message
CN115439120B (en) * | 2022-09-06 | 2025-08-08 | 连通(杭州)技术服务有限公司 | Method and device for troubleshooting abnormal causes of transaction messages
CN117170995A (en) * | 2023-11-02 | 2023-12-05 | 中国科学院深圳先进技术研究院 | Performance index-based interference anomaly detection method, device, equipment and medium
CN117170995B (en) * | 2023-11-02 | 2024-05-17 | 中国科学院深圳先进技术研究院 | Performance index-based interference anomaly detection method, device, equipment and medium

Also Published As

Publication number | Publication date
CN111506637B (en) | 2020-11-27

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
