Disclosure of Invention
The present invention is directed to solve at least one of the problems in the background art and provides a time series exception handling method, a time series exception handling apparatus, an electronic device, and a computer-readable storage medium.
In order to achieve the above object, the present invention provides a method for processing time series exception, comprising the following steps:
acquiring time sequence data, training the time sequence data, and constructing a model;
detecting whether abnormal data exist in the time sequence data obtained in real time according to the model, and if so, recommending part of the abnormal data;
judging whether the recommended part of abnormal data is reasonable or not, and then feeding back a judgment result;
and optimizing the model according to the judgment result, and then continuously detecting the real-time sequence data.
According to one aspect of the invention, acquiring time series data comprises acquiring regular small-scale time series data and irregular large-scale time series data, clustering all time series data when acquiring irregular large-scale time series data, and then training various types of time series data to construct a model.
According to one aspect of the invention, the clustering process is to capture the correlation among the time sequence data to be trained through DBSCAN, and cluster the data with approximate shape and consistent periodicity.
According to an aspect of the present invention, in the clustering process, in calculating the approximation degree of the time-series data, the distance between the time-series data is calculated using DTW.
According to one aspect of the invention, according to the type of the time sequence data, feature data capable of representing the corresponding type of the time sequence data is selected for training, and a model is constructed.
According to one aspect of the invention, RRCF is adopted to select all the feature data for training, all the feature data are iterated to obtain a plurality of decision trees, the decision trees form a decision forest, and then whether abnormal data exist in the real-time sequence data is determined through voting of the decision forest.
According to one aspect of the invention, when constructing the decision tree, the RRCF selects a segmentation dimension for segmenting the feature data, and the probability that feature data i is selected is p_i = (l_i + g_i) / (Σ_j l_j + Σ_j g_j), with l_i = max_{x∈S}(x_i) − min_{x∈S}(x_i) and g_i = max_j (x_j − x_{j−1}); where i is the feature data; p_i represents the probability that feature i is selected, with a value between 0 and 1; l_i represents the difference between the maximum value and the minimum value in the feature set calculated for feature i over the training sample set S; g_i represents the maximum difference between two adjacent feature values after the calculated values of feature i over the training sample set are sorted by size; Σ_j g_j represents the sum of the g_j calculated for every feature dimension j; and Σ_j l_j represents the sum of the l_j calculated for every feature dimension j.
According to one aspect of the invention, the RRCF equally divides the feature data in the segmentation dimension into N intervals [l_0, h_0], [l_1, h_1], ..., [l_{N-1}, h_{N-1}], and calculates the density of each interval d_i = Count(p, p ∈ [l_i, h_i]); the probability that each interval is selected is determined by its density d_i; finally, a cut point X_i ~ Uniform[l_i, h_i] is randomly selected from the selected interval. Here l_0 represents the minimum value of the feature in the segmentation dimension calculated over the training set, h_{N-1} represents its maximum value, and the difference between the minimum and maximum values is divided by N to obtain the N equal intervals.
According to one aspect of the present invention, when abnormal data exists, the abnormality score CoDisp of the abnormal data is calculated using the dividing points; when calculating the abnormality score, the ratio CoDisp_Node of the number of data points contained in the sibling subtree of a dividing point to the number contained in its father subtree is calculated, the largest ratio CoDisp_Node is selected on each tree, and the abnormality score CoDisp_{x_i} of the abnormal data x_i is the average of the selected CoDisp_Node values over the trees of the decision forest.
According to one aspect of the invention, recommending part of the abnormal data means selecting a plurality of the most abnormal segments in the abnormal data and recommending them after obtaining labels for those segments; or
selecting a plurality of the most uncertain segments in the abnormal data and recommending them after obtaining labels for those segments; or
dividing the abnormal data into a plurality of groups according to the abnormality scores, obtaining a plurality of segments in each group, and recommending them after obtaining labels for those segments.
According to one aspect of the invention, after the abnormal data of the n labeled segments are obtained by the model, the abnormal data and the M decision trees in the decision forest of the model jointly form an abnormality score matrix CoDisp_M[x_i][tree_j]; for each abnormal data x_i, if the fed-back judgment result is a true positive, the weight of decision tree tree_j is updated as tw_j = tw_j + δ × CoDisp_M[x_i][tree_j], and the decision trees with higher weight are selected according to the fed-back judgment results, thereby optimizing the model.
In order to achieve the above object, the present invention further provides a time-series exception handling apparatus, including:
the data processing module is used for acquiring time series data, training the time series data and constructing a model;
the abnormal data detection recommending module detects whether abnormal data exist in the time sequence data obtained in real time according to the model, and if the abnormal data exist, part of the abnormal data are recommended;
the abnormal data judgment feedback module judges whether the part of abnormal data is reasonable or not and then feeds back a judgment result;
and the model optimization module optimizes the model according to the feedback judgment result and then continuously detects the real-time sequence data.
According to an aspect of the invention, further comprising:
and the data classification processing module is used for acquiring irregular large-scale time sequence data, clustering all the time sequence data, training various time sequence data and constructing a model.
In order to achieve the above object, the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the above time-series exception handling method.
To achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above time-series exception handling method.
According to one scheme of the invention, because the number of time series to be monitored in a production environment is extremely large, and each production unit can generate dozens or even hundreds of monitoring indexes that all need to be monitored, training each time series separately in a targeted manner would require an extremely large number of models and resources, which existing operation and maintenance resources can hardly support. Therefore, the data is clustered before the targeted training stage of the index data, which greatly reduces the detection and processing time and allows anomalies to be handled quickly and accurately.
According to one scheme of the invention, a characteristic data selection stage is provided, and more appropriate characteristic data are extracted in a targeted manner according to the statistical information and characteristics of indexes, so that the accuracy of the model is improved.
According to one scheme of the invention, the 30 most abnormal segments are selected and labels for these abnormal segments are acquired, so that obvious anomalies can be further confirmed and the false positive rate reduced.
According to one scheme of the invention, the 30 most uncertain segments (i.e., those near the anomaly judgment threshold) are selected, and these labels help the model draw a clear classification boundary, thereby improving the identification accuracy of ambiguous anomalies.
According to one aspect of the invention, the abnormal data is divided into 10 groups according to the abnormality scores and at most 3 segments are obtained from each group; these labels capture the judgment feedback module's stance on different degrees of abnormality, helping the model determine the optimal threshold selection range.
According to one scheme of the invention, the invention provides an unsupervised, white-box, accurate time series exception handling method that cooperates with active learning and can actively and efficiently collect feedback information. On the basis of a traditional unsupervised learning framework, an active learning stage is introduced: anomalies are actively recommended to a judgment feedback party (such as a judgment feedback module or operation and maintenance personnel) and feedback is acquired, so that the model is corrected and its accuracy improved. The method retains the advantages of traditional unsupervised learning in terms of parameter tuning and labeling, designs the application strategy for labeled feedback in a targeted manner, and further optimizes the recall rate, detection speed and capability of the model.
According to one scheme of the invention, the processing method has no obvious bias toward particular data, can adapt to indexes with specific scene semantics, can meet operation and maintenance requirements outside the traditional Internet field, has high extensibility and universality, and can give a specific cause for each reported anomaly.
According to one aspect of the present invention, the invention is able to accurately detect and interpret anomalies. Tested on one public data set and two sets of time series data from the actual production environment of a commercial bank, it ultimately reaches F1-scores of 0.81 and 0.89 on the two data sets. Compared with traditional unsupervised exception handling methods, the best F1-score is improved by 0.19-0.5 on the two data sets, and the detection time is shortened by 58%.
Detailed Description
The content of the invention will now be discussed with reference to exemplary embodiments. It is to be understood that the embodiments discussed are merely intended to enable one of ordinary skill in the art to better understand and thus implement the teachings of the present invention, and do not imply any limitations on the scope of the invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
In view of the above-described drawbacks of the prior art described in the background art, the present invention provides a time series exception handling method, which can detect anomalies in time series data obtained in real time and optimize the detection model or give feedback according to the detection and judgment results.
FIG. 1 schematically shows a flow diagram of a method for time series exception handling according to one embodiment of the present invention. As shown in fig. 1, a time-series exception handling method according to an embodiment of the present invention includes the following steps:
a. acquiring time sequence data, training the time sequence data, and constructing a model;
b. detecting whether abnormal data exist in the time sequence data obtained in real time according to the model, and if so, recommending part of the abnormal data;
c. judging whether the recommended part of abnormal data is reasonable or not, and then feeding back a judgment result;
d. and optimizing the model according to the judgment result, and then continuously detecting the real-time sequence data.
In practice, the time series data may be represented by x, where x = {x_1, x_2, ..., x_N}, N is the length of the data x, and the data point x_t at any time t is a specific data value. The time series may be collected from many sources, such as networks, transaction links, request logs, and the like. Time series from the same source have a greater probability of having similar characteristics.
Because the number of time series to be monitored in a production environment is extremely large, and each production unit can generate dozens or even hundreds of monitoring indexes that all need to be monitored, training each time series separately in a targeted manner would require an extremely large number of models and resources, which existing operation and maintenance resources can hardly support. Therefore, the data is clustered before the targeted training stage of the index data, which greatly reduces the detection and processing time and allows anomalies to be handled quickly and accurately.
Specifically, according to an embodiment of the present invention, in step a, the clustering stage uses DBSCAN to capture the association between the time series indexes to be trained and clusters indexes with similar shapes and consistent periodicity. When calculating index similarity, the distance between indexes is calculated using DTW (Dynamic Time Warping). DBSCAN does not require predefined category information, and the clustering accuracy can be controlled by adjusting the clustering radius, so DBSCAN is very suitable for index clustering scenarios.
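As an illustration only, this clustering stage can be sketched roughly as follows, assuming the series are equal-length NumPy arrays; the hand-written DTW, the radius eps and min_samples are illustrative choices rather than values prescribed by the invention.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance between two series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def cluster_series(series_list, eps=5.0, min_samples=2):
    """Cluster series whose shapes are close under DTW; label -1 marks unclustered noise."""
    k = len(series_list)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = dtw_distance(series_list[i], series_list[j])
    # DBSCAN needs no predefined number of categories; eps plays the role of the clustering radius.
    return DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed").fit_predict(dist)
```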
Figure 2 schematically shows similar indexes collected from the same switch. As shown in fig. 2, two network traffic curves for different ports of the same switch exhibit substantially the same trend and scale. In an actual production environment, data of the same type under the same monitoring unit also exhibits this clustering characteristic; by exploiting it, the number of models generated in the model training stage can be greatly reduced, the resources consumed are reduced, and the cost-effectiveness of the operation and maintenance tool is improved. In addition, in some scenarios the amount of data is small and the accuracy requirement is high; in such cases training each series individually is more cost-effective than pre-clustering, so the clustering stage is treated as an optional step.
As can be seen from the above, in the present invention, acquiring time series data includes acquiring regular small-scale time series data and acquiring irregular large-scale time series data. Regular small-scale time series data is trained directly, whereas irregular large-scale time series data is first clustered; each type of time series data is then trained and the model is constructed.
According to one embodiment of the invention, feature data capable of characterizing the corresponding type of time series data is selected for training according to the type of the time series data, and the model is constructed. Different time series data have different characteristics. For example, percentage-type sequence data tends to stay flat, with short dips or spikes when failures occur; service-related transaction sequence data often shows periodic peaks and valleys, with small fluctuations in the case of failure; and infrastructure sequence data such as swap space may rise slowly over time. Therefore, the invention provides a feature data selection stage in which more suitable feature data is extracted in a targeted manner according to the statistical information and characteristics of the indexes, thereby improving model accuracy. The specific extraction rules are shown in the following table:
TABLE 2
In this embodiment, table 2 contains simple and effective feature data that can cover the different features of most curves, and is easy to calculate and performs well.
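Purely for illustration, the idea behind Table 2 (whose full contents are not reproduced here) can be sketched as a mapping from index type to feature set; the feature names, window and period parameters below are hypothetical stand-ins, not the actual rules of Table 2.

```python
import numpy as np

# Hypothetical per-type feature rules; the authoritative rules are those of Table 2.
FEATURE_RULES = {
    "percentage":     ["window_mean", "window_std", "diff"],        # flat curves with short dips/spikes
    "transaction":    ["window_mean", "diff", "period_offset"],     # periodic peaks and valleys
    "infrastructure": ["window_mean", "trend_slope"],               # slowly rising curves, e.g. swap space
}

def extract_features(series, index_type, window=10, period=1440):
    """Turn one time series into a feature matrix according to its (assumed) type."""
    s = np.asarray(series, dtype=float)
    pad = np.concatenate([np.repeat(s[0], window - 1), s])
    windows = np.lib.stride_tricks.sliding_window_view(pad, window)
    feats = {
        "window_mean":   windows.mean(axis=1),
        "window_std":    windows.std(axis=1),
        "diff":          np.concatenate([[0.0], np.diff(s)]),
        "period_offset": s - np.roll(s, period),   # distance to the same point one period earlier
    }
    feats["trend_slope"] = np.gradient(feats["window_mean"])  # crude local trend of the smoothed curve
    return np.column_stack([feats[name] for name in FEATURE_RULES[index_type]])
```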
According to one embodiment of the invention, RRCF is adopted to select all feature data for training, all feature data are iterated to obtain a plurality of decision trees, the decision trees form a decision forest, and then whether abnormal data exist in real-time sequence data is determined through voting of the decision forest.
When the decision tree is constructed, the RRCF selects the segmentation dimension for segmenting the feature data, and the probability of the RRCF selecting the feature data under the segmentation dimension is
g
i=max
x∈Sx
j-x
j-1(ii) a Where i is the characteristic data, p
iRepresenting the probability, that the feature i is selectedThe value is between 0 and 1; l
iRepresenting the difference between the maximum value and the minimum value of the characteristic i in a training sample set and in a characteristic set obtained by calculation; gi represents the maximum difference between two adjacent characteristic values in the characteristic set obtained by calculation after the characteristic i is sorted according to the characteristic size in the training sample set; sigma g
jRepresenting g calculated for each feature dimension j
jThe summation ∑ l
jRepresents l calculated for each feature dimension j
jAnd (6) summing.
Specifically, the unsupervised anomaly detection base algorithm selected by the invention is RRCF (Robust Random Cut Forest). Its detection effect is better than that of other unsupervised anomaly detection algorithms, but there is still a gap between its accuracy and the accuracy required in actual production use. The RRCF trains all training sample feature data in batches; each batch of feature data is iterated over multiple rounds to obtain a decision tree, and all decision trees finally form a decision forest that decides by voting whether the data is abnormal. In the process of constructing a decision tree, a segmentation dimension needs to be selected from the multiple dimensions of the feature data. The RRCF considers that segmenting on a dimension covering a larger data range distinguishes the samples better, i.e. the probability that feature i is selected is p_i = l_i / Σ_j l_j, where l_i = max_{x∈S}(x_i) − min_{x∈S}(x_i), p_i represents the probability that feature i is selected, l_i represents the difference between the maximum and minimum values of feature i, S represents the training sample set, and x_i represents the value of feature i calculated for one sample in S. However, this does not take into account the effect of the distribution of the dimension itself. According to an embodiment of the invention, when constructing a decision tree and selecting the dimension for cutting branches, in addition to the coverage range of the data of that dimension, the maximum gap of the data is used as an influence factor, i.e. the invention selects feature i with probability p_i = (l_i + g_i) / (Σ_j l_j + Σ_j g_j), where g_i = max_j (x_j − x_{j−1}) is the maximum difference between adjacent values of feature i after sorting. Thus, the larger the maximum gap of the data distribution in a dimension, the higher the degree of discrimination provided by segmenting at that gap, so the segmentation dimension is selected more effectively and model accuracy is improved.
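A minimal sketch of this split-dimension selection, under the assumed probability form p_i = (l_i + g_i) / (Σ_j l_j + Σ_j g_j) reconstructed above; the fallback for degenerate data is an implementation detail added here only for robustness.

```python
import numpy as np

def choose_split_dimension(X, rng=None):
    """Sample a split dimension with probability proportional to range plus largest gap.

    X is an (n_samples, n_features) array of feature data at one tree node.
    Assumed form: p_i = (l_i + g_i) / sum_j (l_j + g_j), where l_i is the range of
    dimension i and g_i the largest gap between adjacent sorted values of dimension i.
    """
    rng = rng or np.random.default_rng()
    l = X.max(axis=0) - X.min(axis=0)                      # l_i: range of each dimension
    if X.shape[0] > 1:
        g = np.max(np.diff(np.sort(X, axis=0), axis=0), axis=0)   # g_i: largest adjacent gap
    else:
        g = np.zeros(X.shape[1])
    weights = l + g
    if weights.sum() == 0:                                 # all points identical: fall back to uniform
        probs = np.full(X.shape[1], 1.0 / X.shape[1])
    else:
        probs = weights / weights.sum()
    return rng.choice(X.shape[1], p=probs)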
Further, when a decision tree is constructed, after each iteration determines a segmentation dimension, a suitable dividing point needs to be selected on the data of that dimension, and the left and right subtrees are divided according to that dividing point. The standard RRCF simply selects a dividing point at random after equally dividing the dimension data, without considering the distribution characteristics of the dimension. According to one embodiment of the invention, the RRCF equally divides the feature data in the segmentation dimension into N intervals [l_0, h_0], [l_1, h_1], ..., [l_{N-1}, h_{N-1}] and calculates the density of each interval, d_i = Count(p, p ∈ [l_i, h_i]); the probability that each interval is selected is determined by its density d_i, and finally a cut point X_i ~ Uniform[l_i, h_i] is randomly selected from the selected interval. Here l_0 represents the minimum value of the feature in the segmentation dimension calculated over the training set, h_{N-1} represents its maximum value, and the difference between the minimum and maximum values is divided by N to obtain the N equal intervals; for example, the left and right endpoints of the i-th interval are l_i and h_i. This selection strategy identifies the sparse part of the segmentation dimension more accurately, thereby improving the discrimination. In this embodiment, d_i represents the density of an interval, that is, the number of samples falling within its range; since the interval widths are identical, the greater the number of samples, the greater the density. Count denotes counting and p denotes each sample falling in the interval, i.e., the number of samples within the range [l_i, h_i] is counted. Uniform[l_i, h_i] denotes the uniform distribution over the interval [l_i, h_i], and X_i is a cut point drawn uniformly at random from the selected interval.
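A rough sketch of this cut-point selection; the exact form of the interval-selection probability is not spelled out in the text, so the inverse-density weight 1/(1 + d_i) below is an assumed choice that merely reflects the stated preference for sparse regions.

```python
import numpy as np

def choose_cut_point(values, n_intervals=10, rng=None):
    """Pick a cut point on one feature dimension using interval densities.

    The dimension range is split into N equal-width intervals; an interval is sampled
    with a probability that decreases with its density d_i (assumed form 1/(1 + d_i),
    normalised), then the cut point is drawn uniformly inside the chosen interval.
    """
    rng = rng or np.random.default_rng()
    edges = np.linspace(values.min(), values.max(), n_intervals + 1)   # l_0 ... h_{N-1}
    density, _ = np.histogram(values, bins=edges)                      # d_i = Count(p in [l_i, h_i])
    weights = 1.0 / (1.0 + density)                                    # favour sparse intervals (assumption)
    probs = weights / weights.sum()
    i = rng.choice(n_intervals, p=probs)
    return rng.uniform(edges[i], edges[i + 1])                         # X_i ~ Uniform[l_i, h_i]
```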
Further, when abnormal data exists, the abnormality score CoDisp of the abnormal data is calculated using the dividing points (specific nodes). When calculating the abnormality score, the ratio CoDisp_Node of the number of data points contained in the sibling subtree of a dividing point to the number contained in its father subtree is calculated; the higher the ratio, the higher the degree of abnormality of the abnormal data. Since the calculation for each abnormal data involves multiple feature data, the model moves upward step by step from the initial node during detection and, after repeated iterations, selects the largest ratio CoDisp_Node; the abnormality score CoDisp_{x_i} of the abnormal data x_i is then obtained from these selected values. The abnormality score CoDisp_{x_i} indicates the degree of abnormality calculated for the sample x_i. First, x_i falls on a leaf of a decision tree; the algorithm searches upward from the leaf until it finds a branch node whose subtree contains far fewer samples than its sibling subtree. The final CoDisp of sample x_i is the average, over all trees in the forest, of the CoDisp of the node found for the sample in each tree. In this embodiment, when selecting the largest ratio CoDisp_Node, the depth of the node is taken into account, since deeper nodes in a tree are more normal. In this way the dividing point at which x_i is isolated from the other, larger group of samples is found, which is more representative.
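The CoDisp computation described above can be sketched for a single sample as follows, using a deliberately minimal node structure (not the full RRCF tree): walk up from the leaf, take at each dividing point the ratio of sibling-subtree size to father-subtree size, keep the per-tree maximum, and average these maxima over the forest.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    size: int                                  # number of samples contained in this subtree
    parent: Optional["Node"] = None
    sibling: Optional["Node"] = None

def codisp_on_tree(leaf: Node) -> float:
    """Maximum over the dividing points above the leaf of |sibling subtree| / |father subtree|."""
    best = 0.0
    node = leaf
    while node.parent is not None:
        ratio = node.sibling.size / node.parent.size   # CoDisp_Node for this dividing point
        best = max(best, ratio)
        node = node.parent                             # move upward toward the root
    return best

def codisp(leaves_per_tree: List[Node]) -> float:
    """Abnormality score of one sample: average of the per-tree maxima over the forest."""
    return sum(codisp_on_tree(leaf) for leaf in leaves_per_tree) / len(leaves_per_tree)
```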
Further, in step b, recommending part of the abnormal data means selecting a plurality of the most abnormal segments in the abnormal data and recommending them after obtaining labels for those segments; or
selecting a plurality of the most uncertain segments in the abnormal data and recommending them after obtaining labels for those segments; or
dividing the abnormal data into a plurality of groups according to the abnormality scores, obtaining a plurality of segments in each group, and recommending them after obtaining labels for those segments.
Figures 3-5 schematically show three different schemes for actively recommending abnormal segments. As shown in fig. 3, according to an embodiment of the present invention, scheme A selects the 30 most abnormal segments; the labels of these abnormal segments can further confirm obvious anomalies and reduce the false positive rate.
According to another embodiment of the invention, as shown in fig. 4, scheme B selects the 30 most uncertain segments (i.e., those near the anomaly judgment threshold); these labels help the model draw a clear classification boundary, thereby improving the identification accuracy of ambiguous anomalies.
As shown in fig. 5, according to a third embodiment of the present invention, scheme C divides the abnormal data into 10 groups according to the abnormality score and obtains at most 3 segments from each group; these labels capture, for example, the judgment feedback module's stance on different degrees of abnormality, thereby helping the model determine the optimal threshold selection range.
In experiments on the public data set, the F1-score of scheme A was higher than that of the other two schemes, but each of the other two schemes has its own applicable scenarios.
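The three recommendation schemes can be sketched as simple selections over the abnormality scores; the counts (30, 30, and 10 groups of at most 3) follow the embodiments above, while the threshold value and the random sampling within groups are illustrative assumptions.

```python
import numpy as np

def scheme_a(scores, k=30):
    """Scheme A: the k most abnormal segments, highest score first."""
    return np.argsort(scores)[-k:][::-1]

def scheme_b(scores, threshold, k=30):
    """Scheme B: the k segments closest to the anomaly-judgment threshold (most uncertain)."""
    return np.argsort(np.abs(np.asarray(scores) - threshold))[:k]

def scheme_c(scores, n_groups=10, per_group=3, rng=None):
    """Scheme C: split by score into n_groups buckets and take at most per_group segments per bucket."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores)
    edges = np.linspace(scores.min(), scores.max(), n_groups + 1)
    buckets = np.clip(np.digitize(scores, edges[1:-1]), 0, n_groups - 1)
    picked = []
    for g in range(n_groups):
        members = np.where(buckets == g)[0]
        if len(members) > 0:
            picked.extend(rng.choice(members, size=min(per_group, len(members)), replace=False))
    return np.array(picked)
```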
Furthermore, the invention improves the processing efficiency of the model in the online detection stage through several techniques and gives the model the ability to adjust dynamically according to user feedback. In the online detection stage, only extreme abnormal values are selected as automatic model feedback data to dynamically adjust the RRCF model, which reduces the model update frequency and improves detection performance. According to an embodiment of the invention, after the abnormal data of the n labeled segments are obtained by the model, the abnormal data and the M trees in the decision forest of the model jointly form an abnormality score matrix CoDisp_M[x_i][tree_j]; for each abnormal data x_i, if the user marks it as a true positive, the weight of tree_j is updated as tw_j = tw_j + δ × CoDisp_M[x_i][tree_j]. This feedback-driven self-correction helps the model screen out the decision trees of higher quality, which then carry higher weight in later abnormality judgments; selecting the decision trees with higher weight optimizes the model and improves the detection result.
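A sketch of this feedback step, assuming the model keeps a per-tree weight vector tw and the score matrix CoDisp_M for the n recommended segments; the value of δ (delta) and the weighted-voting helper are illustrative assumptions.

```python
import numpy as np

def update_tree_weights(tw, codisp_m, feedback, delta=0.1):
    """Raise the weight of trees that scored confirmed (true-positive) anomalies highly.

    tw        : (M,) current tree weights
    codisp_m  : (n, M) matrix CoDisp_M[x_i][tree_j] for the n labeled segments
    feedback  : (n,) booleans, True where the judgment result is a true positive
    """
    tw = tw.copy()
    for i, is_true_positive in enumerate(feedback):
        if is_true_positive:
            tw += delta * codisp_m[i]      # tw_j = tw_j + delta * CoDisp_M[x_i][tree_j]
    return tw

def weighted_vote(scores_per_tree, tw):
    """Combine per-tree scores with the learned weights, so higher-weight trees count more."""
    return scores_per_tree @ (tw / tw.sum())
```

In use, the updated weights would replace the uniform vote of the original forest, so that higher-quality trees contribute more to later abnormality judgments.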
Furthermore, the present invention provides a time-series exception handling apparatus for implementing the time-series exception handling method, as shown in fig. 6, the apparatus including:
the data processing module is used for acquiring time series data, training the time series data and constructing a model;
the abnormal data detection recommending module detects whether abnormal data exist in the time sequence data obtained in real time according to the model, and if the abnormal data exist, part of the abnormal data are recommended;
the abnormal data judgment feedback module judges whether the part of abnormal data is reasonable or not and then feeds back a judgment result;
and the model optimization module optimizes the model according to the feedback judgment result and then continuously detects the real-time sequence data.
According to an embodiment of the present invention, further comprising:
and the data classification processing module is used for acquiring irregular large-scale time sequence data, clustering all the time sequence data, training various time sequence data and constructing a model.
In the invention, the data processing module acquires time sequence data, including acquiring regular small-scale time sequence data and irregular large-scale time sequence data, and when acquiring irregular large-scale time sequence data, all the time sequence data are clustered, and then various time sequence data are trained to construct a model.
The clustering process is to capture the incidence relation among the time sequence data to be trained through DBSCAN and cluster the data with approximate shape and consistent periodicity.
In the clustering process, in calculating the approximation degree of the time-series data, the distance between the time-series data is calculated using Dynamic Time Warping (DTW).
And the data classification processing module selects characteristic data which can represent the time sequence data of the corresponding type according to the type of the time sequence data to train and construct a model.
According to one embodiment of the invention, the abnormal data detection recommendation module adopts RRCF to select all feature data for training, the feature data are iterated to obtain a plurality of decision trees, the decision trees form a decision forest, and then whether abnormal data exist in the real-time sequence data or not is determined through decision forest voting.
In this embodiment, when constructing the decision tree, the RRCF selects a segmentation dimension for segmenting the feature data, and the probability that feature data i is selected in the segmentation dimension is p_i = (l_i + g_i) / (Σ_j l_j + Σ_j g_j), with l_i = max_{x∈S}(x_i) − min_{x∈S}(x_i) and g_i = max_j (x_j − x_{j−1}); where i is the feature data; p_i represents the probability that feature i is selected, with a value between 0 and 1; l_i represents the difference between the maximum value and the minimum value in the feature set calculated for feature i over the training sample set S; g_i represents the maximum difference between two adjacent feature values after the calculated values of feature i over the training sample set are sorted by size; Σ_j g_j represents the sum of the g_j calculated for every feature dimension j; and Σ_j l_j represents the sum of the l_j calculated for every feature dimension j.
In this embodiment, the RRCF equally divides the feature data in the segmentation dimension into N intervals [l_0, h_0], [l_1, h_1], ..., [l_{N-1}, h_{N-1}], and calculates the density of each interval d_i = Count(p, p ∈ [l_i, h_i]); the probability that each interval is selected is determined by its density d_i; finally, a cut point X_i ~ Uniform[l_i, h_i] is randomly selected from the selected interval. Here l_0 represents the minimum value of the feature in the segmentation dimension calculated over the training set, h_{N-1} represents its maximum value, and the difference between the minimum and maximum values is divided by N to obtain the N equal intervals.
When abnormal data exists, the abnormality score CoDisp of the abnormal data is calculated using the dividing points; when calculating the abnormality score, the ratio CoDisp_Node of the number of data points contained in the sibling subtree of a dividing point to the number contained in its father subtree is calculated, the largest ratio CoDisp_Node is selected on each tree, and the abnormality score CoDisp_{x_i} of the abnormal data x_i is the average of the selected CoDisp_Node values over the trees of the decision forest.
In the invention, the abnormal data detection recommending module recommends part of the abnormal data by selecting a plurality of the most abnormal segments in the abnormal data and recommending them after obtaining labels for those segments; or
by selecting a plurality of the most uncertain segments in the abnormal data and recommending them after obtaining labels for those segments; or
by dividing the abnormal data into a plurality of groups according to the abnormality scores, obtaining a plurality of segments in each group, and recommending them after obtaining labels for those segments.
According to an embodiment of the present invention, after the abnormal data of the n labeled segments are obtained by the model, the abnormal data and the M decision trees in the decision forest of the model jointly form an abnormality score matrix CoDisp_M[x_i][tree_j]; for each abnormal data x_i, if the fed-back judgment result is a true positive, the weight of decision tree tree_j is updated as tw_j = tw_j + δ × CoDisp_M[x_i][tree_j], and the decision trees with higher weight are selected according to the fed-back judgment results, thereby optimizing the model.
To achieve the above object, the present invention also provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the above time-series exception handling method.
In order to achieve the above object, the present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement the above time-series exception handling method.
According to the above scheme, the invention provides an unsupervised, white-box, accurate time series exception handling method that cooperates with active learning and can actively and efficiently collect feedback information. On the basis of a traditional unsupervised learning framework, an active learning stage is introduced: anomalies are actively recommended to a judgment feedback party (such as a judgment feedback module or operation and maintenance personnel) and feedback is acquired, so that the model is corrected and its accuracy improved. The method retains the advantages of traditional unsupervised learning in terms of parameter tuning and labeling, designs the application strategy for labeled feedback in a targeted manner, and further optimizes the recall rate, detection speed and capability of the model.
Moreover, the processing method has no obvious bias on data, can adapt to indexes with specific scene semantics, can meet the operation and maintenance requirements in the field of non-traditional Internet, has higher expandability and universality, and can give specific abnormal reasons to the given abnormal result.
Moreover, the present invention is able to accurately detect and interpret anomalies. Tested on one public data set and two sets of time series data from a commercial bank's actual production environment, it ultimately reaches F1-scores of 0.81 and 0.89 on the two data sets. Compared with traditional unsupervised exception handling methods, the best F1-score is improved by 0.19-0.5 on the two data sets, and the detection time is shortened by 58%.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order in which the steps are written in the summary of the invention and in the embodiments of the present invention does not imply a strict order of execution; the order of execution of the steps should be determined by their functions and internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.