Disclosure of Invention
Embodiments of the present invention at least partially address the above-mentioned problems.
According to a first aspect of the present invention, an anomaly index detection method is provided. The method comprises the following steps: acquiring an original time series, wherein the original time series comprises a target data point and historical data points before the target data point, the target data point comprises an index value reported at a time point to be detected in the original time series, and the historical data points comprise a sequence of index values reported at time points before the time point to be detected, arranged in the original time series in reporting-time order; inputting the original time series into an anomaly detection model, processing the original time series by the anomaly detection model to obtain an anomaly detection result for the target data point, and determining whether the index reported at the time point to be detected is abnormal according to the anomaly detection result for the target data point, wherein the anomaly detection model is trained by deep learning.
In some embodiments, training the anomaly detection model by deep learning comprises: i. acquiring a sample time series and a label corresponding to the sample time series; ii. inputting the sample time series and the label into the anomaly detection model, and processing the sample time series and the label by the anomaly detection model to obtain an anomaly detection result for the target data point; iii. adjusting the anomaly detection model based on the anomaly detection result and the label; and iv. iterating steps i–iii M times, where M is a preset number of iterations.
In some embodiments, the anomaly detection model includes a plurality of parallel processing channels including a first channel, a second channel, and a third channel. The first channel is configured to perform windowing on the original time series with different window sizes to generate a plurality of windowed time series having different data lengths, and to perform first fully-connected neural network processing on the plurality of windowed time series. The second channel is configured to perform downsampling on the original time series with different sampling intervals to generate a plurality of downsampled time series having different time resolutions, and to perform second fully-connected neural network processing on the plurality of downsampled time series. The third channel is configured to determine the mean of each of a plurality of segments of the original time series to generate a mean time series, and to perform third fully-connected neural network processing on the mean time series. The outputs of the plurality of parallel processing channels are spliced, and two-class Softmax classification is performed on the spliced output.
Softmax is a function for implementing multi-class classification. It maps the output neurons to real numbers in (0, 1) and normalizes them so that the probabilities of the classes sum to 1. The Softmax function is defined as follows:

$$\mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{c=1}^{C} e^{z_c}}$$

where $z_i$ is the output of the pre-stage output unit of the classifier, $i$ is the category index, and $C$ is the total number of categories. $\mathrm{Softmax}(z_i)$ is the ratio of the exponential of the current element to the sum of the exponentials of all elements; the output values for the multiple classes are thereby converted into relative probabilities.
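As an illustration only (not part of the claimed method), the following is a minimal numerical sketch of the Softmax computation defined above, here with C = 2 as used for the two-class classification:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Map C raw classifier outputs z_i to probabilities summing to 1.

    Subtracting max(z) first is a standard numerical-stability trick
    and does not change the result.
    """
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical pre-stage outputs for the two classes (normal, abnormal):
print(softmax(np.array([1.2, -0.3])))  # -> [0.8176, 0.1824], sums to 1
```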
In some embodiments, the plurality of windowed time series includes N windowed time series, N being an integer greater than or equal to 2, and the first fully-connected neural network processing includes: inputting the N windowed time series into respective corresponding fully-connected neural networks to obtain N corresponding fully-connected neural network outputs; splicing the i-th fully-connected neural network output and the (i+1)-th fully-connected neural network output of the N outputs to obtain an i-th spliced output, where i is an integer variable with an initial value of 1; and performing the following loop until i equals N:
inputting the i-th spliced output into an intermediate fully-connected neural network to obtain an intermediate fully-connected neural network output;
incrementing i; and
splicing the intermediate fully-connected neural network output with the (i+1)-th fully-connected neural network output of the N outputs to obtain an i-th spliced output.
The final intermediate fully-connected neural network output is provided as the output of the first channel.
In some embodiments, the anomaly detection model includes two or more long short-term memory units connected in series, comprising a first long short-term memory unit through a Q-th long short-term memory unit, Q being an integer greater than or equal to 2, the units being configured to perform the following steps.
The first stage: the first long short-term memory unit is configured to perform windowing on the original time series with a first window size to generate a first windowed time series having a first data length, and to perform long short-term memory processing on the first windowed time series to obtain a first long short-term memory output.
The second stage: the P-th long short-term memory unit is configured to perform windowing on the original time series with a P-th window size to generate a P-th windowed time series having a P-th data length, to splice the (P-1)-th long short-term memory output with the P-th windowed time series, and to perform long short-term memory processing on the spliced series to obtain a P-th long short-term memory output, where P is initially 2 and is an integer greater than or equal to 2 and less than or equal to Q.
The steps of the second stage are repeated until P equals Q, yielding the Q-th long short-term memory output, on which two-class Softmax classification is performed.
In some embodiments, the method further comprises, prior to inputting the original time series into the anomaly detection model: performing preliminary anomaly identification on the original time series through a preliminary decision.
In some embodiments, the preliminary decision method comprises a statistical decision method, and the preliminary anomaly identification of the original time series through the preliminary decision comprises: extracting historical data points from the original time series; determining the mean and the standard deviation of the historical data points by the statistical decision method; determining, from the mean and the standard deviation, the value interval attributable to random error; and identifying the original time series as anomalous in response to the target data point falling outside the value interval.
In some embodiments, the preliminary decision method comprises an unsupervised method, and the preliminary anomaly identification of the original time series through the preliminary decision comprises: extracting each data point in the original time series; classifying the extracted data points through an unsupervised algorithm to obtain a classification result; and performing anomaly judgment on the time series based on the classification result.
In some embodiments, the method further comprises: sending an alarm message in response to the anomaly detection result indicating that the original time series is anomalous.
In some embodiments, the alarm message comprises: a short message (SMS) alarm message, an application alarm message, or an applet alarm message.
According to a second aspect of the present invention, an anomaly index detection apparatus is provided. The apparatus comprises an acquisition module and an anomaly detection module. The acquisition module is configured to acquire an original time series, the original time series including a target data point and historical data points before the target data point, the target data point including an index value reported at a time point to be detected in the original time series, and the historical data points including a sequence of index values reported at time points before the time point to be detected, arranged in the original time series in reporting-time order. The anomaly detection module is configured to input the original time series into an anomaly detection model, which processes the original time series to obtain an anomaly detection result for the target data point; whether the index reported at the time point to be detected is abnormal is determined according to the anomaly detection result for the target data point, the anomaly detection model being trained by deep learning.
In some embodiments, the anomaly detection model includes a plurality of parallel processing channels including a first channel, a second channel, and a third channel. The first channel is configured to perform windowing on the original time series with different window sizes to generate a plurality of windowed time series having different data lengths, and to perform first fully-connected neural network processing on the plurality of windowed time series. The second channel is configured to perform downsampling on the original time series with different sampling intervals to generate a plurality of downsampled time series having different time resolutions, and to perform second fully-connected neural network processing on the plurality of downsampled time series. The third channel is configured to determine the mean of each of a plurality of segments of the original time series to generate a mean time series, and to perform third fully-connected neural network processing on the mean time series. The outputs of the plurality of parallel processing channels are spliced, and two-class Softmax classification is performed on the spliced output.
In some embodiments, the anomaly detection model comprises two or more long short-term memory units connected in series, comprising a first long short-term memory unit through a Q-th long short-term memory unit, Q being an integer greater than or equal to 2, the units being configured to perform the following steps.
The first stage: the first long short-term memory unit is configured to perform windowing on the original time series with a first window size to generate a first windowed time series having a first data length, and to perform long short-term memory processing on the first windowed time series to obtain a first long short-term memory output.
The second stage: the P-th long short-term memory unit is configured to perform windowing on the original time series with a P-th window size to generate a P-th windowed time series having a P-th data length, to splice the (P-1)-th long short-term memory output with the P-th windowed time series, and to perform long short-term memory processing on the spliced series to obtain a P-th long short-term memory output, where P is initially 2 and is an integer greater than or equal to 2 and less than or equal to Q.
The steps of the second stage are repeated until P equals Q, yielding the Q-th long short-term memory output, on which two-class Softmax classification is performed.
According to some embodiments of the invention, there is provided a computer device comprising: a processor; and a memory having instructions stored thereon, the instructions, when executed on the processor, causing the processor to perform any of the above methods.
According to some embodiments of the invention, there is provided a computer readable storage medium having stored thereon instructions which, when executed on a processor, cause the processor to perform any of the above methods.
The present invention performs time series anomaly detection based on a deep learning model and has the following advantages. The technical solution provided by the invention is end-to-end intelligent detection: no detection threshold needs to be set manually, and detection and judgment are performed entirely by the deep learning model. The solution has a high recall rate, which can be further improved as the data volume increases. The solution likewise has high accuracy, which can also be further improved as the data volume increases. In addition, the solution has wide application scenarios, and the existing model is easy to extend to other scenarios: it is only necessary to match the data format, complete the data labeling work, and add different types of positive and negative samples on the basis of the existing model, whereupon the model can be adapted to the new application scenario through iterative incremental training.
Detailed Description
Time series anomaly detection is of great significance to information technology. The performance indexes of an information technology service are generally used to characterize performance parameters of an information technology system, and may include indexes such as user access volume, query request volume, query success volume, CPU utilization, storage utilization, and network resource utilization. A performance index of an information technology service is usually a set of time series data, and important parameters such as user access volume and server condition can be obtained by monitoring these time series. When an IT service becomes abnormal or fails, the problematic service indexes can be quickly detected through time series anomaly detection, so that scheduling of information technology resources, repair of the anomaly, and similar work can be better carried out, providing users with a stable experience.
The following description provides specific details for a thorough understanding and enabling description of various embodiments of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these details. In some instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the disclosure. The terminology used in the present disclosure is to be understood in its broadest reasonable manner, even though it is being used in conjunction with a particular embodiment of the present disclosure.
First, some terms related to the embodiments of the present disclosure are explained for those skilled in the art:
1. Deep learning: a branch of machine learning; a family of algorithms that perform high-level abstraction of data using multiple processing layers containing complex structures or composed of multiple nonlinear transformations. Deep learning unifies feature extraction and classification/fitting into one framework and learns and extracts features from a data set; it is thus a method capable of automatically learning and extracting features.
2. Positive sample: a sample belonging to the intended (correctly classified) class. Here, a positive sample is a time series sample in which no anomaly exists.
3. Negative sample: a sample belonging to a class other than the intended class. Here, a negative sample is a time series sample in which an anomaly exists.
4. Precision: the proportion of correctly retrieved items (TP) among all actually retrieved items (TP + FP).
5. Recall: the proportion of correctly retrieved items (TP) among all items that should have been retrieved (TP + FN).
6. F1 score: the harmonic mean of precision and recall.
7. MLP (Multi-Layer Perceptron): a fully-connected neural network comprising an input layer, a hidden layer, and an output layer. The input features are connected to the neurons of the hidden layer, which in turn are connected to the neurons of the output layer. In the fully-connected neural network MLP, adjacent layers are fully connected, meaning that every neuron in the previous layer is connected to all neurons in the next layer.
8. LSTM (Long Short-Term Memory): a long short-term memory network is a time-recursive neural network suitable for processing and predicting events with relatively long intervals and delays in a time series.
9. HSDNN (Hierarchical Spliced Deep Neural Network): the stacked-splicing deep neural network proposed in this application. An HSDNN consists of multiple fully-connected neural networks (MLPs). Compared with a conventional fully-connected neural network, an HSDNN is divided into multiple MLPs whose hidden layers are separate and do not communicate with each other. In a splicing layer, two or more independent MLPs are spliced into one larger MLP.
10. WLSTM (Windowed Long Short-Term Memory): an LSTM network structure with variable windows.
11. Time series: a sequence of data points arranged in chronological order. Typically, the time interval of a time series is a constant value (e.g., 1 minute, 5 minutes, etc.). Here, a time series refers to a monitoring-class time series: for example, one monitoring data point is reported every minute, and the minute-by-minute points joined together form a time series. Each data point is stored in the system as a (time, value) pair, where the value may be an index value and the time may be the time at which the index value was acquired; pulling the data points for a given time range yields a set of values. The original time series includes a target data point and historical data points preceding the target data point. The target data point comprises the index value reported at the time point to be detected in the original time series. The historical data points comprise the sequence of index values reported at time points before the time point to be detected, arranged in the original time series in reporting-time order. An index can be used to characterize a performance parameter of an information technology system, and may include indexes such as user access volume, query request volume, query success volume, CPU utilization, storage utilization, and network resource utilization.
12. Random Forest: a classifier comprising a plurality of decision trees, whose output class is the mode of the classes output by the individual trees. A decision tree is a tree-structured prediction model.
FIG. 1 is a schematic diagram of an application scenario 100 of a data anomaly detection method according to an embodiment. Referring to FIG. 1, the application scenario includes a data reporting device 110 and an anomaly detection device 120 connected via a network. The data reporting device 110 is a device for reporting data points, and the anomaly detection device 120 is a device for performing anomaly detection processing on the reported data points. Both the data reporting device 110 and the anomaly detection device 120 may be terminals or servers. The terminal may be a smart television, a desktop computer, or a mobile terminal; in particular, the mobile terminal may be at least one of a smartphone, a tablet, a laptop, a personal digital assistant, a wearable device, and the like. The server may be implemented as a stand-alone server or as a server cluster of multiple physical servers. The number of data reporting devices 110 may be one or more; for example, a plurality of terminals may report their respective data to the anomaly detection device 120. The data reporting device 110 may report data points to the anomaly detection device 120 periodically at certain time intervals. The anomaly detection device 120 can acquire a time series that includes a target data point and historical data points reported before the target data point, arranged in reporting-time order.
FIGS. 2a-2c illustrate user interfaces for alerting on a time series anomaly according to one embodiment of the present invention. In the deep-learning-based time series anomaly detection scheme provided by the invention, the deep learning model can automatically perform intelligent time series anomaly detection on input data and output a label indicating whether the time series is abnormal. The scheme of the invention can be applied to the time series anomaly detection and intelligent alarm functions of various services (such as cloud-based services, instant messaging software, personal space web pages, and the like). When an anomaly occurs in a time series, it is detected quickly and in a timely manner, and the responsible person is notified through the alarm function in the form of the interfaces shown in FIGS. 2a-2c. FIG. 2a shows a schematic diagram of an alarm being sent to a user in the form of a short message at a user terminal (e.g., a smartphone or tablet) in response to detection of a time series anomaly; here, the alarm message may include the alarm object, the item to which the alarm object belongs, the region, the account ID, the alarm policy, the trigger time, and the like. FIG. 2b shows a schematic diagram of an alarm being issued to a user in the form of an instant message (e.g., a QQ message) in communication software at the user terminal; here, the alarm message may include the time point at which the anomaly occurred, the time series curve, a link to the anomaly report, and the like. FIG. 2c shows a schematic diagram of an alarm being issued to a user at a user terminal in the form of a message in an applet (e.g., a WeChat applet); here, the alarm message may include time series curves, statistical maps, and the like of different service data for different times (e.g., one week before, one day before). As will be appreciated by those skilled in the art, the content of the alarm message may include alarms for other items and may take other alarm forms.
FIG. 3 illustrates a flow diagram of an anomaly index detection method 300 according to one embodiment. In step 301, an anomaly detection device (which may be a terminal or a server) acquires an original time series, which may include a target data point and historical data points preceding the target data point. The target data point comprises one or more index values reported at the time point to be detected in the original time series. The historical data points comprise one or more index values (i.e., an index value sequence) reported at time points before the time point to be detected, arranged in the original time series in reporting-time order. To explain the selection of time series data points more clearly, the following description is made in conjunction with FIG. 4.
A diagram 400 of raw time series extraction according to one embodiment is shown in FIG. 4. The abscissa in FIG. 4 represents the acquisition time of the time series data, and the ordinate represents the extraction period of the time series data. Here, three portions of time series data are selected to constitute the original time series, taking a window size of 180 minutes with a sampling interval of 1 minute (i.e., a time granularity of 1 minute) as an example. The first portion includes the current time point 403 to be detected for today and the data points within the window before the current time point 403; a total of 181 data points (including the value at the current time point 403) are acquired for today. The second portion includes the data points within the window before and after the time point 402 one day earlier corresponding to the current time point 403; a total of 361 data points are collected for the one-day-earlier time window. The third portion includes the data points within the window before and after the time point 401 one week earlier corresponding to the current time point 403; a total of 361 data points are collected for the one-week-earlier time window. Thus, with a window size of 180 minutes and a sampling interval of 1 minute, the original time series consists of 903 points in total: 181 data points from today, 361 from one day earlier, and 361 from one week earlier.
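For illustration, the extraction in FIG. 4 can be sketched as follows, assuming a pandas Series indexed by timestamp at 1-minute granularity; the helper name and the use of pandas are illustrative assumptions, not part of the described method:

```python
import pandas as pd

def extract_raw_series(series: pd.Series, t: pd.Timestamp,
                       window_min: int = 180) -> pd.Series:
    """Assemble the 903-point raw time series around the time point t
    to be detected, from a 1-minute-granularity series."""
    w = pd.Timedelta(minutes=window_min)
    day, week = pd.Timedelta(days=1), pd.Timedelta(weeks=1)
    today = series.loc[t - w:t]                        # 181 points incl. t
    one_day = series.loc[t - day - w:t - day + w]      # 361 points
    one_week = series.loc[t - week - w:t - week + w]   # 361 points
    return pd.concat([one_week, one_day, today])       # 903 points total
```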
In step 302, the original time series is input into an anomaly detection model for processing, so as to obtain an anomaly detection result for the target data point. Whether the index reported at the time point to be detected is abnormal is then determined from the anomaly detection result of the target data point. Specifically, the index reported at the time point to be detected is determined to be abnormal in response to the anomaly detection result indicating that the target data point is abnormal, and is determined to be normal in response to the anomaly detection result indicating that the target data point is normal. For example, when the data point corresponding to the user access volume of a WeChat applet on a certain day is detected as abnormal, the access volume of the applet on that day is judged to be abnormal. Here, the anomaly detection model is trained by deep learning; it includes at least the HSDNN or WLSTM models, which will be described in detail later.
Before training an anomaly detection model (i.e., a machine learning model such as HSDNN or WLSTM), a labeled sample data set needs to be prepared. A flow diagram of a method 500 of sample data annotation is shown in FIG. 5. The data in this scheme are derived from massive time series operation and maintenance data 501 from various services (for example, cloud-based services (such as Tencent Cloud), interactive application data (such as the QQ application), and personal space web pages (such as QQ space/Qzone)).
First, at 502, positive sample filtering is performed through statistical, unsupervised, and other methods to preliminarily screen out suspicious sample data; the positive samples identified in this screening are stored in the positive sample data set 505. In one embodiment, the preliminary decision method comprises a statistical decision method. The statistical decision method may include: extracting historical data points from the original time series; determining the mean and standard deviation of the historical data points; determining, from the mean and the standard deviation, the value interval attributable to random error; and identifying the original time series as anomalous in response to the target data point falling outside the value interval. Specifically, the computer device may extract the historical data points other than the target data point from the time series and compute the mean and standard deviation of these historical data points. In one embodiment, the statistical decision algorithm comprises the three-sigma rule (the three-sigma rule of thumb, also known as the Pauta criterion). The three-sigma rule first assumes that a set of measured data contains only random error, computes the standard deviation, and then determines an interval according to a given probability; a value beyond this interval is considered not a random error but a gross error. The three-sigma rule specifically states that values are distributed in $(\mu - \sigma,\ \mu + \sigma)$ with probability 0.6827, in $(\mu - 2\sigma,\ \mu + 2\sigma)$ with probability 0.9545, and in $(\mu - 3\sigma,\ \mu + 3\sigma)$ with probability 0.9973, where $\sigma$ represents the standard deviation and $\mu$ represents the mean. It should be understood that the mean is the mean of the historical data points in the time series, and the standard deviation is the standard deviation of the historical data points in the time series.
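A minimal sketch of the three-sigma check described above (the function name and sample values are illustrative):

```python
import numpy as np

def three_sigma_anomalous(history: np.ndarray, target: float) -> bool:
    """Flag the target point when it leaves the mean ± 3σ random-error band
    computed from the historical data points."""
    mu, sigma = history.mean(), history.std()
    return abs(target - mu) > 3 * sigma

history = np.array([100, 102, 98, 101, 99, 103, 97, 100])  # hypothetical values
print(three_sigma_anomalous(history, 100.5))  # False: within the band
print(three_sigma_anomalous(history, 150.0))  # True: gross error suspected
```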
In another embodiment, the preliminary decision method comprises an unsupervised method. The preliminary anomaly identification of the original time series through the preliminary decision comprises the following steps: extracting each data point in the original time series; classifying the extracted data points through an unsupervised algorithm to obtain a classification result; and performing anomaly judgment on the time series based on the classification result.
Specifically, the computer device can train the unsupervised algorithm in advance by substituting unlabeled training samples into the algorithm's formula for unsupervised machine learning, adjusting the parameters of the formula during training to optimize the algorithm. The computer device may extract each data point in the time series; it should be understood that the extracted data points include the target data point and the historical data points. The computer device can then substitute the extracted data points into the parameter-adjusted formula of the unsupervised algorithm for calculation, so as to classify each data point and obtain a classification result, and can perform anomaly judgment on the time series according to the classification result.
Unsupervised algorithms include at least one of a Recurrent Neural Network (RNN) algorithm, an Isolation Forest algorithm, a One-Class Support Vector Machine (OneClassSVM), an Exponentially Weighted Moving Average (EWMA) algorithm, and the like.
Among them, a Recurrent Neural Network (RNN) is a type of neural network algorithm for processing sequence data. Its essential feature is that there are both internal feedback connections and feed-forward connections between the processing units.
An Isolation Forest is a fast anomaly detection method based on ensemble learning. It has linear time complexity and high accuracy, and is an algorithm that meets the needs of big data processing.
A One-Class Support Vector Machine (OneClassSVM) is a classifier obtained by unsupervised training using training samples of only one class. The trained classifier judges any sample not belonging to that class as "not belonging", rather than assigning it to some other class.
The Exponentially Weighted Moving Average (EWMA) algorithm is a special weighted moving average method.
It will be appreciated that different unsupervised algorithms will yield different classification results.
In one embodiment, when the unsupervised algorithm is a recurrent neural network algorithm, a classification result indicating whether the target data point is abnormal can be output directly. It can be understood that anomaly judgment can then be performed on the time series according to this classification result for the target data point, yielding a decision as to whether the time series is suspected to be abnormal.
In one embodiment, when the unsupervised algorithm is an isolation forest, the classification result includes the average path length of the leaf nodes in which the target data point falls across the trees of the isolation forest. When the average path length is less than or equal to a preset threshold, the time series can be determined to be suspected abnormal; conversely, when the average path length is greater than the preset threshold, the time series can be determined to be normal.
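For illustration, a sketch using scikit-learn's IsolationForest, whose anomaly score is monotone in the average path length described above; the contamination value and the sample data are assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Each reported value is treated as a one-dimensional sample.
points = np.array([100, 102, 98, 101, 350, 99, 103]).reshape(-1, 1)
clf = IsolationForest(contamination=0.15, random_state=0).fit(points)
print(clf.score_samples(points))  # lower score = shorter average path = more isolated
print(clf.predict(points))        # -1 for suspected anomaly, 1 for normal
```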
In one embodiment, when the unsupervised algorithm is a one-class support vector machine algorithm, the classification result indicates whether the target data point belongs to the normal category. When the target data point does not belong to the normal category, the time series can be determined to be suspected abnormal; when it does, the time series can be determined to be normal.
In one embodiment, when the unsupervised algorithm is an exponential weighted moving average algorithm, the computer device may smooth the time series through the exponential weighted moving average algorithm, and determine whether the target data point is within a random error range by using a statistical analysis algorithm with respect to the smoothed time series, and if so, determine that the time series is normal, and if not, determine that the time series is suspected to be abnormal.
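A hedged sketch of that EWMA-based check, using pandas' ewm smoothing; the span, the residual-based error band, and the helper name are illustrative assumptions:

```python
import pandas as pd

def ewma_suspect(series: pd.Series, target: float,
                 span: int = 15, k: float = 3.0) -> bool:
    """Smooth the history with an exponentially weighted moving average
    and test whether the target leaves the resulting random-error band."""
    smoothed = series.ewm(span=span).mean()
    resid = series - smoothed             # deviation from the smoothed curve
    band = k * resid.std()                # statistical random-error band
    return abs(target - smoothed.iloc[-1]) > band
```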
In this embodiment, anomaly judgment is performed on the time series by the unsupervised algorithm, and the unsupervised algorithm is combined with the anomaly detection model obtained through supervised learning, realizing multi-level anomaly detection and improving detection accuracy.
In one embodiment, a plurality of unsupervised algorithms are used. The method then further comprises: obtaining the anomaly decision result corresponding to each unsupervised algorithm; performing joint detection processing according to the anomaly decision results corresponding to the unsupervised algorithms; and, when the result of the joint detection processing indicates that the time series is abnormal, judging that the time series is suspected to be abnormal.
In one embodiment, performing the joint detection processing according to the anomaly decision result corresponding to each unsupervised algorithm includes: judging the time series to be suspected abnormal when the anomaly decision result of any unsupervised algorithm indicates that the time series is abnormal. It can be understood that each unsupervised algorithm has its own shortcomings, and the decision result of any single algorithm may be incomplete or may miss anomalies. The decision results of the unsupervised algorithms are therefore combined: when the result of any of them indicates that the time series is abnormal, the time series is judged to be suspected abnormal. By comprehensively considering the decisions of all the unsupervised algorithms, the preliminary anomaly identification of the time series becomes more accurate.
In one embodiment, performing the joint detection processing according to the anomaly decision result corresponding to each unsupervised algorithm includes: determining the preset weight corresponding to each unsupervised algorithm, and determining the result of the joint detection processing according to the anomaly decision result of each unsupervised algorithm and its corresponding preset weight.
The anomaly decision result of each unsupervised algorithm is either "time series abnormal" or "time series normal". The computer device can determine, from the weight of each unsupervised algorithm and its decision result, a first weighted proportion for the decision "time series abnormal" and a second weighted proportion for the decision "time series normal", compare the two proportions, and take the decision corresponding to the larger value as the result of the joint detection processing.
It can be understood that when the first proportion for "time series abnormal" is greater than the second proportion for "time series normal", "time series abnormal" is taken as the result of the joint detection processing. Conversely, when the first proportion is smaller than the second proportion, "time series normal" is taken as the result of the joint detection processing.
For ease of understanding, an example is given. Suppose there are three unsupervised algorithms A, B, and C with preset weights of 0.4, 0.4, and 0.2, respectively. The decision obtained by algorithm A is "time series abnormal", the decision obtained by algorithm B is "time series abnormal", and the decision obtained by algorithm C is "time series normal". The first proportion for "time series abnormal" is then 0.8 and the second proportion for "time series normal" is 0.2, so the computer device takes "time series abnormal" as the result of the joint detection processing.
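A minimal sketch of this weighted joint decision, reproducing the worked example above (True means the algorithm decided "time series abnormal"; the function name is illustrative):

```python
def joint_decision(results: dict, weights: dict) -> bool:
    """Weighted vote: return True (abnormal) when the weighted share of
    'abnormal' decisions exceeds the share of 'normal' decisions."""
    abnormal = sum(w for name, w in weights.items() if results[name])
    normal = sum(w for name, w in weights.items() if not results[name])
    return abnormal > normal

# A and B (weight 0.4 each) report abnormal, C (weight 0.2) reports normal:
print(joint_decision({"A": True, "B": True, "C": False},
                     {"A": 0.4, "B": 0.4, "C": 0.2}))  # True: 0.8 > 0.2
```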
It can be understood that the result of the joint detection processing is determined according to the anomaly decision result of each unsupervised algorithm and its corresponding preset weight; by comprehensively and reasonably considering the decision of each algorithm, the preliminary anomaly identification of the time series becomes more accurate.
The computer device may determine whether the time series is suspected to be abnormal according to a result of the joint detection processing. And when the result of the joint detection processing indicates that the time sequence is abnormal, the computer equipment judges that the time sequence is suspected to be abnormal. Further, when the result of the joint detection processing indicates that the time series is normal, the computer device may determine that the time series is normal.
It should be noted that the computer device may combine the statistical decision algorithm with at least one unsupervised algorithm to perform the preliminary anomaly identification on the time series.
In one embodiment, the computer device may perform anomaly identification on the time series through a statistical decision algorithm at a first level. After a suspected anomaly is identified, joint detection processing is performed on the time series through a plurality of unsupervised algorithms at a second level. After the suspected anomaly is confirmed by the joint detection, feature extraction is performed on the time series at a third level; the extracted feature data are input into an anomaly detection model obtained through supervised machine learning for further detection, and an anomaly handling policy is invoked when the anomaly detection model outputs a detection result indicating that the target data point is abnormal.
After the positive samples are filtered out and stored in the positive sample data set 505 in step 502, the suspicious samples are input into an annotation platform (for example, the annotation platform described below). In step 503, the samples are manually annotated by operation and maintenance personnel through the annotation platform; the negative samples obtained after manual annotation are stored in the negative sample data set 504, and the positive samples obtained after manual annotation are stored in the positive sample data set 505.
FIG. 6 shows a schematic diagram of a time series annotation platform. The manual annotation of data samples is completed with an annotation tool as shown in FIG. 6. For each sample that needs to be labeled, the system provides not only the current-day curve but also the time series curves of one day before and one week before as reference. Operation and maintenance personnel can compare these curves on the annotation platform and click the corresponding operation button to label the sample as positive or negative. By offering the data of today, one day before, and one week before for comparison and reference, the annotation platform lets annotators concentrate on labeling abnormal samples, which effectively improves the efficiency of time series annotation and yields a large amount of labeled data for the training and testing of the model.
FIG. 7 shows a flow diagram of a method 700 for offline training and online detection with the deep learning model, according to an embodiment of the invention. In offline model training, the labeled sample data set (including the positive sample set and the negative sample set) 701 is first subjected to simple preprocessing 702. In one embodiment, the total sample data size is 232818 sets of sample data, divided into a training data set and a test data set. The training data set includes 151606 sets of sample data, while the test data set includes 81212 sets of sample data; the training data set contains 102675 sets of negative samples and 48931 sets of positive samples, and the test data set contains 7330 sets of negative samples and 73882 sets of positive samples. The preprocessing mainly consists of min-max normalization of the sample data, i.e., scaling the time series into the range [0, 1] as input to the deep learning model. Two deep learning model schemes, HSDNN and WLSTM, are mainly proposed herein. In step 703, the preprocessed data are input into the deep learning model, which learns from the time series data, and in step 704 a label indicating whether the time series data are abnormal is output. Parameters in the deep learning model are then adjusted based on this label.
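The min-max normalization used in the preprocessing above can be sketched as follows (the guard for a constant series is an added assumption):

```python
import numpy as np

def min_max_normalize(series: np.ndarray) -> np.ndarray:
    """Scale a raw time series into [0, 1] before it enters the model."""
    lo, hi = series.min(), series.max()
    if hi == lo:               # constant series: map to all zeros (assumption)
        return np.zeros_like(series, dtype=float)
    return (series - lo) / (hi - lo)
```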
When the deep learning model is used for online detection: in step 706, the real-time series data to be detected are first obtained. In step 707, positive sample filtering is performed using the statistical and unsupervised methods described above; their specific implementation is described above with respect to FIG. 5. In step 708, the screened suspicious samples are preprocessed; the preprocessing again mainly consists of min-max normalization, i.e., scaling the time series into the range [0, 1]. In step 709, the deep learning model trained offline is loaded to perform anomaly detection on the preprocessed time series data. In step 710, the abnormal samples detected by the deep learning model are output. Thereafter, the data set used for offline training of the deep learning model is updated with the output abnormal samples.
In one embodiment, training the anomaly detection model by deep learning comprises: i. acquiring a sample time series and a label corresponding to the sample time series; ii. inputting the sample time series and the label into the anomaly detection model, and processing them by the anomaly detection model to obtain an anomaly detection result for the target data point; iii. adjusting the anomaly detection model based on the anomaly detection result and the label; and iv. iterating steps i–iii M times, where M is a preset number of iterations.
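Steps i–iv can be sketched with a generic gradient-based loop; PyTorch, the optimizer choice, and the loss function are illustrative assumptions rather than the claimed training procedure:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, samples: torch.Tensor, labels: torch.Tensor,
          m: int = 100) -> None:
    """i. labelled sample series are given; ii. the model processes them;
    iii. the model is adjusted from result vs. label; iv. repeat M times."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()       # pairs with the 2-class Softmax head
    for _ in range(m):                    # iv. iterate M times
        logits = model(samples)           # ii. anomaly detection result
        loss = loss_fn(logits, labels)    # iii. compare with labels ...
        optimizer.zero_grad()
        loss.backward()                   # ... and adjust the model
        optimizer.step()
```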
In one embodiment, the anomaly detection model may be a stacked-splicing fully-connected neural network (HSDNN) structure. FIG. 8 shows a schematic diagram of a network structure 800 of the stacked-splicing fully-connected neural network HSDNN. The HSDNN is a network structure formed by splicing multiple fully-connected neural network structures; each crossed icon shown in FIG. 8 represents one fully-connected neural network structure. The HSDNN differs from a conventional fully-connected neural network in that it comprises many block-wise fully-connected neural networks (MLPs): the hidden layers of the individual MLP blocks in FIG. 8 are separate and do not communicate with each other. In the splicing layer in FIG. 8 (see the shaded part of the figure), two or more independent fully-connected neural networks are spliced into one larger fully-connected neural network, forming a locally fully-connected network structure. Because many block-wise fully-connected neural networks are involved, this structure is referred to herein as the stacked-splicing fully-connected neural network HSDNN. The HSDNN comprises a plurality of parallel processing channels and, compared with a conventional MLP, has the following advantage: through the splicing of multiple parallel processing channels (window transformation, downsampling, and segment aggregation), the model can better capture both the global and the local characteristics of the time series, achieving higher recall, accuracy, and F1 score.
In the network 800, the HSDNN includes three data input modules: a window transform module 802, a downsampling module 803, and a segment aggregation module 804, which respectively form three parallel processing channels: a first channel 806, a second channel 807, and a third channel 808. The first channel 806 is configured to perform windowing on the original time series with different window sizes to generate a plurality of windowed time series having different data lengths, and to perform first fully-connected neural network processing on the plurality of windowed time series. Here, the window sizes are selected as 10 minutes in block 8021, 30 minutes in block 8022, 60 minutes in block 8023, and 180 minutes in block 8024, the last being the same as the original time series. The second channel 807 is configured to perform downsampling on the original time series with different sampling intervals to generate a plurality of downsampled time series having different time resolutions, and to perform second fully-connected neural network processing on the plurality of downsampled time series. Here, the sampling intervals are selected as 18 minutes in block 8031, 6 minutes in block 8032, 3 minutes in block 8033, and 1 minute in block 8034, the last being the same as the original time series. The third channel 808 is configured to determine the mean of each of a plurality of segments of the original time series to generate a mean time series, and to perform third fully-connected neural network processing on the mean time series. Here, by averaging the points in every 30 minutes of the original time series, the 903 points of the original time series become 31 points through segment aggregation (7 points for today, including the current time point; 12 points for one day before; and 12 points for one week before). The outputs of the three parallel processing channels are spliced at 805, and two-class Softmax classification is performed on the spliced output.
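The three per-channel input transforms can be sketched as below; this is a simplified per-series illustration (the document aggregates the today / one-day-before / one-week-before sections separately), and the helper names are assumptions:

```python
import numpy as np

def window(series: np.ndarray, size: int) -> np.ndarray:
    """First channel: keep the trailing `size` points (10/30/60/180 min)."""
    return series[-size:]

def downsample(series: np.ndarray, interval: int) -> np.ndarray:
    """Second channel: keep every `interval`-th point (18/6/3/1 min)."""
    return series[::interval]

def segment_mean(series: np.ndarray, seg: int = 30) -> np.ndarray:
    """Third channel: mean per 30-point segment, e.g. 903 points -> 31."""
    n = len(series) // seg * seg
    head = series[:n].reshape(-1, seg).mean(axis=1)
    tail = series[n:]
    return np.append(head, tail.mean()) if tail.size else head
```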
In one embodiment, the following steps are performed for the first channel and the second channel. Taking the first channel as an example, the plurality of windowed time series includes N windowed time series, N being an integer greater than or equal to 2, and the first fully-connected neural network processing includes (a sketch of this loop is given after the list):
1) inputting the N windowed time series into respective corresponding fully-connected neural networks to obtain N corresponding fully-connected neural network outputs;
2) splicing the i-th fully-connected neural network output and the (i+1)-th fully-connected neural network output of the N outputs to obtain an i-th spliced output, where i is an integer variable with an initial value of 1, and performing the following loop until i equals N:
inputting the i-th spliced output into an intermediate fully-connected neural network to obtain an intermediate fully-connected neural network output;
incrementing i; and
splicing the intermediate fully-connected neural network output with the (i+1)-th fully-connected neural network output of the N outputs to obtain an i-th spliced output; and
3) providing the final intermediate fully-connected neural network output as the output of the first channel.
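A hedged PyTorch sketch of the first-channel loop in steps 1)–3); the module shapes are assumed to line up, and mid_mlps names the intermediate fully-connected networks that consume each spliced output:

```python
import torch
import torch.nn as nn

def first_channel(windowed: list, mlps: nn.ModuleList,
                  mid_mlps: nn.ModuleList) -> torch.Tensor:
    """Hierarchical splicing over N windowed series (N = len(windowed))."""
    outs = [mlp(x) for mlp, x in zip(mlps, windowed)]  # 1) N MLP outputs
    spliced = torch.cat([outs[0], outs[1]], dim=-1)    # 2) 1st spliced output
    for i in range(1, len(outs)):                      # loop until i equals N
        mid = mid_mlps[i - 1](spliced)                 # intermediate MLP output
        if i + 1 < len(outs):                          # splice with next output
            spliced = torch.cat([mid, outs[i + 1]], dim=-1)
    return mid                                         # 3) channel output
```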
As will be appreciated by those skilled in the art, the number of channels (for example, an upsampling channel may be added), the number of spliced layers in each channel, the window sizes, the sampling intervals, the segment length in the segment aggregation, and so on may be adjusted as appropriate.
In one embodiment, the anomaly detection model may be a windowed long short-term memory (WLSTM) network. FIG. 9 shows a schematic diagram of the structure of a windowed long short-term memory network 900. An LSTM (long short-term memory) network is a time-recursive neural network with a chain of repeating neural network modules, suitable for processing and predicting events with relatively long intervals and delays in a time series. Whereas the repeating module of a recurrent neural network (RNN) is a single-layer neural network (a tanh layer), the repeating module (cell) in the chain structure of an LSTM has four interacting neural network layers. The core concepts of the LSTM are the cell state and the "gate" structures. The cell state corresponds to the path of information transmission, allowing information to be passed along the sequence; it can carry relevant information throughout the processing of the sequence, so that even information from early time steps can reach cells at later time steps, overcoming the effects of short-term memory. Information is added or removed through "gate" structures, which learn during training which information to keep or forget.
The WLSTM network 900 includes a plurality of long short-term memory (LSTM) units in series. In one embodiment, the WLSTM network 900 includes two or more long short-term memory units connected in series, comprising a first long short-term memory unit through a Q-th long short-term memory unit, Q being an integer greater than or equal to 2, the units being configured to perform the following steps.
The first stage: the first long short-term memory unit is configured to perform windowing on the original time series using a first window size to generate a first windowed time series having a first data length, and to perform long short-term memory processing on the first windowed time series to obtain a first long short-term memory output. The second stage: the P-th long short-term memory unit is configured to perform windowing on the original time series with a P-th window size to generate a P-th windowed time series having a P-th data length, to splice the (P-1)-th long short-term memory output with the P-th windowed time series, and to perform long short-term memory processing on the spliced series to obtain a P-th long short-term memory output, where P is initially 2 and is an integer greater than or equal to 2 and less than or equal to Q. The steps of the second stage are repeated until P equals Q, yielding the Q-th long short-term memory output, on which two-class Softmax classification is performed.
In another embodiment, the plurality of LSTM units includes four LSTM units: a first LSTM unit, a second LSTM unit, a third LSTM unit, and a fourth LSTM unit. The first LSTM unit is configured to perform windowing on the original time series with a first window size 901 (here, 7 minutes) to generate a first windowed time series having a first data length, and to perform LSTM processing on the first windowed time series, yielding a first LSTM output. The second LSTM unit is configured to perform windowing on the original time series with a second window size 902 (here, 20 minutes) to generate a second windowed time series having a second data length, to splice the first LSTM output with the second windowed time series, and to perform LSTM processing on the spliced series to obtain a second LSTM output. The third LSTM unit is configured to perform windowing on the original time series with a third window size 903 (here, 60 minutes) to generate a third windowed time series having a third data length, to splice the second LSTM output with the third windowed time series, and to perform LSTM processing on the spliced series to obtain a third LSTM output. The fourth LSTM unit is configured to perform windowing on the original time series with a fourth window size 904 (here, 180 minutes, the same as the original time series) to generate a fourth windowed time series having a fourth data length, to splice the third LSTM output with the fourth windowed time series, and to perform LSTM processing on the spliced series to obtain a fourth LSTM output. Two-class Softmax classification is performed on the fourth LSTM output, outputting the probability of being abnormal or normal. Through this adjustment of the time window size, local features and global features of the original time series are more easily captured.
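A hedged PyTorch sketch of this four-unit WLSTM; FIG. 9 does not specify how a shorter LSTM output is spliced onto a longer window, so broadcasting the previous cell's final hidden state across the new window is an assumption:

```python
import torch
import torch.nn as nn

class WLSTM(nn.Module):
    """Chained LSTM cells over growing windows (7/20/60/180 minutes)."""
    def __init__(self, hidden: int = 32, windows=(7, 20, 60, 180)):
        super().__init__()
        self.windows = windows
        self.cells = nn.ModuleList(
            [nn.LSTM(1, hidden, batch_first=True)] +        # first cell
            [nn.LSTM(1 + hidden, hidden, batch_first=True)  # spliced input
             for _ in windows[1:]])
        self.head = nn.Linear(hidden, 2)    # feeds the 2-class Softmax

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, 180, 1); each stage sees the trailing `win` points
        summary = None
        for win, cell in zip(self.windows, self.cells):
            x = series[:, -win:, :]
            if summary is not None:         # splice previous LSTM output
                x = torch.cat([x, summary.expand(-1, win, -1)], dim=-1)
            _, (h, _) = cell(x)
            summary = h[-1].unsqueeze(1)    # (batch, 1, hidden)
        return self.head(summary.squeeze(1))  # logits for 2-class Softmax

# Usage: WLSTM()(torch.randn(8, 180, 1)) -> logits of shape (8, 2)
```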
FIG. 10 is a schematic diagram of an anomaly index detection apparatus 1000 according to an embodiment of the present invention. The apparatus 1000 includes at least an acquisition module 1001 and an anomaly detection module 1002. The acquisition module 1001 is configured to acquire an original time series, where the original time series includes a target data point and historical data points before the target data point, the target data point includes an index value reported at a time point to be detected in the original time series, and the historical data points include a sequence of index values reported at time points before the time point to be detected, arranged in the original time series in reporting-time order. The anomaly detection module 1002 is configured to input the original time series into an anomaly detection model, which processes the original time series to obtain an anomaly detection result for the target data point, and to determine whether the index reported at the time point to be detected is abnormal according to the anomaly detection result for the target data point, the anomaly detection model being trained by deep learning.
For the two deep learning models provided by the invention, based on the 151606-set training data set and the 81212-set test data set, usable WLSTM and HSDNN deep models are obtained through parameter tuning. As shown in Table 1 below, the recall of the WLSTM model reaches 89.78%, its accuracy reaches 94.93%, and its F1 score reaches 92.28%; the recall of the HSDNN model reaches 91.04%, its accuracy reaches 95.18%, and its F1 score reaches 93.06%.
Compared with the Opprentice scheme of Dapeng Liu et al., "Opprentice: Towards Practical and Automatic Anomaly Detection Through Machine Learning" (Internet Measurement Conference, 2015), which achieved an accuracy of 83% at a recall above 66%, the recall and accuracy of the deep-learning-based scheme are greatly improved.
| Model name | Recall | Accuracy | F1 score |
| HSDNN | 91.04% | 95.18% | 93.06% |
| WLSTM | 89.78% | 94.93% | 92.28% |
| Opprentice scheme | >66% | 83% | 73.53% |
Table 1: Test results of the proposed deep learning models.
The time series to be detected is extracted from the data source in real time. It first passes through positive sample filtering by the statistical and unsupervised algorithms, after which suspicious time series data are output and passed to the preprocessing layer for min-max normalization. The offline-trained deep learning model is then loaded for anomaly detection. If a sample is detected as a negative sample (abnormal), a corresponding alarm is sent. Meanwhile, the output abnormal samples and their labels can be fed back into the sample library used for offline model training, to optimize the offline model.
As described above, the technical solution proposed by the present invention has low maintenance cost and low labor cost: the conventional manual threshold-setting method requires substantial manpower to maintain the thresholds, and feature-engineering-based machine learning methods require experts to mine a large number of features for different business data. The deep-learning-based time-series anomaly detection model avoids both threshold setting and feature engineering, and therefore has low maintenance and labor costs.
The technical solution provided by the invention has high accuracy, owing to the collection and labeling of data from overseas operation and maintenance. Time series are ubiquitous in the monitoring of various services; in the field of operation and maintenance monitoring in particular, service indexes are reported in time-series form. For example, in the operation and maintenance monitoring of Internet enterprises, each service index can be reported as a time series that the monitoring system then tracks, and an anomaly in the time series can reflect a problem with the corresponding service index. Such service indexes may include, for example, the message sending volume or the memory usage of communication software, and an anomaly detection result may indicate that the corresponding service index has exceeded a preset threshold. A large amount of such data can be used to train and test the deep learning model, yielding good model performance; moreover, the accuracy of the model increases as the amount of training data grows, which is a particularly important advantage of deep learning models.
The technical solution provided by the invention also has a high recall rate (wide coverage) and can detect anomalies of various types: its anomaly coverage is significantly higher than that of feature-engineering-based learning methods, with a test recall rate of about 90% and an accuracy rate stable above 95%.
Finally, the technical solution provided by the invention is easy to extend: only data format matching is needed, iterative incremental learning can be performed on the existing model, and extension to other application scenarios is straightforward.
FIG. 11 shows a schematic diagram of an example computing device 1100 for abnormal index detection. Computing device 1100 can be any of a variety of different types of devices, such as a server computer, a device associated with a client (e.g., a client device), a system on a chip, and/or any other suitable computing device or computing system.
Computing device 1100 may include at least one processor 1102, a memory 1104, communication interface(s) 1106, a display device 1108, other input/output (I/O) devices 1110, and one or more mass storage devices 1112, which may be capable of communicating with each other, such as through a system bus 1114 or other appropriate connection.
The processor 1102 may be a single processing unit or multiple processing units, all of which may include single or multiple computing units or multiple cores. The processor 1102 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any device that manipulates signals based on operational instructions. Among other capabilities, the processor 1102 may be configured to fetch and execute computer-readable instructions stored in the memory 1104, the mass storage device 1112, or other computer-readable media, such as program code of an operating system 1116, program code of an application 1118, program code of other programs 1120, etc., to implement the abnormal index detection methods provided by embodiments of the present invention.
The memory 1104 and the mass storage device 1112 are examples of computer storage media for storing instructions that are executed by the processor 1102 to carry out the various functions described above. By way of example, the memory 1104 may generally include both volatile and nonvolatile memory (e.g., RAM, ROM, and the like). In addition, the mass storage device 1112 may generally include a hard disk drive, a solid state drive, removable media including external and removable drives, memory cards, flash memory, floppy disks, optical disks (e.g., CD, DVD), storage arrays, network attached storage, storage area networks, and the like. The memory 1104 and the mass storage device 1112 may both be referred to herein collectively as memory or computer storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that can be executed by the processor 1102 as a particular machine configured to implement the operations and functions described in the examples herein.
A number of program modules can be stored on the mass storage device 1112. These programs include an operating system 1116, one or more application programs 1118, other programs 1120, and program data 1122, and they can be loaded into the memory 1104 for execution. Examples of such applications or program modules may include, for instance, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: the acquisition module 1001, the anomaly detection module 1002, and/or further embodiments described herein.
Although illustrated in FIG. 11 as being stored in the memory 1104 of computing device 1100, the modules 1116, 1118, 1120, and 1122, or portions thereof, may be implemented using any form of computer-readable media that is accessible by computing device 1100. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer storage media and communication media.
Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. Computer storage media, as defined herein, does not include communication media.
Computing device 1100 may also include one or more communication interfaces 1106 for exchanging data with other devices, such as over a network, a direct connection, or the like, as previously discussed. The one or more communication interfaces 1106 can facilitate communication within a variety of networks and protocol types, including wired networks (e.g., LAN, cable, etc.) and wireless networks (e.g., WLAN, cellular, satellite, etc.), the Internet, and so forth. The communication interface 1106 may also provide for communication with external storage devices (not shown), such as in a storage array, network attached storage, storage area network, or the like.
In some examples, a display device 1108, such as a monitor, may be included for displaying information and images. Other I/O devices 1110 may be devices that receive various inputs from a user and provide various outputs to the user, and may include a touch input device, a gesture input device, a camera, a keyboard, a remote control, a mouse, a printer, audio input/output devices, and so forth.
Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, the indefinite article "a" or "an" does not exclude a plurality, and "a plurality" means two or more. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.