Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
Fig. 1 is a diagram of an application scenario of the data anomaly detection method in one embodiment. Referring to Fig. 1, the application scenario includes a data reporting device 110 and an anomaly detection device 120 connected through a network. The data reporting device 110 is a device that reports data points, and the anomaly detection device 120 is a device that performs anomaly detection processing on the reported data points. The data reporting device 110 and the anomaly detection device 120 may each be a terminal or a server. The terminal may be a smart television, a desktop computer, or a mobile terminal, and the mobile terminal may include at least one of a mobile phone, a tablet computer, a laptop computer, a personal digital assistant, a wearable device, and the like. The server may be implemented as an independent server or as a server cluster composed of multiple physical servers. There may be one or more data reporting devices 110; for example, multiple terminals each report their own data to the anomaly detection device 120.
The data reporting device 110 may report data points to the anomaly detection device 120 periodically at a certain time interval. The anomaly detection device 120 may obtain a time series that includes a target data point and historical data points reported before the target data point, where the target data point and the historical data points are arranged in chronological order of reporting. The anomaly detection device 120 may perform primary anomaly identification on the time series by means of a primary decision method; the primary decision method is different from a supervised machine learning algorithm. When the time series is identified as suspected anomalous, the anomaly detection device 120 may perform feature extraction on the time series. The anomaly detection device 120 may input the extracted feature data into an anomaly detection model and output an anomaly detection result for the target data point, where the anomaly detection model is obtained by training with a supervised machine learning algorithm.
Fig. 2 is a flowchart of the data anomaly detection method in one embodiment. This embodiment is described mainly by taking the application of the data anomaly detection method to a computer device as an example; the computer device may be the anomaly detection device 120 in Fig. 1. Referring to Fig. 2, the method specifically includes the following steps:
S202: Obtain a time series. The time series includes a target data point and historical data points reported before the target data point, and the target data point and the historical data points are arranged in chronological order of reporting.
It can be understood that a time series is a sequence formed by a group of data points arranged in chronological order of reporting.
Fig. 3 is a graphical schematic diagram of a time series in one embodiment. To make the time series easier to understand intuitively, it is explained with reference to Fig. 3. Referring to Fig. 3, the horizontal axis is the time axis and the vertical axis is the reported number of requests; for example, the number of requests reported at 16:25 is 20730. Arranging the reported data points in chronological order of reporting forms the time series, and curve 302 in Fig. 3 is an intuitive graphical representation of that time series.
In the embodiments of the present application, the time series includes a target data point and historical data points. In the obtained time series, the target data point and the historical data points are arranged in chronological order of reporting. The time series is therefore a sequence containing the target data point and the historical data points arranged in chronological order of reporting.
The target data point is the data point on which anomaly detection needs to be performed, that is, the data point that needs to be checked for being anomalous. A historical data point is a data point reported before the target data point.
In one embodiment, the target data point is the data point reported at the current time. In another embodiment, the target data point may also be a specified data point. It can be understood that any data point requiring anomaly detection may be specified as the target data point.
In one embodiment, step S202 includes: determining the target data point; obtaining the historical data points reported before the target data point; and arranging the historical data points and the target data point in chronological order of reporting to obtain the time series.
In one embodiment, obtaining the historical data points reported before the target data point includes: obtaining the historical data points reported within a preset duration before the reporting time of the target data point. For example, if the preset duration is 3 hours, the historical data points reported within the 3 hours before the reporting time of the target data point can be obtained.
In another embodiment, obtaining the historical data points reported before the target data point includes: determining the period-over-period reporting time corresponding to the reporting time of the target data point, and obtaining the historical data points within a preset duration before and/or a preset duration after the period-over-period reporting time.
It can be understood that, assuming the reporting time of the target data point is 14:00 in the current period, the period-over-period reporting time is 14:00 in the previous period. A period may be measured in units of a day, a week, a month, or the like.
For example, suppose the reporting time of the target data point is 14:00 on January 2, 2000 and the preset duration is 3 hours. If one day is taken as a period, the period-over-period reporting time may be 14:00 on January 1, 2000, and the obtained historical data points may be those reported at 14:00 on January 1, 2000 and within the 3 hours before and the 3 hours after it. Similarly, if one week is taken as a period, the period-over-period reporting time may be 14:00 on December 26, 1999, and the obtained historical data points may be those reported at 14:00 on December 26, 1999 and within the 3 hours before and the 3 hours after it.
Fig. 4 is a schematic diagram of historical data point selection in one embodiment. To make the selection of historical data points clearer, it is explained with reference to Fig. 4. In Fig. 4, the horizontal axis represents the reporting time and the vertical axis represents the period; the period may be one day or one week. Referring to Fig. 4, historical data points may be selected in mode (1), sequential (preceding-window) selection, that is, selecting the historical data points reported within the 180 minutes before the reporting time point 402 of the target data point. Alternatively, either of the period-over-period modes (2) and (3) may be used. Mode (2) takes one day as a period, so the period-over-period reporting time point of the reporting time point 402 of the target data point is 404, and the historical data points at the time point 404 and within the 180 minutes before and after it can be obtained. Mode (3) takes one week as a period, so the period-over-period reporting time point of the reporting time point 402 is 406, and the historical data points at the time point 406 and within the 180 minutes before and after it can be obtained.
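The selection modes described above can be sketched as follows. This is only an illustrative sketch: the function name `select_history`, the `(timestamp, value)` tuple layout, and the choice of a half-open preceding window are assumptions made for the example and are not specified by the embodiment.

```python
from datetime import datetime, timedelta

def select_history(points, target_time, window=timedelta(minutes=180), period=None):
    """Select historical data points for a target reporting time.

    points: iterable of (timestamp, value) tuples (hypothetical layout).
    period=None  -> preceding-window selection: points reported within
                    `window` before target_time.
    period=delta -> period-over-period selection: points reported within
                    `window` before and after (target_time - period).
    """
    if period is None:
        start, end = target_time - window, target_time
        # The target itself is excluded, so the upper bound is open.
        return sorted((t, v) for t, v in points if start <= t < end)
    anchor = target_time - period
    start, end = anchor - window, anchor + window
    return sorted((t, v) for t, v in points if start <= t <= end)
```

With hourly data and a target reported at 14:00 on January 2, 2000, `select_history(points, target)` would return the points from 11:00 to 13:00 of that day, while `period=timedelta(days=1)` would return the points from 11:00 to 17:00 on January 1, 2000.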
S204: Perform primary anomaly identification on the time series by means of a primary decision method.
It should be noted that the primary decision method is different from a supervised machine learning algorithm.
The primary decision method is a method of performing primary anomaly identification on the time series to determine whether the time series is anomalous. It can be understood that "primary decision method" is a general term: any method that is different from a supervised machine learning algorithm and is capable of performing primary anomaly identification on a time series may be called a primary decision method.
The primary decision method may include a statistical decision algorithm and/or an unsupervised algorithm. A statistical decision algorithm is a method that determines through statistical analysis whether a time series is anomalous. An unsupervised algorithm is an algorithm that performs machine learning training on unlabeled training samples in order to discover structural knowledge in the training sample set.
S206: When the time series is identified as suspected anomalous, perform feature extraction on the time series.
Specifically, the computer device may perform primary anomaly identification on the time series by means of the primary decision method; the result of primary anomaly identification is either normal or suspected anomalous. When the time series is identified as normal, the subsequent anomaly detection processing need not be continued. When the time series is identified as suspected anomalous, the computer device may perform feature extraction on the time series, that is, analyze the features of the time series and extract feature data.
It can be understood that the computer device may perform feature extraction on the time series along multiple dimensions. In one embodiment, the computer device may perform feature extraction on the time series in the time-domain dimension and the frequency-domain dimension.
The time domain describes the relationship of a mathematical function or physical signal to time. The frequency domain refers to analyzing the part of a function or signal that is related to frequency rather than the part related to time; the term is the counterpart of "time domain".
In one embodiment, step S206 includes: when the time series is identified as suspected anomalous, extracting corresponding time-domain feature data from the time series in the time domain; and/or performing a frequency-domain transform on the time series and extracting corresponding frequency-domain feature data from the transformed time series in the frequency domain.
Time-domain feature data is feature data extracted in the time domain. Frequency-domain feature data is feature data extracted in the frequency domain.
In one embodiment, extracting corresponding time-domain feature data from the time series in the time domain includes: performing statistical analysis on the time series to obtain statistical feature data; fitting the trend distribution of the time series to obtain fitting feature data; and extracting the feature data in the time series that is used for classification to obtain classification feature data.
It can be understood that the time-domain feature data includes at least one of statistical feature data, fitting feature data, classification feature data, and the like.
Statistical feature data is the feature data obtained by performing statistical analysis on the time series. Fitting feature data is the feature data obtained by fitting the trend distribution of the time series.
Classification feature data is feature data that indicates the class to which the time series belongs. In one embodiment, the classes to which a time series may belong include shapes such as a spike type, a steady type, or an oscillating type. It can be understood that the feature data in the time series used for classification is the data that indicates the class to which the time series belongs.
In one embodiment, the computer device may extract the statistical feature data, fitting feature data, and classification feature data of the time series through feature engineering. Feature engineering is essentially an engineering activity whose purpose is to extract feature data from raw data for use by algorithms and/or models.
In one embodiment, the computer device may extract the statistical feature data, fitting feature data, and classification feature data according to the numerical statistics, algorithms, or features shown in Table 1, respectively.
Table 1
This is explained with reference to Table 1. The computer device may obtain statistical feature data by computing numerical statistics of the time series such as extreme values (maximum, minimum, and so on), the mean, period-over-period comparisons, and sequential comparisons. The computer device may fit the trend distribution of the time series with algorithms such as various moving average algorithms and deep learning algorithms to obtain fitting feature data. The computer device may analyze the time series for entropy features, value distribution features, and wavelet analysis features, and determine the class to which the time series belongs from these features to obtain classification feature data.
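As a rough illustration of the statistical and fitting features described above, the following sketch computes a handful of them with the Python standard library. The function name, the dictionary keys, the use of an exponentially weighted moving average as the fitting method, and the smoothing factor `alpha` are all assumptions chosen for the example; the embodiment does not prescribe a particular feature set.

```python
from statistics import mean, pstdev

def time_domain_features(history, target, last_period_value=None, alpha=0.3):
    """Compute example time-domain features (hypothetical feature names).

    history: non-empty list of historical values in reporting order.
    target: value of the target data point.
    last_period_value: value reported at the same time in the previous
        period, if available (used for the period-over-period ratio).
    """
    # Fitting feature: fit the trend with an exponentially weighted
    # moving average and measure how far the target is from the fit.
    fitted = history[0]
    for x in history[1:]:
        fitted = alpha * x + (1 - alpha) * fitted

    prev = history[-1]
    return {
        # Statistical features
        "max": max(history),
        "min": min(history),
        "mean": mean(history),
        "std": pstdev(history),
        # Sequential comparison: target versus the immediately preceding point
        "sequential_ratio": target / prev if prev else None,
        # Period-over-period comparison: target versus the same time last period
        "period_ratio": target / last_period_value if last_period_value else None,
        # Fitting feature
        "fit_residual": target - fitted,
    }
```

The resulting dictionary could then be flattened into a feature vector for the anomaly detection model; classification features such as entropy or wavelet features are omitted here for brevity.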
It can be understood that a time series is normally in the time domain. The computer device may perform a frequency-domain transform on the time series to convert it to the frequency domain, and extract the corresponding frequency-domain feature data from the transformed time series in the frequency domain.
In one embodiment, the computer device may convert the time series from the time domain to the frequency domain by means of a Fourier transform. It can be understood that the Fourier transform is a method for analyzing signals: it can analyze the components of a signal, and a signal can also be synthesized from these components. Here the Fourier transform is used to analyze the components of a time-domain signal that is difficult to process in its original form, and to convert these components into a frequency-domain signal that is easy to analyze. That is, the signal components of the time series in the time domain are analyzed and converted into a frequency-domain signal, yielding the transformed time series in the frequency domain.
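To make the frequency-domain step concrete, here is a minimal sketch that applies a naive discrete Fourier transform to a short time series and derives a few example features. The direct O(n²) DFT is used only to keep the example dependency-free (a practical system would more likely use an FFT routine), and the feature names are hypothetical rather than taken from the embodiment.

```python
import cmath
import math

def dft_magnitudes(series):
    """Return normalized DFT magnitudes for bins 0..n//2 of a real series."""
    n = len(series)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(series))
        mags.append(abs(s) / n)
    return mags

def frequency_domain_features(series):
    """Example frequency-domain features (hypothetical names).

    Requires at least two samples so that a non-DC bin exists.
    """
    mags = dft_magnitudes(series)
    ac = mags[1:]  # skip the DC (zero-frequency) bin
    idx = max(range(len(ac)), key=ac.__getitem__)
    return {
        "dc": mags[0],                  # average level of the series
        "dominant_bin": idx + 1,        # index of the strongest periodic component
        "dominant_magnitude": ac[idx],  # strength of that component
    }
```

For a pure sinusoid that completes two cycles over 16 samples, the strongest component falls in bin 2, which is the kind of periodic structure such features are meant to expose.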
It can be understood that the computer device may extract at least one of the time-domain feature data and the frequency-domain feature data. That is, the computer device may extract only the time-domain feature data, only the frequency-domain feature data, or both.
It should be noted that extracting time-domain feature data reflects the features in the time dimension, so that the extracted feature data more accurately represents the characteristics of the time series. Frequency-domain feature data intuitively reflects the features in the frequency domain and is easier to extract than time-domain feature data, which improves the efficiency of feature extraction. Moreover, extracting both time-domain and frequency-domain feature data captures the features of the time series along multiple dimensions, making the extracted feature data more comprehensive and thereby improving the accuracy of anomaly detection.
S208: Input the extracted feature data into an anomaly detection model and output an anomaly detection result for the target data point. The anomaly detection model is obtained by training with a supervised machine learning algorithm.
Specifically, the computer device may perform machine learning training in advance using a supervised machine learning algorithm to obtain the anomaly detection model. It can be understood that the anomaly detection model is a machine learning model capable of detecting anomalous data points; that is, the anomaly detection model can be used to detect whether the target data point is anomalous.
The computer device may input the feature data obtained by feature extraction on the time series into the anomaly detection model. The computer device may analyze and process the feature data through the anomaly detection model and output the anomaly detection result for the target data point.
It can be understood that the anomaly detection result for the target data point is either that the target data point is normal or that the target data point is anomalous.
Fig. 5 is a graphical schematic diagram of anomaly detection results in one embodiment. To make the anomaly detection results easier to understand intuitively, they are explained with reference to Fig. 5. Fig. 5 is a graphical representation of the anomaly detection results obtained by performing anomaly detection processing on a series of target data points. Referring to Fig. 5, the data point enclosed by circle 502 deviates markedly from the normal curve, which indicates that the data point reported at 8:50 on 2017-10-19 is anomalous. Thus, when anomaly detection is performed with the data point reported at 8:50 as the target data point and the preset duration is assumed to be 3 hours, the historical data points within the preceding 3 hours can be obtained, and the historical data points and the target data point can be arranged in order of reporting time to obtain the time series.
It can be understood that, when the anomaly detection result indicates that the target data point is anomalous, the computer device may invoke a corresponding anomaly handling strategy according to the anomaly detection result for the target data point. An anomaly handling strategy is the handling method applied to an anomalous target data point.
In one embodiment, the anomaly handling strategy includes triggering alert information when a preset number of consecutive anomalous target data points is detected.
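One simple way to realize such a strategy is sketched below. The function name `should_alert`, the boolean encoding of detection results, and the default of three consecutive anomalies are illustrative assumptions; the embodiment only states that the number is preset.

```python
def should_alert(results, required=3):
    """Return True when the latest `required` detection results are all anomalous.

    results: list of booleans in reporting order, True meaning the
        corresponding target data point was detected as anomalous.
    required: preset number of consecutive anomalies (assumed value).
    """
    if len(results) < required:
        return False
    return all(results[-required:])
```

A caller would append each new anomaly detection result to `results` and trigger the alert whenever the function returns True.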
The above data anomaly detection method obtains a time series that includes a target data point and the historical data points reported before the target data point, with the target data point and the historical data points arranged in chronological order of reporting. Primary anomaly identification is performed on the time series by means of a primary decision method, which amounts to a first level of anomaly detection. When the time series is identified as suspected anomalous, feature extraction is performed on the time series, and the extracted feature data is input into an anomaly detection model trained with a supervised machine learning algorithm, which amounts to a second level of anomaly detection and outputs an anomaly detection result for the target data point. Multi-level anomaly detection is thus used: a primary decision method that is different from a supervised machine learning algorithm is combined with a supervised algorithm, and in-depth detection is performed by the anomaly detection model obtained through supervised learning, which improves the accuracy of the anomaly detection result.
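The two-level flow summarized above can be sketched as a small function that takes the three stages as callables. The names `detect`, `primary_check`, `extract_features`, and `model` are placeholders introduced for this illustration; they stand in for whichever primary decision method, feature extraction, and trained supervised model an implementation actually uses.

```python
def detect(series, primary_check, extract_features, model):
    """Two-level anomaly detection for the last point of `series`.

    primary_check(series) -> bool: True if the series is suspected anomalous
        (first level; statistical or unsupervised).
    extract_features(series) -> feature data for the supervised model.
    model(features) -> bool: True if the target data point is anomalous
        (second level; supervised).
    """
    # First level: stop early when the series looks normal.
    if not primary_check(series):
        return "normal"
    # Second level: only suspected series reach the supervised model.
    features = extract_features(series)
    return "anomalous" if model(features) else "normal"
```

A consequence of this structure is that the supervised model is invoked only for the series that the first level flags as suspected anomalous.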
In one embodiment, the primary decision method includes a statistical decision algorithm. Step S204, performing primary anomaly identification on the time series by means of the primary decision method, includes: extracting the historical data points from the time series; determining the mean and standard deviation of the historical data points by means of the statistical decision algorithm; determining, from the mean and standard deviation, a numerical interval consistent with random error; and, when the target data point lies outside the numerical interval, identifying the time series as suspected anomalous.
Specifically, the computer device may extract from the time series the historical data points, that is, all data points other than the target data point, and determine the mean and standard deviation of the historical data points by means of the statistical decision algorithm, that is, compute the mean and standard deviation of the extracted historical data points.
In one embodiment, the statistical decision algorithm includes the three-sigma rule (three-sigma rule of thumb). The three-sigma rule, also known as the Pauta criterion, first assumes that a set of measured data contains only random errors, computes the standard deviation of the data, and determines an interval according to a certain probability. Any error that exceeds this interval is considered a gross error rather than a random error, and the data containing that error should be rejected.
The three-sigma rule is specifically as follows: the probability that a value falls within (μ-σ, μ+σ) is 0.6827; the probability that a value falls within (μ-2σ, μ+2σ) is 0.9545; and the probability that a value falls within (μ-3σ, μ+3σ) is 0.9973. Here σ denotes the standard deviation and μ denotes the mean, and x = μ is the axis of symmetry of the distribution curve. It can be understood that the mean is the mean of the historical data points in the time series, and the standard deviation is the standard deviation of the historical data points in the time series.
Specifically, the computer device may obtain one endpoint of the numerical interval consistent with random error as the mean minus a preset multiple of the standard deviation, and the other endpoint as the mean plus the preset multiple of the standard deviation. That is, the computer device may take the range within plus or minus the preset multiple of the standard deviation around the mean as the numerical interval consistent with random error. In one embodiment, the preset multiple may be any one of one, two, and three. It can be understood that the error of data lying within this numerical interval is random error, so data lying within the interval is normal data and data lying outside the interval is anomalous data. Therefore, when the target data point lies outside the numerical interval, the computer device identifies the time series as suspected anomalous.
Fig. 6 is a schematic diagram of the principle of the three-sigma rule in one embodiment. For a clearer and more intuitive understanding, it is explained with reference to Fig. 6. Referring to Fig. 6, the probability that a value falls within (μ-σ, μ+σ) is 68.3%; the probability that a value falls within (μ-2σ, μ+2σ) is 95.5%; and the probability that a value falls within (μ-3σ, μ+3σ) is 99.7%. Assuming the preset multiple is three, when the target data point lies outside the interval (μ-3σ, μ+3σ), the computer device may identify the time series as suspected anomalous.
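The three-sigma check described above can be sketched in a few lines. The sketch assumes that the last element of the series is the target data point and that the population standard deviation is used; the embodiment does not state which standard deviation estimator is intended, and the function name is hypothetical.

```python
from statistics import mean, pstdev

def is_suspected_anomaly(series, k=3.0):
    """Primary anomaly identification with a k-sigma interval.

    series: values in reporting order; the last value is the target
        data point and the rest are historical data points.
    k: preset multiple of the standard deviation (1, 2, or 3 in the text).
    """
    *history, target = series
    mu = mean(history)
    sigma = pstdev(history)  # assumption: population standard deviation
    # Suspected anomalous when the target lies outside [mu - k*sigma, mu + k*sigma].
    return not (mu - k * sigma <= target <= mu + k * sigma)
```

For example, with historical values fluctuating between 9 and 12, a target value of 30 lies far outside the three-sigma interval and the series is flagged, whereas a target of 11 is not.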
It can be understood that, in other embodiments, the computer device may also use other statistical decision algorithms to perform primary anomaly identification on the time series.
In the above embodiment, the historical data points are extracted from the time series; a statistical decision algorithm is used to determine, from the historical data points, a numerical interval consistent with random error; and when the target data point lies outside the numerical interval, the time series is identified as suspected anomalous. This amounts to applying prior knowledge by statistical means to identify whether the time series containing the target data point is suspected anomalous, which ensures the accuracy of anomaly identification to a certain extent. In addition, combining the statistical decision algorithm with the anomaly detection model obtained by supervised learning realizes multi-level anomaly detection processing and further improves the accuracy of anomaly detection.
In one embodiment, the primary decision method includes an unsupervised algorithm. Step S204, performing primary anomaly identification on the time series by means of the primary decision method, includes: extracting each data point in the time series; classifying the extracted data points by means of the unsupervised algorithm; and performing anomaly decision processing on the time series according to the classification result. The anomaly decision result obtained by the anomaly decision processing indicates whether the time series is suspected anomalous.
As described above, an unsupervised algorithm is an algorithm that performs machine learning training on unlabeled training samples in order to discover structural knowledge in the training sample set.
Specifically, the computer device may in advance substitute unlabeled training samples into the formula of the unsupervised algorithm to perform unsupervised machine learning training, adjusting the parameters of the formula during training so as to optimize the algorithm. The computer device may extract each data point in the time series; it can be understood that the extracted data points include the target data point and the historical data points. The computer device may substitute the extracted data points into the formula of the unsupervised algorithm with the adjusted parameters and perform the computation, thereby classifying the data points and obtaining a classification result. The computer device may then perform anomaly decision processing on the time series according to the classification result.
The unsupervised algorithm includes at least one of a recurrent neural network algorithm (RNN, Recurrent Neural Network), the isolation forest algorithm (Isolation Forest), a one-class support vector machine (OneClassSVM, One-Class Support Vector Machine), an exponentially weighted moving average algorithm (EWMA, Exponentially Weighted Moving Average), and the like.
The recurrent neural network algorithm (RNN, Recurrent Neural Network) is a neural network algorithm for processing sequence data. Its essential characteristic is that the processing units have both internal feedback connections and feedforward connections.
Isolation forest (Isolation Forest) is a fast anomaly detection method based on ensemble learning. It has linear time complexity and high accuracy, and is an algorithm that meets the requirements of big data processing.
A one-class support vector machine (OneClassSVM, One-Class Support Vector Machine) is a classifier obtained by unsupervised training on training samples of only one class. The trained classifier judges every sample that does not belong to that class as "no"; the "no" result is returned because the sample does not belong to the class, not because it belongs to some other class.
The exponentially weighted moving average algorithm (EWMA, Exponentially Weighted Moving Average) is a special weighted moving average method.
It can be understood that different unsupervised algorithms yield different classification results.
In one embodiment, when the unsupervised algorithm is the recurrent neural network algorithm, a classification result indicating whether the target data point is anomalous can be output directly. It can be understood that anomaly decision processing can be performed on the time series according to the classification result for the target data point, to obtain an anomaly decision result indicating whether the time series is suspected anomalous.
In one embodiment, when the unsupervised algorithm is isolation forest, the classification result includes the average path length of the leaf nodes in which the target data point is located in the trees of the isolation forest. When the average path length is less than or equal to a preset threshold, the time series can be determined to be suspected anomalous. Conversely, when the average path length is greater than the preset threshold, the time series can be determined to be normal.
In one embodiment, when the unsupervised algorithm is the one-class support vector machine algorithm, the classification result is whether the target data point belongs to the normal class. When the target data point does not belong to the normal class, the time series can be determined to be suspected anomalous; when the target data point belongs to the normal class, the time series can be determined to be normal.
In one embodiment, when the unsupervised algorithm is the exponentially weighted moving average algorithm, the computer device may smooth the time series with the exponentially weighted moving average algorithm and then apply a statistical analysis algorithm to the smoothed time series to determine whether the target data point lies within the random error range. If so, the time series is determined to be normal; if not, the time series is determined to be suspected anomalous.
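The following sketch shows one possible reading of this step: the whole series is smoothed with an exponentially weighted moving average, and the smoothed target value is then compared against a k-sigma interval computed from the smoothed historical values. The smoothing factor `alpha`, the multiple `k`, the use of the population standard deviation, and the choice to test the smoothed rather than the raw target value are all assumptions; the embodiment leaves these details open.

```python
from statistics import mean, pstdev

def ewma(series, alpha=0.5):
    """Exponentially weighted moving average of a non-empty series."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def ewma_suspected_anomaly(series, alpha=0.5, k=3.0):
    """Smooth the series, then test the smoothed target against k sigma.

    The last element of `series` is treated as the target data point.
    """
    *history, target = ewma(series, alpha)
    mu, sigma = mean(history), pstdev(history)
    # Outside the random error range -> suspected anomalous.
    return abs(target - mu) > k * sigma
```

Because smoothing damps isolated fluctuations in the historical values, the interval derived from the smoothed history tends to be narrower than one derived from the raw history.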
In the above embodiment, anomaly decision processing is performed on the time series by means of an unsupervised algorithm, which amounts to combining the unsupervised algorithm with the anomaly detection model obtained by supervised learning. This realizes multi-level anomaly detection processing and improves the accuracy of anomaly detection.
In one embodiment, there are multiple unsupervised algorithms. The method further includes: obtaining the anomaly decision result corresponding to each unsupervised algorithm; performing joint detection processing according to the anomaly decision results corresponding to the unsupervised algorithms; and, when the result of the joint detection processing indicates that the time series is anomalous, determining that the time series is suspected anomalous.
In one embodiment, performing joint detection processing according to the anomaly decision results corresponding to the unsupervised algorithms includes: when the anomaly decision result corresponding to any one of the unsupervised algorithms indicates that the time series is anomalous, determining that the time series is suspected anomalous. It can be understood that, because each unsupervised algorithm has its own shortcomings, the anomaly decision result obtained by any single unsupervised algorithm may be imperfect and may fail to detect an anomaly. Therefore, a joint decision is made over the anomaly decision results corresponding to the unsupervised algorithms: when the anomaly decision result corresponding to any one of them indicates that the time series is anomalous, the time series is determined to be suspected anomalous. Considering the anomaly decision results of all the unsupervised algorithms together makes the primary anomaly identification of the time series more accurate.
In one embodiment, performing joint detection processing according to the anomaly decision results corresponding to the unsupervised algorithms includes: determining the preset weight corresponding to each unsupervised algorithm, and determining the result of the joint detection processing according to the anomaly decision result corresponding to each unsupervised algorithm and the corresponding preset weight.
The abnormality decision result corresponding to each unsupervised algorithm is one of two cases: time series abnormal, or time series normal. According to the weight of each unsupervised algorithm and its abnormality decision result, the computer device can determine a first proportion, for the decision result "time series abnormal", and a second proportion, for the decision result "time series normal". It then compares the first proportion with the second proportion and takes the abnormality decision result corresponding to the larger value as the result of the joint detection processing.
It can be understood that when the first proportion (time series abnormal) is greater than the second proportion (time series normal), "time series abnormal" is taken as the result of the joint detection processing. Conversely, when the first proportion is less than the second proportion, "time series normal" is taken as the result of the joint detection processing.
For ease of understanding, an example is given. Suppose there are three unsupervised algorithms A, B and C, with preset weights of 0.4, 0.4 and 0.2 respectively. The abnormality decision result of algorithm A is that the time series is abnormal, that of algorithm B is that the time series is abnormal, and that of algorithm C is that the time series is normal. The first proportion (time series abnormal) is then 0.4 + 0.4 = 0.8, and the second proportion (time series normal) is 0.2. The computer device can therefore take "time series abnormal" as the result of the joint detection processing.
It can be understood that determining the result of the joint detection processing from the abnormality decision result and preset weight of each unsupervised algorithm takes the decision results of all the unsupervised algorithms into account in a comprehensive and reasonable way, which makes the primary anomaly identification of the time series more accurate.
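The weighted joint decision described above can be sketched as follows. This is only an illustrative sketch: the function name `weighted_joint_decision` and the dictionary layout are assumptions, and the tie-breaking rule is not specified in the text.

```python
from typing import Mapping


def weighted_joint_decision(
    verdicts: Mapping[str, bool],
    weights: Mapping[str, float],
) -> bool:
    """Return True when the weighted 'abnormal' share exceeds the 'normal' share.

    verdicts maps an algorithm name to True (time series abnormal) or
    False (time series normal); weights maps the same names to preset weights.
    """
    abnormal_share = sum(weights[name] for name, flag in verdicts.items() if flag)
    normal_share = sum(weights[name] for name, flag in verdicts.items() if not flag)
    # Assumption: a tie counts as normal; the text does not cover ties.
    return abnormal_share > normal_share


# The example from the text: A and B report abnormal, C reports normal.
verdicts = {"A": True, "B": True, "C": False}
weights = {"A": 0.4, "B": 0.4, "C": 0.2}
print(weighted_joint_decision(verdicts, weights))  # shares are 0.8 vs 0.2
```

With these inputs the abnormal share is larger, so the function reports the time series as suspected abnormal.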
The computer device can determine whether the time series is suspected abnormal according to the result of the joint detection processing. When the result indicates that the time series is abnormal, the computer device determines that the time series is suspected abnormal. Further, when the result indicates that the time series is normal, the computer device can determine that the time series is normal.
It should be noted that the computer device can combine the statistical decision algorithm with at least one unsupervised algorithm to perform primary anomaly identification on the time series.
In one embodiment, the computer device can perform anomaly identification on the time series at a first level by means of the statistical decision algorithm. After the time series is identified as suspected abnormal, joint detection processing is performed on the time series at a second level by means of multiple unsupervised algorithms. After the joint detection determines that the time series is suspected abnormal, feature extraction is performed on the time series at a third level, and the extracted feature data is input into the abnormality detection model obtained by supervised machine learning training for further detection. When the abnormality detection model outputs an abnormality detection result indicating that the target data point is abnormal, an exception handling strategy is invoked.
Fig. 7 is a schematic diagram of the principle of the data abnormality detection method in one embodiment. Referring to Fig. 7, the time series first undergoes primary anomaly identification by the first-layer statistical decision algorithm. If an anomaly is identified, the time series passes to the second-layer joint detection by multiple unsupervised algorithms. If the time series is determined to be suspected abnormal, feature extraction is performed and the time series enters the third-layer supervised detection (that is, detection by the abnormality detection model). If the target data point is detected to be abnormal, an exception handling strategy can be invoked.
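The three-layer flow of Fig. 7 can be sketched as a short-circuiting cascade. The sketch below is an assumption-laden illustration: `detect` and the toy checks are hypothetical names, each layer is passed in as a plain callable, and the second layer uses the "any algorithm flags it" joint rule.

```python
from typing import Callable, Sequence

Series = Sequence[float]


def detect(
    series: Series,
    statistical_check: Callable[[Series], bool],
    unsupervised_checks: Sequence[Callable[[Series], bool]],
    extract_features: Callable[[Series], Sequence[float]],
    supervised_model: Callable[[Sequence[float]], bool],
) -> bool:
    """Return True only if all three layers flag the target data point."""
    # Layer 1: statistical decision algorithm.
    if not statistical_check(series):
        return False
    # Layer 2: joint detection over several unsupervised algorithms.
    if not any(check(series) for check in unsupervised_checks):
        return False
    # Layer 3: feature extraction followed by the supervised model.
    return supervised_model(extract_features(series))


# Toy stand-ins for the real algorithms (purely illustrative).
def toy_statistical_check(s: Series) -> bool:
    history = s[:-1]
    return abs(s[-1] - sum(history) / len(history)) > 5


def toy_unsupervised_check(s: Series) -> bool:
    return s[-1] > max(s[:-1])


def toy_features(s: Series) -> Sequence[float]:
    return [s[-1] - s[-2]]


def toy_model(features: Sequence[float]) -> bool:
    return features[0] > 10


result = detect(
    [10.0, 10.2, 9.9, 10.1, 25.0],
    toy_statistical_check,
    [toy_unsupervised_check],
    toy_features,
    toy_model,
)
```

Because each layer returns early, the more expensive later layers only run on time series that earlier layers already consider suspicious.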
In the above embodiment, the abnormality decision results corresponding to the unsupervised algorithms are judged jointly, that is, the decision results of all the unsupervised algorithms are taken into account, which makes the primary anomaly identification of the time series more accurate.
In one embodiment, the method further includes a step of training the abnormality detection model by a supervised machine learning algorithm, which specifically includes the following steps: obtaining sample time series and corresponding labels, where the label of a positive sample time series is a normal label and the label of a negative sample time series is an abnormal label; extracting sample feature data from the sample time series; iteratively determining updated model parameters for an initial machine learning model according to the sample feature data and the corresponding labels; and adjusting the model parameters of the initial machine learning model to the updated model parameters, until the abnormality detection model is obtained when an iteration stop condition is met.
It can be understood that supervised machine learning training uses labelled sample time series. A sample time series is a time series used as a training sample. The label of a positive sample time series is a normal label, and the label of a negative sample time series is an abnormal label. Sample feature data is the feature data of a sample time series.
In one embodiment, a sample database can be set up in advance in the computer device. The sample database is used to store sample data. The computer device can obtain the sample time series and the corresponding labels from the sample database.
The computer device can extract the sample feature data from the sample time series and, according to the sample feature data and the corresponding labels, iteratively determine updated model parameters for the initial machine learning model. The updated model parameters for the initial machine learning model are the model parameters to which the model parameters of the initial machine learning model are to be updated. It can be understood that each iteration determines a new set of model parameters, and the model parameters of the initial machine learning model need to be updated to this new set. This new set is the updated model parameters for the initial machine learning model.
The computer device can adjust the model parameters of the initial machine learning model directly according to the updated model parameters, that is, set the model parameters of the initial machine learning model to the updated model parameters that were determined. Iteration continues in this way until the iteration stop condition is met, at which point the abnormality detection model is obtained. In other words, the computer device can take the model parameters at the time the iteration stop condition is met as the final model parameters, and thereby obtain the abnormality detection model.
It should be noted that the model parameters of the initial machine learning model mentioned here (the model parameters to be adjusted) refer to the model parameters of the initial machine learning model after the previous iteration has updated them and before the current iteration updates them. They are not limited to the very first model parameters, before any update.
The iteration stop condition is the condition under which iterative updating of the model parameters stops. In one embodiment, the iteration stop condition can be that the number of iterations reaches a preset number. For example, if the preset number of iterations is 20, iteration can stop after 20 iterations. The iteration stop condition can also be that the model parameters become stable. Stable model parameters can mean that the model parameters no longer change, or that their change falls within a preset range.
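The two stop conditions described above can be sketched in a generic training loop. This is a minimal sketch under stated assumptions: `train`, `update_step`, and `tolerance` are hypothetical names, parameters are a flat list of floats, and "stable" is taken to mean that the largest per-parameter change is within `tolerance`.

```python
from typing import Callable, List


def train(
    initial_params: List[float],
    update_step: Callable[[List[float]], List[float]],
    max_iterations: int = 20,
    tolerance: float = 1e-6,
) -> List[float]:
    """Iterate until the preset iteration count is reached or parameters stabilize."""
    params = list(initial_params)
    for _ in range(max_iterations):  # stop condition 1: preset number of iterations
        new_params = update_step(params)
        change = max(abs(n - p) for n, p in zip(new_params, params))
        params = new_params
        if change <= tolerance:  # stop condition 2: change within the preset range
            break
    return params


# Toy update rule: move each parameter halfway toward 3.0 (illustrative only).
def toy_update(params: List[float]) -> List[float]:
    return [p + 0.5 * (3.0 - p) for p in params]


final_params = train([0.0], toy_update)
```

In practice `update_step` would be one pass of whatever supervised learning procedure is used; the loop structure is independent of that choice.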
In one embodiment, after the updated model parameters for the initial machine learning model are determined in each iteration, the computer device can first verify the effect of the model parameter update and, only after verification passes, execute the step of adjusting the model parameters of the initial machine learning model to the updated model parameters.
In the above embodiment, the abnormality detection model is obtained by supervised machine learning training, and the supervised abnormality detection model performs in-depth detection on the time series that has passed primary anomaly identification, which improves the accuracy of abnormality detection.
In one embodiment, the method further includes a step of verifying the effect of the model parameter update, which specifically includes the following steps: after each iteration determines the updated model parameters, determining a first experimental model and a second experimental model, where the model parameters of the first experimental model are the model parameters of the initial machine learning model before the current iteration's update, and the model parameters of the second experimental model are the updated model parameters determined by the current iteration; inputting the same experimental data into the first experimental model and the second experimental model respectively, and outputting a first experimental result of the first experimental model and a second experimental result of the second experimental model; and, when the second experimental result meets a preset optimization condition compared with the first experimental result, executing the step of adjusting the model parameters of the initial machine learning model to the updated model parameters.
An experimental model is a model used to verify the effect of a model parameter update. It can be understood that before each new iteration begins, the first experimental model and the second experimental model are identical, and both are consistent with the initial machine learning model before the current iteration's update. After each iteration determines the updated model parameters, the first experimental model is kept unchanged (that is, its model parameters are those of the initial machine learning model before the current iteration's update), and the updated model parameters are applied to the second experimental model. At this point, the model parameters of the second experimental model are the updated model parameters determined by the current iteration.
The computer device can input the same experimental data into the first experimental model and the second experimental model respectively, and output the first experimental result of the first experimental model and the second experimental result of the second experimental model. The computer device can compare the first experimental result with the second experimental result. When the second experimental result meets the preset optimization condition compared with the first experimental result, the model parameters of the initial machine learning model are adjusted to the updated model parameters determined by the current iteration.
The preset optimization condition is a condition, set in advance, under which updating the model parameters has an optimizing effect. It can be understood that when the preset optimization condition is met, applying the updated model parameters determined by the current iteration to the initial machine learning model has an optimizing effect.
In one embodiment, the preset optimization condition can concern the accuracy of the second experimental result and of the first experimental result: when the accuracy of the second experimental result is higher than that of the first experimental result, the preset optimization condition can be considered met. It can be understood that the experimental data has actual results set in advance. The computer device can compare the second experimental result and the first experimental result with these actual results respectively, to determine the accuracy of each.
In the above embodiment, during training of the supervised abnormality detection model, the effect of the model parameter update is verified by means of the first experimental model and the second experimental model, and the step of adjusting the model parameters of the initial machine learning model to the updated model parameters is executed only when the second experimental result meets the preset optimization condition compared with the first experimental result. This avoids the waste of resources caused by unnecessary updates, verifies the effectiveness of the model training, and also facilitates optimization of the model training.
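The verification step can be sketched as a comparison of two models on the same labelled experimental data. The names `accuracy` and `accept_update`, and the representation of a model as a callable returning True for "abnormal", are assumptions made for illustration; the optimization condition used here is the accuracy comparison described above.

```python
from typing import Callable, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], bool]  # True means "abnormal"
Labelled = Sequence[Tuple[Features, bool]]  # (features, actual result)


def accuracy(model: Model, experiment_data: Labelled) -> float:
    """Fraction of experimental samples where the model matches the actual result."""
    correct = sum(1 for features, actual in experiment_data if model(features) == actual)
    return correct / len(experiment_data)


def accept_update(first_model: Model, second_model: Model, experiment_data: Labelled) -> bool:
    """Accept the update only if the second (updated) model is strictly more accurate."""
    return accuracy(second_model, experiment_data) > accuracy(first_model, experiment_data)


# Toy models: thresholds on a single feature (illustrative only).
def before_update(features: Features) -> bool:
    return features[0] > 6


def after_update(features: Features) -> bool:
    return features[0] > 3


experiment_data = [([1.0], False), ([5.0], True), ([9.0], True)]
should_apply = accept_update(before_update, after_update, experiment_data)
```

Here the model before the update misclassifies the middle sample while the updated model classifies all three correctly, so the update would be applied.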
Fig. 8 is a technical framework diagram of the data abnormality detection method in one embodiment. Referring to Fig. 8, the framework mainly includes three parts: offline model training, verification of the model update effect, and online abnormality detection.
The offline model training part is used to train the abnormality detection model. During offline training, data to be used as training samples can be obtained from the database that stores the data. The obtained data undergoes primary anomaly identification by the statistical decision algorithm and the unsupervised algorithms, and is then imported into the sample database as training samples. The training samples can be annotated manually with reference to the primary anomaly identification results, so that corresponding labels are added manually. The sample feature data of the training samples is extracted by feature engineering, and supervised machine learning training is performed with a supervised algorithm according to the extracted sample feature data and the corresponding labels, to obtain the abnormality detection model.
For the model update effect verification part, during training of the abnormality detection model, the model update effect can be verified by means of A and B experimental models. In each training iteration of the abnormality detection model, the iterative update is performed only if verification shows that the update effect meets the optimization condition.
For the online abnormality detection part, data extraction can be performed to extract the time series that includes the target data point. The time series then passes in turn through primary anomaly identification by the statistical decision algorithm and joint detection by multiple unsupervised algorithms. When the output indicates that the time series is suspected abnormal, the feature data of the time series is extracted by feature engineering, and the supervised model (that is, the abnormality detection model) is loaded to perform abnormality detection. It should be noted that the order of extracting the feature data of the time series by feature engineering and loading the supervised model is not limited; the supervised model can also be loaded first and the feature data extracted afterwards. It can be understood that when an abnormality detection result indicating that the target data point is abnormal is output, an abnormal label can be added to the target data point automatically and the sample database updated. After the abnormal label is added automatically, the target data point can also go through manual review and be written to the sample database only once the review is passed.
In one embodiment, the target data point is the data point reported at the current time. The method further includes: when the abnormality detection result is that the target data point reported at the current time is abnormal, recording the anomaly; and, after the data point reported at the next time is obtained, taking the data point reported at the next time as the new target data point reported at the current time and returning to the step of obtaining the time series to continue execution, until alarm information is triggered when a preset number of consecutive abnormal target data points have been recorded.
It should be noted that the next time refers to the next time at which a data point is reported. For example, if a data point is reported every minute and the current time is 8:52, the next time is 8:53.
Specifically, once the computer device has obtained the data point reported at the next time, the next time becomes the new current time. The computer device can take the data point reported at the next time as the target data point reported at the current time, that is, as the new target data point, and return to step S202 to continue execution. In other words, the computer device can obtain a time series that includes the new target data point and the historical data points reported before the new target data point. As before, the new target data point and the historical data points reported before it are arranged in the time series in the chronological order in which they were reported. The computer device can continue to execute steps S204 to S208 on the newly obtained time series, to obtain the abnormality detection result for the new target data point. When the abnormality detection result is that the new target data point is abnormal, the anomaly is again recorded. Likewise, after the data point reported at the following time is obtained, the above steps are repeated, until alarm information is triggered when a preset number of consecutive abnormal target data points have been recorded.
Recording a preset number of consecutive abnormal target data points means recording a preset number of abnormal target data points whose reporting times are consecutive. For example, suppose a data point is reported every minute and the preset number is 3. If the target data points reported at 8:52, 8:53 and 8:54 are all recorded as abnormal, then 3 consecutive abnormal target data points have been recorded, and alarm information can be triggered. It should be noted that if the target data points reported at 8:52, 8:53 and 8:55 are recorded as abnormal while the target data point reported at 8:54 is normal, then the reporting times of the abnormal points are not consecutive, because 8:54 is missing in between. In that case, 3 consecutive abnormal target data points have not been recorded, and alarm information is not triggered.
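The consecutive-anomaly rule can be sketched as a streak counter over the detection results. The function name `first_alarm_time`, the record layout, and the calendar date used in the example are assumptions for illustration; only the one-minute interval, the preset number 3, and the clock times come from the example above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def first_alarm_time(
    records: Iterable[Tuple[datetime, bool]],
    interval: timedelta = timedelta(minutes=1),
    preset_count: int = 3,
) -> Optional[datetime]:
    """Return the report time at which the alarm fires, or None if it never fires.

    records holds (report_time, is_abnormal) pairs in chronological order.
    """
    streak = 0
    previous_abnormal: Optional[datetime] = None
    for report_time, is_abnormal in records:
        if not is_abnormal:
            # A normal point breaks the streak.
            streak, previous_abnormal = 0, None
            continue
        # The streak only grows when the previous abnormal point is exactly
        # one reporting interval earlier; otherwise it restarts at 1.
        if previous_abnormal is not None and report_time - previous_abnormal == interval:
            streak += 1
        else:
            streak = 1
        previous_abnormal = report_time
        if streak >= preset_count:
            return report_time
    return None


def at(hour: int, minute: int) -> datetime:
    return datetime(2018, 6, 19, hour, minute)  # arbitrary date for the example


consecutive = [(at(8, 52), True), (at(8, 53), True), (at(8, 54), True)]
interrupted = [(at(8, 52), True), (at(8, 53), True), (at(8, 54), False), (at(8, 55), True)]
```

Applied to `consecutive`, the function returns the 8:54 report time; applied to `interrupted`, it returns `None`, matching the two cases in the text.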
Alarm information is prompt information used to report the anomaly reflected by the abnormal data points. It can be understood that alarm information can be presented in at least one of the forms of text, voice, video, graphics, and the like.
In one embodiment, the alarm information includes first-order alarm information and second-order alarm information. First-order alarm information is basic alarm information. Second-order alarm information is more advanced, detailed alarm information displayed after the first-order alarm information is triggered.
Fig. 9 is a schematic diagram of an alarm information interface in one embodiment. It can be understood that Fig. 9 is a schematic diagram of the interface of first-order alarm information. As shown in Fig. 9, the target data points in the dashed box 902 all deviate significantly from the normal curve, so they are all abnormal. This corresponds to a preset number of consecutive abnormal target data points having been recorded, so the first-order alarm information shown in Fig. 9 is triggered. The first-order alarm information includes a chart showing the anomaly and a text description. For example, "An anomaly occurred at time point: 2018-06-19 14:35" is a text description that states the time at which the anomaly began. Fig. 9 also contains a view link address. After the user triggers the view link address, the display interface of the second-order alarm information can be entered, and the second-order alarm information can present detailed alarm information in the form of a view. Here, a view refers to a view in a computer database: a virtual table whose content is defined by a query. Like a real table, a view contains a series of named columns and rows of data.
In the above embodiment, newly reported data points are checked for anomalies cyclically according to the data abnormality detection method, and alarm information is triggered when a preset number of consecutive abnormal target data points have been recorded, which improves security. In addition, a single occasional abnormal target data point may be accidental and may not pose a significant risk, so no alarm is needed; a preset number of consecutive abnormal target data points represents a greater risk than a single one, so triggering alarm information at that point is more accurate.
As shown in Fig. 10, in one embodiment, a data abnormality detection device 1000 is provided. The device 1000 includes an obtaining module 1002, a primary decision module 1004, a feature extraction module 1006, and an abnormality detection module 1008, wherein:
The obtaining module 1002 is configured to obtain a time series. The time series includes a target data point and historical data points reported before the target data point; the target data point and the historical data points are arranged in the chronological order in which they were reported.
The primary decision module 1004 is configured to perform primary anomaly identification on the time series by means of a primary decision mode.
The feature extraction module 1006 is configured to perform feature extraction on the time series when the time series is identified as suspected abnormal.
The abnormality detection module 1008 is configured to input the extracted feature data into the abnormality detection model and output an abnormality detection result for the target data point. The abnormality detection model is obtained by training with a supervised machine learning algorithm.
In one embodiment, the primary decision mode includes a statistical decision algorithm. The primary decision module 1004 is further configured to: extract the historical data points from the time series; determine the mean and standard deviation of the historical data points by means of the statistical decision algorithm; determine, according to the mean and the standard deviation, a numerical interval consistent with random error; and, when the target data point lies outside the numerical interval, identify the time series as suspected abnormal.
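The statistical decision described above can be sketched as an interval check around the historical mean. The text only says the interval is determined from the mean and standard deviation; the multiplier `k = 3.0` (the common three-sigma rule), the use of the population standard deviation, and the function name are assumptions of this sketch.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_suspected_abnormal(series: Sequence[float], k: float = 3.0) -> bool:
    """Flag the last point of the series if it falls outside mean +/- k * std.

    The last element is the target data point; all earlier elements are the
    historical data points. k = 3.0 is an assumed choice, not from the text.
    """
    *history, target = series
    mu = mean(history)
    sigma = pstdev(history)  # assumption: population standard deviation
    lower, upper = mu - k * sigma, mu + k * sigma
    return not (lower <= target <= upper)


spike = [10.0, 11.0, 9.0, 10.0, 10.0, 20.0]
steady = [10.0, 11.0, 9.0, 10.0, 10.0, 10.5]
```

For both example series the historical mean is 10 and the interval is roughly 8.1 to 11.9, so the target 20.0 in `spike` is flagged while the target 10.5 in `steady` is not. Note that a perfectly constant history gives a zero-width interval, so any deviation at all would be flagged.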
In one embodiment, the primary decision mode includes an unsupervised algorithm. The primary decision module 1004 is further configured to: extract each data point in the time series; classify the extracted data points by means of the unsupervised algorithm; and perform abnormality decision processing on the time series according to the classification result obtained. The abnormality decision result obtained by the abnormality decision processing indicates whether the time series is suspected abnormal.
In one embodiment, there are multiple unsupervised algorithms. The primary decision module 1004 is further configured to: obtain the abnormality decision result corresponding to each unsupervised algorithm; perform joint detection processing according to the abnormality decision results corresponding to the unsupervised algorithms; and, when the result of the joint detection processing indicates that the time series is abnormal, determine that the time series is suspected abnormal.
In one embodiment, the feature extraction module 1006 is further configured to, when the time series is identified as suspected abnormal, extract corresponding time-domain feature data from the time series in the time domain; and/or perform a frequency-domain transform on the time series and extract corresponding frequency-domain feature data from the transformed time series in the frequency domain.
In one embodiment, the feature extraction module 1006 is further configured to: perform statistical analysis on the time series to obtain statistical feature data; fit the trend distribution of the time series to obtain fitted feature data; and extract feature data used for classification from the time series to obtain classification feature data.
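A few of the feature types mentioned above can be sketched with the standard library alone. The specific features chosen here (mean, standard deviation, minimum, maximum, a least-squares trend slope, and the strongest non-zero frequency bin of a naive discrete Fourier transform) are illustrative assumptions; the text does not list concrete features, and all function names are hypothetical.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def trend_slope(series: Sequence[float]) -> float:
    """Least-squares slope of the series against its index (needs >= 2 points)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def dominant_frequency_bin(series: Sequence[float]) -> int:
    """Index (1..n//2) of the DFT bin with the largest magnitude, DC excluded."""
    n = len(series)
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(series)
        )
        magnitudes.append(abs(coefficient))
    return 1 + magnitudes.index(max(magnitudes))


def extract_features(series: Sequence[float]) -> Dict[str, float]:
    return {
        # Time-domain statistical features.
        "mean": mean(series),
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        # Fitted feature: slope of a straight-line trend.
        "trend_slope": trend_slope(series),
        # Frequency-domain feature from a naive O(n^2) DFT.
        "dominant_frequency_bin": dominant_frequency_bin(series),
    }


features = extract_features([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0])
```

The naive transform is quadratic in the series length; a fast Fourier transform from a numerical library would be the usual choice for longer series.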
As shown in Fig. 11, in one embodiment, the device 1000 further includes:
A model training module 1007, configured to: obtain sample time series and corresponding labels, where the label of a positive sample time series is a normal label and the label of a negative sample time series is an abnormal label; extract sample feature data from the sample time series; iteratively determine updated model parameters for an initial machine learning model according to the sample feature data and the corresponding labels; and adjust the model parameters of the initial machine learning model to the updated model parameters, until the abnormality detection model is obtained when an iteration stop condition is met.
In one embodiment, the model training module 1010 is further configured to: after each iteration determines the updated model parameters, determine a first experimental model and a second experimental model, where the model parameters of the first experimental model are the model parameters of the initial machine learning model before the current iteration's update, and the model parameters of the second experimental model are the updated model parameters determined by the current iteration; input the same experimental data into the first experimental model and the second experimental model respectively, and output a first experimental result of the first experimental model and a second experimental result of the second experimental model; and, when the second experimental result meets a preset optimization condition compared with the first experimental result, adjust the model parameters of the initial machine learning model to the updated model parameters.
In one embodiment, the target data point is the data point reported at the current time. The device 1000 further includes:
An alarm module (not shown), configured to: record the anomaly when the abnormality detection result is that the target data point reported at the current time is abnormal; and, after the data point reported at the next time is obtained, take the data point reported at the next time as the new target data point reported at the current time and return to the step of obtaining the time series to continue execution, until alarm information is triggered when a preset number of consecutive abnormal target data points have been recorded.
Fig. 12 is a schematic diagram of the internal structure of a computer device in one embodiment. Referring to Fig. 12, the computer device can be the abnormality detection device 120 shown in Fig. 1. It can be understood that the computer device can also be a terminal. The computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device can store an operating system and a computer program. When the computer program is executed, it can cause the processor to execute a data abnormality detection method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The internal memory can also store a computer program which, when executed by the processor, can cause the processor to execute a data abnormality detection method. The network interface of the computer device is used for network communication.
Those skilled in the art can understand that the structure shown in Fig. 12 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution of the present application is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the data abnormality detection device provided by the present application can be implemented in the form of a computer program that can run on the computer device shown in Fig. 12. The non-volatile storage medium of the computer device can store the program modules that make up the data abnormality detection device, for example the obtaining module 1002, the primary decision module 1004, the feature extraction module 1006, and the abnormality detection module 1008 shown in Fig. 10. The computer program made up of these program modules causes the computer device to execute the steps of the data abnormality detection method of the embodiments of the present application described in this specification. For example, the computer device can obtain a time series through the obtaining module 1002 of the data abnormality detection device 1000 shown in Fig. 10; the time series includes a target data point and historical data points reported before the target data point, arranged in the chronological order in which they were reported. The computer device can perform primary anomaly identification on the time series by means of the primary decision mode through the primary decision module 1004. The computer device can perform feature extraction on the time series through the feature extraction module 1006 when the time series is identified as suspected abnormal. The computer device can input the extracted feature data into the abnormality detection model through the abnormality detection module 1008 and output an abnormality detection result for the target data point; the abnormality detection model is obtained by training with a supervised machine learning algorithm.
A computer device includes a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to execute the steps of the data abnormality detection method described in any embodiment of the present application.
A storage medium stores a computer program which, when executed by a processor, causes the processor to execute the steps of the data abnormality detection method described in any embodiment of the present application.
It should be understood that the steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, there is no strict limit on the order in which these steps are executed, and they can be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and they are not necessarily executed sequentially; they can be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.