CN115587009A - Cloud platform time-series data anomaly detection method, system, device and medium

Publication number: CN115587009A
Application number: CN202211206362.7A
Authority: CN (China)
Prior art keywords: data, abnormal, sample, normal, cloud platform
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 苏海明
Current Assignee: Inspur Jinan data Technology Co ltd (the listed assignees may be inaccurate)
Original Assignee: Inspur Jinan data Technology Co ltd

Abstract

The invention provides a method, system, device, and medium for detecting anomalies in cloud platform time-series data. The method comprises: acquiring raw monitoring data from the cloud platform, segmenting it with a sliding window, and labeling the segments as sample data, where the sample data includes normal samples and abnormal samples; preprocessing the sample data; decomposing the preprocessed sample data into modal components by empirical mode decomposition (EMD), then smoothing and denoising each modal component with a filter to obtain a smoothed time series from which a normal data set is built; and extracting feature vectors from the normal data set and inputting them into an anomaly detection model, which outputs the anomaly detection data. A corresponding system, device, and medium are also provided. The invention adopts a supervised machine learning method, which to some extent removes the burden of manually labeling abnormal data and also addresses the effect of the imbalance between abnormal and normal data on supervised learning.

Figure 202211206362

Description

Method, system, device, and medium for anomaly detection in cloud platform time-series data

Technical field

The invention belongs to the technical field of data detection, and in particular relates to a method, system, device, and medium for anomaly detection in cloud platform time-series data.

Background

A cloud platform, also called a cloud computing platform, is a service built on hardware and software resources that provides computing, network, and storage capabilities. The cloud platform's monitoring system continuously collects a large number of time-series KPIs from the platform, such as CPU usage and network throughput, to judge the platform's operating state. As cloud platforms have matured, their scale has grown from a handful or a dozen machines to hundreds or even thousands. At the same time the platform's own services have multiplied and the calls between them have become more complex, so the monitoring data is both massive and complex. Traditional cloud platforms mostly detect KPI anomalies with thresholds: operations staff set a threshold from experience, and an alarm is raised when the KPI reaches it. In practice, threshold setting depends too heavily on experience, and it is hard to set accurate thresholds across a large and varied set of KPIs. If the threshold is too high, anomalies are missed and latent quality problems go unnoticed; if it is too low, it tends to trigger alarm storms that interfere with the operators' judgment. In addition, this approach cannot detect cases where the data jitters but stays below the threshold, which again produces missed alarms.

Threshold-only judgment therefore cannot meet the needs of anomaly detection on cloud platforms, which calls for automatic anomaly detection based on machine learning and data mining. In real cloud platform monitoring, however, anomalies occur with low probability and it is hard to accumulate many abnormal samples, so the machine learning workflow must be able to construct samples automatically and generate enough data to train the anomaly detection model. Moreover, time-series monitoring data is complex: it depends on time, its values reflect the degree of change, and the change can follow several patterns, the common ones being stationary, fluctuating, and periodic. In a monitoring system the data pattern changes as the business changes, so the detection model must generalize well enough to detect anomalies under different data patterns.

Summary of the invention

To solve the above technical problems, the invention provides a method, system, device, and medium for anomaly detection in cloud platform time-series data. To some extent it removes the burden of manually labeling abnormal data, and it also addresses the effect of the imbalance between abnormal and normal data on supervised learning.

To achieve the above object, the invention adopts the following technical solutions:

A method for anomaly detection in cloud platform time-series data, comprising the following steps:

acquiring raw monitoring data from the cloud platform, segmenting it with a sliding window, and labeling the segments as sample data, the sample data including normal samples and abnormal samples; then preprocessing the sample data;

decomposing the preprocessed sample data into modal components by EMD, and smoothing and denoising each modal component with a filter to obtain a smoothed time series from which a normal data set is built;

extracting feature vectors from the normal data set and inputting them into an anomaly detection model, which outputs the anomaly detection data; the anomaly detection model constructs a minimum hypersphere that encloses the normal data of the normal data set, and outputs the data falling outside it as the anomaly detection data.

Further, the process of acquiring the raw monitoring data of the cloud platform, segmenting it with a sliding window, and labeling it as sample data comprises:

collecting the cloud platform's monitoring data over a period of time as the raw monitoring data, and setting the collection period;

segmenting the raw monitoring data with a sliding window and then filtering out its missing points;

adding data labels to the raw monitoring data, where normal samples are labeled 0 and abnormal samples are labeled 1.

Further, the process of preprocessing the sample data is:

let the sample data be $x = \{x_1, x_2, \ldots, x_m\}$;

normalize it with the formula

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ is the $i$-th sample data point, $i = 1, 2, \ldots, m$, and $x'_i$ is the sample data after normalization.

Further, the process of decomposing the preprocessed sample data into modal components by EMD comprises:

all the normalized sample data form the KPI time series $X(n)$; EMD expresses $X(n)$ as the sum of the modal components and a residual:

$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$

where $C_i(n)$ is the $i$-th IMF (intrinsic mode function) component, $N$ is the total number of IMFs, $R_n$ is the residual, and $n$ is the data sample length.

Further, the process of smoothing and denoising the modal components with the SG (Savitzky-Golay) filter to obtain the smoothed time series is given by:

Figure BDA0003873088360000031

where $X'(n)$ is the final smoothed time series and $F$ is the SG filter.

Further, extracting the feature vectors of the normal data set comprises extracting its basic statistical features, time-domain features, and frequency-domain features, which together form the feature vector of the normal data set;

the basic statistical features include the mean, variance, extrema, band, and power spectrum features;

the time-domain features include the mean, variance, extrema, zero crossings, boundary points, band length, and peak features;

the frequency-domain features include the power spectrum, power density ratio, median frequency, and mean power frequency features.

Further, the anomaly detection model is a OneClassSVM model.

The invention also provides a system for anomaly detection in cloud platform time-series data, comprising a preprocessing module, a decomposition and denoising module, and a detection module;

the preprocessing module is configured to acquire raw monitoring data from the cloud platform, segment it with a sliding window, and label the segments as sample data, the sample data including normal samples and abnormal samples, and then to preprocess the sample data;

the decomposition and denoising module is configured to decompose the preprocessed sample data into modal components by EMD, and to smooth and denoise each modal component with a filter to obtain a smoothed time series from which a normal data set is built;

the detection module is configured to extract feature vectors from the normal data set and input them into an anomaly detection model, which outputs the anomaly detection data; the anomaly detection model constructs a minimum hypersphere that encloses the normal data of the normal data set, and outputs the data falling outside it as the anomaly detection data.

The invention also provides a device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the method when executing the computer program.

The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.

The effects described in this summary are only those of the embodiments, not all the effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:

The invention provides a method, system, device, and medium for anomaly detection in cloud platform time-series data. The method comprises: acquiring raw monitoring data from the cloud platform, segmenting it with a sliding window, and labeling the segments as sample data, which includes normal samples and abnormal samples; preprocessing the sample data; decomposing the preprocessed sample data into modal components by EMD, and smoothing and denoising each modal component with a filter to obtain a smoothed time series from which a normal data set is built; and extracting feature vectors from the normal data set and inputting them into an anomaly detection model, which outputs the anomaly detection data. The anomaly detection model constructs a minimum hypersphere that encloses the normal data of the normal data set and outputs the data falling outside it as the anomaly detection data. Based on this method, a corresponding system, device, and medium are also provided. The invention filters the sample set with EMD and SG filtering to build a normal sample data set, then extracts the basic statistical, time-domain, and frequency-domain features of that data set to build feature vectors, and inputs these feature vectors into OneClassSVM. It adopts a supervised machine learning method, which to some extent removes the burden of manually labeling abnormal data and also addresses the effect of the imbalance between abnormal and normal data on supervised learning.

Description of drawings

Figure 1 is a flowchart of the method for anomaly detection in cloud platform time-series data provided by Embodiment 1 of the invention;

Figure 2 is a schematic diagram of the system for anomaly detection in cloud platform time-series data provided by Embodiment 2 of the invention;

Figure 3 is a connection diagram of the electronic device provided by Embodiment 3 of the invention.

Detailed description

To illustrate the technical features of this solution clearly, the invention is described in detail below through specific embodiments and with reference to the drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. The invention may also repeat reference numerals and/or letters across examples; this repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments and/or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention.

Embodiment 1

Embodiment 1 of the invention provides a method for anomaly detection in cloud platform time-series data. It is suitable for performance optimization schemes on a variety of cloud platform infrastructures; for example, it applies to x86, ARM, and MIPS platforms. Figure 1 is a flowchart of the method of Embodiment 1.

In step S100, raw monitoring data is acquired from the cloud platform, segmented with a sliding window, and labeled as sample data, the sample data including normal samples and abnormal samples; the sample data is then preprocessed.

The process of acquiring the raw monitoring data, segmenting it with a sliding window, and labeling it as sample data comprises: collecting the cloud platform's monitoring data over a period of time as the raw monitoring data, and setting the collection period; segmenting the raw monitoring data with a sliding window and then filtering out its missing points; and adding data labels to the raw monitoring data, where normal samples are labeled 0 and abnormal samples are labeled 1.

The training data used in the invention is four months of cloud platform monitoring data. The collection tool is telegraf, using its built-in cpu, disk, and mem plugins with a collection period of 60 s. The collected data includes monitoring data for key performance indicators such as CPU and memory. Anomalous points are labeled by a combination of machine identification and manual judgment. The data is segmented with a sliding window, missing points are filtered out, and data labels are added: 0 for normal samples and 1 for abnormal samples. In total 21,542 samples were extracted, of which 20,074 are normal and 1,468 are abnormal.
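The segmentation and labeling step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_and_label`, the window length, the step, and the rule that a window counts as abnormal when it contains any marked anomalous point are all assumptions, since the text does not specify them.

```python
import math

def segment_and_label(values, anomaly_idx, window=60, step=60):
    """Split a KPI series into windows, drop missing points, label each window.

    values      : list of floats; None or NaN marks a missing point
    anomaly_idx : set of indices already marked anomalous (assumed input)
    Returns a list of (window_values, label); 0 = normal, 1 = abnormal.
    """
    samples = []
    for start in range(0, len(values) - window + 1, step):
        idx = range(start, start + window)
        # Filter out the missing points inside this window.
        kept = [values[i] for i in idx
                if values[i] is not None and not math.isnan(values[i])]
        if not kept:
            continue  # nothing usable in this window
        # Assumed rule: one marked point makes the whole window abnormal.
        label = 1 if any(i in anomaly_idx for i in idx) else 0
        samples.append((kept, label))
    return samples

# Toy series with two missing points and one marked anomaly at index 4.
series = [0.2, 0.3, None, 0.25, 0.9, 0.3, float("nan"), 0.2]
samples = segment_and_label(series, anomaly_idx={4}, window=4, step=4)
print(samples)
```

With a 60 s collection period, a window of 60 points would cover one hour; overlapping windows can be obtained by choosing a step smaller than the window.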

The process of preprocessing the sample data is:

let the sample data be $x = \{x_1, x_2, \ldots, x_m\}$;

normalize it with the formula

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ is the $i$-th sample data point, $i = 1, 2, \ldots, m$, and $x'_i$ is the sample data after normalization.

After this transformation the time-series data is dimensionless and lies in the range [0, 1]. All data used in the invention goes through the same preprocessing steps before being used as input to the algorithm model, which ensures that the data is standardized.
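As a small illustration of the preprocessing step, the sketch below applies min-max normalization, which maps a series into [0, 1] as the text describes. That the patent's formula is min-max is an assumption inferred from the stated output range; the helper name `min_max_normalize` and the handling of a constant series (mapped to all zeros) are also assumptions.

```python
def min_max_normalize(x):
    """Scale a sequence into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(x), max(x)
    if hi == lo:
        # Assumed convention: avoid division by zero for a flat series.
        return [0.0 for _ in x]
    return [(v - lo) / (hi - lo) for v in x]

cpu_usage = [35.0, 50.0, 80.0, 20.0]   # illustrative CPU usage percentages
normalized = min_max_normalize(cpu_usage)
print(normalized)
```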

In step S110, the preprocessed sample data is decomposed into modal components by EMD, and each modal component is smoothed and denoised with a filter to obtain a smoothed time series from which a normal data set is built.

KPI data is strongly affected by external factors and changes abruptly, so it can be described as a signal that is, as a whole, nonlinear and non-stationary. The EMD algorithm handles this type of signal well: it decomposes the data into a sum of a finite number of linear, stationary IMFs. The EMD process can therefore be described as follows: a given KPI time series $X(n)$, after EMD processing, is expressed as the sum of its modal components and a residual.

The process of decomposing the preprocessed sample data into modal components by EMD comprises:

all the normalized sample data form the KPI time series $X(n)$; EMD expresses $X(n)$ as the sum of the modal components and a residual:

$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$

where $C_i(n)$ is the $i$-th IMF (intrinsic mode function) component, $N$ is the total number of IMFs, $R_n$ is the residual, and $n$ is the data sample length.
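To make the decomposition concrete, here is a deliberately simplified EMD sketch in Python with NumPy. It is an illustration under stated assumptions rather than the patent's algorithm: it uses linear interpolation for the envelopes where standard EMD uses cubic splines, and a fixed number of sifting iterations instead of a convergence criterion. The names `emd`, `max_imfs`, and `sift_iters` are hypothetical. What it does show faithfully is the structure of the result: the IMFs and the residual sum back to the original series.

```python
import numpy as np

def _envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if too few extrema."""
    t = np.arange(len(x))
    d = np.diff(x)
    # Interior local maxima / minima from sign changes of the first difference.
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Simplification: linear interpolation instead of cubic splines.
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0

def emd(x, max_imfs=4, sift_iters=10):
    """Return (list of IMFs, residual) such that sum(IMFs) + residual == x."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:
            break  # residual has too few oscillations left to sift
        h = residual.copy()
        for _ in range(sift_iters):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m  # sifting: remove the local mean
        imfs.append(h)
        residual = residual - h
    return imfs, residual

# Toy KPI: a slow cycle plus a fast oscillation around a constant level.
n = np.arange(200)
x = 0.5 + 0.3 * np.sin(2 * np.pi * n / 50) + 0.1 * np.sin(2 * np.pi * n / 5)
imfs, r = emd(x)
print(len(imfs), np.allclose(sum(imfs) + r, x))
```

For real use, a maintained EMD implementation with spline envelopes and a proper stopping criterion would be preferable to this sketch.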

In the invention, each IMF component produced by EMD is smoothed and denoised with an SG (Savitzky-Golay) filter. The SG filter works as follows: a neighborhood of length k around a data point is taken as the sliding window; the data in the neighborhood is fitted with a univariate polynomial of degree p, whose coefficients are found by least squares; this yields the best-fit value at the center of the window, which is taken as the denoised value. The window slides across every point in turn, which smooths and denoises the whole series.

The process of smoothing and denoising the modal components with the SG filter to obtain the smoothed time series is given by:

Figure BDA0003873088360000062

where $X'(n)$ is the final smoothed time series and $F$ is the SG filter.
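The filter principle described above can be sketched directly with NumPy: fit a degree-p polynomial to each length-k window by least squares and keep the fitted value at the window center. The function names `sg_filter` and `smooth_and_recombine`, the default values of `k` and `p`, and the reflective edge padding are assumptions, since the text does not give them. The text here also does not say whether the residual is added back after filtering, so this sketch sums only the filtered IMF components.

```python
import numpy as np

def sg_filter(x, k=7, p=2):
    """Savitzky-Golay smoothing by explicit per-window least-squares fits."""
    assert k % 2 == 1 and p < k, "k must be odd and larger than p"
    x = np.asarray(x, dtype=float)
    half = k // 2
    # Assumed edge handling: reflect the series so every point has a full window.
    padded = np.pad(x, half, mode="reflect")
    offsets = np.arange(-half, half + 1)   # window positions relative to the center
    out = np.empty_like(x)
    for i in range(len(x)):
        window = padded[i:i + k]
        coeffs = np.polyfit(offsets, window, p)   # least-squares polynomial fit
        out[i] = np.polyval(coeffs, 0.0)          # fitted value at the center
    return out

def smooth_and_recombine(imfs, k=7, p=2):
    """Apply the SG filter F to every IMF component and sum the results."""
    return sum(sg_filter(c, k, p) for c in imfs)

# A quadratic is reproduced exactly away from the edges when p >= 2.
t = np.arange(20.0)
y = 0.01 * t ** 2
smoothed = sg_filter(y)
print(np.allclose(smoothed[3:-3], y[3:-3]))
```

In practice `scipy.signal.savgol_filter` computes the same kind of smoothing far more efficiently; the explicit loop is used here only to mirror the description.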

In the invention, segmenting the data with a sliding window yields a data curve; in the example, this curve contains one abnormal point. EMD is then applied to the data. After decomposition, the original data consists of four IMF component curves and a residual R.

After EMD, the KPI data yields four modal components. In terms of their physical meaning, each component represents one frequency component of the original signal. Summing the IMF components represents the main variation in the original KPI data. The non-stationary signal is thereby turned into a stationary one while its instantaneous variation is preserved.

SG filtering is then applied to each mode separately to remove noise from the instantaneous-variation curves and smooth the whole curve, and the final curve is reconstructed.

The method of the invention extracts the IMF components by EMD, filters each modal component with the SG filter and recombines them, and finally builds a KPI curve that retains the data variation with the noise removed. The TCN-AE model used in the invention is in essence an AE model, that is, a fit to the normal data pattern; the invention therefore uses the EMD and SG filtering method to build a normal data set that contains only the data variation.

In step S120, feature vectors are extracted from the normal data set and input into an anomaly detection model, which outputs the anomaly detection data; the anomaly detection model constructs a minimum hypersphere that encloses the normal data of the normal data set, and outputs the data falling outside it as the anomaly detection data.

Features are extracted from the data in three groups. The first is basic statistical features: mean, variance, extrema, band, and power spectrum features. The second is time-domain features: mean, variance, extrema, zero crossings, boundary points, band length, and peak features. The third is frequency-domain features: power spectrum, power density ratio, median frequency, and mean power frequency features. Together these form a 15-dimensional feature vector.
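A few of the listed features can be computed as in the sketch below. It is illustrative only: the text names the feature groups but not their exact definitions, so the function name `extract_features`, the particular nine features chosen (a subset, not the patent's full 15), and the definitions of median frequency and mean power frequency used here are assumptions. The default sample period of 60 s follows the collection period stated in the text.

```python
import numpy as np

def extract_features(x, sample_period_s=60.0):
    """Return a small feature vector for one smoothed KPI window."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()

    # Time domain: sign changes of the mean-centered series.
    zero_crossings = int(np.sum(centered[:-1] * centered[1:] < 0))

    # Frequency domain: one-sided power spectrum from the real FFT.
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=sample_period_s)
    total = power.sum()
    if total > 0:
        mean_power_freq = float((freqs * power).sum() / total)
        # Median frequency: where cumulative power first reaches half the total.
        half_idx = np.searchsorted(np.cumsum(power), total / 2.0)
        median_freq = float(freqs[half_idx])
    else:
        mean_power_freq = median_freq = 0.0

    return np.array([
        x.mean(), x.var(), x.max(), x.min(),   # basic statistics
        x.max() - x.min(), zero_crossings,     # peak-to-peak, zero crossings
        total / len(x),                        # average spectral power
        mean_power_freq, median_freq,          # frequency-domain summaries
    ])

# Example: a sine with a 10-sample period, sampled once per second.
n = np.arange(100)
feats = extract_features(np.sin(2 * np.pi * n / 10 + 0.3), sample_period_s=1.0)
print(feats.shape)
```

For the example signal, both frequency summaries land at 0.1 Hz, the frequency of the sine, which is a quick sanity check on the spectral code.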

After the feature vectors are extracted, OneClassSVM is used to learn the pattern of the normal data. The idea behind OneClassSVM is simple: find a minimum hypersphere that encloses the positive examples in the sample set. Prediction uses this hypersphere as the decision boundary: a sample inside it is considered a positive (normal) sample.

Once the OneClassSVM anomaly detection model has been trained on the training set, basic statistical, time-domain, and frequency-domain features are extracted from the test set or from online data and fed to the trained model, which outputs the anomaly detection result.
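The train-then-predict flow can be sketched with scikit-learn's `OneClassSVM`. This is a hedged illustration: the source names the model but not the library or its settings, so the use of scikit-learn, the RBF kernel, and `nu=0.05` are assumptions, and the random feature matrices stand in for real 15-dimensional feature vectors. scikit-learn's `predict` returns +1 for inliers and -1 for outliers, which the sketch maps onto the 0 (normal) and 1 (abnormal) labels used in the text.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in data (assumption): 500 "normal" 15-dimensional feature vectors.
train_features = rng.normal(0.0, 1.0, size=(500, 15))

# Test set: five normal-looking vectors plus one far from the training data.
test_features = np.vstack([
    rng.normal(0.0, 1.0, size=(5, 15)),
    np.full((1, 15), 8.0),
])

# Fit on normal data only; nu bounds the fraction of training points
# allowed to fall outside the learned boundary.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(train_features)

pred = model.predict(test_features)      # +1 = inlier, -1 = outlier
labels = np.where(pred == -1, 1, 0)      # text's convention: 0 normal, 1 abnormal
print(labels)
```

Because the model is fitted only on the normal data set, the 1,468 labeled abnormal samples are not needed for training and can be held out to evaluate the detector.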

Embodiment 1 of the invention provides a method for anomaly detection in cloud platform time-series data. It filters the sample set with EMD and SG filtering to build a normal sample data set, then extracts the basic statistical, time-domain, and frequency-domain features of that data set to build feature vectors, and inputs these feature vectors into OneClassSVM. It adopts a supervised machine learning method, yet to some extent removes the burden of manually labeling abnormal data, and it also addresses the effect of the imbalance between abnormal and normal data on supervised learning.

Embodiment 2

Based on the method of Embodiment 1, Embodiment 2 of the invention provides a system for anomaly detection in cloud platform time-series data. Figure 2 is a schematic diagram of the system of Embodiment 2. The system comprises a preprocessing module, a decomposition and denoising module, and a detection module;

the preprocessing module is configured to acquire raw monitoring data from the cloud platform, segment it with a sliding window, and label the segments as sample data, the sample data including normal samples and abnormal samples, and then to preprocess the sample data;

the decomposition and denoising module is configured to decompose the preprocessed sample data into modal components by EMD, and to smooth and denoise each modal component with a filter to obtain a smoothed time series from which a normal data set is built;

the detection module is configured to extract feature vectors from the normal data set and input them into an anomaly detection model, which outputs the anomaly detection data; the anomaly detection model constructs a minimum hypersphere that encloses the normal data of the normal data set, and outputs the data falling outside it as the anomaly detection data.

The preprocessing module operates as follows. Acquiring the raw monitoring data, segmenting it with a sliding window, and labeling it as sample data comprises: collecting the cloud platform's monitoring data over a period of time as the raw monitoring data, and setting the collection period; segmenting the raw monitoring data with a sliding window and then filtering out its missing points; and adding data labels to the raw monitoring data, where normal samples are labeled 0 and abnormal samples are labeled 1.

The training data used in the invention is four months of cloud platform monitoring data. The collection tool is telegraf, using its built-in cpu, disk, and mem plugins with a collection period of 60 s. The collected data includes monitoring data for key performance indicators such as CPU and memory. Anomalous points are labeled by a combination of machine identification and manual judgment. The data is segmented with a sliding window, missing points are filtered out, and data labels are added: 0 for normal samples and 1 for abnormal samples. In total 21,542 samples were extracted, of which 20,074 are normal and 1,468 are abnormal.

The process of preprocessing the sample data is:

let the sample data be $x = \{x_1, x_2, \ldots, x_m\}$;

normalize it with the formula

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ is the $i$-th sample data point, $i = 1, 2, \ldots, m$, and $x'_i$ is the sample data after normalization.

After this transformation the time-series data is dimensionless and lies in the range [0, 1]. All data used in the invention goes through the same preprocessing steps before being used as input to the algorithm model, which ensures that the data is standardized.

The decomposition and denoising module works as follows. KPI data is strongly affected by external factors and exhibits large abrupt changes, so it can be described as a signal that is, on the whole, nonlinear and non-stationary. The EMD algorithm handles this type of signal well, decomposing the data into the sum of a finite number of linear, stationary IMFs. Thus, for a given KPI time series X(n), EMD expresses the series as the sum of its modal components and a residual.

Processing the preprocessed sample data by EMD into modal components comprises:

All normalized sample data form the KPI time series X(n); EMD expresses X(n) as the sum of the modal components and a residual:

$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$

where $C_i(n)$ is the $i$-th IMF component, $N$ is the total number of IMFs, $R_n$ is the residual, and $n$ is the data sample length.

In the present invention, each IMF component produced by EMD is smoothed and denoised with an SG (Savitzky-Golay) filter. The SG filter works as follows: a neighborhood of length k around a data point is taken as a sliding window; the data in the neighborhood is fitted with a univariate polynomial of order p, whose coefficients are obtained by least squares; and the fitted value at the center of the window is taken as the denoised value. The window slides over every point in turn, which achieves the smoothing and denoising.

Smoothing and denoising each modal component with the SG filter yields the smoothed time series as follows:

$$X'(n) = \sum_{i=1}^{N} F\big(C_i(n)\big) + R_n$$

where $X'(n)$ is the resulting smoothed time series and $F$ is the SG filter.
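The SG smoothing of each component can be sketched with the standard closed-form weights for one particular setting. The choice k = 5 and p = 2, the edge handling, the function names `sg_smooth` and `reconstruct`, and the assumption that the reconstruction adds the unfiltered residual to the smoothed IMFs are illustrative and are not taken from the source. The IMFs themselves are assumed to come from a separate EMD step that is not shown.

```python
from typing import List, Sequence

# Savitzky-Golay smoothing weights for window length k = 5 and polynomial
# order p = 2 (the same weights apply for p = 3). k and p are illustrative.
SG_5_2 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]


def sg_smooth(y: Sequence[float]) -> List[float]:
    """Replace each interior point by the least-squares polynomial fit at the
    window center. The two points at each end are left unchanged, which is a
    simple edge convention chosen for this sketch."""
    half = len(SG_5_2) // 2
    out = list(y)
    for c in range(half, len(y) - half):
        window = y[c - half:c + half + 1]
        out[c] = sum(w * v for w, v in zip(SG_5_2, window))
    return out


def reconstruct(imfs: Sequence[Sequence[float]], residual: Sequence[float]) -> List[float]:
    """Sum the smoothed IMFs and the (unfiltered) residual point by point."""
    smoothed = [sg_smooth(c) for c in imfs]
    return [sum(col) + r for col, r in zip(zip(*smoothed), residual)]


# A quadratic is reproduced by an order-2 fit; an isolated spike is damped.
quad = [float(i * i) for i in range(8)]
spike = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
smoothed_spike = sg_smooth(spike)
rebuilt = reconstruct([quad, spike], [1.0] * 8)
```

The two toy inputs show the intended behavior: a smooth trend passes through the filter essentially unchanged, while a single-point disturbance is reduced to 17/35 of its height at the window center.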

In the present invention, segmenting the data with a sliding window yields a data curve that contains one anomalous point. EMD is then applied to this data: the original data is decomposed into four IMF component curves and a residual R.

EMD of the KPI data yields four modal components. In terms of the physical meaning of the EMD components, each component represents one frequency component of the original signal. The sum of the IMF components represents the main variations in the original KPI data. The non-stationary signal is thereby converted into a stationary one while its instantaneous variations are preserved.

SG filtering is then applied to each mode separately to remove the noise from the instantaneous-variation curves and smooth the whole curve, and the curve is finally reconstructed.

The system of the present invention extracts the IMF components by EMD, filters the components of each mode with the SG filter, and recombines them, finally constructing a KPI curve that retains the data variations with the noise removed. The TCN-AE model used in the present invention is in essence an AE model, which fits the patterns of normal data; the present invention therefore uses the method based on EMD and SG filtering to construct a normal data set that contains only data variations.

The detection module works as follows. Features are extracted from the data in three groups. The first group is basic statistical features: mean, variance, extreme value, band, and power spectrum features. The second group is time-domain features: mean, variance, extreme value, zero-crossing, boundary point, and band length peak features. The third group is frequency-domain features: power spectrum, power density ratio, median frequency, and mean power frequency features. Together these form a 15-dimensional feature vector.
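A few of the listed features can be computed as in the sketch below. It covers only a subset of the 15 dimensions, and the exact definitions used by the invention are not given in the text, so the formulas here (population variance, zero crossings of the mean-removed series, a one-sided power spectrum from a direct DFT, and power-weighted mean and median frequencies) are common textbook definitions adopted as assumptions. The names `power_spectrum` and `extract_features` are hypothetical.

```python
import cmath
import math
from typing import Dict, List


def power_spectrum(x: List[float]) -> List[float]:
    """One-sided power spectrum |X_k|^2 for k = 0..n//2 via a direct DFT (O(n^2))."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def extract_features(x: List[float], sample_period_s: float = 60.0) -> Dict[str, float]:
    """Compute a subset of statistical, time-domain, and frequency-domain features."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    centered = [v - mean for v in x]
    # Zero crossings: sign changes between neighbors of the mean-removed series.
    zero_crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    power = power_spectrum(centered)
    freqs = [k / (n * sample_period_s) for k in range(len(power))]  # in Hz
    total = sum(power)
    mean_power_freq = sum(f * p for f, p in zip(freqs, power)) / total if total else 0.0
    # Median frequency: first frequency at which cumulative power reaches half the total.
    median_freq, running = 0.0, 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= total / 2:
            median_freq = f
            break
    return {
        "mean": mean,
        "variance": variance,
        "max": max(x),
        "min": min(x),
        "zero_crossings": float(zero_crossings),
        "mean_power_freq": mean_power_freq,
        "median_freq": median_freq,
    }


# Square-like wave with a period of 4 samples, sampled every 60 s.
wave = [1.0, 1.0, -1.0, -1.0] * 4
features = extract_features(wave)
```

For this wave all spectral power sits at one cycle per 240 s, so both frequency features come out at about 1/240 Hz.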

Once the feature vectors have been extracted, OneClassSVM is used to learn the data pattern of the normal data. The idea behind OneClassSVM is simple: find a minimal hypersphere that encloses the positive examples among the samples. Prediction uses this hypersphere as the decision boundary, and samples inside it are considered positive.
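The minimal-hypersphere idea can be written as an optimization problem. The formulation below is the standard support vector data description (SVDD) objective, given here as an assumption about what the text intends; the symbols $a$ (center), $R$ (radius), $\xi_i$ (slack variables), $C$ (penalty weight), and $\phi$ (feature map) do not appear in the source.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{subject to} \quad
\lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, m

f(z) = \operatorname{sgn}\!\left( R^2 - \lVert \phi(z) - a \rVert^2 \right)
```

A sample $z$ with $f(z) = +1$ falls inside the sphere and is treated as normal; $f(z) = -1$ marks it as an anomaly. If OneClassSVM here refers to the scikit-learn class of that name, note that it implements the separating-hyperplane formulation of the one-class SVM, which gives an equivalent decision rule when a Gaussian (RBF) kernel is used.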

After the OneClassSVM anomaly detection model has been trained on the training-set samples, the basic statistical, time-domain, and frequency-domain features are extracted from the test set or from online data and fed to the trained model, which outputs the anomaly detection result.

Embodiment 2 of the present invention provides a cloud platform time series data anomaly detection system. The sample set is filtered by EMD and SG filtering to construct a normal sample data set; the basic statistical, time-domain, and frequency-domain features of the normal sample data set are then extracted to build the data feature vectors, which are input to OneClassSVM. The approach uses a supervised machine learning method, yet to some extent it removes the burden of manually labeling abnormal data and also mitigates the effect of the imbalance between abnormal and normal data on supervised learning.

Embodiment 3

The present invention further provides a device. FIG. 3 is a schematic connection diagram of an electronic device provided by Embodiment 3 of the present invention, which comprises:

a memory, configured to store a computer program; and

a processor, configured to implement the following method steps when executing the computer program:

FIG. 1 is a flow chart of a cloud platform time series data anomaly detection method according to Embodiment 1 of the present invention.

In step S100, the raw monitoring data of the cloud platform is acquired, segmented with a sliding window, and labeled as sample data, the sample data comprising normal samples and abnormal samples; the sample data is then preprocessed.

Acquiring the raw monitoring data of the cloud platform, segmenting it with a sliding window, and labeling it as sample data comprises: collecting the cloud platform's monitoring data over a period of time as the raw monitoring data and setting a collection period; segmenting the data with a sliding window and then filtering out the missing points in the raw monitoring data; and adding data labels to the raw monitoring data, where normal samples are labeled 0 and abnormal samples are labeled 1.

The training data used in the present invention is four months of monitoring data from the cloud platform. The data is collected with telegraf using its built-in cpu, disk, and mem plugins, with a collection period of 60 s. The collected data contains monitoring data for key performance indicators (KPIs) such as CPU and memory. Anomalous points are labeled by a combination of machine identification and manual judgment. The data is segmented with a sliding window, missing points are filtered out, and data labels are added: 0 for normal samples and 1 for abnormal samples. In total, 21,542 samples were extracted, of which 20,074 are normal and 1,468 are abnormal.

The sample data is preprocessed as follows:

The sample data is denoted $x = \{x_1, x_2, \dots, x_m\}$;

Normalization is performed using the formula

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ denotes the $i$-th sample, $i = 1, 2, \dots, m$, and $x'_i$ is the normalized sample.

After the transformation, the dimensionless time series lies in the range [0, 1]. All data used in the present invention passes through this same preprocessing step before being used as input to the algorithm model, which ensures that the data is standardized.

In step S110, the preprocessed sample data is processed by EMD into modal components, and each modal component is smoothed and denoised by a filter to obtain a smoothed time series, from which the normal data set is constructed.

KPI data is strongly affected by external factors and exhibits large abrupt changes, so it can be described as a signal that is, on the whole, nonlinear and non-stationary. The EMD algorithm handles this type of signal well, decomposing the data into the sum of a finite number of linear, stationary IMFs. Thus, for a given KPI time series X(n), EMD expresses the series as the sum of its modal components and a residual.

Processing the preprocessed sample data by EMD into modal components comprises:

All normalized sample data form the KPI time series X(n); EMD expresses X(n) as the sum of the modal components and a residual:

$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$

where $C_i(n)$ is the $i$-th IMF component, $N$ is the total number of IMFs, $R_n$ is the residual, and $n$ is the data sample length.

In the present invention, each IMF component produced by EMD is smoothed and denoised with an SG (Savitzky-Golay) filter. The SG filter works as follows: a neighborhood of length k around a data point is taken as a sliding window; the data in the neighborhood is fitted with a univariate polynomial of order p, whose coefficients are obtained by least squares; and the fitted value at the center of the window is taken as the denoised value. The window slides over every point in turn, which achieves the smoothing and denoising.

Smoothing and denoising each modal component with the SG filter yields the smoothed time series as follows:

$$X'(n) = \sum_{i=1}^{N} F\big(C_i(n)\big) + R_n$$

where $X'(n)$ is the resulting smoothed time series and $F$ is the SG filter.

In the present invention, segmenting the data with a sliding window yields a data curve that contains one anomalous point. EMD is then applied to this data: the original data is decomposed into four IMF component curves and a residual R.

EMD of the KPI data yields four modal components. In terms of the physical meaning of the EMD components, each component represents one frequency component of the original signal. The sum of the IMF components represents the main variations in the original KPI data. The non-stationary signal is thereby converted into a stationary one while its instantaneous variations are preserved.

SG filtering is then applied to each mode separately to remove the noise from the instantaneous-variation curves and smooth the whole curve, and the curve is finally reconstructed.

The method of the present invention extracts the IMF components by EMD, filters the components of each mode with the SG filter, and recombines them, finally constructing a KPI curve that retains the data variations with the noise removed. The TCN-AE model used in the present invention is in essence an AE model, which fits the patterns of normal data; the present invention therefore uses the method based on EMD and SG filtering to construct a normal data set that contains only data variations.

In step S120, feature vectors are extracted from the normal data set and input to the abnormal-data detection model, which outputs the anomaly detection data; the abnormal-data detection model constructs a minimal hypersphere that encloses the normal data in the normal data set and outputs the data outside it as anomaly detection data.

Features are extracted from the data in three groups. The first group is basic statistical features: mean, variance, extreme value, band, and power spectrum features. The second group is time-domain features: mean, variance, extreme value, zero-crossing, boundary point, and band length peak features. The third group is frequency-domain features: power spectrum, power density ratio, median frequency, and mean power frequency features. Together these form a 15-dimensional feature vector.

Once the feature vectors have been extracted, OneClassSVM is used to learn the data pattern of the normal data. The idea behind OneClassSVM is simple: find a minimal hypersphere that encloses the positive examples among the samples. Prediction uses this hypersphere as the decision boundary, and samples inside it are considered positive.

After the OneClassSVM anomaly detection model has been trained on the training-set samples, the basic statistical, time-domain, and frequency-domain features are extracted from the test set or from online data and fed to the trained model, which outputs the anomaly detection result.

Embodiment 3 of the present invention provides a device. The sample set is filtered by EMD and SG filtering to construct a normal sample data set; the basic statistical, time-domain, and frequency-domain features of the normal sample data set are then extracted to build the data feature vectors, which are input to OneClassSVM. The approach uses a supervised machine learning method, yet to some extent it removes the burden of manually labeling abnormal data and also mitigates the effect of the imbalance between abnormal and normal data on supervised learning.

It should be noted that the technical solution of the present invention further provides an electronic device, comprising: a communication interface capable of exchanging information with other devices such as network devices; and a processor connected to the communication interface to exchange information with other devices and configured, when running a computer program, to execute the cloud platform time series data anomaly detection method provided by one or more of the above technical solutions, the computer program being stored in a memory. In practice, the components of the electronic device are coupled together through a bus system. The bus system implements connection and communication between these components. In addition to a data bus, the bus system includes a power bus, a control bus, and a status signal bus.

The memory in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include any computer program for operating on the electronic device. The memory may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory described in the embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.

The method disclosed in the above embodiments of the present application may be applied to, or implemented by, the processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a DSP (Digital Signal Processing, that is, a chip capable of implementing digital signal processing techniques), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium is located in the memory, and the processor reads the program in the memory and completes the steps of the foregoing method in combination with its hardware. When executing the program, the processor implements the corresponding processes of the methods in the embodiments of the present application, which are not repeated here for brevity.

Embodiment 4

The present invention further provides a readable storage medium storing a computer program which, when executed by a processor, implements the following method steps:

FIG. 1 is a flow chart of a cloud platform time series data anomaly detection method according to Embodiment 1 of the present invention.

In step S100, the raw monitoring data of the cloud platform is acquired, segmented with a sliding window, and labeled as sample data, the sample data comprising normal samples and abnormal samples; the sample data is then preprocessed.

Acquiring the raw monitoring data of the cloud platform, segmenting it with a sliding window, and labeling it as sample data comprises: collecting the cloud platform's monitoring data over a period of time as the raw monitoring data and setting a collection period; segmenting the data with a sliding window and then filtering out the missing points in the raw monitoring data; and adding data labels to the raw monitoring data, where normal samples are labeled 0 and abnormal samples are labeled 1.

The training data used in the present invention is four months of monitoring data from the cloud platform. The data is collected with telegraf using its built-in cpu, disk, and mem plugins, with a collection period of 60 s. The collected data contains monitoring data for key performance indicators (KPIs) such as CPU and memory. Anomalous points are labeled by a combination of machine identification and manual judgment. The data is segmented with a sliding window, missing points are filtered out, and data labels are added: 0 for normal samples and 1 for abnormal samples. In total, 21,542 samples were extracted, of which 20,074 are normal and 1,468 are abnormal.

The sample data is preprocessed as follows:

The sample data is denoted $x = \{x_1, x_2, \dots, x_m\}$;

Normalization is performed using the formula

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

where $x_i$ denotes the $i$-th sample, $i = 1, 2, \dots, m$, and $x'_i$ is the normalized sample.

After the transformation, the dimensionless time series lies in the range [0, 1]. All data used in the present invention passes through this same preprocessing step before being used as input to the algorithm model, which ensures that the data is standardized.

In step S110, the preprocessed sample data is processed by EMD into modal components, and each modal component is smoothed and denoised by a filter to obtain a smoothed time series, from which the normal data set is constructed.

KPI data is strongly affected by external factors and exhibits large abrupt changes, so it can be described as a signal that is, on the whole, nonlinear and non-stationary. The EMD algorithm handles this type of signal well, decomposing the data into the sum of a finite number of linear, stationary IMFs. Thus, for a given KPI time series X(n), EMD expresses the series as the sum of its modal components and a residual.

Processing the preprocessed sample data by EMD into modal components comprises:

All normalized sample data form the KPI time series X(n); EMD expresses X(n) as the sum of the modal components and a residual:

$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$

where $C_i(n)$ is the $i$-th IMF component, $N$ is the total number of IMFs, $R_n$ is the residual, and $n$ is the data sample length.

In the present invention, each IMF component produced by EMD is smoothed and denoised with an SG (Savitzky-Golay) filter. The SG filter works as follows: a neighborhood of length k around a data point is taken as a sliding window; the data in the neighborhood is fitted with a univariate polynomial of order p, whose coefficients are obtained by least squares; and the fitted value at the center of the window is taken as the denoised value. The window slides over every point in turn, which achieves the smoothing and denoising.

Smoothing and denoising each modal component with the SG filter yields the smoothed time series as follows:

$$X'(n) = \sum_{i=1}^{N} F\big(C_i(n)\big) + R_n$$

where $X'(n)$ is the resulting smoothed time series and $F$ is the SG filter.

In the present invention, segmenting the data with a sliding window yields a data curve that contains one anomalous point. EMD is then applied to this data: the original data is decomposed into four IMF component curves and a residual R.

EMD of the KPI data yields four modal components. In terms of the physical meaning of the EMD components, each component represents one frequency component of the original signal. The sum of the IMF components represents the main variations in the original KPI data. The non-stationary signal is thereby converted into a stationary one while its instantaneous variations are preserved.

SG filtering is then applied to each mode separately to remove the noise from the instantaneous-variation curves and smooth the whole curve, and the curve is finally reconstructed.

The method of the present invention extracts the IMF components by EMD, filters the components of each mode with the SG filter, and recombines them, finally constructing a KPI curve that retains the data variations with the noise removed. The TCN-AE model used in the present invention is in essence an AE model, which fits the patterns of normal data; the present invention therefore uses the method based on EMD and SG filtering to construct a normal data set that contains only data variations.

In step S120, feature vectors are extracted from the normal data set and input to the abnormal-data detection model, which outputs the anomaly detection data; the abnormal-data detection model constructs a minimal hypersphere that encloses the normal data in the normal data set and outputs the data outside it as anomaly detection data.

Features are extracted from the data in three groups. The first group is basic statistical features: mean, variance, extreme value, band, and power spectrum features. The second group is time-domain features: mean, variance, extreme value, zero-crossing, boundary point, and band length peak features. The third group is frequency-domain features: power spectrum, power density ratio, median frequency, and mean power frequency features. Together these form a 15-dimensional feature vector.

Once the feature vectors have been extracted, OneClassSVM is used to learn the data pattern of the normal data. The idea behind OneClassSVM is simple: find a minimal hypersphere that encloses the positive examples among the samples. Prediction uses this hypersphere as the decision boundary, and samples inside it are considered positive.

After the OneClassSVM anomaly detection model has been trained on the training-set samples, the basic statistical, time-domain, and frequency-domain features are extracted from the test set or from online data and input into the trained model, which outputs the anomaly detection result.

Embodiment 1 of the present invention proposes a storage medium. The sample set is filtered by EMD and SG filtering to construct a normal sample data set; the basic statistical, time-domain, and frequency-domain features of the normal sample data set are then extracted to build the data feature vector, which is input into OneClassSVM. Although this is a supervised machine learning method, it relieves, to a certain extent, the burden of manually labeling abnormal data, and it also addresses the effect of the imbalance between abnormal and normal data on supervised learning.

The embodiments of the present application further provide a storage medium, namely a computer storage medium, specifically a computer-readable storage medium, for example a memory storing a computer program that can be executed by a processor to carry out the steps of the foregoing method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, an optical disc, or a CD-ROM.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, ROM, RAM, a magnetic disk, or an optical disc. Alternatively, if the integrated units of the present application are implemented as software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. On this understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application.

For the relevant parts of the processing device and medium for cloud platform time sequence data anomaly detection provided in the embodiments of the present application, refer to the detailed description of the corresponding parts of the cloud platform time sequence data anomaly detection method provided in Embodiment 1 of the present application; they are not repeated here.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements also includes the elements inherent to it. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element. In addition, the parts of the technical solutions provided in the embodiments of the present application that work on the same principle as the corresponding prior-art solutions are not described in detail, to avoid redundancy.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art may make other modifications or variations on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Modifications or variations that those skilled in the art can make on the basis of the technical solution of the present invention without creative effort remain within the scope of protection of the present invention.

Claims (10)

1. A cloud platform time sequence data anomaly detection method, characterized by comprising the following steps:
obtaining original monitoring data of the cloud platform, segmenting the original monitoring data with a sliding window, and labeling the segments as sample data, wherein the sample data comprises normal samples and abnormal samples; then preprocessing the sample data;
decomposing the preprocessed sample data into modal components by EMD processing, and smoothing and denoising each modal component with a filter to obtain a smoothed time series from which a normal data set is constructed;
extracting a feature vector of the normal data set, and inputting the feature vector into an abnormal data detection model to output anomaly detection data; wherein the abnormal data detection model constructs a minimum hypersphere enclosing the normal data in the normal data set, and outputs the anomaly detection data other than the normal data.
2. The method according to claim 1, wherein the step of obtaining the original monitoring data of the cloud platform, segmenting it with a sliding window, and labeling the segments as sample data comprises:
collecting monitoring data of the cloud platform over a period of time as the original monitoring data, and setting a collection period;
segmenting the original monitoring data with a sliding window, and then filtering out the missing points in the original monitoring data;
adding data labels to the original monitoring data, wherein the normal sample label is 0 and the abnormal sample label is 1.
3. The method according to claim 1, wherein the preprocessing of the sample data comprises:
denoting the sample data as $x = \{x_1, x_2, \ldots, x_m\}$;
normalizing the sample data using the formula
Figure FDA0003873088350000011
wherein $x_i$ denotes the i-th sample datum, $i = 1, 2, \ldots, m$, and $x_i'$ is the sample datum after normalization.
4. The method according to claim 1, wherein the EMD processing of the preprocessed sample data into modal components comprises:
forming KPI time series data X(n) from all the normalized sample data, and subjecting the KPI time series data X(n) to EMD processing to obtain the sum of the modal components and the residual:
$$X(n) = \sum_{i=1}^{N} C_i(n) + R_n$$
wherein $C_i(n)$ is the i-th IMF component, N is the total number of IMFs, $R_n$ is the residual, and n is the data sample length.
5. The cloud platform time sequence data anomaly detection method according to claim 4, wherein smoothing and denoising the modal components with an SG filter to obtain the smoothed time series comprises:
Figure FDA0003873088350000021
wherein X'(n) is the finally obtained smoothed time series, and F is the SG filter.
6. The cloud platform time sequence data anomaly detection method according to claim 1, wherein extracting the feature vector of the normal data set comprises: extracting the basic statistical features, time-domain features, and frequency-domain features of the normal data set, which together form the feature vector of the normal data set;
the basic statistical features comprise mean, variance, extreme value, band, and power spectrum features;
the time-domain features comprise mean, variance, extreme value, zero-crossing point, boundary point, and band length and peak features;
the frequency-domain features comprise power spectrum, power density ratio, median frequency, and mean power frequency features.
7. The method according to claim 6, wherein the abnormal data detection model is a OneClassSVM model.
8. A cloud platform time sequence data anomaly detection system, characterized by comprising a preprocessing module, a decomposition and denoising module, and a detection module;
the preprocessing module is used for obtaining original monitoring data of the cloud platform, segmenting the data with a sliding window, and labeling the segments as sample data, wherein the sample data comprises normal samples and abnormal samples; and then preprocessing the sample data;
the decomposition and denoising module is used for decomposing the preprocessed sample data into modal components by EMD processing, and smoothing and denoising each modal component with a filter to obtain a smoothed time series from which a normal data set is constructed;
the detection module is used for extracting a feature vector of the normal data set and inputting the feature vector into an abnormal data detection model to output anomaly detection data; wherein the abnormal data detection model constructs a minimum hypersphere enclosing the normal data in the normal data set, and outputs the anomaly detection data other than the normal data.
9. An apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN115587009A (en): Cloud platform time sequence data anomaly detection method, system, equipment and medium

Application number: CN202211206362.7A
Priority date: 2022-09-29
Filing date: 2022-09-29
Publication number: CN115587009A
Publication date: 2023-01-10
Status: Pending
Family ID: 84772701
Country: CN

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination