CN117914629B


Info

Publication number: CN117914629B
Application number: CN202410306380.5A
Authority: CN
Inventors: 陈鹏辉; 金琛森; 程似锦
Original assignee: Taizhou Big Data Development Co ltd
Current assignee: Taizhou Digital Group Co ltd
Priority date: 2024-03-18
Filing date: 2024-03-18
Publication date: 2024-05-28
Anticipated expiration: 2044-03-18
Also published as: CN117914629A

Abstract

The invention provides a network security detection method and system, relating to the technical field of network security detection. The method comprises the following steps: ranking the features of the preprocessed data by importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset; expanding and generating the feature subset using an autoencoder to obtain network behavior feature time series data; decomposing the network behavior feature time series data into different time scales using a wavelet transform method; and identifying anomalous points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern. The invention improves detection accuracy.

Description

Translated from Chinese

Network security detection method and system

Technical Field

The present invention relates to the technical field of network security detection, and in particular to a network security detection method and system.

Background Art

With the rapid development of network technology, network behavior is becoming increasingly complex, which poses great challenges for network security detection. Network behavior feature time series data is an important source of information about network status and behavior patterns and plays a key role in network security detection. However, the original network behavior feature time series data is often high-dimensional, noisy, and redundant in its features. Used directly for security detection, it performs poorly and is easily affected by data sparsity and class imbalance.

Traditional network security detection methods usually process the raw data directly and ignore the internal structure and correlations of the data, which limits detection performance. In addition, these methods often lack an effective evaluation of feature importance and cannot extract, from a massive number of features, the key information that truly contributes to detection.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a network security detection method and system that improve detection accuracy.

In order to solve the above technical problem, the technical solution of the present invention is as follows:

In a first aspect, a network security detection method is provided, the method comprising:

preprocessing the original network behavior feature time series data to obtain preprocessed data;

ranking the features of the preprocessed data by importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset;

expanding and generating the feature subset using an autoencoder to obtain network behavior feature time series data;

decomposing the network behavior feature time series data into different time scales using a wavelet transform method;

identifying anomalous points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern.

Furthermore, preprocessing the original network behavior feature time series data to obtain preprocessed data includes:

determining an initial window size according to the characteristics of the data and the analysis objectives;

placing the window on the data, starting from the starting point of the data;

calculating the average value of all data in the current window;

recording the calculated average value as a new data point in the filtered data set;

moving the window to the right by one data point and repeating the above steps until the window reaches the end of the data;

obtaining one new data point from the average value calculated at each position, the new data points together constituting the preprocessed data.

Furthermore, calculating the average value of all data in the current window includes: calculating the average by a weighted moving-average formula. In that formula, the output is a point in the filtered data sequence and represents the moving average at the corresponding position in the original data sequence. The window size of the moving-average filter is a positive integer that gives the number of data points considered when calculating the moving average of one point. The summation runs over all terms in the window, with the summation index starting at 0 and running to the end of the window. Each term is a point in the original data sequence, namely the data value at the position selected by the summation index, multiplied by the element of the weight vector that corresponds to that position in the window.

Furthermore, ranking the features of the preprocessed data by importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset includes:

obtaining the preprocessed data set, which is a data matrix $X$ containing $n$ samples and $p$ features, where the data matrix $X$ is:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix};$$

where each row represents a sample and each column represents a feature;

calculating, from the data matrix $X$, the mean of each feature in $X$, where the mean vector $\mu$ is: $\mu = (\mu_1, \mu_2, \ldots, \mu_p)$;

where $\mu_p$ represents the mean of the $p$-th feature, and the mean $\mu_j$ of the $j$-th feature is calculated as $\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, where $j = 1, 2, \ldots, p$; $i$ is an index that traverses all samples in the data set and varies from 1 to $n$; and $x_{ij}$ represents the element in the $i$-th row and $j$-th column of the data matrix $X$;

calculating the covariance matrix $C$ from the mean vector $\mu$, the covariance matrix $C$ being expressed as:

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p1} & c_{p2} & \cdots & c_{pp} \end{bmatrix};$$

where $c_{pp}$ represents the covariance located in the $p$-th row and $p$-th column of the covariance matrix $C$; the covariance $c_{jk}$ in the $j$-th row and $k$-th column of $C$ is calculated as:

$$c_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \mu_j)(x_{ik} - \mu_k);$$

where $k = 1, 2, \ldots, p$; $x_{ik}$ represents the element in the $i$-th row and $k$-th column of the data matrix $X$; $\mu_j$ is the mean of the $j$-th feature; and $\mu_k$ is the mean of the $k$-th feature;

performing eigendecomposition on the covariance matrix to obtain the eigenvectors and eigenvalues of the covariance matrix;

sorting the principal components by the magnitude of their eigenvalues and calculating the cumulative proportion of explained variance;

obtaining, for the principal components thus determined, the corresponding eigenvectors, and projecting the original features onto the subspace spanned by the principal components;

analyzing the correlation of each original feature with the principal components;

ranking the original features according to their correlation with the principal components to obtain a ranking result;

obtaining a representative feature subset according to the ranking result.

Furthermore, expanding and generating the feature subset using an autoencoder to obtain network behavior feature time series data includes:

determining the original network behavior feature time series data to be input;

encoding the original network behavior feature time series data into a distribution over latent variables using the autoencoder;

sampling latent variables from the latent variable distribution output by the encoder, and decoding the sampled latent variables into a candidate set of feature time series data using the autoencoder;

determining an original feature subset from the original feature set and, according to the original feature subset, using the sampled latent variables and the decoder to generate multiple feature time series data similar to that feature subset;

forming an extended feature time series data set from the original feature subset and the multiple feature time series data similar to that feature subset.

Furthermore, decomposing the network behavior feature time series data into different time scales using a wavelet transform method includes:

determining the number of decomposition levels of the wavelet transform according to the complexity of the data and the analysis requirements;

performing the wavelet transform according to the network behavior feature time series data, the wavelet basis function, and the number of decomposition levels, so as to decompose the original data into wavelet coefficients;

analyzing the wavelet coefficients to obtain behavior patterns at different time scales.
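As an illustration of the decomposition step, the following minimal sketch uses the Haar wavelet. The text does not name a wavelet basis function, so the choice of Haar, the function names `haar_step` and `haar_decompose`, and the sample signal are assumptions made for the example only.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: pairwise sums and differences."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Decompose a series into `levels` sets of detail coefficients plus a final approximation.

    details[0] holds the finest time scale and details[-1] the coarsest.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical traffic-like series with a single spike at index 4.
series = [4.0, 4.0, 4.0, 4.0, 10.0, 4.0, 4.0, 4.0]
approx, details = haar_decompose(series, levels=2)
print(approx)   # coarse trend
print(details)  # the spike shows up as non-zero detail coefficients at both scales
```

The detail coefficients at each level are what a later anomaly detection stage would inspect, one time scale at a time.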

Furthermore, identifying anomalous points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern includes:

identifying outliers separately in the time series data at each time scale using a density-based method to obtain identification data;

identifying the anomalous points at each time scale according to the identification data and a preset threshold.
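The text does not say which density-based method is used. The sketch below substitutes a simple k-nearest-neighbour distance score (a large mean distance to the nearest neighbours means low local density) and then applies a preset threshold per time scale. The function names, `k`, the threshold value, and the sample coefficients are all hypothetical.

```python
def knn_density_scores(values, k=3):
    """Mean distance from each value to its k nearest neighbours (higher = sparser)."""
    if not 1 <= k < len(values):
        raise ValueError("k must satisfy 1 <= k < len(values)")
    scores = []
    for i, v in enumerate(values):
        distances = sorted(abs(v - u) for j, u in enumerate(values) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores

def detect_anomalies(scales, k=3, threshold=1.0):
    """Return, for each time scale, the indices whose score exceeds the preset threshold."""
    return {
        name: [i for i, s in enumerate(knn_density_scores(vals, k)) if s > threshold]
        for name, vals in scales.items()
    }

# Hypothetical per-scale coefficients; only level1 contains an isolated value.
scales = {
    "level1": [0.0, 0.1, -0.1, 4.2, 0.05, 0.0],
    "level2": [0.0, 0.1, 0.0, -0.1],
}
print(detect_anomalies(scales))
```

A production system would more likely use an established method such as Local Outlier Factor or DBSCAN; the k-NN score is used here only because it fits in a few lines.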

In a second aspect, a network security detection system is provided, comprising:

an acquisition module, configured to preprocess the original network behavior feature time series data to obtain preprocessed data, and to rank the features of the preprocessed data by importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset;

a processing module, configured to expand and generate the feature subset using an autoencoder to obtain network behavior feature time series data; to decompose the network behavior feature time series data into different time scales using a wavelet transform method; and to identify anomalous points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern.

In a third aspect, a computing device is provided, comprising:

one or more processors;

a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the above method.

In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a program which, when executed by a processor, implements the above method.

The above solution of the present invention has at least the following beneficial effects:

Through the preprocessing and feature extraction steps, the present invention effectively removes noise and redundant information from the original data and extracts the key features that contribute to detection, thereby improving detection accuracy. Using a data augmentation method based on a variational autoencoder, the present invention expands and generates the feature subset, which enriches the data set and improves the generalization ability of the model, so that the detection method adapts better to unknown attacks. Decomposing the augmented data into different time scales using the wavelet transform method reveals the inherent structure and multi-scale characteristics of the data. By identifying anomalous points in the time series data at the different time scales with an anomaly detection algorithm, the present invention accurately finds abnormal behaviors that deviate from the normal pattern and improves the performance and efficiency of anomaly detection. The method of the present invention is flexible and scalable and can be adjusted and optimized for different network environments and security requirements.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of a network security detection method provided by an embodiment of the present invention.

FIG. 2 is a schematic diagram of a network security detection system provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.

As shown in FIG. 1, an embodiment of the present invention provides a network security detection method, the method comprising:

Step 11, preprocessing the original network behavior feature time series data to obtain preprocessed data;

Step 12, ranking the features of the preprocessed data by importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset;

Step 13, expanding and generating the feature subset using an autoencoder to obtain network behavior feature time series data;

Step 14, decomposing the network behavior feature time series data into different time scales using a wavelet transform method;

Step 15, identifying anomalous points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern.

In this embodiment of the present invention, in step 12, principal component analysis effectively reduces the dimensionality of the data, lowering computation and storage requirements. Ranking the features by importance makes it possible to select the feature subset that contributes most to the detection task, which improves detection accuracy, and principal component analysis also eliminates multicollinearity among the original features. In step 13, the autoencoder learns an intrinsic representation of the data and generates new data samples through encoding and decoding, which enriches the data set. This data augmentation increases the generalization ability of the model and makes it more robust to unknown attacks, and the hidden layer of the autoencoder learns deeper features of the data, which helps the subsequent detection task. In step 14, the wavelet transform provides information about the data at different time scales, which helps capture short-term and long-term patterns of change in network behavior. The features extracted by the wavelet transform are more detailed and comprehensive and reflect both local and global characteristics of the data; the wavelet transform also has some denoising ability, which further purifies the data signal and improves detection accuracy. In step 15, the anomaly detection algorithm identifies abnormal behaviors that deviate from the normal pattern, so that potential security threats are discovered promptly. Performing anomaly detection on data at different time scales captures abnormal patterns more comprehensively and improves detection precision and recall. Anomaly detection algorithms usually run fast enough to meet the needs of real-time detection and to respond to security incidents in a timely manner.

In a preferred embodiment of the present invention, the above step 11 includes:

Step 111, determining an initial window size according to the characteristics of the data and the analysis objectives;

Step 112, placing the window on the data, starting from the starting point of the data;

Step 113, calculating the average value of all data in the current window;

Step 114, recording the calculated average value as a new data point in the filtered data set;

Step 115, moving the window to the right by one data point and repeating steps 113 and 114 until the window reaches the end of the data;

Step 116, obtaining one new data point from the average value calculated at each position, the new data points together constituting the preprocessed data.

In this embodiment of the present invention, in step 111, determining the window size according to the characteristics of the data and the analysis objectives adapts the preprocessing to the data itself and improves the processing result. The initial window size also leaves room for later adjustment according to actual needs. In step 112, sliding the window from the starting point of the data ensures that every data point is included in some window, so the data is fully covered; the sliding window also divides the data into small segments for analysis, which helps capture local characteristics. In step 113, averaging the data in the window smooths the data and reduces random fluctuations and noise; the average represents the overall level and trend of the data in the window. In step 114, using the average as a new data point retains the main characteristics of the data while reducing the amount of data, which simplifies subsequent processing; replacing the original data points with averages also reduces the influence of noise and improves the signal-to-noise ratio. In step 115, moving the window one data point at a time preserves the temporal continuity of the data, which facilitates subsequent time series analysis, and ensures that every data point is processed. In step 116, applying the moving-average method to the entire data set yields a smoother set of new data points with fewer of the fluctuations and spikes present in the original data.
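Steps 111 to 116 can be sketched as a plain sliding-window mean. The function name `moving_average` and the sample series are illustrative; the handling of the series ends (only full windows are averaged, so the output is shorter than the input) is an assumption, since the text does not specify it.

```python
def moving_average(series, window):
    """Slide a fixed-size window one point at a time and record each window's mean."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    filtered = []
    # Steps 112 and 115: start at the beginning and shift right by one point each time.
    for start in range(len(series) - window + 1):
        current = series[start:start + window]
        # Steps 113 and 114: average the window and record it as a new data point.
        filtered.append(sum(current) / window)
    # Step 116: the recorded averages form the preprocessed data.
    return filtered

smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)  # [2.0, 3.0, 4.0]
```

With a window of 3 the five input points yield three averages; a larger window smooths more aggressively but shortens the output further.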

In a preferred embodiment of the present invention, calculating the average value of all data in the current window includes:

calculating the average by a weighted moving-average formula. In that formula, the output is a point in the filtered data sequence and represents the moving average at the corresponding position in the original data sequence. The window size of the moving-average filter is a positive integer that gives the number of data points considered when calculating the moving average of one point. The summation runs over all terms in the window, with the summation index starting at 0 and running to the end of the window. Each term is a point in the original data sequence, namely the data value at the position selected by the summation index, multiplied by the element of the weight vector that corresponds to that position in the window.

In this embodiment of the present invention, introducing the weight vector makes it possible to calculate not only the simple average of the data in the window but also a weighted average that reflects the importance or credibility of each data point. This makes the average more flexible and better able to reflect the actual characteristics of the data. Calculating the window average with weights suppresses random noise in the data, and the weight vector can be set according to the noise characteristics of the data to further reduce the influence of noise on the average. Assigning different weights to different data points strengthens certain important features so that they carry more weight in the average, which helps the subsequent anomaly detection identify abnormal behaviors that deviate from the normal pattern more accurately. The weight vector also gives the preprocessing additional flexibility: it can be adjusted for different application scenarios and data characteristics to achieve the best processing result. Because the weighted average adapts better to the actual characteristics of the data, it improves the adaptability and accuracy of the overall network security detection algorithm and protects the network more effectively in complex and changing network environments and against evolving security threats.
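The weighted average can be sketched as follows. Because the formula itself is not reproduced in this text, the sketch assumes the form $y[n] = \frac{1}{N}\sum_{k=0}^{N-1} w[k]\,x[n-k]$, in which the window looks back from position $n$ and the sum is divided by the window size $N$. Both the index direction and the $1/N$ normalization are assumptions, as are the function name and the sample weights.

```python
def weighted_moving_average(series, weights):
    """Weighted moving average under the assumed form y[n] = (1/N) * sum_k w[k] * x[n-k]."""
    n_window = len(weights)  # N, the window size
    if n_window == 0 or n_window > len(series):
        raise ValueError("weights must be non-empty and no longer than the series")
    filtered = []
    # Only positions with a full window behind them are evaluated (assumption).
    for n in range(n_window - 1, len(series)):
        total = sum(weights[k] * series[n - k] for k in range(n_window))
        filtered.append(total / n_window)
    return filtered

x = [1.0, 2.0, 3.0, 4.0]
print(weighted_moving_average(x, [1.0, 1.0, 1.0]))  # uniform weights: plain mean
print(weighted_moving_average(x, [2.0, 1.0, 0.0]))  # heavier weight on the newest point
```

Under this assumed normalization the result is a true weighted average only when the weights sum to $N$; uniform weights of 1 reduce it to the simple moving average.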

In a preferred embodiment of the present invention, the above step 12 includes:

Step 121, obtaining the preprocessed data set, which is a data matrix $X$ containing $n$ samples and $p$ features, where the data matrix $X$ is:

$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix};$$

Step 122, calculating, from the data matrix $X$, the mean of each feature in $X$, where the mean vector $\mu$ is: $\mu = (\mu_1, \mu_2, \ldots, \mu_p)$;

where $\mu_p$ represents the mean of the $p$-th feature, and the mean $\mu_j$ of the $j$-th feature is calculated as $\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$, where $j = 1, 2, \ldots, p$; $i$ is an index that traverses all samples in the data set and varies from 1 to $n$; and $x_{ij}$ represents the element in the $i$-th row and $j$-th column of the data matrix $X$;

Step 123, calculating the covariance matrix $C$ from the mean vector $\mu$, the covariance matrix $C$ being expressed as:

$$C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1p} \\ c_{21} & c_{22} & \cdots & c_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ c_{p1} & c_{p2} & \cdots & c_{pp} \end{bmatrix};$$

Step 124, performing eigendecomposition on the covariance matrix to obtain the eigenvectors and eigenvalues of the covariance matrix;

Step 125, sorting the principal components by the magnitude of their eigenvalues and calculating the cumulative proportion of explained variance;

Step 126, obtaining, for the principal components thus determined, the corresponding eigenvectors, and projecting the original features onto the subspace spanned by the principal components;

Step 127, analyzing the correlation of each original feature with the principal components;

Step 128, ranking the original features according to their correlation with the principal components to obtain a ranking result;

Step 129, obtaining a representative feature subset according to the ranking result.
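Steps 121 to 123 can be checked numerically with NumPy. The sketch below assumes the sample covariance with divisor $n-1$, which is also the default of `numpy.cov`; the small data matrix is invented for the example.

```python
import numpy as np

# Hypothetical data matrix X: n = 3 samples (rows), p = 2 features (columns).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 8.0]])

# Step 122: mean of each feature (column).
mu = X.mean(axis=0)

# Step 123: covariance matrix from the centered data, divisor n - 1 (assumption).
centered = X - mu
C = centered.T @ centered / (X.shape[0] - 1)

print(mu)  # [4. 4.]
print(C)   # [[ 4.  7.] [ 7. 13.]]

# Cross-check against NumPy's built-in estimator (rows are samples, so rowvar=False).
assert np.allclose(C, np.cov(X, rowvar=False))
```

The diagonal of `C` holds each feature's variance and the off-diagonal entries hold the pairwise covariances that PCA later diagonalizes.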

In this embodiment of the present invention, steps 124 to 126 implement the core of principal component analysis (PCA). Performing eigendecomposition on the covariance matrix of the data set and selecting the important principal components reduces the original high-dimensional data to a low-dimensional space, which simplifies the data structure and lowers the computational complexity of subsequent processing. PCA converts the original features into new, mutually uncorrelated features (the principal components). Steps 127 and 128 then analyze the correlation of the original features with the principal components and rank the features accordingly, which helps remove redundant information so that subsequent analysis focuses on the features that matter. By selecting principal components with a high cumulative proportion of explained variance, step 125 ensures that the reduced data still retains most of the important information in the original data set, so that the inherent structure and patterns of the data are preserved as far as possible while the dimensionality is reduced. After PCA, a more compact feature subset with lower correlation is obtained (step 129). Such a feature subset improves the training efficiency and performance of subsequent machine learning models and also helps interpretability, because each principal component is a combination of the original features. When the data is reduced to two or three dimensions (a possible outcome of step 126), it can be visualized easily, which gives an intuitive view of the distribution, clusters, and outliers in the data and facilitates further exploratory analysis. PCA also has a degree of robustness to noise and outliers in the data: selecting an appropriate number of principal components reduces, to some extent, their effect on subsequent analysis.

In this embodiment of the present invention, step 124 performs eigendecomposition on the previously calculated covariance matrix. Eigendecomposition factors a matrix into its eigenvectors and eigenvalues: the eigenvectors of the covariance matrix give the main directions of variation in the data, and the eigenvalues give the amount of variation along those directions. Eigendecomposition helps reveal the inherent structure and patterns of the data, and the eigenvalues and eigenvectors form the basis for the subsequent dimensionality reduction. In step 125, the principal components (PCs) are determined by sorting the eigenvalues from largest to smallest. The proportion of variance explained by each principal component is the ratio of its eigenvalue to the sum of all eigenvalues, and the cumulative proportion of explained variance is the share of total variance explained by the first several principal components. Selecting the first few principal components that explain most of the variance reduces the dimensionality of the data while retaining the important information, and the cumulative proportion helps determine how many principal components must be kept to retain enough information.
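The eigendecomposition and the cumulative explained-variance rule of steps 124 and 125 can be sketched with `numpy.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix. The function name `principal_components` and the 0.95 default target are assumptions; the text does not state a specific threshold.

```python
import numpy as np

def principal_components(C, target=0.95):
    """Sort eigenpairs of a covariance matrix and pick enough components to reach `target`."""
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # step 125: sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()          # variance explained by each component
    cumulative = np.cumsum(ratio)            # cumulative proportion of explained variance
    # Smallest number of components whose cumulative proportion reaches the target.
    m = min(int(np.searchsorted(cumulative, target)) + 1, len(eigvals))
    return eigvals, eigvecs, cumulative, m

C = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # hypothetical covariance matrix, eigenvalues 3 and 1
eigvals, eigvecs, cumulative, m = principal_components(C, target=0.7)
print(eigvals, cumulative, m)
```

For this matrix the first component explains 75% of the variance, so a 0.7 target keeps one component while a 0.9 target keeps both.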

Step 126 selects the eigenvectors corresponding to the first few principal components and projects the original data onto the low-dimensional subspace spanned by these eigenvectors. This amounts to transforming the original data into a new coordinate system whose axes are the principal components. The projected data has a simpler structure in the new coordinate system, which makes it easier to analyze and process. The principal components are mutually orthogonal, that is, uncorrelated, which helps eliminate multicollinearity among the original features. Step 127 analyzes the loading of each original feature on the principal components (that is, the coefficient of the original feature in each principal component) to determine how much each original feature contributes to the principal components and how the features correlate with them, which helps in understanding the importance and role of each original feature in the data set. Step 128 ranks the original features by the magnitude of their correlation with the principal components. The ranking shows which features contribute most to the principal components (that is, to the main directions of variation in the data) and gives an intuitive representation of feature importance, which helps identify key features. Step 129 selects a representative feature subset based on the ranking. The subset may contain the few features that contribute most to the principal components, or it may be chosen according to a threshold criterion. Selecting a smaller feature subset reduces model complexity and the risk of overfitting, and using fewer features speeds up model training and improves computational efficiency.
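Steps 127 to 129 can be illustrated with a short Python sketch. The text does not fix a scoring formula, so the one used here (absolute loading weighted by the explained variance ratio of each retained component) is an assumption, as are the feature names, the loadings, and the helper names.

```python
def rank_features(names, loadings, variance_ratios):
    """Rank original features by their variance-weighted absolute loadings.

    loadings[j][i] is the coefficient of feature i on principal component j;
    variance_ratios[j] is the share of variance explained by component j.
    """
    scores = [0.0] * len(names)
    for component, ratio in zip(loadings, variance_ratios):
        for i, coefficient in enumerate(component):
            scores[i] += ratio * abs(coefficient)
    return sorted(zip(names, scores), key=lambda item: item[1], reverse=True)


def select_subset(ranked, top_k):
    """Keep the names of the top_k highest-scoring features."""
    return [name for name, _ in ranked[:top_k]]


# Hypothetical feature names and loadings for two retained components.
names = ["packet_size", "duration", "dst_port", "flag_count"]
loadings = [
    [0.8, 0.6, 0.0, 0.0],  # PC1
    [0.0, 0.0, 0.6, 0.8],  # PC2
]
variance_ratios = [0.6, 0.25]

ranked = rank_features(names, loadings, variance_ratios)
subset = select_subset(ranked, top_k=2)
print(subset)
```

A threshold on the score could replace `top_k` where a fixed subset size is not wanted.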

In a preferred embodiment of the present invention, step 13 above includes:

Step 131, determining the input original network behavior feature time series data;

Step 132, using an autoencoder to encode the original network behavior feature time series data into a distribution over latent variables;

Step 133, sampling latent variables from the latent variable distribution output by the encoder, and using the autoencoder to decode the sampled latent variables into a candidate set of feature time series data;

Step 134, determining an original feature subset from the original feature set and, based on that subset, using the sampled latent variables and the decoder to generate a plurality of feature time series data similar to the specific feature subset;

Step 135, forming an extended feature time series data set from the original feature subset and the plurality of feature time series data similar to the specific feature subset.

In an embodiment of the present invention, step 131 involves determining and collecting the original network behavior feature time series data to be analyzed. Time series data is data arranged in chronological order, such as network traffic or user behavior records; analyzing it gives a better understanding of network behavior patterns and anomalies. In step 132, the autoencoder is an unsupervised neural network model consisting of an encoder and a decoder. The encoder compresses the input data into a low-dimensional latent variable representation, and the decoder attempts to reconstruct the original input from that representation. In this step, the original network behavior feature time series data is encoded into a distribution over latent variables. The latent representation usually has a lower dimension, which reduces the complexity of the data, and the encoder learns the intrinsic structure and important features of the data.

In step 133, latent variables are randomly sampled from the latent variable distribution output by the encoder, and the decoder decodes the sampled latent variables into a candidate set of feature time series data. This generates new data similar to the original data, which supports data augmentation, and sampling different latent variables makes it possible to explore different representations and latent structures of the data. In step 134, a subset of the original feature set is selected, and the previously sampled latent variables and the decoder are used to generate new feature time series data similar to that subset. Selecting a specific feature subset focuses the analysis on the most important features, and the generated data is used to expand the data set and strengthen model training. In step 135, the original feature subset and the generated feature time series data are merged into a larger, extended feature time series data set. The larger data set improves model training, and because it contains more variability and more patterns, it helps improve the generalization ability of the model.
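The data flow of steps 132 to 135 can be sketched as follows. This is only an outline under loud assumptions: the latent distribution is taken to be a diagonal Gaussian described by a mean and a log-variance (as in a variational autoencoder, which the text does not name), and `toy_encoder` and `toy_decoder` are hypothetical stand-ins for a trained model, not a real network.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Draw z = mean + sigma * eps with eps ~ N(0, 1), element by element."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def extend_dataset(windows, encoder, decoder, samples_per_window, seed=0):
    """Return the original windows followed by decoded samples drawn around each one."""
    rng = random.Random(seed)
    generated = []
    for window in windows:
        mean, log_var = encoder(window)              # step 132: encode to a distribution
        for _ in range(samples_per_window):
            z = sample_latent(mean, log_var, rng)    # step 133: sample a latent variable
            generated.append(decoder(z))             # steps 133-134: decode a candidate
    return list(windows) + generated                 # step 135: merge


# Stand-ins for a trained model (assumptions for illustration only): the
# "encoder" centres the latent distribution on the input with a small fixed
# variance, and the "decoder" is the identity map.
def toy_encoder(window):
    return list(window), [-4.0] * len(window)


def toy_decoder(z):
    return list(z)


originals = [[0.2, 0.4, 0.6], [0.9, 0.8, 0.7]]
extended = extend_dataset(originals, toy_encoder, toy_decoder, samples_per_window=3)
print(len(extended))
```

In practice the encoder and decoder would be the two halves of a trained network; passing them in as callables keeps the sampling and merging logic independent of the model library.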

In a preferred embodiment of the present invention, step 14 above includes:

Step 141, determining the number of decomposition levels of the wavelet transform according to the complexity of the data and the analysis requirements;

Step 142, performing the wavelet transform based on the network behavior feature time series data, the wavelet basis function, and the number of decomposition levels, to decompose the original data into wavelet coefficients;

Step 143, analyzing the wavelet coefficients to obtain behavior patterns at different time scales.

In an embodiment of the present invention, in step 141, the wavelet transform is a signal processing method with good localization in both time and frequency, and the number of decomposition levels determines the level of detail to which the signal is decomposed. More decomposition levels capture finer features of the data but may also increase computational complexity and noise. Choosing the number of levels according to the characteristics of the data adapts the analysis to the complexity of the data and avoids unnecessary over-decomposition, which improves computational efficiency. In step 142, the wavelet transform is applied to the network behavior feature time series data using the selected wavelet basis function (such as Haar) and the determined number of decomposition levels. The transform decomposes the original data into a series of wavelet coefficients that represent the characteristics of the data at different time scales and frequencies. It provides multi-scale analysis, so that global and local characteristics of the data can be observed together, and the wavelet coefficients allow the key information in the network behavior feature time series data to be extracted effectively. In step 143, the wavelet coefficients obtained from the transform are analyzed. Observing how the coefficients vary across scales reveals the patterns, trends, and anomalies of network behavior at different time scales. Abrupt changes or outliers in the wavelet coefficients may indicate abnormal events in network behavior, which helps detect and handle network problems promptly, and the trend of the coefficients allows the future trend of network behavior to be predicted to some extent.
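As an illustration of steps 141 to 143, the sketch below implements a multi-level Haar decomposition in plain Python. It assumes the orthonormal form of the Haar transform and a signal length divisible by 2 raised to the number of levels; the traffic values and function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Return (coarsest approximation, [detail at level 1, ..., detail at `levels`])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Hypothetical per-interval traffic volume with a single burst at index 4.
traffic = [3.0, 3.0, 3.0, 3.0, 9.0, 3.0, 3.0, 3.0]
approx, details = haar_decompose(traffic, levels=2)
print(details[0])
```

The level-1 detail coefficients are zero wherever neighbouring samples are equal and non-zero only at the pair containing the burst, which is the kind of abrupt coefficient change step 143 looks for. A library such as PyWavelets offers other wavelet bases.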

In a preferred embodiment of the present invention, step 15 above includes:

Step 151, using a density-based method to identify abnormal points in the time series data of each time scale separately, to obtain identification data;

Step 152, identifying the abnormal points at each time scale according to the identification data and a preset threshold.

In an embodiment of the present invention, in step 151, a density-based method (such as DBSCAN or the LOF algorithm) is applied to the network behavior feature time series data at each time scale to identify abnormal points. Density-based methods identify abnormal points by examining density differences between data points: points lying in low-density regions are considered abnormal. These methods adapt well to data sets of different distributions and shapes and require no prior assumption about the form of the data distribution; by considering the density relationships between data points, they can more accurately identify points that differ markedly from their neighbors. Processing the data at each time scale separately captures abnormal behavior patterns at different time scales. In step 152, the identification data obtained in step 151 (that is, the anomaly score or label of each data point) is compared with a preset threshold to decide which data points are abnormal. For example, an anomaly score threshold can be set, and data points scoring above it are considered abnormal. The threshold lets the sensitivity of anomaly identification be adjusted to actual needs, which makes the analysis more flexible, and it provides an explicit decision criterion, which makes the results clearer and easier to interpret. Together with the threshold decision, the method effectively detects abnormal points at each time scale, which helps discover and handle abnormal network behavior promptly.
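The scoring and thresholding of steps 151 and 152 can be sketched as below. Note the substitution: instead of DBSCAN or LOF, the sketch uses a simpler density proxy, the mean distance from each point to its k nearest neighbours, so that the example stays self-contained. The data, `k`, and the threshold are assumptions.

```python
def knn_distance_scores(values, k):
    """Score each point by its mean distance to its k nearest neighbours.

    A large score means the point lies in a sparse (low-density) region.
    """
    scores = []
    for i, value in enumerate(values):
        distances = sorted(abs(value - other)
                           for j, other in enumerate(values) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores


def flag_anomalies(scores, threshold):
    """Step 152: indices whose score exceeds the preset threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical coefficients at two time scales; each scale is scored separately.
scales = {
    "level_1": [1.0, 1.1, 0.9, 1.2, 1.0, 8.0],
    "level_2": [2.0, 2.1, 1.9, 2.0],
}
anomalies = {
    name: flag_anomalies(knn_distance_scores(series, k=2), threshold=1.0)
    for name, series in scales.items()
}
print(anomalies)
```

Raising or lowering `threshold` trades missed anomalies against false alarms, which is the sensitivity adjustment described for step 152. For real workloads, library implementations such as scikit-learn's `LocalOutlierFactor` or `DBSCAN` would normally replace the hand-written score.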

In a specific application, the method can be used to detect network intrusions in a cloud data center. A cloud data center is where an enterprise stores and processes critical business data, so its network security is essential. Network intrusion is one of the main threats a cloud data center faces and can lead to serious consequences such as data leakage, service interruption, and system paralysis. In this scenario, the network security detection method described above is used to detect network intrusion in the cloud data center.

The specific procedure is as follows:

First, the network traffic data of the cloud data center is collected. This data includes the network packet size, transmission time, source IP address, destination IP address, port number, and other information, and it constitutes the original network behavior feature time series data.
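One way to turn the collected packet records into a time series is to aggregate them over fixed intervals, as in the sketch below. The `Packet` fields mirror the attributes listed above, but the record layout, the interval length, and the three aggregated features are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    timestamp: float   # seconds since the start of capture
    size: int          # bytes
    src_ip: str
    dst_ip: str
    dst_port: int


def to_time_series(packets, interval):
    """Aggregate packets into per-interval feature rows, ordered by time.

    Intervals that contain no packets are skipped.
    """
    bins = defaultdict(list)
    for packet in packets:
        bins[int(packet.timestamp // interval)].append(packet)
    series = []
    for index in sorted(bins):
        group = bins[index]
        series.append({
            "interval": index,
            "packet_count": len(group),
            "total_bytes": sum(p.size for p in group),
            "distinct_dst_ports": len({p.dst_port for p in group}),
        })
    return series


packets = [
    Packet(0.2, 500, "10.0.0.1", "10.0.0.9", 443),
    Packet(0.7, 1500, "10.0.0.2", "10.0.0.9", 443),
    Packet(1.1, 60, "10.0.0.3", "10.0.0.9", 22),
    Packet(1.4, 60, "10.0.0.3", "10.0.0.9", 23),
]
series = to_time_series(packets, interval=1.0)
print(series)
```

Each row is one time step of the feature time series; further columns can be added in the same way.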

Because the raw data may contain noise, redundant information, and inconsistencies, it must be preprocessed. Preprocessing may include data cleaning (removing duplicate and invalid data), data normalization (scaling the data to a uniform range), and data smoothing (removing short-term fluctuations with methods such as a moving average filter). The preprocessed data serves as the input to the subsequent analysis.
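The three preprocessing operations can be sketched as follows. The choices made here are assumptions rather than requirements of the method: missing and negative readings are treated as invalid, normalization is min-max scaling to [0, 1], and the moving average is a trailing window that shrinks at the start of the series.

```python
def remove_invalid(values):
    """Data cleaning: drop missing (None) and negative readings."""
    return [v for v in values if v is not None and v >= 0]


def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def moving_average(values, window):
    """Trailing moving average; early points average over the points available so far."""
    smoothed = []
    for t in range(len(values)):
        start = max(0, t - window + 1)
        chunk = values[start:t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


raw = [2.0, None, 4.0, -1.0, 6.0, 8.0]
clean = remove_invalid(raw)                # [2.0, 4.0, 6.0, 8.0]
scaled = min_max_normalize(clean)          # values between 0 and 1
smooth = moving_average(clean, window=3)   # [2.0, 3.0, 4.0, 6.0]
print(clean, scaled, smooth)
```

A larger `window` removes more short-term fluctuation at the cost of blurring brief bursts, so the window size should be chosen with the shortest attack pattern of interest in mind.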

Next, the preprocessed data is processed using a feature extraction algorithm based on principal component analysis (PCA). PCA is a dimensionality reduction technique that transforms high-dimensional data into a low-dimensional representation while retaining the main patterns of variation in the data.

By computing the covariance matrix of the data and performing eigendecomposition, PCA extracts the principal components of the data and sorts them by their eigenvalues. The principal components with a high cumulative explained variance ratio are selected as the representative feature subset, which effectively represents the structure of the original data while removing redundant and weakly relevant features.

To strengthen the generalization ability and detection performance of the model, an autoencoder can be used to expand the representative feature subset and generate new data. An autoencoder is an unsupervised neural network model that learns an intrinsic representation of the data and can generate new data samples.

The autoencoder is trained with the representative feature subset as input, which it reconstructs through its encoder and decoder. In the encoding stage, the autoencoder compresses the input data into a latent variable representation; in the decoding stage, it restores the original data from the latent variables. In this way, multiple network behavior feature time series samples similar to the original feature subset can be generated.

Subsequently, the wavelet transform is used to perform multi-scale analysis on the generated network behavior feature time series data. The wavelet transform is a time-frequency analysis method that decomposes a signal into components at different frequencies and time scales.

An appropriate wavelet basis function and number of decomposition levels are selected, and the wavelet transform is applied to the generated feature time series data. The data is thereby decomposed onto different time scales, which captures characteristics of network traffic such as long-term trends, periodic variation, and bursts. Analyzing the wavelet coefficients reveals the patterns of network behavior at different time scales.

Finally, an anomaly detection algorithm is used to identify abnormal points in the network behavior feature time series data at the different time scales. The anomaly detection algorithm may be a statistical method, a machine learning algorithm, or a deep learning model.

By modeling and learning the behavior patterns of normal network traffic, the anomaly detection algorithm identifies abnormal behaviors that deviate from the normal pattern. These behaviors may indicate network intrusion, malicious traffic, or other security threats. When an abnormal point is detected, an alarm can be triggered promptly and corresponding security measures taken against the potential intrusion.

Applying the above network security detection method to network intrusion detection in a cloud data center makes it possible to identify and handle abnormal behavior in network traffic effectively. The combination of data preprocessing, feature extraction and ranking, feature expansion and generation, multi-scale analysis, and anomaly detection improves the network security and reliability of the cloud data center and protects critical business data from potential network threats.

As shown in FIG. 2, an embodiment of the present invention further provides a network security detection system 20, comprising:

an acquisition module 21, configured to preprocess the original network behavior feature time series data to obtain preprocessed data, and to rank the preprocessed data by feature importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset;

a processing module 22, configured to expand the feature subset and generate data using an autoencoder to obtain network behavior feature time series data; to decompose the network behavior feature time series data onto different time scales using a wavelet transform method; and to identify abnormal points in the time series data at the different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern.
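The division of system 20 into modules 21 and 22 could be mirrored in software roughly as follows. This is a structural sketch only: the class names, the use of plain callables for each stage, and the flat list of floats standing in for the feature data are all assumptions, and the lambdas in the example are trivial placeholders rather than real algorithms.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Series = List[float]


@dataclass
class AcquisitionModule:  # corresponds to module 21
    preprocess: Callable[[Series], Series]
    select_features: Callable[[Series], Series]

    def run(self, raw: Series) -> Series:
        return self.select_features(self.preprocess(raw))


@dataclass
class ProcessingModule:  # corresponds to module 22
    expand: Callable[[Series], Series]
    decompose: Callable[[Series], Sequence[Series]]
    detect: Callable[[Series], List[int]]

    def run(self, features: Series) -> List[List[int]]:
        scales = self.decompose(self.expand(features))
        return [self.detect(scale) for scale in scales]


@dataclass
class DetectionSystem:  # corresponds to system 20
    acquisition: AcquisitionModule
    processing: ProcessingModule

    def run(self, raw: Series) -> List[List[int]]:
        return self.processing.run(self.acquisition.run(raw))


# Placeholder stages (assumptions) wired together to show the call order.
system = DetectionSystem(
    AcquisitionModule(preprocess=lambda s: list(s),
                      select_features=lambda s: s),
    ProcessingModule(expand=lambda s: s,
                     decompose=lambda s: [s, s[::2]],
                     detect=lambda s: [i for i, v in enumerate(s) if v > 5]),
)
result = system.run([1.0, 9.0, 2.0])
print(result)
```

Keeping each stage behind a callable lets an individual algorithm, for example the anomaly detector, be swapped without touching the module boundaries.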

Optionally, preprocessing the original network behavior feature time series data to obtain preprocessed data includes:

Optionally, calculating the average of all data in the current window includes:

calculating the average of all data in the current window by $y[n] = \frac{1}{N}\sum_{i=0}^{N-1} w[i]\,x[n-i]$, where $y[n]$ is a point in the filtered data sequence, representing the moving average at position $n$ of the original data sequence; $N$ is the window size of the moving average filter, that is, the number of data points considered when computing the moving average of one point, and is a positive integer; $\sum$ is the summation symbol, indicating the sum of all terms from $i=0$ to $i=N-1$; $x[n-i]$ is a point in the original data sequence, representing the data value at position $n-i$; during the summation, $i$ varies from 0 to $N-1$; and $w[i]$ is an element of the weight vector, the weight of the data point at position $i$ in the window.

Optionally, ranking the preprocessed data by feature importance using a feature extraction algorithm based on principal component analysis to obtain a representative feature subset includes:

obtaining the preprocessed data set, which is a data matrix $X$ containing $n$ samples and $p$ features, where the data matrix $X$ is: $X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}$;

Optionally, expanding the feature subset and generating data using an autoencoder to obtain network behavior feature time series data includes:

Optionally, decomposing the network behavior feature time series data onto different time scales using a wavelet transform method includes:

analyzing the wavelet coefficients to obtain behavior patterns at different time scales.

Optionally, identifying abnormal points in the time series data at different time scales using an anomaly detection algorithm to obtain abnormal behaviors that deviate from the normal pattern includes:

It should be noted that this apparatus corresponds to the method described above, so all implementations in the method embodiments apply to this embodiment and achieve the same technical effect.

An embodiment of the present invention further provides a computing device, comprising a processor and a memory storing a computer program, wherein the computer program, when run by the processor, performs the method described above. All implementations in the method embodiments apply to this embodiment and achieve the same technical effect.

An embodiment of the present invention further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method described above. All implementations in the method embodiments apply to this embodiment and achieve the same technical effect.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more to a unit.

If the functions are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, or an optical disc.

In addition, it should be noted that in the apparatus and method of the present invention, the components or steps can evidently be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present invention. The steps of the above series of processes may naturally be performed in the chronological order described, but need not be; some steps may be performed in parallel or independently of one another. Those of ordinary skill in the art will understand that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including a processor, a storage medium, and the like) or network of computing devices, which those of ordinary skill in the art can accomplish with basic programming skills after reading the description of the present invention.

Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The object of the present invention can likewise be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product also constitutes the present invention, and so does a storage medium storing such a program product. The storage medium may be any known storage medium or any storage medium developed in the future.

The above describes preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the scope of protection of the present invention.