CN102457525A

Info

Publication number: CN102457525A
Application number: CN2011104246134A
Authority: CN
Inventors: 李继国; 刘杭州; 张亦辰
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2011-12-19
Filing date: 2011-12-19
Publication date: 2012-05-16

Abstract

The invention discloses a payload-based anomaly intrusion detection method in the technical field of computer network security. During anomaly intrusion detection, the method first partitions the payload of each packet under inspection into blocks using the CPP algorithm and then extracts features from only the first N blocks. This reduces the amount of data to be processed, increases detection speed, and makes the method better suited to monitoring high-speed networks. The invention further uses a multiple classifier system to build the normal communication profile, which improves detection accuracy. The invention also discloses a payload-based anomaly intrusion detection system comprising a CPP-based payload partitioning module, a feature extraction module, a detection module, and a response module. Compared with the prior art, the invention enables fast anomaly intrusion detection on high-speed networks.

Description

Translated from Chinese

A payload-based anomaly intrusion detection method and system

Technical field

The invention relates to an anomaly intrusion detection method, and in particular to a payload-based anomaly intrusion detection method and system. It belongs to the technical field of computer network security.

Background

In recent years, with the continuous development of computer technology and the continuous growth of networks, intrusions have become an increasingly serious threat to the security of computer systems and networks. An intrusion is a deliberate, unauthorized attempt to access or tamper with information, or to render a system unreliable or unusable. As intrusion methods become more diverse and more advanced, traditional static security technologies such as firewalls and data encryption can no longer meet the security requirements of systems and networks.

Intrusion detection, as an important dynamic security technology, makes up for the shortcomings of static security technologies. Intrusion detection techniques fall into two main categories: misuse intrusion detection and anomaly intrusion detection. Misuse intrusion detection detects intrusions by matching known attack patterns that exploit weaknesses in systems and application software. Because it relies on known system flaws and known intrusions, it detects known intrusions accurately but cannot detect attacks that are unknown to the system. Anomaly intrusion detection identifies intrusions from abnormal behavior and abnormal use of computer resources. It attempts to describe acceptable behavior quantitatively in order to distinguish abnormal, potentially intrusive behavior. This approach can detect unknown intrusions, but its accuracy is limited because the described acceptable behavior may deviate considerably from actual behavior.

Studies have shown that an excessively high false alarm rate is the real limiting factor in anomaly intrusion detection. A payload-based anomaly intrusion detection system can accurately detect network attacks that carry malicious data in the packet payload. Using the packet payload for anomaly detection, however, usually runs into one problem: the payload is sometimes very large, for example for packets on ports 21 and 80. If 100% of the payload is used for modeling, the resulting anomaly intrusion detection system is hard to apply to monitoring high-speed networks.

Payload-based anomaly intrusion detection is a new intrusion detection approach developed in recent years, and it has made some progress. Wang and Stolfo proposed PAYL, a payload-based network anomaly intrusion detection system. PAYL computes the frequency of occurrence of n-grams in the payload (an n-gram is a sequence of n consecutive bytes; for n = 1 it is a single byte) and uses these frequencies as features to build a normal communication profile for each distinct packet length. The normal communication profile of PAYL consists of the mean and standard deviation of the n-gram frequencies. During detection, if the simplified Mahalanobis distance of the packet under inspection exceeds a given threshold, the packet is judged to be anomalous. PAYL can effectively detect a variety of attacks.

Perdisci, Lee et al. proposed McPAD, a scheme that uses a multiple classifier system (MCS) to improve the detection rate of payload-based anomaly detection. McPAD builds the normal communication profile from several one-class classifiers in order to improve detection accuracy. During detection, feature extraction yields descriptions of the same packet in different feature spaces; each description is fed to the corresponding one-class classifier that represents the normal communication profile in that feature space, and the outputs of the one-class classifiers are then combined to reach a final decision on whether the packet is anomalous. Experimental results show that McPAD achieves a high detection rate at a low false alarm rate when detecting network attacks that carry malicious data in the packet payload, and that it also achieves a fairly high detection rate at a relatively low false alarm rate against advanced attacks such as polymorphic blending attacks.

Zhang et al. proposed using a noise against fuzzy support vector machine to improve PAYL and McPAD. Their work mainly addresses the low accuracy of McPAD and similar systems in detecting polymorphic blending attacks, and obtains better detection results with the help of this support vector machine.

However, when monitoring high-speed, high-bandwidth networks, the payload-based anomaly detection systems above cannot detect effectively if packet payloads are large.
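As a rough illustration of the PAYL idea described above, the following sketch builds a 1-gram (single byte) profile and scores a payload with a simplified Mahalanobis distance. It is a minimal sketch, not PAYL itself: it ignores the per-length modeling, and the function names and the smoothing constant `alpha` are assumptions introduced here.

```python
from collections import Counter


def byte_frequencies(payload):
    """Relative frequency of each of the 256 byte values (1-gram features)."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid division by zero for empty payloads
    return [counts.get(b, 0) / total for b in range(256)]


def train_profile(payloads):
    """Per-feature mean and standard deviation over normal training payloads."""
    feats = [byte_frequencies(p) for p in payloads]
    n = len(feats)
    mean = [sum(f[i] for f in feats) / n for i in range(256)]
    std = [(sum((f[i] - mean[i]) ** 2 for f in feats) / n) ** 0.5
           for i in range(256)]
    return mean, std


def simplified_mahalanobis(x, mean, std, alpha=0.001):
    """Sum of absolute deviations, each scaled by (std + alpha).

    alpha is an assumed smoothing constant that keeps the division defined
    when a feature never varies in the training data.
    """
    return sum(abs(xi - mi) / (si + alpha) for xi, mi, si in zip(x, mean, std))
```

A packet would be flagged when its distance exceeds a threshold chosen on normal traffic; the threshold value is not given in the source.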

Summary of the invention

The technical problem to be solved by the invention is that existing payload-based anomaly intrusion detection methods have difficulty inspecting packets quickly on high-speed networks. The invention provides a payload-based anomaly intrusion detection method that inspects packets on high-speed networks quickly while maintaining detection accuracy.

The invention adopts the following technical solution to solve this problem:

A payload-based anomaly intrusion detection method, comprising the following steps:

Step A: train a normal communication profile in advance;

Step B: extract features from the packet under inspection;

Step C: using the normal communication profile and the features of the packet under inspection, determine whether the packet is anomalous;

When training the normal communication profile, the training packets are first partitioned into blocks using the CPP algorithm; features are then extracted from only the first N blocks; the normal communication profile is then trained on the extracted feature samples;

Before features are extracted from the packet under inspection, the packet is first partitioned into blocks using the CPP algorithm, and only the first N blocks are used for feature extraction;

where N is an integer smaller than the total number of blocks in the packet.

Further, the feature extraction uses the 2v-gram method: for a set of varying integer values v, the frequency of occurrence of pairs of bytes that are v bytes apart in the packet payload is computed, which yields features of the packet in several feature spaces, one feature space per value of v. The normal communication profile consists of several one-class classifiers in one-to-one correspondence with the feature spaces, and each one-class classifier is trained in its corresponding feature space.

Further still, each one-class classifier is trained as follows: the feature samples extracted in the feature space corresponding to the classifier are first clustered; within each cluster, the feature samples close to the cluster center are then selected; the selected feature samples are used as the training set for the one-class classifier.

Preferably, the feature samples close to the cluster center are selected in each cluster as follows: determine whether the number of samples in the cluster exceeds a preset threshold; if so, select a larger preset number of the samples closest to the center; if not, select a smaller preset number of the samples closest to the center. Both numbers are preset integers, and the first is greater than the second.

The invention also provides a payload-based anomaly intrusion detection system, comprising:

A CPP-based payload partitioning module, which partitions the packet under inspection into blocks using the CPP algorithm and passes the first N blocks to the feature extraction module, where N is an integer smaller than the total number of blocks in the packet;

A feature extraction module, which extracts features from the first N blocks of the partitioned packet and sends the extracted features to the detection module. The feature extraction uses the 2v-gram method: for a set of varying integer values v, the frequency of occurrence of pairs of bytes that are v bytes apart in the packet payload is computed, which yields features of the packet in several feature spaces, one feature space per value of v;

A detection module, which classifies the packet under inspection using the features from the feature extraction module and the normal communication profile trained in advance. If the packet is classified as anomalous, it is sent to the response module; otherwise the next packet is inspected. The normal communication profile consists of several one-class classifiers in one-to-one correspondence with the feature spaces, each trained in its corresponding feature space. Each one-class classifier is trained as follows: the feature samples extracted in its feature space are first clustered; within each cluster, the feature samples close to the cluster center are then selected; the selected feature samples are used as the training set for the classifier;

A response module, which responds to packets that the detection module judges to be anomalous by recording the relevant packet information and raising an alarm.

Compared with the prior art, the invention has the following beneficial effects:

Because the invention partitions packets into blocks with the CPP algorithm and uses only part of the payload for detection, it reduces the amount of data to be processed, increases detection speed, and is better suited to monitoring high-speed networks. In addition, because a multiple classifier system is used to build the normal communication profile, detection accuracy is improved.

Description of the drawings

Figure 1 is a schematic structural diagram of the anomaly intrusion detection system of the invention;

Figure 2 is a flowchart of the CPP algorithm;

Figure 3 is a schematic diagram of how the normal communication profile is built in the invention;

Figure 4 is a flowchart of the improved ISUC algorithm.

Detailed description

The technical solution of the invention is described in detail below with reference to the drawings:

The payload-based anomaly intrusion detection system of the invention, as shown in Figure 1, includes:

A feature extraction module, which extracts features from the first N blocks of the partitioned packet and sends the extracted features to the detection module. The feature extraction uses the 2v-gram method: for a set of varying integer values v, the frequency of occurrence of pairs of bytes that are v bytes apart in the packet payload is computed, which yields features of the packet in several feature spaces, one feature space per value of v;

The anomaly detection method of the invention is further described below in connection with this system.

CPP-based payload partitioning module: the purpose of partitioning is to reduce the amount of data processed in the feature extraction stage. Once the payload has been partitioned, features can be extracted from only some of the blocks. The invention uses the CPP algorithm to partition packets. The CPP algorithm is prior art; for details see (Athicha Muthitacharoen, Benjie Chen and David Mazieres. A low-bandwidth network file system. Symposium on Operating Systems Principles, 2001, 174-187.). Its flow is shown in Figure 2. CPP determines block boundaries from the content of the payload and uses Rabin fingerprinting to decide where a block ends. Over a sliding window of $\beta$ bytes, CPP computes a series of Rabin fingerprints $RF$. It starts with the first $\beta$ bytes of the payload and then slides one byte at a time toward the end of the payload to compute the subsequent fingerprints. When the value of $RF$ matches the preset stopping criterion, the current block ends and computation of the next block begins. The process can be described as follows. Suppose there is a byte sequence $t_1, t_2, \ldots, t_n$. For a subsequence $[t_i, t_{i+1}, \ldots, t_{i+\beta-1}]$ of length $\beta$, its Rabin fingerprint can be computed by formula (1):

$$RF(t_i, t_{i+1}, \ldots, t_{i+\beta-1}) = \left(t_i\,p^{\beta-1} + t_{i+1}\,p^{\beta-2} + \cdots + t_{i+\beta-1}\right) \bmod M \qquad (1)$$

Here $p$ and $M$ are constants. The sliding window length $\beta$ has to be tuned experimentally; in the method of the invention, $\beta = 32$ gives better experimental results. When the fingerprint value lies between 550 and 600 (the chosen stopping criterion), the current block ends and a new block begins; otherwise, the current byte is added to the current block and the window slides forward by one byte to compute a new Rabin fingerprint.
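The block-boundary procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it uses a polynomial rolling hash in the spirit of Rabin fingerprinting, the constants `p` and `m` are hypothetical choices, and the window keeps sliding across block boundaries. The window length of 32 and the 550 to 600 stopping range come from the text; `first_n_blocks` is a hypothetical helper name.

```python
def fingerprint(block, p=257, m=8191):
    """Polynomial fingerprint of a byte block, reduced modulo m (assumed constants)."""
    fp = 0
    for b in block:
        fp = (fp * p + b) % m
    return fp


def cpp_chunks(payload, window=32, p=257, m=8191, lo=550, hi=600):
    """Split payload wherever the fingerprint of the last `window` bytes is in [lo, hi]."""
    if len(payload) < window:
        return [payload] if payload else []
    top = pow(p, window - 1, m)          # weight of the byte that leaves the window
    fp = fingerprint(payload[:window], p, m)
    chunks, start, end = [], 0, window   # `end` is the exclusive end of the window
    while True:
        if lo <= fp <= hi:               # stopping criterion met: close the block
            chunks.append(payload[start:end])
            start = end
        if end == len(payload):
            break
        # Slide one byte: drop payload[end - window], append payload[end].
        fp = ((fp - payload[end - window] * top) * p + payload[end]) % m
        end += 1
    if start < len(payload):             # trailing bytes form the last block
        chunks.append(payload[start:])
    return chunks


def first_n_blocks(payload, n, **kwargs):
    """Concatenate only the first n blocks, as input to feature extraction."""
    return b"".join(cpp_chunks(payload, **kwargs)[:n])
```

Because boundaries depend only on content, the same byte pattern yields the same block boundary regardless of where it appears in the payload.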

Feature extraction module: once the first N blocks of a packet have been obtained from the CPP-based payload partitioning module, features are extracted using the 2v-gram method. The 2v-gram method computes the frequency of occurrence of pairs of bytes that are $v$ ($v$ = 0, 1, 2, …) bytes apart in the payload. The 2v-gram distributions obtained for different values of $v$ give different structural information about a packet. By combining the information extracted with different values of $v$, one can reconstruct (or partially reconstruct) the information that would be extracted by applying the n-gram technique directly with $n > 2$. For a fixed $v$, the 2v-gram method counts byte pairs using a sliding window of length $v + 2$, but it ignores the values of the $v$ bytes between the first and last byte of the window and treats them as blanks. Suppose there is a packet payload $B = [b_1, b_2, \ldots, b_l]$, where $b_i$ is the byte value at position $i$ in $B$. The frequency of occurrence of an n-gram $g = [g_1, g_2, \ldots, g_n]$ in $B$ can then be computed by formula (2):

$$f(g \mid B) = \frac{\text{number of occurrences of } g \text{ in } B}{l - n + 1} \qquad (2)$$

The number of occurrences of $g$ in $B$ is counted with a sliding window of length $n$; $l - n + 1$ is the total number of window positions over $B$. $f(g \mid B)$ can be regarded as an estimate of the probability $p(g \mid B)$ of finding $g$ in $B$. The frequency of occurrence of a 2v-gram can then be computed by formula (3):

$$f(\{b_i, b_{i+v+1}\} \mid B) = \sum_{b_{i+1}, \ldots, b_{i+v}} f([b_i, b_{i+1}, \ldots, b_{i+v}, b_{i+v+1}] \mid B) \qquad (3)$$

Formula (3) can be read as follows: the frequency of occurrence of the byte pair $\{b_i, b_{i+v+1}\}$ whose members are $v$ bytes apart in the payload is the sum of the frequencies of all n-grams (with $n = v + 2$) that start with $b_i$ and end with $b_{i+v+1}$. Different values of $v$ give descriptions of a packet in different feature spaces. If $v$ takes $m$ different values, the packet is described in $m$ feature spaces, that is, features are obtained in $m$ feature spaces.
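The pair-frequency computation described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, and features are returned as a sparse dictionary keyed by byte pair rather than as a fixed 65,536-dimensional vector.

```python
from collections import Counter


def two_v_gram(payload, v):
    """Relative frequency of byte pairs (payload[i], payload[i + v + 1]).

    The window has length v + 2; the v bytes in the middle are ignored.
    """
    span = v + 1
    slides = len(payload) - span   # number of window positions: l - (v + 2) + 1
    if slides <= 0:
        return {}
    counts = Counter((payload[i], payload[i + span]) for i in range(slides))
    return {pair: c / slides for pair, c in counts.items()}


def multi_space_features(payload, v_values):
    """One feature description per value of v, i.e. one per feature space."""
    return {v: two_v_gram(payload, v) for v in v_values}
```

For the payload `b"abab"`, `v = 0` yields the adjacent pairs (a, b), (b, a), (a, b), while `v = 1` skips one byte and yields (a, a) and (b, b).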

Detection module: using the features from the feature extraction module and the normal communication profile trained in advance, this module classifies the packet under inspection. If the packet is classified as anomalous, it is sent to the response module; otherwise the next packet is inspected. The normal communication profile of the invention uses a multiple classifier system. A multiple classifier system gains accuracy when the classifiers it combines are diverse. One way to obtain diversity is to base each classifier on a description of the pattern in a different feature space. In the invention, the descriptions of a packet in different feature spaces (that is, feature extraction) are obtained with the 2v-gram method. After feature extraction with 2v-gram, the different values of $v$ (suppose there are $m$ different values) give $m$ different feature spaces describing the same packet. A one-class classifier is trained in each feature space; this embodiment uses a support vector machine (SVM). This gives $m$ normal communication profiles described in different feature spaces; the principle is shown in Figure 3. Each one-class classifier in the multiple classifier system is trained with the improved ISUC algorithm. The invention improves the ISUC algorithm (see Li Xiaoli, Liu Jimin, Shi Zhongzhi. Chinese web page classifier based on the combination of support vector machine and unsupervised clustering. Chinese Journal of Computers, 2001, 24(1):62-68.) in two main ways. (1) Detection no longer uses two classifiers; only a one-class SVM is used to build the normal communication profile. A high false alarm rate is unacceptable in anomaly intrusion detection, so the UC classification, which has lower classification accuracy, is dropped. (2) The training samples are clustered with a clustering algorithm such as k-means, CURE, or fuzzy K-means (this embodiment uses the UC algorithm), and the cluster centers are then used to select training samples: within each cluster, the samples close to the cluster center are selected to train the one-class SVM. The selection rule is as follows:

Samples close to the cluster center are selected for training. Cluster size is also taken into account, and the number of samples selected from each cluster is adjusted accordingly: more samples are selected from larger clusters and fewer from smaller ones. Specifically, the feature samples close to the cluster center are selected in each cluster as follows: determine whether the number of samples in the cluster exceeds a preset threshold; if so, select a larger preset number of the samples closest to the center; if not, select a smaller preset number of the samples closest to the center. Both numbers are preset integers, and the first is greater than the second. A simpler alternative is to ignore cluster size and select the same number of samples closest to the center from every cluster. As shown in Figure 4, the improved ISUC algorithm of the invention proceeds as follows:

Step 1. Initialize the selected sample set to empty and start with the first cluster.

Step 2. If the index of the current cluster exceeds the number of cluster centers obtained by clustering, go to Step 6.

Step 3. In the current cluster, find all samples close to the cluster center. If the number of samples in the cluster is greater than the specified threshold for a large cluster, apply the closeness criterion for large clusters; otherwise, apply the criterion for small clusters.

Step 4. Add the samples found in Step 3 to the selected sample set.

Step 5. Move on to the next cluster and go to Step 2.

Step 6. Train the final one-class SVM on the selected sample set.
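The sample selection loop of the steps above can be sketched as follows, assuming clustering has already produced a cluster label for each sample and a list of cluster centers. All parameter names (`size_threshold`, `k_large`, `k_small`) are hypothetical stand-ins for the preset values whose symbols are not recoverable from the text, and Euclidean distance is an assumed closeness measure.

```python
import math


def select_training_samples(samples, labels, centers,
                            size_threshold, k_large, k_small):
    """Pick the samples nearest each cluster center for one-class training.

    Clusters with more than `size_threshold` members contribute `k_large`
    samples; smaller clusters contribute `k_small` (with k_large > k_small).
    """
    selected = []
    for cluster_id, center in enumerate(centers):
        # Gather the members of this cluster and order them by distance.
        members = [s for s, lab in zip(samples, labels) if lab == cluster_id]
        members.sort(key=lambda s: math.dist(s, center))
        keep = k_large if len(members) > size_threshold else k_small
        selected.extend(members[:keep])
    return selected
```

The returned list would then serve as the training set for a one-class SVM in the corresponding feature space; any one-class SVM implementation could be used for that final step.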

The detection process is as follows: the features extracted in the feature space for each value of $v$ are passed to the corresponding one-class SVM in the normal communication profile (that is, the normal communication profile trained in that feature space) for classification. The classification results of the one-class classifiers are then combined to reach the final decision on whether the packet is anomalous.
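The text does not state which rule combines the classifier outputs, so the sketch below uses a simple majority vote purely as an assumed example. The classifiers are modeled as plain callables that return `True` for "anomalous"; both function names are hypothetical.

```python
def fuse_majority(votes):
    """Return True (anomalous) if more than half of the classifiers say so."""
    flagged = sum(1 for is_anomalous in votes.values() if is_anomalous)
    return flagged * 2 > len(votes)


def classify_packet(features_by_v, classifiers_by_v):
    """Send each feature description to the classifier for the same v, then fuse."""
    votes = {v: classifiers_by_v[v](feats) for v, feats in features_by_v.items()}
    return fuse_majority(votes)
```

Other fusion rules, such as averaging classifier scores, would fit the same structure by replacing `fuse_majority`.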

Response module: responds to packets that the detection module judges to be anomalous by recording the relevant packet information and raising an alarm.

Claims

1. A payload-based anomaly intrusion detection method, comprising the following steps: Step A, train a normal communication profile in advance; Step B, extract features from the packet under inspection; Step C, using the normal communication profile and the features of the packet under inspection, determine whether the packet is anomalous; characterized in that,

2. The payload-based anomaly intrusion detection method of claim 1, characterized in that the feature extraction uses the 2v-gram method: for a set of varying integer values v, the frequency of occurrence of pairs of bytes that are v bytes apart in the packet payload is computed, which yields features of the packet in several feature spaces, one feature space per value of v; the normal communication profile consists of several one-class classifiers in one-to-one correspondence with the feature spaces, and each one-class classifier is trained in its corresponding feature space.

3. The payload-based anomaly intrusion detection method of claim 2, characterized in that each one-class classifier is trained as follows: the feature samples extracted in the feature space corresponding to the classifier are first clustered; within each cluster, the feature samples close to the cluster center are then selected; the selected feature samples are used as the training set for the one-class classifier.

4. The payload-based anomaly intrusion detection method of claim 3, characterized in that the feature samples close to the cluster center are selected in each cluster as follows: determine whether the number of samples in the cluster exceeds a preset threshold; if so, select a larger preset number of the samples closest to the center; if not, select a smaller preset number of the samples closest to the center; both numbers are preset integers, and the first is greater than the second.

5. The payload-based anomaly intrusion detection method of any one of claims 1 to 4, characterized in that, when packets are partitioned with the CPP algorithm, the sliding window length is 32.

6. A payload-based anomaly intrusion detection system, characterized in that the system comprises: