CN103490992B

Info

Publication number: CN103490992B
Application number: CN201310470865.XA
Authority: CN
Inventors: 郭薇; 周翰逊; 张国栋; 贾大宇
Original assignee: Shenyang Aerospace University
Current assignee: Shanghai Taiyu Information Technology Co ltd; Shenzhen Pengbo Information Technology Co ltd
Priority date: 2013-10-10
Filing date: 2013-10-10
Publication date: 2016-10-19
Anticipated expiration: 2033-10-10
Also published as: CN103490992A

Abstract

The invention relates to the field of information security technology, and specifically to a detection method for instant messaging worms. The method has two steps. First, in the learning stage, feature functions are used to distinguish the behavior of ordinary users from the behavior of instant messaging worms. Then, in the detection stage, the similarity between the current network traffic and the learned data is computed with a simple Mahalanobis distance. To make the detection mechanism insensitive to site access patterns, the similarity is processed with a non-parametric CUSUM algorithm, and an alarm is generated when the distance of new network traffic exceeds the allowed distance set by the algorithm.

Description

Translated from Chinese

Instant Messaging Worm Detection Method

Technical Field

The invention relates to the technical field of information security, and in particular to a method for detecting instant messaging worms.

Background

Instant messaging (IM) services are very popular: as a means of real-time communication they have tens of millions of users across the Internet. Popular systems such as Microsoft's MSN Messenger (Windows Messenger in Windows XP), Yahoo Messenger (YIM), AOL Instant Messenger (AIM), and Tencent QQ have changed the way we communicate with friends, acquaintances, and business colleagues. However, vulnerabilities in instant messaging clients pose a serious security challenge.

Instant messaging worms are a security problem that spreads widely through instant messaging networks by exploiting vulnerabilities in IM clients and protocols, as well as the instant messaging service itself. When an instant messaging worm runs, it usually resides in the IM client and tries to send itself to all of the infected user's friends. Some worms use a common engine to send messages that trick recipients into running the copy of the worm they receive. Some IM worms can even exchange text messages with recipients and analyze their replies. Many IM worms exist, for example Chock, SoFunny, and JS Menger.

IM worms differ from conventional scanning worms and e-mail worms. Although researchers have worked hard to understand and contain the propagation of scanning worms and e-mail worms, that research does not transfer well to IM worms because the infection mechanisms differ. M. Williamson et al. applied throttling techniques to instant messaging worms to slow their spread. However, that approach may delay legitimate communication and is too restrictive for IM users, for example allowing only one new contact per day.

Summary of the Invention

In view of the above deficiencies in the prior art, the technical problem to be solved by the invention is to provide a method for detecting instant messaging worms.

The invention adopts the following technical solution:

A method for detecting instant messaging worms, used on a communication server, comprising the following steps:

1) In the learning stage, the behavior characteristics of worms on the network are analyzed from worm-infected network data, the behavior data of normal users are analyzed with feature functions, and the results are stored in a database;

2) In the detection stage, the detection module receives the new data passing through the gateway and uses the simple Mahalanobis distance to compare their similarity with the feature-function data stored in the database in step 1), and thereby determines whether the new data are infected by a worm.

Further, the simple Mahalanobis distance is computed as:

$d(x, \bar{y}) = \sum_{i=0}^{m-1} \frac{\left( (x_i - \bar{y}_i)^{+} \right)^{2}}{\sigma_i^{2}} \qquad (6)$

where $d(x, \bar{y})$ is the simple Mahalanobis distance, $m$ is the number of feature functions, $x_i$ is the i-th feature value of the new data, $y_i$ is the i-th feature value of the learning-stage data, $\bar{y}_i$ is the i-th mean feature value from the learning stage, $x$ is the feature vector of the new data, $\bar{y}$ is the mean feature vector from the learning stage, and $\sigma_i^2$ is the variance of the i-th feature value. The simple Mahalanobis distance of the new data is computed, and $\{X_n, n = 1, 2, 3, \dots\}$ denotes the resulting sequence of simple Mahalanobis distances, where $n$ denotes the time interval. The larger the simple Mahalanobis distance, the greater the probability of worm infection.
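
As an illustration of equation (6), the following is a minimal Python sketch, not the patented implementation. The function name `simple_mahalanobis` and the sample statistics are assumptions made for the example; only the formula itself comes from the text above.

```python
from typing import Sequence


def simple_mahalanobis(x: Sequence[float],
                       mean: Sequence[float],
                       var: Sequence[float]) -> float:
    """One-sided simple Mahalanobis distance, following equation (6)."""
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    total = 0.0
    for x_i, y_bar_i, var_i in zip(x, mean, var):
        # Positive part (x_i - y_bar_i)^+ : values below the mean contribute 0.
        excess = max(x_i - y_bar_i, 0.0)
        total += excess * excess / var_i
    return total


# Hypothetical learned statistics for m = 3 features (illustrative values only).
mean = [1.0, 1.0, 2.0]
var = [0.5, 0.25, 1.0]

quiet = simple_mahalanobis([0.0, 0.0, 1.0], mean, var)  # all below the mean
busy = simple_mahalanobis([3.0, 1.0, 5.0], mean, var)   # two features above it
print(quiet, busy)
```

Because of the positive part, a user who is less active than average yields a distance of zero, so only above-average activity moves the distance away from the baseline.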

Further, a non-parametric CUSUM algorithm is used to make the detection algorithm insensitive to site access patterns. First, without losing any statistical feature, $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as follows:

$Z_n = X_n - \beta \qquad (11)$

The parameter $\beta$ is a constant that, for the given network conditions, helps produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ that takes negative values. The recursion is as follows:

$y_n = (y_{n-1} + Z_n)^{+}$

$y_0 = 0 \qquad (12)$

where $(y_{n-1} + Z_n)^{+}$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;

$y_n = S_n - \min_{1 < k < n} S_k \qquad (13)$

where $S_n$ denotes the cumulative sum of the sequence $\{Z_n\}$, with initial value $S_0 = 0$;

The decision function is then expressed as:

$d_N(y_n) = \begin{cases} 0, & y_n \leq N; \\ 1, & y_n > N. \end{cases} \qquad (14)$

where $N$ is the worm detection threshold and $d_N(y_n)$ is the decision at time $n$. If the test statistic $y_n$ is greater than $N$, then $d_N(y_n)$ is 1, indicating that an attack is taking place; otherwise $d_N(y_n)$ is 0, indicating normal operation.

Further, to compute the simple Mahalanobis distance, the statistics are updated by incremental learning so that they remain correct. Let $e_i$ be a feature value of the i-th sample, and maintain three variables $(E, \omega, n)$, where $E$ is the running mean, $\omega$ is the running sum of squares, and $n$ is the historical sample length. When a new sample is observed, the three variables are updated as in equations (7), (8) and (9):

$E = E + \frac{e_{n+1} - E}{n + 1} \qquad (7)$

$\omega = \omega + e_{n+1}^{2} \qquad (8)$

$n = n + 1 \qquad (9)$

The sample variance is computed as in equation (10):

$\sigma^{2} = \frac{\omega - n \cdot E^{2}}{n - 1} \qquad (10)$
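
The update rules (7) to (10) map directly onto a small running-statistics object. The sketch below is illustrative only; the class name `RunningStats` and its attribute names are assumptions, and one such object would be kept per feature function.

```python
class RunningStats:
    """Incremental mean and sample variance, following equations (7)-(10)."""

    def __init__(self) -> None:
        self.mean = 0.0    # E
        self.sq_sum = 0.0  # omega, the running sum of squares
        self.n = 0         # historical sample length

    def update(self, e: float) -> None:
        self.mean += (e - self.mean) / (self.n + 1)  # equation (7)
        self.sq_sum += e * e                         # equation (8)
        self.n += 1                                  # equation (9)

    def variance(self) -> float:
        if self.n < 2:
            raise ValueError("at least two samples are needed")
        # equation (10)
        return (self.sq_sum - self.n * self.mean ** 2) / (self.n - 1)


stats = RunningStats()
for value in [1.0, 2.0, 3.0, 4.0]:
    stats.update(value)
print(stats.mean, stats.variance())
```

Only three numbers are stored per feature, so the baseline can be refreshed as traffic arrives without keeping the full sample history.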

Further, the feature functions are as follows. Feature function URL():

$URL() = \begin{cases} \max_{\forall URL \in U} \{ Count(URL) \}, & U \neq \Phi \\ 0, & U = \Phi \end{cases} \qquad (1)$

where $U$ is the set of URLs sent by the user;

Feature function Filereq():

$Filereq() = \begin{cases} \max_{\forall a \in A} \{ Count(a) \}, & A \neq \Phi \\ 0, & A = \Phi \end{cases} \qquad (2)$

where $A$ is the set of file sizes sent by the user;

Feature function IPAddr():

IPAddr() = number of distinct IP addresses (3).

The invention has the following advantages and beneficial effects:

In the learning stage, the invention first uses feature functions to distinguish the behavior of ordinary users from the behavior of instant messaging worms. Network worms are then detected with the simple Mahalanobis distance. To make the detection mechanism insensitive to site access patterns, a non-parametric CUSUM algorithm is used, and an alarm is generated when the distance of new data exceeds the allowed distance set by the algorithm. Data collected from a university instant messaging server demonstrate the effectiveness of the method.

A device using the invention was installed in a gateway, a machine based on a 1 GHz Pentium III. For every 10 seconds of the data set, the CPU time required by the data-processing part was recorded. In 99% of the samples, 10 seconds of packets were processed in less than 2 seconds of CPU time. Moreover, the longest time needed to process any ten-second sample was less than four seconds of CPU time. For all samples the service rate exceeded the traffic arrival rate. This indicates that the method can keep up in real time with 10-second traffic bursts on a large network.

Description of the Drawings

Figure 1 shows the simulated IM worm spreading by sending URLs in text messages: (a) the changes in the feature functions; (b) the changes in the test statistic after the IM worm is introduced;

Figure 2 shows the simulated IM worm spreading by sending files: (a) the changes in the feature functions; (b) the changes in the test statistic $y_n$ after the IM worm is introduced.

Detailed Description

The invention is described in detail below with reference to the drawings and an embodiment:

A method for detecting instant messaging worms is used on a communication server. The detection device that carries out the method is placed on the gateway of the communication server and inspects the data passing through the gateway. The method comprises the following steps:

Step 1) In the learning stage, the behavior characteristics of worms on the network are analyzed from worm-infected network data and stored in a database;

A typical user uses an instant messaging system for work or entertainment, chatting with other people about daily life. This may seem unremarkable, but it reveals an important property: within a given period, a user is likely to communicate with only a few people. In contrast, an instant messaging worm tries to spread as widely as possible, usually by sending a file or the URL of a website that hosts the worm code. Instant messaging worm behavior can therefore be distinguished from normal behavior. Once the worm code is loaded, the IM worm sends a text message containing a malicious URL to many different users, so it can be inferred that the rate at which that URL is sent will increase. Define the function Count(x) as the number of distinct users with whom one user communicates using the same value x. For example, if a user sends www.google.com to four different friends in the contact list, then Count(www.google.com) equals four. To capture this property, the feature function URL() is defined as in equation (1).

Here U is the set of URLs sent by the user.
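
The Count(x) and URL() definitions can be made concrete with a short Python sketch. It assumes, purely for illustration, that the messages one user sent in a time window are available as `(recipient, url)` pairs; the names `count_by_value` and `url_feature` and the recipient names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_by_value(messages: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count(x): number of distinct recipients that were sent the value x."""
    recipients: Dict[str, Set[str]] = defaultdict(set)
    for recipient, value in messages:
        recipients[value].add(recipient)
    return {value: len(users) for value, users in recipients.items()}


def url_feature(messages: Iterable[Tuple[str, str]]) -> int:
    """URL(): the largest Count over all URLs sent, or 0 if none were sent."""
    counts = count_by_value(messages)
    return max(counts.values()) if counts else 0


sent = [
    ("alice", "www.google.com"),
    ("bob", "www.google.com"),
    ("carol", "www.google.com"),
    ("dave", "www.google.com"),
    ("alice", "www.example.org"),
]
counts = count_by_value(sent)
print(counts["www.google.com"], url_feature(sent), url_feature([]))
```

Using a set per URL means that sending the same link to the same friend twice is counted once, which matches the "different users" wording of the definition.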

Another fairly common sign of infection is that the files sent by victims are identical in size and content. In fact, these files are the instant messaging worm itself. To capture this property, a feature function for file transfer requests is defined as in equation (2).

Here A is the set of file sizes sent by the user.

Several friends communicate with a user within a given period. When users run instant messaging software, they choose which friend or friends in the contact list to talk to. A worm, however, tries to spread as quickly as possible, so it may contact a large number of friends in the contact list, which deviates from normal user behavior. An IP address can represent a friend in the contact list, and the feature function IPAddr() is defined to capture this property as in equation (3).

IPAddr() = number of distinct IP addresses (3)
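
The two remaining feature functions can be sketched in the same way. The sketch assumes file transfer requests are logged as `(recipient_ip, file_size)` pairs and keys Filereq() on file size alone, which simplifies the "same size and content" observation above; the function names and the addresses are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def filereq_feature(file_requests: Iterable[Tuple[str, int]]) -> int:
    """Filereq(): largest number of distinct recipients sent a file of one size."""
    by_size: Dict[int, Set[str]] = defaultdict(set)
    for recipient_ip, size in file_requests:
        by_size[size].add(recipient_ip)
    # default=0 covers the empty-set case of equation (2).
    return max((len(ips) for ips in by_size.values()), default=0)


def ipaddr_feature(contacted_ips: Iterable[str]) -> int:
    """IPAddr(): number of distinct IP addresses contacted, equation (3)."""
    return len(set(contacted_ips))


requests = [
    ("10.0.0.2", 40960),
    ("10.0.0.3", 40960),
    ("10.0.0.4", 40960),
    ("10.0.0.2", 1200),
]
contacts = [ip for ip, _ in requests]
print(filereq_feature(requests), ipaddr_feature(contacts))
```

Together with URL(), these values form the three-element feature vector that is compared against the learned baseline in step 2.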

Step 2) The detection module receives the new data passing through the gateway and uses the simple Mahalanobis distance to compare their similarity with the feature functions from step 1), and thereby determines whether the new data are infected by a worm.

The simple Mahalanobis distance is computed as:

$d(x, \bar{y}) = \sum_{i=0}^{m-1} \frac{\left( (x_i - \bar{y}_i)^{+} \right)^{2}}{\sigma_i^{2}} \qquad (6)$

where $d(x, \bar{y})$ is the simple Mahalanobis distance, $m$ is the number of feature values produced by the feature functions, $x_i$ is the i-th feature value of the new data, $y_i$ is the i-th feature value of the training-stage data, $\bar{y}_i$ is the i-th mean feature value from the training stage, $x$ is the feature vector of the new data, $\bar{y}$ is the mean feature vector from the training stage, and $\sigma_i^2$ is the variance of the i-th feature value. The simple Mahalanobis distance of the new data is computed; the larger it is, the greater the probability of worm infection. $\{X_n, n = 1, 2, 3, \dots\}$ denotes the sequence of simple Mahalanobis distances, where $n$ denotes the time interval.

The Mahalanobis distance is the most commonly used multivariate anomaly statistic. The formula essentially describes whether a new sample is anomalous with respect to the historically learned data. Here, the distance between the newly observed data and the data obtained in the learning stage is computed. The larger the distance, the more likely it is a sign of abnormal behavior.

The Mahalanobis distance is defined as in equation (4):

$D(x, \bar{y}) = (x - \bar{y})^{T} C^{-1} (x - \bar{y}) \qquad (4)$

Here $x$ and $\bar{y}$ are two feature vectors, and each element of a vector is a variable. $x$ is the newly observed feature vector and $\bar{y}$ is the mean feature vector computed in the learning stage. $C^{-1}$ is the inverse of the covariance matrix $C$, with $C_{ij} = Cov(y_i, y_j)$, where $y_i$ and $y_j$ are the i-th and j-th feature values of the learning-stage feature vector.

The Mahalanobis distance provides a useful measure of the current deviation from the baseline. Assuming the features are statistically independent, the covariance matrix $C$ becomes a diagonal matrix whose diagonal entries are the variances of the individual feature values. The simple Mahalanobis distance is therefore given by equation (5):

$d(x, \bar{y}) = \sum_{i=0}^{m-1} \frac{(x_i - \bar{y}_i)^{2}}{\sigma_i^{2}} \qquad (5)$

Here $m$ is set to 3 (because there are three feature values available).
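
The step from the full Mahalanobis distance to the simple one can be written out explicitly. The short derivation below only expands the independence assumption stated above and introduces no new symbols:

```latex
\begin{align*}
C &= \operatorname{diag}\left(\sigma_0^2, \dots, \sigma_{m-1}^2\right)
  \;\Longrightarrow\;
  C^{-1} = \operatorname{diag}\left(\frac{1}{\sigma_0^2}, \dots, \frac{1}{\sigma_{m-1}^2}\right), \\
(x - \bar{y})^{T} C^{-1} (x - \bar{y})
  &= \sum_{i=0}^{m-1} (x_i - \bar{y}_i)\,\frac{1}{\sigma_i^2}\,(x_i - \bar{y}_i)
   = \sum_{i=0}^{m-1} \frac{(x_i - \bar{y}_i)^2}{\sigma_i^2}.
\end{align*}
```

With a diagonal covariance matrix no matrix inversion is needed at detection time; each feature contributes one squared, variance-scaled deviation.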

When keeping in touch with friends through an instant messaging system, a user does not necessarily use it all the time, because of busy study or work. A feature function value may therefore fall below its mean, but that does not make it an anomaly, and such a deviation should not contribute to the Mahalanobis distance. Equation (6) is therefore used to compute the simple Mahalanobis distance.

where $(y_{n-1} + Z_n)^{+}$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise.

To compute the simple Mahalanobis distance, the statistics are updated by incremental learning so that they remain correct. Let $e_i$ be a feature value of the i-th sample, and maintain three variables $(E, \omega, n)$,

where $n$ is the historical sample length. When a new sample is observed, the three variables are updated as in equations (7), (8) and (9):

$E = E + \frac{e_{n+1} - E}{n + 1} \qquad (7)$

$\omega = \omega + e_{n+1}^{2} \qquad (8)$

$n = n + 1 \qquad (9)$

In (7), (8) and (9), the left-hand side of each equals sign is the value after the new sample has been included, and the right-hand side uses the values for the previous historical sample length.

To make the detection mechanism insensitive to site access patterns, a non-parametric cumulative sum (CUSUM) method is adopted.

First, without losing any statistical feature, $\{X_n, n = 1, 2, 3, \dots\}$ is transformed into another random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ so that the negative values of $Z_n$ do not accumulate over time. $Z_n$ is defined as follows:

$Z_n = X_n - \beta \qquad (11)$

The parameter $\beta$ is a constant that, for the given network conditions, helps produce a random sequence $\{Z_n, n = 1, 2, 3, \dots\}$ that takes negative values. The recursion is as follows:

$y_n = (y_{n-1} + Z_n)^{+}$

$y_0 = 0 \qquad (12)$

where $(y_{n-1} + Z_n)^{+}$ equals $y_{n-1} + Z_n$ when $y_{n-1} + Z_n > 0$ and is 0 otherwise. The larger $y_n$ is, the stronger the attack; $y_n$ is the test statistic and represents the accumulated positive values of $X_n$;

$y_n = S_n - \min_{1 < k < n} S_k \qquad (13)$

where the initial value is $S_0 = 0$;

The decision function is expressed as:

$d_N(y_n) = \begin{cases} 0, & y_n \leq N; \\ 1, & y_n > N. \end{cases} \qquad (14)$

In the invention, $\beta$ is taken as 3.

Example

The method was verified in a simulation environment. A data set covering 521 users was collected from a university communication server (the instant messaging service is available only on campus) and divided into two parts, one for learning and one for detection. 80% of the data were used as training data; the remaining 20% were randomly mixed with IM worm attack data and used to detect IM worms. In addition, every 5 minutes the simulated instant messaging worm sent a file, or a URL in a text message, to the friends in the online contact list.

For normal traffic:

Because they are busy with work or study, users do not contact the friends in their contact list at every moment, especially around midnight. The statistics are therefore taken over the periods in which the corresponding feature function value is greater than zero. The results are shown in Table 1:

Table 1

| Feature function | Mean μ | Variance σ² |
|---|---|---|
| URL() | 1.333312 | 0.420157 |
| FileReq() | 1.271003 | 0.236540 |
| IPAddr() | 2.600212 | 0.737141 |

When ordinary users use the IM service, only a few of their text messages contain file transfer requests or URLs; in most cases users simply communicate with each other through text. The results also show that the means of URL() and FileReq() are 1.333312 and 1.271003, with corresponding variances of 0.420157 and 0.236540. This means that even when users do send a URL or a file transfer request, they usually send the same URL or file to only one or two different friends. The mean and variance of IPAddr() are 2.600212 and 0.737141.

Worm detection after instant messaging worm traffic is added:

As shown in Figure 1, the simulated IM worm spreads by sending URLs in text messages. (a) shows the changes in the feature functions: when there is no instant messaging worm traffic, the value of URL() is no greater than 1 and the value of IPAddr() varies from 0 to 3. However, as (b) shows, once the IM worm is introduced the values of URL() and IPAddr() jump abruptly to a peak close to 10, while the value of FileReq() does not change. The IM worm can therefore be detected within one time unit after the outbreak.

Figure 2 shows the simulated IM worm spreading by sending files. (a) shows that, without IM worm traffic, the FileReq() value is no greater than 1 and the IPAddr() value varies from 0 to 3. After the IM worm is introduced, however, the FileReq() and IPAddr() values depart from their normal values: they rise above 7 and reach a peak of 15. The FileReq() value is always 0. (b) therefore shows that the method detects the IM worm within one time unit after the outbreak.
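
To see why a single interval can be enough, the baseline of Table 1 and β = 3 can be plugged into the distance and CUSUM steps. The sketch below is illustrative only: the threshold `N = 50`, the sample feature vectors (loosely modelled on the URL-spreading scenario of Figure 1), and the helper names `distance` and `first_alarm` are assumptions, not values from the experiments.

```python
from typing import Dict, List, Optional

# Baseline statistics from Table 1.
MEAN = {"URL": 1.333312, "FileReq": 1.271003, "IPAddr": 2.600212}
VAR = {"URL": 0.420157, "FileReq": 0.236540, "IPAddr": 0.737141}
BETA = 3.0  # value of beta stated in the description
N = 50.0    # hypothetical detection threshold, not given in the text


def distance(sample: Dict[str, float]) -> float:
    """One-sided simple Mahalanobis distance of a sample from the baseline."""
    return sum(max(sample[k] - MEAN[k], 0.0) ** 2 / VAR[k] for k in MEAN)


def first_alarm(samples: List[Dict[str, float]],
                beta: float = BETA,
                threshold: float = N) -> Optional[int]:
    """Index (starting at 1) of the first interval that raises an alarm."""
    y = 0.0
    for n, sample in enumerate(samples, start=1):
        y = max(y + distance(sample) - beta, 0.0)
        if y > threshold:
            return n
    return None


# Illustrative feature vectors, one per time interval.
normal = {"URL": 1, "FileReq": 0, "IPAddr": 2}
url_worm = {"URL": 10, "FileReq": 0, "IPAddr": 10}

print(distance(normal), distance(url_worm))
print(first_alarm([normal, normal, url_worm]))
```

Under these assumptions the normal samples contribute nothing, while the first worm interval alone produces a distance in the hundreds, so the alarm fires in the same interval in which the worm traffic appears.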

The same experiment was repeated 100 times. The results were similar, and no negative values appeared.

A device using the invention was installed in a gateway, a machine based on a 1 GHz Pentium III. For every 10 seconds of the data set, the CPU time required by the data-processing part was recorded. In 99% of the samples, 10 seconds of packets were processed in less than 2 seconds of CPU time. Moreover, the longest time needed to process any ten-second sample was less than four seconds of CPU time. For all samples the service rate exceeded the traffic arrival rate. This indicates that the method can keep up in real time with 10-second traffic bursts on a large network.