CN107231383B

Info

Publication number: CN107231383B
Application number: CN201710655723.9A
Authority: CN
Inventors: 范渊; 徐静; 郭晓; 龙文洁
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2017-08-03
Filing date: 2017-08-03
Publication date: 2020-01-17
Anticipated expiration: 2037-08-03
Also published as: CN107231383A

Abstract

The present invention provides a method and device for detecting CC attacks and relates to the technical field of network information security. The method includes: calculating a first prior probability for normal access traffic to a Web page and a second prior probability for CC attack access traffic to the Web page; obtaining the ratio of normal access traffic and the ratio of CC attack access traffic, both of which are determined from sample data; using a first posterior probability model to calculate a first posterior probability of normal access traffic based on the first prior probability and the ratio of normal access traffic; using a second posterior probability model to calculate a second posterior probability of CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic; and determining, based on the first posterior probability and the second posterior probability, whether the Web page is under a CC attack. This alleviates the technical problem in the prior art that CC attacks cannot be detected in a timely, effective, and accurate manner.

Description

CC attack detection method and device

Technical Field

The present invention relates to the technical field of network information security, and in particular to a method and device for detecting CC attacks.

Background

With the continuous development of Internet technology, computer network technology has come into wide use in all industries. The rapid growth of Internet applications has brought many security vulnerabilities with it. These vulnerabilities expose computers to viruses and hacker attacks, which may cause data loss and, in severe cases, loss of user data or property. Internet security protection is therefore a key concern in Internet technology.

CC stands for Challenge Collapsar, which means "challenge black hole". A CC attack is a type of distributed denial-of-service (DDoS) attack: it causes a denial of service by continuously sending connection requests to a website, and it is relatively hard to notice.

Current CC attack detection and defense measures are roughly as follows: restricting source IPs by configuring blacklists and whitelists, limiting the number of connections per source IP, and collecting statistics on all requesting source IPs to calculate their request rates. However, most CC attacks today send requests to the target server from a large number of puppet machines (bots). Once the number of controlled puppet machines is large enough, the requests come from many different IPs, so a blacklist/whitelist policy is hard to make effective. The number of requests sent by each puppet IP is not high and does not exceed the per-IP connection threshold, so a connection-count threshold is also easy to bypass. The request rate of each puppet IP is not necessarily high either and may stay below the request-rate threshold. Moreover, the request rate directed at each URL of each website is not fixed, so it is unrealistic to set a single IP request-rate threshold that suits every Uniform Resource Locator (URL) on a website.

Summary of the Invention

In view of this, the purpose of the present invention is to provide a method and device for detecting CC attacks, so as to alleviate the technical problem in the prior art that CC attacks cannot be detected in a timely, effective, and accurate manner.

In a first aspect, an embodiment of the present invention provides a method for detecting a CC attack, including: calculating a first prior probability for normal access traffic to a Web page and a second prior probability for CC attack access traffic to the Web page, where the first prior probability represents the probability that normal access traffic in the sample data matches the URL access probability, and the second prior probability represents the probability that CC attack access traffic in the sample data matches the URL access probability; obtaining the ratio of normal access traffic and the ratio of CC attack access traffic, both of which are determined from the sample data; using a first posterior probability model to calculate a first posterior probability of the normal access traffic based on the first prior probability and the ratio of normal access traffic; using a second posterior probability model to calculate a second posterior probability of the CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic; and determining, based on the first posterior probability and the second posterior probability, whether the Web page is under a CC attack.

Further, using the first posterior probability model to calculate the first posterior probability of the normal access traffic based on the first prior probability and the ratio of normal access traffic includes: calculating the first posterior probability with the first posterior probability calculation model, where the model is expressed as:

$$P(C=\text{normal traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{normal traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{normal traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=normal traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability.

Further, using the second posterior probability model to calculate the second posterior probability of the CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic includes: calculating the second posterior probability with the second posterior probability calculation model, where the model is expressed as:

$$P(C=\text{CC attack traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{CC attack traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{CC attack traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=CC attack traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the second posterior probability, P(C=CC attack traffic) is the ratio of CC attack access traffic, and P(A_i=a_i | C=CC attack traffic) is the second prior probability.

Further, determining whether the Web page is under a CC attack based on the first posterior probability and the second posterior probability includes: when the first posterior probability is greater than the second posterior probability, determining that the traffic accessing the Web page at the current moment is normal traffic; and when the first posterior probability is less than the second posterior probability, determining that the traffic accessing the Web page at the current moment is CC attack traffic.

Further, calculating the first prior probability for normal access traffic to the Web page and the second prior probability for CC attack access traffic to the Web page includes: acquiring a real-time traffic access log; extracting URLs and the access time information of the URLs from the traffic access log; determining an access probability set based on the URLs and the access time information, where the access probability set includes the access probability of each URL; and determining the first prior probability and the second prior probability based on the sample data and the access probability set.

Further, before calculating the first prior probability for normal access traffic to the Web page and the second prior probability for CC attack access traffic to the Web page, the method also includes: acquiring the sample data; determining the ratio of normal access traffic and the ratio of CC attack access traffic based on the sample data; and, based on these two ratios, constructing the first posterior probability calculation model and the second posterior probability calculation model with a naive Bayes classification model.

Further, determining the ratio of normal access traffic and the ratio of CC attack access traffic based on the sample data includes: calculating the ratio of normal access traffic by a first formula, expressed as:

$$P(C=\text{normal traffic}) = \frac{B_2}{B_1 + B_2}$$

where B₂ is the number of normal traffic occurrences counted in the sample data and B₁ is the number of CC attack traffic occurrences counted in the sample data; and calculating the ratio of CC attack access traffic by a second formula, expressed as:

$$P(C=\text{CC attack traffic}) = \frac{B_1}{B_1 + B_2}$$

In a second aspect, an embodiment of the present invention further provides a device for detecting a CC attack, including: a first calculation unit, configured to calculate a first prior probability for normal access traffic to a Web page and a second prior probability for CC attack access traffic to the Web page, where the first prior probability represents the probability that normal access traffic in the sample data matches the URL access probability, and the second prior probability represents the probability that CC attack access traffic in the sample data matches the URL access probability; a first obtaining unit, configured to obtain the ratio of normal access traffic and the ratio of CC attack access traffic, both of which are determined from the sample data; a second calculation unit, configured to use a first posterior probability model to calculate a first posterior probability of the normal access traffic based on the first prior probability and the ratio of normal access traffic; a third calculation unit, configured to use a second posterior probability model to calculate a second posterior probability of the CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic; and a first determining unit, configured to determine, based on the first posterior probability and the second posterior probability, whether the Web page is under a CC attack.

Further, the second calculation unit is configured to calculate the first posterior probability with the first posterior probability calculation model, where the model is expressed as:

$$P(C=\text{normal traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{normal traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{normal traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=normal traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability.

Further, the third calculation unit is configured to calculate the second posterior probability with the second posterior probability calculation model, where the model is expressed as:

$$P(C=\text{CC attack traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{CC attack traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{CC attack traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=CC attack traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the second posterior probability, P(C=CC attack traffic) is the ratio of CC attack access traffic, and P(A_i=a_i | C=CC attack traffic) is the second prior probability.

In the embodiments of the present invention, the first prior probability for normal access traffic to the Web page and the second prior probability for CC attack access traffic to the Web page are calculated first. Next, the ratio of normal access traffic and the ratio of CC attack access traffic are obtained. Then the first posterior probability model is used to calculate the first posterior probability of normal access traffic based on the first prior probability and the ratio of normal access traffic, and the second posterior probability model is used to calculate the second posterior probability of CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic. Finally, whether the Web page is under a CC attack is determined based on the first posterior probability and the second posterior probability. In the embodiments of the present invention, models are trained on sample logs without CC attacks and sample logs with CC attacks; once the models are built, real-time traffic is pattern-matched against them to detect CC attacks. This allows CC attacks to be detected promptly and accurately, alleviates the technical problem in the prior art that CC attacks cannot be detected in a timely, effective, and accurate manner, and thereby improves the efficiency of CC attack detection.

Other features and advantages of the present invention will be set forth in the description that follows and will in part become apparent from the description or be learned by practicing the invention. The objectives and other advantages of the invention are realized and attained by the structure particularly pointed out in the description, the claims, and the drawings.

To make the above objects, features, and advantages of the present invention easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of Drawings

To illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for detecting a CC attack according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a method for detecting a CC attack according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a device for detecting a CC attack according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another device for detecting a CC attack according to an embodiment of the present invention.

Detailed Description

To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1:

According to an embodiment of the present invention, an embodiment of a method for detecting a CC attack is provided. Note that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.

FIG. 1 is a flowchart of a method for detecting a CC attack according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102: calculate a first prior probability for normal access traffic to a Web page and a second prior probability for CC attack access traffic to the Web page, where the first prior probability represents the probability that normal access traffic in the sample data matches the URL access probability, and the second prior probability represents the probability that CC attack access traffic in the sample data matches the URL access probability;

Step S104: obtain the ratio of normal access traffic and the ratio of CC attack access traffic, both of which are determined from the sample data;

Step S106: use a first posterior probability model to calculate a first posterior probability of normal access traffic based on the first prior probability and the ratio of normal access traffic;

Step S108: use a second posterior probability model to calculate a second posterior probability of CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic;

Step S110: determine, based on the first posterior probability and the second posterior probability, whether the Web page is under a CC attack.

In the embodiment of the present invention, before the first prior probability for normal access traffic to the Web page and the second prior probability for CC attack access traffic to the Web page are calculated, the posterior probability calculation models (that is, the first posterior probability calculation model and the second posterior probability calculation model) must be constructed. These models are built from sample data, which includes logs of the Web page without CC attacks and logs of the Web page under CC attack. The process is as follows:

First, obtain the sample data;

Then, determine the ratio of normal access traffic and the ratio of CC attack access traffic based on the sample data;

Finally, based on the ratio of normal access traffic and the ratio of CC attack access traffic, construct the first posterior probability calculation model and the second posterior probability calculation model with a naive Bayes classification model.

Specifically, m access traffic samples of the protected object (for example, a protected Web page) are collected first; these samples are known to include both normal traffic and CC attack traffic. The access traffic is then classified and counted to obtain the click-through rate matrix A (that is, the sample data).

The click-through rate matrix A is expressed as:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mN} \end{bmatrix}$$

In the embodiment of the present invention, rows 1 to x of matrix A are normal traffic and rows x+1 to m are CC attack traffic. The element a_ij is the probability that the j-th URL appears in the i-th access traffic sample. Note that the sample data is provided by the service provider of the protected object, and the quality and quantity of the sample data are usually key factors in a model's performance.
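As a minimal sketch of how such a matrix could be assembled, the example below assumes each access traffic sample is simply a list of requested URLs and that the URL vocabulary is fixed in advance. The helper name `click_rate_matrix` and the input layout are illustrative assumptions, not part of the source.

```python
from collections import Counter

def click_rate_matrix(samples, urls):
    """Return a matrix whose entry [i][j] is the fraction of requests in
    sample i that target urls[j] (hypothetical helper, not from the source)."""
    matrix = []
    for requests in samples:
        counts = Counter(requests)
        total = len(requests)
        # An empty sample yields an all-zero row instead of dividing by zero.
        row = [counts[u] / total if total else 0.0 for u in urls]
        matrix.append(row)
    return matrix

urls = ["/index", "/login", "/search"]
samples = [
    ["/index", "/index", "/login", "/search"],     # a normal traffic sample
    ["/search", "/search", "/search", "/search"],  # a CC attack sample
]
A = click_rate_matrix(samples, urls)
```

Ordering the samples so that normal traffic comes first keeps the row layout described above (rows 1 to x normal, rows x+1 to m attack).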

Once the sample data has been determined, the ratio of normal access traffic and the ratio of CC attack access traffic can be determined from it.

In an optional implementation, the process of determining the ratio of normal access traffic and the ratio of CC attack access traffic based on the sample data is as follows:

The ratio of normal access traffic is calculated by a first formula, expressed as:

$$P(C=\text{normal traffic}) = \frac{B_2}{B_1 + B_2}$$

where B₂ is the number of normal traffic occurrences counted in the sample data and B₁ is the number of CC attack traffic occurrences counted in the sample data;

The ratio of CC attack access traffic is calculated by a second formula, expressed as:

$$P(C=\text{CC attack traffic}) = \frac{B_1}{B_1 + B_2}$$

Note that the counts in the first and second formulas are obtained from the sample data.

In the embodiment of the present invention, the ratio of normal access traffic can be calculated as: P(C=normal traffic) = normal traffic count B₂ / (normal traffic count B₂ + CC attack traffic count B₁).

In the embodiment of the present invention, the ratio of CC attack access traffic can be calculated as: P(C=CC attack traffic) = CC attack traffic count B₁ / (normal traffic count B₂ + CC attack traffic count B₁).
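The two ratios can be sketched in a few lines. The function name `class_ratios` and the example counts are illustrative assumptions; only the arithmetic B₂/(B₁+B₂) and B₁/(B₁+B₂) comes from the text.

```python
def class_ratios(b1_attack, b2_normal):
    """Return (P(C=normal traffic), P(C=CC attack traffic)) from the
    counts B1 (CC attack) and B2 (normal) taken from the sample data."""
    total = b1_attack + b2_normal
    if total == 0:
        raise ValueError("sample data contains no labelled traffic")
    return b2_normal / total, b1_attack / total

# Example counts chosen for illustration only.
p_normal, p_attack = class_ratios(b1_attack=20, b2_normal=80)
```

The two values always sum to 1, since every counted sample belongs to exactly one of the two classes.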

Once the ratio of CC attack access traffic and the ratio of normal access traffic have been determined, the first posterior probability model and the second posterior probability model can be constructed.

In the embodiment of the present invention, the posterior probability models can be built with a naive Bayes classifier. For example, the first posterior probability model is constructed by the formula:

$$P(C=\text{normal traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{normal traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{normal traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

and the second posterior probability model is constructed by the formula:

$$P(C=\text{CC attack traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{CC attack traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{CC attack traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

Note that in the above formulas, P(A₁=a₁, A₂=a₂, ..., A_N=a_N) is a constant.
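Because P(A₁=a₁, A₂=a₂, ..., A_N=a_N) is the same constant for both classes, comparing the two posterior probabilities reduces to comparing their numerators. The restatement below is a sketch of that step under the usual naive Bayes independence assumption; the symbol \hat{c} for the chosen class is introduced here for illustration and does not appear in the source.

```latex
\[
P(C=c \mid A_1=a_1,\dots,A_N=a_N)
  = \frac{P(C=c)\,\prod_{i=1}^{N} P(A_i=a_i \mid C=c)}
         {P(A_1=a_1,\dots,A_N=a_N)},
\qquad c \in \{\text{normal traffic},\ \text{CC attack traffic}\}
\]
\[
\hat{c} \;=\; \arg\max_{c}\; P(C=c)\,\prod_{i=1}^{N} P(A_i=a_i \mid C=c)
\]
```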

Once the first posterior probability model and the second posterior probability model have been constructed, real-time access traffic can be pattern-matched against them to detect CC attacks.

When real-time access traffic is pattern-matched to detect CC attacks, the first prior probability for normal access traffic to the Web page and the second prior probability for CC attack access traffic to the Web page are calculated first. The calculation includes the following steps:

Step S1021: acquire a real-time traffic access log;

Step S1022: extract URLs and the access time information of the URLs from the traffic access log;

Step S1023: determine an access probability set based on the URLs and the access time information, where the access probability set includes the access probability of each URL;

Step S1024: determine the first prior probability and the second prior probability based on the sample data and the access probability set.

First, the URL and access time information are extracted field by field from the real-time traffic access log. Then, from the extraction result, the access probability of each URL in the real-time access traffic is calculated, giving the access probability set [a₁, a₂, ..., a_n].
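A minimal sketch of steps S1022 and S1023, assuming the log has already been parsed into (timestamp, URL) pairs and that the access time is used to select a time window. The function name `access_probabilities`, the window convention, and the sample entries are all illustrative assumptions rather than details from the source.

```python
from collections import Counter

def access_probabilities(entries, urls, start, end):
    """Return the access probability of each URL in `urls` over the
    requests whose timestamp falls in the window [start, end)."""
    in_window = [url for ts, url in entries if start <= ts < end]
    counts = Counter(in_window)
    total = len(in_window)
    # With no requests in the window, every probability is reported as 0.0.
    return [counts[u] / total if total else 0.0 for u in urls]

# Hypothetical parsed log entries: (seconds since window start, URL).
entries = [(0, "/index"), (1, "/search"), (2, "/search"),
           (3, "/search"), (70, "/login")]
probs = access_probabilities(entries, ["/index", "/login", "/search"],
                             start=0, end=60)
```

The URL order passed in should match the column order of the click-through rate matrix so that each probability lines up with its column.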

Once the access probability set has been determined, the prior probabilities can be calculated by combining the sample data with the access probability set.

The calculated prior probability of normal traffic (that is, the first prior probability) is expressed as P(A_i=a_i | C=normal traffic), i ∈ {1, 2, ..., N}. It is the probability that, among the normal access traffic in the sample data (rows 1 to x), the i-th column matches a value equal to a_i in the access probability set. The calculated prior probability of CC attack traffic (that is, the second prior probability) is expressed as P(A_i=a_i | C=CC attack traffic), i ∈ {1, 2, ..., N}. It is the probability that, among the CC attack access traffic in the sample data (rows x+1 to m), the i-th column matches a value equal to a_i in the access probability set.
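One way to read this matching step is as a per-column frequency count over the rows of one class. The sketch below follows that reading. The tolerance `tol` is an assumption added here, since exact equality between floating-point probabilities would rarely hold; the source only speaks of values being equal. The helper name and the sample matrix are likewise hypothetical.

```python
def match_probability(rows, column, value, tol=0.05):
    """Fraction of `rows` whose entry in `column` lies within `tol` of
    `value` (the tolerance is an assumption, not from the source)."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if abs(row[column] - value) <= tol)
    return hits / len(rows)

# Hypothetical click-through rate matrix: first x rows normal, rest CC attack.
A = [
    [0.50, 0.25, 0.25],  # normal
    [0.45, 0.30, 0.25],  # normal
    [0.00, 0.00, 1.00],  # CC attack
    [0.05, 0.00, 0.95],  # CC attack
]
x = 2
observed = [0.02, 0.0, 0.98]  # access probability set from real-time traffic

normal_likelihoods = [match_probability(A[:x], i, a) for i, a in enumerate(observed)]
attack_likelihoods = [match_probability(A[x:], i, a) for i, a in enumerate(observed)]
```

With this reading, a column that matches no row of a class yields 0 and therefore zeroes that class's product; whether and how to smooth such cases is not addressed in the source.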

Once the first prior probability, the second prior probability, the ratio of normal access traffic, and the ratio of CC attack access traffic have been determined, the first posterior probability and the second posterior probability can be determined with the posterior probability models.

In an optional implementation, step S106 above, that is, using the first posterior probability model to calculate the first posterior probability of normal access traffic based on the first prior probability and the ratio of normal access traffic, includes the following step:

Step S1061: calculate the first posterior probability with the first posterior probability calculation model, where the model is expressed as:

$$P(C=\text{normal traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{normal traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{normal traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=normal traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability.

Specifically, in the embodiment of the present invention, the constant P(A₁=a₁, A₂=a₂, ..., A_N=a_N), the ratio of normal access traffic P(C=normal traffic), and the prior probability of normal traffic (that is, the first prior probability) P(A_i=a_i | C=normal traffic), i ∈ {1, 2, ..., N}, can be substituted into the established first posterior probability model to calculate the posterior probability of normal traffic (that is, the first posterior probability) P(C=normal traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N).

In an optional implementation, step S108 above, that is, using the second posterior probability model to calculate the second posterior probability of CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic, includes the following step:

Step S1081: calculate the second posterior probability with the second posterior probability calculation model, where the model is expressed as:

$$P(C=\text{CC attack traffic} \mid A_1=a_1, A_2=a_2, \dots, A_N=a_N) = \frac{P(C=\text{CC attack traffic}) \prod_{i=1}^{N} P(A_i=a_i \mid C=\text{CC attack traffic})}{P(A_1=a_1, A_2=a_2, \dots, A_N=a_N)}$$

where P(C=CC attack traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N) is the second posterior probability, P(C=CC attack traffic) is the ratio of CC attack access traffic, and P(A_i=a_i | C=CC attack traffic) is the second prior probability.

The constant P(A₁=a₁, A₂=a₂, ..., A_N=a_N), the ratio of CC attack access traffic P(C=CC attack traffic), and the prior probability of CC attack traffic (that is, the second prior probability) P(A_i=a_i | C=CC attack traffic), i ∈ {1, 2, ..., N}, are substituted into the established second posterior probability model to calculate the posterior probability of CC attack traffic (that is, the second posterior probability) P(C=CC attack traffic | A₁=a₁, A₂=a₂, ..., A_N=a_N).

In this embodiment of the present invention, once the first posterior probability and the second posterior probability have been determined, whether the Web page is under a CC attack can be determined from them, as follows:

If the first posterior probability is greater than the second posterior probability, the access traffic to the Web page at the current moment is determined to be normal traffic.

If the first posterior probability is less than the second posterior probability, the access traffic to the Web page at the current moment is determined to be CC attack traffic.

In other words, if the posterior probability of normal traffic is greater than that of CC attack traffic, the real-time traffic is normal traffic; if the posterior probability of CC attack traffic is greater than that of normal traffic, the real-time traffic is a CC attack.
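The comparison rule above can be written as a small Python function. This is a minimal sketch: the name `classify` and the returned labels are hypothetical, and the handling of equal posteriors is an assumption, since the text only covers the greater-than and less-than cases.

```python
def classify(p_normal, p_attack):
    """Label real-time traffic by comparing the two posterior probabilities."""
    if p_normal > p_attack:
        return "normal"
    if p_normal < p_attack:
        return "cc_attack"
    # Assumption: the source does not define the tie case, so it is flagged.
    return "undetermined"
```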

To give an intuitive summary of the CC attack detection method provided by the above embodiments, the schematic diagram of the method shown in FIG. 2 is used as an example. The method mainly includes the following.

First, sample data is obtained. Machine learning is then applied to the sample data; here, machine learning means having the computer establish model parameters from sample data of known categories using a naive Bayes classifier.

Obtaining the sample data and applying machine learning specifically include the following steps:

A1. Sample data

Collect m samples of access traffic to the protected object, for which it is known which samples are normal traffic and which are CC attack traffic, then classify and count them to obtain a click-rate matrix (that is, the sample data). The click-rate matrix is expressed as:

A = [ a_11  a_12  …  a_1N ]
    [ a_21  a_22  …  a_2N ]
    [  ⋮     ⋮         ⋮  ]
    [ a_m1  a_m2  …  a_mN ]

In matrix A, rows 1 to x are normal traffic and rows x+1 to m are CC attack traffic. The element a_ij is the probability that the j-th URL occurs in the i-th access traffic sample. Note that the sample data is provided by the service provider of the protected object, and that the quality and quantity of the sample data are usually key factors in a model's performance.
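One way such a click-rate matrix could be assembled is sketched below in Python. The function `click_rate_matrix`, the URL list, and the sample requests are all hypothetical. The source does not say how the per-URL probabilities are counted, so the sketch assumes each entry is the share of a sample's requests that went to that URL.

```python
from collections import Counter

def click_rate_matrix(samples, urls):
    """Return one row per traffic sample; entry j is the share of that
    sample's requests that went to urls[j]."""
    matrix = []
    for requests in samples:
        counts = Counter(requests)
        total = len(requests)
        matrix.append([counts[u] / total if total else 0.0 for u in urls])
    return matrix

urls = ["/index", "/login", "/search"]
normal_samples = [["/index", "/login", "/index", "/search"]]
attack_samples = [["/search", "/search", "/search", "/search"]]

# Normal rows first (rows 1..x), then CC attack rows (rows x+1..m)
A = click_rate_matrix(normal_samples + attack_samples, urls)
x = len(normal_samples)
```

Keeping the normal rows ahead of the attack rows mirrors the row layout described for matrix A, so the two classes can later be separated with a single index x.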

A2. Build the model

In this embodiment of the present invention, the posterior probability models may be built with a naive Bayes classifier, for example by using the formula

P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=normal traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=normal traffic)

to construct the first posterior probability model, and the formula

P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=CC attack traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=CC attack traffic)

to construct the second posterior probability model. Note that in the above formulas Z denotes P(A₁=a₁, A₂=a₂, …, A_N=a_N), which is a constant.
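Because Z is the same positive constant in both models, it does not affect which posterior is larger. Assuming the usual naive Bayes form in which 1/Z multiplies the class ratio and the product of the per-URL terms, the comparison reduces as follows (the shortened class labels are used only for readability):

```latex
\begin{aligned}
& P(C=\text{normal}\mid A_1=a_1,\dots,A_N=a_N) \;>\; P(C=\text{CC attack}\mid A_1=a_1,\dots,A_N=a_N) \\
\iff\; & \frac{1}{Z}\,P(C=\text{normal})\prod_{i=1}^{N} P(A_i=a_i\mid C=\text{normal})
  \;>\; \frac{1}{Z}\,P(C=\text{CC attack})\prod_{i=1}^{N} P(A_i=a_i\mid C=\text{CC attack}) \\
\iff\; & P(C=\text{normal})\prod_{i=1}^{N} P(A_i=a_i\mid C=\text{normal})
  \;>\; P(C=\text{CC attack})\prod_{i=1}^{N} P(A_i=a_i\mid C=\text{CC attack})
\end{aligned}
```

So for the purpose of the decision alone, Z never has to be evaluated; it matters only if the posterior values themselves are needed.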

After the posterior probability models are built, the ratio of normal access traffic P(C=normal traffic) and the ratio of CC attack access traffic P(C=CC attack traffic) can be calculated from the sample data.

The ratio of normal access traffic can be calculated as: P(C=normal traffic) = number of normal traffic occurrences / (number of normal traffic occurrences + number of CC attack traffic occurrences).

The ratio of CC attack access traffic can be calculated as: P(C=CC attack traffic) = number of CC attack traffic occurrences / (number of normal traffic occurrences + number of CC attack traffic occurrences).

Note that the counts in the above formulas are obtained from the sample data.
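The two ratio formulas amount to dividing each class count by the total count. A minimal Python sketch, with a hypothetical function name and made-up counts:

```python
def class_ratios(normal_count, attack_count):
    """Return (P(C=normal traffic), P(C=CC attack traffic)) from sample counts."""
    total = normal_count + attack_count
    if total == 0:
        raise ValueError("sample data is empty")
    return normal_count / total, attack_count / total

# Hypothetical counts: 6 normal samples and 2 CC attack samples
p_c_normal, p_c_attack = class_ratios(normal_count=6, attack_count=2)
```

The two ratios always sum to 1, which is a quick sanity check on the counts.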

Calculating the prior probabilities: as the posterior probability models in step A2 show, calculating the posterior probabilities first requires the normal traffic prior probability P(A_i=a_i | C=normal traffic), i ∈ {1, 2, …, N}, and the CC attack traffic prior probability P(A_i=a_i | C=CC attack traffic), i ∈ {1, 2, …, N}. The specific steps are as follows:

B. Calculate the normal traffic prior probability (that is, the first prior probability) and the CC attack traffic prior probability (that is, the second prior probability). Calculating the prior probabilities includes the following steps:

B1. Extract access samples: extract the URL and access time fields from the real-time traffic access log.

B2. Calculate access probabilities: from the result of step B1, calculate the access probability of each URL in the real-time access traffic, [a₁, a₂, …, a_N].
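Steps B1 and B2 could look roughly like the Python sketch below. The log line format (`<ISO-8601 timestamp> <URL>`), the time-window filter, and the function name `access_probabilities` are assumptions made for illustration; the source only states that the URL and access time are extracted and that a per-URL access probability is calculated.

```python
from collections import Counter
from datetime import datetime

def access_probabilities(log_lines, urls, start, end):
    """Share of requests to each URL among entries with start <= time < end.

    Assumes each line looks like '<ISO-8601 timestamp> <URL>' (hypothetical format).
    """
    hits = Counter()
    for line in log_lines:
        stamp, url = line.split(maxsplit=1)
        if start <= datetime.fromisoformat(stamp) < end:
            hits[url.strip()] += 1
    total = sum(hits.values())
    return [hits[u] / total if total else 0.0 for u in urls]

log = [
    "2017-08-03T10:00:01 /search",
    "2017-08-03T10:00:02 /search",
    "2017-08-03T10:00:03 /index",
    "2017-08-03T10:00:04 /search",
    "2017-08-03T10:05:00 /login",  # outside the window used below
]
a = access_probabilities(
    log,
    ["/index", "/login", "/search"],
    datetime(2017, 8, 3, 10, 0),
    datetime(2017, 8, 3, 10, 1),
)
```

The resulting list plays the role of the access probability set, with one entry per URL in the same column order as the sample matrix.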

B3. Calculate the prior probabilities.

The calculated normal traffic prior probability (that is, the first prior probability) is expressed as P(A_i=a_i | C=normal traffic), i ∈ {1, 2, …, N}. It is the probability that, among the normal access traffic in the sample data (rows 1 to x), the entry in column i equals the value a_i in the access probability set. The calculated CC attack traffic prior probability (that is, the second prior probability) is expressed as P(A_i=a_i | C=CC attack traffic), i ∈ {1, 2, …, N}. It is the probability that, among the CC attack access traffic in the sample data (rows x+1 to m), the entry in column i equals the value a_i in the access probability set.
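The column-matching definition above can be sketched in Python as the fraction of a class's rows whose column i equals a_i. The function name, the sample rows, and especially the tolerance `tol` are assumptions: the source says only that the values are "equal", and a small tolerance is used here because exact floating-point equality is fragile.

```python
def conditional_probabilities(rows, a, tol=1e-9):
    """For each column i, the fraction of rows whose entry i equals a[i].

    rows -- the normal rows (1..x) or the CC attack rows (x+1..m) of the matrix
    a    -- the real-time access probability set [a_1, ..., a_N]
    tol  -- matching tolerance (assumption; the source only says "equal")
    """
    if not rows:
        raise ValueError("no sample rows for this class")
    return [
        sum(abs(row[i] - a_i) <= tol for row in rows) / len(rows)
        for i, a_i in enumerate(a)
    ]

# Hypothetical sample rows and real-time access probabilities
normal_rows = [[0.5, 0.25, 0.25], [0.5, 0.5, 0.0]]
attack_rows = [[0.0, 0.0, 1.0], [0.25, 0.0, 0.75]]
a = [0.25, 0.0, 0.75]

first_prior = conditional_probabilities(normal_rows, a)
second_prior = conditional_probabilities(attack_rows, a)
```

In this toy data no normal row matches any a_i, so every first prior probability is zero and the normal posterior would be zero as well; with real data, how coarsely the probabilities are matched would strongly influence the result.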

C. Calculate the normal traffic posterior probability (that is, the first posterior probability) and the CC attack traffic posterior probability (that is, the second posterior probability). Calculating the posterior probabilities includes the following steps:

C1. Normal traffic posterior probability.

Substitute the constant P(A₁=a₁, A₂=a₂, …, A_N=a_N), the ratio P(C=normal traffic), and the normal traffic prior probability from step B3, P(A_i=a_i | C=normal traffic), i ∈ {1, 2, …, N}, into the posterior probability model built in step A2 to calculate the normal traffic posterior probability P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N).

C2. CC attack traffic posterior probability.

Substitute the constant P(A₁=a₁, A₂=a₂, …, A_N=a_N), the ratio P(C=CC attack traffic), and the CC attack traffic prior probability from step B3, P(A_i=a_i | C=CC attack traffic), i ∈ {1, 2, …, N}, into the posterior probability model built in step A2 to calculate the CC attack traffic posterior probability P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N).
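When N is large, multiplying many small probabilities can underflow to zero in floating point. A common numerical variant, which is not described in the source, compares sums of logarithms instead; since the logarithm is monotonic and the constant is shared by both classes, the ordering of the two classes is unchanged. The function name and values below are hypothetical.

```python
import math

def log_score(class_ratio, conditionals):
    """log P(C) + sum of log P(A_i = a_i | C); -inf if any factor is zero."""
    factors = [class_ratio, *conditionals]
    if any(f <= 0.0 for f in factors):
        return float("-inf")
    return sum(math.log(f) for f in factors)

# Hypothetical tiny probabilities whose direct products underflow to 0.0
normal = log_score(0.75, [1e-200, 1e-200])
attack = log_score(0.25, [1e-180, 1e-180])
direct_normal = 0.75 * 1e-200 * 1e-200
direct_attack = 0.25 * 1e-180 * 1e-180
```

Here both direct products evaluate to 0.0 and cannot be ranked, while the log scores still show that the CC attack class is the more probable one.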

D. Detect the CC attack

If the normal traffic posterior probability is greater than the CC attack traffic posterior probability, the real-time traffic is normal traffic. If the CC attack traffic posterior probability is greater than the normal traffic posterior probability, the real-time traffic is a CC attack. The specific implementation is as described above and is not repeated here.

Embodiment 2:

An embodiment of the present invention further provides a CC attack detection apparatus, which is mainly used to perform the CC attack detection method provided above. The apparatus is described in detail below.

FIG. 3 is a schematic diagram of a CC attack detection apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus mainly includes a first calculation unit 31, a first obtaining unit 32, a second calculation unit 33, a third calculation unit 34, and a first determining unit 35, where:

the first calculation unit 31 is configured to calculate a first prior probability of normal access traffic of a Web page and a second prior probability of CC attack access traffic of the Web page, where the first prior probability represents the probability that the normal access traffic in the sample data matches the URL access probability, and the second prior probability represents the probability that the CC attack access traffic in the sample data matches the URL access probability;

the first obtaining unit 32 is configured to obtain the ratio of normal access traffic and the ratio of CC attack access traffic, both of which are determined based on the sample data;

the second calculation unit 33 is configured to use the first posterior probability model to calculate the first posterior probability of the normal access traffic based on the first prior probability and the ratio of normal access traffic;

the third calculation unit 34 is configured to use the second posterior probability model to calculate the second posterior probability of the CC attack access traffic based on the second prior probability and the ratio of CC attack access traffic;

the first determining unit 35 is configured to determine, based on the first posterior probability and the second posterior probability, whether the Web page is under a CC attack.

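Optionally, the second calculation unit is configured to calculate the first posterior probability using the first posterior probability calculation model, where the first posterior probability calculation model is expressed as:

P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=normal traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=normal traffic)

Here P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability.

Optionally, the third calculation unit is configured to calculate the second posterior probability using the second posterior probability calculation model, where the second posterior probability calculation model is:

P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=CC attack traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=CC attack traffic)

Here P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) is the second posterior probability, P(C=CC attack traffic) is the ratio of CC attack access traffic, and P(A_i=a_i | C=CC attack traffic) is the second prior probability.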

Optionally, the first determining unit is configured to: if the first posterior probability is greater than the second posterior probability, determine that the access traffic to the Web page at the current moment is normal traffic; and if the first posterior probability is less than the second posterior probability, determine that the access traffic to the Web page at the current moment is CC attack traffic.

Optionally, the first calculation unit is configured to: obtain a real-time traffic access log; extract the URL and the access time information of the URL from the traffic access log; determine an access probability set based on the URL and the access time information, where the access probability set includes the access probability of each URL; and determine the first prior probability and the second prior probability based on the sample data and the access probability set.

Optionally, as shown in FIG. 4, the apparatus further includes: a second obtaining unit 41, configured to obtain the sample data before the first prior probability of normal access traffic of the Web page and the second prior probability of CC attack access traffic of the Web page are calculated; a second determining unit 42, configured to determine the ratio of normal access traffic and the ratio of CC attack access traffic based on the sample data; and a construction unit 43, configured to construct the first posterior probability calculation model and the second posterior probability calculation model using a naive Bayes classification model, based on the ratio of normal access traffic and the ratio of CC attack access traffic.

Optionally, the second determining unit is configured to: calculate the ratio of normal access traffic by a first formula, where the first formula is expressed as:

P(C=normal traffic) = A₁ / (A₁ + B₁)

where A₁ is the number of normal traffic occurrences counted in the sample data and B₁ is the number of CC attack traffic occurrences counted in the sample data; and calculate the ratio of CC attack access traffic by a second formula, where the second formula is expressed as:

P(C=CC attack traffic) = B₁ / (A₁ + B₁)

In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "mounted", "connected", and "coupled" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediate medium, or an internal communication between two elements. A person of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific situation.

In the description of the present invention, it should be noted that orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present invention. Furthermore, the terms "first", "second", and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

The computer program product for the CC attack detection method and apparatus provided by the embodiments of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code can be used to execute the method in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.

A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for detecting a CC attack, characterized by comprising the following steps:

calculating a first prior probability of normal access traffic of a Web page and a second prior probability of CC attack access traffic of the Web page, wherein the first prior probability represents the probability that the normal access traffic in sample data matches the URL access probability, and the second prior probability represents the probability that the CC attack access traffic in the sample data matches the URL access probability;

obtaining a ratio of normal access traffic and a ratio of CC attack access traffic, wherein the ratio of the normal access traffic and the ratio of the CC attack access traffic are both determined based on the sample data;

calculating a first posterior probability through a first posterior probability calculation model, wherein the first posterior probability calculation model is expressed as:

P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=normal traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=normal traffic)

wherein P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of the normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability;

calculating a second posterior probability through a second posterior probability calculation model, wherein the second posterior probability calculation model is:

P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=CC attack traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=CC attack traffic)

wherein P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) is the second posterior probability, P(C=CC attack traffic) is the ratio of the CC attack access traffic, and P(A_i=a_i | C=CC attack traffic) is the second prior probability; and wherein Z is a constant, a_i is an element of the access probability set, and N is the maximum value of i;

determining that the access traffic accessing the Web page at the current moment is normal traffic if the first posterior probability is greater than the second posterior probability;

and determining that the access traffic accessing the Web page at the current moment is CC attack traffic if the first posterior probability is less than the second posterior probability.

2. The method of claim 1, wherein calculating a first prior probability of normal access traffic of a Web page and a second prior probability of CC attack access traffic of the Web page comprises:

acquiring a real-time traffic access log;

extracting a URL and access time information of the URL from the traffic access log;

determining an access probability set based on the URL and the access time information, wherein the access probability set comprises the access probability of each URL;

determining the first prior probability and the second prior probability based on the sample data and the set of access probabilities.

3. The method of claim 1, wherein prior to calculating a first prior probability of normal access traffic for a Web page and a second prior probability of CC attack access traffic for the Web page, the method further comprises:

acquiring the sample data;

determining a ratio of the normal access traffic and a ratio of the CC attack access traffic based on the sample data;

and constructing the first posterior probability calculation model and the second posterior probability calculation model by adopting a naive Bayes classification model based on the ratio of the normal access traffic and the ratio of the CC attack access traffic.

4. The method of claim 3, wherein determining the ratio of the normal access traffic and the ratio of the CC attack access traffic based on the sample data comprises:

calculating the ratio of the normal access traffic by a first formula, wherein the first formula is expressed as:

P(C=normal traffic) = B₂ / (B₂ + B₁)

wherein B₂ is the number of normal traffic occurrences counted in the sample data, and B₁ is the number of CC attack traffic occurrences counted in the sample data;

calculating the ratio of the CC attack access traffic by a second formula, wherein the second formula is expressed as:

P(C=CC attack traffic) = B₁ / (B₂ + B₁)

5. An apparatus for detecting a CC attack, comprising:

a first calculation unit, configured to calculate a first prior probability of normal access traffic of a Web page and a second prior probability of CC attack access traffic of the Web page, wherein the first prior probability represents the probability that the normal access traffic in sample data matches the URL access probability, and the second prior probability represents the probability that the CC attack access traffic in the sample data matches the URL access probability;

a first obtaining unit, configured to obtain a ratio of normal access traffic and a ratio of CC attack access traffic, wherein the ratio of the normal access traffic and the ratio of the CC attack access traffic are both determined based on the sample data;

a second calculation unit, configured to calculate a first posterior probability through a first posterior probability calculation model, wherein the first posterior probability calculation model is expressed as:

P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=normal traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=normal traffic)

wherein P(C=normal traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) is the first posterior probability, P(C=normal traffic) is the ratio of the normal access traffic, and P(A_i=a_i | C=normal traffic) is the first prior probability;

a third calculation unit, configured to calculate a second posterior probability through a second posterior probability calculation model, wherein the second posterior probability calculation model is:

P(C=CC attack traffic | A₁=a₁, A₂=a₂, …, A_N=a_N) = (1/Z) · P(C=CC attack traffic) · ∏_{i=1}^{N} P(A_i=a_i | C=CC attack traffic);

a first determining unit, configured to determine that the access traffic accessing the Web page at the current moment is normal traffic if the first posterior probability is greater than the second posterior probability, and to determine that the access traffic accessing the Web page at the current moment is CC attack traffic if the first posterior probability is less than the second posterior probability.