CN112085039B

Info

Publication number: CN112085039B
Application number: CN201910503725.5A
Authority: CN
Inventors: 刘亮; 胡星高; 郑荣锋; 周安民
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2019-06-12
Filing date: 2019-06-12
Publication date: 2022-08-16
Anticipated expiration: 2039-06-12
Also published as: CN112085039A

Abstract

The invention relates to machine learning algorithms and covert channel detection, and aims to provide an ICMP covert channel detection method based on the random forest algorithm. The method first captures the packets exchanged between networks and extracts basic information from each packet (source IP address, destination IP address, etc.). It classifies this information and assembles ICMP message flows from it, then applies rules specific to the method to extract features from each flow, yielding features of the ICMP traffic between a source IP address and a destination IP address. These features are used to train a random forest model, which produces a classifier for detecting ICMP covert channels. The method has low computational and time costs, and the ICMP flow features it generates are highly targeted and reliable, so ICMP covert channels can be detected effectively.

Description

A Random Forest-Based ICMP Covert Channel Detection Method

Technical Field

The invention relates to network traffic monitoring. It uses features of ICMP traffic together with the random forest algorithm to inspect ICMP packets. The core idea is to extract features from captured normal and abnormal ICMP messages and to identify ICMP covert channel communication with a classification model produced by specific feature generation rules and machine learning.

Background

With the rapid development of computer network technology, more and more advanced technologies have emerged, and information security problems have grown more serious alongside them. Covert tunnels are currently among the more serious of these problems. A covert tunnel exploits weaknesses in network protocols, using the redundancy in various protocols to transmit data secretly and to attack networks. Although security tools such as intrusion detection systems and firewalls are widely deployed, the weaknesses of the protocols themselves mean that security incidents such as secret data transmission, data leakage, and malware bypassing firewalls to obtain information keep occurring. Covert channels let data travel across the network undetected, which makes it easier for attackers to steal information.

When facing a large and complex volume of ICMP messages, determining whether an ICMP covert channel exists first raises the following problems:

1) Because of the sheer volume of data, the data capture module may be unable to capture every packet on the intranet, so some packets carrying ICMP messages are missed;

2) Processing every packet consumes substantial resources. Saving all packets on an intranet consumes a great deal of network bandwidth and also a large amount of storage hardware;

3) Firewalls perform almost no filtering of packets carrying ICMP messages;

4) Existing methods for detecting ICMP covert channels take a long time, are computationally complex, and lack real-time capability.

To solve these problems, improve the efficiency of ICMP covert channel detection, and reduce the consumption of network resources, the invention proposes a random forest-based ICMP covert channel detection method that effectively reduces the use of network resources and improves the detection accuracy for ICMP covert channels.

Summary of the Invention

"A Random Forest-Based ICMP Covert Channel Detection Method" is an invention concerning the inspection of packets that carry ICMP messages in network traffic. One object of the invention is to address the shortcomings of existing ICMP covert channel detection, namely complex computation, low detection accuracy, and excessive resource consumption, by proposing an ICMP covert channel detection method based on random forest classification. Using machine learning to detect ICMP covert channels is a good way to compensate for the weaknesses of existing ICMP detection, but its effectiveness depends mainly on two points: 1) whether the samples are sufficiently broad; and 2) whether the features used for machine learning are sufficiently representative and have sufficiently low redundancy.

The invention provides a new ICMP covert channel detection method. It uses ICMP message features specific to the invention and trains on them with the random forest method, achieving efficient, accurate, and highly targeted ICMP covert channel detection. The method comprises four modules: a data capture and processing module, which captures packets using multithreading and retains those that carry ICMP messages; a preprocessing module, which consolidates and classifies the packets carrying ICMP messages into ICMP message data flows defined by source IP address, destination IP address, and a fixed time window; a data processing module, which generates ICMP data flow features from the flows produced by the preprocessing module according to the feature generation rules; and a machine learning module, which converts the flow features to numeric form, standardizes them, and then uses the random forest method to generate an accurate and efficient classifier that can effectively detect ICMP covert channels.

Description of the Drawings

To further describe the objects, implementation, and characteristics of the invention, a detailed description is given below with reference to the accompanying drawings, so that the objects, implementation, advantages, and characteristics of the invention can be understood more clearly.

FIG. 1 is a framework diagram showing the overall flow of the invention.

FIG. 2 is a flowchart illustrating the data capture preprocessing module of the invention.

FIG. 3 is a flowchart illustrating the preprocessing module of the invention.

FIG. 4 is a flowchart illustrating the data processing module of the invention.

FIG. 5 is a flowchart illustrating the machine learning module of the invention.

Detailed Description

The invention is divided into four modules. The first module filters out other protocols and traffic irrelevant to covert channel detection, which improves detection efficiency and accuracy and reduces interference. The second module obtains the ICMP packets that belong solely to the two communicating parties. The third module obtains more targeted ICMP message flow features. The fourth module applies a machine learning algorithm to the features from the third module to form an efficient, accurate classifier, enabling fast and precise ICMP covert channel detection.

The invention is further described below with reference to the accompanying drawings.

FIG. 1 shows the technical architecture of the invention, organized as a layered model. Each layer has a distinct function, and the input of each layer is the output of the layer above it. The top layer receives packets from the router's mirror port. The function and flow of each layer are described in detail below.

FIG. 2 shows the data capture module. Its main job is to capture the communication traffic on the network and then filter out and extract the required ICMP messages.
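As a rough illustration of the filtering step, the following Python sketch keeps only packets that carry ICMP and discards everything else. It is not the patented implementation: the function name `extract_icmp`, the layout of the returned dictionary, and the assumption that each captured buffer starts at the IPv4 header (with no link-layer header) are all hypothetical. Multithreaded capture is omitted.

```python
import struct

ICMP_PROTOCOL = 1  # IPv4 protocol number assigned to ICMP


def extract_icmp(packet: bytes, timestamp: float):
    """Return basic fields of an ICMP packet, or None for any other traffic.

    Assumption: `packet` begins at the IPv4 header.
    """
    if len(packet) < 20:
        return None                       # too short for an IPv4 header
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return None                       # not IPv4
    header_len = (version_ihl & 0x0F) * 4
    if header_len < 20 or len(packet) < header_len + 8:
        return None                       # truncated IP or ICMP header
    if packet[9] != ICMP_PROTOCOL:
        return None                       # other protocols are dropped
    icmp_type, icmp_code, _checksum = struct.unpack_from("!BBH", packet, header_len)
    return {
        "time": timestamp,
        "src": ".".join(str(b) for b in packet[12:16]),
        "dst": ".".join(str(b) for b in packet[16:20]),
        "type": icmp_type,
        "code": icmp_code,
        "payload": packet[header_len + 8:],   # bytes after the 8-byte ICMP header
    }
```

Dropping non-ICMP traffic this early means later stages never store or process packets that cannot contribute to detection.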

Next, as shown in FIG. 3, the preprocessing module classifies the ICMP messages by source IP address, destination IP address, and sending time according to a fixed rule, producing the ICMP message flow between two IP addresses within a given period. The rule is as follows: starting from the first packet carrying an ICMP message that is received, all ICMP packets exchanged between the two IP addresses within the following 60 s are treated as one ICMP message data flow. Only echo request messages (type 8) and echo reply messages (type 0) are considered.
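The 60 s grouping rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's code: the record layout (dictionaries with `time`, `src`, `dst`, `type`), the function name `group_into_flows`, and the choice to treat a packet arriving exactly 60 s after the first one as still inside the window are all hypothetical. Both directions between two addresses are placed in the same flow, since the text speaks of packets exchanged between the two IPs.

```python
FLOW_WINDOW_SECONDS = 60.0
ECHO_TYPES = {0, 8}  # echo reply (type 0) and echo request (type 8)


def group_into_flows(packets):
    """Group ICMP echo records into per-IP-pair flows of at most 60 s."""
    flows = []        # every flow, in order of creation
    open_flow = {}    # IP pair -> index in `flows` of that pair's current flow
    for pkt in sorted(packets, key=lambda p: p["time"]):
        if pkt["type"] not in ECHO_TYPES:
            continue  # other ICMP message types are ignored
        pair = tuple(sorted((pkt["src"], pkt["dst"])))   # direction-agnostic key
        idx = open_flow.get(pair)
        if idx is None or pkt["time"] - flows[idx]["start"] > FLOW_WINDOW_SECONDS:
            # First packet for this pair, or the window has expired: new flow.
            flows.append({"pair": pair, "start": pkt["time"], "packets": []})
            idx = open_flow[pair] = len(flows) - 1
        flows[idx]["packets"].append(pkt)
    return flows
```

The window is anchored at the first packet of each flow rather than sliding with every packet, which matches the rule's wording of "starting from the first packet".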

Then, as shown in FIG. 4, processing of the ICMP message flow information begins. This information comes from the preprocessing module in FIG. 3. The required statistical features are extracted, according to the rules, from the ICMP message data flows obtained in FIG. 3. The rules are listed one by one in FIG. 4; applying them yields the features of each ICMP message flow.
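The feature generation rules themselves appear only in FIG. 4 and are not reproduced in the text, so the following Python sketch uses an illustrative feature set chosen for this example. Every feature name below (packet counts, payload length statistics, mean inter-arrival time, payload byte entropy) is an assumption and should not be read as the patent's actual rules; the sketch only shows the general shape of turning one flow into a fixed-length set of statistics.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 0.0 for empty input."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())


def flow_features(packets):
    """Compute hypothetical statistical features for one ICMP flow.

    Each packet is a dict with `time`, `type`, and `payload` (bytes).
    """
    packets = sorted(packets, key=lambda p: p["time"])
    sizes = [len(p["payload"]) for p in packets]
    gaps = [b["time"] - a["time"] for a, b in zip(packets, packets[1:])]
    return {
        "packet_count": len(packets),
        "request_count": sum(1 for p in packets if p["type"] == 8),
        "reply_count": sum(1 for p in packets if p["type"] == 0),
        "mean_payload_len": mean(sizes) if sizes else 0.0,
        "std_payload_len": pstdev(sizes) if sizes else 0.0,
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "payload_entropy": shannon_entropy(b"".join(p["payload"] for p in packets)),
    }
```

Statistics of this kind are computed per flow rather than per packet, so a tunnel that hides data in individually plausible echo messages can still stand out through its aggregate behaviour.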

After feature extraction is complete, each ICMP message flow record is labeled positive or negative according to the source of the original data and stored in the database.
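A minimal sketch of the labeling and storage step, using Python's built-in `sqlite3` module. The patent does not name a database, a schema, or a label encoding, so the table name `icmp_flows`, its columns, and the convention that 1 marks covert channel traffic and 0 marks normal traffic are assumptions made for this example. The label is applied per capture source: every flow from a capture known to contain tunnel traffic receives one label, and every flow from a known-normal capture receives the other.

```python
import sqlite3

NORMAL, COVERT = 0, 1  # assumed label encoding


def store_labeled_flows(conn, flows, label):
    """Insert flow feature records, all tagged with the label of their source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS icmp_flows ("
        "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, start_time REAL, "
        "packet_count INTEGER, mean_payload_len REAL, label INTEGER)"
    )
    conn.executemany(
        "INSERT INTO icmp_flows "
        "(src, dst, start_time, packet_count, mean_payload_len, label) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [
            (f["src"], f["dst"], f["start"], f["packet_count"],
             f["mean_payload_len"], label)
            for f in flows
        ],
    )
    conn.commit()
```

For example, `store_labeled_flows(sqlite3.connect(":memory:"), flows, COVERT)` would record every flow in `flows` as a positive sample.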

Finally, a machine learning algorithm is trained on the extracted data to produce an efficient and accurate ICMP covert channel detector. The process is shown in FIG. 5.

The data is first retrieved from the database and standardized to obtain a data set suitable as random forest input. The random forest algorithm is then applied to produce the ICMP covert channel detector.
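To make the standardize-then-train sequence concrete, here is a deliberately small, dependency-free Python sketch. It is a toy, not the patent's classifier: each tree is a depth-1 decision stump, the forest size, the square-root feature subset, and the Gini split criterion are choices made for this example, and all function names are hypothetical. A production system would more likely use an established random forest library. The sketch keeps the two defining ingredients of a random forest, bootstrap sampling of training rows and a random feature subset per tree, with a majority vote at prediction time.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column; constant columns are left centred at 0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [scale(r, mus, sigmas) for r in rows], mus, sigmas


def scale(row, mus, sigmas):
    """Apply previously fitted means and deviations to one row."""
    return [(v - m) / s for v, m, s in zip(row, mus, sigmas)]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Pick the single threshold split with the lowest weighted Gini impurity."""
    majority = Counter(y).most_common(1)[0][0]
    best = (float("inf"), features[0], float("inf"), majority, majority)
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                      # midpoint between neighbours
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]   # (feature index, threshold, left label, right label)


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(d ** 0.5))                      # features tried per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample of rows
        feats = rng.sample(range(d), k)                # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]
```

New flows must be scaled with the means and deviations fitted on the training data, for example `predict(forest, scale(new_row, mus, sigmas))`, so that thresholds learned during training remain meaningful.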