




Technical Field
The invention belongs to the field of Internet of Things (IoT) security testing and relates to an automatic testing method and system based on an IoT terminal evaluation platform.
Background
With the official commercial launch of fifth-generation mobile communication technology (5G) in China, the IoT industry is about to enter a period of innovation and development. The IoT provides an open, shared platform for massive information, service, and application resources. Through an IoT platform, every user can interact with the devices within the scope of their authority, so IoT resources are widely used. However, while the IoT drives industry development and makes daily life more convenient, security evaluation of such platforms remains immature, and attacks on the IoT have become a prominent security topic.
The number of IoT terminals is huge, and the vulnerabilities of IoT terminal evaluation platforms grow accordingly; attackers exploit these vulnerabilities to launch attacks. Because of characteristics such as heterogeneous multi-domain sharing and the coexistence of massive numbers of nodes, the IoT is more susceptible to attacks such as DoS/DDoS attacks, brute-force password cracking, and front-end page injection.
The patent application with publication number CN111372247A (title: a terminal security access method and terminal security access system based on narrowband Internet of Things) discloses a system that must obtain the platform public key and private key of the power IoT management platform, a first digital signature, a second digital signature, and so on. Its authentication mechanism requires a large amount of interfacing and coordination work, and the complexity of the process greatly hinders the growth of IoT terminal deployments. The patent with publication number CN106789153A (title: multi-channel adaptive log recording, output method and system for terminal equipment of Internet of Things system) discloses a system that records and transmits device logs effectively and forwards them through a log server to a log server or cloud terminal for recording. However, it has no anomaly detection or analysis capability, so an anomaly cannot be discovered and reported in time.
Log files are an important means of recording the health status of a platform, and analyzing system logs is particularly important for understanding the state of the whole platform. Extracting network events directly from massive logs is very challenging, for four reasons: (1) logs are massive and unstructured; (2) the level used to label a log event does not accurately indicate the event type and cannot be used directly for anomaly detection; (3) log content is complex, containing both parameter words and template words, and prior knowledge is needed to determine its exact meaning; (4) log patterns change constantly.
To solve the above problems, an automatic testing method based on an IoT terminal evaluation platform is urgently needed to detect the health status of the platform in time.
Summary of the Invention
In view of this, the purpose of the present invention is to provide an automatic testing method and system based on an IoT terminal evaluation platform. A tree-shaped template extraction method based on association relationships is designed to extract templates from new logs quickly and to update the template library continuously, and an anomaly detection and analysis method combining K-Means and LSTM is proposed to detect the health status of the platform in time.
To achieve the above purpose, the present invention provides the following technical solutions:
1. An automatic testing method based on an IoT terminal evaluation platform, which specifically includes the following steps:
S1: Extraction of a template tree based on association relationships: perform preliminary filtering and cleaning of the raw logs using preset rules; split the cleaned logs on spaces and distinguish parameter words from template words; extract a common log template library and update it continuously;
S2: Anomaly detection and analysis: for the preprocessed data, first build an event library from a large number of normal logs, then compute the anomaly degree of the log under test relative to the normal log template library to decide whether it is an abnormal log; if it is abnormal, first determine whether it is a known abnormal event and, if so, its type; otherwise add the log under test to the anomaly template library as a new type of abnormal attack.
First, for the extracted templates, feature values are extracted from the logs according to an association analysis method; then the K-Means algorithm is used for clustering; finally, an improved LSTM model performs anomaly detection, and the log events remaining after anomaly analysis are analyzed statistically and displayed more intuitively in charts.
Further, in step S1, filtering, splitting, and saving the logs specifically includes: although log formats vary, they contain fixed short texts, and the words in a log are separated by spaces, so the log can be split on spaces; the resulting words include parameter words and template words. The split log information is stored in an array, and the array index is used to distinguish parameter words from template words.
Further, in step S1, distinguishing parameter words from template words specifically includes: distinguishing them on the basis that a template word has a higher probability of occurrence than a parameter word. Template words usually appear at the same position in messages of equal length, so during splitting the position of each word and the length of the message, (p, len), are recorded. A formula based on conditional probability gives the likelihood that each word is a template word, and this probability is taken as the word's score;
The scoring formula is:
$$\mathrm{Score}(word, p, len) = P(word \mid p, len)$$
where P is the probability that the word word appears at position p of a message, and this probability is taken as its score; len is the length of the message.
Further, in step S1, constructing the template tree based on association relationships specifically includes: starting from the root node, record the word frequency of the root node, then record the word frequency of the child nodes under each node. If, among the child nodes in the same layer under the same parent node, a word has the same frequency as its parent or grandparent node, the log template in the direction from the root node to the child nodes ends there, and that layer and its child nodes are deleted.
Further, in step S1, autonomously updating the template library specifically includes: a newly arriving log is processed and then compared with the template trees in the existing log template library. Because template words were selected earlier against a threshold, the later comparison uses a maximum similarity value that expresses how close the newly arrived template is to the template library, as follows:
$$\mathrm{Logs}(N, M) = N_x / M_x$$
where x denotes the x-th class of template, N_x denotes the newly added log template tree, and M_x denotes the existing log template library. If the highest value of Logs is greater than or equal to the threshold, the input log is assigned to class x; otherwise, a new template library is created for x.
Further, in step S2, selecting the time window and creating feature values specifically includes: the IoT terminal evaluation platform generates a large amount of data in a short time. The time window for extracting log blocks is determined; the different kinds of logs generated by events on the platform are sorted by their timestamps within the time window, and logs with the same timestamp are merged. Features that characterize an event from several aspects, such as log entries, periodicity, average occurrence time, and frequency, are computed and used as a vector to represent the event.
Further, in step S2, generating the template library by clustering specifically includes: the log matrix vectorized after feature extraction is used as experimental data, and principal component analysis (PCA) is applied to reduce its dimensionality. Dimensionality reduction plays an important role in preprocessing: after the relevant feature attributes are extracted, PCA compresses the data set and reduces computation time.
Further, in step S2, the LSTM-based anomaly detection model: the template library is obtained after clustering, and an improved LSTM model performs anomaly detection. Mean squared error (MSE) is used as the loss function to describe the difference between the predicted value and the true value. The improved LSTM model adds an embedding layer between the input layer and the hidden layer.
2. An automatic testing system based on an IoT terminal evaluation platform, comprising: a data acquisition module, a log analysis module, an anomaly detection module, and an alarm module;
The log analysis module preprocesses the collected log data generated by the IoT terminal evaluation platform in four steps: template extraction, feature extraction, PCA dimensionality reduction, and clustering. Raw logs are numerous and unstructured, so they must be processed and converted into log templates, which sharply reduces the number of logs while preserving their basic semantics. A mapping table is built between the raw logs and the log templates obtained by the template extraction model of the log analysis module, and feature values that characterize events are extracted to form feature vectors. The logs under test are vectorized and then clustered;
The anomaly detection module performs anomaly detection on the processed log data that is suitable for security analysis. It first builds an event library from a large number of normal logs, then computes the anomaly degree of the log under test relative to the normal log template library to decide whether it is an abnormal log; if it is abnormal, it first determines whether it is a known abnormal event and, if so, its type; otherwise the log under test is added to the anomaly template library as a new type of abnormal attack;
The alarm module performs statistical analysis of the log events remaining after anomaly analysis and displays the results more intuitively in charts.
The beneficial effects of the present invention are as follows: the invention extracts templates from new logs quickly, updates the template library continuously, clusters with K-Means, predicts with LSTM, and detects the health status of the platform in time, which gives it greater industrial value. The invention can detect known attacks in real time and identify unknown attacks, improving security testing of the IoT terminal evaluation platform.
Other advantages, objectives, and features of the present invention will be set forth to some extent in the description that follows, and to some extent will be apparent to those skilled in the art upon study of the following, or may be learned from practice of the invention. The objectives and other advantages of the invention can be realized and obtained through the following description.
Brief Description of the Drawings
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings, in which:
Figure 1 is the overall framework diagram of the automatic testing system of the present invention;
Figure 2 is a schematic block diagram of log tree construction;
Figure 3 is a schematic block diagram of the converted log tree;
Figure 4 is a diagram of the generation of the known-event template library;
Figure 5 is the network structure diagram of the improved LSTM model.
Detailed Description
Embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the invention. It should be noted that the drawings provided in the following embodiments illustrate the basic concept of the invention only schematically, and that, where no conflict arises, the following embodiments and the features in them can be combined with one another.
Refer to Figures 1 to 5. Figure 1 is the overall framework diagram of the automatic testing system based on an IoT terminal evaluation platform provided by the present invention. As shown in Figure 1, the system consists, from bottom to top, of a data acquisition module, a log analysis module, an anomaly detection module, and an alarm module. The log analysis module preprocesses the collected log data generated by the IoT terminal evaluation platform in four steps: template extraction, feature extraction, PCA dimensionality reduction, and clustering. Raw logs are numerous and unstructured, so they must be processed and converted into log templates, which sharply reduces the number of logs while preserving their basic semantics. A mapping table is built between the raw logs and the log templates obtained by the template extraction model of the log analysis module, and feature values that characterize events are extracted to form feature vectors. The logs under test are vectorized and then clustered.
Anomaly detection is then performed on the processed log data that is suitable for security analysis. The anomaly detection module is the core of this architecture. For the preprocessed data, it first builds an event library from a large number of normal logs, then computes the anomaly degree of the log under test relative to the normal log template library to decide whether it is an abnormal log. If it is abnormal, the module first determines whether it is a known abnormal event and, if so, its type; otherwise the log under test is added to the anomaly template library as a new type of abnormal attack. The alarm module mainly performs statistical analysis of the log events remaining after anomaly analysis and displays the results more intuitively in charts.
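The decision flow of the anomaly detection module can be illustrated with a minimal Python sketch. The source does not define how the anomaly degree is computed, so the token-set (Jaccard) similarity, both thresholds, the `unknown_N` labels, and the function names below are assumptions made purely for illustration.

```python
def jaccard(a, b):
    """Token-set similarity between two templates (an illustrative choice)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def classify(template, normal_lib, anomaly_lib,
             anomaly_threshold=0.2, match_threshold=0.8):
    """Return ('normal', None), ('known', label) or ('new', label).

    normal_lib  : list of templates built from normal logs
    anomaly_lib : dict mapping an anomaly label to its template
    Both thresholds are hypothetical tuning parameters.
    """
    # Anomaly degree: distance to the closest normal template.
    best_normal = max((jaccard(template, t) for t in normal_lib), default=0.0)
    if 1.0 - best_normal < anomaly_threshold:
        return ("normal", None)

    # Abnormal: check whether it matches a known anomaly type.
    best_label, best_sim = None, 0.0
    for label, known in anomaly_lib.items():
        sim = jaccard(template, known)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim >= match_threshold:
        return ("known", best_label)

    # Unknown: register it as a new anomaly type.
    new_label = f"unknown_{len(anomaly_lib) + 1}"
    anomaly_lib[new_label] = template
    return ("new", new_label)
```

A log that is close to a normal template is passed through, one that matches a stored anomaly template is reported with its type, and anything else extends the anomaly library.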
The automatic testing method of the system specifically includes the following steps:
S1: Extraction of a template tree based on association relationships: perform preliminary filtering and cleaning of the raw logs using preset rules; split the cleaned logs on spaces and distinguish parameter words from template words; extract a common log template library and update it continuously. Specifically:
1) Filtering, splitting, and saving the logs: the logs generated by the IoT terminal evaluation platform record events that occur in the system, such as user access, DoS/DDoS attacks, and login failures. A log entry contains a timestamp, host, process ID, and event description. The log format depends on the service type or vendor and varies widely, so syntax and semantics differ between logs, and a log format may be updated at any time.
The present invention proposes an online template extraction method that can update the log template library dynamically even when log templates change. Although log formats vary, they contain fixed short texts, and the contents of the Message field are strongly correlated. The ssh-related log entries extracted as an example are shown in Table 1 below.
Table 1. ssh log information
Table 1 shows that every ssh-related log entry contains the string "ssh". The content of Message describes the network event, stating whether a login succeeded and, if not, why it failed. Several subtypes can be extracted from Message; that is, the log as a whole is hierarchical, and the combination of subtypes represents this class of event.
Log text contains template words and parameter words, and the template words make up the template. To extract a template, the constantly changing parameter words can be replaced with the wildcard "*", as shown in Table 2 below. The log text is split, and the variable parameters are filtered out.
Table 2. Filtering of raw logs
The words in a log are separated by spaces, so the log can be split on spaces; the resulting words include parameter words and template words. Logs of the same template differ only in their parameter words, and parameter words occupy the same positions in log messages of the same length. The split log information is therefore stored in an array, and the array index can be used to distinguish parameter words from template words accurately.
2) Identifying template words: template words are also text; they differ from parameter words in having a higher probability of occurrence. Each word is scored by the probability that it appears at a particular position of the log text, and that probability is its score. The scoring formula is:
$$\mathrm{Score}(word, p, len) = P(word \mid p, len)$$
where P is the probability that the word word appears at position p of a message of length len, and this probability is taken as its score. It follows from the definition that template words have a higher probability than parameter words.
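The splitting and scoring steps can be sketched in Python as follows. This is a minimal illustration that estimates P(word | p, len) by simple counting; the function names, the sample messages, and the 0.6 cut-off are assumptions and are not taken from the source.

```python
from collections import Counter


def template_word_scores(messages):
    """Estimate Score(word, p, len) = P(word | p, len) by counting."""
    word_counts = Counter()  # (word, p, len) -> occurrences
    slot_counts = Counter()  # (p, len) -> number of messages with that slot
    for msg in messages:
        tokens = msg.split()          # split the log on whitespace
        n = len(tokens)
        for p, word in enumerate(tokens):
            word_counts[(word, p, n)] += 1
            slot_counts[(p, n)] += 1
    return {key: count / slot_counts[(key[1], key[2])]
            for key, count in word_counts.items()}


def extract_template(message, scores, threshold=0.6):
    """Keep high-scoring words; replace the rest with the wildcard '*'."""
    tokens = message.split()
    n = len(tokens)
    return " ".join(
        word if scores.get((word, p, n), 0.0) >= threshold else "*"
        for p, word in enumerate(tokens)
    )


logs = [
    "sshd Failed password for root from 10.0.0.1",
    "sshd Failed password for admin from 10.0.0.2",
    "sshd Failed password for guest from 10.0.0.3",
]
scores = template_word_scores(logs)
template = extract_template(logs[0], scores)
```

In this toy input the user name and address each occur once in three messages, so they score 1/3 and are masked, while the fixed words score 1.0 and are kept.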
3) Constructing the template tree based on association relationships: as shown in Figure 2, a log template tree is built using one type of log as an example. The system host name and the log type serve as parent nodes, and frequently occurring words are then added in turn as child nodes. The process moves to each child node and repeats until all content has been added to the log tree. This operation presents the logs in a clearer form according to the relationships inherent in the log information.
During extraction, a log template may start to branch at some node. If there are too many branches, the branches acting as child nodes are deleted, and the template ends at the parent of those nodes. For example, for "web1 sshd Received disconnect from *" in Figure 2, the two templates are merged into one, and each remaining template word appears only once in the template, which reduces matching time while still extracting the template accurately.
Starting from the root node, record the word frequency of the root node, then record the word frequency of the child nodes under each node. If, among the child nodes in the same layer under the same parent node, a word has the same frequency as its parent or grandparent node, the log template in that direction ends there, and that layer and its child nodes are deleted. With this method, the logs in Figure 2 can be converted into the log template shown in Figure 3; the template structure of the example logs is shown in Table 3.
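A simplified Python sketch of the tree construction is given below. It records a word frequency at every node and implements only the branch-count rule (a node with too many children is cut and replaced by "*"); the word-frequency comparison against parent and grandparent nodes is not reproduced here. The dictionary layout, the `max_children` limit, and the sample logs are assumptions.

```python
def build_tree(token_lists):
    """Build a prefix tree; each node stores how many logs pass through it."""
    root = {"count": 0, "children": {}}
    for tokens in token_lists:
        node = root
        node["count"] += 1
        for tok in tokens:
            node = node["children"].setdefault(
                tok, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def templates(node, prefix=(), max_children=2):
    """Walk the tree; cut a branch when a node has too many children."""
    children = node["children"]
    if not children:
        return [" ".join(prefix)]
    if len(children) > max_children:
        # Too many branches: treat them as parameter words.
        return [" ".join(prefix + ("*",))]
    result = []
    for tok, child in children.items():
        result.extend(templates(child, prefix + (tok,), max_children))
    return result


raw = [
    "web1 sshd Received disconnect from 10.0.0.1",
    "web1 sshd Received disconnect from 10.0.0.2",
    "web1 sshd Received disconnect from 10.0.0.3",
    "web1 sshd session opened",
]
tree = build_tree([line.split() for line in raw])
found = templates(tree)
```

Here the three distinct addresses under "from" exceed the limit of two children, so that level collapses into a single wildcard.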
Table 3. Log templates
4) Autonomously updating the template library: a newly arriving log is processed and then compared with the template trees in the existing template library. Because template words were selected earlier against a threshold, the later comparison likewise needs a maximum similarity value that expresses how close the newly arrived template is to the template library, as follows:
$$\mathrm{Logs}(N, M) = N_x / M_x$$
where x denotes the x-th class of template, N_x denotes the newly added log template tree, and M_x denotes the existing log template library. If the highest value of Logs is greater than or equal to the threshold, the input log is assigned to class x. Otherwise, a new template library is created for x.
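The update rule can be sketched in Python under one possible reading of the ratio N_x / M_x: the number of positions at which the new log agrees with the class-x template, divided by the length of that template. This reading, the 0.7 threshold, and the function names are assumptions; the source does not spell out how the ratio is computed.

```python
def similarity(new_tokens, lib_tokens):
    """Fraction of template positions matched by the new log.

    Templates of different length are treated as non-matching, and a '*'
    in the library template matches any word (both are assumptions).
    """
    if not lib_tokens or len(new_tokens) != len(lib_tokens):
        return 0.0
    matched = sum(1 for a, b in zip(new_tokens, lib_tokens)
                  if b == "*" or a == b)
    return matched / len(lib_tokens)


def update_library(library, new_log, threshold=0.7):
    """Return the class index of new_log, extending library if needed."""
    new_tokens = new_log.split()
    best_idx, best = None, 0.0
    for idx, tmpl in enumerate(library):
        score = similarity(new_tokens, tmpl.split())
        if score > best:
            best_idx, best = idx, score
    if best_idx is not None and best >= threshold:
        return best_idx            # assign to the existing class x
    library.append(new_log)        # otherwise open a new class
    return len(library) - 1
```

Calling `update_library` repeatedly on incoming logs keeps the library growing only when no existing template is close enough.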
S2: Anomaly detection and analysis: for the preprocessed data, first build an event library from a large number of normal logs, then compute the anomaly degree of the log under test relative to the normal log template library to decide whether it is an abnormal log; if it is abnormal, first determine whether it is a known abnormal event and, if so, its type; otherwise add the log under test to the anomaly template library as a new type of abnormal attack.
First, for the extracted templates, feature values are extracted from the logs according to an association analysis method; then the K-Means algorithm is used for clustering; finally, an improved LSTM model performs anomaly detection, and the log events remaining after anomaly analysis are analyzed statistically and displayed more intuitively in charts. Specifically:
1) Selecting the time window and creating feature values: the IoT terminal evaluation platform generates a large amount of data in a short time. Once the time window for extracting log blocks is determined, the different kinds of logs generated by events on the platform are sorted by their timestamps within the time window, and logs with the same timestamp are merged. Features that characterize an event from several aspects, such as log entries, periodicity, average occurrence time, and frequency, are computed and used as a vector to represent the event.
Log entries: an event produces a varying number of log entries on the platform, in different locations and formats. One event may produce a single log line, while another produces a log block of multiple lines in different locations and formats.
Periodicity: some jobs that require polling generate periodic logs, for example operations that repeatedly poll a database. These operations are usually unrelated to system failures, but the various granularities within a given period can be taken into account.
Average occurrence time: an event has its own occurrence time and the log entries the platform generates for it. The average time taken for each log entry to occur is used as a marker of the event.
Frequency: each log message appears with a different periodicity, and judgments are made in combination with a statistics-based association rule method. If an error event occurs once, it can be attributed to a mistake, but if the same error event occurs more than 10 times, the possibility of an abnormal event in the system, such as a DoS/DDoS attack or an ssh brute-force attack, should be considered.
Message level: log messages carry a level that indicates the severity of the event. From low to high, the levels are debug, info, warn, error, and fatal; debug and info are usually the most common in log messages.
These features, which characterize an event from several aspects, are computed and used as a vector to represent the event.
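The feature construction can be sketched in Python as follows. It computes four of the features listed above for a single time window: entry count, average time between entries, frequency, and highest message level. Periodicity is omitted for brevity. The `LogEntry` fields, the numeric level encoding, and the feature order are assumptions for illustration.

```python
from dataclasses import dataclass

# Numeric encoding of the levels, from low to high (an assumption).
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3, "fatal": 4}


@dataclass
class LogEntry:
    timestamp: float  # seconds
    level: str
    message: str


def window_features(entries, start, length):
    """Return [count, mean gap, frequency, max level] for one window."""
    # Select the entries inside [start, start + length), sorted by time.
    window = sorted(
        (e for e in entries if start <= e.timestamp < start + length),
        key=lambda e: e.timestamp,
    )
    count = len(window)
    if count == 0:
        return [0, 0.0, 0.0, 0]
    gaps = [b.timestamp - a.timestamp for a, b in zip(window, window[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    frequency = count / length          # entries per second
    max_level = max(LEVELS[e.level] for e in window)
    return [count, mean_gap, frequency, max_level]
```

Stacking the vectors of successive windows yields the log matrix that the subsequent dimensionality reduction and clustering steps operate on.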
2) Generating the template library by clustering: the anomaly detection method uses the log matrix vectorized after feature extraction as experimental data and applies principal component analysis (PCA) to reduce its dimensionality. Dimensionality reduction plays an important role in preprocessing: after the relevant feature attributes are extracted, PCA compresses the data set and reduces computation time.
K-Means is a clustering algorithm from unsupervised learning. It iterates repeatedly, assigning each sample point to the cluster whose center (the mean of its members) is nearest. The automatic testing method is required to learn actively, so it re-clusters when a new attack type is detected. Because K-Means requires the value of k to be specified in advance, k is increased by 1 when re-clustering, which accommodates the additional event type. During learning and training, the K-Means procedure is adapted as needed: if an attack type does not belong to any type already in the template library, the newly appearing attack type is labeled so that it extends the attack template library. As shown in Figure 4, because k must be stated explicitly at the input of the algorithm, the template library must be updated whenever the log data at the input is found during training to represent a new attack event. The iteration is then run again by calling K-Means once more with k increased by 1, and the newly appearing event is returned to the learning and training stage. This yields the generation process of the template library.
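The re-clustering with k increased by 1 can be sketched in plain Python as follows. The PCA step is omitted, and the rule for deciding that a point is a new event (distance to every center greater than `radius`) and the choice to seed the extra center at the new point are assumptions; the source only states that k is incremented and K-Means is called again.

```python
import math


def kmeans(points, init_centers, iters=50):
    """Plain K-Means starting from the given initial centers."""
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def add_point(points, centers, new_point, radius):
    """Re-cluster after a new sample; use k + 1 if it fits no cluster."""
    points = points + [new_point]
    nearest = min(math.dist(new_point, c) for c in centers)
    if nearest > radius:
        # New event type: k grows by 1, seeded at the new point.
        return kmeans(points, centers + [list(new_point)])
    return kmeans(points, centers)
```

Seeding the additional center at the new sample keeps the sketch deterministic; a production system might instead rerun K-Means with a standard initialization.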
3)基于LSTM的异常检测模型:如图5所示,经过聚类后得到模板库,使用改进的LSTM模型执行异常检测,在普通LSTM网络结构的基础上增加了Embedding层。输入层的维度为n,表示参数向量的维度,输出层的维度同样为n,隐藏层层数为l,用于记忆和储存过去状态的节点个数。每层LSTM单元个数为α,LSTM模型的时间步为h,表示使用的历史日志消息数量。3) LSTM-based anomaly detection model: As shown in Figure 5, the template library is obtained after clustering, and the improved LSTM model is used to perform anomaly detection, and an Embedding layer is added on the basis of the ordinary LSTM network structure. The dimension of the input layer is n, which represents the dimension of the parameter vector, the dimension of the output layer is also n, and the number of hidden layers is l, which is the number of nodes used to memorize and store past states. The number of LSTM units in each layer is α, and the time step of the LSTM model is h, which represents the number of historical log messages used.
The mean squared error (MSE) is used as the loss function to describe the difference between the predicted value and the true value. MSE is generally used to measure the difference between an estimated parameter value and its true value. For two vectors x and y of the same length, the MSE is computed as:

$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2$$

where N is the dimension of the vectors.
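As a small worked example of the loss described above, the following sketch computes the MSE of two equal-length vectors in plain Python. The function name `mse` and the sample values are illustrative only.

```python
def mse(x, y):
    """Mean squared error between two vectors of the same length N."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        raise ValueError("vectors must not be empty")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    predicted = [0.0, 0.0]
    observed = [3.0, 4.0]
    # (3^2 + 4^2) / 2 = 12.5
    print(mse(predicted, observed))
```

A larger value means the predicted parameter vector is further from the observed one.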
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their spirit and scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.
| Application Number | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN202010916739.2A | CN112039907A (en) | 2020-09-03 | 2020-09-03 | Automatic testing method and system based on Internet of things terminal evaluation platform |
| Publication Number | Publication Date |
|---|---|
| CN112039907A (en) | 2020-12-04 |
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 2020-12-04 |