CN116467438A - Threat information attribution method based on graph attention mechanism - Google Patents

Threat information attribution method based on graph attention mechanism
Info

Publication number: CN116467438A
Authority: CN (China)
Prior art keywords: threat, threat intelligence, information, graph, report
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202211459436.8A
Other languages: Chinese (zh)
Inventor: 严寒冰
Current Assignee: Beihang University; National Computer Network and Information Security Management Center (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Beihang University; National Computer Network and Information Security Management Center
Application filed by: Beihang University; National Computer Network and Information Security Management Center

Abstract

The invention discloses a threat intelligence attribution method based on a graph attention mechanism. The method comprises the following steps: 1. compile an alias list of APT attack groups; 2. based on the alias list, write crawler code and, together with open-source threat intelligence repositories, collect public threat intelligence reports; 3. model APT attack groups bottom-up and design the structure of the threat intelligence knowledge graph; 4. convert the threat intelligence reports into a unified format and store them, perform information extraction and expansion on the unstructured threat intelligence, and store the result in a graph database; 5. initialize feature vectors for the nodes; 6. map the heterogeneous network to a homogeneous network and train a multi-class model based on the graph attention mechanism to classify threat intelligence report nodes; 7. use the classification model to classify the reports under test. Through these steps, the invention identifies the category of a threat intelligence report with relatively high accuracy, provides a basis for expert analysis, and reduces the workload of security analysts.

Description

A Threat Intelligence Attribution Method Based on a Graph Attention Mechanism

Technical Field

The invention belongs to the technical field of network security, and in particular relates to a threat intelligence attribution method based on a graph attention mechanism.

Background

The cyberspace security situation is growing more severe, and new attacks by Advanced Persistent Threats (APT) are increasingly complex. Traditional APT group identification relies on the characteristics of malicious samples, but samples and attack tools that are in wide circulation make it difficult to trace an attack back to a specific APT group. Threat intelligence is rich in content, highly accurate, and amenable to automated processing, so applying it to APT attack attribution has become an effective approach.

Because APT attacks are complex and stealthy, more attention goes to human-readable threat intelligence, that is, unstructured APT group analysis reports that carry more context, background information, and attack detail. Such intelligence contains a large amount of manual analysis; organizing it by attack group reduces the workload of network security analysts and greatly benefits subsequent threat intelligence analysis. The main problems at present are that threat intelligence data is scattered, intelligence-sharing technology is limited, and different security vendors name the same APT group differently, so aliases for APT groups keep multiplying, which makes attack attribution very challenging.

Current network security platforms mainly provide simple technical threat intelligence; for analysis they offer only simple correlations and lack deeper analysis. Across research in various countries, the security field lacks a method for mapping a threat intelligence analysis report to the attack group it belongs to.

In view of these defects in the prior art, the present invention was created, with practical value, after continuous research and design and repeated trials and improvements.

Summary of the Invention

The main purpose of the invention is to overcome the defects of the prior art by providing a new threat intelligence attribution method based on a graph attention mechanism. The technical problem to be solved is to collect and analyze real threat intelligence data, build a threat intelligence knowledge graph bottom-up, and classify unstructured threat intelligence by group using the graph attention mechanism, yielding an effective threat intelligence classification engine suitable for practical use.

Another purpose of the invention is to provide a method that maps a threat intelligence analysis report to the attack group it belongs to, which enables deeper analysis.

A further purpose of the invention is to learn, through learning methods over dimensions such as structure and attributes, vectorized representations of the key elements of the knowledge graph, which can be used for node classification, clustering, knowledge reasoning, and similar techniques.

Yet another purpose of the invention is to maintain an APT group alias library and, by classifying threat intelligence reports automatically, keep the alias list up to date, thereby enabling real-time monitoring of unknown attack groups; this makes the method more practical and gives it industrial value.

The purposes of the invention and the solution to its technical problems are achieved by the following technical scheme. The proposed threat intelligence attribution method based on a graph attention mechanism comprises the following steps:

Step 101: With reference to the internationally common naming conventions for APT groups, combined with the Chinese names given to APT groups by security vendors in various countries, compile an alias list of APT attack groups;

Step 102: Based on the alias list from step 101, write crawler code to collect public threat intelligence reports from security vendors in various countries and, together with open-source threat intelligence repositories, extract the reports on the APT groups of interest;

Step 103: With reference to the threat intelligence ontology designs of major security vendors at home and abroad, combined with experience analyzing real data, model APT attack groups and design the structure of the threat intelligence knowledge graph;

Step 104: Convert the threat intelligence reports obtained in step 102 into a unified format and store them; following the knowledge graph structure from step 103, perform information extraction and expansion on the unstructured threat intelligence, and store the resulting entities, relationships, and attributes in a graph database;

Step 105: Step 104 yields a heterogeneous threat intelligence network graph; on this basis, initialize the feature vectors so as to retain as much attribute information of the different nodes as possible and make their vector representations meaningful;

Step 106: Map the heterogeneous network graph to a homogeneous graph, and train a node classification model based on a graph attention network (GAT) with a cross-entropy loss function and gradient descent;

Step 107: Use the classification model from step 106 to classify the reports under test.

Through the above steps, the invention realizes a threat intelligence attribution method based on a graph attention mechanism that identifies the category of a threat intelligence report with relatively high accuracy. It fills a gap in existing research, which lacks a method for mapping threat intelligence analysis reports to the attack groups they belong to, provides a basis for expert analysis, and reduces the workload of security analysts.

Further, the threat intelligence reports described in step 102 are selected so that the data is, to a certain extent, timely and valid; to avoid ambiguous attribution, reports without a clear attribution are filtered out.

Further, the APT attack group modeling described in step 103 means mining the APT group analysis reports collected in step 102 in depth and designing threat intelligence entities that characterize an APT group from multiple angles. The entities fall into two categories of intelligence and nine entity types:

Tactical intelligence: Techniques, Tactics;

Technical intelligence: IP, Domain (domain name), Malware (malicious code), URL (uniform resource locator), CVE (vulnerability identifier), Register (registry), FilePath (host file path).

The knowledge graph structure design described in step 103 draws on major threat intelligence platforms at home and abroad and on real attack scenarios: important attributes of the threat entities are mined and their relationships analyzed, resulting in a practical threat intelligence knowledge graph.

Further, the "unified format conversion and storage" described in step 104 converts the various collected report formats uniformly into text for subsequent information extraction, and stores the paths of the original file and the converted file separately.

Further, the "feature vector initialization" described in step 105 aims to let threat intelligence entities retain their semantic features as far as possible. In a threat intelligence knowledge graph, however, entity nodes often carry no inherent meaning and are typically combinations of digits and punctuation, so their vectors are usually initialized randomly, which greatly reduces their representational power. Fusing entity attribute information strengthens the representation, as follows:

Step 105-1: After step 104, the five entity types IP, Domain, Malware, Techniques, and Tactics have rich attributes; word2vec is used to generate vectors for the entity attributes, which are then averaged to obtain the entity vector;

Step 105-2: For the four entity types URL, CVE, Register, and FilePath, random vectors are generated according to a one-hot encoding scheme;

Step 105-3: The vector of a Report entity is the sum of the vectors of the entities directly connected to it.
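The three initialization rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute vectors are random stand-ins for word2vec output, step 105-2 is approximated by a plain random vector (the source mentions one-hot encoding without giving details), the dimension is reduced from the embodiment's 128 to 8, and all node names are hypothetical.

```python
import random

DIM = 8  # the embodiment uses 128 dimensions; reduced here for readability

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (step 105-1)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def random_vector(dim, rng):
    """Random initial vector for entities without rich attributes (step 105-2)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def sum_vectors(vectors):
    """Element-wise sum, used for Report nodes (step 105-3)."""
    return [sum(col) for col in zip(*vectors)]

rng = random.Random(0)

# Hypothetical stand-in for word2vec output: one vector per attribute value.
attribute_vectors = {
    "ip:1": [random_vector(DIM, rng) for _ in range(3)],
    "malware:1": [random_vector(DIM, rng) for _ in range(2)],
}

# Attribute-rich entities: average of their attribute vectors.
entity_vec = {name: mean_vector(vs) for name, vs in attribute_vectors.items()}
# Attribute-poor entity (for example a CVE): random vector.
entity_vec["cve:1"] = random_vector(DIM, rng)

# A report node is the sum of the entities directly connected to it.
report_neighbors = {"report:1": ["ip:1", "malware:1", "cve:1"]}
report_vec = {
    report: sum_vectors([entity_vec[e] for e in entities])
    for report, entities in report_neighbors.items()
}
```

Summing rather than averaging means that reports linked to more entities get larger vectors; whether any normalization is applied afterwards is not stated in the source.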

Further, "mapping the heterogeneous graph to a homogeneous graph" in step 106 means establishing neighbor relationships between different threat intelligence reports through the threat elements they share, ultimately producing a homogeneous information network that contains only threat intelligence report nodes.
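One way to read this projection is sketched below: two reports become neighbors when they are linked to at least one common entity. The report identifiers and entity strings are hypothetical, and the source does not say whether edges are weighted by the number of shared entities, so this sketch keeps them unweighted.

```python
from collections import defaultdict
from itertools import combinations

def project_reports(report_entities):
    """Link two reports when they share at least one threat entity."""
    entity_to_reports = defaultdict(set)
    for report, entities in report_entities.items():
        for entity in entities:
            entity_to_reports[entity].add(report)

    # Start with every report present, even if it ends up isolated.
    neighbors = {report: set() for report in report_entities}
    for reports in entity_to_reports.values():
        for a, b in combinations(sorted(reports), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

# Hypothetical report-to-entity links taken from the heterogeneous graph.
report_entities = {
    "r1": {"ip:203.0.113.5", "malware:abc"},
    "r2": {"malware:abc", "cve:CVE-2017-0199"},
    "r3": {"domain:example.org"},
}
graph = project_reports(report_entities)
```

Here `r1` and `r2` become neighbors through the shared malware node, while `r3` stays isolated.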

In step 106, "based on the graph attention mechanism" means training a homogeneous graph attention model on the resulting homogeneous network of threat intelligence reports and, once node vector representations are obtained, classifying the nodes by training against a cross-entropy loss function.
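For orientation, the forward pass of a single-head graph attention layer followed by a cross-entropy loss can be sketched in plain Python as below. This follows the standard GAT formulation (LeakyReLU-scored attention, softmax over each node's neighborhood including itself); the patent does not give layer counts, head counts, or hyperparameters, so the shapes, the random weights, and the toy features and labels are all assumptions. The gradient-descent update itself is omitted.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gat_layer(features, neighbors, W, a):
    """Single-head graph attention forward pass.

    features: node -> input vector; neighbors: node -> set of neighbor nodes
    (without self-loops). W has shape (out_dim, in_dim); a has length 2 * out_dim.
    """
    z = {n: matvec(W, h) for n, h in features.items()}
    out, attention = {}, {}
    for i in features:
        hood = [i] + sorted(neighbors[i])  # self-loop plus neighbors
        scores = [
            leaky_relu(sum(w * v for w, v in zip(a, z[i] + z[j])))
            for j in hood
        ]
        alpha = softmax(scores)
        attention[i] = dict(zip(hood, alpha))
        out[i] = [
            sum(al * z[j][k] for al, j in zip(alpha, hood))
            for k in range(len(W))
        ]
    return out, attention

def cross_entropy(logits, label):
    """Negative log-likelihood of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[label])

rng = random.Random(0)
# Toy report features and neighbor sets (hypothetical values).
features = {"r1": [1.0, 0.0, 0.5], "r2": [0.9, 0.1, 0.4], "r3": [0.0, 1.0, 0.2]}
neighbors = {"r1": {"r2"}, "r2": {"r1"}, "r3": set()}
num_classes = 2
W = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(num_classes)]
a = [rng.uniform(-0.5, 0.5) for _ in range(2 * num_classes)]

logits, attention = gat_layer(features, neighbors, W, a)
labels = {"r1": 0, "r2": 0, "r3": 1}
loss = sum(cross_entropy(logits[n], labels[n]) for n in labels) / len(labels)
```

In practice this would be trained with an automatic-differentiation framework; the sketch only shows how attention weights over report neighbors feed the per-class scores that the cross-entropy loss compares with the known group labels.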

Further, the relationships and vector representations extracted from the reports in the test set are fed into the model trained in step 106, which indicates the organization category each sample under test belongs to.

Further, the specific steps of the "information extraction and expansion" described in step 104 are as follows:

Step 104-1: Design regular expressions to extract threat intelligence entities that follow regular patterns;

Step 104-2: Use an advanced technique-analysis tool developed by a well-known foreign security company to map sentences to TTP entries in ATT&CK, and extract the entities whose confidence is 100%;

Step 104-3: Crawl data from well-known foreign vendors to enrich the attributes of the extracted entities and establish relationships between them, and bring other intelligence data directly linked to the threat intelligence entities into the knowledge graph, thereby expanding the information;

Step 104-4: Further apply static feature analysis to uncover homology (shared-origin) relationships between malicious code samples.
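Step 104-1 can be illustrated with a small regular-expression extractor. The patterns below are simplified assumptions written for this sketch (they do not validate octet ranges or handle defanged indicators such as `hxxp` or `[.]`), and they cover only four of the nine entity types; the patent does not disclose its actual expressions.

```python
import re

# Deliberately loose, illustrative patterns; not the patent's own expressions.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "URL": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "Domain": re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
}

def extract_entities(text):
    """Return the sorted, de-duplicated matches for each entity type."""
    return {
        kind: sorted(set(pattern.findall(text)))
        for kind, pattern in PATTERNS.items()
    }

sample = (
    "The dropper exploits CVE-2017-0199 and beacons to "
    "http://update.example.org/gate at 203.0.113.5."
)
entities = extract_entities(sample)
```

A loose domain pattern like this one also matches file names such as `a.php`, which is one reason a small amount of manual checking remains useful after automated extraction.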

By virtue of the above technical scheme, the invention has at least the following advantages:

1. The network environment itself has a typical graph structure, and threat intelligence describes security entities in cyberspace and the complex associations between them. By combining threat intelligence with knowledge graph technology, the invention fuses scattered threat intelligence. Graph representation learning is an important method for knowledge graph reasoning: learning over dimensions such as structure and attributes yields vectorized representations of the key elements of the knowledge graph, which can be used for node classification, clustering, knowledge reasoning, and similar techniques.

2. The method collects a large amount of data, models real data, mines features and analyzes relationships for important entities, and builds a practical threat intelligence knowledge graph. On that basis it trains a graph neural network to classify threat intelligence analysis reports. It identifies report categories with relatively high accuracy, provides a basis for expert analysis, and shortens the attribution cycle.

3. The invention realizes a threat intelligence attribution method based on a graph attention mechanism that identifies report categories with relatively high accuracy. It fills a gap in existing research, which lacks a method for mapping threat intelligence analysis reports to the attack groups they belong to, provides a basis for expert analysis, and reduces the workload of security analysts.

4. The invention models APT attack groups from multiple angles, extracts the nine types of regular-pattern entities by regular-expression matching, and uses the TRAM tool to map natural language to TTP entries in ATT&CK. This avoids the errors introduced by training a neural network for entity recognition and, combined with a small amount of manual checking, keeps entity extraction as accurate as possible.

5. The invention brings in third-party databases to mine relationships and enrich attributes for entities and to expand the information further, which gives the resulting threat intelligence knowledge base a degree of robustness. It also performs advanced static feature similarity analysis on malicious code to establish links between the attack tools of APT groups, which greatly strengthens the associations within an APT group's activity and contributes to the subsequent classification of threat intelligence reports.

The specific method of the invention is described in detail in the following embodiments and accompanying drawings.

Description of the Drawings

Fig. 1 is a schematic diagram of the system implementing the method of the invention.

The invention aims to judge the category of unknown threat intelligence on the basis of known threat intelligence. The system mainly comprises an unstructured threat intelligence data upload module, an unstructured threat intelligence information extraction and feature expansion module, a threat intelligence knowledge graph storage module, and a threat intelligence classification module.

Fig. 2 is a structural diagram of the threat intelligence knowledge graph constructed by the invention.

The structure is aimed at characterizing APT attack groups. The attributes and associations of the extracted threat intelligence entities were analyzed and designed, resulting in 10 entity types and 15 directed edges.

Detailed Description

To further explain the technical means adopted by the invention to achieve its intended purpose, and their effects, the specific implementation steps of the proposed threat intelligence attribution method based on a graph attention mechanism are described below with reference to the accompanying drawings and preferred embodiments:

Step 101: With reference to the internationally common naming conventions for APT groups, combined with the Chinese names given to APT groups by security vendors in various countries, compile an alias list of APT attack groups;

Step 102: Based on the alias list from step 101, write crawler code to collect public threat intelligence reports from security vendors in various countries and, together with open-source threat intelligence repositories, extract the reports on the APT groups of interest;

Step 103: With reference to the threat intelligence ontology designs of major security vendors at home and abroad, combined with experience analyzing real data, model APT attack groups and design the structure of the threat intelligence knowledge graph;

Step 104: Convert the threat intelligence reports obtained in step 102 into a unified format and store them; following the knowledge graph structure from step 103, perform information extraction and expansion on the unstructured threat intelligence, and store the resulting entities, relationships, and attributes in a graph database. The specific steps of the "information extraction and expansion" are as follows:

Step 104-1: Design regular expressions to extract threat intelligence entities that follow regular patterns;

Step 104-2: Use an advanced technique-analysis tool developed by a well-known foreign security company to map sentences to TTP entries in ATT&CK, and extract the entities whose confidence is 100%;

Step 104-3: Crawl data from well-known foreign vendors to enrich the attributes of the extracted entities and establish relationships between them, and bring other intelligence data directly linked to the threat intelligence entities into the knowledge graph, thereby expanding the information;

Step 104-4: Further apply static feature analysis to uncover homology (shared-origin) relationships between malicious code samples.

Step 105: Step 104 yields a heterogeneous threat intelligence network graph; on this basis, initialize the feature vectors so as to retain as much attribute information of the different nodes as possible and make their vector representations meaningful;

Step 106: Map the heterogeneous network graph to a homogeneous graph, and train a node classification model based on a graph attention network (GAT) with a cross-entropy loss function and gradient descent;

Step 107: Use the classification model from step 106 to classify the reports under test.

Through the above steps, the invention realizes a threat intelligence attribution method based on a graph attention mechanism that identifies the category of a threat intelligence report with relatively high accuracy. It fills a gap in existing research, which lacks a method for mapping threat intelligence analysis reports to the attack groups they belong to, provides a basis for expert analysis, and reduces the workload of security analysts.

Specifically, the threat intelligence reports described in step 102 are selected so that the data is, to a certain extent, timely and valid; to avoid ambiguous attribution, reports without a clear attribution are filtered out.

Specifically, the APT attack group modeling described in step 103 means mining the APT group analysis reports collected in step 102 in depth and designing threat intelligence entities that characterize an APT group from multiple angles. The entities fall into two categories of intelligence and nine entity types:

Tactical intelligence: Techniques, Tactics;

Technical intelligence: IP, Domain (domain name), Malware (malicious code), URL (uniform resource locator), CVE (vulnerability identifier), Register (registry), FilePath (host file path).

The knowledge graph structure design described in step 103 draws on major threat intelligence platforms at home and abroad and on real attack scenarios: important attributes of the threat entities are mined and their relationships analyzed, resulting in a practical threat intelligence knowledge graph.

Specifically, the "unified format conversion and storage" described in step 104 converts the various collected report formats uniformly into text for subsequent information extraction, and stores the paths of the original file and the converted file separately.

Specifically, the "feature vector initialization" described in step 105 aims to let threat intelligence entities retain their semantic features as far as possible. In a threat intelligence knowledge graph, however, entity nodes often carry no inherent meaning and are typically combinations of digits and punctuation, so their vectors are usually initialized randomly, which greatly reduces their representational power. Fusing entity attribute information strengthens the representation, as follows:

Step 105-1: After step 104, the five entity types IP, Domain, Malware, Techniques, and Tactics have rich attributes; word2vec is used to generate vectors for the entity attributes, which are then averaged to obtain the entity vector;

Step 105-2: For the four entity types URL, CVE, Register, and FilePath, random vectors are generated according to a one-hot encoding scheme;

Step 105-3: The vector of a Report entity is the sum of the vectors of the entities directly connected to it.

Specifically, "mapping the heterogeneous graph to a homogeneous graph" in step 106 means establishing neighbor relationships between different threat intelligence reports through the threat elements they share, ultimately producing a homogeneous information network that contains only threat intelligence report nodes.

In step 106, "based on the graph attention mechanism" means training a homogeneous graph attention model on the resulting homogeneous network of threat intelligence reports and, once node vector representations are obtained, classifying the nodes by training against a cross-entropy loss function.

Specifically, the relationships and vector representations extracted from the reports in the test set are fed into the model trained in step 106, which indicates the organization category each sample under test belongs to.

Referring to Fig. 1 and Fig. 2, a threat intelligence attribution method based on a graph attention mechanism according to a preferred embodiment of the invention mainly comprises the following steps.

To make the purpose and technical scheme of the method clearer, the specific implementation is described in further detail below.

Step S101: With reference to the English names commonly used internationally for APT attack organizations, combined with the Chinese names used by domestic security vendors, compile an alias list covering 22 APT attack organizations.
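An alias list of this kind is typically used as a lookup from any known name to one canonical label. A minimal sketch, assuming a plain dictionary layout; the two groups shown are well-known public aliases chosen for illustration and are not taken from the patent's 22-organization list:

```python
# Illustrative entries only; the patent's actual list is not reproduced here.
ALIASES = {
    "APT28": ["Fancy Bear", "Sofacy", "Sednit"],
    "Lazarus Group": ["HIDDEN COBRA", "Lazarus"],
}

def build_alias_index(aliases):
    """Map every alias (and the canonical name itself) to the canonical name."""
    index = {}
    for canonical, names in aliases.items():
        for name in [canonical, *names]:
            index[name.casefold()] = canonical
    return index

ALIAS_INDEX = build_alias_index(ALIASES)

def canonical_name(name):
    """Return the canonical organization for a name, or None if unknown."""
    return ALIAS_INDEX.get(name.strip().casefold())
```

Case-folding the keys makes the lookup insensitive to the capitalization differences found across vendor reports.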

Step S102: Based on the alias list from step S101, design crawler code and collect threat intelligence reports from multiple sources. To keep the data reasonably timely and valid, the focus is on threat intelligence analysis reports produced after 2015. These reports have been manually filtered by experts, each has a clear and unique organizational attribution, and they contain rich IOC information.
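The selection criteria in this step can be approximated by an automatic pre-filter that runs before expert review. The sketch below is an assumption about how such a filter might look: it treats "after 2015" as including 2015, detects attribution by simple substring matching against a hypothetical alias table, and uses made-up report records.

```python
from datetime import date

# Hypothetical alias table; the real list covers 22 organizations.
ALIASES = {
    "APT28": ["APT28", "Fancy Bear", "Sofacy"],
    "Lazarus Group": ["Lazarus Group", "HIDDEN COBRA"],
}

def attributed_groups(text):
    """Return the set of organizations whose aliases appear in the text."""
    lowered = text.casefold()
    return {group for group, names in ALIASES.items()
            if any(name.casefold() in lowered for name in names)}

def keep_report(report, min_year=2015):
    """Keep reports that are recent enough and name exactly one organization."""
    groups = attributed_groups(report["text"])
    return report["published"].year >= min_year and len(groups) == 1

# Made-up records for illustration.
reports = [
    {"id": "r1", "published": date(2017, 5, 2), "text": "Sofacy campaign uses a new loader."},
    {"id": "r2", "published": date(2013, 1, 9), "text": "Fancy Bear phishing wave."},
    {"id": "r3", "published": date(2019, 3, 4), "text": "Report mentions both APT28 and HIDDEN COBRA."},
]
kept = [r["id"] for r in reports if keep_report(r)]
```

Here `r2` is dropped for its date and `r3` because it names two organizations; a filter like this narrows the candidate set but does not replace the manual expert filtering the step describes.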

Step S103: Drawing on experience in analyzing real data, model the APT attack organizations and characterize them from multiple angles, using 9 feature types grouped into two categories of intelligence. Technical intelligence includes IP, Domain, Malware, URL, CVE, Register, and FilePath; tactical intelligence includes Techniques and Tactics. Taking IP, Domain, and Malware as key nodes, analyze the relations among the three as well as the homology relations of malicious code, and design a practical threat intelligence knowledge graph structure, as shown in Fig. 1 and Fig. 2.
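The nine feature types can be encoded as a small typed graph. This is a sketch under assumptions: the `Report` node type, the `KnowledgeGraph` class, and the relation names `mentions` and `resolves_to` are hypothetical, since the actual schema is defined by Figs. 1 and 2, which are not reproduced in the text.

```python
from dataclasses import dataclass, field

# The nine feature types named in step S103.
TECHNICAL = {"IP", "Domain", "Malware", "URL", "CVE", "Register", "FilePath"}
TACTICAL = {"Techniques", "Tactics"}
# "Report" is an assumed extra type for the threat intelligence report nodes.
NODE_TYPES = TECHNICAL | TACTICAL | {"Report"}

@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source id, relation, target id)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.add((source, relation, target))

# Made-up example using documentation address ranges.
kg = KnowledgeGraph()
kg.add_node("report_1", "Report")
kg.add_node("example.com", "Domain")
kg.add_node("198.51.100.7", "IP")
kg.add_edge("report_1", "mentions", "example.com")
kg.add_edge("example.com", "resolves_to", "198.51.100.7")
```

Validating node types on insertion keeps the graph aligned with the fixed set of feature types when entities arrive from several extractors.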

Step S104: Combining steps S102 and S103, perform information extraction on the 1171 collected threat intelligence reports to obtain a threat knowledge graph with 12310 entities and 16678 edges.

Step S105: Using the entity attributes, generate a 128-dimensional initial vector for each entity node.
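The text does not say how the 128-dimensional vectors are derived from the attributes. One simple possibility, shown here purely as an assumption, is signed feature hashing of `key=value` attribute strings followed by L2 normalization; `initial_vector` is a hypothetical helper.

```python
import hashlib
import math

DIM = 128  # vector size stated in step S105

def initial_vector(attributes, dim=DIM):
    """Hash each key=value attribute into one signed slot of a dim-sized vector."""
    vec = [0.0] * dim
    for key, value in sorted(attributes.items()):
        digest = hashlib.sha256(f"{key}={value}".encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # slot for this attribute
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # sign reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # Leave an all-zero vector unchanged to avoid dividing by zero.
    return [x / norm for x in vec] if norm else vec

vector = initial_vector({"type": "Domain", "tld": "com", "length": 11})
```

Hashing is deterministic, so the same entity always receives the same starting vector; a learned or pretrained embedding would be an equally plausible choice.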

Step S106: Apply a homogeneous graph attention mechanism to the homogeneous information network of threat intelligence reports. The core of a graph neural network is the message passing mechanism, which learns the feature vector of a target node by aggregating the feature information of its neighbors. The graph attention network (GAT) first applies a linear transformation $W$ to the input feature vectors $H \in \mathbb{R}^{N \times F}$, where $N$ is the number of nodes and $F$ is the dimension of the node features, to obtain higher-level features. Different neighbor nodes differ in importance and therefore contribute differently to the feature representation of the target node. GAT distinguishes the importance of neighbor nodes by computing attention weights, and aggregates the neighbor feature representations scaled by those weights. The weights are computed as shown in formulas (1), (2), and (3):
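Formulas (1) and (2) are not reproduced in this text. A reconstruction consistent with the symbol definitions that follow, assuming the standard GAT formulation and writing $\mathcal{N}_i$ (a symbol introduced here) for the neighbors of node $i$, would be:

```latex
H_i' = \sigma\Big( \sum_{j \in \mathcal{N}_i} \alpha_{ij} \, H_j W \Big) \qquad (1)

\alpha_{ij} = \mathrm{softmax}_j(e_{ij})
            = \frac{\exp(e_{ij})}{\sum_{m \in \mathcal{N}_i} \exp(e_{im})} \qquad (2)
```

Here $e_{ij}$ is the unnormalized score given by formula (3), and $H_i'$ is the updated representation of node $i$.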

where $\alpha_{ij}$ denotes the normalized importance of node $j$ to node $i$, $W$ is the linear transformation, $H_j$ is the input initial vector, and $\sigma$ is a nonlinear activation function.

$e_{ij} = \mathrm{LeakyReLU}\left(\alpha^{T}[H_i W, H_j W]\right)$ (3)

where $e_{ij}$ denotes the importance of node $j$ to node $i$ before normalization. The vector $\alpha^{T}$ acts as a single-layer feed-forward neural network, and LeakyReLU provides the nonlinearity. Finally, softmax normalizes the scores over the neighbors of the central node to give the neighbor weights.
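The attention computation described above can be sketched in plain Python for a single head. This is an illustrative implementation under assumptions, not the patent's code: the activation $\sigma$ is taken to be `tanh`, the LeakyReLU slope is 0.2, each node's neighbor list includes the node itself, and `gat_layer`, `matvec`, and `leaky_relu` are hypothetical names.

```python
import math

def matvec(h, W):
    """Row vector h (length F) times matrix W (F x F_out)."""
    return [sum(h[f] * W[f][o] for f in range(len(h))) for o in range(len(W[0]))]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(H, W, a, neighbors, activation=math.tanh):
    """Single-head graph attention layer.

    H: list of N input vectors; W: F x F_out matrix; a: attention vector of
    length 2 * F_out (written alpha^T in formula (3)); neighbors: dict from
    node index to the list of neighbor indices.
    Returns (new node vectors, attention weights per node).
    """
    Z = [matvec(h, W) for h in H]  # H_i W for every node
    out, alphas = [], {}
    for i in range(len(H)):
        nbrs = neighbors[i]
        # e_ij = LeakyReLU(a . [H_i W, H_j W]); list "+" concatenates the pair.
        e = [leaky_relu(sum(w * x for w, x in zip(a, Z[i] + Z[j]))) for j in nbrs]
        # Softmax over the neighborhood, shifted by max(e) for stability.
        m = max(e)
        exp_e = [math.exp(x - m) for x in e]
        total = sum(exp_e)
        alpha = [x / total for x in exp_e]
        alphas[i] = dict(zip(nbrs, alpha))
        # Attention-weighted sum of transformed neighbor features, then sigma.
        agg = [sum(al * Z[j][o] for al, j in zip(alpha, nbrs))
               for o in range(len(Z[i]))]
        out.append([activation(x) for x in agg])
    return out, alphas

# Tiny made-up graph: 3 nodes, 2 features, identity transform.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
new_H, attn = gat_layer(H, W, [0.5, -0.5, 0.25, 0.25], neighbors)
```

With an all-zero attention vector every score is equal, so the layer reduces to plain mean aggregation over each neighborhood, which is a convenient sanity check.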

GAT can also use a multi-head attention mechanism to learn node feature representations from different aspects, yielding several different feature representations of the target node. The outputs of the heads are finally combined by averaging, as shown in formula (4), where $K$ is the number of heads:
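Formula (4) is not reproduced in this text. Under the standard GAT formulation with averaged heads, and using the same symbols as formula (3) with a superscript $k$ for the head index and $\mathcal{N}_i$ (a symbol introduced here) for the neighbors of node $i$, it would plausibly read:

```latex
H_i' = \sigma\Big( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i}
       \alpha_{ij}^{k} \, H_j W^{k} \Big) \qquad (4)
```

Each head $k$ has its own transformation $W^{k}$ and its own attention weights $\alpha_{ij}^{k}$; averaging keeps the output dimension equal to that of a single head.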

After the attention network layers have been trained to extract features, the model is trained with a cross-entropy loss function to classify the nodes.
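The classification objective can be made concrete with a short sketch. It assumes each report node ends up with one raw score (logit) per organization class; the function names are hypothetical and the numbers are made up.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def cross_entropy(logits_per_node, labels):
    """Mean negative log-probability of the true class over labeled nodes."""
    losses = [-math.log(softmax(logits)[label])
              for logits, label in zip(logits_per_node, labels)]
    return sum(losses) / len(losses)

def predict(logits_per_node):
    """Pick the highest-scoring class index for each node."""
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in logits_per_node]

# Two report nodes, three candidate organizations (made-up scores).
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
loss = cross_entropy(logits, labels=[0, 2])
predicted = predict(logits)
```

During training the loss is minimized with respect to the layer parameters; at test time only `predict` is needed.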

Step S107: Feed the samples to be tested into the classification model to identify whether each sample belongs to the corresponding attack organization.

The above is merely a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, this is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that result in equivalent embodiments. Any simple modification, equivalent change, or refinement made to the above embodiment in accordance with the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the present invention.

Claims (8)

CN202211459436.8A | 2022-11-21 | 2022-11-21 | Threat information attribution method based on graph attention mechanism | Pending | CN116467438A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211459436.8A, CN116467438A (en) | 2022-11-21 | 2022-11-21 | Threat information attribution method based on graph attention mechanism

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211459436.8A, CN116467438A (en) | 2022-11-21 | 2022-11-21 | Threat information attribution method based on graph attention mechanism

Publications (1)

Publication Number | Publication Date
CN116467438A (en) | 2023-07-21

Family

ID=87174122

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211459436.8A (Pending, CN116467438A (en)) | Threat information attribution method based on graph attention mechanism | 2022-11-21 | 2022-11-21

Country Status (1)

Country | Link
CN (1) | CN116467438A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116756327A (en) * | 2023-08-21 | 2023-09-15 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment
CN116992052A (en) * | 2023-09-27 | 2023-11-03 | 天际友盟(珠海)科技有限公司 | Long text abstracting method and device for threat information field and electronic equipment
CN118627499A (en) * | 2024-06-13 | 2024-09-10 | 中国人民解放军61660部队 | An analysis method based on hybrid threat intelligence data
CN118972110A (en) * | 2024-07-24 | 2024-11-15 | 东南大学 | A method and system for early identification and warning of APT attacks based on heterogeneous graphs

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116756327A (en) * | 2023-08-21 | 2023-09-15 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment
CN116756327B (en) * | 2023-08-21 | 2023-11-10 | 天际友盟(珠海)科技有限公司 | Threat information relation extraction method and device based on knowledge inference and electronic equipment
CN116992052A (en) * | 2023-09-27 | 2023-11-03 | 天际友盟(珠海)科技有限公司 | Long text abstracting method and device for threat information field and electronic equipment
CN116992052B (en) * | 2023-09-27 | 2023-12-19 | 天际友盟(珠海)科技有限公司 | Long text abstracting method and device for threat information field and electronic equipment
CN118627499A (en) * | 2024-06-13 | 2024-09-10 | 中国人民解放军61660部队 | An analysis method based on hybrid threat intelligence data
CN118627499B (en) * | 2024-06-13 | 2025-06-13 | 中国人民解放军61660部队 | An analysis method based on hybrid threat intelligence data
CN118972110A (en) * | 2024-07-24 | 2024-11-15 | 东南大学 | A method and system for early identification and warning of APT attacks based on heterogeneous graphs


Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
