Background
An advanced persistent threat (APT) is a complex network attack strategy. Unlike traditional attacks, an APT is a class of attack in which the attacker draws on a wide range of attack techniques and tools. The attacker first compromises a host or server in the target environment, then widens the attack through lateral movement while carrying out internal reconnaissance and data exfiltration. An APT conducts long-term, persistent penetration and attack against a specific target and constitutes a serious threat. APT attacks are dangerous not only because of their technical complexity and stealth, but also because they can last for months or even years without being noticed, which makes it difficult for traditional network security detection methods to provide good protection.
APT attack path reasoning has therefore been proposed as a cutting-edge security analysis technique. It is a method for tracing and understanding an APT attack across its whole course, from the initial reconnaissance stage to the final data exfiltration and the subsequent cleaning of traces. By analyzing network logs, endpoint activity, traffic patterns, and system behavior, the clues an attacker leaves in the target network are identified and connected into a logically coherent attack path. By identifying early signs of an APT attack, such as abnormal network access patterns or unauthorized system activity, APT attack path reasoning helps an organization raise early warnings and take defensive measures in time to keep the attacker from penetrating further into the network. It also helps the security team reconstruct the complete attack chain and understand how the attacker gradually gained control of the network and which weak points were exploited. This is crucial for repairing vulnerabilities and reinforcing defenses. In addition, the results of APT attack path reasoning can be converted into threat intelligence and shared with other organizations, raising the security protection level of the whole community.
Existing APT attack path reasoning techniques fall into three groups. Rule-based reasoning extracts rules from known APT attack cases and threat intelligence, performs reasoning on data that match the rule patterns, and finally adjusts the rules according to the evaluation results, adding new rules or deleting invalid ones to improve the accuracy and efficiency of the system. Methods based on deep learning and neural networks collect data from network traffic logs and system logs, perform preprocessing and feature engineering, divide the data into training, validation, and test sets, build a model to learn the characteristics of POI (Point of Interest) events and obtain predictions and classifications, and finally perform path reasoning using dynamic path analysis. Graph-based methods define the entities in the network as nodes and the interactions between entities as edges, add attributes to each node and edge, and reason about the attack path through graph analysis and anomaly detection.
However, existing APT attack path reasoning techniques still have limitations. First, they usually perform path reasoning based on predefined heuristic rules formulated from known attack patterns and behaviors. Static rules cannot cover novel attack patterns in time, and updating and optimizing the rules dynamically requires a large amount of additional labor. Second, they lack consideration of the overall tactical chain: heuristic-based APT attack path reasoning usually focuses on specific attack stages or behaviors rather than on the whole APT tactical chain. Third, real network logs and system logs are often incomplete, and rule-based APT attack path reasoning is prone to misjudgments and missed detections when data are missing or wrong. Finally, the boundary between the normal behavior of legitimate users and the malicious behavior of an attacker is sometimes blurred; when an APT attacker imitates normal activity, rule-based methods have difficulty telling the two apart.
Disclosure of Invention
The object of the invention is to provide an APT attack path reasoning method based on attack technique identification that achieves high reasoning accuracy and yields a more complete APT attack path.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
An APT attack path reasoning method based on attack technique identification, the method comprising:
constructing an attack technique knowledge base based on the ATT&CK knowledge base, wherein the attack technique knowledge base comprises a plurality of attack techniques;
acquiring a set of APT attack threat intelligence documents, matching the attack technique knowledge base against each APT attack threat intelligence document, and mining the corresponding attack technique sequence patterns to obtain an attack technique sequence pattern set;
collecting system kernel event data, extracting an alarm event list from the system kernel event data, constructing a traceability graph according to the alarm event list, and extracting a traceability subgraph from the traceability graph for each alarm event in the alarm event list, wherein the alarm event list is the system kernel event data for which the APT attack path is to be determined;
performing technique matching in the attack technique knowledge base based on the attributes of each node in the traceability subgraph, and setting the attack technique corresponding to each alarm event according to the technique matching result;
constructing an attack technique sequence pattern matching tree based on the attack technique sequence pattern set;
and performing pattern matching between the alarm events for which attack techniques have been set and the attack technique sequence pattern matching tree, and outputting the APT attack path after pattern matching.
Several alternatives are provided below. They do not impose additional limitations on the overall scheme described above but are further additions or preferences. Provided that there is no technical or logical contradiction, each alternative may be combined with the overall scheme individually, or several alternatives may be combined with one another.
Preferably, each attack technique in the attack technique knowledge base is a triple $t_i = (id_i, name_i, IoC_i)$, wherein $t_i$ is the $i$-th attack technique, $id_i$ is the number of the $i$-th attack technique, $name_i$ is the name of the $i$-th attack technique, and $IoC_i$ is the IoC list corresponding to the $i$-th attack technique.
Preferably, acquiring the set of APT attack threat intelligence documents, matching the attack technique knowledge base against each APT attack threat intelligence document, and mining the corresponding attack technique sequence patterns to obtain the attack technique sequence pattern set comprises:
identifying the attack techniques in an APT attack threat intelligence document, and extracting those identified attack techniques that also exist in the attack technique knowledge base;
sorting the extracted attack techniques by their order of appearance in the APT attack threat intelligence document to obtain an attack technique list;
obtaining an attack technique list set from all APT attack threat intelligence documents in the set;
and mining attack technique sequence patterns from the attack technique list set using a sequence pattern mining algorithm, and combining all attack technique sequence patterns to obtain the attack technique sequence pattern set.
Preferably, the conditions of the sequence pattern mining algorithm are set as follows:
an attack technique sequence pattern must be contiguous, without gaps, in the original attack technique list;
the number of occurrences of an attack technique sequence pattern in the attack technique list set must not be less than a threshold;
of two attack technique sequence patterns where one contains the other, the shorter one is filtered out.
Preferably, extracting a traceability subgraph from the traceability graph for each alarm event in the alarm event list comprises:
for an alarm event, determining in the alarm event list the target node pointed to by the source node, according to the process name and process number of the source node in the alarm event; then searching the traceability graph for all source nodes corresponding to that target node; and finally collecting all neighbor nodes within 3 hops of each source node and forming the traceability subgraph from the structure collected for all source nodes.
Preferably, performing technique matching in the attack technique knowledge base based on the attributes of each node in the traceability subgraph, and setting the attack technique corresponding to each alarm event according to the technique matching result, comprises:
matching the attributes of all nodes contained in the traceability subgraph one by one against the IoCs in the attack technique knowledge base, and recording the number of successfully matched IoCs as $n$;
if $n = 0$, removing the alarm event corresponding to the traceability subgraph from the alarm event list; if $n = 1$, setting the attack technique corresponding to the successfully matched IoC as the attack technique of the alarm event; and if $n > 1$, setting the mode (the most frequent value) of the attack techniques corresponding to the successfully matched IoCs as the attack technique of the alarm event.
Preferably, constructing the attack technique sequence pattern matching tree based on the attack technique sequence pattern set comprises:
creating a root node to initialize the attack technique sequence pattern matching tree;
searching the current attack technique sequence pattern matching tree for each attack technique sequence pattern in the attack technique sequence pattern set; if no prefix of the attack technique sequence pattern matches any branch of the tree (excluding the root node), inserting the pattern under the root node as a new branch; and if a prefix of the pattern matches a branch of the tree, inserting the part of the pattern that remains after removing the prefix at the end of that branch.
Preferably, performing pattern matching between the alarm events for which attack techniques have been set and the attack technique sequence pattern matching tree, and determining the attack path according to the pattern matching result, comprises:
taking the first alarm event in the alarm event list as the current alarm event, and initializing the search node as the root node of the attack technique sequence pattern matching tree;
searching all child nodes of the search node for the attack technique corresponding to the current alarm event; if it is not found, deleting the current alarm event from the alarm event list and taking the next alarm event in sequence as the current alarm event to restart the search; if it is found, marking the found attack technique in the tree and updating the search node to the node at which it was found;
taking the next alarm event in the alarm event list in sequence as the current alarm event and repeating the search until all alarm events in the alarm event list have been searched, outputting the marked attack technique sequence as the pattern matching result, and forming the APT attack path from the alarm events remaining in the alarm event list in time order.
Compared with the prior art, the APT attack path reasoning method based on attack technique identification provided by the invention has the following beneficial effects:
(1) By analyzing a large amount of APT attack threat intelligence, an attack technique sequence pattern set is established that can be updated regularly and automatically, which makes it easier to cope with continually evolving APT attack strategies. By performing association analysis on alarm events using the mined attack technique sequence patterns, a large number of false alarm events can be filtered out, which lightens the APT attack path reasoning workload and effectively reduces the reasoning time.
(2) The attack path is inferred on the basis of attack technique identification, and scattered alarm events are connected into a path that serves as a complete attack chain. This helps security managers quickly reconstruct the attack scenario, improves the efficiency of defending against APT attacks, and improves both the accuracy of rule-based reasoning and the degree to which the APT attack path is restored.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
To overcome the defects of the prior art, this embodiment provides an APT attack path reasoning method based on attack technique identification which, as shown in Fig. 1, comprises the following steps:
(1) Attack technique sequence pattern mining: attack technique sequence patterns are mined by analyzing a large number of APT attack threat intelligence documents against the attack technique knowledge base. New APT attack threat intelligence documents are acquired at a regular update interval to supplement and update the attack technique sequence pattern set that serves as the rule base, so that novel attack patterns can also be reasoned about effectively.
(1-1) Constructing the attack technique knowledge base based on the ATT&CK knowledge base.
The ATT&CK (MITRE Adversarial Tactics, Techniques, and Common Knowledge) knowledge base provides a complete list of attack techniques. Each attack technique (Technique) has a name and a number and provides a number of cases (Procedure Examples), such as cases T1204.002-5 and T1204.001-5, where T1204 denotes the attack technique and 001/002 denote its sub-techniques. Each case contains several IoCs (Indicators of Compromise); an IoC is traceable and identifiable information related to an attacker, such as a suspicious IP (Internet Protocol) address, a suspicious file, a suspicious process, or a malware signature. This knowledge is gathered from the ATT&CK knowledge base to construct an attack technique knowledge base $K$ containing all attack techniques. Each attack technique in it is a triple $t_i = (id_i, name_i, IoC_i)$, where $t_i$ is the $i$-th attack technique, $id_i$ is the number of the $i$-th attack technique, $name_i$ is the name of the $i$-th attack technique, and $IoC_i$ is the IoC list obtained by merging all IoCs under the $i$-th attack technique.
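As an illustration of the knowledge base structure described in step (1-1), the following minimal Python sketch merges the IoCs of several procedure examples into one entry per technique number. The class name `AttackTechnique`, the function `build_knowledge_base`, the input tuple layout, and the sample IoC values are assumptions made for this sketch; they are not taken from the ATT&CK data format or from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackTechnique:
    number: str   # technique or sub-technique number, e.g. "T1204.002"
    name: str     # technique name
    iocs: tuple   # merged IoC list for this technique


def build_knowledge_base(procedure_examples):
    """Merge (number, name, iocs) rows into one entry per technique number."""
    merged = {}
    for number, name, iocs in procedure_examples:
        _, ioc_list = merged.setdefault(number, (name, []))
        for ioc in iocs:
            if ioc not in ioc_list:  # keep each IoC once, in first-seen order
                ioc_list.append(ioc)
    return {number: AttackTechnique(number, name, tuple(ioc_list))
            for number, (name, ioc_list) in merged.items()}


# Hypothetical procedure examples; the IoC values are invented for illustration.
examples = [
    ("T1204.002", "Malicious File", ["invoice.docm"]),
    ("T1204.002", "Malicious File", ["198.51.100.7", "invoice.docm"]),
    ("T1204.001", "Malicious Link", ["203.0.113.9"]),
]
knowledge_base = build_knowledge_base(examples)
```

Keying the dictionary by technique number keeps the later lookup "does this document mention a technique that exists in the knowledge base" a constant-time operation.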
(1-2) Attack technique sequence extraction: a set of APT attack threat intelligence documents is acquired, the attack technique knowledge base is matched against each APT attack threat intelligence document, and the corresponding attack technique sequence patterns are mined to obtain the attack technique sequence pattern set.
Given a set $D$ of APT attack threat intelligence documents and a document $d \in D$, attack technique identification is first performed on the data in $d$, and the names and numbers of the identified attack techniques are marked in $d$.
The names or numbers of attack techniques that appear in $d$ and are contained in the attack technique knowledge base are then searched for; that is, the identified attack techniques that also exist in the attack technique knowledge base are extracted. The extracted attack techniques are sorted by their order of appearance in $d$, and repeated attack techniques are removed, giving an attack technique list $L = [a_1, a_2, \dots, a_m]$, where $L$ is the attack technique list, $a_k$ is the name or number of the $k$-th attack technique in the list, and $1 \le k \le m$. The set of attack technique lists extracted from a large number of APT attack threat intelligence documents is denoted $S$.
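The extraction of an ordered, de-duplicated attack technique list from one document can be sketched as follows. This is a simplified illustration that assumes technique numbers already appear literally in the document text (the identification step itself is not modeled); the regular expression, the function name `extract_technique_list`, and the sample text are hypothetical.

```python
import re

# Matches ATT&CK-style numbers such as "T1204" or "T1204.002".
TECHNIQUE_NUMBER = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_list(document_text, known_numbers):
    """Return known technique numbers in order of first appearance."""
    seen = set()
    ordered = []
    for match in TECHNIQUE_NUMBER.finditer(document_text):
        number = match.group(0)
        # Keep only techniques present in the knowledge base, once each.
        if number in known_numbers and number not in seen:
            seen.add(number)
            ordered.append(number)
    return ordered


text = ("The actor used T1566.001, then T1204.002. "
        "T1566.001 was seen again; T9999 is unknown.")
known = {"T1566.001", "T1204.002", "T1059"}
technique_list = extract_technique_list(text, known)
```

Matching on technique names instead of numbers would need a name lookup table; that variant is omitted here.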
(1-3) Attack technique sequence pattern mining: a sequence pattern mining algorithm, such as the PrefixSpan (Prefix-Projected Pattern Growth) algorithm or the GSP (Generalized Sequential Pattern) algorithm, is used to mine attack technique sequence patterns from the attack technique list set. Three conditions are imposed during mining. First, an attack technique sequence pattern must be contiguous, without gaps, in the original attack technique list. Second, the number of occurrences of an attack technique sequence pattern in the attack technique list set must not be less than a threshold (for example 7, which can be adjusted to the actual situation). Third, of two attack technique sequence patterns where one contains the other, the shorter one is filtered out. All attack technique sequence patterns finally obtained by mining are combined into the attack technique sequence pattern set.
An example is given below. Assume that the attack technique list set contains three attack technique lists and that mining yields a single attack technique sequence pattern. One candidate sequence is rejected because it is not contiguous and gap-free in one of the lists, and another candidate is rejected because it is contained in a longer pattern; neither is therefore used as an attack technique sequence pattern.
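The three mining conditions (contiguity, minimum support, and removal of contained patterns) can be illustrated with a small brute-force sketch. It enumerates contiguous subsequences directly and is not an implementation of PrefixSpan or GSP; the function name, the `min_length=2` default, and the single-letter technique labels are assumptions for illustration.

```python
from collections import Counter


def mine_contiguous_patterns(technique_lists, min_support, min_length=2):
    """Return maximal contiguous patterns occurring in >= min_support lists."""
    support = Counter()
    for techniques in technique_lists:
        candidates = set()
        for start in range(len(techniques)):
            for end in range(start + min_length, len(techniques) + 1):
                candidates.add(tuple(techniques[start:end]))
        support.update(candidates)  # each list counts once per pattern

    frequent = [p for p, count in support.items() if count >= min_support]

    def contained(short, long):
        # True if `short` is a strictly shorter contiguous slice of `long`.
        n = len(short)
        return n < len(long) and any(
            long[i:i + n] == short for i in range(len(long) - n + 1))

    # Condition three: drop the shorter of two patterns in a containment relation.
    maximal = [p for p in frequent
               if not any(contained(p, q) for q in frequent)]
    return sorted(maximal)


lists = [["A", "B", "C", "D"], ["A", "B", "C"], ["X", "A", "B", "C"], ["A", "C"]]
patterns = mine_contiguous_patterns(lists, min_support=3)
```

With these inputs, `("A", "C")` is never a frequent candidate because it is contiguous only in the last list, and `("A", "B")` and `("B", "C")` are dropped because the frequent pattern `("A", "B", "C")` contains them.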
(2) Attack technique identification: given an alarm event list, the attack technique corresponding to each alarm event is identified on the basis of the attack technique knowledge base. The alarm event list is the system kernel event data for which the APT attack path is to be determined.
(2-1) Traceability graph construction: system kernel event data are collected, an alarm event list is extracted from the system kernel event data, and the traceability graph is constructed according to the alarm event list.
System kernel event data are collected with an operating system kernel event collection tool (such as ETW (Event Tracing for Windows) or auditd). The system kernel event data for which the APT attack path is to be determined are extracted from the originally collected data, and a traceability graph $G = (V, E)$ is constructed from them, where $V$ is the node set of the traceability graph and each node $v \in V$ represents a system entity (for example a process or a file), and $E$ is the edge set of the traceability graph and each edge $e \in E$ represents an interaction event between system entities (for example creating a process, or reading or writing a file).
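A traceability graph of this kind can be held in plain adjacency maps. The sketch below assumes each kernel event has already been reduced to a `(source_entity, event_type, target_entity)` triple; that record layout, the entity tuples, and the function name are hypothetical and do not reflect the output format of ETW or auditd.

```python
from collections import defaultdict


def build_traceability_graph(events):
    """Build node set and directed adjacency maps from event triples."""
    nodes = set()
    out_edges = defaultdict(list)  # source -> [(event_type, target), ...]
    in_edges = defaultdict(list)   # target -> [(event_type, source), ...]
    for source, event_type, target in events:
        nodes.update((source, target))
        out_edges[source].append((event_type, target))
        in_edges[target].append((event_type, source))
    return nodes, out_edges, in_edges


# Entities are (type, name, id) tuples; all values are invented for illustration.
winword = ("process", "winword.exe", 2112)
powershell = ("process", "powershell.exe", 4312)
payload = ("file", "C:/Temp/a.bin", None)
events = [
    (winword, "create_process", powershell),
    (powershell, "write_file", payload),
]
nodes, out_edges, in_edges = build_traceability_graph(events)
```

Keeping both outgoing and incoming maps makes it cheap to look up every source node that points at a given target, which the subgraph extraction step relies on.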
(2-2) Traceability subgraph extraction: a traceability subgraph is extracted from the traceability graph for each alarm event in the alarm event list. For each alarm event in the list, the target node pointed to by the source node is first determined in the alarm event list according to the process name and process number of the source node in the alarm event. All source nodes corresponding to that target node are then searched for in the traceability graph. Finally, all neighbor nodes within 3 hops of each source node are collected, and the traceability subgraph of the alarm event is formed from the structure collected for all source nodes. The attributes of a node in the traceability subgraph include information such as process name, process number, file name, and IP address, and are determined by the node type; for example, if the node is a process, its attributes include the process name and the process number.
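Collecting all neighbors within 3 hops of a set of source nodes is a bounded breadth-first search. The sketch below treats the graph as undirected, which is an assumption (the text does not state whether edge direction is respected); the function name and the toy adjacency map are likewise illustrative.

```python
from collections import deque


def k_hop_nodes(adjacency, seeds, max_hops=3):
    """Return every node reachable from any seed in at most max_hops steps."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Undirected chain a - b - c - d - e.
adjacency = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d"],
}
subgraph_nodes = k_hop_nodes(adjacency, ["a"])
```

The subgraph itself is then the set of edges of the original graph whose two endpoints both lie in the returned node set.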
(2-3) Attack technique classification: technique matching is performed in the attack technique knowledge base based on the attributes of each node in the traceability subgraph, and the attack technique corresponding to each alarm event is set according to the technique matching result.
The attributes of all nodes contained in the traceability subgraph of an alarm event are matched one by one against the IoCs in the attack technique knowledge base, and the number of successfully matched IoCs is recorded as $n$. During matching, each field in a node's attributes is matched against the IoCs in the attack technique knowledge base, and one node may match one or more IoCs. If $n = 0$, the alarm event corresponding to the traceability subgraph is removed from the alarm event list. If $n = 1$, the attack technique corresponding to the successfully matched IoC is set as the attack technique of the alarm event. If $n > 1$, the mode (the most frequent value) of the attack techniques corresponding to the successfully matched IoCs is set as the attack technique of the alarm event. The alarm event list finally retained after the attack techniques have been set is used in the subsequent attack path reasoning.
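The matching rule (drop the alarm when nothing matches, otherwise take the most frequent technique among the matched IoCs) can be sketched as below. Exact string equality between attribute values and IoCs, a one-to-one `ioc_to_technique` mapping, and counting every node-level match are all simplifying assumptions of this sketch, as are the function name and the sample values.

```python
from collections import Counter


def classify_alarm(node_attributes, ioc_to_technique):
    """Return the technique for one alarm's subgraph, or None to drop it."""
    matched = []
    for attributes in node_attributes:
        for value in attributes.values():
            if value in ioc_to_technique:
                matched.append(ioc_to_technique[value])
    if not matched:
        return None  # no IoC matched: the alarm is removed from the list
    # One match gives that technique; several matches give their mode.
    return Counter(matched).most_common(1)[0][0]


# Hypothetical IoC index derived from a knowledge base.
ioc_to_technique = {
    "invoice.docm": "T1204.002",
    "198.51.100.7": "T1204.002",
    "powershell.exe": "T1059.001",
}
subgraph = [
    {"process_name": "powershell.exe", "pid": "4312"},
    {"file_name": "invoice.docm"},
    {"ip": "198.51.100.7"},
]
technique = classify_alarm(subgraph, ioc_to_technique)
```

How ties between equally frequent techniques should be broken is not specified in the text; this sketch simply returns the technique that was matched first.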
(3) Attack path reasoning: pattern matching is performed between the alarm event list for which attack techniques have been set and the attack technique sequence pattern set, and the APT attack path is determined from the matching result.
(3-1) Constructing the attack technique sequence pattern matching tree (PaTree) based on the attack technique sequence pattern set. Given the attack technique sequence pattern set, a root node is first created to initialize PaTree. Each attack technique sequence pattern in the set is then searched for in the current PaTree. If no prefix of the pattern matches any branch of PaTree (excluding the root node), the pattern is inserted under the root node as a new branch. If a prefix of the pattern matches a branch of PaTree, the part of the pattern that remains after removing the prefix is inserted at the end of that branch.
An example is given below. Assume that the attack technique sequence pattern set contains four attack technique sequence patterns; the flow of constructing the attack technique sequence pattern matching tree is shown in Fig. 2. The first pattern is taken: none of its prefixes matches any branch of the tree other than the root node, so it is inserted under the root node as a new branch. The second pattern is taken: its one-technique prefix matches the start of a branch already in the tree, so the prefix is removed and the remainder is inserted into the matched branch. The third pattern is taken: none of its prefixes matches any branch other than the root node, so it is inserted under the root node as a new branch. The fourth pattern is taken: its two-technique prefix matches a branch already in the tree, so the prefix is removed and the remainder is inserted at the end of the matched branch.
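The construction just described behaves like inserting sequences into a prefix tree (trie). The following sketch assumes that reading: the remainder of a pattern is attached after its longest prefix already present in the tree. The class `PatternNode`, the function `build_pattern_tree`, and the four letter-labelled patterns are hypothetical and are not the patterns of Fig. 2.

```python
class PatternNode:
    """One node of the matching tree; the root carries no technique."""

    def __init__(self, technique=None):
        self.technique = technique
        self.children = {}   # technique -> PatternNode
        self.marked = False  # set during pattern matching


def build_pattern_tree(patterns):
    root = PatternNode()
    for pattern in patterns:
        node = root
        for technique in pattern:
            # Follow an existing branch while the prefix matches;
            # create new nodes for the unmatched remainder.
            if technique not in node.children:
                node.children[technique] = PatternNode(technique)
            node = node.children[technique]
    return root


patterns = [("A", "B", "C"), ("A", "D"), ("E", "F"), ("E", "F", "G")]
tree = build_pattern_tree(patterns)
```

Here the first and third patterns open new branches under the root, the second shares the one-technique prefix `A`, and the fourth extends the two-technique prefix `E, F`.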
(3-2) Attack technique sequence pattern matching: pattern matching is performed between the alarm event list for which attack techniques have been set and the attack technique sequence pattern matching tree (PaTree), and the APT attack path is output after pattern matching.
(3-2-1) The first alarm event in the alarm event list for which attack techniques have been set is taken as the current alarm event, the search node is set to the root node of PaTree, and the attack technique corresponding to the current alarm event is searched for among all child nodes of the search node. If it is not found, the current alarm event is deleted from the alarm event list, the next alarm event in the list is taken in sequence as the current alarm event, and step (3-2-1) is executed again to restart the search. Otherwise, the successfully found attack technique is marked in PaTree (that is, the node is marked), the search node is updated to the node at which the attack technique was found, and step (3-2-2) is executed.
(3-2-2) The next alarm event in the alarm event list is taken in sequence as the current alarm event, and the attack technique corresponding to it is searched for among all child nodes of the search node. If it is not found, the current alarm event is deleted from the alarm event list, the next alarm event in the list is taken in sequence as the current alarm event, and step (3-2-2) is executed again to restart the search. Otherwise, the successfully found attack technique is marked in PaTree (that is, the node is marked), the search node is updated to the node at which the attack technique was found, and step (3-2-3) is executed.
(3-2-3) The next alarm event in the alarm event list is taken in sequence and the search is repeated until all alarm events in the alarm event list have been searched. The alarm event sequence corresponding to the finally matched PaTree branch is the attack path: the marked attack technique sequence is output as the pattern matching result, and the alarm events remaining in the alarm event list form the APT attack path in time order.
An example is given below. Assume that the alarm event list for which attack techniques have been set contains five alarm events and that pattern matching is performed in the attack technique sequence pattern matching tree shown in Fig. 2. The first alarm event is taken. The search node is the root node of the tree, and the number of the attack technique corresponding to the event is searched for among the attack technique numbers of all child nodes of the search node. It is not found, so the first alarm event is deleted and the alarm event list is updated to the remaining four alarm events;
the second alarm event is taken. The search node is still the root node of the tree, and the number of the attack technique corresponding to the event is searched for among all child nodes of the search node. It is found, so this attack technique is marked in the tree and the search node is updated to the node corresponding to it;
the third alarm event is taken. The search node is now the node of the attack technique marked in the previous step, and the number of the attack technique corresponding to the event is searched for among all child nodes of the search node. It is found, so this attack technique is marked in the tree and the search node is updated to the node corresponding to it;
the fourth alarm event is taken. The search node is the node of the attack technique marked for the third alarm event, and the number of the attack technique corresponding to the event is searched for among all child nodes of the search node. It is not found, so the fourth alarm event is deleted and the alarm event list is updated to the remaining three alarm events;
the fifth alarm event is taken. The search node is still the node of the attack technique marked for the third alarm event, and the number of the attack technique corresponding to the event is searched for among all child nodes of the search node. It is found, so this attack technique is marked in the tree and the search node is updated to the node corresponding to it;
the fifth alarm event is determined to be the last alarm event in the alarm event list. Three attack techniques are marked in the tree, namely those of the second, third, and fifth alarm events; their sequence is output as the matching result, and the alarm events remaining in the alarm event list (the second, third, and fifth) form the APT attack path in time order.
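The matching walk of step (3-2) can be sketched as a single pass over the time-ordered alarm events. For brevity the tree is written here as nested dictionaries rather than node objects; that representation, the function name `match_alarms`, and the event and technique labels are assumptions for illustration and do not reproduce the values of Fig. 2.

```python
def match_alarms(alarms, tree):
    """Walk the pattern tree with time-ordered (event_id, technique) pairs.

    Returns (marked technique sequence, remaining event ids).
    """
    search_node = tree          # start at the root
    marked, remaining = [], []
    for event_id, technique in alarms:
        if technique in search_node:
            # Found among the children: mark it and descend.
            marked.append(technique)
            remaining.append(event_id)
            search_node = search_node[technique]
        # Otherwise the alarm event is dropped from the list.
    return marked, remaining


# Hypothetical tree: root -> A -> B -> {C, D}, root -> E.
tree = {"A": {"B": {"C": {}, "D": {}}}, "E": {}}
alarms = [("e1", "X"), ("e2", "A"), ("e3", "B"), ("e4", "Y"), ("e5", "C")]
marked, path = match_alarms(alarms, tree)
```

As in the worked example, the first and fourth alarm events are discarded and the remaining three form the path. Once a leaf is reached, this sketch drops every later alarm because a leaf has no children; the text does not describe that case, so this behavior is an assumption.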
Using the APT attack path reasoning method described above, attack technique sequence patterns were mined with the PrefixSpan algorithm, and attack path reasoning was performed on system kernel event data samples containing real APT attacks, collected with Kellect (Kernel-based Efficient and Lossless Event Log Collector). The attack path reasoning results for 300 attack samples were tested and compared with the real attack paths; the average accuracy of attack path reasoning was 77%, and the average reasoning time was 1.48 seconds. Among rule-based reasoning approaches, the APT attack path reasoning method provided by the invention thus achieves higher accuracy and faster reasoning.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination that involves no contradiction should be considered to fall within the scope of this description.
The above embodiments represent only a few implementations of the present invention; although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within its scope of protection. Accordingly, the scope of protection of the invention is determined by the appended claims.