Disclosure of Invention
The invention mainly aims to provide a progressive node active tracking method and device for a P2P botnet, and aims to improve the network security of the Internet of things and realize the rapid and accurate tracking of the Internet of things botnet.
In order to achieve the above object, the present invention provides a progressive node active tracking method for a P2P botnet, which is applied in a P2P network including a plurality of nodes, and the method includes the following steps:
step1, constructing an initial node set comprising normal P2P nodes;
and 2, selecting the node as a relay routing node, sending a find _ node request with the target of a random node under a specific sub-tree to the relay routing node, and receiving a nearest node set from the random node of the specific sub-tree in a routing table returned by the node.
Step3, traversing the nodes in the nearest node set, judging whether the nodes in the nearest node set exist in a specific sub-tree, and storing the node information existing in the specific sub-tree into a botnet node list; after traversing, if all nodes in the nearest node set are not in the specific subtree, randomly selecting one node in the nearest node set as a relay routing node, and returning to execute the step 1;
and 4, displaying the acquired botnet node list.
The specific subtree is a subtree formed by gathering Mozi nodes in a node binary tree constructed according to node IDs in a Kademlia network;
the node information comprises an IP address, a port number and a node ID.
Wherein, the method still further comprises: and step 5, verifying the nodes in the botnet node list, sending data in a format of 1: v4: flag (4 bytes), judging whether the nodes send the data in the same format or not, and if so, determining that the nodes are Mozi nodes.
The invention provides a progressive node active tracking device for a P2P botnet, which is applied to a P2P network comprising a plurality of nodes, and comprises the following components:
the initial node construction module is used for constructing an initial node set comprising normal P2P nodes;
and the nearest node acquisition module is used for selecting the node as a relay routing node, sending a find _ node request with the target of a random node under a specific sub-tree to the relay routing node, and receiving a nearest node set which is away from the random node of the specific sub-tree in a routing table returned by the node.
The specific sub-tree identification module is used for traversing the nodes in the nearest node set, judging whether the nodes in the nearest node set exist in the specific sub-tree or not, and storing the node information existing in the specific sub-tree into a botnet node list; after traversing, if all nodes in the nearest node set are not in the specific subtree, randomly selecting one node in the nearest node set as a relay routing node, and returning to the nearest node acquisition module;
and the display module is used for displaying the acquired botnet node list.
The specific subtree is a subtree formed by gathering Mozi nodes in a node binary tree constructed according to node IDs in a Kademlia network;
the node information comprises an IP address, a port number and a node ID.
Wherein, still the device still further includes: the verification module verifies the nodes in the botnet node list, sends data in a format of 1: v4: flag (4 bytes), judges whether the nodes send the data in the same format or not, and determines that the nodes are Mozi nodes if the nodes send the data in the same format.
The beneficial effects of the invention include: (1) the normal p2p node is used as a relay routing node, so that the problem of cold start of botnet node tracking is solved; (2) by tracking specific subtrees, as many as possible nodes under the subtrees are obtained, and then the Mozi nodes are confirmed, so that the tracking efficiency is greatly improved compared with Mozi confirmation of random nodes which are randomly conducted.
Detailed Description
The Kademlia protocol (hereinafter referred to as Kad) is a study published in 2002 by Petar P, Maymounkov and David Mazieres, New York university, U.S.A. Peer-to-Peer information system based on the XOR metric. In brief, Kad is a Distributed Hash Table (DHT) technology, and a brand new DHT topology is established by taking an exclusive or algorithm (XOR) as a distance measurement basis, so that the routing query speed is greatly improved.
After BiTtorent, which is famous in 5 months in 2005, realized the DHT technology based on Kademlia protocol in version 4.1.0, BitComet and BitSpirit in China quickly realized the DHT technology compatible with BitTorrent, and realized the trackeress download mode.
Kad have now become the mainstream implementation of DHT, the p2p network hosted by the Mozi botnet is also based on Kad. The Kademlia algorithm is a distributed hash storage and routing algorithm, and each node of the P2P network realized based on Kad protocol stores partial resources. At this time, the following problems need to be considered:
1. how to distribute memory content to various nodes
2. How to find nodes/addresses/paths storing files
How to route the target node quickly and efficiently is an important problem solved by Kad.
In the p2p network constructed by Kademlia protocol, all information is stored in the form of hash table entries, and the entries are dispersedly stored on each node, so that a huge distributed hash table is formed in a full-network mode.
Node distance. Key value (Key) of data in Kademlia network, node ID (NodeID) are all represented by 160 bits, and NodeID is randomly distributed when joining the network. The Kad protocol specifies that each node maintains 2 pieces of information as follows: 1. the resource to be stored, which is stored in the form of a < key, value > pair, can be understood as a hash value of a file name and a file content. 2. A routing table called as "k-packet" is layered according to Node IDs and records IDs, IP addresses and ports of other nodes with limited number.
In the Kad network, the distance between two nodes is measured not by physical distance and router hop number, but by the exclusive or distance of node ID.
For another understanding of the above description: if the nodes of the entire network are combed into a binary tree arranged by node IDs, each leaf at the extreme end of the tree is a node. The location of each node is uniquely determined by the shortest prefix of its ID value. The process of composing Kad a network node tree from nodes is as follows: step 1: keys (such as node IDs) are represented in a binary form, and then the keys are sequentially processed according to steps 2-3 from high order to low order. Step 2: the nth bit of the binary corresponds to the nth level of the binary tree. Step 3: if the current bit is 1, the right sub-tree is entered, if 0, the left sub-tree is entered (set is considered, and the reverse is possible). Step 4: after the processing according to the high order to the low order is finished, the Key value corresponds to a certain leaf node on the binary tree.
Assume that each node ID is N bits. After each node splits a sub-tree according to the own view angle, N sub-trees can be obtained in total. If 1 node in each of the N subtrees is known, the current node can be recursively routed using the N nodes to reach any subtree and node of the entire binary tree. Also, the cost of maintaining these N nodes depends on the tree height. The tree height of Kademlia default 160 is relatively maintenance-less costly. In the actual use process, considering the problem of robustness (each node may be quit or down), only one node is not enough to be known, and more nodes are needed to be relatively insurance.
After each node splits a sub-tree according to its own view, N sub-trees can be obtained, and then N routing tables (corresponding to N K-buckets) need to be maintained. The Kad algorithm uses the concept of K-bucket to store the state information of other neighboring nodes, which is composed of (IP address, UDP port, Node ID) data list (Kad network exchanges information by UDP protocol). Each such list is referred to as a K-bucket, and the internal information storage location of each K-bucket is arranged according to the time sequence of last sight, last-sight being placed at the head and last-sight being placed at the tail. Each bucket has no more than K data items.
The Kademlia protocol includes four remote RPC operations: PING, STORE, FIND _ NODE, FIND _ VALUE. 1. The PING operation is used for detecting a node to judge whether the node is still online; 2. the effect of the STORE operation is to inform a node to STORE one<key,value>Right for later inquiry needs; 3. the FIND _ NODE operation uses a 160 bit ID as a parameter. The recipient of this operation returns (IP address, UDP port, Node ID) information for the K nodes it knows that are closer to the target ID. The information of the nodes can be obtained from a single K bucket or from a plurality of K buckets (such asThe K buckets with the fruit closest to the target ID are not full). In either case, the acceptor will return information for the K nodes to the operation initiator. But if the acceptor does not have K nodes information for all K buckets added up, it will return all nodes information to the initiator. 4. The FIND _ VALUE operation is similar to the FIND _ NODE operation except that it only needs to return (IP address, UDP port, NODE ID) information for one NODE. If the recipient of the operation receives a STORE operation of the same key, the stored value is returned directly. To prevent spoofing addresses, the recipient needs to respond to a random 160-bit ID value in all RPC operations. In addition, the PING operation may also be piggybacked in the recipient's RPC reply message in order to be sure of the sender's network address. Recursive routing. Kad is characterized by providing a fast node searching mechanism and adjusting the searching speed according to the parameters. If node X is to find node Y, Kad performs a recursive route search according to the following recursive operation steps: 1. the distance of X to Y is calculated. 2. Log from X2And D K buckets take out the information of alpha NODEs, and simultaneously carry out FIND _ NODE operation. If the information in this K bucket is less than a, then a total of a nodes from the nearby buckets that are closest to D are selected. 3. For each node subjected to the query operation, if finding that the node is Y, answering the node that the node is the closest to Y; otherwise, measuring the distance between the self and the Y, and selecting the information of alpha nodes from the K bucket corresponding to the self to the X. 4. X performs FIND _ NODE operation again for each newly received NODE, and this process is repeated until each branch has a NODE responding that is closest to Y. 5. Through the search operation, X obtains K pieces of node information closest to Y.
While we studied mozi samples, we found that there were 2 prefixes of mozi node IDs:
1. sample files hardcoded 888888 (0 x 3838383838)
2. The prefix specified by the hp instruction in the config configuration file, most commonly 88888888888888 (0 x 383838383838)
The hard-coded prefix nature of the Mozi node ID equivalently states that the Mozi node has clustered under certain specific subtrees of the p2p network. After the Mozi node joins the network through normal p2p communication, in order to distinguish the traffic quickly, the identification of 1: v4: flag (4 bytes) is used to identify whether the traffic is sent by the same node, thus achieving the purpose of peer-to-peer handshake identification.
The flag byte means as follows:
0-----random
1-----hard-code(0x42) or from [ver]
2-----calc by algorithm
3-----calc by algorithm
the aggregation of Mozi nodes under specific subtrees embodies the necessity to track these specific subtrees.
The goal of tracking a particular sub-tree is to acquire as many nodes as possible under those sub-trees and then perform the validation of the Mozi nodes. Compared with Mozi confirmation of random nodes in a random purposeless way, the tracking efficiency is greatly improved.
The first problem to be solved is how to acquire as many nodes as possible under a particular sub-tree. At this point, the Kad protocol features that we refer to in section 2.2.6 are used: the node will return the α nodes closest to the target node in its own routing table in response to the find _ node request.
If we use the normal p2p node as the relay routing node and send the find _ node request targeting the random node under the specific subtree, the normal p2p node will return the α nodes closest to the random node of the specific subtree in its routing table. The a nodes may exist directly in a particular sub-tree, even if not, relatively closer to the particular sub-tree. If the returned node is not in the specific subtree, the node is taken as a relay routing node, and the acquisition of the routing table data is continued, and the process is repeated in a circulating way.
Common available public normal p2p nodes are for example:
- dht.transmissionbt.com:6881
- router.bittorrent.com:6881
- router.utorrent.com:6881
- bttracker.debian.org:6881
by probing the specific routing table data of normal p2p nodes, we can collect a large amount of node data under a specific sub-tree.
The invention provides a progressive node active tracking method for a P2P botnet, which is applied to a P2P network comprising a plurality of nodes, and comprises the following steps:
step1, constructing an initial node set comprising normal P2P nodes;
and 2, selecting the node as a relay routing node, sending a find _ node request with the target of a random node under a specific sub-tree to the relay routing node, and receiving a nearest node set from the random node of the specific sub-tree in a routing table returned by the node.
Step3, traversing the nodes in the nearest node set, judging whether the nodes in the nearest node set exist in a specific sub-tree, and storing the node information existing in the specific sub-tree into a botnet node list; after traversing, if all nodes in the nearest node set are not in the specific subtree, randomly selecting one node in the nearest node set as a relay routing node, and returning to execute the step 1;
and 4, displaying the acquired botnet node list.
The specific subtree is a subtree formed by gathering Mozi nodes in a node binary tree constructed according to node IDs in a Kademlia network;
the node information comprises an IP address, a port number and a node ID.
Wherein, the method still further comprises: and step 5, verifying the nodes in the botnet node list, sending data in a format of 1: v4: flag (4 bytes), judging whether the nodes send the data in the same format or not, and if so, determining that the nodes are Mozi nodes.
The invention provides a progressive node active tracking device for a P2P botnet, which is applied to a P2P network comprising a plurality of nodes, and comprises the following components:
the initial node construction module is used for constructing an initial node set comprising normal P2P nodes;
and the nearest node acquisition module is used for selecting the node as a relay routing node, sending a find _ node request with the target of a random node under a specific sub-tree to the relay routing node, and receiving a nearest node set which is away from the random node of the specific sub-tree in a routing table returned by the node.
The specific sub-tree identification module is used for traversing the nodes in the nearest node set, judging whether the nodes in the nearest node set exist in the specific sub-tree or not, and storing the node information existing in the specific sub-tree into a botnet node list; after traversing, if all nodes in the nearest node set are not in the specific subtree, randomly selecting one node in the nearest node set as a relay routing node, and returning to the nearest node acquisition module;
and the display module is used for displaying the acquired botnet node list.
The specific subtree is a subtree formed by gathering Mozi nodes in a node binary tree constructed according to node IDs in a Kademlia network;
the node information comprises an IP address, a port number and a node ID.
Wherein, still the device still further includes: the verification module verifies the nodes in the botnet node list, sends data in a format of 1: v4: flag (4 bytes), judges whether the nodes send the data in the same format or not, and determines that the nodes are Mozi nodes if the nodes send the data in the same format.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.