Blockchain transaction topological graph analysis method and device based on graph neural network
Technical Field
The invention belongs to the technical field of anti-money laundering of blockchain virtual currency, and particularly relates to a method and a device for analyzing a blockchain transaction topological graph based on a graph neural network.
Background
Illegal activities such as reimbursement, fraud, and money laundering occur frequently in blockchain transactions and pose an increasingly serious threat to the financial security of financial institutions and regions. Blockchain account addresses are massive in number, widely dispersed, and anonymous, which makes illegal transaction patterns on the blockchain complex and well hidden. How to effectively locate and extract illegal transaction paths from a massive, dynamic, anonymous transaction network is a problem that urgently needs to be solved.
Most existing illegal transaction analysis methods start from clues provided by the police or victims and analyze transactions against preset illegal transaction business rules. Although business rules can help find some abnormal illegal behaviors, blockchain account addresses are massive and dispersed and transaction behaviors are complex and concealed, so address analysis and early-warning discrimination are time-consuming and labor-intensive. In addition, because the transaction network can extend indefinitely, analysis driven by business rules alone cannot effectively determine the boundary of the target transaction network or filter out redundant information.
Existing methods for identifying addresses and transaction behaviors related to illegal transactions fall mainly into three categories. The first sets rules for identifying possibly abnormal transactions based on information such as transaction amount and transaction timestamp; this approach has a low degree of automation, and its recognition accuracy depends on expert experience and on how accurately the rules are set. The second uses traditional machine learning methods to cluster and classify blockchain addresses, so that more similar abnormal transactions can be identified from a small amount of manually identified abnormal address information. However, generalized linear regression and decision tree models cannot be applied directly to graph data, and both depend on manual feature engineering, so good clustering and classification results are difficult to obtain. The third extracts a transaction network subgraph from the original transaction graph data according to a target node, inputs the subgraph data into a network model to train feature vectors of the transaction nodes, and finally feeds the target vectors into a binary classification model to recognize false transactions. However, nodes are not filtered or pruned while the target network subgraph is extracted, so the subgraph contains many transaction nodes and feature extraction for them is inefficient. In addition, the vectors produced by the trained graph neural network are used only for binary classification, which is not suited to multi-class identification of transaction network node identities.
Disclosure of Invention
The invention provides a blockchain transaction topological graph analysis method based on a graph neural network, and aims to solve the problems in the prior art that illegal transaction behaviors on a blockchain cannot be identified efficiently, illegal transactions cannot be located, and the fund sources and flow directions in illegal cases cannot be identified effectively. The invention uses a deep graph neural network combined with a business strategy to extract the key path of illegal transactions, which improves the accuracy of the identification result and helps law enforcement departments quickly identify illegal transactions.
The invention is realized by the following technical scheme:
a blockchain transaction topological graph analysis method based on a graph neural network comprises the following steps:
acquiring a blockchain target fund transaction network subgraph;
performing transaction behavior characteristic extraction on the target fund transaction network subgraph;
giving partial node labels to the target fund transaction network subgraph;
performing vector training on all nodes in the target fund transaction network subgraph by adopting a deep graph neural network model;
acquiring labels of unknown label nodes in the target fund transaction network subgraph according to the feature vectors of all nodes in the target fund transaction network subgraph;
and obtaining illegal fund transaction paths after the target fund transaction network is pruned according to all the node labels and the service pruning strategy in the target fund transaction network subgraph.
The invention filters through the preset graph characteristics, screens the nodes in the target fund network, extracts the transaction paths on the chain to form a transaction network subgraph, then adopts a deep graph neural network algorithm to carry out vector representation on the nodes in the target fund transaction network subgraph, calculates the similarity between the nodes and the known label nodes, combines the service rules to carry out effective pruning of the subgraph network, and further identifies and extracts illegal transaction paths on the chain, thereby solving the problem that the prior method can not efficiently identify the source and the flow direction of related fund.
Preferably, the step of acquiring the blockchain target fund transaction network subgraph comprises the following steps:
acquiring a full-volume blockchain transaction network topological graph;
and extracting a target fund transaction network subgraph from the full-volume blockchain transaction network topological graph according to a risk control strategy of preset graph characteristics.
Preferably, the step of performing transaction behavior feature extraction on the target fund transaction network subgraph comprises the following steps:
and carrying out node transaction behaviors and transaction relation data characteristic extraction between nodes on all nodes of the target fund transaction network subgraph.
Preferably, the step of assigning partial node labels to the target fund transaction network subgraph comprises the following steps:
acquiring labels of partial nodes in the target fund transaction network subgraph according to a crawler address label library of a third-party open source website;
and acquiring the labels of partial nodes in the target fund transaction network subgraph according to clues provided by users or police.
Preferably, the step of performing vector training on all nodes in the target fund transaction network subgraph by adopting a deep graph neural network model comprises the following steps:
inputting the extracted target fund transaction network subgraph and the extracted node transaction behavior characteristics into a deep graph neural network model;
and carrying out deep graph neural network model training to generate the aggregation vectors of all nodes of the target fund transaction network subgraph, thereby obtaining the target feature vectors of all nodes in the target fund transaction network subgraph.
Preferably, the step of performing deep graph neural network model training and generating aggregation vectors of all nodes of the target fund transaction network subgraph specifically comprises:
S1, selecting a node from the target fund transaction network subgraph as a current node;
S2, performing neighbor sampling on neighbor nodes of the current node;
S3, sampling a specified number of neighbor nodes of the current node, performing feature aggregation, and updating the feature of the current node by using the feature obtained by aggregation, namely, using the aggregated feature vector as the target feature vector of the current node;
S4, selecting another node in the target fund transaction network subgraph as the current node, and repeating steps S2-S3 until target feature vectors of all nodes in the target fund transaction network subgraph G (V, E) are generated.
Preferably, the step of obtaining the label of the unknown label node in the target fund transaction network subgraph according to the feature vectors of all nodes in the target fund transaction network subgraph comprises the following steps:
according to the node target feature vectors obtained by deep graph neural network aggregation, similarity calculation or distance calculation is carried out between the unknown label node and the known label nodes;
and acquiring the label of the unknown label node in the target fund transaction network subgraph according to a preset similarity threshold or distance threshold.
In a second aspect, the invention provides a blockchain transaction topological graph analysis device based on a graph neural network, which comprises an acquisition unit, a feature extraction unit, a labeling unit, a model training unit, a classification unit and an identification unit;
the acquisition unit acquires a full-volume blockchain transaction network topological graph and extracts a target fund transaction network subgraph from the full-volume blockchain transaction network topological graph according to a risk control strategy of preset graph characteristics;
the feature extraction unit extracts features of the target fund transaction network subgraph;
the labeling unit labels part of nodes in the target fund transaction network subgraph;
the model training unit performs vector training on all nodes in the target fund transaction network subgraph by adopting a deep graph neural network model to obtain target feature vectors of all nodes of the target fund transaction network subgraph;
the classification unit acquires labels of all unknown label nodes in the target fund transaction network subgraph according to target feature vectors of all nodes of the target fund transaction network subgraph;
and the identification unit obtains illegal fund transaction paths after the target fund transaction network is pruned according to all node labels and the service pruning strategy of the target fund transaction network subgraph.
In a third aspect, the present invention provides a computer device comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor implements the steps of the method of the present invention when executing the computer program.
In a fourth aspect, the invention proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, carries out the steps of the method according to the invention.
The invention has the following advantages and beneficial effects:
1. The method pre-extracts the network structure of the illegal target fund transactions according to a risk control strategy based on graph characteristics, and then applies the specified graph neural network to learn node feature vectors and extract the key illegal path. Pre-extracting the network structure narrows the range of target nodes to be searched and improves the timeliness of feature data extraction and model training.
2. The invention trains on the target fund transaction network with a deep graph neural network and can classify the nodes in the network; it then prunes the target fund network in combination with a business strategy to extract the key path of illegal transactions. This improves the accuracy of the identification result, helps law enforcement departments quickly identify illegal transactions, and improves case-solving efficiency.
3. The method combines a graph learning algorithm with the business scenario to identify the fund sources and outflow paths of illegal case fund transactions. On the algorithm side, it uses a deep graph neural network algorithm suited to the business scenario to train the feature vectors of the graph network nodes, so that each node learns its own transaction information and that of its neighbor nodes without losing network structure information. Unknown addresses in the transaction network are then labeled according to the address labels of part of the blockchain addresses; at the same time, the address labels learned by the algorithm are judged again against the business rules of the scenario, which improves the accuracy of the model result and allows the fund sources and outflow paths of illegal case fund transactions to be extracted more accurately.
4. The invention combines the model's discrimination result with the business strategy, which effectively compensates for the shortcomings of model identification and allows illegal addresses with other transaction characteristics on the chain to be identified. Compared with other graph convolutional neural network models, the deep graph neural network model supports model migration, can handle dynamically changing transaction network graphs, and has a wide range of application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a diagram of transaction network relationships.
FIG. 3 is a diagram of a critical path for illegal funds.
FIG. 4 is a block diagram of a network identification process of illegal transactions.
FIG. 5 is a schematic diagram of a computer device according to the present invention.
FIG. 6 is a schematic block diagram of the apparatus of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1
This embodiment provides a blockchain transaction topological graph analysis method based on a graph neural network. As shown in FIG. 1, the method of this embodiment includes:
Step 101, acquiring a blockchain target fund transaction network subgraph.
Based on the case clues obtained, nodes in the full-volume blockchain transaction network topological graph are analyzed and filtered according to a risk control strategy of preset graph characteristics (for example, the transfer-out or transfer-in ratio of a transaction node, its out-degree, and its in-degree), and paths that are unnecessary or do not belong to the main fund links are removed, yielding a transaction network subgraph of the target suspect funds. As shown in FIG. 2, other possibly illegal account nodes related to the illegal node a within a specified step length (here, a step length of 3) are extracted. The target fund transaction network subgraph obtained in this embodiment is a suspected illegal fund transaction network subgraph.
In this embodiment, the risk control strategy is used to pre-extract the illegal target fund transaction network structure, and specific business rules (for example, if funds flow into an exchange, the address through which they flow in is classified as an exchange deposit address) identify some of the transaction nodes in advance. This makes the labels later assigned to unknown-label transaction nodes more accurate and better matched to the business scenario (that is, the target fund transaction network subgraph is closer to the business), while also reducing the scale of the transaction network subgraph and improving the time efficiency of subsequent feature data extraction.
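The subgraph extraction of step 101 can be pictured as a bounded breadth-first search from the clue node. The following is a minimal sketch, not the embodiment's implementation: the function name `extract_subgraph`, the `(sender, receiver, amount)` edge format, and the `min_amount` threshold (a stand-in for whatever preset graph-feature filter is actually applied) are assumptions made for illustration.

```python
from collections import deque

def extract_subgraph(edges, seed, max_hops=3, min_amount=1.0):
    """Collect nodes and edges within `max_hops` of `seed`.

    edges: list of (sender, receiver, amount) tuples.
    min_amount: hypothetical filter standing in for the preset graph-feature rule.
    """
    # Index edges in both directions so fund sources and destinations are both followed.
    neighbors = {}
    for sender, receiver, amount in edges:
        if amount < min_amount:  # drop links that are not part of the main fund flow
            continue
        edge = (sender, receiver, amount)
        neighbors.setdefault(sender, []).append((receiver, edge))
        neighbors.setdefault(receiver, []).append((sender, edge))

    hops = {seed: 0}          # hop distance of every visited node
    kept_edges = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:  # boundary reached: do not expand further
            continue
        for other, edge in neighbors.get(node, []):
            kept_edges.add(edge)
            if other not in hops:
                hops[other] = hops[node] + 1
                queue.append(other)
    return set(hops), kept_edges

# Hypothetical transfers around a clue node "A".
subgraph_edges = [
    ("A", "B", 50.0), ("B", "C", 40.0), ("C", "D", 30.0),
    ("D", "E", 20.0), ("A", "X", 0.01),
]
subgraph_nodes, subgraph_kept = extract_subgraph(subgraph_edges, "A", max_hops=3)
```

With a step length of 3, node `E` (four hops away) and the negligible transfer to `X` fall outside the extracted subgraph, which is how the hop limit and the filter together bound an otherwise unbounded transaction network.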
Step 102, performing transaction behavior characteristic extraction on the target fund transaction network subgraph.
In this embodiment, features of node transaction behavior and of the transaction relationships between nodes are extracted for all nodes of the target suspect fund transaction network subgraph. Statistical methods are used to extract fields that better explain the business scenario, namely blockchain address transaction behavior features such as address balance, transfer-in amount, transfer-out amount, average amount per incoming transfer, average amount per outgoing transfer, out-degree, and in-degree. The relationship data between two nodes represents the transaction edge between them.
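The statistical features named in step 102 can be computed in a single pass over the subgraph's transactions. The sketch below is illustrative only: the function name `address_features` and the edge format are assumptions, degrees are counted per transaction rather than per distinct counterparty, and `net_flow` is a subgraph-local proxy (an actual address balance would come from chain state, which is not modeled here).

```python
from collections import defaultdict

def address_features(edges):
    """Aggregate per-address statistics from (sender, receiver, amount) edges."""
    totals = defaultdict(lambda: {"in_amount": 0.0, "out_amount": 0.0,
                                  "in_degree": 0, "out_degree": 0})
    for sender, receiver, amount in edges:
        totals[sender]["out_amount"] += amount
        totals[sender]["out_degree"] += 1
        totals[receiver]["in_amount"] += amount
        totals[receiver]["in_degree"] += 1

    features = {}
    for address, t in totals.items():
        features[address] = {
            **t,
            # Average amount per incoming / outgoing transfer (0.0 when there are none).
            "avg_in": t["in_amount"] / t["in_degree"] if t["in_degree"] else 0.0,
            "avg_out": t["out_amount"] / t["out_degree"] if t["out_degree"] else 0.0,
            # Net inflow inside the subgraph; not the on-chain balance.
            "net_flow": t["in_amount"] - t["out_amount"],
        }
    return features

# Hypothetical transfers between three addresses.
feature_edges = [("A", "B", 10.0), ("A", "C", 30.0), ("B", "C", 5.0)]
feature_table = address_features(feature_edges)
```

Each row of `feature_table` can then be flattened into a numeric vector and used as the initial node feature for the graph neural network.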
Step 103, giving partial node labels to the target fund transaction network subgraph.
In this embodiment, partial node labels are assigned to the target suspect fund transaction network subgraph according to a crawled address label library from third-party open source websites and clues provided by users and the police. Label types include illegal addresses and victim addresses given in clues from the police or users, exchange hot wallet addresses, exchange deposit addresses, coin-mixing addresses, darknet addresses, and the like. Nodes that are not labeled serve as nodes to be identified.
Step 104, performing vector training on all nodes in the target fund transaction network subgraph by adopting a deep graph neural network model.
In this embodiment, a deep graph neural network model is trained on the target fund transaction network using the behavior feature data of the nodes and the relationship data between nodes, so as to obtain the embedding vectors of all nodes.
The training process of this embodiment specifically includes:
Step 201, the target fund transaction network subgraph G (V, E) extracted in step 101 and the node transaction characteristics extracted in step 102 are used as input data of the deep graph neural network model.
Step 202, performing deep graph neural network model training to generate an aggregation vector of all nodes of the target fund transaction network subgraph G (V, E), including the following substeps:
Step 301, neighbor sampling is performed on the neighbor nodes of the current node. The current node is taken as the target aggregation node. When K is 1, only the one-hop neighbor nodes of the current node are sampled; when K is 2, the one-hop and two-hop neighbor nodes of the current node are sampled; and so on, where K represents the number of sampling layers.
Step 302, a specified number of neighbor nodes of the current node are sampled and their features are aggregated by a preset aggregation method (for example, mean aggregation or LSTM aggregation), and the features of the current node are updated with the aggregated features; that is, the aggregated feature vector is used as the target feature vector of the current node. During an illegal transaction investigation, the structure of the target fund transaction graph changes dynamically and new nodes are added from time to time. Because of this aggregation mode, when a new node is added to the graph, the deep graph neural network can obtain its latest feature vector representation without retraining.
Step 303, selecting another node in the target fund transaction network subgraph G (V, E) as the current node, and repeating steps 301-302 until target feature vectors of all nodes in the target fund transaction network subgraph G (V, E) are generated.
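Steps 301 to 303 can be sketched with a mean aggregator over sampled neighbors. This is a simplified illustration under stated assumptions, not the trained model of the embodiment: it omits the learned weight matrices and nonlinearity that a trained graph neural network would apply, it simply averages a node's own vector with the mean of its sampled neighbors, and the names `sample_neighbors`, `aggregate`, `num_layers` (playing the role of K) and `num_samples` are hypothetical.

```python
import random

def sample_neighbors(adjacency, node, num_samples, rng):
    """Return at most `num_samples` neighbors of `node`, chosen uniformly."""
    nbrs = adjacency.get(node, [])
    if len(nbrs) <= num_samples:
        return list(nbrs)
    return rng.sample(nbrs, num_samples)

def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def aggregate(adjacency, features, num_layers=2, num_samples=5, seed=0):
    """Run `num_layers` rounds of sample-and-mean aggregation over all nodes."""
    rng = random.Random(seed)
    current = {node: list(vec) for node, vec in features.items()}
    for _ in range(num_layers):  # each round pulls in information from one more hop
        updated = {}
        for node, own in current.items():
            sampled = sample_neighbors(adjacency, node, num_samples, rng)
            if not sampled:
                updated[node] = own  # isolated node keeps its own features
                continue
            nbr_mean = mean_vectors([current[n] for n in sampled])
            updated[node] = [(a + b) / 2 for a, b in zip(own, nbr_mean)]
        current = updated
    return current

# Hypothetical three-node graph with two-dimensional features.
toy_adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
toy_features = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 2.0]}
toy_embeddings = aggregate(toy_adjacency, toy_features, num_layers=1)
```

Because the update for a node depends only on its own features and those of its sampled neighbors, a newly added node can be embedded by running the same aggregation on its neighborhood, which is the property step 302 relies on for dynamically changing graphs.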
Step 105, acquiring the labels of the unknown label nodes in the target fund transaction network subgraph.
This step specifically includes:
Step 401, according to the node feature vectors obtained by deep graph neural network aggregation, similarity calculation or distance calculation is carried out between a node to be identified and the known label nodes;
Step 402, acquiring the label of the unknown label node in the target fund transaction network subgraph according to a preset similarity threshold or distance threshold;
Step 403, another node to be identified is selected, and steps 401 to 402 are repeated until the labels of all unknown label nodes (i.e. nodes to be identified) in the target fund transaction network subgraph are obtained, that is, the classification of all nodes in the target fund transaction network subgraph is realized.
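Steps 401 to 403 amount to a thresholded nearest-labeled-neighbor assignment. The sketch below uses cosine similarity as one possible similarity measure; the function names, the `"unknown"` fallback label, the threshold value of 0.9, and the example vectors are all assumptions for illustration, since the source leaves the measure and threshold open.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_labels(embeddings, known_labels, threshold=0.9):
    """Give each unlabeled node the label of its most similar labeled node,
    provided the similarity reaches `threshold`."""
    result = dict(known_labels)
    for node, vec in embeddings.items():
        if node in known_labels:
            continue
        best_label, best_score = None, threshold
        for labeled_node, label in known_labels.items():
            score = cosine_similarity(vec, embeddings[labeled_node])
            if score >= best_score:
                best_label, best_score = label, score
        result[node] = best_label if best_label is not None else "unknown"
    return result

# Hypothetical embeddings: a8 lies close to the known fraud address a7.
label_embeddings = {
    "a7": [1.0, 0.1], "a8": [0.9, 0.1],
    "a10": [0.0, 1.0], "x": [-1.0, 0.0],
}
seed_labels = {"a7": "fraud", "a10": "exchange_deposit"}
all_labels = assign_labels(label_embeddings, seed_labels)
```

Because every label class is compared in the same loop, this scheme handles multi-class identification directly rather than being limited to a binary decision.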
Step 106, acquiring the illegal fund transaction path after pruning of the target fund transaction network subgraph.
In this embodiment, the node labels in the target fund transaction network subgraph are combined with the service pruning strategy to obtain the illegal fund transaction path after the target fund transaction network is pruned. As shown in FIG. 3, taking the illegal transaction node a as an example, all nodes related to node a are obtained; according to the illegal transaction business rules (for example, fraud accounts usually transfer funds through intermediate accounts), the nodes related to node a are pruned (for example, the 9 peripheral nodes with the lightest color in FIG. 3), and the key fund transaction path of node a is obtained (for example, the network of 7 darker nodes in FIG. 3).
FIG. 4 shows a classic scenario in a blockchain illegal transaction case together with its business rules. The transaction relationships are derived from a full node, and the figure shows typical local fund transactions in an illegal case, where $\alpha_1, \alpha_2, \ldots, \alpha_{11}$ are transaction addresses. Illegal funds must ultimately flow to an exchange deposit address $\alpha_{10}$ and finally reach the exchange to be cashed out. Based on this characteristic, the fund transfer address $\alpha_9$ of the known, reported illegal address $\alpha_7$ can be effectively found on the chain; based on the similarity of the graph node feature vectors, the address $\alpha_8$, which has the same fraud characteristics, can be effectively found. Therefore, by combining the business strategy with the graph learning algorithm, the illegal transaction path can be effectively found in the illegal transaction chain, and the transaction network subgraph of the illegal fund key path is finally obtained.
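One way to encode the business rule that illegal funds end at an exchange deposit address is to keep only the edges lying on some directed path from the illegal address to a deposit address. The sketch below is a hedged illustration of that idea: the function name `prune_to_key_paths`, the address names, and the example edges are hypothetical and do not reproduce the actual topology of FIG. 4.

```python
def prune_to_key_paths(edges, source, deposit_addresses):
    """Keep edges on directed paths from `source` to any deposit address.

    edges: list of (sender, receiver) pairs describing fund flow.
    """
    forward, backward = {}, {}
    for sender, receiver in edges:
        forward.setdefault(sender, set()).add(receiver)
        backward.setdefault(receiver, set()).add(sender)

    def reachable(start_nodes, graph):
        # Iterative depth-first search over the given adjacency map.
        seen = set(start_nodes)
        stack = list(start_nodes)
        while stack:
            node = stack.pop()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    downstream = reachable([source], forward)          # where the funds can go
    upstream = reachable(deposit_addresses, backward)  # what can reach an exchange
    key_nodes = downstream & upstream
    return [(s, r) for s, r in edges if s in key_nodes and r in key_nodes]

# Hypothetical flows: a7 -> a9 -> a10 reaches the exchange, a7 -> a5 -> a6 does not.
flow_edges = [("a7", "a9"), ("a9", "a10"), ("a7", "a5"), ("a5", "a6")]
key_path = prune_to_key_paths(flow_edges, "a7", ["a10"])
```

In practice the set of deposit addresses would come from the node labels obtained in step 105, so that label classification and rule-based pruning reinforce each other.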
The embodiment also provides a computer device for executing the method of the embodiment.
Specifically, as shown in FIG. 5, the computer device includes a processor, an internal memory, and a system bus; various device components, including the internal memory and the processor, are connected to the system bus. A processor is hardware used to execute computer program instructions through basic arithmetic and logical operations in a computer system. An internal memory is a physical device used to temporarily or permanently store computer programs or data (e.g., program state information). The system bus may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus. The processor and the internal memory may be in data communication via the system bus. The internal memory includes read-only memory (ROM) or flash memory (not shown), and random access memory (RAM), which typically refers to the main memory into which the operating system and computer programs are loaded.
Computer devices typically include an external storage device. The external storage device may be selected from a variety of computer readable media, which refers to any available media that can be accessed by the computer device, including both removable and non-removable media. For example, computer-readable media includes, but is not limited to, flash memory (micro SD cards), CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer device.
A computer device may be logically connected in a network environment to one or more network terminals. The network terminal may be a personal computer, a server, a router, a smart phone, a tablet, or other common network node. The computer apparatus is connected to the network terminal through a network interface (local area network LAN interface). A Local Area Network (LAN) refers to a computer network formed by interconnecting within a limited area, such as a home, a school, a computer lab, or an office building using a network medium. WiFi and twisted pair wiring ethernet are the two most commonly used technologies to build local area networks.
It should be noted that other computer systems, including more or fewer subsystems than the computer device described here, can also be suitable for use with the invention.
As described in detail above, the computer device suited to this embodiment can perform the specified operations of the blockchain transaction topological graph analysis method. The computer device performs these operations in the form of software instructions, held in a computer-readable medium, that are executed by the processor. These software instructions may be read into memory from a storage device or from another device via a local area network interface. The software instructions stored in the memory cause the processor to perform the blockchain transaction topological graph analysis method described above. Furthermore, the present invention can be implemented by hardware circuits or by a combination of hardware circuits and software instructions. Thus, implementation of the present embodiments is not limited to any specific combination of hardware circuitry and software.
Example 2
This embodiment provides a blockchain transaction topological graph analysis apparatus based on a graph neural network. As shown in FIG. 6, the apparatus of this embodiment includes an acquisition unit, a feature extraction unit, a labeling unit, a model training unit, a classification unit and an identification unit.
The acquisition unit acquires a full-volume blockchain transaction network topological graph and extracts a target fund transaction network subgraph from the full-volume blockchain transaction network topological graph according to a risk control strategy of preset graph characteristics.
And the feature extraction unit performs feature extraction on the target fund transaction network subgraph, including node transaction behavior feature and relationship feature extraction between nodes.
And the labeling unit labels part of nodes in the target fund transaction network subgraph, and unmarked nodes are used as nodes to be identified.
The model training unit performs vector training on all nodes in the target fund transaction network subgraph by adopting a deep graph neural network model to obtain target feature vectors of all nodes of the target fund transaction network subgraph.
The classification unit acquires labels of all unknown label nodes (namely to-be-identified nodes) in the target fund transaction network subgraph according to target feature vectors of all nodes of the target fund transaction network subgraph, and classification of all nodes in the target fund transaction network subgraph is achieved.
And the identification unit obtains illegal fund transaction paths after the target fund transaction network is pruned according to all node labels and the service pruning strategy of the target fund transaction network subgraph.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.