CN113779085A

Movatterモバイル変換

Info

Publication number: CN113779085A
Application number: CN202110844838.9A
Authority: CN
Inventors: 邹磊; 曾立
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2021-12-10

Abstract

Translated fromChinese

本申请实施例提供一种获取同构子图的方法、装置、计算机设备及可读存储介质。方法包括：在第一查询图中确定当前查询的第一节点；响应于第一节点满足第一条件，确定第一节点的至少一个匹配节点，满足第一条件指第一节点的所有邻居节点确定了匹配节点，第一查询图中任一节点的匹配节点指任一节点在第一数据图中对应的具有相同结构的节点；基于查询顺序依次确定第一查询图节点中未确定匹配节点的第二节点的匹配节点；响应于第二节点确定了匹配节点，将第一节点与上述至少一个匹配节点进行匹配；基于匹配结果确定第一查询图的同构子图。该方法基于第一查询图中满足条件的第一节点，确定查询匹配方式，减少重复操作，提高获取同构子图的效率与灵活性。

Embodiments of the present application provide a method, an apparatus, a computer device, and a readable storage medium for acquiring an isomorphic subgraph. The method includes: determining a first node currently queried in a first query graph; in response to the first node satisfying a first condition, determining at least one matching node of the first node, and satisfying the first condition means that all neighbor nodes of the first node determine If there is a matching node, the matching node of any node in the first query graph refers to the node with the same structure corresponding to any node in the first data graph; the first query graph node is determined in turn based on the query sequence without determining the matching node. A matching node of two nodes; in response to the second node determining a matching node, matching the first node with the above at least one matching node; determining an isomorphic subgraph of the first query graph based on the matching result. The method determines the query matching mode based on the first node that satisfies the condition in the first query graph, reduces repeated operations, and improves the efficiency and flexibility of acquiring isomorphic subgraphs.

Description

Method and device for acquiring isomorphic subgraph, computer equipment and readable storage medium

Technical Field

The present application relates to the field of internet searching and mining technologies, and in particular, to a method and an apparatus for obtaining a isomorphic subgraph, a computer device, and a readable storage medium.

Background

Subgraph matching is a graph theory-based pattern recognition method commonly used in the field of internet search and mining, and is used for searching isomorphic subgraphs of a query graph in a data graph.

In the related art, an isomorphic subgraph of a query graph is generally acquired in a data graph based on a subgraph matching algorithm, and an algorithm framework of the subgraph matching algorithm is Depth-First Search and backtracking (Depth First Search). After determining the candidate nodes corresponding to the query nodes in the data graph, enumerating various combinations obtained by matching the query nodes in the query graph with the candidate nodes according to the matching sequence of the query nodes, and determining the effective combinations meeting the conditions, thereby obtaining all isomorphic subgraphs of the query graph in the data graph.

In the process of obtaining the isomorphic subgraph, the matching sequence of each query node in the query graph and the structures of the data graph and the query graph influence the speed of obtaining the isomorphic subgraph. The subgraph matching algorithm in the related technology determines that the matching sequence of each query node in the query graph is long, the speed of obtaining the isomorphic subgraph is low, the whole obtaining process is long, the average time of obtaining the isomorphic subgraph is long, and the obtaining efficiency is low.

Disclosure of Invention

The embodiment of the application provides a method and a device for acquiring isomorphic subgraphs, computer equipment and a readable storage medium, which can reduce repeated matching operation in the process of acquiring the isomorphic subgraphs and improve the efficiency and flexibility of acquiring the isomorphic subgraphs.

In one aspect, a method for obtaining a isomorphic subgraph is provided, and the method includes:

determining a first node of a current query in each node of the first query graph;

responding to that a first node meets a first condition, determining at least one matched node of the first node, wherein the first condition is that all neighbor nodes of the first node already determine the matched node, and the matched node of any node of a first query graph is a corresponding node with the same structure of any node in a first data graph;

determining matching nodes of second nodes in sequence based on the query sequence of each node of the first query graph, wherein the second nodes are nodes of which the matching nodes are not determined in the first query graph;

matching the first node with at least one matching node of the first nodes in response to the second nodes all determining the matching node;

based on the matching result, a homogeneous subgraph of the first query graph in the first data graph is determined.

In one possible implementation, determining at least one matching node of the first node includes:

determining all neighbor nodes of the first node in the first query graph, and acquiring the matched nodes of all the neighbor nodes; determining neighbor nodes of all matched nodes in the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing intersection operation on the set of the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.

In a possible implementation manner, before determining the first node of the current query among the nodes of the first query graph, the method further includes:

based on the width-first search sequence, sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from the initial node of the first query graph, and storing the matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node in the first query graph.

merging target nodes in the second query graph, and obtaining the first query graph according to a merging result, wherein the target nodes are a plurality of nodes which meet target conditions in the second query graph, the target conditions are that the plurality of nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same;

determining a isomorphic subgraph of the first query graph in the first data graph, comprising: a homogeneous subgraph of the second query graph in the first data graph is determined.

merging target nodes in the second data graph, and obtaining the first data graph according to a merging result, wherein the target nodes are a plurality of nodes which meet target conditions in the second data graph, the target conditions are that the plurality of nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same;

determining a isomorphic subgraph of the first query graph in the first data graph, comprising: a homogeneous subgraph of the first query graph in the second data graph is determined.

In a possible implementation manner, merging the target nodes in the second query graph, and obtaining the first query graph according to a merging result includes:

determining a plurality of target nodes in the second query graph based on the labels of all the nodes in the second query graph and the connection relation between all the nodes in the second query graph; and merging all the target nodes to obtain a first query graph according to a merging result.

determining a plurality of target nodes in the second query graph based on the labels of all the nodes in the second query graph and the connection relation between all the nodes in the second query graph; merging all leaf nodes in the plurality of target nodes, and merging non-leaf nodes in the plurality of target nodes based on a first qualifying condition; and determining a first query graph according to the merging result.

In a possible implementation manner, merging the target nodes in the second data graph, and obtaining the first data graph according to a merging result includes:

determining a plurality of target nodes in the second data graph based on the labels of all the nodes in the second data graph and the connection relation between all the nodes in the second data graph; and merging the target nodes based on a second limiting condition, and determining a first data graph according to a merging result.

In one aspect, an apparatus for obtaining isomorphic subgraphs is provided, and the apparatus includes:

the first determining module is used for determining a first node of the current query in each node of the first query graph;

a second determining module, configured to determine at least one matching node of the first node in response to that the first node satisfies a first condition, where the first condition is that all neighboring nodes of the first node have already determined the matching node, and a matching node of any node of the first query graph is a node of the same structure corresponding to any node in the first data graph;

a third determining module, configured to sequentially determine, based on a query sequence of each node of the first query graph, a matching node of a second node, where the second node is a node of the first query graph for which a matching node is not determined;

the matching module is used for responding to the fact that the second nodes determine the matching nodes, and matching the first node with at least one matching node of the first node;

and the subgraph determining module is used for determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result.

In a possible implementation manner, the second determining module is configured to determine all neighbor nodes of the first node in the first query graph, and obtain a matching node of each neighbor node; determining neighbor nodes of all matched nodes in the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing intersection operation on the set of the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.

In one possible implementation, the apparatus further includes: the switching module is used for sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from a starting node of the first query graph based on the breadth-first search sequence, and storing the matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node in the first query graph.

In one possible implementation, the apparatus further includes: the first merging module is used for merging target nodes in the second query graph and obtaining the first query graph according to merging results, wherein the target nodes are a plurality of nodes which meet target conditions in the second query graph, the target conditions are that the nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the nodes are the same; and the subgraph determining module is used for determining the isomorphic subgraph of the second query graph in the first data graph.

In one possible implementation, the apparatus further includes: the second merging module is used for merging the target nodes in the second data graph and obtaining the first data graph according to a merging result, wherein the target nodes are a plurality of nodes which meet target conditions in the second data graph, the target conditions include that the plurality of nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same; and the subgraph determining module is used for determining the isomorphic subgraph of the first query graph in the second data graph.

In a possible implementation manner, the first merging module is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; and merging all the target nodes to obtain a first query graph according to a merging result.

In a possible implementation manner, the first merging module is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; merging all leaf nodes in the plurality of target nodes, and merging non-leaf nodes in the plurality of target nodes based on a first qualifying condition; and determining a first query graph according to the merging result.

In a possible implementation manner, the second merging module is configured to determine a plurality of target nodes in the second data graph based on the labels of the nodes in the second data graph and the connection relationship between the nodes in the second data graph; and merging the target nodes based on a second limiting condition, and determining a first data graph according to a merging result.

In another aspect, a computer device is provided, where the electronic device includes a processor and a memory, where at least one program code or instruction is stored in the memory, and the at least one program code or instruction is loaded and executed by the processor, so as to enable the computer device to implement any one of the above methods for obtaining a homogenous sub-graph.

In another aspect, a computer-readable storage medium is provided, in which at least one program code or instruction is stored, the program code or instruction being loaded and executed by a processor to make a computer implement any of the above methods for obtaining a homogenous subgraph.

In another aspect, a computer program or a computer program product is provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement any one of the above methods for obtaining a homogenous sub-graph.

According to the technical scheme provided by the embodiment of the application, the query matching mode of the first data graph is further determined for the first node meeting the first condition in the first query graph, and a better query sequence can be selected, so that repeated matching operation is reduced, and the efficiency and the flexibility of obtaining the isomorphic graph are improved.

Drawings

Fig. 1 is a schematic diagram of a query graph according to an embodiment of the present application;

fig. 2 is a flowchart of a method for obtaining an isomorphic sub-graph according to an embodiment of the present application;

FIG. 3 is a diagram of a data graph provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of a merged query graph according to an embodiment of the present application;

FIG. 5 is a schematic diagram of another query graph provided in an embodiment of the present application;

fig. 6 is a schematic diagram of an apparatus for obtaining isomorphic subgraphs according to an embodiment of the present disclosure;

fig. 7 is a schematic diagram of an apparatus for obtaining isomorphic subgraphs according to an embodiment of the present disclosure;

fig. 8 is a schematic diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the technical method in the present application better understood, some embodiments of the present application will be described below with reference to the accompanying drawings. It is to be understood that, unless otherwise defined, all technical terms used in the embodiments of the present application have the same meaning as commonly understood by one of ordinary skill in the art, and are used only for the purpose of explanation of the embodiments of the present application, and are not intended to limit the present application.

In order to better understand the method provided by the embodiments of the present application, some definitions referred to in the relevant contents of the present application will be introduced first.

Definition 1, figure: a graph may be represented as G ═ V, E, L, where V is the set of all nodes,

is the set of all undirected edges, and L is a function that assigns labels to nodes and/or edges.

Illustratively, referring to fig. 1, the query graph in fig. 1 has 5 nodes, and there are links (i.e., edges) between some nodes, where "A, B, C" represents the label, i.e., attribute, of each node, and the label is used to indicate some property corresponding to the node. "u" is a unit₀、u₁、u₂、u₃、u₄"denotes the identity of each node, for example, the identity of a node may be the name of the node to uniquely identify the node.

Definition 2, subgraph: given a graph G ═ { V, E, L }, the subgraph of G can be represented as G '═ { V', E ', L' }, where the set of points V 'and the set of edges E' are subsets of V and E, respectively, can be represented as

And

also for the purpose of the tag function,

defining 3, deriving a subgraph: given a graph G ═ { V, E, L }, G' is the derived subgraph of G if and only if: according to the subgraph definition, G' is the subgraph of G;

v₂∈V′，

wherein v is₁，v₂Which are the nodes of the graph G,

indicating node v₁，v₂The edge in between.

Definition 4, graph isomorphism: given two graphs H and G, H and G are isomorphic and only if:

there is a bijective function f (denoted f: v (H) → v (G)) between the sets of points of H and G such that:

f (u) e V (G) and L_H(u)＝L_G(f(u))

And when the edge is provided with a label

And when the edge is provided with a label

Defining 5, search problem of subgraph isomorphism: given a query graph Q and a data graph G, the search problem of sub-graph isomorphism requires finding all sub-graphs G' in G that are isomorphic with Q. That is, at least one isomorphic sub-graph G' of the query graph Q is obtained in the data graph G. Where G' may be referred to as a match for Q.

The embodiment of the application provides a method for acquiring isomorphic subgraphs, which is applied to determining each isomorphic subgraph of a query graph in a data graph. Referring to fig. 2, the method for obtaining a isomorphic sub-graph disclosed in the embodiment of the present application includes, but is not limited to, the followingsteps 201 to 205.

Step 201: a first node of a current query is determined among the nodes of the first query graph.

Based on the above definition of the graph, the first query graph includes a plurality of nodes and a plurality of edges for connecting the nodes. The process of obtaining the isomorphic subgraph of the first query graph in the embodiment of the application comprises the following steps: and determining corresponding matching nodes of the first query graph in the first data graph, and determining the isomorphic sub-graph of the first query graph in the first data graph based on the matching nodes of the first query graph. The matching node of any node of the first query graph refers to a corresponding node with the same structure of the any node in the first data graph. Illustratively, the same structure indicates that any node of the first query graph has the same label as the corresponding matching node, and the number and label of all neighboring nodes connected by the edge of any node in the first query graph are also the same as the number and label of all neighboring nodes connected by the matching node in the first data graph.

In which the first node of the current query in the first query graph is determined based on the depth-first search order, and then the following

step

202 and 205 are performed. In a possible implementation manner, before thestep 201, the method further includes querying the first query graph based on a breadth-first search order, where the breadth-first search order indicates that, starting from a root node in the first query graph, querying is performed according to a hierarchy of nodes, that is, after all nodes in the same hierarchy are queried, querying is performed on nodes in a next hierarchy, and a queried matching result is stored in a query process, so that, in a process of querying and matching according to the breadth-first search order, an occupation of a memory by a matching result of a node is continuously increased. In this possible implementation, the process of obtaining a isomorphic sub-graph includes:

based on the width-first search sequence, sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from the initial node of the first query graph, and storing the matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching results of the nodes in the first query graph. The process of obtaining the isomorphic subgraph is described in the following steps 2011-2014.

Step 2011: and sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from the initial node of the first query graph based on the width-first search order, and storing the matching result of each node in the first query graph.

The starting node is determined in each node of the first query graph, and the determining manner of the starting node is not limited in the embodiment of the present application, and the starting node may be determined based on logic in an adopted algorithm. A node in the first query graph that is querying is determined, and at least one node in the first data graph that corresponds to the node that is querying is determined. In the process, the matching result of the node which is queried in the first query graph is stored in the memory. The nodes in the first data graph corresponding to the nodes queried in the first query graph refer to the nodes in the first data graph having the same labels as the nodes queried.

In a possible implementation manner, before this step 2011, the method further includes: the scale of the search space in the process of obtaining the isomorphic subgraph is estimated based on the first query graph and the first data graph, and the step 2011 is executed if the scale of the search space is smaller than the size of the memory. The search space is the space occupied by the matching results of each node in the first query graph in the storage width-first search order.

Step 2012: and if the storage space occupied by the corresponding node of each matched node reaches the storage threshold after the corresponding at least one node of the current query node in the first data graph is stored, determining the first node in the matched nodes in the first query graph.

In this possible implementation manner, the nodes are queried and matched in the first query graph, the matching result is stored in the content continuously, and the occupation of the matching result on the memory space is increased continuously as the nodes in the first query graph are queried and matched continuously. In a possible implementation manner, if the occupation of the storage space in the memory reaches the storage threshold after the node currently performing query is at least one node corresponding to the node in the first data graph, that is, after the matching result of the node currently performing query is stored, the first node currently performing matching is determined among the nodes already matched in the first query graph. Wherein the storage threshold is used to define a maximum value of the amount of data stored in the memory. Optionally, the storage threshold is an actual size of the memory storage space, or the threshold may be a maximum value of the amount of data allowed to be stored in the memory, which is set empirically.

Step 2013: based on the first node, determining a matching node of each node of the first query graph according to the depth-first search order.

In this step, in the case where the memory space is largely occupied, the first node is determined, and based on the first node, the matching node of each node of the first query graph is determined in the depth-first search order. The determination may be made with reference to step 202-205, described below.

Step 2014: and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node.

In a possible implementation manner, in the process of querying according to the depth-first search order, the memory continuously releases the storage space, and when the current storage space is lower than the storage threshold, the manner of querying according to the width-first search order can be recovered. In this possible implementation manner, in the process of obtaining the isomorphic subgraph, the alternating use of the width-first search order and the depth-first search order is realized.

In practical applications, the storage device for storing the nodes corresponding to the nodes of the first query graph in the first data graph may be an L3 memory in a central processor of a computer. In practical application, there are various algorithms to implement breadth-first search or depth-first search, and the embodiment of the present application does not limit the specific form of the algorithm.

In a possible implementation manner, before this step, based on the label of each node in the first query graph and the label of each node in the first data graph, at least one candidate node corresponding to any node in the first query graph is determined in the first data graph, where the candidate node refers to a node in the first data graph that has the same label as any node in the first query graph, and at least one candidate node of any node in the first query graph constitutes a candidate set of the any node. In this possible implementation manner, each node of the first query graph determines a matching node in the first data graph, and may be performed based on a candidate set corresponding to the node. In the process of obtaining the isomorphic subgraph, the candidate nodes of each node can be updated based on the matching conditions of other nodes.

In one possible implementation, beforestep 201, a first query graph is first determined based on the second query graph and/or a first data graph is determined based on the second data graph. In the case where a second query graph exists, the final determination is the isomorphic subgraph of the second query graph.

Based on this, step 201 is preceded by: and merging the target nodes in the second query graph, and obtaining the first data graph according to a merging result. The target nodes refer to a plurality of nodes which meet target conditions in the second query graph; the target condition is satisfied when the plurality of nodes have the same label and the same data amount, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same. In this possible approach, a homogenous sub-graph of the second query graph in the first data graph is finally determined.

Step 201 is preceded by: and for the target node in the second data graph, obtaining the first data graph according to the merging result. The target nodes refer to a plurality of nodes meeting target conditions in the second data graph; the target condition is satisfied when the plurality of nodes have the same label and the same data amount, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same. In this possible approach, a homogenous sub-graph of the first query graph in the second data graph is finally determined.

In one possible approach, beforestep 201, the target nodes in the second query graph and the second data graph are merged, respectively, and then the isomorphic subgraph of the second query graph in the second data graph is finally determined.

Beforestep 201, in the process of obtaining the first query graph, a plurality of target nodes in the second query graph are determined based on the labels of the nodes in the second query graph and the connection relationship between the nodes.

In one possible implementation, all the target nodes are merged, and the first query graph is obtained according to the merging result. Based on a plurality of nodes in the second query graph, in a possible implementation manner, there are a plurality of groups of similar nodes with different labels, illustratively, the labels of the nodes labeled 1, 2, and 5 in the second query graph are a, and the three nodes are all connected with one node labeled 3; in the second query graph, the labels of the nodes marked as 11, 12 and 16 are B, and the labels of the respective connection points of the 3 nodes are also the same, so that in the first query graph, the nodes 1, 2 and 5 are similar nodes to each other, the nodes 11, 12 and 16 are similar nodes, and in the merging process, the similar nodes with different labels are respectively merged.

In a possible implementation manner, the second query graph is a tree graph, and based on the property of the tree graph, each node of the second query graph may be divided into a leaf node and a non-leaf node, and then the target node in the second query graph may have both the leaf node and the non-leaf node. All leaf nodes in the target node are merged and non-leaf nodes in the target node are merged based on the first qualifying condition. Illustratively, the first defining conditions are as follows:

let SU represent the set of target nodes to be merged, | SU | represent the number of target nodes in SU, based on the properties of target nodes, no edges are connected between target nodes or a group is formed between target nodes. If SU is a clique, the more nodes, the stronger the pruning power. Whether to merge the target nodes is determined based on the following equations 1 and 2. Wherein, the group is composed of a plurality of nodes which are connected with each other pairwise.

Wherein c (SU) indicates the candidate set size, | SU | indicates the number of target nodes, isclique (SU) indicates whether the target node is clustered, wherein c (SU) indicates the candidate set of nodes in SU, "if SU for a clique" indicates "if the target node is clustered," isclique (SU) takes a value of 1 at this time, "others" indicates "otherwise," i.e., if the target node is not clustered, isclique (SU) takes a value of 0 at this time. The value of turbo (SU) is used to determine whether to perform the combining process, and may be referred to as a first combined value, and the smaller the value of turbo (SU) finally calculated, the more SU should be combined.

In a possible implementation manner, in the second query graph, the target node is a similar node that satisfies the target condition, and a node obtained after merging is called a virtual node.

Beforestep 201, in the process of obtaining the first data diagram, determining a plurality of target nodes in the second data diagram based on the labels of the nodes in the second data diagram and the connection relationship between the nodes; and merging the target nodes based on the second limiting condition, and obtaining a first data graph according to a merging result. Wherein the content of the second limiting condition is as follows:

wherein, SV represents a similar node set to be merged, | SV | indicates the number of target nodes, and d (SV) represents the reading of the nodes in the SV; IsClique (SV) indicates whether the target node is clustered, the determination mode of the value of IsClique (SV) is similar to the above formula 1, the clustering of similar nodes is 1, and otherwise, the value is 0. Based on the above formula, the value of boost (SV) is used to determine whether to perform the merging process, which may be referred to as a second merged value, and the larger the value of boost (SV), the more SV should be merged.

In this possible implementation, where the set of SVs is a clique, the number of edges that can be saved is:

(| SV | -1) × [ d (SV) - | SV | +1] + | SV | × (| SV | -1)/2 | (| SV | -1) [ d (SV) +0.5-0.5| SV | ] (formula 4)

In one possible implementation, where there is no clique between nodes in the set of SVs, i.e., there is no edge between nodes, the number of edges saved is (| SV | -1) × d (SV).

In a possible implementation manner, in the second data graph, the target node is an equivalent node that satisfies the target condition, and a node obtained after merging is called a super point.

In a possible implementation manner, before query, nodes of a second query graph are merged, and then a isomorphic sub graph of the second query graph in a first data graph is finally determined; in a possible implementation manner, before query, nodes in the second data graph are merged, and then an isomorphic subgraph of the first query graph in the second data graph is finally determined; in a possible implementation manner, before the query is performed, the nodes in the second query graph and the second data graph are merged, and then the isomorphic sub-graph of the second query graph in the second data graph is finally determined.

Step 202: and in response to the first node meeting a first condition, determining at least one matched node of the first node, wherein the meeting of the first condition means that all neighbor nodes of the first node have already determined the matched node, and the matched node of any node of the first query graph means that the any node corresponds to a node with the same structure in the first data graph.

In a possible implementation, the first node is preceded by a node that has been matched and a matching node is determined, and from the initial matching step, the nodes of the first query graph and the matching node form matching pairs, and the matching pairs form a matching set. Wherein the matching set can be set when the first query graph is not queried and set as an empty set.

Based on the first node determined instep 201, it is determined in thisstep 202 whether the first node satisfies a first condition. The first node satisfying the first condition may be referred to as a complete node. Satisfying the first condition means that all neighboring nodes of the first node have determined corresponding matching nodes, i.e., all neighboring nodes of the first node have completed matching before the first node. In a possible implementation manner, there is a node that has been matched before the first node, and a matching set is set, and if the first condition is satisfied, all the neighboring nodes of the first node belong to the matching set.

Illustratively, referring to the query graph presented in FIG. 1, queries are performed in a depth-first search order, i.e., in u₀、u₁、u₃、u₄、u₂Inquiring each node of the inquiry graph, wherein the current inquiry node is u₂The neighbor node of the node is u₀、u₁Based on the query sequence, it can be determined that all the neighboring nodes of the node are matched and the corresponding matched node is determined, then the node u₂Satisfying the first condition may be referred to as a complete node.

In one possible implementation manner, in response to the first node satisfying the first condition, the determining at least one matching node of the first node may include: determining all neighbor nodes of the first node in the first query graph, and acquiring the matched nodes of all the neighbor nodes; determining neighbor nodes of all matched nodes based on the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing set intersection operation on the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.

Exemplarily, referring to fig. 1, u is determined₂To satisfy the firstThe first node of the condition, i.e. the perfect node, then in this way the node u is determined₂Is based on u₂Of neighbor node u₀And u₁Determining the matched node; referring to the data diagram G given in FIG. 3, node u₀And u₁The corresponding matching nodes are respectively v₀And v₁₀₀。

Based on data graphs G, v₀Of neighboring nodes { v } of neighboring nodes₁，v₂，v₁₀₀，v₂₀₂}，v₁₀₀Of neighboring nodes { v } of neighboring nodes₀，v₂₀₀，v₂₀₁，v₂₀₂Performing intersection operation on the set of the neighbor nodes of the two matched nodes to obtain an intersection operation result { v }₂₀₂H, the intersection operation result includes a node v₂₀₂As node u₂The matching node of (2).

In a possible implementation manner, if the first node does not satisfy the first condition, a candidate set corresponding to the first node is determined, and a node satisfying the matching condition is determined as a matching node of the first node in the candidate set. The matching condition is satisfied: the connection relationship between the first node and the node in the first data graph, which has completed matching (the edge between the first node and the node, which has completed matching), exists in the connection relationship between the matching node and the node in the first data graph, which has been determined as the matching node (the edge between the matching node and the other matching node).

Step 203: and sequentially determining the matching nodes of a second node based on the query sequence of each node in the first query graph, wherein the second node is the node which is used for determining the matching nodes in the nodes of the first query graph.

After at least one matching node of the first node is determined, determining a next node to be queried in the first query graph according to the query sequence of the depth-first search, and determining one matching node of the node based on the candidate set of the node in response to the first node not meeting the first condition. The determination process may refer to the description of this case instep 203 above, and then determine the next node to query.

Therefore, in this step, the second nodes of the matching nodes in the first query graph are not determined according to the query sequence of the depth-first search, and the number of the second nodes is at least one.

In a possible implementation manner, in the process of sequentially determining the matching nodes of the second nodes, when one second node satisfies the first condition, at least one matching node of the second node is determined based on the manner instep 202; then, based on the manner ofstep 203, a matching node which is subsequently the second node for matching is determined.

In a possible implementation manner, before determining a second node, it is determined whether each node of the first query graph is queried completely, if so, the second node is not determined, and step 204 is executed to obtain an isomorphic sub-graph; otherwise, a second node is determined, and then a matching node for the second node is determined. The exemplary method for judging whether the query of each node of the first query graph is completed includes: under the condition that a matching set exists, judging whether the number of matching pairs in the matching set is the same as the number of nodes in the first data graph or not, and if so, judging whether all the nodes of the first query graph are queried completely; otherwise, continuing to determine the second node.

Step 204: and matching the first node with at least one matching node of the first nodes in response to the second nodes determining the matching nodes.

In this step, each node of the first query graph is matched with a node of the first data graph, and other nodes except the first node determine a matching node, and then the first node is matched with a node of at least one matching node of the first node. In one possible implementation, at least one matching node of the first node forms a set, and step 204 is to expand the set so that the first node is matched with nodes in the set respectively.

Step 205: based on the matching result, a homogeneous subgraph of the first query graph in the first data graph is determined.

In the above process, all nodes except the first node in the first query graph determine corresponding matching nodes, and based on the matching result of the first node and the at least one matching node in theabove step 204, at least one isomorphic sub-graph of the first query graph in the first data graph is determined.

In one possible implementation, there may be a plurality of nodes satisfying the first condition in the first query graph, and each node satisfying the first condition determines at least one matching node with reference to step 202. In this possible implementation manner, each node satisfying the first condition is respectively matched with at least one corresponding matching node, and then at least one isomorphic sub-graph of the first query graph in the first data graph is determined based on a matching result of each node satisfying the first condition. Wherein, there may be multiple matching results of a node satisfying the first condition, and then determining at least one isomorphic sub-graph of the first query graph in the first data graph based on the matching results of the nodes satisfying the first condition includes: and performing full-permutation processing on the matching results of the nodes meeting the first condition, enumerating the combination condition of the matching results, and obtaining at least one isomorphic subgraph of the first query graph in the first data graph.

The order of the above implementation steps is to explain the method for obtaining a homogenous sub-graph provided in the embodiment of the present application, and is not limited to various possible implementations. In practical applications, the above various possible implementations may be combined based on actual situations. In a possible implementation manner, after the query graph and/or the data graph are merged, in the query process, it may be performed that whether the first node currently queried satisfies the first condition is not determined, but the matching is directly performed in a manner that the first node does not satisfy the first condition.

According to the method for obtaining the isomorphic sub-graph, the query matching mode of the first data graph is further determined by judging the condition met by the first node of the current query, a better query sequence can be selected, repeated matching operation is reduced, and the efficiency and flexibility of the method for obtaining the isomorphic sub-graph are improved. In addition, the nodes in the query graph and the data graph can be merged, so that the number of the nodes in the query graph and the data graph is simplified, the efficiency of the method for acquiring the isomorphic subgraph is improved, and the query time is reduced; and various query and search sequences can be mixed for use, the utilization of the storage space is optimized, and the flexibility of the method for acquiring the isomorphic subgraph is improved.

In order to better understand the method for acquiring the isomorphic subgraph provided by the embodiment of the application, the method for acquiring the isomorphic subgraph is described based on the above embodiment.

In an exemplary embodiment, a conventional method for obtaining isomorphic subgraphs based on a depth-first search order is described based on a query graph Q in fig. 1 and a data graph G in fig. 3, and the method includes the following steps 301 to 329.

Step 301, a matching set is set, and the matching set is used for storing each match determined in the query process. The partial match set is denoted as M, and M is initialized as an empty set.

Step 302, determining at least one candidate node corresponding to each node in the query graph Q in the data graph G, and determining a candidate set corresponding to each node in the query graph Q based on the obtained candidate nodes, wherein the candidate set is represented as C (u)_i) Wherein u is_iIs the identification of the nodes in the query graph. The nodes in the candidate set and the nodes corresponding to the candidate set have the same label, the nodes in the candidate set are the nodes in the data graph, and the nodes corresponding to the candidate set are the nodes in the query graph.

Illustratively, node u in the query graph₀The corresponding candidate node in the data graph G is v₂₀₃And v₀Then node u₀Is C (u)₀)＝{v₀，v₂₀₃}; then the candidate sets corresponding to other nodes in the query graph Q are:

C(u₁)＝{v₁，v₂，v₁₀₀}

C(u₂)＝C(u₃)＝C(u₄)＝{v₁₀₁，v₁₀₂，v₂₀₀，v₂₀₁，v₂₀₂}

step 303, comparing the number of the matching pairs in the matching set M with the number of the nodes in the query graph, and if the number of the matching pairs in the matching set M is equal to the number of the nodes in the query graph, outputting the matching pairs in the matching set to obtain a isomorphic subgraph of the query graph; otherwise, continuing to execute the step of searching and matching. The number of matching pairs in the matching set is expressed as | M |, and the number of nodes in the query graph is expressed as | V (Q) |.

Exemplarily, in this embodiment, the search matching step is not performed yet, the number of matches in the matching set M is 0, the number of nodes in the query graph Q is 5, and the subsequent search matching step is continued if the number is different.

Step 304, determining a first query node of the current query in the query graph, updating a candidate set corresponding to the first query node to obtain an updated candidate set, where the updated candidate set may be denoted as C_R(u_i)。

Illustratively, the first query node u₀After updating the candidate set, the first query node u₀The corresponding candidate node is not changed, and the content of the updated candidate set is the same as that before the update, namely C_R(u₀)＝C(u₀)＝{v₀，v₂₀₃}。

Step 305, traverse the first query node u₀Determining one of the matching nodes in the updated candidate set.

Exemplarily, in the embodiment of the present application, v is determined first₀As u₀The matching node of (2).

Step 306, obtaining a matching pair based on the first query node and the matching node of the first query node, and adding the matching pair to the matching set to obtain an updated matching set.

Illustratively, u is determined₀Has a matching node of v₀To obtain a matching pair of (u)₀，v₀) (ii) a Adding the matching pair into a matching set, wherein the updated matching set is as follows: m { (u)₀，v₀)}。

Step 307, comparing the number of the matching pairs in the matching set with the number of the nodes in the query graph, and determining the subsequent steps to be executed.

In the present exemplary embodiment, when | M | ═ 1, | v (q) | ═ 5, and the number of matching pairs in the matching set is not the same as the number of nodes in the query graph, the search matching operation is continued.

And 308, determining a second query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the second query node to obtain an updated candidate set.

In the exemplary embodiment, the second query node is determined to be one and the first query node u₀Directly connected neighbor node u₁U is the₁The original corresponding candidate set is: c (u)₁)＝{v₁，v₂，v₁₀₀}. Based on the above steps, when v is determined₀In the case of (u)₁If the candidate set of the second query node does not change, the updated candidate set of the second query node is consistent with the candidate set before updating, which is C_R(u₁)＝C(u₁)＝{v₁，v₂，v₁₀₀}。

Step 309, traversing the updated candidate set of the second query node, and determining a matching node corresponding to the second query node, where the matching node is a candidate node satisfying the matching condition in the updated candidate set.

The matching conditions are as follows: an edge between the second query node and a node of the query graph for which a matching node has been determined (a node of the query graph for which a matching set has been stored) also exists between the matching node and a node that has been determined to be a matching node (a matching node in a match for which a partial match set has been stored). The target condition may be expressed as:

and u and v indicate the query node and the post-candidate node which are currently subjected to search matching.

In the embodiment of the present application, it is assumed that the first candidate node to search for a match in the updated candidate set is v₁₀₀Based on v₁₀₀The matching conditions are met:

then a candidate node v may be determined₁₀₀Is u₁The matching node of (2).

And 310, obtaining a matching pair based on the second query node and the matching node of the second query node, and adding the matching pair into the matching set to obtain an updated matching set.

Illustratively, a second query node u is determined₁Corresponding matching node is v₁₀₀To obtain a matching pair of (u)₁，v₁₀₀) (ii) a Adding the match to the current match set, the updated match set is: m { (u)₀，v₀)，(u₁，v₁₀₀)}。

Step 311, determining that | M | ═ 2 and | v (q) | ═ 5 currently, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to perform the search matching operation.

And step 312, determining a third query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the third query node to obtain an updated candidate set.

In the exemplary embodiment, the third query node is determined to be one and the second query node u₁Directly connected neighbor node u₃，u₃The original candidate set is: c (u)₃)＝{v₁₀₁，v₁₀₂，v₂₀₀，v₂₀₁，v₂₀₂}. Based on the steps, a second query node u is determined₁Has a matching node of v₁₀₀Based on the data map G, u₃Will not match to the candidate node v₁₀₁And v₁₀₂Then get the updated candidate set as C_R(u₃)＝{v₂₀₀，v₂₀₁，v₂₀₂}。

Step 313, determining a third query node u in the updated candidate set₃Assuming that the first candidate node for search matching in this step is v₂₀₀，v₂₀₀Satisfy the matching condition

Then a third query node u is determined₃Has a matching node of v₂₀₀。

Step 314, based on the third query node u₃And matching node is v₂₀₀Form a new matching pair of (u)₃，v₂₀₀) If the matching pair is added to the current matching set M, the updated partial matching set is: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)}。

Step 315, determining that | M | ═ 3 and | v (q) | ═ 5 currently, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to execute the search matching operation.

Step 316, determine the fourth query node u₄Determining u based on the above matching process₄The corresponding updated candidate set is C_R(u₄)＝{v₂₀₁，v₂₀₂}。

Step 317, suppose that the first candidate node which is currently subjected to search matching is v₂₀₁，v₂₀₁Satisfy the matching condition

Then a fourth query node u is determined₄Has a matching node of v₂₀₁。

Step 318, based on u₄And v₂₀₁Form a new matching pair of (u)₄，v₂₀₁) If the match is added to the current matching set, the updated matching set is: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)}。

Step 319, determining that the current | M | ═ 4 and | v (q) | ═ 5 are different in number, and continuing to execute the search matching operation.

Step 320, determine the fifth query node u₂Determining u based on the above matching process₂Corresponding toThe updated candidate set is C_R(u₂)＝{v₂₀₂}。

Step 321, v₂₀₂Satisfy the matching condition

Then a fifth query node u is determined₂Has a matching node of v₂₀₂。

Step 322, based on u₂And v₂₀₂Form a new matching pair of (u)₂，v₂₀₂) If the match is added to the current matching set M, the updated matching set is: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)，(u₂，v₂₀₂)}。

Step 323, judging that | M | ═ 5, | v (Q) | ═ 5, at this time, the number of matches in the matching set is the same as the number of nodes in the query graph, and outputting all matching pairs in M to obtain an isomorphic subgraph of the data graph Q. I.e. (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)，(u₂，v₂₀₂)。

Step 324, release matching pairs (u) in M₄，v₂₀₁)，(u₂，v₂₀₂) Proceed back to step 316.

Step 325, search for v in the query candidate set₂₀₂Whether a matching condition is satisfied, and in response to it satisfying a target condition, determining v₂₀₂For the fourth query node u₄Generating a new matching pair by the matching nodes, and updating the matching set, wherein the updated matching set is as follows: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₂)}。

Step 326 determines that the current | M | ═ 4, | v (q) | ═ 5, and the number is different, and continues to perform the search matching operation.

Step 327, determine the fifth query node u₂Determining u based on the above matching process₂The corresponding updated candidate set is C_R(u₂)＝{v₂₀₁}. At this time, node v₂₀₁The matching condition is not satisfied.

Step 328, release (u) of M₄，v₂₀₂) And (u)₃，v₂₀₀) Proceed back to step 312.

Step 329, search if other candidate nodes in the candidate set are with the third query node u₃And matching, and executing corresponding search query steps according to the content.

In the subsequent process, a search query mode and a backtracking query mode are continuously executed, all combination conditions of matching between each query node and the candidate nodes in the query graph are enumerated, and isomorphic subgraphs of all the query graphs are obtained. In this exemplary embodiment, in addition to the isomorphic sub-graph obtained in step 323, another isomorphic sub-graph is obtained in the subsequent query process, where the matching pairs output in the matching set are: (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₁)，(u₄，v₂₀₀)，(u₂，v₂₀₂)。

In an exemplary embodiment, the nodes in the query graph Q are merged, and the merged nodes may be referred to as virtual nodes.

Illustratively, based on the query graph Q of FIG. 1 above, where u₃And u₄For similar nodes, u is₃And u₄Merge to get virtual node u', get updated query graph Q as shown in FIG. 4₁Then, a homogeneous subgraph process of the query graph Q is obtained, which may include the following steps 401 and 420.

Step 401, a matching set is set, and the matching set is used for storing each matching determined in the query process. The partial match set is denoted as M, and M is initialized as an empty set.

Step 402, determining an updated query graph Q in the data graph G₁At least one candidate node corresponding to each node, and determining a candidate set corresponding to each node based on the obtained candidate nodes, wherein the candidate set is represented as C (u)_i) Wherein u is_iIs the identity of the node.

Illustratively, node u₀The corresponding candidate node in the data graph G is v₂₀₃And v₀Then node u₀Is C (u)₀)＝{v₀，v₂₀₃}; then the candidate sets corresponding to other nodes in the query graph Q are:

C(u₁)＝{v₁，v₂，v₁₀₀}

C(u₂)＝C(u′)＝{v₁₀₁，v₁₀₂，v₂₀₀，v₂₀₁，v₂₀₂}

in the present exemplary embodiment, node u is paired₀And u₁The search matching process of (1) comprises steps 403-411 which are the same as steps 303-311 of the method described above.

Step 412, determining a third query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the third query node to obtain an updated candidate set.

In the exemplary embodiment, the third query node is determined to be one and the second query node u₁The original candidate set of the directly connected neighbor nodes u ', u' is: c (u') ═ v₁₀₁，v₁₀₂，v₂₀₀，v₂₀₁，v₂₀₂}. Based on the steps, a second query node u is determined₁Has a matching node of v₁₀₀Based on the data graph G, u' will not match the candidate node v₁₀₁And v₁₀₂Then get the updated candidate set as C_R(u′)＝{v₂₀₀，v₂₀₁，v₂₀₂}。

Step 413, determine the matching node of the third query node u' in the updated candidate set, traverse the candidate set C_RCandidate node in (u'), v₂₀₀、v₂₀₁And v₂₀₂All satisfy the matching condition. Based on the inclusion of the query node u in the third query node u' (virtual node)₃And u₄In this step, the node u is determined by expansion₃And u₄The matching pair of (2) has the following six possibilities:

the method may comprise the following steps of 1: (u)₃，v₂₀₀)，(u₄，v₂₀₁) (ii) a The method may comprise the following steps of 2: (u)₃，v₂₀₀)，(u₄，v₂₀₂)；

It is possible to have 3: (u)₃，v₂₀₁)，(u₄，v₂₀₀) (ii) a The probability of 4: (u)₃，v₂₀₁)，(u₄，v₂₀₂)；

And possibly 5: (u)₃，v₂₀₂)，(u₄，v₂₀₁) (ii) a And 6: (u)₃，v₂₀₂)，(u₄，v₂₀₀)。

Assume that a possible 1 is selected for subsequent search matching operations.

Step 414, determining the current matching set M as: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)}。

Step 415, determining that the current | M | ═ 4, | v (q) | ═ 5, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to execute the search matching operation.

Step 416, determine the next query node u₂Determining u based on the above matching process₂The corresponding updated candidate set is C_R(u₂)＝{v₂₀₂}。

Step 417, v₂₀₂Satisfy the matching condition

Then a fifth query node u is determined₂Has a matching node of v₂₀₂。

Step 418, based on u₂And v₂₀₂Form a new match of (u)₂，v₂₀₂) To prepare theAnd matching and adding the current matching set, wherein the updated matching set is as follows: m { (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)，(u₂，v₂₀₂)}。

Step 419 determines that the current | M | ═ 5, | v (Q) | ═ 5, and at this time, the number of matches in the matching set is the same as the number of nodes in the query graph, and outputs all the matching pairs in M, so as to obtain an isomorphic subgraph of the data graph Q. I.e. (u)₀，v₀)，(u₁，v₁₀₀)，(u₃，v₂₀₀)，(u₄，v₂₀₁)，(u₂，v₂₀₂)。

Step 420, releasing the matching pair (u) in M₃，v₂₀₀)，(u₄，v₂₀₁)，(u₂，v₂₀₂) Tracing back to step 413, another possibility is determined in step 413, and then a search query operation is performed. After all six possible searches in step 413 are completed, the process returns to step 410, and the search query node forms matched search query results with other candidate nodes. In the process of enumerating all search results in this step, all isomorphic subgraphs of the query graph are obtained.

In the exemplary embodiment, the nodes in the data graph G are merged, and the resulting point after merging is called a super point. The process of the present exemplary embodiment is similar to the above process, and the description of the present exemplary embodiment is omitted here.

In an exemplary embodiment, the points in the query graph Q and the data graph G are merged at the same time.

For the improved method, a method for acquiring the isomorphic subgraph applied during enumeration is as follows: assuming that the current virtual node (node resulting from merging in the data graph) contains k real nodes, the currently considered combination is { (h)_i，n_i) 1 < i < m, where m is the number of different vertices (nodes in the data graph that result from the merge), and n is_iIs the number of times the overshoot hi occurs in this combination. Then the two processes of enumerating the permutations of real nodes in the query graph and enumerating the combination of real nodes in the over-point can be performedIs fused, the effect is equivalent to that for any 1 < i < m, n is taken out from the real node set of the beyond point hi_iAnd combining the nodes without repetition to make a full array.

Illustratively, u in the query graph₁、u₂And u₃Belongs to the same virtual node u', the nodes are connected with each other by edges, and the corresponding candidate set is h₁And h₂，h₁And h₂Is a super point, where h₁Containing node v₁，v₂，h₂Containing node v₃，v₄，v₅. The corresponding candidate node of the virtual node u' is the over point h₁And h₂。

In one possible implementation, the matching of the real node and the over point in the virtual node is determined as follows: m_s＝{(u₁，h₁)，(u₂，h₁)，(u₃，h₂)}. From { v₁，v₂There are 1 fetch for two nodes from { v }₃，v₄，v₅There are three methods for taking one node. And can determine the matching M of the real node over-point in the virtual node_sThere are 3 kinds. Then there is 3! 6 kinds. The method has clear logic, simple flow, easy realization and high process efficiency.

In the above exemplary embodiment, the nodes in the query graph and the data graph are merged, so that the number of the nodes in the query graph and the data graph is simplified, the efficiency of the method for obtaining the isomorphic subgraph is improved, and the query time is reduced.

In an exemplary embodiment, a first node of a current query, which may be referred to as a complete node, satisfies a first condition.

In this exemplary embodiment, a complete node is defined: given a query graph Q and a matching order π, let M_cRepresents the set of matched nodes in the query graph, when the first node u (u ∈ Q) is matched, if N (u) is contained in M_cIf yes, the first node is called a complete node, where N (u) represents a set of neighboring nodes of the first node u. I.e. the neighbor nodes of the first node currently queried in the query graphAnd determining corresponding matching nodes, forming matching pairs and adding a matching set, and then calling the first node of the current query as a complete node. In other words, all edges of the complete node are connected by the matched set.

Illustratively, in one possible implementation, referring to step 320 in the above exemplary embodiment of depth-first search, the fifth query node u is determined in step 320₂. Known based on the search query procedure in the previous step, and u₂Connected neighbor query node u₀And u₁All have determined corresponding matches, the query node u₂I.e. a complete node.

Illustratively, in the case where the first node satisfies the first condition, in this manner, the determination of the matching node for the complete node may be generated based on the matching nodes of the neighboring nodes connected to the complete node.

Illustratively, see step 20 in the exemplary embodiment of the depth-first search backtracking section above, in which node u is determined₂Is a complete node, then in this manner, the node u is determined₂Is based on u₂Neighbor query node u of₀And u₁Is matched with v₀And v₁₀₀The determination is made.

Based on data graphs G, v₀Form a set of neighboring nodes v₁，v₂，v₁₀₀，v₂₀₂}，v₁₀₀Form a set of neighboring nodes v₀，v₂₀₀，v₂₀₁，v₂₀₂Get the intersection of the two sets to get the node v₂₀₂I.e. determining node u₂Has a matching node of v₂₀₂。

In the possible implementation manner, the first node is judged to be a complete node, at least one matching node of the complete node is determined based on the manner, the at least one matching node is stored firstly, then, the second node which is used for matching in the matching query graph is searched continuously, after the matching nodes of all the second nodes are obtained, the at least one matching node of the complete node is expanded and matched with the first node, and a final isomorphic sub-graph is obtained.

Illustratively, based on the query graph Q given in FIG. 5₂And isomorphic query is carried out in a certain data graph by adopting the depth-first search method. At node u₄In the case of the first node of the current query, assume that node u₄Is a complete node (i.e. u)₁And u₃All have determined corresponding matching nodes), u₁And u₃Neighbor node determination u of corresponding matching node₄The matching node of (2).

In an exemplary embodiment, if multiple corresponding matching nodes are determined based on intersection operations, then u is the basis₄Generating a set C of at least one matching node_M. Storing the set, and then sequentially searching subsequent query nodes u₂、u₅Searching and matching are carried out, and the node u is successfully determined₂、u₅In the case of a matching node of (3), C is set_MNode u and node n₃And (4) forming a final matching pair by expanding and matching, and then determining a isomorphic subgraph of the data graph.

In the above exemplary embodiment, by determining the condition that the first node of the current query satisfies, and further determining the query matching manner for the first data graph, a better query sequence may be selected to reduce repeated matching operations, and improve the efficiency and flexibility of the method for obtaining the isomorphic graph.

In an exemplary embodiment, a breadth-first search and a depth-first search are used in combination. See query graph Q in fig. 1 and data graph G in fig. 3. The process of the present exemplary embodiment includes the following steps 1 to 5.

Step 1, determining a candidate set of each node in the query graph Q, which is performed in the same manner as in the foregoing embodiment.

Illustratively, C (u)₀)＝{v₀，v₂₀₃}C(u₁)＝{v₁，v₂，v₁₀₀}

Step 2, determining a first query node u based on the determined candidate set of each node₀The matching pair possibly corresponding is (u)₀，v₀) Or (u)₀，v₂₀₃) (ii) a The two possible matching pairs are stored in a memory. The storage may be in the form of a table.

And 3, determining a matching pair to search and match the subsequent nodes based on the determined candidate set and the possible matching pairs formed in the step 1.

In one possible implementation, a matching pair (u) is selected₀，v₀) Determining at the second query node u₁In (u)₀，v₀) In this case, a matching pair may be generated.

In the matching pair of (u)₀，v₀) In case of u₁The corresponding candidate set is { v }₁，v₂，v₁₀₀And then possible formed matching pairs include: (u)₀，v₁)、(u₀，v₂) And (u)₀，v₁₀₀)；

And 4, when the size of the intermediate table reaches a given memory threshold, indicating that the memory space is excessively occupied. Then the method of depth-first search and backtracking is performed on the matching pairs in each row of the current intermediate table, and independent search is performed based on each path.

And step 5, in the process of depth-first search and backtracking, continuously releasing the memory space, and if the current memory occupation is lower than a threshold value, switching the expansion mode of the current path back to the mode of width-first search and backtracking.

In the above exemplary embodiment, a plurality of query search sequences are mixed, the utilization of the storage space is optimized, the flexibility of the method for obtaining the isomorphic subgraph is improved, and the application range of the method is expanded.

In a possible implementation manner, different data sets are used as query graphs to verify the effectiveness of the method for obtaining the isomorphic subgraph provided by the embodiment of the application. In an exemplary embodiment, Email and DBLP (DataBase systems and Logic Programming) datasets are chosen. The Email data set represents the network formed by Email communication, and has 36692 nodes and 183831 edges, and the maximum degree is 1383. The DBLP dataset represents a document reference network with 317080 nodes and 1049866 edges, with a maximum of 343 degrees. In this exemplary embodiment, both data sets are power-law graphs, and the distribution rule of degrees thereof satisfies the power-law distribution. Specific information for the two Data sets is given in Table 1 below, while information for other Data sets (e.g., Yeast Data Set; Human pose Data Set) is also included in Table 1, but was not selected for validation testing because of the smaller size of the other Data sets.

TABLE 1

Data set	Number of nodes	Number of edges	Number of point tags	Maximum degree	Mean degree of
						Yeast Data Set	3112	12519	179	168	8.0
Human	4674	86282	88	771	36.9
						Email	36692	183831	20	1383	10.0
DBLP	317080	1049866	20	343	6.6

In this exemplary experiment, 100 query graphs were generated for each data set, and the statistical average query time was used as a measure. Where the time limit for one query graph is 10 minutes, the top 10 is found⁵The individual isomorphic subgraphs stop matching.

In practical applications, there are a variety of algorithms that can implement the depth-first search framework, which may include qsi (quicksift), gql (graphql), tbi (turboiso), CFL (CFL-Match), VF3, CECI, DAF. In the exemplary test, the query effect of the method for obtaining the isomorphic subgraph provided by the embodiment of the application is respectively counted and executed on the basis of the algorithm. Tables 2 and 3 below are statistics of the validation performed in the Email dataset and the DBLP dataset, respectively.

TABLE 2

TABLE 3

In tables 2 and 3, the + NECa indicates a query mode in which nodes in the query graph are merged; + ISO^bThe query mode is used for indicating a query mode for merging the data graphs on the basis of merging the query graphs; the + ORDc indicates that a complete node judgment query mode is adopted on the basis of merging the query graph and the data graph; + MIX^dThe method adopts a plurality of searching order mixed searching modes on the basis of the above steps. The acceleration ratios in tables 2 and 3 are calculated by dividing the average query time of the previous query mode by the average query time of the current query mode. As shown in Table 3, on the basis of QSL algorithm, + ISO^bCorresponding acceleration ratio "1.3 ×", average query time by the former query pattern (+ NECA query time 1.9) and average query time by the current query pattern (+ ISO)^b1.5) is obtained by division calculation (1.9/1.5).

Based on the data of the acceleration ratios in table 2 and table 3, it can be known that the method for obtaining the isomorphic subgraph provided by the embodiment of the application can obtain a better acceleration effect.

In an embodiment of the present application, a device for obtaining subgraph isomorphism, referring to fig. 6, includes:

a first determiningmodule 601, configured to determine a first node of a current query among nodes of a first query graph;

a second determiningmodule 602, configured to determine at least one matching node of the first node in response to that the first node satisfies a first condition, where the first condition is that all neighboring nodes of the first node have already determined the matching node, and a matching node of any node of the first query graph is a corresponding node of the same structure of any node in the first data graph;

a third determiningmodule 603, configured to sequentially determine, based on a query sequence of each node in the first query graph, a matching node of a second node, where the second node is a node of the first query graph for which a matching node is not determined;

amatching module 604, configured to match the first node with at least one matching node of the first nodes in response to the second nodes all determining the matching node;

asubgraph determination module 605, configured to determine a homogeneous subgraph of the first query graph in the first data graph based on the matching result.

In a possible implementation manner, the second determiningmodule 602 is configured to determine all neighbor nodes of the first node in the first query graph, and obtain a matching node of each neighbor node; determining neighbor nodes of all matched nodes in the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing intersection operation on the set of the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.

In one possible implementation, referring to fig. 7, the apparatus further includes: a switchingmodule 606, configured to sequentially determine, based on the breadth-first search order, at least one node corresponding to each node in the first query graph in the first data graph from a start node of the first query graph, and store a matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node in the first query graph.

In one possible implementation, the apparatus further includes: afirst merging module 607, configured to merge target nodes in the second query graph, and obtain the first query graph according to a merging result, where the target nodes are multiple nodes in the second query graph that meet a target condition, where meeting the target condition means that the multiple nodes have the same label and neighbor nodes with the same data size, and the labels of the neighbor nodes of different nodes in the multiple nodes are the same; asubgraph determining module 605, configured to determine a isomorphic subgraph of the second query graph in the first data graph.

In a possible implementation manner, the apparatus further includes: asecond merging module 608, configured to merge target nodes in the second data graph, and obtain the first data graph according to a merging result, where the target nodes are multiple nodes in the second data graph that meet a target condition, where meeting the target condition means that the multiple nodes have the same label and neighbor nodes with the same data size, and the labels of the neighbor nodes of different nodes in the multiple nodes are the same; asubgraph determining module 605, configured to determine a isomorphic subgraph of the first query graph in the second data graph.

In a possible implementation manner, thefirst merging module 607 is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; and merging all the target nodes to obtain a first query graph according to a merging result.

In a possible implementation manner, thefirst merging module 607 is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; merging all leaf nodes in the plurality of target nodes, and merging non-leaf nodes in the plurality of target nodes based on a first qualifying condition; and determining a first query graph according to the merging result.

In a possible implementation manner, thesecond merging module 608 is configured to determine a plurality of target nodes based on the labels of the nodes in the second data graph and the connection relationships between the nodes in the second data graph; and merging the target nodes based on a second limiting condition, and determining a first data graph according to a merging result.

It should be noted that, the device for obtaining sub-graph isomorphism provided in the foregoing embodiment is only illustrated by the division of each functional module, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above.

In another aspect, a computer device is provided, which includes a processor and a memory, where at least one program code or instruction is stored in the memory, and the at least one program code or instruction is loaded and executed by the processor, so as to enable the computer device to implement any one of the above methods for obtaining a homogenous subgraph.

Fig. 8 is a schematic structural diagram of a computer device that may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 801 and one ormore memories 802, where the one ormore memories 802 store at least one program instruction, and the at least one program instruction is loaded and executed by the one ormore processors 801 to implement the method for obtaining an isomorphic graph provided in the foregoing method embodiments. Certainly, the computer device may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the computer device may further include other components for implementing the functions of the device, which is not described herein again. The computer device may be a server, and the embodiment of the present application may be executed in a central processing unit in the server.

It should be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

The term "at least one" in this application means one or more, and the term "plurality" in this application means two or more.

It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should be understood that determining B from a does not mean determining B from a alone, but may also be determined from a and/or other information.

It is also to be understood that the terms "if" and "if" may be interpreted to mean "when" ("where" or "upon") or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined," or "if [ a stated condition or event ] is detected," may be interpreted to mean "upon determining," or "in response to determining," or "upon detecting [ a stated condition or event ], or" in response to detecting [ a stated condition or event ] ", depending on the context.

It should also be appreciated that reference throughout this specification to "one embodiment," "an embodiment," "one possible implementation" means that a particular feature, structure, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "one possible implementation" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A method of obtaining isomorphic subgraphs, the method comprising:

determining at least one matching node of the first node in response to the first node satisfying a first condition, wherein the first condition is that all neighbor nodes of the first node have determined matching nodes, and the matching node of any node of the first query graph is a corresponding node with the same structure in a first data graph of the any node;

matching the first node with at least one matching node of the first nodes in response to the second nodes each determining a matching node;

and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result.

2. The method of claim 1, wherein determining at least one matching node for the first node comprises:

determining all neighbor nodes of the first node in the first query graph, and acquiring a matched node of each neighbor node;

determining neighbor nodes of all matched nodes in the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node;

and performing intersection operation on the set of the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.

3. The method according to claim 1 or 2, wherein before determining the first node of the current query among the nodes of the first query graph, further comprising:

based on a breadth-first search sequence, sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from a starting node of the first query graph, and storing a matching result of each node in the first query graph;

if after storing at least one node corresponding to the node currently performing query in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold, determining the first node in the matched nodes in the first query graph;

determining a matching node of each node of the first query graph according to a depth-first search order based on the first node;

and determining a isomorphic sub-graph of the first query graph in the first data graph based on the matching result of each node in the first query graph.

4. The method according to claim 1 or 2, wherein before determining the first node of the current query among the nodes of the first query graph, further comprising:

merging target nodes in a second query graph, and obtaining a first query graph according to a merging result, wherein the target nodes are a plurality of nodes which meet target conditions in the second query graph, the target conditions are that the nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the nodes are the same;

the determining a homogeneous subgraph of the first query graph in the first data graph comprises:

determining a homogenous sub-graph of the second query graph in the first data graph.

5. The method according to claim 1 or 2, wherein before determining the first node of the current query among the nodes of the first query graph, further comprising:

merging target nodes in a second data graph, and obtaining a first data graph according to a merging result, wherein the target nodes are a plurality of nodes which meet target conditions in the second data graph, the target conditions are that the plurality of nodes have the same label and neighbor nodes with the same data volume, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same;

determining a homogenous sub-graph of the first query graph in the second data graph.

6. The method of claim 4, wherein merging the target nodes in the second query graph to obtain the first query graph according to a merging result comprises:

determining a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relation between the nodes in the second query graph;

and merging all the target nodes, and obtaining the first query graph according to a merging result.

7. The method of claim 4, wherein merging the target nodes in the second query graph to obtain the first query graph according to a merging result comprises:

merging all leaf nodes in the plurality of target nodes, and merging non-leaf nodes in the plurality of target nodes based on a first qualifying condition; and determining the first query graph according to the merging result.

8. The method of claim 5, wherein merging the target nodes in the second data graph to obtain the first data graph according to a merging result comprises:

determining a plurality of target nodes in the second data graph based on the labels of the nodes in the second data graph and the connection relation between the nodes in the second data graph;

and merging the target nodes based on a second limiting condition, and determining the first data graph according to a merging result.

9. An apparatus for obtaining isomorphic subgraphs, the apparatus comprising:

a second determining module, configured to determine at least one matching node of the first node in response to that the first node satisfies a first condition, where the first condition is that all neighboring nodes of the first node have already determined matching nodes, and a matching node of any node of the first query graph is a corresponding node of the any node in the first data graph and having the same structure;

a matching module, configured to match the first node with at least one matching node of the first nodes in response to the second nodes each determining a matching node;

and the subgraph determining module is used for determining the isomorphic subgraph of the first query graph in the first data graph based on the matching result.

10. A computer device, characterized in that the apparatus comprises a processor and a memory, wherein at least one program code or instruction is stored in the memory, and the processor loads and executes the at least one program code or instruction to cause the computer device to implement the method of acquiring a homogenous sub-graph according to any one of claims 1 to 8.

11. A computer-readable storage medium, in which at least one program code or instructions is stored, which is loaded and executed by a processor, to cause a computer to implement the method of retrieving a homogenous sub-graph according to any of claims 1-8.