Detailed Description
In order to make the technical method in the present application better understood, some embodiments of the present application will be described below with reference to the accompanying drawings. It is to be understood that, unless otherwise defined, all technical terms used in the embodiments of the present application have the same meaning as commonly understood by one of ordinary skill in the art, and are used only for the purpose of explanation of the embodiments of the present application, and are not intended to limit the present application.
In order to better understand the method provided by the embodiments of the present application, some definitions referred to in the relevant contents of the present application will be introduced first.
Definition 1, figure: a graph may be represented as G ═ V, E, L, where V is the set of all nodes,
is the set of all undirected edges, and L is a function that assigns labels to nodes and/or edges.
Illustratively, referring to fig. 1, the query graph in fig. 1 has 5 nodes, and there are links (i.e., edges) between some nodes, where "A, B, C" represents the label, i.e., attribute, of each node, and the label is used to indicate some property corresponding to the node. "u" is a unit0、u1、u2、u3、u4"denotes the identity of each node, for example, the identity of a node may be the name of the node to uniquely identify the node.
Definition 2, subgraph: given a graph G ═ { V, E, L }, the subgraph of G can be represented as G '═ { V', E ', L' }, where the set of points V 'and the set of edges E' are subsets of V and E, respectively, can be represented as
And
also for the purpose of the tag function,
defining 3, deriving a subgraph: given a graph G ═ { V, E, L }, G' is the derived subgraph of G if and only if: according to the subgraph definition, G' is the subgraph of G;
v
2∈V′,
wherein v is
1,v
2Which are the nodes of the graph G,
indicating node v
1,v
2The edge in between.
Definition 4, graph isomorphism: given two graphs H and G, H and G are isomorphic and only if:
there is a bijective function f (denoted f: v (H) → v (G)) between the sets of points of H and G such that:
f (u) e V (G) and L
H(u)=L
G(f(u))
And when the edge is provided with a label
And when the edge is provided with a label
Defining 5, search problem of subgraph isomorphism: given a query graph Q and a data graph G, the search problem of sub-graph isomorphism requires finding all sub-graphs G' in G that are isomorphic with Q. That is, at least one isomorphic sub-graph G' of the query graph Q is obtained in the data graph G. Where G' may be referred to as a match for Q.
The embodiment of the application provides a method for acquiring isomorphic subgraphs, which is applied to determining each isomorphic subgraph of a query graph in a data graph. Referring to fig. 2, the method for obtaining a isomorphic sub-graph disclosed in the embodiment of the present application includes, but is not limited to, the followingsteps 201 to 205.
Step 201: a first node of a current query is determined among the nodes of the first query graph.
Based on the above definition of the graph, the first query graph includes a plurality of nodes and a plurality of edges for connecting the nodes. The process of obtaining the isomorphic subgraph of the first query graph in the embodiment of the application comprises the following steps: and determining corresponding matching nodes of the first query graph in the first data graph, and determining the isomorphic sub-graph of the first query graph in the first data graph based on the matching nodes of the first query graph. The matching node of any node of the first query graph refers to a corresponding node with the same structure of the any node in the first data graph. Illustratively, the same structure indicates that any node of the first query graph has the same label as the corresponding matching node, and the number and label of all neighboring nodes connected by the edge of any node in the first query graph are also the same as the number and label of all neighboring nodes connected by the matching node in the first data graph.
In which the first node of the current query in the first query graph is determined based on the depth-first search order, and then the followingstep 202 and 205 are performed. In a possible implementation manner, before thestep 201, the method further includes querying the first query graph based on a breadth-first search order, where the breadth-first search order indicates that, starting from a root node in the first query graph, querying is performed according to a hierarchy of nodes, that is, after all nodes in the same hierarchy are queried, querying is performed on nodes in a next hierarchy, and a queried matching result is stored in a query process, so that, in a process of querying and matching according to the breadth-first search order, an occupation of a memory by a matching result of a node is continuously increased. In this possible implementation, the process of obtaining a isomorphic sub-graph includes:
based on the width-first search sequence, sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from the initial node of the first query graph, and storing the matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching results of the nodes in the first query graph. The process of obtaining the isomorphic subgraph is described in the following steps 2011-2014.
Step 2011: and sequentially determining at least one node corresponding to each node in the first query graph in the first data graph from the initial node of the first query graph based on the width-first search order, and storing the matching result of each node in the first query graph.
The starting node is determined in each node of the first query graph, and the determining manner of the starting node is not limited in the embodiment of the present application, and the starting node may be determined based on logic in an adopted algorithm. A node in the first query graph that is querying is determined, and at least one node in the first data graph that corresponds to the node that is querying is determined. In the process, the matching result of the node which is queried in the first query graph is stored in the memory. The nodes in the first data graph corresponding to the nodes queried in the first query graph refer to the nodes in the first data graph having the same labels as the nodes queried.
In a possible implementation manner, before this step 2011, the method further includes: the scale of the search space in the process of obtaining the isomorphic subgraph is estimated based on the first query graph and the first data graph, and the step 2011 is executed if the scale of the search space is smaller than the size of the memory. The search space is the space occupied by the matching results of each node in the first query graph in the storage width-first search order.
Step 2012: and if the storage space occupied by the corresponding node of each matched node reaches the storage threshold after the corresponding at least one node of the current query node in the first data graph is stored, determining the first node in the matched nodes in the first query graph.
In this possible implementation manner, the nodes are queried and matched in the first query graph, the matching result is stored in the content continuously, and the occupation of the matching result on the memory space is increased continuously as the nodes in the first query graph are queried and matched continuously. In a possible implementation manner, if the occupation of the storage space in the memory reaches the storage threshold after the node currently performing query is at least one node corresponding to the node in the first data graph, that is, after the matching result of the node currently performing query is stored, the first node currently performing matching is determined among the nodes already matched in the first query graph. Wherein the storage threshold is used to define a maximum value of the amount of data stored in the memory. Optionally, the storage threshold is an actual size of the memory storage space, or the threshold may be a maximum value of the amount of data allowed to be stored in the memory, which is set empirically.
Step 2013: based on the first node, determining a matching node of each node of the first query graph according to the depth-first search order.
In this step, in the case where the memory space is largely occupied, the first node is determined, and based on the first node, the matching node of each node of the first query graph is determined in the depth-first search order. The determination may be made with reference to step 202-205, described below.
Step 2014: and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node.
In a possible implementation manner, in the process of querying according to the depth-first search order, the memory continuously releases the storage space, and when the current storage space is lower than the storage threshold, the manner of querying according to the width-first search order can be recovered. In this possible implementation manner, in the process of obtaining the isomorphic subgraph, the alternating use of the width-first search order and the depth-first search order is realized.
In practical applications, the storage device for storing the nodes corresponding to the nodes of the first query graph in the first data graph may be an L3 memory in a central processor of a computer. In practical application, there are various algorithms to implement breadth-first search or depth-first search, and the embodiment of the present application does not limit the specific form of the algorithm.
In a possible implementation manner, before this step, based on the label of each node in the first query graph and the label of each node in the first data graph, at least one candidate node corresponding to any node in the first query graph is determined in the first data graph, where the candidate node refers to a node in the first data graph that has the same label as any node in the first query graph, and at least one candidate node of any node in the first query graph constitutes a candidate set of the any node. In this possible implementation manner, each node of the first query graph determines a matching node in the first data graph, and may be performed based on a candidate set corresponding to the node. In the process of obtaining the isomorphic subgraph, the candidate nodes of each node can be updated based on the matching conditions of other nodes.
In one possible implementation, beforestep 201, a first query graph is first determined based on the second query graph and/or a first data graph is determined based on the second data graph. In the case where a second query graph exists, the final determination is the isomorphic subgraph of the second query graph.
Based on this, step 201 is preceded by: and merging the target nodes in the second query graph, and obtaining the first data graph according to a merging result. The target nodes refer to a plurality of nodes which meet target conditions in the second query graph; the target condition is satisfied when the plurality of nodes have the same label and the same data amount, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same. In this possible approach, a homogenous sub-graph of the second query graph in the first data graph is finally determined.
Step 201 is preceded by: and for the target node in the second data graph, obtaining the first data graph according to the merging result. The target nodes refer to a plurality of nodes meeting target conditions in the second data graph; the target condition is satisfied when the plurality of nodes have the same label and the same data amount, and the labels of the neighbor nodes of different nodes in the plurality of nodes are the same. In this possible approach, a homogenous sub-graph of the first query graph in the second data graph is finally determined.
In one possible approach, beforestep 201, the target nodes in the second query graph and the second data graph are merged, respectively, and then the isomorphic subgraph of the second query graph in the second data graph is finally determined.
Beforestep 201, in the process of obtaining the first query graph, a plurality of target nodes in the second query graph are determined based on the labels of the nodes in the second query graph and the connection relationship between the nodes.
In one possible implementation, all the target nodes are merged, and the first query graph is obtained according to the merging result. Based on a plurality of nodes in the second query graph, in a possible implementation manner, there are a plurality of groups of similar nodes with different labels, illustratively, the labels of the nodes labeled 1, 2, and 5 in the second query graph are a, and the three nodes are all connected with one node labeled 3; in the second query graph, the labels of the nodes marked as 11, 12 and 16 are B, and the labels of the respective connection points of the 3 nodes are also the same, so that in the first query graph, the nodes 1, 2 and 5 are similar nodes to each other, the nodes 11, 12 and 16 are similar nodes, and in the merging process, the similar nodes with different labels are respectively merged.
In a possible implementation manner, the second query graph is a tree graph, and based on the property of the tree graph, each node of the second query graph may be divided into a leaf node and a non-leaf node, and then the target node in the second query graph may have both the leaf node and the non-leaf node. All leaf nodes in the target node are merged and non-leaf nodes in the target node are merged based on the first qualifying condition. Illustratively, the first defining conditions are as follows:
let SU represent the set of target nodes to be merged, | SU | represent the number of target nodes in SU, based on the properties of target nodes, no edges are connected between target nodes or a group is formed between target nodes. If SU is a clique, the more nodes, the stronger the pruning power. Whether to merge the target nodes is determined based on the following equations 1 and 2. Wherein, the group is composed of a plurality of nodes which are connected with each other pairwise.
Wherein c (SU) indicates the candidate set size, | SU | indicates the number of target nodes, isclique (SU) indicates whether the target node is clustered, wherein c (SU) indicates the candidate set of nodes in SU, "if SU for a clique" indicates "if the target node is clustered," isclique (SU) takes a value of 1 at this time, "others" indicates "otherwise," i.e., if the target node is not clustered, isclique (SU) takes a value of 0 at this time. The value of turbo (SU) is used to determine whether to perform the combining process, and may be referred to as a first combined value, and the smaller the value of turbo (SU) finally calculated, the more SU should be combined.
In a possible implementation manner, in the second query graph, the target node is a similar node that satisfies the target condition, and a node obtained after merging is called a virtual node.
Beforestep 201, in the process of obtaining the first data diagram, determining a plurality of target nodes in the second data diagram based on the labels of the nodes in the second data diagram and the connection relationship between the nodes; and merging the target nodes based on the second limiting condition, and obtaining a first data graph according to a merging result. Wherein the content of the second limiting condition is as follows:
wherein, SV represents a similar node set to be merged, | SV | indicates the number of target nodes, and d (SV) represents the reading of the nodes in the SV; IsClique (SV) indicates whether the target node is clustered, the determination mode of the value of IsClique (SV) is similar to the above formula 1, the clustering of similar nodes is 1, and otherwise, the value is 0. Based on the above formula, the value of boost (SV) is used to determine whether to perform the merging process, which may be referred to as a second merged value, and the larger the value of boost (SV), the more SV should be merged.
In this possible implementation, where the set of SVs is a clique, the number of edges that can be saved is:
(| SV | -1) × [ d (SV) - | SV | +1] + | SV | × (| SV | -1)/2 | (| SV | -1) [ d (SV) +0.5-0.5| SV | ] (formula 4)
In one possible implementation, where there is no clique between nodes in the set of SVs, i.e., there is no edge between nodes, the number of edges saved is (| SV | -1) × d (SV).
In a possible implementation manner, in the second data graph, the target node is an equivalent node that satisfies the target condition, and a node obtained after merging is called a super point.
In a possible implementation manner, before query, nodes of a second query graph are merged, and then a isomorphic sub graph of the second query graph in a first data graph is finally determined; in a possible implementation manner, before query, nodes in the second data graph are merged, and then an isomorphic subgraph of the first query graph in the second data graph is finally determined; in a possible implementation manner, before the query is performed, the nodes in the second query graph and the second data graph are merged, and then the isomorphic sub-graph of the second query graph in the second data graph is finally determined.
Step 202: and in response to the first node meeting a first condition, determining at least one matched node of the first node, wherein the meeting of the first condition means that all neighbor nodes of the first node have already determined the matched node, and the matched node of any node of the first query graph means that the any node corresponds to a node with the same structure in the first data graph.
In a possible implementation, the first node is preceded by a node that has been matched and a matching node is determined, and from the initial matching step, the nodes of the first query graph and the matching node form matching pairs, and the matching pairs form a matching set. Wherein the matching set can be set when the first query graph is not queried and set as an empty set.
Based on the first node determined instep 201, it is determined in thisstep 202 whether the first node satisfies a first condition. The first node satisfying the first condition may be referred to as a complete node. Satisfying the first condition means that all neighboring nodes of the first node have determined corresponding matching nodes, i.e., all neighboring nodes of the first node have completed matching before the first node. In a possible implementation manner, there is a node that has been matched before the first node, and a matching set is set, and if the first condition is satisfied, all the neighboring nodes of the first node belong to the matching set.
Illustratively, referring to the query graph presented in FIG. 1, queries are performed in a depth-first search order, i.e., in u0、u1、u3、u4、u2Inquiring each node of the inquiry graph, wherein the current inquiry node is u2The neighbor node of the node is u0、u1Based on the query sequence, it can be determined that all the neighboring nodes of the node are matched and the corresponding matched node is determined, then the node u2Satisfying the first condition may be referred to as a complete node.
In one possible implementation manner, in response to the first node satisfying the first condition, the determining at least one matching node of the first node may include: determining all neighbor nodes of the first node in the first query graph, and acquiring the matched nodes of all the neighbor nodes; determining neighbor nodes of all matched nodes based on the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing set intersection operation on the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.
Exemplarily, referring to fig. 1, u is determined2To satisfy the firstThe first node of the condition, i.e. the perfect node, then in this way the node u is determined2Is based on u2Of neighbor node u0And u1Determining the matched node; referring to the data diagram G given in FIG. 3, node u0And u1The corresponding matching nodes are respectively v0And v100。
Based on data graphs G, v0Of neighboring nodes { v } of neighboring nodes1,v2,v100,v202},v100Of neighboring nodes { v } of neighboring nodes0,v200,v201,v202Performing intersection operation on the set of the neighbor nodes of the two matched nodes to obtain an intersection operation result { v }202H, the intersection operation result includes a node v202As node u2The matching node of (2).
In a possible implementation manner, if the first node does not satisfy the first condition, a candidate set corresponding to the first node is determined, and a node satisfying the matching condition is determined as a matching node of the first node in the candidate set. The matching condition is satisfied: the connection relationship between the first node and the node in the first data graph, which has completed matching (the edge between the first node and the node, which has completed matching), exists in the connection relationship between the matching node and the node in the first data graph, which has been determined as the matching node (the edge between the matching node and the other matching node).
Step 203: and sequentially determining the matching nodes of a second node based on the query sequence of each node in the first query graph, wherein the second node is the node which is used for determining the matching nodes in the nodes of the first query graph.
After at least one matching node of the first node is determined, determining a next node to be queried in the first query graph according to the query sequence of the depth-first search, and determining one matching node of the node based on the candidate set of the node in response to the first node not meeting the first condition. The determination process may refer to the description of this case instep 203 above, and then determine the next node to query.
Therefore, in this step, the second nodes of the matching nodes in the first query graph are not determined according to the query sequence of the depth-first search, and the number of the second nodes is at least one.
In a possible implementation manner, in the process of sequentially determining the matching nodes of the second nodes, when one second node satisfies the first condition, at least one matching node of the second node is determined based on the manner instep 202; then, based on the manner ofstep 203, a matching node which is subsequently the second node for matching is determined.
In a possible implementation manner, before determining a second node, it is determined whether each node of the first query graph is queried completely, if so, the second node is not determined, and step 204 is executed to obtain an isomorphic sub-graph; otherwise, a second node is determined, and then a matching node for the second node is determined. The exemplary method for judging whether the query of each node of the first query graph is completed includes: under the condition that a matching set exists, judging whether the number of matching pairs in the matching set is the same as the number of nodes in the first data graph or not, and if so, judging whether all the nodes of the first query graph are queried completely; otherwise, continuing to determine the second node.
Step 204: and matching the first node with at least one matching node of the first nodes in response to the second nodes determining the matching nodes.
In this step, each node of the first query graph is matched with a node of the first data graph, and other nodes except the first node determine a matching node, and then the first node is matched with a node of at least one matching node of the first node. In one possible implementation, at least one matching node of the first node forms a set, and step 204 is to expand the set so that the first node is matched with nodes in the set respectively.
Step 205: based on the matching result, a homogeneous subgraph of the first query graph in the first data graph is determined.
In the above process, all nodes except the first node in the first query graph determine corresponding matching nodes, and based on the matching result of the first node and the at least one matching node in theabove step 204, at least one isomorphic sub-graph of the first query graph in the first data graph is determined.
In one possible implementation, there may be a plurality of nodes satisfying the first condition in the first query graph, and each node satisfying the first condition determines at least one matching node with reference to step 202. In this possible implementation manner, each node satisfying the first condition is respectively matched with at least one corresponding matching node, and then at least one isomorphic sub-graph of the first query graph in the first data graph is determined based on a matching result of each node satisfying the first condition. Wherein, there may be multiple matching results of a node satisfying the first condition, and then determining at least one isomorphic sub-graph of the first query graph in the first data graph based on the matching results of the nodes satisfying the first condition includes: and performing full-permutation processing on the matching results of the nodes meeting the first condition, enumerating the combination condition of the matching results, and obtaining at least one isomorphic subgraph of the first query graph in the first data graph.
The order of the above implementation steps is to explain the method for obtaining a homogenous sub-graph provided in the embodiment of the present application, and is not limited to various possible implementations. In practical applications, the above various possible implementations may be combined based on actual situations. In a possible implementation manner, after the query graph and/or the data graph are merged, in the query process, it may be performed that whether the first node currently queried satisfies the first condition is not determined, but the matching is directly performed in a manner that the first node does not satisfy the first condition.
According to the method for obtaining the isomorphic sub-graph, the query matching mode of the first data graph is further determined by judging the condition met by the first node of the current query, a better query sequence can be selected, repeated matching operation is reduced, and the efficiency and flexibility of the method for obtaining the isomorphic sub-graph are improved. In addition, the nodes in the query graph and the data graph can be merged, so that the number of the nodes in the query graph and the data graph is simplified, the efficiency of the method for acquiring the isomorphic subgraph is improved, and the query time is reduced; and various query and search sequences can be mixed for use, the utilization of the storage space is optimized, and the flexibility of the method for acquiring the isomorphic subgraph is improved.
In order to better understand the method for acquiring the isomorphic subgraph provided by the embodiment of the application, the method for acquiring the isomorphic subgraph is described based on the above embodiment.
In an exemplary embodiment, a conventional method for obtaining isomorphic subgraphs based on a depth-first search order is described based on a query graph Q in fig. 1 and a data graph G in fig. 3, and the method includes the following steps 301 to 329.
Step 301, a matching set is set, and the matching set is used for storing each match determined in the query process. The partial match set is denoted as M, and M is initialized as an empty set.
Step 302, determining at least one candidate node corresponding to each node in the query graph Q in the data graph G, and determining a candidate set corresponding to each node in the query graph Q based on the obtained candidate nodes, wherein the candidate set is represented as C (u)i) Wherein u isiIs the identification of the nodes in the query graph. The nodes in the candidate set and the nodes corresponding to the candidate set have the same label, the nodes in the candidate set are the nodes in the data graph, and the nodes corresponding to the candidate set are the nodes in the query graph.
Illustratively, node u in the query graph0The corresponding candidate node in the data graph G is v203And v0Then node u0Is C (u)0)={v0,v203}; then the candidate sets corresponding to other nodes in the query graph Q are:
C(u1)={v1,v2,v100}
C(u2)=C(u3)=C(u4)={v101,v102,v200,v201,v202}
step 303, comparing the number of the matching pairs in the matching set M with the number of the nodes in the query graph, and if the number of the matching pairs in the matching set M is equal to the number of the nodes in the query graph, outputting the matching pairs in the matching set to obtain a isomorphic subgraph of the query graph; otherwise, continuing to execute the step of searching and matching. The number of matching pairs in the matching set is expressed as | M |, and the number of nodes in the query graph is expressed as | V (Q) |.
Exemplarily, in this embodiment, the search matching step is not performed yet, the number of matches in the matching set M is 0, the number of nodes in the query graph Q is 5, and the subsequent search matching step is continued if the number is different.
Step 304, determining a first query node of the current query in the query graph, updating a candidate set corresponding to the first query node to obtain an updated candidate set, where the updated candidate set may be denoted as CR(ui)。
Illustratively, the first query node u0After updating the candidate set, the first query node u0The corresponding candidate node is not changed, and the content of the updated candidate set is the same as that before the update, namely CR(u0)=C(u0)={v0,v203}。
Step 305, traverse the first query node u0Determining one of the matching nodes in the updated candidate set.
Exemplarily, in the embodiment of the present application, v is determined first0As u0The matching node of (2).
Step 306, obtaining a matching pair based on the first query node and the matching node of the first query node, and adding the matching pair to the matching set to obtain an updated matching set.
Illustratively, u is determined0Has a matching node of v0To obtain a matching pair of (u)0,v0) (ii) a Adding the matching pair into a matching set, wherein the updated matching set is as follows: m { (u)0,v0)}。
Step 307, comparing the number of the matching pairs in the matching set with the number of the nodes in the query graph, and determining the subsequent steps to be executed.
In the present exemplary embodiment, when | M | ═ 1, | v (q) | ═ 5, and the number of matching pairs in the matching set is not the same as the number of nodes in the query graph, the search matching operation is continued.
And 308, determining a second query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the second query node to obtain an updated candidate set.
In the exemplary embodiment, the second query node is determined to be one and the first query node u0Directly connected neighbor node u1U is the1The original corresponding candidate set is: c (u)1)={v1,v2,v100}. Based on the above steps, when v is determined0In the case of (u)1If the candidate set of the second query node does not change, the updated candidate set of the second query node is consistent with the candidate set before updating, which is CR(u1)=C(u1)={v1,v2,v100}。
Step 309, traversing the updated candidate set of the second query node, and determining a matching node corresponding to the second query node, where the matching node is a candidate node satisfying the matching condition in the updated candidate set.
The matching conditions are as follows: an edge between the second query node and a node of the query graph for which a matching node has been determined (a node of the query graph for which a matching set has been stored) also exists between the matching node and a node that has been determined to be a matching node (a matching node in a match for which a partial match set has been stored). The target condition may be expressed as:
and u and v indicate the query node and the post-candidate node which are currently subjected to search matching.
In the embodiment of the present application, it is assumed that the first candidate node to search for a match in the updated candidate set is v
100Based on v
100The matching conditions are met:
then a candidate node v may be determined
100Is u
1The matching node of (2).
And 310, obtaining a matching pair based on the second query node and the matching node of the second query node, and adding the matching pair into the matching set to obtain an updated matching set.
Illustratively, a second query node u is determined1Corresponding matching node is v100To obtain a matching pair of (u)1,v100) (ii) a Adding the match to the current match set, the updated match set is: m { (u)0,v0),(u1,v100)}。
Step 311, determining that | M | ═ 2 and | v (q) | ═ 5 currently, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to perform the search matching operation.
And step 312, determining a third query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the third query node to obtain an updated candidate set.
In the exemplary embodiment, the third query node is determined to be one and the second query node u1Directly connected neighbor node u3,u3The original candidate set is: c (u)3)={v101,v102,v200,v201,v202}. Based on the steps, a second query node u is determined1Has a matching node of v100Based on the data map G, u3Will not match to the candidate node v101And v102Then get the updated candidate set as CR(u3)={v200,v201,v202}。
Step 313, determining a third query node u in the updated candidate set
3Assuming that the first candidate node for search matching in this step is v
200,v
200Satisfy the matching condition
Then a third query node u is determined
3Has a matching node of v
200。
Step 314, based on the third query node u3And matching node is v200Form a new matching pair of (u)3,v200) If the matching pair is added to the current matching set M, the updated partial matching set is: m { (u)0,v0),(u1,v100),(u3,v200)}。
Step 315, determining that | M | ═ 3 and | v (q) | ═ 5 currently, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to execute the search matching operation.
Step 316, determine the fourth query node u4Determining u based on the above matching process4The corresponding updated candidate set is CR(u4)={v201,v202}。
Step 317, suppose that the first candidate node which is currently subjected to search matching is v
201,v
201Satisfy the matching condition
Then a fourth query node u is determined
4Has a matching node of v
201。
Step 318, based on u4And v201Form a new matching pair of (u)4,v201) If the match is added to the current matching set, the updated matching set is: m { (u)0,v0),(u1,v100),(u3,v200),(u4,v201)}。
Step 319, determining that the current | M | ═ 4 and | v (q) | ═ 5 are different in number, and continuing to execute the search matching operation.
Step 320, determine the fifth query node u2Determining u based on the above matching process2Corresponding toThe updated candidate set is CR(u2)={v202}。
Step 321, v
202Satisfy the matching condition
Then a fifth query node u is determined
2Has a matching node of v
202。
Step 322, based on u2And v202Form a new matching pair of (u)2,v202) If the match is added to the current matching set M, the updated matching set is: m { (u)0,v0),(u1,v100),(u3,v200),(u4,v201),(u2,v202)}。
Step 323, judging that | M | ═ 5, | v (Q) | ═ 5, at this time, the number of matches in the matching set is the same as the number of nodes in the query graph, and outputting all matching pairs in M to obtain an isomorphic subgraph of the data graph Q. I.e. (u)0,v0),(u1,v100),(u3,v200),(u4,v201),(u2,v202)。
Step 324, release matching pairs (u) in M4,v201),(u2,v202) Proceed back to step 316.
Step 325, search for v in the query candidate set202Whether a matching condition is satisfied, and in response to it satisfying a target condition, determining v202For the fourth query node u4Generating a new matching pair by the matching nodes, and updating the matching set, wherein the updated matching set is as follows: m { (u)0,v0),(u1,v100),(u3,v200),(u4,v202)}。
Step 326 determines that the current | M | ═ 4, | v (q) | ═ 5, and the number is different, and continues to perform the search matching operation.
Step 327, determine the fifth query node u2Determining u based on the above matching process2The corresponding updated candidate set is CR(u2)={v201}. At this time, node v201The matching condition is not satisfied.
Step 328, release (u) of M4,v202) And (u)3,v200) Proceed back to step 312.
Step 329, search if other candidate nodes in the candidate set are with the third query node u3And matching, and executing corresponding search query steps according to the content.
In the subsequent process, a search query mode and a backtracking query mode are continuously executed, all combination conditions of matching between each query node and the candidate nodes in the query graph are enumerated, and isomorphic subgraphs of all the query graphs are obtained. In this exemplary embodiment, in addition to the isomorphic sub-graph obtained in step 323, another isomorphic sub-graph is obtained in the subsequent query process, where the matching pairs output in the matching set are: (u)0,v0),(u1,v100),(u3,v201),(u4,v200),(u2,v202)。
In an exemplary embodiment, the nodes in the query graph Q are merged, and the merged nodes may be referred to as virtual nodes.
Illustratively, based on the query graph Q of FIG. 1 above, where u3And u4For similar nodes, u is3And u4Merge to get virtual node u', get updated query graph Q as shown in FIG. 41Then, a homogeneous subgraph process of the query graph Q is obtained, which may include the following steps 401 and 420.
Step 401, a matching set is set, and the matching set is used for storing each matching determined in the query process. The partial match set is denoted as M, and M is initialized as an empty set.
Step 402, determining an updated query graph Q in the data graph G1At least one candidate node corresponding to each node, and determining a candidate set corresponding to each node based on the obtained candidate nodes, wherein the candidate set is represented as C (u)i) Wherein u isiIs the identity of the node.
Illustratively, node u0The corresponding candidate node in the data graph G is v203And v0Then node u0Is C (u)0)={v0,v203}; then the candidate sets corresponding to other nodes in the query graph Q are:
C(u1)={v1,v2,v100}
C(u2)=C(u′)={v101,v102,v200,v201,v202}
in the present exemplary embodiment, node u is paired0And u1The search matching process of (1) comprises steps 403-411 which are the same as steps 303-311 of the method described above.
Step 412, determining a third query node in the data graph according to the matching sequence, and updating the candidate set corresponding to the third query node to obtain an updated candidate set.
In the exemplary embodiment, the third query node is determined to be one and the second query node u1The original candidate set of the directly connected neighbor nodes u ', u' is: c (u') ═ v101,v102,v200,v201,v202}. Based on the steps, a second query node u is determined1Has a matching node of v100Based on the data graph G, u' will not match the candidate node v101And v102Then get the updated candidate set as CR(u′)={v200,v201,v202}。
Step 413, determine the matching node of the third query node u' in the updated candidate set, traverse the candidate set CRCandidate node in (u'), v200、v201And v202All satisfy the matching condition. Based on the inclusion of the query node u in the third query node u' (virtual node)3And u4In this step, the node u is determined by expansion3And u4The matching pair of (2) has the following six possibilities:
the method may comprise the following steps of 1: (u)3,v200),(u4,v201) (ii) a The method may comprise the following steps of 2: (u)3,v200),(u4,v202);
It is possible to have 3: (u)3,v201),(u4,v200) (ii) a The probability of 4: (u)3,v201),(u4,v202);
And possibly 5: (u)3,v202),(u4,v201) (ii) a And 6: (u)3,v202),(u4,v200)。
Assume that a possible 1 is selected for subsequent search matching operations.
Step 414, determining the current matching set M as: m { (u)0,v0),(u1,v100),(u3,v200),(u4,v201)}。
Step 415, determining that the current | M | ═ 4, | v (q) | ═ 5, the number of matches in the matching set is different from the number of nodes in the query graph, and continuing to execute the search matching operation.
Step 416, determine the next query node u2Determining u based on the above matching process2The corresponding updated candidate set is CR(u2)={v202}。
Step 417, v
202Satisfy the matching condition
Then a fifth query node u is determined
2Has a matching node of v
202。
Step 418, based on u2And v202Form a new match of (u)2,v202) To prepare theAnd matching and adding the current matching set, wherein the updated matching set is as follows: m { (u)0,v0),(u1,v100),(u3,v200),(u4,v201),(u2,v202)}。
Step 419 determines that the current | M | ═ 5, | v (Q) | ═ 5, and at this time, the number of matches in the matching set is the same as the number of nodes in the query graph, and outputs all the matching pairs in M, so as to obtain an isomorphic subgraph of the data graph Q. I.e. (u)0,v0),(u1,v100),(u3,v200),(u4,v201),(u2,v202)。
Step 420, releasing the matching pair (u) in M3,v200),(u4,v201),(u2,v202) Tracing back to step 413, another possibility is determined in step 413, and then a search query operation is performed. After all six possible searches in step 413 are completed, the process returns to step 410, and the search query node forms matched search query results with other candidate nodes. In the process of enumerating all search results in this step, all isomorphic subgraphs of the query graph are obtained.
In the exemplary embodiment, the nodes in the data graph G are merged, and the resulting point after merging is called a super point. The process of the present exemplary embodiment is similar to the above process, and the description of the present exemplary embodiment is omitted here.
In an exemplary embodiment, the points in the query graph Q and the data graph G are merged at the same time.
For the improved method, a method for acquiring the isomorphic subgraph applied during enumeration is as follows: assuming that the current virtual node (node resulting from merging in the data graph) contains k real nodes, the currently considered combination is { (h)i,ni) 1 < i < m, where m is the number of different vertices (nodes in the data graph that result from the merge), and n isiIs the number of times the overshoot hi occurs in this combination. Then the two processes of enumerating the permutations of real nodes in the query graph and enumerating the combination of real nodes in the over-point can be performedIs fused, the effect is equivalent to that for any 1 < i < m, n is taken out from the real node set of the beyond point hiiAnd combining the nodes without repetition to make a full array.
Illustratively, u in the query graph1、u2And u3Belongs to the same virtual node u', the nodes are connected with each other by edges, and the corresponding candidate set is h1And h2,h1And h2Is a super point, where h1Containing node v1,v2,h2Containing node v3,v4,v5. The corresponding candidate node of the virtual node u' is the over point h1And h2。
In one possible implementation, the matching of the real node and the over point in the virtual node is determined as follows: ms={(u1,h1),(u2,h1),(u3,h2)}. From { v1,v2There are 1 fetch for two nodes from { v }3,v4,v5There are three methods for taking one node. And can determine the matching M of the real node over-point in the virtual nodesThere are 3 kinds. Then there is 3! 6 kinds. The method has clear logic, simple flow, easy realization and high process efficiency.
In the above exemplary embodiment, the nodes in the query graph and the data graph are merged, so that the number of the nodes in the query graph and the data graph is simplified, the efficiency of the method for obtaining the isomorphic subgraph is improved, and the query time is reduced.
In an exemplary embodiment, a first node of a current query, which may be referred to as a complete node, satisfies a first condition.
In this exemplary embodiment, a complete node is defined: given a query graph Q and a matching order π, let McRepresents the set of matched nodes in the query graph, when the first node u (u ∈ Q) is matched, if N (u) is contained in McIf yes, the first node is called a complete node, where N (u) represents a set of neighboring nodes of the first node u. I.e. the neighbor nodes of the first node currently queried in the query graphAnd determining corresponding matching nodes, forming matching pairs and adding a matching set, and then calling the first node of the current query as a complete node. In other words, all edges of the complete node are connected by the matched set.
Illustratively, in one possible implementation, referring to step 320 in the above exemplary embodiment of depth-first search, the fifth query node u is determined in step 3202. Known based on the search query procedure in the previous step, and u2Connected neighbor query node u0And u1All have determined corresponding matches, the query node u2I.e. a complete node.
Illustratively, in the case where the first node satisfies the first condition, in this manner, the determination of the matching node for the complete node may be generated based on the matching nodes of the neighboring nodes connected to the complete node.
Illustratively, see step 20 in the exemplary embodiment of the depth-first search backtracking section above, in which node u is determined2Is a complete node, then in this manner, the node u is determined2Is based on u2Neighbor query node u of0And u1Is matched with v0And v100The determination is made.
Based on data graphs G, v0Form a set of neighboring nodes v1,v2,v100,v202},v100Form a set of neighboring nodes v0,v200,v201,v202Get the intersection of the two sets to get the node v202I.e. determining node u2Has a matching node of v202。
In the possible implementation manner, the first node is judged to be a complete node, at least one matching node of the complete node is determined based on the manner, the at least one matching node is stored firstly, then, the second node which is used for matching in the matching query graph is searched continuously, after the matching nodes of all the second nodes are obtained, the at least one matching node of the complete node is expanded and matched with the first node, and a final isomorphic sub-graph is obtained.
Illustratively, based on the query graph Q given in FIG. 52And isomorphic query is carried out in a certain data graph by adopting the depth-first search method. At node u4In the case of the first node of the current query, assume that node u4Is a complete node (i.e. u)1And u3All have determined corresponding matching nodes), u1And u3Neighbor node determination u of corresponding matching node4The matching node of (2).
In an exemplary embodiment, if multiple corresponding matching nodes are determined based on intersection operations, then u is the basis4Generating a set C of at least one matching nodeM. Storing the set, and then sequentially searching subsequent query nodes u2、u5Searching and matching are carried out, and the node u is successfully determined2、u5In the case of a matching node of (3), C is setMNode u and node n3And (4) forming a final matching pair by expanding and matching, and then determining a isomorphic subgraph of the data graph.
In the above exemplary embodiment, by determining the condition that the first node of the current query satisfies, and further determining the query matching manner for the first data graph, a better query sequence may be selected to reduce repeated matching operations, and improve the efficiency and flexibility of the method for obtaining the isomorphic graph.
In an exemplary embodiment, a breadth-first search and a depth-first search are used in combination. See query graph Q in fig. 1 and data graph G in fig. 3. The process of the present exemplary embodiment includes the following steps 1 to 5.
Step 1, determining a candidate set of each node in the query graph Q, which is performed in the same manner as in the foregoing embodiment.
Illustratively, C (u)0)={v0,v203}C(u1)={v1,v2,v100}
C(u2)=C(u3)=C(u4)={v101,v102,v200,v201,v202}
Step 2, determining a first query node u based on the determined candidate set of each node0The matching pair possibly corresponding is (u)0,v0) Or (u)0,v203) (ii) a The two possible matching pairs are stored in a memory. The storage may be in the form of a table.
And 3, determining a matching pair to search and match the subsequent nodes based on the determined candidate set and the possible matching pairs formed in the step 1.
In one possible implementation, a matching pair (u) is selected0,v0) Determining at the second query node u1In (u)0,v0) In this case, a matching pair may be generated.
In the matching pair of (u)0,v0) In case of u1The corresponding candidate set is { v }1,v2,v100And then possible formed matching pairs include: (u)0,v1)、(u0,v2) And (u)0,v100);
And 4, when the size of the intermediate table reaches a given memory threshold, indicating that the memory space is excessively occupied. Then the method of depth-first search and backtracking is performed on the matching pairs in each row of the current intermediate table, and independent search is performed based on each path.
And step 5, in the process of depth-first search and backtracking, continuously releasing the memory space, and if the current memory occupation is lower than a threshold value, switching the expansion mode of the current path back to the mode of width-first search and backtracking.
In the above exemplary embodiment, a plurality of query search sequences are mixed, the utilization of the storage space is optimized, the flexibility of the method for obtaining the isomorphic subgraph is improved, and the application range of the method is expanded.
In a possible implementation manner, different data sets are used as query graphs to verify the effectiveness of the method for obtaining the isomorphic subgraph provided by the embodiment of the application. In an exemplary embodiment, Email and DBLP (DataBase systems and Logic Programming) datasets are chosen. The Email data set represents the network formed by Email communication, and has 36692 nodes and 183831 edges, and the maximum degree is 1383. The DBLP dataset represents a document reference network with 317080 nodes and 1049866 edges, with a maximum of 343 degrees. In this exemplary embodiment, both data sets are power-law graphs, and the distribution rule of degrees thereof satisfies the power-law distribution. Specific information for the two Data sets is given in Table 1 below, while information for other Data sets (e.g., Yeast Data Set; Human pose Data Set) is also included in Table 1, but was not selected for validation testing because of the smaller size of the other Data sets.
TABLE 1
| Data set | Number of nodes | Number of edges | Number of point tags | Maximum degree | Mean degree of |
| Yeast Data Set | 3112 | 12519 | 179 | 168 | 8.0 |
| Human | 4674 | 86282 | 88 | 771 | 36.9 |
| Email | 36692 | 183831 | 20 | 1383 | 10.0 |
| DBLP | 317080 | 1049866 | 20 | 343 | 6.6 |
In this exemplary experiment, 100 query graphs were generated for each data set, and the statistical average query time was used as a measure. Where the time limit for one query graph is 10 minutes, the top 10 is found5The individual isomorphic subgraphs stop matching.
In practical applications, there are a variety of algorithms that can implement the depth-first search framework, which may include qsi (quicksift), gql (graphql), tbi (turboiso), CFL (CFL-Match), VF3, CECI, DAF. In the exemplary test, the query effect of the method for obtaining the isomorphic subgraph provided by the embodiment of the application is respectively counted and executed on the basis of the algorithm. Tables 2 and 3 below are statistics of the validation performed in the Email dataset and the DBLP dataset, respectively.
TABLE 2
TABLE 3
In tables 2 and 3, the + NECa indicates a query mode in which nodes in the query graph are merged; + ISObThe query mode is used for indicating a query mode for merging the data graphs on the basis of merging the query graphs; the + ORDc indicates that a complete node judgment query mode is adopted on the basis of merging the query graph and the data graph; + MIXdThe method adopts a plurality of searching order mixed searching modes on the basis of the above steps. The acceleration ratios in tables 2 and 3 are calculated by dividing the average query time of the previous query mode by the average query time of the current query mode. As shown in Table 3, on the basis of QSL algorithm, + ISObCorresponding acceleration ratio "1.3 ×", average query time by the former query pattern (+ NECA query time 1.9) and average query time by the current query pattern (+ ISO)b1.5) is obtained by division calculation (1.9/1.5).
Based on the data of the acceleration ratios in table 2 and table 3, it can be known that the method for obtaining the isomorphic subgraph provided by the embodiment of the application can obtain a better acceleration effect.
In an embodiment of the present application, a device for obtaining subgraph isomorphism, referring to fig. 6, includes:
a first determiningmodule 601, configured to determine a first node of a current query among nodes of a first query graph;
a second determiningmodule 602, configured to determine at least one matching node of the first node in response to that the first node satisfies a first condition, where the first condition is that all neighboring nodes of the first node have already determined the matching node, and a matching node of any node of the first query graph is a corresponding node of the same structure of any node in the first data graph;
a third determiningmodule 603, configured to sequentially determine, based on a query sequence of each node in the first query graph, a matching node of a second node, where the second node is a node of the first query graph for which a matching node is not determined;
amatching module 604, configured to match the first node with at least one matching node of the first nodes in response to the second nodes all determining the matching node;
asubgraph determination module 605, configured to determine a homogeneous subgraph of the first query graph in the first data graph based on the matching result.
In a possible implementation manner, the second determiningmodule 602 is configured to determine all neighbor nodes of the first node in the first query graph, and obtain a matching node of each neighbor node; determining neighbor nodes of all matched nodes in the first data graph, and obtaining a set of neighbor nodes of any matched node based on the neighbor nodes of any matched node; and performing intersection operation on the set of the neighbor nodes of each matching node, and taking at least one node included in the intersection operation result as at least one matching node of the first node.
In one possible implementation, referring to fig. 7, the apparatus further includes: a switchingmodule 606, configured to sequentially determine, based on the breadth-first search order, at least one node corresponding to each node in the first query graph in the first data graph from a start node of the first query graph, and store a matching result of each node in the first query graph; if after storing at least one node corresponding to the node currently being queried in the first data graph, the storage space occupied by the node corresponding to each matched node reaches a storage threshold value, determining a first node in the matched nodes in the first query graph; determining a matching node of each node of the first query graph according to the depth-first search order based on the first node; and determining a isomorphic subgraph of the first query graph in the first data graph based on the matching result of each node in the first query graph.
In one possible implementation, the apparatus further includes: afirst merging module 607, configured to merge target nodes in the second query graph, and obtain the first query graph according to a merging result, where the target nodes are multiple nodes in the second query graph that meet a target condition, where meeting the target condition means that the multiple nodes have the same label and neighbor nodes with the same data size, and the labels of the neighbor nodes of different nodes in the multiple nodes are the same; asubgraph determining module 605, configured to determine a isomorphic subgraph of the second query graph in the first data graph.
In a possible implementation manner, the apparatus further includes: asecond merging module 608, configured to merge target nodes in the second data graph, and obtain the first data graph according to a merging result, where the target nodes are multiple nodes in the second data graph that meet a target condition, where meeting the target condition means that the multiple nodes have the same label and neighbor nodes with the same data size, and the labels of the neighbor nodes of different nodes in the multiple nodes are the same; asubgraph determining module 605, configured to determine a isomorphic subgraph of the first query graph in the second data graph.
In a possible implementation manner, thefirst merging module 607 is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; and merging all the target nodes to obtain a first query graph according to a merging result.
In a possible implementation manner, thefirst merging module 607 is configured to determine a plurality of target nodes in the second query graph based on the labels of the nodes in the second query graph and the connection relationship between the nodes in the second query graph; merging all leaf nodes in the plurality of target nodes, and merging non-leaf nodes in the plurality of target nodes based on a first qualifying condition; and determining a first query graph according to the merging result.
In a possible implementation manner, thesecond merging module 608 is configured to determine a plurality of target nodes based on the labels of the nodes in the second data graph and the connection relationships between the nodes in the second data graph; and merging the target nodes based on a second limiting condition, and determining a first data graph according to a merging result.
It should be noted that, the device for obtaining sub-graph isomorphism provided in the foregoing embodiment is only illustrated by the division of each functional module, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one program code or instruction is stored in the memory, and the at least one program code or instruction is loaded and executed by the processor, so as to enable the computer device to implement any one of the above methods for obtaining a homogenous subgraph.
Fig. 8 is a schematic structural diagram of a computer device that may generate relatively large differences due to different configurations or performances, and may include one or more processors (CPUs) 801 and one ormore memories 802, where the one ormore memories 802 store at least one program instruction, and the at least one program instruction is loaded and executed by the one ormore processors 801 to implement the method for obtaining an isomorphic graph provided in the foregoing method embodiments. Certainly, the computer device may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the computer device may further include other components for implementing the functions of the device, which is not described herein again. The computer device may be a server, and the embodiment of the present application may be executed in a central processing unit in the server.
In another aspect, a computer-readable storage medium is provided, in which at least one program code or instruction is stored, the program code or instruction being loaded and executed by a processor to make a computer implement any of the above methods for obtaining a homogenous subgraph.
In another aspect, a computer program or a computer program product is provided, in which at least one computer instruction is stored, and the at least one computer instruction is loaded and executed by a processor, so as to enable a computer to implement any one of the above methods for obtaining a homogenous sub-graph.
It should be understood that, in the embodiments of the present application, the size of the serial number of each process does not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The term "at least one" in this application means one or more, and the term "plurality" in this application means two or more.
It is to be understood that the terminology used in the description of the various examples herein is for the purpose of describing particular examples only and is not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should be understood that determining B from a does not mean determining B from a alone, but may also be determined from a and/or other information.
It is also to be understood that the terms "if" and "if" may be interpreted to mean "when" ("where" or "upon") or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined," or "if [ a stated condition or event ] is detected," may be interpreted to mean "upon determining," or "in response to determining," or "upon detecting [ a stated condition or event ], or" in response to detecting [ a stated condition or event ] ", depending on the context.
It should also be appreciated that reference throughout this specification to "one embodiment," "an embodiment," "one possible implementation" means that a particular feature, structure, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" or "one possible implementation" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.