Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a flowchart of a temporal knowledge graph representation method based on historical relationships and a dual-graph convolutional network according to an embodiment. Referring to Fig. 1, the method comprises data preprocessing, original-graph and edge-graph construction, temporal knowledge graph representation learning, and an application stage. Because the application stage reuses the data preprocessing, graph construction, and representation learning steps, it is not shown separately in Fig. 1. Each stage is described in detail below.
Data preprocessing stage
The specific flow of data preprocessing is as follows:
Step 1-1, input a temporal knowledge graph TKB and extract events to obtain an event set.
The temporal knowledge graph TKB comprises a large amount of time-stamped knowledge, and event extraction is performed on it to form an event set. Events are represented as quadruples (s, r, o, t), where s denotes the head entity, r the relationship, o the tail entity, and t the timestamp.
Here s, o ∈ E, where E denotes the set of entities; r ∈ R, where R denotes the set of relationships; and t ∈ T, where T denotes the set of timestamps.
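For concreteness, a minimal Python sketch of how such quadruples might be held in memory follows; the field names and toy identifiers are illustrative and not part of the method.

    from collections import namedtuple

    # An event is a time-stamped fact: (head entity, relation, tail entity, timestamp).
    Event = namedtuple("Event", ["s", "r", "o", "t"])

    # A toy event set; the integer identifiers are illustrative only.
    events = [Event(s=0, r=2, o=1, t=10),
              Event(s=0, r=5, o=1, t=12),
              Event(s=1, r=3, o=4, t=11)]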
Step 1-2, traverse the event set by entity pairs composed of head and tail entities and extract the historical relationships, obtaining the complete training data set.
A historical relationship is a time-stamped sequence of relationships between a pair of entities. In this step, a historical relationship h is extracted by traversing the event set by entity pair (s, o); it is formally represented as h = {(r_1, t_1), …, (r_i, t_i), (r_j, t_j), …}, where t_i < t_j denotes that relationship r_i occurs before relationship r_j.
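A minimal sketch of this traversal, grouping events by entity pair and sorting each group by timestamp (function and variable names are illustrative):

    from collections import defaultdict

    def extract_histories(events):
        """Traverse the event set by entity pair (s, o); each historical
        relationship is the time-ordered sequence of (relation, timestamp) pairs."""
        histories = defaultdict(list)
        for s, r, o, t in events:
            histories[(s, o)].append((r, t))
        for pair in histories:
            histories[pair].sort(key=lambda rt: rt[1])  # earlier relations first
        return dict(histories)

    # e.g. extract_histories([(0, 2, 1, 10), (0, 5, 1, 12)]) -> {(0, 1): [(2, 10), (5, 12)]}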
Original and edge graph construction stage
The specific flow of the construction of the original graph and the edge graph is as follows:
step 2-1 is to construct the original image G ═ V, E, and a from the history relationship, taking the entity as a node and the history relationship as an edge.
where V denotes the node set (i.e., the entity set), E denotes the edge set (i.e., the historical relationship set), and A denotes the adjacency matrix of the original graph. If a historical relationship exists between two entities, the two entities are considered to have an edge between them, so the adjacency matrix A of the original graph is calculated as follows:

A_{i,j} = 1 if a historical relationship exists between entities i and j, and A_{i,j} = 0 otherwise,

where A_{i,j} represents the connectivity between entities i and j.
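Under the binary definition above, the adjacency matrix can be built directly from the extracted entity pairs; the following sketch assumes integer entity identifiers and treats connectivity as symmetric.

    import numpy as np

    def build_original_adjacency(num_entities, entity_pairs):
        """A[i, j] = 1 iff at least one historical relationship links entities i and j."""
        A = np.zeros((num_entities, num_entities))
        for i, j in entity_pairs:
            A[i, j] = 1
            A[j, i] = 1  # treating connectivity as symmetric is an assumption
        return A

    A = build_original_adjacency(5, [(0, 1), (1, 4)])  # 5 entities, 2 historical relationships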
Step 2-2, regard the edges of the original graph as nodes and define edges according to whether the corresponding edges of the original graph are connected, thereby constructing the edge graph G_edge = (V_edge, E_edge, A_edge). V_edge denotes the node set (i.e., the historical relationship set), E_edge denotes the edge set, where the edges of the edge graph are defined by whether the corresponding edges of the original graph are connected (i.e., share a vertex), and A_edge denotes the adjacency matrix of the edge graph.
In this step, the edge graph is obtained by transforming the original graph: the nodes of the edge graph are the edges of the original graph, and the edges of the edge graph are defined according to whether the edges of the original graph are connected, using the following rule: if two edges of the original graph share a vertex, i.e., the two edges are connected, then there is an edge between the two corresponding nodes of the edge graph. The adjacency matrix A_edge of the edge graph is calculated as follows:

A_edge[i→j, u→v] = w_{i→j,u→v} if edges i→j and u→v share a vertex, and 0 otherwise,
where A_edge[i→j, u→v] ≠ 0 indicates that the historical relationships i→j and u→v are connected, and the weight w_{i→j,u→v} denotes the degree of influence between the historical relationships (i.e., between two edges of the original graph). The influence is determined by the degree of the shared vertex in the original graph: the larger the degree of the shared vertex, the smaller the influence of edge i→j on edge u→v, and conversely the larger the influence. The weight w_{i→j,u→v} is calculated as follows:

where α is a hyperparameter, and 0 < α < 0.5.
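Since the weight formula itself is not reproduced above, the following sketch uses a simple inverse-degree weighting that merely mimics the stated rule (the larger the shared vertex's degree, the smaller the weight), with α as the hyperparameter; it is an illustrative assumption, not the patented formula.

    import numpy as np

    def build_edge_graph(edges, degree, alpha=0.2):
        """edges: list of (u, v) original-graph edges; degree: per-node degrees.
        Two edge-graph nodes are connected iff their original edges share a vertex.
        The inverse-degree weight below only mimics the stated rule; the exact
        patented formula is not reproduced in the source text."""
        m = len(edges)
        A_edge = np.zeros((m, m))
        for a in range(m):
            for b in range(m):
                if a != b:
                    shared = set(edges[a]) & set(edges[b])
                    if shared:
                        v = shared.pop()
                        A_edge[a, b] = 1.0 / (degree[v] - alpha)  # assumed weighting
        return A_edge

    edges = [(0, 1), (1, 4), (0, 2)]
    degree = {0: 2, 1: 2, 2: 1, 4: 1}
    A_edge = build_edge_graph(edges, degree)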
Temporal knowledge graph representation learning phase
The specific flow of the temporal knowledge graph representation learning stage is as follows:
and 3-1, sampling the training data set for Z times according to a fixed size, sampling one subgraph every time, wherein the subgraph comprises P nodes representing entities and all edges representing historical relations among the nodes, and executing the steps 3-2 to 3-5 for each subgraph.
Step 3-2, for each historical relationship h in the subgraph, model the multi-range time dependencies with a historical relationship encoder based on a temporal self-attention mechanism to obtain the representation h of the historical relationship.
The self-attention mechanism can model the context of each part of a sequence, assigning weights to the different parts. As shown in Fig. 2, the historical relationship encoder based on the temporal self-attention mechanism uses the self-attention mechanism as its building block; it consists of a time-based intra-block self-attention mechanism and a time-based inter-block self-attention mechanism, and can model local and global time dependencies simultaneously.
In this step, the historical relationship h = {(r_1, t_1), …, (r_i, t_i), (r_j, t_j), …} is split into M blocks, formally defined as h = [z_1, z_2, …, z_M]. Each block contains N relationships; for example, the first block is formally defined as z_1 = {(r_1, t_1), …, (r_{N-1}, t_{N-1}), (r_N, t_N)}. The time-based intra-block self-attention mechanism assigns a weight to each relationship within a block.
Taking the first block as an example, the calculation formula is as follows:

where W_intra and the bias b_intra are learnable parameters, σ(·) is the activation function, p_i is the relative time between each relationship in the first block and relationship r_i, i.e., p_i = {(t_1 - t_i), …, (t_{N-1} - t_i), (t_N - t_i)}, and r_i is the representation of relationship r_i, obtained by random initialization. The representation z_k of each block is then obtained by a weighted summation over the relationships in the block; taking the first block as an example, the calculation formula is as follows:
time-based inter-block self-attention mechanism assigns weights to each block
The calculation formula is as follows:
wherein, W
inter、
For a learnable parameter, σ (-) is the activation function,
b
interas an offset, q
kFor the first relation in each block and block z
kThe relative time of the first relationship in (a),
is a block z
kThe time of the first relationship in (a),
is a block z
MThe representation h of each historical relationship can be obtained by weighted summation of the representations of each block at the time of the first relationship:
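The attention equations themselves are not reproduced above, so the following sketch is only a structural outline under stated assumptions: relation representations are concatenated with relative times, scored by a single learnable vector standing in for W_intra/b_intra and W_inter/b_inter, and softmax-normalized, first within blocks and then across blocks.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def encode_history(history, rel_emb, M):
        """history: time-ordered list of (relation_id, timestamp).
        rel_emb: (num_relations, d) matrix of randomly initialized relation
        representations. Splits the history into M blocks, applies intra-block
        attention over (representation, relative time) pairs, then inter-block
        attention over (block representation, relative block time) pairs.
        The scoring vectors below stand in for the patented parameters."""
        d = rel_emb.shape[1]
        rng = np.random.default_rng(0)
        w_intra = rng.normal(size=d + 1)
        w_inter = rng.normal(size=d + 1)
        N = -(-len(history) // M)  # block size (ceiling division)
        blocks, block_times = [], []
        for k in range(0, len(history), N):
            chunk = history[k:k + N]
            t_first = chunk[0][1]  # timestamp of the block's first relation
            reps = np.stack([rel_emb[r] for r, _ in chunk])
            rel_t = np.array([t - t_first for _, t in chunk], dtype=float)
            beta = softmax(np.tanh(np.column_stack([reps, rel_t]) @ w_intra))
            blocks.append(beta @ reps)  # intra-block weighted sum -> z_k
            block_times.append(t_first)
        Z = np.stack(blocks)
        q = np.array(block_times, dtype=float) - block_times[0]  # relative block times
        gamma = softmax(np.tanh(np.column_stack([Z, q]) @ w_inter))
        return gamma @ Z  # inter-block weighted sum -> representation h

    h = encode_history([(2, 10), (5, 12), (3, 15), (1, 20)], np.random.randn(6, 8), M=2)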
step 3-3, representing all historical relations by the set of h, the adjacent matrixes A and A of the original graph and the edge graphedgeThe two-graph convolutional network is input together, and the representation of each node (namely the representation of the entity) is obtained by modeling the interaction between the entities and the historical relationship.
The graph convolutional network (GCN) is a deep neural network that performs convolution operations on graphs; it can model message passing between nodes and is widely used in fields such as traffic prediction and image classification. The graph convolution formula used here is as follows:
gc(X, A) = σ(L̂ X Θ)

where gc(·) denotes the graph convolution operation, X is the input node representation, L̂ = D^{-1/2}(A + I)D^{-1/2} is the normalized Laplacian matrix of the graph, I is the identity matrix, D is the diagonal degree matrix of the nodes, i.e., D_ii = Σ_j A_ij, σ(·) denotes the activation function (ReLU is used here), and Θ is the parameter of the graph convolutional network. One graph convolution layer aggregates messages from each node's 1-hop neighbors, and stacking multiple layers enlarges the neighborhood over which messages pass. As shown in Fig. 3, the dual-graph convolutional network uses the graph convolutional network as its building block and comprises a k-layer original-graph convolutional network and a (k-1)-layer edge-graph convolutional network, so that interactions between entities and between historical relationships can be modeled simultaneously.
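A minimal numpy sketch of the graph convolution gc(·) as reconstructed above; adding self-loops before normalization follows common GCN practice and is an assumption here.

    import numpy as np

    def gc(X, A, Theta):
        """One graph convolution: ReLU(L_hat @ X @ Theta), with the normalized
        Laplacian L_hat = D^{-1/2} (A + I) D^{-1/2} built from the definitions
        above (I the identity, D the diagonal degree matrix)."""
        A_hat = A + np.eye(A.shape[0])
        D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        L_hat = D_inv_sqrt @ A_hat @ D_inv_sqrt
        return np.maximum(0.0, L_hat @ X @ Theta)  # ReLU activation

    A = np.array([[0., 1.], [1., 0.]])
    out = gc(np.random.randn(2, 4), A, np.random.randn(4, 4))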
In this step, a node-edge indication matrix M_ne is first constructed to express the correspondence between the nodes and edges of the original graph when the original graph and the edge graph are fused. The node-edge indication matrix M_ne is calculated as follows:

M_ne[v, e] = 1 if node v is an endpoint of original-graph edge e, and 0 otherwise.

Then, the set of historical relationship representations h, the adjacency matrix A of the original graph, the adjacency matrix A_edge of the edge graph, and the node-edge indication matrix M_ne are input into the dual-graph convolutional network to obtain the representations of the nodes (i.e., the representations of the entities).
In the original-graph convolutional network, the input of the k-th layer is the output X^(k-1) of the (k-1)-th original-graph convolution layer and the output Y^(k-1) of the (k-1)-th edge-graph convolution layer (except for the first layer). The calculation formula is as follows:

where X^(0) is the initial representation of the original-graph nodes, obtained by random initialization, and L̂ is the normalized Laplacian matrix of the original graph. Θ_node^(k-1) is the parameter of the (k-1)-th original-graph convolution, Θ_en^(k-1) is the parameter of the (k-1)-th edge-graph-to-original-graph fusion convolution, [·,·] denotes the splicing (concatenation) operation, and σ(·) denotes that the spliced result is activated.
in the edge graph convolution network, the input of the k-1 layer is the output Y of the edge graph convolution network of the k-2 layer(k-2)And the output X of the original convolutional network of the k-1 layer(k-1)Except for the first layer. The calculation formula is as follows:
wherein, Y
(0)For an initial representation of the nodes of the edge graph,
normalized Laplace matrix, Y, for edge maps
(0)Is obtained by linear conversion of the historical relationship expression h
edge(k-2)Is a parameter of the k-2 layer edge map convolution, theta
en(k-1)The parameters of the k-1 layer original image-edge image fusion convolution are shown.
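The fusion equations are not reproduced above; the sketch below shows one plausible wiring consistent with the description, in which each original-graph layer splices its own convolution with edge features mapped back through M_ne, and each edge-graph layer splices its own convolution with node features mapped through the transpose of M_ne. The wiring and parameter shapes are assumptions.

    import numpy as np

    def gc(X, A, Theta):  # graph convolution from the earlier sketch
        A_hat = A + np.eye(A.shape[0])
        D_is = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
        return np.maximum(0.0, D_is @ A_hat @ D_is @ X @ Theta)

    def dual_graph_layer(X, Y, A, A_edge, M_ne, Th_node, Th_edge, Th_en_x, Th_en_y):
        """One interleaved dual-graph step: the node update splices its own
        convolution with edge features mapped back through M_ne, and the edge
        update splices its own convolution with node features mapped through
        M_ne^T; the spliced result is then activated, as described above."""
        X_next = np.maximum(0.0, np.concatenate(
            [gc(X, A, Th_node), M_ne @ gc(Y, A_edge, Th_en_y)], axis=1))
        Y_next = np.maximum(0.0, np.concatenate(
            [gc(Y, A_edge, Th_edge), M_ne.T @ gc(X, A, Th_en_x)], axis=1))
        return X_next, Y_next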
Step 3-4, based on the entity representations output in step 3-3, predict the relationships that may exist between entity pairs within a future period of time using a semantic matching model.
The semantic matching model adopts the DistMult model, a knowledge graph representation learning model that measures the matching degree between an entity pair and each relationship through a bilinear function. That is, the matching score is obtained with the DistMult model, whose score function is defined as follows:
f(s, r, o) = s^T M_r o    (16)
where s and o denote the representations of the head entity and the tail entity, obtained from the output of the dual-graph convolutional network; M_r is the diagonal matrix corresponding to each relationship, M_r = diag(m_r), where diag(·) converts the vector m_r into the diagonal matrix M_r; m_r = M h_r, where h_r is the historical relationship representation of the entity pair (s, o) and M is a learnable weight matrix. By performing semantic matching with the DistMult model, the relationship r that may arise between an entity pair within a future period of time can be predicted.
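A sketch of the DistMult score under the definitions above; since M_r is diagonal, s^T M_r o reduces to an elementwise product, and the dimensions are illustrative.

    import numpy as np

    def distmult_score(s, o, h_r, M):
        """f(s, r, o) = s^T diag(m_r) o with m_r = M @ h_r: bilinear matching of
        the entity pair against a relation vector derived from their historical
        relationship representation."""
        m_r = M @ h_r
        return float(np.sum(s * m_r * o))  # equivalent to s^T diag(m_r) o

    d = 8
    rng = np.random.default_rng(1)
    s, o, h_r = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
    score = distmult_score(s, o, h_r, rng.normal(size=(d, d)))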
and 3-5, generating negative samples for all training samples in the subgraph by a method of randomly replacing head and tail entities, calculating the prediction loss of all samples in the subgraph, and then adjusting network parameters of a historical relationship encoder, a dual-graph convolution network and a semantic matching model based on a time self-attention mechanism according to a loss function.
In this step, the cross-entropy loss over all samples in the subgraph is calculated from the prediction results as follows:

L = - Σ_{(s,r,o)∈D} [ y · log sig(f(s, r, o)) + (1 - y) · log(1 - sig(f(s, r, o))) ]
where D is the set of positive and negative samples; for each positive sample (s, r, o), negative samples are obtained by randomly replacing s and o. sig(·) is the sigmoid function, and y takes values in {0, 1}: y = 1 for positive samples and y = 0 for negative samples.
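A sketch of the negative sampling and the cross-entropy loss described above (the scores would come from the DistMult score function; all names are illustrative):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def corrupt(sample, entity_ids, rng):
        """Generate a negative sample by randomly replacing the head or tail."""
        s, r, o = sample
        if rng.random() < 0.5:
            return int(rng.choice(entity_ids)), r, o
        return s, r, int(rng.choice(entity_ids))

    def cross_entropy(pos_scores, neg_scores, eps=1e-9):
        """Binary cross-entropy over positive (y = 1) and negative (y = 0)
        samples, matching the loss described above."""
        loss = -np.log(sigmoid(np.asarray(pos_scores)) + eps).sum()
        loss -= np.log(1.0 - sigmoid(np.asarray(neg_scores)) + eps).sum()
        return loss

    rng = np.random.default_rng(2)
    neg = corrupt((0, 3, 1), entity_ids=[0, 1, 2, 4], rng=rng)
    loss = cross_entropy([2.1, 0.7], [-1.3, 0.2])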
Application phase
After the network parameters of the historical relationship encoder based on the temporal self-attention mechanism, the dual-graph convolutional network, and the semantic matching model have been determined, i.e., after the model comprising these three parameter-determined components has been built, future relationships between entities can be predicted. The specific process is as follows:
(a) preprocess the data in the temporal knowledge graph and extract the events and historical relationships;
(b) construct the original graph and the edge graph according to steps 2-1 and 2-2;
(c) model the multi-range time dependencies for the original graph using the parameter-determined historical relationship encoder based on the temporal self-attention mechanism, obtaining the historical relationship representations;
(d) obtain the entity representations from the historical relationship representations, the original graph, and the edge graph using the parameter-determined dual-graph convolutional network;
(e) predict the relationships between entities within a future period of time from the entity representations and the historical relationship representations using the parameter-determined semantic matching model, and construct a new temporal knowledge graph from the relationships likely to occur within that period.
Specifically, the parameter-determined semantic matching model calculates a matching score from the entity representations and the historical relationship representations and predicts the relationships that may occur within a future period of time according to that score; that is, the relationships with the larger matching scores are selected as the relationships likely to occur. A new temporal knowledge graph can then be constructed from these predicted relationships, as in the sketch below.
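A sketch of this final selection step: score every candidate relation vector for an entity pair and keep the top matches. In the method above each relation vector m_r would be derived via M h_r; here the candidates are random placeholders.

    import numpy as np

    def predict_relations(s, o, rel_vectors, top_k=3):
        """Score each candidate relation vector m_r against the entity pair with
        f = sum(s * m_r * o) and return the top-k relation ids; keeping only the
        higher-scoring relations mirrors the selection described above."""
        scores = {rid: float(np.sum(s * m_r * o)) for rid, m_r in rel_vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    d = 8
    rng = np.random.default_rng(3)
    s, o = rng.normal(size=d), rng.normal(size=d)
    candidates = {rid: rng.normal(size=d) for rid in range(5)}
    likely = predict_relations(s, o, candidates)  # relation ids, best match first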
By constructing the original graph and the edge graph and introducing the dual-graph convolutional network for representation learning, the temporal knowledge graph representation method can exploit both the correlations between entities and the correlations between historical relationships. This improves model performance, i.e., the accuracy of predicting the relationships likely to occur within a future period of time, and the method therefore has broad application prospects in fields such as international relations prediction and social network analysis.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and are not intended to limit it; any modifications, additions, equivalent substitutions, and the like made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.