Detailed Description
The embodiments of the present application provide a training method for an encoder, an information recommendation method, and a related device, which not only fully consider the interaction between nodes but also improve network performance by adopting a self-encoder structure during training, so that the encoding result is more accurate.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "corresponding" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that the present application provides an encoder training method and an information recommendation method implemented based on Artificial Intelligence (AI) technology. The encoder training method and the information recommendation method are specifically applied to a graph network, which is a generalization of artificial neural networks to graph structures, where a graph structure represents a set of nodes connected by a series of edges. Graph-based structures have a wide range of applications; for example, in airline traffic, nodes represent airports and edges represent direct flights between two airports. As another example, in a social network, since most data available in reality is unlabeled, unsupervised learning on the graph network can be adopted to make full use of the data, learning from the unlabeled data and providing a good basis for subsequent tasks.
For convenience of understanding, the present application provides an information recommendation method, which is applied to the information recommendation system shown in fig. 1. Please refer to fig. 1, which is an architecture schematic diagram of the information recommendation system in an embodiment of the present application. As shown in the figure, a graph attention network auto-encoder (GATAE) is trained on the server side, and the trained GATAE is stored on the server side. Specifically, the description is given in conjunction with fig. 2, which is a flowchart of the information recommendation method in the embodiment of the present application. In step S1, the server acquires the data of each node in graph form; in step S2, the server acquires the feature vector of each node and trains with a self-encoder framework based on these feature vectors, thereby obtaining the GATAE through training.
When a client initiates a service recommendation request to the server, the server determines a target user according to a user identifier carried in the service recommendation request, and further obtains users related to the target user. The target user corresponds to a target node in the graph structure, and the users related to the target user correspond to the other nodes adjacent to the target node. Specifically, please continue to refer to fig. 2: in step S3, the server uses the trained GATAE to output the coding features of the target node and of the adjacent nodes respectively, and calculates the distance between each adjacent node and the target node by using the coding features; in step S4, according to the distances obtained in step S3, the adjacent nodes whose distance is smaller than a threshold are selected as nodes to be recommended.
It should be noted that the client is disposed on a terminal device, where the terminal device includes but is not limited to a tablet computer, a notebook computer, a palm computer, a mobile phone, a voice interaction device, and a Personal Computer (PC), and is not limited herein.
It should be understood that the technical solutions provided in the present application relate specifically to the field of Machine Learning (ML) based on artificial intelligence. Artificial intelligence is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence involves studying the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, comprising both hardware-level technologies and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or implement human learning behaviors in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The solutions provided in the embodiments of the present application relate to technologies such as machine learning in artificial intelligence, and are specifically described below with reference to fig. 3. An embodiment of the encoder training method in the embodiment of the present application includes:
101. Acquiring a feature vector set corresponding to N nodes according to first graph data, wherein the feature vector set comprises N feature vectors, each feature vector corresponds to one node in a graph, and N is an integer greater than or equal to 2;
in this embodiment, the encoder training device first obtains a feature vector set corresponding to N nodes according to first graph data, where the first graph data is obtained based on a graph structure, and the graph structure includes at least two nodes and one edge, where N is an integer greater than or equal to 2. Based on this, the feature vector set corresponding to the N nodes is represented as:
H = {h_1, h_2, ..., h_N}, h_i ∈ R^F;

wherein H represents the feature vector set, N represents the total number of nodes, i represents the ith node, h_i represents the feature vector of the ith node, and F represents the number of dimensions corresponding to each feature vector.
It should be noted that the encoder training apparatus may be disposed in a server, or may be disposed in a terminal device, and this application takes the case of being disposed in a server as an example for description, however, this should not be construed as a limitation to this application. In addition, the process of extracting the feature vector set may be performed by GATAE, or may be performed in the process of data preprocessing, which is not limited here.
102. According to the feature vector set, encoding the feature vector corresponding to each node through a graph attention network self-encoder to obtain N coding vectors, wherein the coding vectors in the N coding vectors have a one-to-one correspondence with the feature vectors in the feature vector set;
in this embodiment, the encoder training device inputs the feature vector corresponding to each node into the GATAE, and the GATAE encodes the feature vectors by using a graph attention network (GAT); after the feature vectors corresponding to all N nodes have been encoded, N coding vectors are obtained, where the coding vectors may also be referred to as graph embeddings (graph embedding). It can be understood that GAT uses an attention mechanism to perform a weighted summation over the features of adjacent nodes: in the GAT, each node in the graph may assign a different weight value to each adjacent node according to that node's features. After attention is introduced, only the features of the adjacent nodes need to be considered, and information about the whole graph is not required.
Based on this, N code vectors obtained after the feature vector set is coded are represented as:
H′ = {h′_1, h′_2, ..., h′_N}, h′_i ∈ R^(F′);

wherein H′ represents the N coding vectors, N represents the total number of nodes, i represents the ith node, h′_i represents the coding vector of the ith node, and F′ represents the number of dimensions corresponding to each coding vector; F′ and F may be equal or different, which is not limited herein.
The GATAE is trained based on the architecture of the self-encoder. A self-encoder is a neural network that uses the back-propagation algorithm to make the output value approximate the input value: it compresses the input feature vector into a latent space representation, i.e., the coding vector, and then reconstructs the output from this representation.
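For ease of understanding, a minimal sketch of this generic self-encoder round trip is given below; the linear layers, dimensions, and random data are illustrative assumptions, not the GATAE architecture itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
F, F_prime = 8, 3                      # input dims, latent dims (illustrative)
x = rng.normal(size=F)                 # an input feature vector
W_enc = rng.normal(size=(F_prime, F))  # encoder weights (assumed linear layer)
W_dec = rng.normal(size=(F, F_prime))  # decoder weights (assumed linear layer)

z = sigmoid(W_enc @ x)                 # compress into the latent representation
x_hat = sigmoid(W_dec @ z)             # reconstruct the output from the latent code
loss = np.mean((x - x_hat) ** 2)       # training drives the output toward the input
```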
103. Decoding the N coding vectors through a decoder to obtain second graph data;
in this embodiment, the encoder training apparatus inputs the N encoded vectors output by the GATAE to the decoder, and the decoder calculates the second graph data in the following manner:
R = σ(H′H′^T);

wherein R represents the second graph data, H′ represents the N coding vectors, H′^T represents the transpose of H′, and σ() represents the Sigmoid function.
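A direct sketch of this decoding step follows, assuming the N coding vectors are stacked row-wise into a matrix H′; entry (i, j) of R then scores how likely an edge between node i and node j is.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(H_prime: np.ndarray) -> np.ndarray:
    """R = sigmoid(H' @ H'^T): reconstruct the second graph data from coding vectors."""
    return sigmoid(H_prime @ H_prime.T)

H_prime = np.random.default_rng(0).normal(size=(5, 3))  # 5 nodes, F' = 3
R = decode(H_prime)                                     # (5, 5) reconstruction
```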
104. And updating the first model parameter of the graph attention network self-encoder by adopting a loss function according to the first graph data and the second graph data.
In this embodiment, the encoder training apparatus updates the current first model parameter of the GATAE by using the loss function based on the first graph data and the second graph data. Specifically, assume that the feature extracted for node A is feature vector A, that feature vector A is encoded to obtain coding vector A, and that coding vector A is decoded to obtain the node A data in the second graph data. In the training process, the loss value between the node A data in the first graph data and the node A data in the second graph data is calculated by using the loss function, the loss value is back-propagated through the GATAE, and the current model parameters are updated. One possible criterion is that when the loss value converges, the GATAE training can be considered complete.
For convenience of introduction, please refer to fig. 4, where fig. 4 is a schematic flowchart of graph network training based on the self-encoder architecture in the embodiment of the present application. As shown in step A1, the encoder training apparatus obtains graph data A, extracts the feature vector set corresponding to the N nodes from graph data A, and inputs the feature vector set to the GATAE; alternatively, it directly inputs graph data A to the GATAE and extracts the feature vector set corresponding to the N nodes through the GATAE. In step A2, the feature vector set is encoded by the GATAE, thereby outputting N coding vectors. In step A3, the N coding vectors are input to the decoder, and the decoder outputs graph data B in step A4. In step A5, the loss value between graph data A and graph data B is calculated using the loss function.
In the embodiment of the application, a training method of an encoder is provided, which includes obtaining a feature vector set corresponding to N nodes according to first graph data, where the feature vector set includes N feature vectors, then performing encoding processing on the feature vectors corresponding to each node through a graph attention network self-encoder according to the feature vector set to obtain N encoding vectors, then performing decoding processing on the N encoding vectors through a decoder to obtain second graph data, and finally updating a first model parameter of the graph attention network self-encoder by using a loss function according to the first graph data and the second graph data. Through the mode, under the framework of the self-encoder, the self-encoder based on the graph attention network can distribute corresponding weights to different adjacent nodes and encode according to the importance of different adjacent nodes, so that the interaction condition between the nodes is fully considered, and the framework of the self-encoder adopted in the training process can improve the performance of the network, and the encoding result is more accurate.
Optionally, on the basis of the embodiment corresponding to fig. 3, in a first optional embodiment of the training method for an encoder provided in the embodiment of the present application, acquiring a feature vector set corresponding to N nodes according to first graph data may include:
acquiring data corresponding to each node in the N nodes according to the first graph data;
generating a characteristic vector corresponding to each node according to the data corresponding to each node in the N nodes;
and acquiring a characteristic vector set according to the characteristic vector corresponding to each node in the N nodes.
In this embodiment, a way of generating the feature vector set is introduced. The encoder training apparatus first obtains the first graph data, which is data based on a graph structure; a graph structure is a data structure in which a node may have one or more adjacent nodes, and a connection between two nodes is referred to as an edge.
Taking the social network as an example, the first graph data includes the social data of each node, such as user age, user gender, user location, and user tag. After the social data of a node is extracted, the data needs to be converted into features. For example, if the user gender is "male", the feature of the user gender is denoted by "1", and if the user gender is "female", the feature is denoted by "2". Alternatively, if the user gender is "male", the feature of the user gender is represented by (1,0), and if the user gender is "female", the feature is represented by (0,1).
For example, assume that there are 5 types of user tags, namely "enthusiasm", "apathy", "fun", "basketball love", and "animation", wherein the feature of the user tag "enthusiasm" is denoted by "1", the feature of the user tag "apathy" is denoted by "2", the feature of the user tag "fun" is denoted by "3", the feature of the user tag "basketball love" is denoted by "4", and the feature of the user tag "animation" is denoted by "5". Alternatively, the feature of the user tag "enthusiasm" is represented by (1,0,0,0,0), the feature of the user tag "apathy" by (0,1,0,0,0), the feature of the user tag "fun" by (0,0,1,0,0), the feature of the user tag "basketball love" by (0,0,0,1,0), and the feature of the user tag "animation" by (0,0,0,0,1).
The features of all dimensions of each node are extracted in the above manner and spliced to obtain the feature vector of the node; when N feature vectors have been obtained, the feature vector set is obtained.
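A sketch of this feature construction is given below, using the one-hot encodings for user gender and user tags from the examples above; any further dimensions would be encoded and spliced in the same way.

```python
import numpy as np

GENDERS = ["male", "female"]
TAGS = ["enthusiasm", "apathy", "fun", "basketball love", "animation"]

def one_hot(value: str, vocabulary: list) -> np.ndarray:
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

def node_feature_vector(gender: str, tag: str) -> np.ndarray:
    # Encode each dimension, then splice the dimensions into one feature vector.
    return np.concatenate([one_hot(gender, GENDERS), one_hot(tag, TAGS)])

h = node_feature_vector("male", "basketball love")  # -> [1,0, 0,0,0,1,0]
```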
Secondly, in the embodiment of the present application, a manner of generating a feature vector set is provided, and through the manner, a feature vector corresponding to each node can be generated by using graph data and used for subsequent GATAE training, thereby improving the feasibility of the scheme.
Optionally, on the basis of each embodiment corresponding to fig. 3, in a second optional embodiment of the training method for an encoder provided in the embodiment of the present application, encoding, according to the feature vector set, the feature vector corresponding to each node through the graph attention network self-encoder to obtain N coding vectors may include:
for any node in the N nodes, acquiring, by the encoder training device, M adjacent nodes corresponding to the node, wherein M is an integer greater than or equal to 1;
acquiring a feature vector of any node and a feature vector of each adjacent node in M adjacent nodes according to the feature vector set;
acquiring (M+1) attention coefficients through the graph attention network self-encoder based on the feature vector of any node and the feature vector of each adjacent node, wherein the (M+1) attention coefficients comprise the attention coefficient of any node;
and acquiring the coding vector corresponding to any node through the graph attention network self-encoder based on the (M+1) attention coefficients.
In this embodiment, a method for encoding a feature vector by using GATAE is introduced, for convenience of description, any one of N nodes is taken as an example, and it should be noted that other nodes are also encoded in a similar manner, which is not described herein again.
Specifically, for convenience of understanding, please refer to fig. 5, where fig. 5 is a schematic diagram of the graph structure of the first graph data in an embodiment of the present application. As shown in the figure, there are 11 nodes in the graph, that is, N is 11, and there are connecting edges between nodes. Taking node 1 as an example, the adjacent nodes of node 1 are node 2, node 3, node 4, and node 5, that is, M is 4. Then, the feature vectors of node 1, node 2, node 3, node 4, and node 5 need to be obtained, that is, (M+1) feature vectors are obtained. The GATAE calculates attention coefficients from the feature vectors of these nodes, obtaining (M+1) attention coefficients. Since not only the influence of the adjacent nodes but also the attention of the node to itself needs to be considered, 5 attention coefficients need to be calculated for node 1; finally, these 5 attention coefficients are used to encode node 1 to obtain the corresponding coding vector.
Secondly, in the embodiment of the present application, a method for encoding a feature vector by using a GATAE is provided, and by the above manner, an attention coefficient between each node and an adjacent node can be calculated, and the importance degree of each node is determined by the attention coefficient, so that the robustness of the GATAE is better. In addition, because the calculation objects are only adjacent nodes, the whole graph does not need to be visited, and only the adjacent nodes need to be concerned, so that the efficiency of model training and model prediction is improved.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in a third optional embodiment of the training method for an encoder provided in the embodiment of the present application, obtaining (M+1) attention coefficients through the graph attention network self-encoder based on the feature vector of any node and the feature vector of each adjacent node may include:
calculating M original attention coefficients through a graph attention network self-encoder based on the feature vector of any node and the feature vector of each adjacent node;
calculating an original attention coefficient of any node through a graph attention network self-encoder based on the feature vector of any node;
and (2) carrying out normalization processing on the original attention coefficient of any node and each original attention coefficient in the M original attention coefficients to obtain (M +1) attention coefficients.
In this embodiment, a method for calculating an attention coefficient is described, for convenience of description, an arbitrary node of the N nodes is taken as an example for description, and it should be noted that other nodes also calculate the attention coefficient in a similar manner, which is not described herein again.
Specifically, for easy understanding, please refer to fig. 6, where fig. 6 is a schematic diagram of the attention mechanism of the graph attention network self-encoder in the embodiment of the present application. As shown in the figure, taking node 1 as an example, it is assumed that the adjacent nodes of node 1 are node 2, node 3, and node 4, where the feature vector of node 1 is h_1, the feature vector of node 2 is h_2, the feature vector of node 3 is h_3, and the feature vector of node 4 is h_4. Thus, the original attention coefficients of node 1 to itself, of node 2 to node 1, of node 3 to node 1, and of node 4 to node 1 are calculated respectively. In the process of calculating the original attention coefficients, a weight matrix shared by all nodes needs to be trained, namely:

W ∈ R^(F′×F);

wherein W is the weight matrix, F represents the number of dimensions corresponding to each feature vector, and F′ represents the number of dimensions corresponding to each coding vector.
Referring to fig. 7, fig. 7 is a schematic diagram of attention coefficient acquisition based on the graph attention network self-encoder in the embodiment of the present application. As shown in the figure, an original attention coefficient is calculated as follows:

e_ij = a(Wh_i, Wh_j);

wherein e_ij represents the importance of the jth node to the ith node, i.e., an original attention coefficient of the ith node, h_i represents the feature vector of the ith node, h_j represents the feature vector of the jth node, a() represents a function, and W represents the weight matrix.
Based on this, in conjunction with fig. 6, the original attention coefficient e_11 (i.e., representing the importance of node 1 to itself), the original attention coefficient e_12 (i.e., representing the importance of node 2 to node 1), the original attention coefficient e_13 (i.e., representing the importance of node 3 to node 1), and the original attention coefficient e_14 (i.e., representing the importance of node 4 to node 1) are obtained.
In order to make the original attention coefficients easier to compare, softmax is introduced to normalize all the original attention coefficients, thereby obtaining the attention coefficients. An attention coefficient is calculated as follows:

α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{k∈N_i} exp(e_ik);

wherein α_ij represents the attention coefficient of the ith node, e_ij represents the original attention coefficient of the ith node, and N_i represents all the adjacent nodes of the ith node (including the ith node itself).
Based on this, in conjunction with fig. 6, the attention coefficient α_11 (i.e., representing the importance of node 1 to itself), the attention coefficient α_12 (i.e., representing the importance of node 2 to node 1), the attention coefficient α_13 (i.e., representing the importance of node 3 to node 1), and the attention coefficient α_14 (i.e., representing the importance of node 4 to node 1) are obtained.
In the embodiment of the present application, a method for calculating the attention coefficients is provided. In the above manner, normalization processing is performed on the calculated original attention coefficients, which makes the contribution degrees of the attention coefficients directly comparable, so that the accuracy of the model can be improved.
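The two steps above can be sketched for a single node as follows. The present application leaves a() unspecified, so this sketch follows the common choice in the original GAT of a single learned vector with a LeakyReLU applied to the spliced transformed features; that choice, like the random weights, is an assumption rather than something fixed by the text.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

rng = np.random.default_rng(0)
F, F_prime = 4, 3
W = rng.normal(size=(F_prime, F))   # shared weight matrix, W in R^(F' x F)
a = rng.normal(size=2 * F_prime)    # parameters of the scoring function a()

# Feature vectors of node 1 and its neighborhood N_1 = {node 1, ..., node 4}.
h = {i: rng.normal(size=F) for i in (1, 2, 3, 4)}

def raw_coefficient(h_i, h_j):
    """e_ij = a(W h_i, W h_j); a() is a LeakyReLU over spliced features (assumed)."""
    return leaky_relu(a @ np.concatenate([W @ h_i, W @ h_j]))

e = {j: raw_coefficient(h[1], h[j]) for j in (1, 2, 3, 4)}  # incl. e_11 (self)

# Softmax normalization over N_1 (node 1's adjacent nodes, including itself).
exp_e = {j: np.exp(v) for j, v in e.items()}
alpha = {j: v / sum(exp_e.values()) for j, v in exp_e.items()}  # sums to 1
```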
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 3, in a fourth optional embodiment of the training method for an encoder provided in the embodiment of the present application, obtaining, through the graph attention network self-encoder, the coding vector corresponding to any node based on the (M+1) attention coefficients may include:
acquiring K output results corresponding to any node through the graph attention network self-encoder based on the (M+1) attention coefficients corresponding to any node, the feature vector of any node, and the feature vector of each adjacent node, wherein K is an integer greater than or equal to 1;
if K is equal to 1, determining the output result as the coding vector corresponding to any node;
and if K is larger than 1, performing splicing processing on the K output results to obtain a coding vector corresponding to any node, or performing average processing on the K output results to obtain a coding vector corresponding to any node.
In this embodiment, a method for performing encoding based on a multi-head attention mechanism is introduced, for convenience of description, an arbitrary node of N nodes is taken as an example for description, and it should be noted that other nodes also determine an encoding vector in a similar manner, which is not described herein again.
Specifically, K output results corresponding to any node are obtained through the graph attention network self-encoder. If K is equal to 1, the current output result is the encoding result, and the coding vector corresponding to the node may be calculated in the following manner:

h′_i = σ( Σ_{j∈N_i} α_ij W h_j );

wherein h′_i represents the coding vector corresponding to the ith node, α_ij represents the attention coefficient of the ith node, i represents the ith node, j represents the jth node, σ() represents the Sigmoid function, N_i represents all the adjacent nodes of the ith node (including the ith node itself), W represents the weight matrix, and h_j represents the feature vector of the jth node.
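Continuing the sketch from the previous embodiment, the coding vector of node 1 for K equal to 1 is the σ-activated, attention-weighted sum over its neighborhood; the numeric attention coefficients below are illustrative placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
F, F_prime = 4, 3
W = rng.normal(size=(F_prime, F))                 # shared weight matrix
h = {i: rng.normal(size=F) for i in (1, 2, 3, 4)} # node 1 and its neighbors
alpha = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}          # placeholder coefficients over N_1

# h'_1 = sigma( sum over j in N_1 of alpha_1j * W h_j )
h_prime_1 = sigmoid(sum(alpha[j] * (W @ h[j]) for j in alpha))
```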
In practical applications, if a node is encoded based on a multi-head attention mechanism, multiple output results need to be processed. For convenience of description, please refer to fig. 8, which is a schematic diagram of the multi-head attention mechanism of the graph attention network self-encoder in the embodiment of the present application, shown for K equal to 3 (i.e., 3 heads). Each arrow represents an independent attention calculation, and the aggregated features from each head are spliced or averaged to obtain the corresponding coding vector.
One possible way is to compute the coding vector by splicing:

h′_i = ∥_{k=1}^{K} σ( Σ_{j∈N_i} α_ij^k W^k h_j );

wherein σ( Σ_{j∈N_i} α_ij^k W^k h_j ) represents the kth output result, h′_i represents the coding vector corresponding to the ith node, k indicates the kth head of the attention mechanism, K indicates the total number of heads of the attention mechanism, α_ij^k represents the attention coefficient of the ith node in the kth head, i represents the ith node, j represents the jth node, σ() represents the Sigmoid function, N_i represents all the adjacent nodes of the ith node (including the ith node itself), W^k represents the weight matrix of the kth head, h_j represents the feature vector of the jth node, and ∥ represents splicing.
Another possible way is to compute the coding vector by averaging:

h′_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j );

If the multi-head attention mechanism is applied on the final network layer, the splicing operation is no longer used, and instead the average over the K heads is calculated in the above manner.
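Both multi-head variants can be sketched as follows, assuming K independent heads that each produce their own weighted sum as above; the per-head weights and attention coefficients are illustrative placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
F, F_prime, K = 4, 3, 3
h = {i: rng.normal(size=F) for i in (1, 2, 3, 4)}            # node 1 + neighbors
W = [rng.normal(size=(F_prime, F)) for _ in range(K)]        # per-head weights W^k
alpha = [{j: 0.25 for j in (1, 2, 3, 4)} for _ in range(K)]  # per-head coefficients

head_sums = [sum(alpha[k][j] * (W[k] @ h[j]) for j in h) for k in range(K)]

# Intermediate layers: splice (concatenate) the K activated head outputs.
h_spliced = np.concatenate([sigmoid(s) for s in head_sums])  # length K * F'

# Final layer: average the K head sums before the activation instead of splicing.
h_averaged = sigmoid(sum(head_sums) / K)                     # length F'
```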
In the embodiment of the present application, a method for encoding based on a multi-head attention mechanism is provided, and through the above method, a learning process of self-attention can be more stable based on the multi-head attention mechanism, so that the GATAE can learn related information in different representation spaces, and thus robustness of the GATAE is improved.
Optionally, on the basis of the foregoing respective embodiments corresponding to fig. 3, in a fifth optional embodiment of the training method for an encoder provided in the embodiment of the present application, updating the first model parameter of the graph attention network self-encoder by using the loss function according to the first graph data and the second graph data may include:
determining a second model parameter by adopting a cross entropy loss function according to the first graph data and the second graph data;
updating the first model parameter of the graph attention network self-encoder to a second model parameter;
after updating the first model parameter of the graph attention network self-encoder by using the loss function according to the first graph data and the second graph data, the method may further include:
and if a model training condition is met, stopping updating the model parameters of the graph attention network self-encoder.
In this embodiment, a method for training the GATAE by using a cross entropy loss function is introduced: after the first graph data and the second graph data are obtained, the model parameters of the GATAE are updated by minimizing the loss function.
Specifically, the following cross entropy loss function is employed:

loss = -(1/N) Σ_{i=1}^{N} [ ŷ_i log(y_i) + (1 - ŷ_i) log(1 - y_i) ];

wherein loss represents the result of the cross entropy loss function, N represents the total number of nodes, i represents the ith node, ŷ_i represents the data corresponding to the ith node in the first graph data, and y_i represents the data corresponding to the ith node in the second graph data.
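A sketch of this loss between the first graph data and the reconstructed second graph data follows; treating the node data as binary adjacency entries is an assumption consistent with the connection-prediction task described in the experiments below.

```python
import numpy as np

def cross_entropy_loss(y_true: np.ndarray, y_pred: np.ndarray, eps=1e-12) -> float:
    """Binary cross entropy between the first graph data (y_true) and the
    second graph data (y_pred), averaged over all entries."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

A = np.array([[0., 1.], [1., 0.]])      # first graph data (adjacency)
R = np.array([[0.1, 0.8], [0.7, 0.2]])  # second graph data (reconstruction)
loss = cross_entropy_loss(A, R)
```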
Secondly, in the embodiment of the present application, a method for training the GATAE with the cross entropy loss function is provided. In this manner, the gradient of the cross entropy loss function with respect to the last-layer weights of the GATAE no longer depends on the derivative of the activation function, but is proportional only to the difference between the output value and the true value, so convergence is fast; and since back propagation is a chain of multiplications, the update of the entire weight matrix is accelerated, further improving the training efficiency of the GATAE.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in a sixth optional embodiment of the training method for an encoder provided in the embodiment of the present application, if the model training condition is satisfied, stopping updating the model parameter of the graph attention network self-encoder may include:
if the result of the cross entropy loss function is smaller than or equal to a loss threshold, determining that the model training condition is met, and stopping updating the model parameters of the graph attention network self-encoder;
or, if the number of iterations reaches a number threshold, determining that the model training condition is met, and stopping updating the model parameters of the graph attention network self-encoder.
In this embodiment, two conditions for stopping the update of the model parameters of the GATAE are introduced. In the GATAE training process, the encoder training device needs to determine whether a model training condition is satisfied; if so, the model training is stopped, and the model parameters obtained after the last iteration are used as the model parameters of the GATAE. If the condition is not met, the iterative training continues.
The first model training condition is to determine whether the result of the cross entropy loss function is smaller than or equal to a loss threshold. Specifically, the loss threshold may be set to 0.001, 0.005, 0.01, 0.02, or another value approaching 0. Assuming the loss threshold is 0.001, if the result of the cross entropy loss function is 0.0001, which is smaller than or equal to the loss threshold, the model training condition is satisfied. It is to be understood that the example in this implementation is only for understanding the present solution, and the loss threshold should be flexibly determined in combination with the actual situation.
The second model training condition is to determine whether the number of iterations reaches a number threshold. Specifically, the number threshold may be set to 10000, 50000, 100000, 200000, or another value; assuming the number threshold is 10000, the model training condition is satisfied when the number of iterations reaches 10000. It is understood that the example in this implementation is only for understanding the present solution, and the number threshold should be flexibly determined in combination with the actual situation.
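The two model training conditions can be combined in a training loop along the following lines; the geometrically decaying toy loss stands in for the actual GATAE optimization and is purely illustrative.

```python
LOSS_THRESHOLD = 0.001   # loss threshold, e.g. 0.001, 0.005, 0.01, 0.02
MAX_ITERATIONS = 10000   # number threshold, e.g. 10000, 50000, 100000

def train(compute_loss, update_parameters):
    """Stop updating when either model training condition is met."""
    for iteration in range(1, MAX_ITERATIONS + 1):  # second condition (iterations)
        loss = compute_loss()
        if loss <= LOSS_THRESHOLD:                  # first condition (loss threshold)
            break
        update_parameters(loss)
    return iteration

# Toy usage: a loss that decays geometrically with each parameter update.
state = {"loss": 1.0}
steps = train(lambda: state["loss"],
              lambda loss: state.update(loss=loss * 0.9))
```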
In the embodiment of the present application, two conditions for stopping the update of the model parameters of the GATAE are provided. In this manner, the model training condition for the GATAE can be selected according to the actual situation, improving the flexibility and feasibility of model training.
In order to further verify the technical solution provided by the present application, a series of experiments were performed on the GATAE. In the experimental setting, the adopted data sets include the Cora data set, the CiteSeer data set, and the Reddit data set. The Cora and CiteSeer data sets are derived from citation networks, i.e., networks formed by citation relations between papers, co-authors, and the like; the Reddit data set is derived from a social network, specifically from forum posts, where two posts are considered to be associated when constructing the graph if they are commented on by the same person. Referring to table 1, table 1 shows the specific compositions of the Cora, CiteSeer, and Reddit data sets.
TABLE 1

| Data set | Number of nodes | Number of edges | Number of features |
| --- | --- | --- | --- |
| Cora data set | 2708 | 10556 | 1433 |
| CiteSeer data set | 3327 | 9104 | 3703 |
| Reddit data set | 232965 | 1146158892 | 602 |
Based on the composition of the data sets in table 1, when training the GATAE, a Stochastic Gradient Descent (SGD) optimizer may be used, with a learning rate of 1e-3, an L2 penalty of 1e-3, and 10 training rounds (Epoch). The training task adopts connection prediction, namely, predicting whether a connection exists between two nodes by using the graph embedding output by the model, with Area Under Curve (AUC) as the evaluation index. In order to more intuitively see the effect of the GATAE provided by the present application in practical applications, 4 types of models are introduced for comparison in the experiments, namely a Spectral Clustering (SC) model, a Deep Walk (DW) model, a graph auto-encoder (GAE), and a variational graph auto-encoder (VGAE). Please refer to table 2, which shows the verification results of the various network models on the different data sets.
TABLE 2

| Model | Cora data set | CiteSeer data set | Reddit data set |
| --- | --- | --- | --- |
| SC | 84.6±0.01 | 80.2±0.02 | 84.2±0.02 |
| DW | 83.1±0.01 | 80.5±0.02 | 84.4±0.001 |
| GAE | 83.91±0.49 | 78.7±0.01 | 82.2±0.02 |
| VGAE | 84.28±0.15 | 78.9±0.03 | 82.7±0.02 |
| GATAE | 88.48±0.06 | 84.9±0.02 | 96.4±0.001 |
Obviously, based on table 2, it can be seen that the GATAE provided by the present application achieves higher performance on all data sets than the other types of models, and performs particularly well on the Reddit data set derived from the social network, where the recommendation effect is better.
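For reference, the training configuration described above (SGD, learning rate 1e-3, L2 penalty 1e-3, Epoch of 10, connection prediction via inner-product reconstruction) maps onto a setup along these lines; PyTorch and the linear stand-in encoder are illustrative assumptions, as the present application does not publish code.

```python
import torch

# Illustrative stand-in for the GATAE encoder on a small random graph.
N, F = 8, 16
features = torch.randn(N, F)
adjacency = (torch.rand(N, N) > 0.7).float()

encoder = torch.nn.Linear(F, 4)
optimizer = torch.optim.SGD(encoder.parameters(), lr=1e-3, weight_decay=1e-3)
loss_fn = torch.nn.BCELoss()  # cross entropy for connection prediction

for epoch in range(10):                         # Epoch = 10, as in the experiments
    optimizer.zero_grad()
    H_prime = torch.sigmoid(encoder(features))  # coding vectors
    R = torch.sigmoid(H_prime @ H_prime.T)      # inner-product decoder
    loss = loss_fn(R, adjacency)                # reconstruction loss
    loss.backward()
    optimizer.step()
```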
With reference to the above description, the following describes an information recommendation method provided in an embodiment of the present application, and referring to fig. 9, an embodiment of the information recommendation method in the embodiment of the present application includes:
201. Receiving an information recommendation request sent by a client, wherein the information recommendation request carries an identifier of a target node and an identifier of the client;
in this embodiment, an information recommendation device receives an information recommendation request sent by a client, where the information recommendation request carries an identifier of the client and an identifier of a target node, and the target node has different meanings in different scenes, for example, in a friend recommendation scene, the target node represents a target user. For another example, in a product recommendation scenario, the target node represents a target product. For another example, in a movie recommendation scenario, the target node represents a target movie.
It should be noted that the information recommendation device may be deployed in a server, or may be deployed in a terminal device. The description below takes deployment in a server as an example, which is not a limitation of this application.
202. According to the information recommendation request, acquiring a feature vector corresponding to a target node and feature vectors corresponding to P adjacent nodes, wherein P is an integer greater than or equal to 1;
in this embodiment, the information recommendation device determines a corresponding target node based on an identifier of the target node carried in the information recommendation request, and determines P neighboring nodes having a connection relationship with the target node according to the graph structure, so as to obtain data of the target node and data of the P neighboring nodes according to the graph data. Then, the information recommendation device may obtain the corresponding feature vector according to the data of the target node, and obtain the feature vector corresponding to each neighboring node according to the data of the P neighboring nodes, respectively.
It can be understood that, in the above embodiments, how to generate the corresponding feature vector according to the data has been described, and therefore, details are not described here. In addition, the process of extracting the feature vector may be performed by GATAE, or may be performed in the process of data preprocessing, which is not limited here.
203. Acquiring a target coding vector through the graph attention network self-encoder based on the feature vector corresponding to the target node, wherein the graph attention network self-encoder is obtained through the training described in the foregoing embodiments;
in this embodiment, the information recommendation device inputs the feature vector of the target node into the GATAE, and the GATAE encodes the feature vector of the target node, thereby obtaining a target encoding vector.
204. Acquiring P coding vectors through a graph attention network self-encoder based on the feature vectors corresponding to the P adjacent nodes, wherein the coding vectors in the P coding vectors have corresponding relations with the adjacent nodes;
in this embodiment, the information recommendation device inputs the feature vectors corresponding to P neighboring nodes to the GATAE, and the GATAE encodes the feature vector corresponding to each neighboring node, thereby obtaining P encoded vectors.
205. Determining P inter-node distances according to the target coding vector and the P coding vectors, wherein the inter-node distance in the P inter-node distances represents the distance between the target node and the adjacent node;
in this embodiment, the information recommendation device calculates the inter-node distance between each of the P code vectors and the target code vector, assuming that the target node is node a, and the P nodes are node B, node C, and node D, respectively, so as to calculate the distances between node a and node B, obtaininter-node distance 1, calculate the distance between node a and node C, obtaininter-node distance 2, and calculate the distance between node a and node D, obtaininter-node distance 3.
It is understood that the inter-node distance may be a cosine distance or a euclidean distance, etc., and is not limited herein.
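Either of the distances mentioned above can be computed directly from the coding vectors; a sketch with illustrative vectors follows.

```python
import numpy as np

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    # 1 - cosine similarity; smaller means the coding vectors are more alike.
    return float(1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

target = np.array([0.2, 0.9, 0.1])     # target coding vector (node A)
neighbor = np.array([0.3, 0.8, 0.2])   # an adjacent node's coding vector (node B)
d = cosine_distance(target, neighbor)  # one of the P inter-node distances
```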
206. Arranging the P inter-node distances from small to large, and taking the adjacent nodes ranked in the first Q positions as a node set to be recommended, wherein Q is an integer greater than or equal to 1 and less than or equal to P;
in this embodiment, the information recommendation device needs to arrange the distances between P nodes from small to large, for convenience of description, please refer to table 3, where table 3 is an arrangement result obtained by arranging the distances between P nodes.
TABLE 3

| Target node | Adjacent node | Inter-node distance |
| --- | --- | --- |
| Node A | Node B | 0.02 |
| Node A | Node C | 0.19 |
| Node A | Node D | 0.23 |
| Node A | Node E | 0.30 |
| Node A | Node F | 0.50 |
| Node A | Node G | 0.66 |
| Node A | Node H | 0.73 |
As shown in table 3, assuming that P is 7, 7 inter-node distances are obtained. Assuming that Q is 1, node B is determined as the node to be recommended in the node set to be recommended; assuming that Q is 3, node B, node C, and node D are determined as the nodes to be recommended in the node set to be recommended.
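The selection in steps 205 and 206 amounts to sorting the P inter-node distances from small to large and keeping the first Q adjacent nodes; a sketch using the distances from table 3 follows.

```python
# Inter-node distances from table 3 (node A is the target node).
distances = {"B": 0.02, "C": 0.19, "D": 0.23, "E": 0.30,
             "F": 0.50, "G": 0.66, "H": 0.73}

def nodes_to_recommend(distances: dict, q: int) -> list:
    """Arrange the P distances from small to large and keep the first Q nodes."""
    ranked = sorted(distances, key=distances.get)
    return ranked[:q]

print(nodes_to_recommend(distances, 1))  # ['B']
print(nodes_to_recommend(distances, 3))  # ['B', 'C', 'D']
```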
207. And pushing a node set to be recommended to the client according to the information recommendation request.
In this embodiment, the information recommendation device determines the corresponding client based on the identifier of the client carried in the information recommendation request, and then pushes the node set to be recommended to the client, so that the client can display the node set to be recommended.
The following explanation is given in conjunction with the friend recommendation scenario. When considering the interests of user A, the attributes of user A's friends may be referred to in addition to the attributes of user A; it would obviously be unreasonable to treat the influence of all of user A's friends on user A as the same. For example, there may be many store users among the friends of user A; although user A does not care about these store users, they would cause great interference when making friend recommendations for user A, so that user A is mistakenly considered to like store users. The GATAE provided by the present application can automatically evaluate the relationship between user A and each friend, so that higher-quality friends are recommended to user A by referring to the friends with close relationships.
In a specific scenario, assume that many of user A's friends like playing basketball; it is then highly likely that user A also likes playing basketball, even though this is not explicitly written in user A's personal introduction and instead serves as a hidden personal attribute of user A. The GATAE can discover such hidden attributes, providing a more comprehensive understanding of user A's information.
The embodiment of the application provides an information recommendation method: first receiving an information recommendation request sent by a client; obtaining, according to the information recommendation request, the feature vector corresponding to a target node and the feature vectors corresponding to P adjacent nodes; obtaining a target coding vector and P coding vectors through the graph attention network self-encoder; determining the P inter-node distances according to the target coding vector and the P coding vectors; and finally arranging the P inter-node distances from small to large and taking the first Q adjacent nodes as the node set to be recommended, which is pushed to the client according to the information recommendation request. In this manner, the GATAE encodes the target node of the graph structure together with its adjacent nodes, so that the importance of different adjacent nodes to the target node and the interaction between the adjacent nodes and the target node can be fully considered, thereby improving the accuracy and reliability of information recommendation.
Optionally, on the basis of the embodiment corresponding to fig. 9, in a first optional embodiment of the information recommendation method provided in the embodiment of the present application, the obtaining, according to the information recommendation request, the feature vector corresponding to the target node and the feature vectors corresponding to P neighboring nodes may include:
according to the information recommendation request, user data corresponding to a target user are obtained, wherein the user data comprise at least one of user gender, user age, user label information, a region where the user is located and user occupation, and the target user and a target node have a corresponding relation;
acquiring user data corresponding to each associated user in P associated users, wherein the associated users have one-to-one correspondence with adjacent nodes;
generating a feature vector corresponding to a target user according to user data corresponding to the target user;
generating a feature vector corresponding to each associated user in the P associated users according to the user data corresponding to each associated user in the P associated users;
pushing a node set to be recommended to a client according to an information recommendation request, which may include:
and sending the information of the Q associated users to the client of the target user according to the information recommendation request so as to enable the client to display the information of the Q associated users.
In the embodiment, an information recommendation method based on a friend recommendation scene is introduced. Specifically, the information recommendation device determines a target node according to the information recommendation request, where the target node may be a target user, and a neighboring node of the target user is an associated user (for example, a user in a buddy list) of the target user, and respectively obtains user data of the target user and the associated user, and it can be understood that the user data includes at least one of user gender, user age, user tag information, a region where the user is located, and user occupation. And then, carrying out feature processing on the user data to obtain a corresponding feature vector, then obtaining a target coding vector of a target user through GATAE, and obtaining a coding vector of each associated user through GATAE. If the target user has P associated users, determining P inter-node distances according to the target coding vector and the P coding vectors, and finally selecting the first Q inter-node distances with the minimum distance according to the P inter-node distances, thereby determining the corresponding Q nodes to be recommended (namely determining Q associated users).
For convenience of introduction, please refer to fig. 10, where fig. 10 is a schematic diagram of the graph structure in a friend recommendation scenario according to an embodiment of the present application. As shown in the figure, it is assumed that the target user has 9 associated users, that is, P is 9, and the user data of the target user and of the 9 associated users are respectively obtained, as shown in table 4, where table 4 is an illustration of the user data of each user.
TABLE 4

| User | User gender | User age | User tag information | User's region | User occupation |
| --- | --- | --- | --- | --- | --- |
| Target user | Male | 28 | Animation | Shenzhen | Engineer |
| Tom | Male | 32 | Basketball | Shenzhen | Doctor |
| Tim | Male | 28 | Animation | Shenzhen | Engineer |
| Mingming | Male | 25 | Football | Beijing | Unemployed |
| Honghong | Female | 33 | Sports | Shanghai | Lawyer |
| Baobao | Male | 30 | Food | Chengdu | Cook |
| Miss Zhao | Female | 48 | Shopping | Shenzhen | Engineer |
| Li Ge | Male | 50 | Sports | Shanghai | Lawyer |
| Xiao He | Female | 22 | Basketball | Hangzhou | Secretary |
| Xiao Wei | Female | 20 | Animation | Shanghai | Student |
As can be seen from table 4, each user has data of multiple dimensions, that is, including user gender, user age, user tag information, user location area, and user occupation, it is understood that data of other dimensions may also be obtained in practical applications, which is only an illustration here. Based on the user data in table 4, the inter-node distance between each associated user and the target user can be further calculated, and the inter-node distances are arranged from small to large, please refer to table 5, where table 5 is an illustration of the arrangement of the inter-node distances from small to large.
TABLE 5

| Associated user | Inter-node distance |
| --- | --- |
| Tom | 0.02 |
| Mingming | 0.19 |
| Xiao He | 0.23 |
| Miss Zhao | 0.30 |
| Xiao Wei | 0.50 |
| Tim | 0.66 |
| Li Ge | 0.73 |
| Baobao | 0.85 |
| Honghong | 0.94 |
As can be seen from table 5, assuming that Q is set to 4, that is, the associated users whose inter-node distances rank in the top 4 are obtained, the node set to be recommended includes the associated users "Tom", "Mingming", "Xiao He", and "Miss Zhao", so the information of these 4 associated users is pushed to the client of the target user. Please refer to fig. 11, which is a schematic diagram of an information recommendation interface based on the friend recommendation scenario in the embodiment of the present application. As shown in the figure, the target user can find the friend information recommended by the system through the information recommendation interface, namely the information of "Tom", "Mingming", "Xiao He", and "Miss Zhao".
Secondly, in the embodiment of the application, an information recommendation method based on a friend recommendation scene is provided, and through the method, the importance degree of different associated users to a target user can be fully considered, so that associated user information which meets requirements better can be pushed to the target user in the friend recommendation scene.
Optionally, on the basis of each embodiment corresponding to fig. 9, in a second optional embodiment of the information recommendation method provided in the embodiment of the present application, the obtaining, according to the information recommendation request, the feature vector corresponding to the target node and the feature vectors corresponding to P neighboring nodes may include:
according to the information recommendation request, acquiring commodity data corresponding to a target commodity, wherein the commodity data comprises at least one item of commodity name, commodity category, commodity sales volume, commodity production place and commodity evaluation information, and the target commodity and a target node have a corresponding relation;
acquiring commodity data corresponding to each associated commodity in the P associated commodities, wherein the associated commodities have a one-to-one correspondence relationship with adjacent nodes;
generating a characteristic vector corresponding to the target commodity according to commodity data corresponding to the target commodity;
generating a feature vector corresponding to each associated commodity in the P associated commodities according to commodity data corresponding to each associated commodity in the P associated commodities;
pushing a node set to be recommended to a client according to an information recommendation request, which may include:
and sending the Q related commodities to the client according to the information recommendation request so as to enable the client to display the Q related commodities.
In this embodiment, an information recommendation method based on a commodity recommendation scenario is introduced. In this scenario, if two commodities are purchased by the same person, the two commodities are considered to be related when constructing the graph; for example, if a user A purchases both a brand B dumbbell and a brand A phone, the brand B dumbbell and the brand A phone are adjacent nodes of each other. Specifically, the information recommendation device first determines a target node according to the information recommendation request, where the target node may be a target commodity, and the adjacent nodes of the target commodity are its associated commodities; the device respectively obtains the commodity data of the target commodity and of the associated commodities, where the commodity data includes at least one of a commodity name, a commodity category, a commodity sales volume, a commodity production place, and commodity evaluation information. Feature processing is then performed on the commodity data to obtain the corresponding feature vectors; the target coding vector of the target commodity is obtained through the GATAE, and the coding vector of each associated commodity is obtained through the GATAE. If the target commodity has P associated commodities, the P inter-node distances are determined according to the target coding vector and the P coding vectors, and finally the Q smallest inter-node distances are selected from the P inter-node distances, thereby determining the corresponding Q nodes to be recommended (namely, Q associated commodities).
For convenience of introduction, please refer to fig. 12, where fig. 12 is a schematic diagram of the graph structure in a commodity recommendation scenario in the embodiment of the present application. As shown in the figure, assuming that the target commodity has 9 associated commodities, that is, P is 9, the commodity data of the target commodity and of the 9 associated commodities are respectively obtained. Assuming that the target commodity is "brand K electric fan", as shown in table 6, table 6 is an illustration of the commodity data of each commodity.
TABLE 6
As can be seen from table 6, each commodity has data of multiple dimensions, that is, the data includes a commodity name, a commodity category, a commodity sales volume, a commodity origin and commodity evaluation information, and it can be understood that data of other dimensions can also be obtained in practical applications, which is only one example here. Based on the commodity data in table 6, the inter-node distance between each associated commodity and the target commodity may be further calculated, and the inter-node distances may be arranged from small to large, see table 7, where table 7 is an illustration of arranging the inter-node distances in order from small to large.
TABLE 7

| Associated commodity | Inter-node distance |
| --- | --- |
| Brand A phone | 0.03 |
| Brand B mobile phone | 0.20 |
| Brand C printer | 0.28 |
| Brand D projector | 0.30 |
| Brand G game machine | 0.49 |
| Brand A television | 0.58 |
| Brand F unmanned aerial vehicle | 0.67 |
| Brand G dumbbell | 0.72 |
| Brand H potato chips | 0.84 |
As can be seen from table 7, assuming that Q is set to 4, that is, the associated commodities whose inter-node distances rank in the top 4 are obtained, the node set to be recommended includes the associated commodities "brand A phone", "brand B mobile phone", "brand C printer", and "brand D projector", and these 4 associated commodities are pushed to the client. Please refer to fig. 13, which is a schematic diagram of an information recommendation interface based on the commodity recommendation scenario in the embodiment of the present application. As shown in the figure, when the user selects "brand K electric fan" on the client, the associated commodities recommended by the system, namely "brand A phone", "brand B mobile phone", "brand C printer", and "brand D projector", can be found through the information recommendation interface.
Secondly, in the embodiment of the application, an information recommendation method based on the commodity recommendation scenario is provided. With this method, the importance of different associated commodities to the target commodity can be fully considered, so that associated-commodity information that better meets requirements can be pushed in the commodity recommendation scenario.
Optionally, on the basis of each embodiment corresponding to fig. 9, in a third optional embodiment of the information recommendation method provided in the embodiment of the present application, obtaining the feature vector corresponding to the target node and the feature vectors corresponding to P neighboring nodes according to the information recommendation request may include:
according to the information recommendation request, acquiring film data corresponding to a target film, wherein the film data comprises at least one of a film name, a film category, film box office information, film participant information, film evaluation information, a film duration, and a film playing effect, and the target film and the target node have a corresponding relationship;
acquiring film data corresponding to each associated film in the P associated films, wherein the associated films and adjacent nodes have one-to-one correspondence;
generating a feature vector corresponding to the target film according to the film data corresponding to the target film;
generating a feature vector corresponding to each associated film in the P associated films according to the film data corresponding to each associated film in the P associated films;
pushing a node set to be recommended to a client according to an information recommendation request, which may include:
and sending the Q associated films to the client according to the information recommendation request so that the client displays the Q associated films.
In the present embodiment, an information recommendation method based on a film recommendation scenario is introduced. In the film recommendation scenario, two films watched by the same person are considered related when the graph is composed; for example, if user A watches both "film C" and "film M", then "film C" and "film M" are adjacent nodes of each other. Specifically, the information recommendation device first determines a target node according to the information recommendation request, where the target node may specifically be a target film and the adjacent nodes of the target film are its associated films, and respectively obtains the film data of the target film and of each associated film; it can be understood that the film data includes at least one of a film name, a film category, film box office information, film participant information, film evaluation information, a film duration, and a film playing effect. The film data is then subjected to feature processing to obtain the corresponding feature vectors, after which the target coding vector of the target film and the coding vector of each associated film are obtained through the GATAE. If the target film has P associated films, P inter-node distances are determined according to the target coding vector and the P coding vectors, and finally the Q smallest inter-node distances are selected from the P inter-node distances so as to determine the corresponding Q nodes to be recommended (that is, Q associated films).
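As an illustration of the feature-processing step, the sketch below turns film data into a feature vector, assuming one-hot encoding for the film category and simple min-max scaling for the numeric fields; the field names, category list, and scaling constants are hypothetical, not part of the embodiment.

```python
import numpy as np

CATEGORIES = ["sci-fi", "action", "romance", "animation"]  # assumed category vocabulary

def film_feature_vector(box_office, category, duration, rating, effect_3d):
    """box_office in millions, duration in minutes, rating (positive-rating ratio) in [0, 1]."""
    one_hot = [1.0 if category == c else 0.0 for c in CATEGORIES]
    return np.array([
        box_office / 200.0,          # scale box office to roughly [0, 1]
        duration / 180.0,            # scale duration to roughly [0, 1]
        rating,                      # already in [0, 1]
        1.0 if effect_3d else 0.0,   # film playing effect: 3D vs 2D
        *one_hot,
    ])
```

For example, film_feature_vector(110, "sci-fi", 126, 0.65, False) would produce the feature vector for "film D" of table 8 below.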
For convenience of introduction, please refer to fig. 14, where fig. 14 is a graph structure diagram of the film recommendation scenario in the embodiment of the present application. As shown in the figure, assuming that the target film has 9 associated films, that is, P is 9, the film data of the target film and of the 9 associated films are respectively obtained. Assuming that the target film is "film D", the film data of each film is shown in table 8.
TABLE 8
| Film name | Film box office information | Film category | Film duration | Film evaluation information | Film playing effect |
| Film D | 110 million | Sci-fi film | 126 minutes | Positive rating 65% | 2D effect |
| Film A | 160 million | Action film | 135 minutes | Positive rating 85% | 3D effect |
| Film T | 35 million | Action film | 111 minutes | Positive rating 70% | 3D effect |
| Film C | 72 million | Animation | 120 minutes | Positive rating 80% | 2D effect |
| Film Z | 120 million | Romance film | 128 minutes | Positive rating 88% | 2D effect |
| Film X | 90 million | Action film | 154 minutes | Positive rating 89% | 2D effect |
| Film M | 150 million | Sci-fi film | 178 minutes | Positive rating 72% | 3D effect |
| Film N | 110 million | Sci-fi film | 135 minutes | Positive rating 90% | 2D effect |
| Film H | 50 million | Romance film | 140 minutes | Positive rating 95% | 2D effect |
| Film K | 98 million | Animation | 142 minutes | Positive rating 78% | 3D effect |
As can be seen from table 8, each film has data in multiple dimensions, namely a film name, a film category, film box office information, film evaluation information, a film duration, and a film playing effect; it can be understood that data in other dimensions may also be obtained in practical applications, and this is merely an example. Based on the film data in table 8, the inter-node distance between each associated film and the target film can be calculated, and the inter-node distances arranged from small to large; see table 9, which is an illustration of the inter-node distances arranged from small to large.
TABLE 9
| Associated film | Inter-node distance |
| Film A | 0.11 |
| Film Z | 0.25 |
| Film H | 0.29 |
| Film K | 0.30 |
| Film T | 0.49 |
| Film N | 0.58 |
| Film M | 0.70 |
| Film X | 0.79 |
| Film C | 0.92 |
As can be seen from table 9, assuming that Q is set to 4, that is, the associated films whose inter-node distances rank in the top 4 are obtained, the set of nodes to be recommended includes the associated films "film A", "film Z", "film H", and "film K", and these 4 associated films are then pushed to the client. Please refer to fig. 15, which is a schematic diagram of an information recommendation interface in the film recommendation scenario in the embodiment of the present application; as shown in the figure, when the user selects "film D" on the client, the associated films recommended by the system, namely "film A", "film Z", "film H", and "film K", can be seen through the information recommendation interface.
Secondly, in the embodiment of the application, an information recommendation method based on the film recommendation scenario is provided. With this method, the importance of different associated films to the target film can be fully considered, so that associated-film information that better meets requirements can be pushed in the film recommendation scenario.
Referring to fig. 16, fig. 16 is a schematic diagram of an embodiment of an encoder training apparatus in an embodiment of the present application, in which the encoder training apparatus 30 includes:
an obtaining module 301, configured to obtain a feature vector set corresponding to N nodes according to first graph data, where the feature vector set includes N feature vectors, each feature vector corresponds to a node in the graph, and N is an integer greater than or equal to 2;
an encoding module 302, configured to encode the feature vector corresponding to each node through a graph attention network self-encoder according to the feature vector set to obtain N encoding vectors, where the graph attention network self-encoder is a self-encoder that performs encoding through a graph attention network, and each encoding vector in the N encoding vectors has a corresponding relationship with a feature vector in the feature vector set;
a decoding module 303, configured to decode the N encoding vectors by using a decoder to obtain second graph data;
and a training module 304, configured to update the first model parameter of the graph attention network self-encoder by using a loss function according to the first graph data and the second graph data.
In the embodiment of the application, an encoder training device is provided. With this device, under the self-encoder framework, the self-encoder based on the graph attention network can assign corresponding weights to different adjacent nodes and encode according to the importance of the different adjacent nodes, so that the interaction condition between nodes is fully considered; moreover, the self-encoder framework adopted in the training process can improve the performance of the network, making the encoding result more accurate.
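The cooperation of the four modules can be condensed into one training step. The sketch below assumes a PyTorch-style implementation in which GATEncoder and InnerProductDecoder would be hypothetical stand-ins for the graph attention network self-encoder and the decoder; it illustrates the data flow, not the embodiment's exact implementation.

```python
import torch
import torch.nn.functional as F

def train_step(encoder, decoder, optimizer, features, adjacency):
    """features: (N, F) node feature vectors; adjacency: (N, N) float tensor, the first graph data."""
    optimizer.zero_grad()
    codes = encoder(features, adjacency)        # encoding module: N encoding vectors
    reconstructed = decoder(codes)              # decoding module: logits of the second graph data
    loss = F.binary_cross_entropy_with_logits(  # training module: compare the two graphs
        reconstructed, adjacency)
    loss.backward()
    optimizer.step()                            # update the first model parameter
    return loss.item()
```

Here the cross entropy between the reconstructed graph (second graph data) and the original graph (first graph data) plays the role of the loss function used by the training module 304.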
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the obtaining module 301 is specifically configured to obtain, according to the first graph data, data corresponding to each node in the N nodes;
generating a feature vector corresponding to each node according to the data corresponding to each node in the N nodes;
and acquiring the feature vector set according to the feature vector corresponding to each node in the N nodes.
Secondly, in the embodiment of the present application, an encoder training device is provided, and with the above device, a feature vector corresponding to each node can be generated by using graph data, and is used for subsequent GATAE training, thereby improving the feasibility of the scheme.
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the encoding module 302 is specifically configured to, for any node in the N nodes, obtain M adjacent nodes corresponding to the node, where M is an integer greater than or equal to 1;
acquiring the feature vector of the node and the feature vector of each of the M adjacent nodes according to the feature vector set;
acquiring (M + 1) attention coefficients through the graph attention network self-encoder based on the feature vector of the node and the feature vector of each adjacent node, where the (M + 1) attention coefficients include the attention coefficient of the node itself;
and acquiring the encoding vector corresponding to the node through the graph attention network self-encoder based on the (M + 1) attention coefficients.
Secondly, in the embodiment of the present application, an encoder training apparatus is provided. With this apparatus, the attention coefficient between each node and its adjacent nodes can be calculated, and the importance degree of each node is determined by the attention coefficient, so that the GATAE is more robust. In addition, because the calculation involves only adjacent nodes, the whole graph does not need to be visited; only the adjacent nodes need attention, which improves the efficiency of model training and model prediction.
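For reference, the per-node computation can be sketched with the standard graph attention formulation, in which a learnable weight matrix W and attention vector a (both hypothetical here) produce one raw score for the node itself and one per adjacent node; note that only the M adjacent nodes are touched, not the whole graph.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def raw_attention_scores(h_i, neighbors, W, a):
    """h_i: (F,) feature vector of the node; neighbors: (M, F) adjacent-node features;
    W: (F2, F) projection; a: (2 * F2,) attention vector. Returns (M + 1,) raw scores."""
    wh_i = W @ h_i                                   # project the node itself
    wh_all = np.vstack([wh_i, (W @ neighbors.T).T])  # (M + 1, F2): self plus M neighbors
    # raw score e_ij = LeakyReLU(a . [W h_i || W h_j]) for j in {i} and the adjacent nodes
    return np.array([leaky_relu(a @ np.concatenate([wh_i, wh_j])) for wh_j in wh_all])
```

For M = 3 adjacent nodes this returns 4 raw scores, matching the (M + 1) coefficients in the text.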
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the encoding module 302 is specifically configured to calculate M original attention coefficients through the graph attention network self-encoder based on the feature vector of the node and the feature vector of each adjacent node;
calculating the original attention coefficient of the node itself through the graph attention network self-encoder based on the feature vector of the node;
and normalizing the original attention coefficient of the node and each of the M original attention coefficients to obtain the (M + 1) attention coefficients.
Thirdly, in the embodiment of the present application, an encoder training device is provided. The device normalizes the calculated original attention coefficients, and the normalization puts the attention coefficients on a common scale so that their contributions are comparable, thereby improving the model accuracy.
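A minimal sketch of this normalization step, assuming a softmax over the (M + 1) raw coefficients (a common choice; the text does not fix the normalization function):

```python
import numpy as np

def normalize_attention(raw_scores):
    """raw_scores: (M + 1,) original attention coefficients for the node and its adjacent nodes."""
    shifted = raw_scores - raw_scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()                   # (M + 1) attention coefficients summing to 1
```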
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the encoding module 302 is specifically configured to obtain, through the graph attention network self-encoder, K output results corresponding to the node based on the (M + 1) attention coefficients corresponding to the node, the feature vector of the node, and the feature vector of each adjacent node, where K is an integer greater than or equal to 1;
if K is equal to 1, determining the output result as the encoding vector corresponding to the node;
and if K is greater than 1, splicing the K output results to obtain the encoding vector corresponding to the node, or averaging the K output results to obtain the encoding vector corresponding to the node.
In the embodiment of the present application, an encoder training apparatus is provided. With this apparatus, the multi-head attention mechanism makes the self-attention learning process more stable and lets the GATAE learn related information in different representation spaces, thereby improving the robustness of the GATAE.
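The two combination options described above, splicing for K greater than 1 or averaging, can be sketched as follows; each head's output is assumed to be a NumPy vector.

```python
import numpy as np

def combine_heads(outputs, mode="concat"):
    """outputs: list of K arrays, each (F2,), one result per attention head."""
    if len(outputs) == 1:
        return outputs[0]                      # K == 1: the single output is the encoding vector
    if mode == "concat":
        return np.concatenate(outputs)         # (K * F2,): spliced encoding vector
    return np.mean(np.stack(outputs), axis=0)  # (F2,): averaged encoding vector
```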
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the training module 304 is specifically configured to determine a second model parameter by using a cross entropy loss function according to the first graph data and the second graph data;
and updating the first model parameter of the graph attention network self-encoder to the second model parameter;
the training module 304 is further configured to, after updating the first model parameter of the graph attention network self-encoder by using the loss function according to the first graph data and the second graph data, stop updating the model parameters of the graph attention network self-encoder if the model training condition is satisfied.
Secondly, in the embodiment of the present application, an encoder training apparatus is provided. With this apparatus, the gradient of the cross entropy loss function with respect to the last-layer weights of the GATAE is no longer related to the derivative of the activation function but is proportional only to the difference between the output value and the true value, so convergence is fast; and since back propagation is a chain of multiplications, the update of the entire weight matrix is accelerated, thereby improving the training efficiency of the GATAE.
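This property can be made explicit for a sigmoid output unit $\hat{y} = \sigma(z)$ with true value $y$, a standard derivation rather than anything specific to this embodiment:

$$L = -\big[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\big], \qquad \frac{\partial L}{\partial z} = \hat{y} - y.$$

The derivative $\sigma'(z)$ cancels, so the gradient depends only on the difference between the output value and the true value, which is exactly why convergence is fast.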
Alternatively, on the basis of the embodiment corresponding to fig. 16, in another embodiment of the encoder training device 30 provided in the embodiment of the present application,
the training module 304 is specifically configured to determine that the model training condition is met and stop updating the model parameters of the graph attention network self-encoder if the result of the cross entropy loss function is less than or equal to a loss threshold;
or,
if the number of iterations reaches an iteration threshold, determine that the model training condition is met, and stop updating the model parameters of the graph attention network self-encoder.
Thirdly, in the embodiment of the present application, an encoder training device is provided. With this device, the model training condition for the GATAE can be selected according to the actual situation, which improves the flexibility and feasibility of model training.
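The two model training conditions can be combined in a simple loop; loss_threshold and max_iterations below are hypothetical hyperparameters, and model_step stands for one parameter update returning the current loss.

```python
def train(model_step, loss_threshold=1e-3, max_iterations=10_000):
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = model_step()         # one update of the model parameters
        if loss <= loss_threshold:  # condition 1: loss at or below the loss threshold
            break                   # condition 2: the iteration cap is the loop bound itself
    return loss
```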
Referring to fig. 17, fig. 17 is a schematic diagram of an embodiment of an information recommendation apparatus in an embodiment of the present application, and the information recommendation apparatus 40 includes:
an obtaining module 401, configured to receive an information recommendation request sent by a client, where the information recommendation request carries an identifier of a target node and an identifier of the client;
the obtaining module 401 is further configured to obtain, according to the information recommendation request, the feature vector corresponding to the target node and the feature vectors corresponding to P adjacent nodes, where P is an integer greater than or equal to 1;
the obtaining module 401 is further configured to obtain a target coding vector through a graph attention network self-encoder based on the feature vector corresponding to the target node, where the graph attention network self-encoder is obtained by training using any one of the methods in the foregoing embodiments;
the obtaining module 401 is further configured to obtain P coding vectors through the graph attention network self-encoder based on the feature vectors corresponding to the P adjacent nodes, where the coding vectors in the P coding vectors have a corresponding relationship with the adjacent nodes;
a determining module 402, configured to determine P inter-node distances according to the target coding vector and the P coding vectors, where each inter-node distance in the P inter-node distances represents the distance between the target node and one adjacent node;
an arranging module 403, configured to arrange the P inter-node distances from small to large and take the first Q adjacent nodes as the set of nodes to be recommended, where Q is an integer greater than or equal to 1 and less than or equal to P;
and a pushing module 404, configured to push the set of nodes to be recommended to the client according to the information recommendation request.
In the embodiment of the application, an information recommendation device is provided. With this device, the GATAE encodes a target node of the graph structure by using its adjacent nodes, so that the importance degree of different adjacent nodes to the target node and the interaction condition between the adjacent nodes and the target node are fully considered, which improves the accuracy and reliability of information recommendation.
Optionally, on the basis of the embodiment corresponding to fig. 17, in another embodiment of the information recommendation device 40 provided in the embodiment of the present application,
the obtaining module 401 is specifically configured to obtain user data corresponding to a target user according to the information recommendation request, where the user data includes at least one of user gender, user age, user tag information, the region where the user is located, and user occupation, and the target user and the target node have a corresponding relationship;
acquiring user data corresponding to each associated user in P associated users, wherein the associated users have one-to-one correspondence with adjacent nodes;
generating a feature vector corresponding to a target user according to user data corresponding to the target user;
generating a feature vector corresponding to each associated user in the P associated users according to the user data corresponding to each associated user in the P associated users;
the pushing module 404 is specifically configured to send information of the Q associated users to the client of the target user according to the information recommendation request, so that the client displays the information of the Q associated users.
Secondly, in the embodiment of the application, an information recommendation device is provided. With this device, the importance of different associated users to the target user can be fully considered, so that associated-user information that better meets requirements can be pushed to the target user in the friend recommendation scenario.
Optionally, on the basis of the embodiment corresponding to fig. 17, in another embodiment of the information recommendation device 40 provided in the embodiment of the present application,
the obtaining module 401 is specifically configured to obtain, according to the information recommendation request, commodity data corresponding to a target commodity, where the commodity data includes at least one of a commodity name, a commodity category, a commodity sales volume, a commodity origin, and commodity evaluation information, and the target commodity and the target node have a corresponding relationship;
acquiring commodity data corresponding to each associated commodity in the P associated commodities, wherein the associated commodities have a one-to-one correspondence relationship with adjacent nodes;
generating a characteristic vector corresponding to the target commodity according to commodity data corresponding to the target commodity;
generating a feature vector corresponding to each associated commodity in the P associated commodities according to commodity data corresponding to each associated commodity in the P associated commodities;
the pushing module 404 is specifically configured to send the Q associated commodities to the client according to the information recommendation request, so that the client displays the Q associated commodities.
Secondly, in the embodiment of the application, an information recommendation device is provided. With this device, the importance of different associated commodities to the target commodity can be fully considered, so that associated-commodity information that better meets requirements can be pushed in the commodity recommendation scenario.
Optionally, on the basis of the embodiment corresponding to fig. 17, in another embodiment of the information recommendation device 40 provided in the embodiment of the present application,
the obtaining module 401 is specifically configured to obtain, according to the information recommendation request, film data corresponding to a target film, where the film data includes at least one of a film name, a film category, film box office information, film participant information, film evaluation information, a film duration, and a film playing effect, and the target film and the target node have a corresponding relationship;
acquiring film data corresponding to each associated film in the P associated films, wherein the associated films and adjacent nodes have one-to-one correspondence;
generating a feature vector corresponding to the target film according to the film data corresponding to the target film;
generating a feature vector corresponding to each associated film in the P associated films according to the film data corresponding to each associated film in the P associated films;
the pushing module 404 is specifically configured to send the Q associated films to the client according to the information recommendation request, so that the client displays the Q associated films.
Secondly, in the embodiment of the present application, an information recommendation apparatus is provided. With this apparatus, the importance of different associated films to the target film can be fully considered, so that associated-film information that better meets requirements can be pushed in the film recommendation scenario.
Fig. 18 is a schematic structural diagram of a server provided in this embodiment. The server 500 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors), a memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing an application 542 or data 544. The memory 532 and the storage medium 530 may be transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), and each module may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530 and execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 18.
In the embodiment of the present application, the CPU 522 is configured to perform the following steps:
acquiring a feature vector set corresponding to N nodes according to first graph data, wherein the feature vector set comprises N feature vectors, each feature vector corresponds to one node in a graph, and N is an integer greater than or equal to 2;
according to the feature vector set, encoding the feature vector corresponding to each node through a graph attention network self-encoder to obtain N encoding vectors, wherein the graph attention network self-encoder is a self-encoder that performs encoding through a graph attention network, and the encoding vectors in the N encoding vectors have a corresponding relationship with the feature vectors in the feature vector set;
decoding the N encoding vectors through a decoder to obtain second graph data;
and updating the first model parameter of the graph attention network self-encoder by adopting a loss function according to the first graph data and the second graph data.
In the embodiment of the present application, the CPU 522 is configured to perform the following steps:
acquiring a feature vector corresponding to a target node and feature vectors corresponding to P adjacent nodes, wherein P is an integer greater than or equal to 1;
acquiring a target coding vector through a graph attention network self-encoder based on the feature vector corresponding to the target node;
acquiring P coding vectors through a graph attention network self-encoder based on the feature vectors corresponding to the P adjacent nodes, wherein the coding vectors in the P coding vectors have corresponding relations with the adjacent nodes;
determining P inter-node distances according to the target coding vector and the P coding vectors, wherein the inter-node distance in the P inter-node distances represents the distance between the target node and the adjacent node;
arranging the P inter-node distances from small to large, and taking the adjacent nodes ranked in the first Q as the set of nodes to be recommended, wherein Q is an integer greater than or equal to 1 and less than or equal to P;
and pushing a node set to be recommended to the client.
The embodiment of the present application further provides another encoder training apparatus and information recommendation apparatus. As shown in fig. 19, for convenience of description, only the portions related to the embodiment of the present application are shown; for details of the specific technology that are not disclosed, please refer to the method portion of the embodiment of the present application. The terminal device may be any terminal device, including a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a Point of Sales (POS) terminal, a vehicle-mounted computer, and the like; the following takes the terminal device being a personal computer as an example:
Fig. 19 is a block diagram showing a partial configuration of a personal computer related to the terminal device according to the embodiment of the present application. Referring to fig. 19, the personal computer includes: a Radio Frequency (RF) circuit 610, a memory 620, an input unit 630, a display unit 640, a sensor 650, an audio circuit 660, a wireless fidelity (WiFi) module 670, a processor 680, and a power supply 690. Those skilled in the art will appreciate that the personal computer configuration shown in fig. 19 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The following describes each component of the personal computer in detail with reference to fig. 19:
theRF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to theprocessor 680; in addition, the data for designing uplink is transmitted to the base station. In general,RF circuit 610 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, theRF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 620 may be used to store software programs and modules, and the processor 680 executes various functional applications of the personal computer and performs data processing by running the software programs and modules stored in the memory 620. The memory 620 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data (such as audio data or a phonebook) created according to the use of the personal computer, and the like. Further, the memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 630 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the personal computer. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect touch operations of a user on or near it (for example, operations performed by the user on or near the touch panel 631 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 631 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 680, and can receive and execute commands sent by the processor 680. In addition, the touch panel 631 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 631, the input unit 630 may also include other input devices 632. Specifically, the other input devices 632 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by the user or information provided to the user, as well as various menus of the personal computer. The display unit 640 may include a display panel 641; optionally, the display panel 641 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 631 may cover the display panel 641; when the touch panel 631 detects a touch operation on or near it, the touch operation is transmitted to the processor 680 to determine the type of the touch event, and the processor 680 then provides a corresponding visual output on the display panel 641 according to the type of the touch event. Although the touch panel 631 and the display panel 641 are shown in fig. 19 as two separate components to implement the input and output functions of the personal computer, in some embodiments the touch panel 631 and the display panel 641 may be integrated to implement the input and output functions of the personal computer.
The personal computer may also include at least one sensor 650, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor adjusts the brightness of the display panel 641 according to the brightness of ambient light, and the proximity sensor turns off the display panel 641 and/or the backlight when the personal computer moves close to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that identify the attitude of the personal computer (such as switching between landscape and portrait orientation, related games, and magnetometer attitude calibration) and for vibration-identification related functions (such as a pedometer and tapping); other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured in the personal computer, and are not described herein again.
The audio circuit 660, the speaker 661, and the microphone 662 can provide an audio interface between the user and the personal computer. The audio circuit 660 may transmit the electrical signal converted from the received audio data to the speaker 661, and the speaker 661 converts the electrical signal into a sound signal for output; on the other hand, the microphone 662 converts the collected sound signal into an electrical signal, which is received by the audio circuit 660 and converted into audio data; the audio data is output to the processor 680 for processing and then transmitted via the RF circuit 610 to, for example, another personal computer, or output to the memory 620 for further processing.
WiFi is a short-distance wireless transmission technology. Through the WiFi module 670, the personal computer can help the user receive and send e-mails, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 19 shows the WiFi module 670, it can be understood that it is not an essential component of the personal computer and may be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 680 is the control center of the personal computer. It connects various parts of the entire personal computer using various interfaces and lines, and performs the various functions of the personal computer and processes data by running or executing the software programs and/or modules stored in the memory 620 and calling the data stored in the memory 620, thereby monitoring the personal computer as a whole. Optionally, the processor 680 may include one or more processing units; optionally, the processor 680 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 680.
The personal computer also includes a power supply 690 (e.g., a battery) for supplying power to the various components; optionally, the power supply is logically connected to the processor 680 via a power management system, so that functions such as managing charging, discharging, and power consumption are performed via the power management system.
Although not shown, the personal computer may further include a camera, a bluetooth module, etc., which will not be described herein.
In this embodiment, the processor 680 included in the terminal device further has the following functions:
acquiring a feature vector set corresponding to N nodes according to first graph data, wherein the feature vector set comprises N feature vectors, each feature vector corresponds to one node in a graph, and N is an integer greater than or equal to 2;
according to the feature vector set, encoding the feature vector corresponding to each node through a graph attention network self-encoder to obtain N encoding vectors, wherein the graph attention network self-encoder is a self-encoder that performs encoding through a graph attention network, and the encoding vectors in the N encoding vectors have a corresponding relationship with the feature vectors in the feature vector set;
decoding the N encoding vectors through a decoder to obtain second graph data;
and updating the first model parameter of the graph attention network self-encoder by adopting a loss function according to the first graph data and the second graph data.
In this embodiment, the processor 680 included in the terminal device further has the following functions:
acquiring a feature vector corresponding to a target node and feature vectors corresponding to P adjacent nodes, wherein P is an integer greater than or equal to 1;
acquiring a target coding vector through a graph attention network self-encoder based on the feature vector corresponding to the target node;
acquiring P coding vectors through a graph attention network self-encoder based on the feature vectors corresponding to the P adjacent nodes, wherein the coding vectors in the P coding vectors have corresponding relations with the adjacent nodes;
determining P inter-node distances according to the target coding vector and the P coding vectors, wherein the inter-node distance in the P inter-node distances represents the distance between the target node and the adjacent node;
arranging the P inter-node distances from small to large, and taking the adjacent nodes ranked in the first Q as the set of nodes to be recommended, wherein Q is an integer greater than or equal to 1 and less than or equal to P;
and pushing a node set to be recommended to the client.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the steps performed in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the steps as performed in the foregoing embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.