CN113822293A - Model processing method, device and equipment for graph data and storage medium - Google Patents

Model processing method, device and equipment for graph data and storage medium

Info

Publication number
CN113822293A
Authority
CN
China
Prior art keywords
neural network
target
node
neighbor
graph neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110636601.1A
Other languages
Chinese (zh)
Inventor
陈煜钊
卞亚涛
荣钰
徐挺洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110636601.1A
Publication of CN113822293A
Legal status: Pending (current)

Abstract

The application relates to a model processing method, apparatus, and device for graph data, and a storage medium, in the technical field of artificial intelligence. The method comprises the following steps: inputting sample graph data into a target graph neural network model; processing the sample graph data based on at least two graph neural network layers to obtain at least two pieces of feature data; determining, based on the at least two pieces of feature data, neighbor difference degree information corresponding to the at least two pieces of feature data, the neighbor difference degree information being used for indicating the degree of non-smoothness of the at least two pieces of feature data; determining a first loss function value based on the at least two pieces of neighbor difference degree information; and updating model parameters of the target graph neural network model based on the first loss function value. With this scheme, model training is performed using the first loss function value calculated from the neighbor difference degree information, which alleviates the over-smoothing problem of the graph neural network and improves the performance of the graph neural network model after training and updating.

Description

Model processing method, device and equipment for graph data and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model processing of graph data.
Background
Model training of a graph neural network model can be carried out using a teacher-student knowledge distillation framework. The teacher-student knowledge distillation framework migrates the knowledge contained in a better-performing teacher model into a relatively lightweight student model, enabling the student model to achieve better performance than with conventional training methods.
In the related art, a graph neural network model that performs well after training is selected as the teacher model, and the probability distribution output by the teacher model is directly used as labels to train another graph neural network model as the student model.
However, in the above method for training the graph neural network model, due to the special structure of graph data, better performance cannot be obtained by simply stacking multiple graph network layers, so the training effect of the graph neural network model is poor.
Disclosure of Invention
The embodiments of the application provide a model processing method, apparatus, device, and storage medium for graph data, which can improve the performance of a model after training and updating. The technical scheme is as follows:
in one aspect, a model processing method for graph data is provided, the method comprising:
inputting the sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
processing the sample graph data based on at least two graph neural network layers to obtain at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
determining a first loss function value based on at least two of the neighbor difference degree information;
updating model parameters of the target graph neural network model based on the first loss function values.
In another aspect, there is provided a model processing apparatus for graph data, the apparatus including:
the sample input module is used for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
the characteristic acquisition module is used for processing the sample graph data based on at least two graph neural network layers to acquire at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
the information determining module is used for determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
a loss value determination module for determining a first loss function value based on at least two of the neighbor difference degree information;
and the model updating module is used for updating the model parameters of the target graph neural network model based on the first loss function value.
In one possible implementation manner, the information determining module includes:
and the information determining submodule is used for determining the neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data and the granularity of the feature data.
In a possible implementation manner, the information determining sub-module includes:
a neighboring node determining unit, configured to determine, in response to that the granularity of the feature data is a node level, a neighboring node of each node corresponding to the target feature data; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
a virtual node obtaining unit, configured to obtain, based on the respective neighboring nodes of the respective nodes corresponding to the target feature data, virtual neighboring nodes of the respective nodes corresponding to the target feature data;
a difference value determining unit, configured to determine, based on the virtual neighboring nodes of the respective nodes corresponding to the target feature data, respective neighboring difference values of the respective nodes corresponding to the target feature data;
an information determining unit, configured to determine, based on respective neighbor difference values of the nodes corresponding to the target feature data, the neighbor difference information corresponding to the target feature data.
In a possible implementation manner, the virtual node obtaining unit is configured to,
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
In one possible implementation, the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix includes:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
In a possible implementation manner, the difference value determining unit is configured to
determine the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
In one possible implementation manner, the loss value determining module includes:
a maximum value determining submodule, configured to compare the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information and determine the maximum value among the neighbor difference values;
an initial layer determining submodule, configured to determine the graph neural network layer to which the maximum neighbor difference value belongs as the initial network layer;
a first loss value determining submodule, configured to determine the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
In one possible implementation manner, the loss value determining module includes:
an average value determining submodule, configured to determine a difference average value of each of the at least two pieces of neighbor difference degree information, where the difference average value is the average value of the neighbor difference values included in the corresponding neighbor difference degree information;
a sub-value determination sub-module, configured to determine a sub-value of a loss function between adjacent network layers of the at least two graph neural network layers based on a difference average value of each of the at least two neighboring difference degree information;
a second loss value determination submodule for determining the first loss function value based on the loss function sub-value between adjacent ones of the at least two graph neural network layers.
In one possible implementation, the sub-value determining sub-module includes:
a first sub-value determining unit, configured to, in response to the difference average value corresponding to the n-th graph neural network layer being greater than the difference average value corresponding to the (n+1)-th graph neural network layer, input the neighbor difference degree information corresponding to the n-th graph neural network layer and the neighbor difference degree information corresponding to the (n+1)-th graph neural network layer into a weighted mean square error loss function to obtain the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer, where n is an integer greater than or equal to 1;
a second sub-value determining unit, configured to determine that the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer is zero in response to the difference average value corresponding to the n-th graph neural network layer being less than or equal to the difference average value corresponding to the (n+1)-th graph neural network layer.
In one possible implementation, the model updating module includes:
a model updating submodule for updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
In one possible implementation, the model update sub-module includes:
an overall loss value determination unit configured to determine a sum of the first loss function value and the second loss function value as an overall training loss function value;
and the model updating unit is used for updating the model parameters of the target graph neural network model based on the overall training loss function value.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
In another aspect, a computer device is provided, which comprises a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the above-mentioned model processing method for graph data.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the computer program being loaded and executed by a processor to implement the above-described model processing method for graph data.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the model processing method for graph data provided in the various alternative implementations described above.
The technical scheme provided by the application can comprise the following beneficial effects:
In the scheme shown in the embodiments of the application, the feature data output after each graph neural network layer performs feature extraction on the sample graph data is obtained, and the neighbor difference degree information corresponding to each piece of feature data is calculated, so that the degree of non-smoothness of the graph features extracted by each graph neural network layer is quantitatively measured. Model training is then performed using the first loss function value calculated from the neighbor difference degree information, so that the non-smoothness-related information of each of the multiple graph neural network layers in the model guides the training process of each graph neural network layer. This alleviates the over-smoothing problem of the graph neural network and improves the model performance of the graph neural network after training and updating.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 illustrates a flow chart of a method for model processing of graph data provided by an exemplary embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a model processing system for graph data provided by an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of a method for model processing of graph data provided by an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating a method for calculating neighbor difference information corresponding to a target node according to the embodiment shown in FIG. 3;
FIG. 5 illustrates a flow diagram of a model processing system for graph data provided by an exemplary embodiment of the present application;
FIG. 6 is a block diagram of a model processing apparatus for graph data according to an exemplary embodiment of the present application;
FIG. 7 illustrates a block diagram of a computer device shown in an exemplary embodiment of the present application;
fig. 8 shows a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The embodiment of the application provides a model processing method for graph data, which can improve the model performance of a graph neural network after being trained and updated.
FIG. 1 is a flow diagram illustrating a method for model processing of graph data in accordance with an exemplary embodiment. The model processing method for graph data may be performed by a computer device. For example, the computer device may include at least one of a terminal or a server. As shown in fig. 1, the model processing method for graph data includes the steps of:
step 101, inputting sample graph data into a target graph neural network model; the target graph neural network model comprises at least two graph neural network layers.
In an embodiment of the present application, a computer device inputs sample graph data into a target graph neural network model. The target graph neural network model is a neural network model for analyzing and processing graph data.
Among other things, the present application relates to the art of Artificial Intelligence (AI), which is a theory, method, technique, and application system that utilizes a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. The solution of the present application mainly relates to the machine learning/deep learning direction.
Artificial intelligence technology includes Machine Learning (ML), which is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specially studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
In one possible implementation manner, a sample graph data set including at least two pieces of sample graph data is collected in advance, and the sample graph data in the sample graph data set are sequentially input into the target graph neural network model.
Step 102, processing the sample graph data based on at least two graph neural network layers to obtain at least two pieces of feature data; the at least two pieces of feature data are obtained by performing feature extraction through the at least two graph neural network layers respectively.
In the embodiment of the application, after the sample graph data is input into the target graph neural network model, the feature extraction is respectively carried out sequentially through at least two graph neural network layers.
In one possible implementation, the extracted feature data includes a feature matrix composed of feature vectors corresponding to respective nodes in the sample graph data.
Step 103, determining neighbor difference degree information corresponding to the at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used to indicate the degree of non-smoothness of the at least two pieces of feature data.
In the embodiment of the application, each graph neural network layer corresponds to one extracted feature data, neighbor difference degree information corresponding to the feature data output by each graph neural network layer can be obtained by performing calculation based on the feature data, and the neighbor difference degree information is used for indicating the non-smoothness degree corresponding to the feature data extracted by each layer of network.
The non-smoothness may be used to indicate the degree of difference between the information in the feature data corresponding to each node in the graph data and the information in the feature data corresponding to the other nodes adjacent to that node. Generally, the larger the overall difference of the information in the feature data corresponding to adjacent nodes, the higher the non-smoothness of the feature data; conversely, the smaller the overall difference, the lower the non-smoothness of the feature data.
Step 104, determining a first loss function value based on the at least two pieces of neighbor difference degree information.
In this embodiment of the application, the computer device obtains neighbor difference degree information corresponding to each of the feature data extracted by the at least two graph neural network layers, and may determine, according to the neighbor difference degree information corresponding to each of the feature data, a first loss function value obtained by inputting the sample graph data.
Step 105, updating the model parameters of the target graph neural network model based on the first loss function value.
In summary, in the solution shown in the embodiments of the present application, the feature data output after each graph neural network layer performs feature extraction on the sample graph data is obtained, the neighbor difference degree information corresponding to each piece of feature data is calculated, and the degree of non-smoothness of the graph features extracted by each graph neural network layer is thereby quantitatively measured. Model training is then performed using the first loss function value calculated from the neighbor difference degree information, so that the non-smoothness-related information of each of the multiple graph neural network layers in the model guides the training process of each graph neural network layer. This alleviates the over-smoothing problem of the graph neural network and improves the model performance of the graph neural network after being updated through training.
In the related art, knowledge distillation is to migrate knowledge contained in a target network (i.e., a teacher model with better performance) into an online learning network (i.e., a student model with relatively lighter weight), so that the student model achieves better performance than conventional training methods.
Among them, knowledge distillation may include two modes. One is Soft-Label Distillation (Soft-Label Distillation), which uses the probability distribution output by the teacher model as a smooth Label to train the student model.
For example, a teacher graph neural network for protein structure prediction is additionally introduced in advance, the teacher graph neural network is a graph neural network trained in advance, a prediction probability distribution corresponding to a sample protein structure can be output by inputting the sample protein structure into the teacher graph neural network, and the student graph neural network trained from the beginning for protein structure analysis is trained by taking the prediction probability distribution as a smooth label.
Another approach is Feature Distillation (Feature Distillation), which adds regularization constraints in the Feature representation space for the training of neural networks. The intermediate activation features of the parameterized neural network can be directly used as knowledge signals for feature distillation learning, and a feature transformation function can be designed to extract specific knowledge from the teacher model, for example, feature attention diagrams, similarity maps and the like are used.
For example, a teacher graph neural network for protein structure prediction is also introduced in advance, the teacher graph neural network is a graph neural network trained in advance, intermediate output features of the teacher graph neural network can be obtained by inputting a sample protein structure into the teacher graph neural network, and the student graph neural network is trained by using the intermediate output features as feature distillation learning knowledge signals.
The above two methods of training a neural network by knowledge distillation mainly focus on high-level graph understanding tasks, such as molecular structure classification and molecular structure segmentation in the molecular biology field, and both require introducing the neural network of a teacher model. Selecting the neural network of the teacher model is therefore difficult, and if the teacher model is chosen improperly, the performance of the student model may be damaged to a certain extent. On the other hand, because an additional teacher model is introduced in the training process, the space and time cost of training the student network increases more than twofold, which greatly increases the time and space overhead of model training on large-scale graph data.
The solution shown in the above embodiments of the present application alleviates the over-smoothing problem of the graph neural network from an orthogonal perspective, namely by optimizing the training strategy of the graph neural network, so that no change to the network model structure or the input data is required. By adopting the self-distillation training algorithm, the performance of the graph network model can be significantly improved at the cost of only a small amount of additional training overhead. In an exemplary aspect, the solution of the above embodiments relates to a model processing system for graph data that includes a model training and updating section. FIG. 2 is a schematic diagram illustrating a model processing system for graph data in accordance with an exemplary embodiment. As shown in fig. 2, for the model training and updating portion, the model training device 210 performs model updating on the target graph neural network model through each set of sample graph data, and the updated target graph neural network model can be uploaded to the cloud or a database for use.
The model training device 210 may be a computer device with machine learning capability. For example, the computer device may be a stationary computer device such as a personal computer, a server, or stationary scientific research equipment, or a mobile computer device such as a tablet computer or an e-book reader. The embodiment of the present application does not limit the specific type of the model training device 210.
The terminal 240 may be a computer device. The server 230 may be a background server of the terminal 240, an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The model training device 210 and the terminal 240 may be the same computer device.
FIG. 3 is a flow diagram illustrating a method for model processing of graph data in accordance with an exemplary embodiment. The model processing method for graph data may be performed by a computer device. For example, the computer device may be the model training device of FIG. 2. As shown in fig. 3, the model processing method for graph data includes the steps of:
Step 301, inputting the sample graph data into the target graph neural network model.
In the embodiment of the application, the model training device acquires at least one sample graph data, and inputs the at least one sample graph data into a target graph neural network model which needs to be subjected to model training.
The sample graph data is graph data serving as a training sample of the target graph neural network model, and the sample graph data comprises node information of at least two nodes and the relationship between the at least two nodes. The target graph neural network model may be any Graph Neural Network (GNN) model to be trained. The target graph neural network model comprises at least two graph neural network layers.
Optionally, the graph neural network layer is a neural network for processing graph data.
For example, data can be naturally converted into graph data in many fields, including a biomolecule field, a protein field, an image analysis field, a social relationship network field, a software engineering field and a natural language processing field, and the various fields can convert the related data into the graph data and input the graph data into a graph neural network model for analysis processing, so as to achieve the purpose of performing data analysis processing in various fields.
For example, in the research field of the social relationship network, each user in the social relationship network may be converted into a node (information of the node may be information of an attribute, a behavior record, and the like of the corresponding user), and a relationship between each user is converted into a connection edge between the nodes, so as to obtain graph data corresponding to the social relationship network.
For another example, in the process of research in the field of biomolecules, each atom in a molecular structure may be converted into a node (information of the node may be information of atom type, atom attribute, and the like), and a chemical bond connection relationship between each atom is converted into a connection edge between nodes, so as to obtain graph data corresponding to the molecular structure; in the field of research of protein structure analysis, amino acids in a protein structure may be converted into nodes (information of the nodes may be information of amino acid properties, amino acid types, and the like), and the connection relationship between the amino acids may be converted into connection edges between the nodes, thereby obtaining graph data corresponding to the protein structure.
Step 302, processing the sample graph data based on at least two graph neural network layers to obtain at least two feature data.
In the embodiment of the application, the model training device processes input sample graph data based on at least two graph neural network layers in the target graph neural network model, and at least two feature data are obtained from the at least two graph neural network layers.
Wherein, at least two feature data are obtained by respectively extracting features of at least two graph neural network layers. For example, at least two feature data have a one-to-one correspondence with at least two graph neural network layers.
Illustratively, if the target graph neural network model comprises a first layer graph neural network, a second layer graph neural network and a third layer graph neural network, the sample graph data a input into the target graph neural network model is firstly subjected to feature extraction through the first layer graph neural network to obtain feature data output by the first layer graph neural network, then the feature data output by the first layer graph neural network is subjected to feature extraction through the second layer graph neural network to obtain feature data output by the second layer graph neural network, and then the feature data output by the second layer graph neural network is subjected to feature extraction through the third layer graph neural network to obtain feature data output by the third layer graph neural network.
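As a hedged illustration of steps 301 and 302, the following Python sketch passes sample graph data through a stack of graph neural network layers and collects the feature data output by each layer. The GCN-style propagation rule, the normalization, and all dimensions are assumptions made for demonstration and are not taken from the patent.

```python
import numpy as np

def gcn_layer(adj_norm, features, weight):
    """One graph neural network layer: aggregate neighbor features, then transform (ReLU)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def forward_collect(adj, node_feats, weights):
    """Run every graph neural network layer in turn and keep each layer's feature data."""
    a_hat = adj + np.eye(adj.shape[0])                    # add self-loops (assumption)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    adj_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt            # symmetric normalization

    feature_data = []                                     # one feature matrix per layer
    h = node_feats
    for w in weights:
        h = gcn_layer(adj_norm, h, w)
        feature_data.append(h)                            # feature data X^(l) of this layer
    return feature_data

# Toy sample graph data: 5 nodes, 4-dimensional node features, 3 layers.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)
x = rng.normal(size=(5, 4))
weights = [rng.normal(size=(4, 4)) for _ in range(3)]
feats_per_layer = forward_collect(adj, x, weights)
print([f.shape for f in feats_per_layer])                 # three (5, 4) feature matrices
```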
In one possible implementation, if feature representation extraction with different granularities is performed on sample graph data, feature data corresponding to different granularities are output by each graph neural network layer.
For example, the feature data may include a feature matrix obtained by feature extraction of the graph data of the graph neural network layer at the corresponding granularity.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an overall graph level.
For example, if the corresponding granularity for performing the feature representation extraction is a node level, a feature matrix composed of feature vectors of respective nodes extracted by the graph neural network layer may be determined as the feature data, and if the corresponding granularity for performing the feature representation extraction is a connection edge level, a feature matrix of edges extracted by the graph neural network layer may be determined as the feature data.
Step 303, determining neighbor difference degree information corresponding to the at least two feature data based on the at least two feature data and the granularity of the feature data.
In this embodiment of the application, the model training device determines neighbor difference degree information corresponding to feature data output by at least two graph neural network layers respectively, based on at least two feature data output by at least two graph neural network layers in the received target graph neural network model and granularity adopted when extracting the feature data.
The neighboring difference degree information is used to indicate the degree of non-smoothness of at least two feature data, that is, the neighboring difference degree information corresponding to each feature data is used to indicate the degree of non-smoothness of the feature data.
That is, the feature data output by each graph neural network layer corresponds to one neighbor difference degree information.
For example, feature data a output by the first layer of graph neural network layer corresponds to neighbor difference information a, feature data B output by the second layer of graph neural network layer corresponds to neighbor difference information B, and feature data C output by the third layer of graph neural network layer corresponds to neighbor difference information C.
The neighbor difference degree information may be a vector, a matrix, a value set, or a parameter value, etc. for indicating the degree of non-smoothness of the feature data.
In a possible implementation manner, in response to the granularity of the feature data being the node level, the respective neighbor nodes of each node corresponding to the target feature data are determined. Virtual neighbor nodes of each node corresponding to the target feature data are then obtained based on the respective neighbor nodes of each node corresponding to the target feature data. Next, the respective neighbor difference values of each node corresponding to the target feature data are determined based on those virtual neighbor nodes, and the neighbor difference degree information corresponding to the target feature data is determined based on the respective neighbor difference values of each node corresponding to the target feature data.
The neighbor node may be at least one other node directly connected to the corresponding node through an edge, and the target feature data is any one of the at least two feature data.
That is to say, when feature extraction is performed in the graph neural network layer and the granularity adopted is the node level, the neighbor nodes corresponding to the target feature data are determined, the virtual neighbor nodes corresponding to the nodes can be calculated through the obtained neighbor nodes corresponding to the nodes, the neighbor difference values corresponding to the nodes can be determined through calculation of the nodes and the virtual neighbor nodes corresponding to the nodes, and the neighbor difference information corresponding to the target feature data can be determined through calculation of the neighbor difference values corresponding to the nodes.
Illustratively, when the target feature data is the feature data B output by the second graph neural network layer, the nodes corresponding to the feature data B include a node a, a node b, a node c, and a node d. Because the node a is directly connected to the node b and the node d, the node b and the node d are determined as neighbor nodes of the node a; the virtual neighbor node of the node a is then obtained from its neighbor nodes, the node b and the node d, and the neighbor difference value corresponding to the node a is determined from the obtained virtual neighbor node of the node a. The neighbor difference values of the node b, the node c, and the node d are then calculated in the same manner. Based on the neighbor difference values of the node a, the node b, the node c, and the node d, the neighbor difference degree information corresponding to the feature data B can be calculated and determined.
In one possible implementation manner, a first adjacency matrix corresponding to a target node is determined based on a neighboring node of the target node; and acquiring the virtual neighbor node of the target node based on the target characteristic data, the degree matrix of the target node and the first adjacent matrix.
The first adjacency matrix is used for indicating the adjacent relation between the target node and the adjacent nodes, and the target node is any one of the nodes corresponding to the target characteristic data.
That is to say, the model training device extracts each neighbor node corresponding to the target node, and determines, based on the extracted neighbor nodes, the first adjacency matrix that only contains the neighboring relationship between the target node and its neighbor nodes. Then, the first adjacency matrix is multiplied with the feature data to obtain the summation of the neighbor node features, and this summation is multiplied with the inverse of the degree matrix of the target node to obtain the average value, i.e., the summation of the neighbor node features divided by the degree.
Optionally, the virtual neighbor node is calculated by the following formula:

$\tilde{x}_v^{(l)} = \left( D^{-1} A X^{(l)} \right)_v$

where $\tilde{x}_v^{(l)}$ denotes the virtual neighbor node of the target node v corresponding to the feature data output by the l-th graph neural network layer, D is the degree matrix of the nodes, A is the node adjacency matrix of the graph data, and $X^{(l)}$ is the feature matrix composed of the feature vectors of the nodes output by the l-th graph neural network layer.
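A minimal numpy sketch of the virtual neighbor computation described above, i.e., averaging the feature vectors of each node's directly connected neighbors as D^{-1} A X^(l); variable names and the isolated-node guard are illustrative assumptions rather than the patent's reference implementation.

```python
import numpy as np

def virtual_neighbors(adj, feats):
    """For every node, average the feature vectors of its directly connected neighbors:
    x_tilde = D^{-1} A X^(l)."""
    deg = adj.sum(axis=1)                        # node degrees (diagonal of D)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)  # guard against isolated nodes
    return (deg_inv[:, None] * adj) @ feats      # row-wise D^{-1} A, then multiply by X^(l)

adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
feats = np.arange(8, dtype=float).reshape(4, 2)  # X^(l): one feature vector per node
print(virtual_neighbors(adj, feats))             # virtual neighbor node of each node
```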
In another possible implementation manner, the category label information corresponding to the target node and the category label information corresponding to the neighbor node of the target node are obtained, the target neighbor node of the target node is determined from the neighbor nodes of the target node, the first feature data is obtained, and the virtual neighbor node of the target node is obtained based on the first feature data, the degree matrix of the target node, and the first adjacency matrix.
The class label information of the target neighboring node is different from the class label information of the target node, and the first feature data is part of feature data used for indicating the target node and the target neighboring node in the target feature data.
That is to say, the model training device may first extract each neighbor node corresponding to the target node and obtain the class label information corresponding to the target node and to each neighbor node. Based on the condition that the class label information of a target neighbor node differs from that of the target node, the target neighbor nodes meeting the condition are determined, and the feature matrix composed of the feature vectors of the target neighbor nodes is determined as the first feature data. Then, the first adjacency matrix is multiplied with the first feature data to obtain the summation of the target neighbor node features, and this summation is multiplied with the inverse of the degree matrix of the target node to obtain the average value, i.e., the summation of the target neighbor node features divided by the degree.
Optionally, after the class label information of the nodes is taken into account, the virtual neighbor node is calculated by the following formula:

$\tilde{x}_v^{(l)} = \left( D^{-1} A X^{(l)}_{\neq y(v)} \right)_v$

where $\tilde{x}_v^{(l)}$ denotes the virtual neighbor node of the target node v corresponding to the feature data output by the l-th graph neural network layer, D is the degree matrix of the nodes, A is the node adjacency matrix of the graph data, $X^{(l)}_{\neq y(v)}$ is the feature matrix composed of the feature vectors of the target neighbor nodes output by the l-th graph neural network layer, and y(v) is the class label information of the target node v.
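A sketch, under the same assumptions as the previous snippet, of the label-aware variant: only neighbors whose class label differs from the target node's label (the target neighbor nodes) contribute to the virtual neighbor node.

```python
import numpy as np

def virtual_neighbors_label_aware(adj, feats, labels):
    """Average only the neighbors whose class label differs from the target node's label."""
    diff_label = (labels[:, None] != labels[None, :]).astype(float)
    adj_masked = adj * diff_label                    # keep only 'target neighbor' edges
    deg = adj_masked.sum(axis=1)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)
    return (deg_inv[:, None] * adj_masked) @ feats

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
feats = np.eye(3)                                    # toy feature vectors
labels = np.array([0, 0, 1])                         # class label information per node
print(virtual_neighbors_label_aware(adj, feats, labels))
```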
In a possible implementation manner, the similarity between the virtual neighboring node corresponding to the target node and the target node is determined as a neighboring difference value of the target node corresponding to the target feature data.
The model training device may calculate a similarity between the target node and the virtual neighboring node corresponding to the target node obtained through the calculation, and convert the calculated similarity value into a distance index.
Optionally, the method for calculating the similarity between the virtual neighbor node corresponding to the target node and the target node is a cosine distance similarity measurement algorithm, or a similarity measurement algorithm based on a Radial Basis Function (RBF) kernel function.
When the cosine distance similarity measurement algorithm is used and the class label information of the nodes is not considered, the neighbor difference value can be calculated by the following formula:

$s_v^{(l)} = 1 - \cos\left( x_v^{(l)}, \tilde{x}_v^{(l)} \right)$

where $s_v^{(l)}$ is the neighbor difference value of the target node v corresponding to the feature data output by the l-th graph neural network layer, $x_v^{(l)}$ is the feature vector of the target node v output by the l-th graph neural network layer, and $\tilde{x}_v^{(l)}$ is the virtual neighbor node of the target node v.
In one possible implementation manner, the neighbor difference degree information corresponding to the target feature data is determined in response to obtaining respective neighbor difference values of nodes corresponding to the target feature data.
For example, when the neighbor difference degree information is a neighbor difference vector composed of neighbor difference values, after the neighbor difference value of each node corresponding to the graph neural network layer is obtained by the above-mentioned calculation method, the neighbor difference vector corresponding to that graph neural network layer can be composed. That is, the neighbor difference vector corresponding to the l-th graph neural network layer can be

$s^{(l)} = \left[ s_1^{(l)}, s_2^{(l)}, \ldots, s_N^{(l)} \right]$

where N is the number of nodes.
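A minimal sketch of the neighbor difference computation at the node level: for each node, the cosine similarity between its feature vector and its virtual neighbor node is converted into a distance, and the per-node values form the neighbor difference vector s^(l). The specific conversion (1 minus cosine similarity) is an assumption consistent with the description above.

```python
import numpy as np

def neighbor_difference_vector(adj, feats, eps=1e-12):
    """Neighbor difference value per node: 1 - cosine similarity between the node's
    feature vector and its virtual neighbor node; the N values form s^(l)."""
    deg = adj.sum(axis=1)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)
    virt = (deg_inv[:, None] * adj) @ feats                   # virtual neighbor nodes
    num = (feats * virt).sum(axis=1)
    den = np.linalg.norm(feats, axis=1) * np.linalg.norm(virt, axis=1) + eps
    return 1.0 - num / den                                    # s^(l), length N

adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 1],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 1, 1, 0, 0]], dtype=float)
feats = np.random.default_rng(1).normal(size=(5, 3))          # feature data of one layer
print(neighbor_difference_vector(adj, feats))
```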
By way of example, the whole-graph-level representation of the feature data may be defined by the following formula:

$G^{(P)} = \frac{1}{M} \sum_{v=1}^{M} \tilde{x}_v^{(P)}$

where P is the number of graph neural network layers, M is the number of all nodes, $G^{(P)}$ is the graph-level feature matrix obtained by aggregating the feature vectors of all nodes on the graph output by the P-th graph neural network layer, and $\tilde{x}_v^{(P)}$ is the feature vector composed from the neighbor nodes of the target node v output by the P-th graph neural network layer. The graph-level feature matrix can be used as the neighbor difference degree information, i.e., the knowledge representation, for self-distillation training.
The feature data representation at the subgraph level can be defined by the following formula:

$G_{sub} = \left\{ G_{SG_1}, G_{SG_2}, \ldots, G_{SG_s} \right\}$

where s is the number of randomly sampled subgraphs, $SG_i$ represents the i-th subgraph, and each $G_{SG_i}$ is obtained over the corresponding subgraph in the same manner as the graph-level feature matrix.
For feature data at the edge level, the adjacency matrix of the edges can be generated based on the original node adjacency matrix. The edge adjacency matrix is defined as follows:

$\left[ A_e \right]_{i,j} = \begin{cases} 1, & \text{if edge } i \text{ and edge } j \text{ have a common vertex} \\ 0, & \text{otherwise} \end{cases}$

That is, $[A_e]_{i,j}$ is 1 if edge i and edge j have a common vertex, and 0 otherwise.
The neighbor difference degree information for edge-level self-distillation training may be given by the following formula:

$s_e^{(l)} = \left[ s_{e,1}^{(l)}, s_{e,2}^{(l)}, \ldots, s_{e,M}^{(l)} \right]$

where the neighbor difference value of each edge is computed from the edge feature matrix E and the edge adjacency matrix $A_e$ in the same way as in the node-level case, E represents the edge feature matrix, and M represents the total number of edges.
In one possible implementation, in order to reduce the difficulty of the distillation learning of the single model, the retention of the neighboring difference degree information adopts a progressive migration method. Namely, the neighbor difference degree information of the adjacent next graph neural network layer is matched with the neighbor difference degree information of the graph neural network layer of the previous layer.
The neighbor difference information of the next graph neural network layer is matched with the neighbor difference information of the previous graph neural network layer, and the neighbor difference information of the next graph neural network layer can be fitted with the neighbor difference information of the previous graph neural network layer.
Fig. 4 is a schematic diagram illustrating a method for calculating the neighbor difference degree information corresponding to a target node according to an embodiment of the present application. As shown in fig. 4, the sample graph data 41 input into the target graph neural network model includes node 1, node 2, node 3, node 4, and node 5, and the corresponding neighbor difference value 42 of each node under the feature data is calculated. If the target node is node 3, the first-order neighbors of node 3 are first determined as its neighbor nodes, i.e., node 2, node 4, and node 5 are determined to be neighbor nodes of node 3; a virtual neighbor node is then calculated from node 2, node 4, and node 5; and finally a similarity calculation is performed between the virtual neighbor node and node 3 to obtain the neighbor difference value S3. By performing this calculation for node 1, node 2, node 3, node 4, and node 5 respectively, the corresponding neighbor difference degree information 43 under the feature data, (S1, S2, S3, S4, S5), can be obtained.
Step 304, determining a first loss function value based on the at least two neighboring difference degree information.
In the embodiment of the application, the model training device applies an adaptive neighbor difference retaining (ADR) distillation strategy to the neighbor difference degree information and calculates, based on the determined neighbor difference degree information corresponding to each graph neural network layer, a first loss function value used for model updating of the target graph neural network model.
In a possible implementation manner, the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information are compared, the maximum value among the neighbor difference values is determined, the graph neural network layer to which the maximum neighbor difference value belongs is determined as the initial network layer, and the first loss function value is determined based on the neighbor difference degree information of the feature data corresponding to the target graph neural network layer.
The target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
Since the initial graph data input into the target graph neural network model may be highly sparse and contain considerable noise, the neighbor difference values of the nodes calculated by the graph neural network layers that perform feature extraction first have low accuracy. It is therefore necessary to select which layer's neighbor difference vector is taken as the supervision target.
In one possible implementation, the model training device automatically selects the graph neural network layer with the largest neighbor difference as the initial network layer, which may also be referred to as the initial supervision target.
The calculation formula for determining the layer index of the initial network layer is as follows:

$l^* = \arg\max_k \left\{ \left\| s^{(k)} \right\| : k \in \{1, \ldots, L-1\} \right\}$

where $l^*$ is the layer index of the initial network layer and L is the number of graph neural network layers.
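A sketch of the automatic selection of the initial network layer: among layers 1 to L-1, the layer whose neighbor difference vector has the largest norm becomes the initial supervision target. The 0-based indexing is an assumption.

```python
import numpy as np

def select_initial_layer(neighbor_diff_vectors):
    """Pick l*: the layer (among layers 1..L-1) whose neighbor difference vector
    has the largest norm; it becomes the initial supervision target."""
    norms = [np.linalg.norm(s) for s in neighbor_diff_vectors[:-1]]   # k in {1, ..., L-1}
    return int(np.argmax(norms))

s_per_layer = [np.array([0.2, 0.1, 0.3]),
               np.array([0.6, 0.5, 0.4]),
               np.array([0.3, 0.2, 0.2])]
print(select_initial_layer(s_per_layer))   # -> 1 (0-based index of the initial network layer)
```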
In one possible implementation, a difference average value of each of the at least two neighbor difference degree information is determined, a loss function sub-value between adjacent network layers of the at least two graph neural network layers is determined based on the difference average value of each of the at least two neighbor difference degree information, and a first loss function value is determined based on the loss function sub-value between adjacent network layers of the at least two graph neural network layers.
The difference average value is an average value of each neighboring difference value included in the corresponding neighboring difference degree information.
That is to say, when the neighbor difference values corresponding to the previous graph neural network layer are, in the average sense, greater than those corresponding to the next graph neural network layer, a distillation regularization constraint is applied to determine the loss function sub-value between the two adjacent graph neural network layers.
Illustratively, in response to the difference average value corresponding to the n-th graph neural network layer being greater than the difference average value corresponding to the (n+1)-th graph neural network layer, the neighbor difference degree information corresponding to the n-th graph neural network layer and the neighbor difference degree information corresponding to the (n+1)-th graph neural network layer are input into a weighted mean square error loss function to obtain the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer, where n is an integer greater than or equal to 1. In response to the difference average value corresponding to the n-th graph neural network layer being less than or equal to the difference average value corresponding to the (n+1)-th graph neural network layer, the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer is determined to be zero.
In a possible implementation manner, if the difference average value corresponding to the l-th graph neural network layer is greater than the difference average value corresponding to the (l+1)-th graph neural network layer, the loss function sub-value between the l-th graph neural network layer and the (l+1)-th graph neural network layer is calculated as follows:

$d^2\left( s^{(l+1)}, \mathrm{SG}\left( s^{(l)} \right) \right)$

where $d^2(\cdot,\cdot)$ is the weighted mean square error between the two neighbor difference vectors, and SG(·) denotes the stop-gradient (gradient truncation) operation, i.e., during training, gradient back-propagation through the target neighbor difference vector is stopped so that it is treated as a supervision signal.
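A hedged sketch of one loss function sub-value between adjacent layers: a degree-weighted mean square error between the neighbor difference vectors of layer l+1 and layer l, with SG(·) realized by detaching the target vector from the computation graph. The exact weighting scheme is an assumption.

```python
import torch

def adr_sub_value(s_next, s_prev, degrees):
    """d^2(s^(l+1), SG(s^(l))): degree-weighted mean square error, with the gradient
    through the previous layer's neighbor difference vector truncated (stop-gradient)."""
    target = s_prev.detach()              # SG(.): treat s^(l) as a fixed supervision signal
    weights = degrees / degrees.sum()     # weight nodes by their degree (assumption)
    return (weights * (s_next - target) ** 2).sum()

s_prev = torch.tensor([0.6, 0.5, 0.4], requires_grad=True)
s_next = torch.tensor([0.3, 0.2, 0.2], requires_grad=True)
degrees = torch.tensor([2.0, 3.0, 1.0])
print(adr_sub_value(s_next, s_prev, degrees))
```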
In one possible implementation, the step of determining the loss function sub-values between adjacent network layers based on the difference average values of the neighbor difference degree information, and the step of determining the first loss function value based on those loss function sub-values, are performed after the target graph neural network layer has been determined.
That is, a difference average value of each of the neighbor difference degree information corresponding to the target graph neural network layer is determined, a loss function sub-value between adjacent network layers in the target graph neural network layer is determined based on the difference average value of each of the neighbor difference degree information corresponding to the target graph neural network layer, and the first loss function value is determined based on the loss function sub-value between adjacent network layers in the target graph neural network layer.
Wherein, the target graph neural network layer may refer to each adjacent network layer starting from the starting network layer.
In one possible implementation, since nodes in regions of different connection density are considered to have different smoothing rates, the degree of a node may be used to weight it when matching the neighbor difference vectors. Based on the above consideration, the first loss function corresponding to the target graph neural network model may be a distillation regularization term (ADR), which is expressed as the following formula,
$$L_{ADR}=\sum_{l=l^{*}}^{L-1}\mathbb{1}\big(\mu^{(l)}>\mu^{(l+1)}\big)\,d^{2}\big(s^{(l+1)},\,s^{(l)}\big)$$
wherein 1(·) is an indicator function for teacher selection under the knowledge distillation framework, which automatically selects the initial network layer as the initial supervision target (μ^(l) denotes the difference average value corresponding to the lth graph neural network layer); d²(s^(l+1), s^(l)) is the loss function sub-value between the lth graph neural network layer and the (l + 1)th graph neural network layer; and summing the loss function sub-values from l = l* to L − 1 yields the first loss function value L_ADR.
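A hedged sketch of assembling the first loss function value from these sub-values is shown below; it reuses the `adr_sub_loss` helper from the previous sketch, selects the initial network layer as the layer whose discrepancy is largest, and treats the indicator as a check that the difference average decreases from layer l to layer l + 1 — all of which are illustrative assumptions.

```python
import torch

def adr_regularizer(s_list, degrees):
    """First loss function value L_ADR over a list of per-layer discrepancy vectors (sketch)."""
    # The layer whose maximum neighbor difference value is largest acts as the initial supervision target.
    start = max(range(len(s_list)), key=lambda i: float(s_list[i].max()))
    means = [float(s.mean()) for s in s_list]          # difference average value of each layer
    loss = torch.zeros((), device=s_list[0].device)
    for l in range(start, len(s_list) - 1):
        if means[l] > means[l + 1]:                    # indicator 1(.): only penalize decreasing discrepancy
            loss = loss + adr_sub_loss(s_list[l + 1], s_list[l], degrees)
    return loss
```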
Illustratively, for feature data at the entire-graph scale, the first loss function, i.e., the self-distillation training loss function, may take the analogous form,

$$L_{ADR}^{graph}=\sum_{l=l^{*}}^{L-1}\mathbb{1}\big(s^{(l)}>s^{(l+1)}\big)\Big(s^{(l+1)}-\mathrm{SG}\big(s^{(l)}\big)\Big)^{2}$$

where s^(l) here denotes the graph-level discrepancy value of the lth graph neural network layer.
step 305, updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value.
In this embodiment, after the model training device determines the first loss function value and the second loss function value, the model training device updates the model parameters of the target graph neural network model based on the first loss function value and the second loss function value.
The second loss function value is a cross-entropy loss function value determined based on the labeling information of the sample graph data and the prediction information output by the target graph neural network model.
In one possible implementation, the sum of the first loss function value and the second loss function value is used as an overall training loss function value, and model parameters of the target graph neural network model are updated based on the overall training loss function value.
Illustratively, in response to the determined target graph neural network layers being the layer-2 to layer-4 networks, if the determined loss function sub-value between the layer-2 network and the layer-3 network is L23 and the loss function sub-value between the layer-3 network and the layer-4 network is 0, the first loss function value is L23 + 0, i.e., the first loss function value is L23. Following the conventional model training process, since the second loss function is a cross-entropy loss function, the second loss function value LCE can be determined based on the prediction information of the sample graph data output by the target graph neural network model and the label information of the sample graph data; the overall training loss function value can then be determined to be L23 + LCE, and the model parameters of the target graph neural network model are updated based on L23 + LCE.
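The worked example above can be mirrored by a few lines of training code; the model interface (returning both predictions and per-layer discrepancy vectors), the optimizer and the tensors are placeholders assumed for this sketch, not names defined by the patent.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x, adj, labels, degrees):
    logits, s_list = model(x, adj)                    # assumed: model also returns per-layer discrepancy vectors
    first_loss = adr_regularizer(s_list, degrees)     # e.g. L23 + 0 in the example above
    second_loss = F.cross_entropy(logits, labels)     # cross-entropy loss LCE
    total_loss = first_loss + second_loss             # overall training loss function value
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```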
In one possible implementation, when the model parameter of the target graph neural network model is updated based on the sum of the first loss function value and the second loss function value, the at least two graph neural network layers are updated by the sum of the first loss function value and the second loss function value, and other parts except the at least two graph neural network layers in the target graph neural network model are updated by the second loss function value.
Illustratively, if the target graph neural network model includes three graph neural network layers, a fully-connected layer, and a prediction output layer, then after the first loss function value and the second loss function value are calculated based on the above method, the three graph neural network layers may be updated by the sum of the first loss function value and the second loss function value, while the parts of the model other than the graph neural network layers, including the fully-connected layer and the prediction output layer, are updated directly by the second loss function value, since these parts do not involve the over-smoothing problem.
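Under the assumption that the discrepancy signals are computed solely from the graph neural network layers' outputs, this split falls out of ordinary back-propagation: the distillation term contributes gradients only to the graph neural network layers, while the fully-connected and prediction layers are driven by the cross-entropy term alone. A brief sketch continuing the previous example:

```python
# Continuing the training_step sketch above (illustrative, not the patent's implementation):
# first_loss is built only from the GNN layers' intermediate outputs, so its backward pass
# never reaches the fully-connected layer or the prediction output layer.
optimizer.zero_grad()
(first_loss + second_loss).backward()   # GNN layers: gradients from both losses
optimizer.step()                        # FC / prediction layers: gradients from second_loss only
```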
In one possible implementation, instead of directly taking the sum of the first loss function value and the second loss function value as the overall training loss function value, corresponding weights may be set for the first loss function value and the second loss function value, respectively, and the overall training loss function value may be determined by means of weighted summation based on the set weights.
For example, if the determined first loss function value is L23 and the second loss function value is LCE, and, according to the degree of influence of the non-smoothness of the feature data on model performance, the weight corresponding to L23 is set to 0.4 and the weight corresponding to LCE is set to 0.6, then based on the weighted-sum algorithm the overall training loss function value is obtained as 0.4 × L23 + 0.6 × LCE.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
The present application proposes a self-distillation training strategy for graph neural networks, which is based on the following idea: the over-smoothing problem of a graph neural network mainly arises in the deep layers of the network, i.e., it may appear after multiple information-passing iterations; therefore, the non-smooth characteristics of the shallow-layer node features can be used to supervise and constrain the deep layers of the graph neural network, guiding the learning algorithm to penalize a graph neural network model that produces over-smoothed node features. Based on this idea, the present application defines neighbor difference degree information used to quantitatively measure the degree of non-smoothness represented by the graph feature data extracted by each graph neural network layer, and then, based on the neighbor difference degree information, provides a self-distillation training algorithm that retains and migrates the non-smoothness layer by layer. FIG. 5 is a schematic diagram illustrating a model processing system for graph data according to an exemplary embodiment. As shown in FIG. 5, the target graph neural network model 52 included in the system is a GNN model containing four graph neural network layers. The sample graph data 51 input to the target graph neural network model 52, i.e., X^(0), passes through each graph neural network layer, and the finally output prediction information is a predicted probability distribution. A corresponding second loss function value, i.e., the cross-entropy loss function LCE, is calculated based on the predicted probability distribution and the labeling information of the sample graph data X^(0). The output feature X^(l) of each graph neural network layer is passed through a neighbor difference degree vector calculation module to compute the corresponding knowledge signal representation s^(l). Then, the knowledge signal representations s^(l) corresponding to the graph neural network layers are fed into the distillation regularization term calculation module 53 to calculate the first loss function value, i.e., the distillation loss function LADR. The distillation loss function LADR and the cross-entropy loss function LCE are added to obtain the overall training loss function value, and the target graph neural network model 52 is iteratively updated based on the overall training loss function value.
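To make the data flow of FIG. 5 concrete, the following self-contained sketch wires the pieces together for a toy four-layer GCN on a random graph; the architecture, the cosine-based discrepancy signal and every hyper-parameter are assumptions chosen for illustration, and the `adr_regularizer` helper from the earlier sketch is reused, so this is not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillGCN(nn.Module):
    """Toy 4-layer GCN that also returns a per-layer knowledge signal s^(l)."""

    def __init__(self, in_dim, hid_dim, num_classes, num_layers=4):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)])
        self.out = nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj_norm):
        s_list = []
        for layer in self.layers:
            x = F.relu(layer(adj_norm @ x))                  # one message-passing layer
            virtual_neighbor = adj_norm @ x                  # mean-aggregated neighbor feature
            s = 1.0 - F.cosine_similarity(x, virtual_neighbor, dim=-1)  # per-node discrepancy signal
            s_list.append(s)
        return self.out(x), s_list

# One illustrative training iteration on a random toy graph.
num_nodes, in_dim, num_classes = 100, 16, 3
x = torch.randn(num_nodes, in_dim)
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()
adj = ((adj + adj.t() + torch.eye(num_nodes)) > 0).float()   # symmetrize and add self-loops
deg = adj.sum(dim=1)
adj_norm = adj / deg.unsqueeze(1)                            # row-normalized adjacency D^-1 A
labels = torch.randint(0, num_classes, (num_nodes,))

model = SelfDistillGCN(in_dim, 32, num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

logits, s_list = model(x, adj_norm)
loss_ce = F.cross_entropy(logits, labels)                    # second loss function value
loss_adr = adr_regularizer(s_list, deg)                      # first loss function value (earlier sketch)
optimizer.zero_grad()
(loss_ce + loss_adr).backward()
optimizer.step()
```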
The graph neural network self-distillation model training method provided by this scheme is both general and efficient. On the one hand, the defined neighbor difference degree information can effectively indicate the generalization performance of the trained model; on the other hand, a graph neural network under self-distillation training can extract higher-quality graph characterization vectors, significantly improving model performance at the cost of only a small amount of additional training overhead.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
Fig. 6 is a block diagram illustrating a model processing apparatus for graph data according to an exemplary embodiment, and as shown in fig. 6, the model processing apparatus for graph data may be implemented as all or part of a computer device in hardware or a combination of hardware and software to perform all or part of the steps of the method shown in the corresponding embodiment of fig. 1 or 3. The model processing apparatus for graph data may include:
a sample input module 610 for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;

a feature obtaining module 620, configured to process the sample graph data based on at least two graph neural network layers to obtain at least two feature data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;

an information determining module 630, configured to determine, based on at least two of the feature data, neighbor difference degree information corresponding to at least two of the feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;

a loss value determining module 640, configured to determine a first loss function value based on at least two of the neighbor difference degree information;

a model updating module 650, configured to update model parameters of the target graph neural network model based on the first loss function value.
In a possible implementation manner, theinformation determining module 630 includes:
and the information determining submodule is used for determining the neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data and the granularity of the feature data.
In a possible implementation manner, the information determining sub-module includes:
a neighboring node determining unit, configured to determine, in response to that the granularity of the feature data is a node level, a neighboring node of each node corresponding to the target feature data; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
a virtual node obtaining unit, configured to obtain, based on the respective neighboring nodes of the respective nodes corresponding to the target feature data, virtual neighboring nodes of the respective nodes corresponding to the target feature data;
a difference value determining unit, configured to determine, based on the virtual neighboring nodes of the respective nodes corresponding to the target feature data, respective neighboring difference values of the respective nodes corresponding to the target feature data;
an information determining unit, configured to determine, based on respective neighbor difference values of the nodes corresponding to the target feature data, the neighbor difference information corresponding to the target feature data.
In a possible implementation manner, the virtual node obtaining unit is configured to,
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
In one possible implementation, the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix includes:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
In a possible implementation manner, the difference value determining unit is configured to,
determining the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
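Putting the virtual node obtaining unit and the difference value determining unit together, a rough sketch of one way such node-level neighbor difference values could be computed is shown below; it isolates the computation used inline in the FIG. 5 sketch earlier. Using the row-normalized product of the inverse degree matrix and the adjacency matrix as the virtual-neighbor aggregator, and one minus the cosine similarity as the difference value, are assumptions of this example rather than the exact computation specified by the text.

```python
import torch
import torch.nn.functional as F

def node_neighbor_difference(features, adj):
    """Node-level neighbor difference values for one layer's feature data (illustrative sketch).

    features: node feature matrix X^(l), shape [num_nodes, dim]
    adj:      binary adjacency matrix describing the neighbor relationship, shape [num_nodes, num_nodes]
    """
    deg = adj.sum(dim=1).clamp(min=1)                         # diagonal of the degree matrix
    virtual_neighbors = (adj @ features) / deg.unsqueeze(1)   # D^-1 A X: virtual neighbor node of each node
    similarity = F.cosine_similarity(features, virtual_neighbors, dim=-1)
    return 1.0 - similarity                                   # larger value = less smooth neighborhood
```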
In a possible implementation manner, the loss value determining module 640 includes:

a maximum value determining submodule, configured to compare the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information, and determine a maximum value of the neighbor difference values;

an initial determining submodule, configured to determine the graph neural network layer to which the maximum value of the neighbor difference values belongs as an initial network layer;

a first loss value determining submodule, configured to determine the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
In a possible implementation manner, the lossvalue determining module 640 includes:
an average value determining sub-module, configured to determine a difference average value of each of at least two pieces of the neighbor difference degree information, where the difference average value is an average value of the neighbor difference values included in the corresponding neighbor difference degree information;
a sub-value determination sub-module, configured to determine a sub-value of a loss function between adjacent network layers of the at least two graph neural network layers based on a difference average value of each of the at least two neighboring difference degree information;
a second loss value determination submodule for determining the first loss function value based on the loss function sub-value between adjacent ones of the at least two graph neural network layers.
In one possible implementation, the sub-value determining sub-module includes:
a first sub-value determining unit, configured to, in response to the difference average value corresponding to the nth graph neural network layer being greater than the difference average value corresponding to the n +1 th graph neural network layer, input the neighbor difference degree information corresponding to the nth graph neural network layer and the neighbor difference degree information corresponding to the n +1 th graph neural network layer into a weighted mean square error loss function, and obtain the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer; n is an integer of 1 or more;
a second sub-value determining unit for determining that the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer is zero in response to the difference average value corresponding to the nth graph neural network layer being less than or equal to the difference average value corresponding to the n +1 th graph neural network layer.
In one possible implementation, the model updating module 650 includes:
a model updating submodule for updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
In one possible implementation, the model update sub-module includes:
an overall loss value determination unit configured to determine a sum of the first loss function value and the second loss function value as an overall training loss function value;
and the model updating unit is used for updating the model parameters of the target graph neural network model based on the overall training loss function value.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
FIG. 7 illustrates a block diagram of a computer device 700 shown in an exemplary embodiment of the present application. The computer device may be implemented as a server in the above-mentioned aspects of the present application. The computer device 700 includes a Central Processing Unit (CPU) 701, a system memory 704 including a Random Access Memory (RAM) 702 and a Read-Only Memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the CPU 701. The computer device 700 also includes a mass storage device 706 for storing an operating system 709, application programs 710, and other program modules 711.

The mass storage device 706 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 706 and its associated computer-readable media provide non-volatile storage for the computer device 700. That is, the mass storage device 706 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.

Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media are not limited to the foregoing. The system memory 704 and mass storage device 706 described above may be collectively referred to as memory.

According to various embodiments of the present disclosure, the computer device 700 may also operate through a network, such as the Internet, connected to a remote computer on the network. That is, the computer device 700 may be connected to the network 708 through the network interface unit 707 connected to the system bus 705, or the network interface unit 707 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes at least one instruction, at least one program, a code set, or a set of instructions, which is stored in the memory, and the central processing unit 701 implements all or part of the steps in the model processing method for graph data shown in the above embodiments by executing the at least one instruction, the at least one program, the code set, or the set of instructions.
Fig. 8 shows a block diagram of a computer device 800 provided in an exemplary embodiment of the present application. The computer device 800 may be implemented as the terminal described above, such as: a smartphone, a tablet, a laptop, or a desktop computer. The computer device 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.

Generally, the computer device 800 includes: a processor 801 and a memory 802.

The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 is used to store at least one instruction for execution by the processor 801 to implement all or part of the steps in the model processing method for graph data provided by the method embodiments herein.
In some embodiments, the computer device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to the peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.

The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.

In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.

Those skilled in the art will appreciate that the configuration illustrated in FIG. 8 is not intended to be limiting of the computer device 800 and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components may be employed.
In an exemplary embodiment, a computer-readable storage medium is also provided, for storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement all or part of the steps of the above-mentioned model processing method for graph data. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises computer instructions, which are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform all or part of the steps of the method described in any of the embodiments of fig. 1 or 3.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A method of model processing for graph data, the method comprising:
inputting the sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
processing the sample graph data based on at least two graph neural network layers to obtain at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
determining a first loss function value based on at least two of the neighbor difference degree information;
updating model parameters of the target graph neural network model based on the first loss function values.
2. The method according to claim 1, wherein the determining neighboring difference degree information corresponding to at least two of the feature data based on at least two of the feature data comprises:
determining the neighbor difference degree information corresponding to at least two pieces of the feature data based on the at least two pieces of the feature data and the granularity of the feature data.
3. The method of claim 2, wherein the determining the neighbor difference information corresponding to at least two of the feature data based on the at least two of the feature data and the granularity of the feature data comprises:
determining respective neighbor nodes of each node corresponding to the target characteristic data in response to the granularity of the characteristic data being at the node level; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
acquiring virtual neighbor nodes of each node corresponding to the target characteristic data based on the respective neighbor nodes of each node corresponding to the target characteristic data;
determining respective neighbor difference values of the nodes corresponding to the target characteristic data based on the virtual neighbor nodes of the nodes corresponding to the target characteristic data;
and determining the neighbor difference degree information corresponding to the target characteristic data based on the respective neighbor difference values of the nodes corresponding to the target characteristic data.
4. The method according to claim 3, wherein the obtaining a virtual neighbor node of each node corresponding to the target feature data based on the respective neighbor node of each node corresponding to the target feature data comprises:
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
5. The method of claim 4, wherein the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix comprises:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
6. The method according to claim 3, wherein the determining respective neighbor difference values of the respective nodes corresponding to the target feature data based on the virtual neighbor nodes of the respective nodes corresponding to the target feature data comprises:
determining the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
7. The method of claim 1, wherein determining a first loss function value based on at least two of the neighbor difference degree information comprises:
comparing each neighbor difference value corresponding to at least two neighbor difference degree information to determine the maximum value of each neighbor difference value;
determining the graph neural network layer to which the maximum value of each neighbor difference value belongs as an initial network layer;
determining the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one of the graph neural network layers that is adjacent after the initial network layer.
8. The method of claim 1, wherein determining a first loss function value based on at least two of the neighbor difference degree information comprises:
determining a difference average value of each of at least two pieces of the neighbor difference degree information, the difference average value being an average value of each neighbor difference value included in the corresponding neighbor difference degree information;
determining a loss function sub-value between adjacent network layers of the at least two graph neural network layers based on respective difference averages of the at least two neighbor difference degree information;
determining the first loss function value based on the loss function sub-value between adjacent ones of at least two of the graph neural network layers.
9. The method of claim 8, wherein determining a loss function sub-value between adjacent ones of the at least two graph neural network layers based on respective difference averages of the at least two neighbor difference degree information comprises:
in response to the difference average corresponding to the nth graph neural network layer being greater than the difference average corresponding to the n +1 th graph neural network layer, inputting the neighbor difference information corresponding to the nth graph neural network layer and the neighbor difference information corresponding to the n +1 th graph neural network layer into a weighted mean square error loss function, obtaining the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer; n is an integer of 1 or more;
determining that the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer is zero in response to the difference average corresponding to the nth graph neural network layer being less than or equal to the difference average corresponding to the n +1 th graph neural network layer.
10. The method of claim 1, wherein updating the model parameters of the target graph neural network model based on the first loss function value comprises:
updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
11. The method of claim 10, wherein updating model parameters of the target graph neural network model based on the first and second loss function values comprises:
taking the sum of the first loss function value and the second loss function value as an overall training loss function value;
updating the model parameters of the target graph neural network model based on the overall training loss function value.
12. The method of claim 2, wherein the granularity of the feature data comprises at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
13. A model processing apparatus for graph data, the apparatus comprising:
the sample input module is used for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
the characteristic acquisition module is used for processing the sample graph data based on at least two graph neural network layers to acquire at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
the information determining module is used for determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
a loss value determination module for determining a first loss function value based on at least two of the neighbor difference degree information;
and the model updating module is used for updating the model parameters of the target graph neural network model based on the first loss function value.
14. A computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the model processing method for graph data according to any one of claims 1 to 12.
15. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the model processing method for graph data according to any one of claims 1 to 12.

Priority Applications (1)

Application Number: CN202110636601.1A; Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Applications Claiming Priority (1)

Application Number: CN202110636601.1A; Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Publications (1)

Publication Number: CN113822293A (en); Publication Date: 2021-12-21

Family

ID=78912516

Family Applications (1)

Application Number: CN202110636601.1A (Pending; published as CN113822293A (en)); Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Country Status (1)

Country: CN; Link: CN113822293A (en)

Cited By (1)

CN114972772A (en)*; Priority date: 2022-06-23; Publication date: 2022-08-30; Assignee: Tsinghua University; Title: Method, device, equipment and storage medium for customizing graph neural network architecture

(* Cited by examiner, † Cited by third party)
Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
