CN113822293A - Model processing method, device and equipment for graph data and storage medium - Google Patents

Model processing method, device and equipment for graph data and storage medium

Info

Publication number
CN113822293A
Authority
CN
China
Prior art keywords
neural network
target
node
neighbor
graph neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110636601.1A
Other languages
Chinese (zh)
Inventor
陈煜钊
卞亚涛
荣钰
徐挺洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110636601.1A
Publication of CN113822293A
Legal status: Pending (current)

Abstract

The application relates to a model processing method, apparatus, and device for graph data, and a storage medium, in the technical field of artificial intelligence. The method comprises the following steps: inputting sample graph data into a target graph neural network model; processing the sample graph data based on at least two graph neural network layers to obtain at least two pieces of feature data; determining, based on the at least two pieces of feature data, neighbor difference degree information corresponding to the at least two pieces of feature data, the neighbor difference degree information being used for indicating the degree of non-smoothness of the at least two pieces of feature data; determining a first loss function value based on the at least two pieces of neighbor difference degree information; and updating model parameters of the target graph neural network model based on the first loss function value. With this scheme, model training is performed using the first loss function value calculated from the neighbor difference degree information, which alleviates the over-smoothing problem of the graph neural network and improves the performance of the graph neural network model after training and updating.

Description

Model processing method, device and equipment for graph data and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for model processing of graph data.
Background
Model training of a graph neural network model can be carried out using a teacher-student knowledge distillation framework. The teacher-student knowledge distillation framework migrates the knowledge contained in a better-performing teacher model into a relatively lightweight student model, enabling the student model to achieve better performance than with conventional training methods.
In the related art, a graph neural network model that performs well after training is selected as the teacher model, and the probability distribution output by the teacher model is directly used as labels to train another graph neural network model as the student model.
However, in the above method for training the graph neural network model, due to the special structure of graph data, better performance cannot be obtained by simply stacking multiple graph network layers, so the training effect of the graph neural network model is poor.
Disclosure of Invention
The embodiments of the application provide a model processing method, apparatus, device, and storage medium for graph data, which can improve the performance of a model after training and updating. The technical scheme is as follows:
in one aspect, a model processing method for graph data is provided, the method comprising:
inputting the sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
processing the sample graph data based on at least two graph neural network layers to obtain at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
determining a first loss function value based on at least two of the neighbor difference degree information;
updating model parameters of the target graph neural network model based on the first loss function values.
In another aspect, there is provided a model processing apparatus for graph data, the apparatus including:
the sample input module is used for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
the characteristic acquisition module is used for processing the sample graph data based on at least two graph neural network layers to acquire at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
the information determining module is used for determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
a loss value determination module for determining a first loss function value based on at least two of the neighbor difference degree information;
and the model updating module is used for updating the model parameters of the target graph neural network model based on the first loss function value.
In one possible implementation manner, the information determining module includes:
and the information determining submodule is used for determining the neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data and the granularity of the feature data.
In a possible implementation manner, the information determining sub-module includes:
a neighboring node determining unit, configured to determine, in response to that the granularity of the feature data is a node level, a neighboring node of each node corresponding to the target feature data; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
a virtual node obtaining unit, configured to obtain, based on the respective neighboring nodes of the respective nodes corresponding to the target feature data, virtual neighboring nodes of the respective nodes corresponding to the target feature data;
a difference value determining unit, configured to determine, based on the virtual neighboring nodes of the respective nodes corresponding to the target feature data, respective neighboring difference values of the respective nodes corresponding to the target feature data;
an information determining unit, configured to determine, based on respective neighbor difference values of the nodes corresponding to the target feature data, the neighbor difference information corresponding to the target feature data.
In a possible implementation manner, the virtual node obtaining unit is configured to,
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
In one possible implementation, the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix includes:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
In a possible implementation manner, the difference value determining unit is configured to
determine the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
In one possible implementation manner, the loss value determining module includes:
a maximum value determining submodule, configured to compare the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information and determine the maximum value among the neighbor difference values;
an initial layer determining submodule, configured to determine the graph neural network layer to which the maximum neighbor difference value belongs as the initial network layer;
a first loss value determining submodule, configured to determine the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
In one possible implementation manner, the loss value determining module includes:
an average value determining submodule, configured to determine a difference average value of each of the at least two pieces of neighbor difference degree information, where the difference average value is the average value of the neighbor difference values included in the corresponding neighbor difference degree information;
a sub-value determination sub-module, configured to determine a sub-value of a loss function between adjacent network layers of the at least two graph neural network layers based on a difference average value of each of the at least two neighboring difference degree information;
a second loss value determination submodule for determining the first loss function value based on the loss function sub-value between adjacent ones of the at least two graph neural network layers.
In one possible implementation, the sub-value determining sub-module includes:
a first sub-value determining unit, configured to, in response to the difference average value corresponding to the n-th graph neural network layer being greater than the difference average value corresponding to the (n+1)-th graph neural network layer, input the neighbor difference degree information corresponding to the n-th graph neural network layer and the neighbor difference degree information corresponding to the (n+1)-th graph neural network layer into a weighted mean square error loss function to obtain the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer, where n is an integer greater than or equal to 1;
a second sub-value determining unit, configured to determine that the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer is zero in response to the difference average value corresponding to the n-th graph neural network layer being less than or equal to the difference average value corresponding to the (n+1)-th graph neural network layer.
In one possible implementation, the model updating module includes:
a model updating submodule for updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
In one possible implementation, the model update sub-module includes:
an overall loss value determination unit configured to determine a sum of the first loss function value and the second loss function value as an overall training loss function value;
and the model updating unit is used for updating the model parameters of the target graph neural network model based on the overall training loss function value.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
In another aspect, a computer device is provided, which comprises a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the above-mentioned model processing method for graph data.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the computer program being loaded and executed by a processor to implement the above-described model processing method for graph data.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the model processing method for graph data provided in the various alternative implementations described above.
The technical scheme provided by the application can comprise the following beneficial effects:
In the scheme shown in the embodiments of the application, the feature data output after each graph neural network layer performs feature extraction on the sample graph data is obtained, and the neighbor difference degree information corresponding to each piece of feature data is calculated, so that the degree of non-smoothness of the graph features extracted by each graph neural network layer is quantitatively measured. Model training is then performed using the first loss function value calculated from the neighbor difference degree information, so that the non-smoothness-related information of each of the multiple graph neural network layers in the model guides the training process of each graph neural network layer. This alleviates the over-smoothing problem of the graph neural network and improves the model performance of the graph neural network after training and updating.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 illustrates a flow chart of a method for model processing of graph data provided by an exemplary embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a model processing system for graph data provided by an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of a method for model processing of graph data provided by an exemplary embodiment of the present application;
FIG. 4 is a diagram illustrating a method for calculating neighbor difference information corresponding to a target node according to the embodiment shown in FIG. 3;
FIG. 5 illustrates a flow diagram of a model processing system for graph data provided by an exemplary embodiment of the present application;
FIG. 6 is a block diagram of a model processing apparatus for graph data according to an exemplary embodiment of the present application;
FIG. 7 illustrates a block diagram of a computer device shown in an exemplary embodiment of the present application;
fig. 8 shows a block diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
The embodiment of the application provides a model processing method for graph data, which can improve the model performance of a graph neural network after being trained and updated.
FIG. 1 is a flow diagram illustrating a method for model processing of graph data in accordance with an exemplary embodiment. The model processing method for graph data may be performed by a computer device. For example, the computer device may include at least one of a terminal or a server. As shown in fig. 1, the model processing method for graph data includes the steps of:
step 101, inputting sample graph data into a target graph neural network model; the target graph neural network model comprises at least two graph neural network layers.
In an embodiment of the present application, a computer device inputs sample graph data into a target graph neural network model. The target graph neural network model is a neural network model for analyzing and processing graph data.
Among other things, the present application relates to the art of Artificial Intelligence (AI), which is a theory, method, technique, and application system that utilizes a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. The solution of the present application mainly relates to the machine learning/deep learning direction.
Artificial intelligence technology includes Machine Learning (ML), which is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specially studies how a computer simulates or realizes human learning behavior so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
In one possible implementation manner, a sample graph data set including at least two pieces of sample graph data is collected in advance, and the sample graph data in the sample graph data set are sequentially input into the target graph neural network model.
Step 102, processing the sample graph data based on at least two graph neural network layers to obtain at least two pieces of feature data; the at least two pieces of feature data are obtained by performing feature extraction through the at least two graph neural network layers respectively.
In the embodiment of the application, after the sample graph data is input into the target graph neural network model, the feature extraction is respectively carried out sequentially through at least two graph neural network layers.
In one possible implementation, the extracted feature data includes a feature matrix composed of feature vectors corresponding to respective nodes in the sample graph data.
Step 103, determining neighbor difference degree information corresponding to the at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used to indicate the degree of non-smoothness of the at least two pieces of feature data.
In the embodiment of the application, each graph neural network layer corresponds to one extracted feature data, neighbor difference degree information corresponding to the feature data output by each graph neural network layer can be obtained by performing calculation based on the feature data, and the neighbor difference degree information is used for indicating the non-smoothness degree corresponding to the feature data extracted by each layer of network.
The non-smoothness may be used to indicate the degree of difference between the information in the feature data corresponding to each node in the graph data and the information in the feature data corresponding to the other nodes adjacent to that node. Generally, the larger the overall difference of the information in the feature data corresponding to adjacent nodes, the higher the non-smoothness of the feature data; conversely, the smaller the overall difference, the lower the non-smoothness of the feature data.
Step 104, determining a first loss function value based on the at least two pieces of neighbor difference degree information.
In this embodiment of the application, the computer device obtains neighbor difference degree information corresponding to each of the feature data extracted by the at least two graph neural network layers, and may determine, according to the neighbor difference degree information corresponding to each of the feature data, a first loss function value obtained by inputting the sample graph data.
Step 105, updating the model parameters of the target graph neural network model based on the first loss function value.
In summary, in the solution shown in the embodiments of the present application, the feature data output after each graph neural network layer performs feature extraction on the sample graph data is obtained, the neighbor difference degree information corresponding to each piece of feature data is calculated, and the degree of non-smoothness of the graph features extracted by each graph neural network layer is thereby quantitatively measured. Model training is then performed using the first loss function value calculated from the neighbor difference degree information, so that the non-smoothness-related information of each of the multiple graph neural network layers in the model guides the training process of each graph neural network layer. This alleviates the over-smoothing problem of the graph neural network and improves the model performance of the graph neural network after being updated through training.
In the related art, knowledge distillation is to migrate knowledge contained in a target network (i.e., a teacher model with better performance) into an online learning network (i.e., a student model with relatively lighter weight), so that the student model achieves better performance than conventional training methods.
Among them, knowledge distillation may include two modes. One is Soft-Label Distillation (Soft-Label Distillation), which uses the probability distribution output by the teacher model as a smooth Label to train the student model.
For example, a teacher graph neural network for protein structure prediction is additionally introduced in advance, the teacher graph neural network is a graph neural network trained in advance, a prediction probability distribution corresponding to a sample protein structure can be output by inputting the sample protein structure into the teacher graph neural network, and the student graph neural network trained from the beginning for protein structure analysis is trained by taking the prediction probability distribution as a smooth label.
Another approach is Feature Distillation (Feature Distillation), which adds regularization constraints in the Feature representation space for the training of neural networks. The intermediate activation features of the parameterized neural network can be directly used as knowledge signals for feature distillation learning, and a feature transformation function can be designed to extract specific knowledge from the teacher model, for example, feature attention diagrams, similarity maps and the like are used.
For example, a teacher graph neural network for protein structure prediction is also introduced in advance, the teacher graph neural network is a graph neural network trained in advance, intermediate output features of the teacher graph neural network can be obtained by inputting a sample protein structure into the teacher graph neural network, and the student graph neural network is trained by using the intermediate output features as feature distillation learning knowledge signals.
The above two methods of training a neural network by knowledge distillation mainly focus on high-level graph understanding tasks, such as molecular structure classification and molecular structure segmentation in the molecular biology field, and both require introducing the neural network of a teacher model. Selecting the neural network of the teacher model is therefore difficult, and if the teacher model is chosen improperly, the performance of the student model may be damaged to a certain extent. On the other hand, because an additional teacher model is introduced in the training process, the space and time cost of training the student network increases more than twofold, which greatly increases the time and space overhead of model training on large-scale graph data.
The solution shown in the above embodiments of the present application alleviates the over-smoothing problem of the graph neural network from an orthogonal perspective, namely by optimizing the training strategy of the graph neural network, so that no change to the network model structure or the input data is required. By adopting the self-distillation training algorithm, the performance of the graph network model can be significantly improved at the cost of only a small amount of additional training overhead. In an exemplary aspect, the solution of the above embodiments relates to a model processing system for graph data that includes a model training and updating section. FIG. 2 is a schematic diagram illustrating a model processing system for graph data in accordance with an exemplary embodiment. As shown in fig. 2, for the model training and updating portion, the model training device 210 performs model updating on the target graph neural network model through each set of sample graph data, and the updated target graph neural network model can be uploaded to the cloud or a database for use.
The model training device 210 may be a computer device with machine learning capability. For example, the computer device may be a stationary computer device such as a personal computer, a server, or stationary scientific research equipment, or a mobile computer device such as a tablet computer or an e-book reader. The embodiment of the present application does not limit the specific type of the model training device 210.
The terminal 240 may be a computer device. The server 230 may be a background server of the terminal 240, an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The model training device 210 and the terminal 240 may be the same computer device.
FIG. 3 is a flow diagram illustrating a method for model processing of graph data in accordance with an exemplary embodiment. The model processing method for graph data may be performed by a computer device. For example, the computer device may be the model training device of FIG. 2. As shown in fig. 3, the model processing method for graph data includes the steps of:
Step 301, inputting the sample graph data into the target graph neural network model.
In the embodiment of the application, the model training device acquires at least one sample graph data, and inputs the at least one sample graph data into a target graph neural network model which needs to be subjected to model training.
The sample graph data is graph data serving as a training sample of the target graph neural network model, and the sample graph data comprises node information of at least two nodes and the relationship between the at least two nodes. The target graph neural network model may be any Graph Neural Network (GNN) model to be trained. The target graph neural network model comprises at least two graph neural network layers.
Optionally, the graph neural network layer is a neural network for processing graph data.
For example, data can be naturally converted into graph data in many fields, including a biomolecule field, a protein field, an image analysis field, a social relationship network field, a software engineering field and a natural language processing field, and the various fields can convert the related data into the graph data and input the graph data into a graph neural network model for analysis processing, so as to achieve the purpose of performing data analysis processing in various fields.
For example, in the research field of the social relationship network, each user in the social relationship network may be converted into a node (information of the node may be information of an attribute, a behavior record, and the like of the corresponding user), and a relationship between each user is converted into a connection edge between the nodes, so as to obtain graph data corresponding to the social relationship network.
For another example, in the process of research in the field of biomolecules, each atom in a molecular structure may be converted into a node (information of the node may be information of atom type, atom attribute, and the like), and a chemical bond connection relationship between each atom is converted into a connection edge between nodes, so as to obtain graph data corresponding to the molecular structure; in the field of research of protein structure analysis, amino acids in a protein structure may be converted into nodes (information of the nodes may be information of amino acid properties, amino acid types, and the like), and the connection relationship between the amino acids may be converted into connection edges between the nodes, thereby obtaining graph data corresponding to the protein structure.
Step 302, processing the sample graph data based on at least two graph neural network layers to obtain at least two feature data.
In the embodiment of the application, the model training device processes input sample graph data based on at least two graph neural network layers in the target graph neural network model, and at least two feature data are obtained from the at least two graph neural network layers.
Wherein, at least two feature data are obtained by respectively extracting features of at least two graph neural network layers. For example, at least two feature data have a one-to-one correspondence with at least two graph neural network layers.
Illustratively, if the target graph neural network model comprises a first layer graph neural network, a second layer graph neural network and a third layer graph neural network, the sample graph data a input into the target graph neural network model is firstly subjected to feature extraction through the first layer graph neural network to obtain feature data output by the first layer graph neural network, then the feature data output by the first layer graph neural network is subjected to feature extraction through the second layer graph neural network to obtain feature data output by the second layer graph neural network, and then the feature data output by the second layer graph neural network is subjected to feature extraction through the third layer graph neural network to obtain feature data output by the third layer graph neural network.
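As a hedged illustration of steps 301 and 302, the following Python sketch passes sample graph data through a stack of graph neural network layers and collects the feature data output by each layer. The GCN-style propagation rule, the normalization, and all dimensions are assumptions made for demonstration and are not taken from the patent.

```python
import numpy as np

def gcn_layer(adj_norm, features, weight):
    """One graph neural network layer: aggregate neighbor features, then transform (ReLU)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def forward_collect(adj, node_feats, weights):
    """Run every graph neural network layer in turn and keep each layer's feature data."""
    a_hat = adj + np.eye(adj.shape[0])                    # add self-loops (assumption)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    adj_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt            # symmetric normalization

    feature_data = []                                     # one feature matrix per layer
    h = node_feats
    for w in weights:
        h = gcn_layer(adj_norm, h, w)
        feature_data.append(h)                            # feature data X^(l) of this layer
    return feature_data

# Toy sample graph data: 5 nodes, 4-dimensional node features, 3 layers.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)
x = rng.normal(size=(5, 4))
weights = [rng.normal(size=(4, 4)) for _ in range(3)]
feats_per_layer = forward_collect(adj, x, weights)
print([f.shape for f in feats_per_layer])                 # three (5, 4) feature matrices
```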
In one possible implementation, if feature representation extraction with different granularities is performed on sample graph data, feature data corresponding to different granularities are output by each graph neural network layer.
For example, the feature data may include a feature matrix obtained by feature extraction of the graph data of the graph neural network layer at the corresponding granularity.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an overall graph level.
For example, if the corresponding granularity for performing the feature representation extraction is a node level, a feature matrix composed of feature vectors of respective nodes extracted by the graph neural network layer may be determined as the feature data, and if the corresponding granularity for performing the feature representation extraction is a connection edge level, a feature matrix of edges extracted by the graph neural network layer may be determined as the feature data.
Step 303, determining neighbor difference degree information corresponding to the at least two feature data based on the at least two feature data and the granularity of the feature data.
In this embodiment of the application, the model training device determines neighbor difference degree information corresponding to feature data output by at least two graph neural network layers respectively, based on at least two feature data output by at least two graph neural network layers in the received target graph neural network model and granularity adopted when extracting the feature data.
The neighboring difference degree information is used to indicate the degree of non-smoothness of at least two feature data, that is, the neighboring difference degree information corresponding to each feature data is used to indicate the degree of non-smoothness of the feature data.
That is, the feature data output by each graph neural network layer corresponds to one neighbor difference degree information.
For example, feature data a output by the first layer of graph neural network layer corresponds to neighbor difference information a, feature data B output by the second layer of graph neural network layer corresponds to neighbor difference information B, and feature data C output by the third layer of graph neural network layer corresponds to neighbor difference information C.
The neighbor difference degree information may be a vector, a matrix, a value set, or a parameter value, etc. for indicating the degree of non-smoothness of the feature data.
In a possible implementation manner, in response to the granularity of the feature data being the node level, the respective neighbor nodes of each node corresponding to the target feature data are determined. Virtual neighbor nodes of each node corresponding to the target feature data are then obtained based on the respective neighbor nodes of each node corresponding to the target feature data. Next, the respective neighbor difference values of each node corresponding to the target feature data are determined based on those virtual neighbor nodes, and the neighbor difference degree information corresponding to the target feature data is determined based on the respective neighbor difference values of each node corresponding to the target feature data.
The neighbor node may be at least one other node directly connected to the corresponding node through an edge, and the target feature data is any one of the at least two feature data.
That is to say, when feature extraction is performed in the graph neural network layer and the granularity adopted is the node level, the neighbor nodes corresponding to the target feature data are determined, the virtual neighbor nodes corresponding to the nodes can be calculated through the obtained neighbor nodes corresponding to the nodes, the neighbor difference values corresponding to the nodes can be determined through calculation of the nodes and the virtual neighbor nodes corresponding to the nodes, and the neighbor difference information corresponding to the target feature data can be determined through calculation of the neighbor difference values corresponding to the nodes.
Illustratively, when the target feature data is the feature data B output by the second graph neural network layer, the nodes corresponding to the feature data B include a node a, a node b, a node c, and a node d. Because the node a is directly connected to the node b and the node d, the node b and the node d are determined as neighbor nodes of the node a; the virtual neighbor node of the node a is then obtained from its neighbor nodes, the node b and the node d, and the neighbor difference value corresponding to the node a is determined from the obtained virtual neighbor node of the node a. The neighbor difference values of the node b, the node c, and the node d are then calculated in the same manner. Based on the neighbor difference values of the node a, the node b, the node c, and the node d, the neighbor difference degree information corresponding to the feature data B can be calculated and determined.
In one possible implementation manner, a first adjacency matrix corresponding to a target node is determined based on a neighboring node of the target node; and acquiring the virtual neighbor node of the target node based on the target characteristic data, the degree matrix of the target node and the first adjacent matrix.
The first adjacency matrix is used for indicating the adjacent relation between the target node and the adjacent nodes, and the target node is any one of the nodes corresponding to the target characteristic data.
That is to say, the model training device extracts each neighbor node corresponding to the target node, and determines, based on the extracted neighbor nodes, the first adjacency matrix that only contains the neighboring relationship between the target node and its neighbor nodes. Then, the first adjacency matrix is multiplied with the feature data to obtain the summation of the neighbor node features, and this summation is multiplied with the inverse of the degree matrix of the target node to obtain the average value, i.e., the summation of the neighbor node features divided by the degree.
Optionally, the virtual neighbor node is calculated by the following formula:

$\tilde{x}_v^{(l)} = \left( D^{-1} A X^{(l)} \right)_v$

where $\tilde{x}_v^{(l)}$ denotes the virtual neighbor node of the target node v corresponding to the feature data output by the l-th graph neural network layer, D is the degree matrix of the nodes, A is the node adjacency matrix of the graph data, and $X^{(l)}$ is the feature matrix composed of the feature vectors of the nodes output by the l-th graph neural network layer.
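A minimal numpy sketch of the virtual neighbor computation described above, i.e., averaging the feature vectors of each node's directly connected neighbors as D^{-1} A X^(l); variable names and the isolated-node guard are illustrative assumptions rather than the patent's reference implementation.

```python
import numpy as np

def virtual_neighbors(adj, feats):
    """For every node, average the feature vectors of its directly connected neighbors:
    x_tilde = D^{-1} A X^(l)."""
    deg = adj.sum(axis=1)                        # node degrees (diagonal of D)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)  # guard against isolated nodes
    return (deg_inv[:, None] * adj) @ feats      # row-wise D^{-1} A, then multiply by X^(l)

adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
feats = np.arange(8, dtype=float).reshape(4, 2)  # X^(l): one feature vector per node
print(virtual_neighbors(adj, feats))             # virtual neighbor node of each node
```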
In another possible implementation manner, the category label information corresponding to the target node and the category label information corresponding to the neighbor node of the target node are obtained, the target neighbor node of the target node is determined from the neighbor nodes of the target node, the first feature data is obtained, and the virtual neighbor node of the target node is obtained based on the first feature data, the degree matrix of the target node, and the first adjacency matrix.
The class label information of the target neighboring node is different from the class label information of the target node, and the first feature data is part of feature data used for indicating the target node and the target neighboring node in the target feature data.
That is to say, the model training device may first extract each neighbor node corresponding to the target node and obtain the class label information corresponding to the target node and to each neighbor node. Based on the condition that the class label information of a target neighbor node differs from that of the target node, the target neighbor nodes meeting the condition are determined, and the feature matrix composed of the feature vectors of the target neighbor nodes is determined as the first feature data. Then, the first adjacency matrix is multiplied with the first feature data to obtain the summation of the target neighbor node features, and this summation is multiplied with the inverse of the degree matrix of the target node to obtain the average value, i.e., the summation of the target neighbor node features divided by the degree.
Optionally, after the class label information of the nodes is taken into account, the virtual neighbor node is calculated by the following formula:

$\tilde{x}_v^{(l)} = \left( D^{-1} A X^{(l)}_{\neq y(v)} \right)_v$

where $\tilde{x}_v^{(l)}$ denotes the virtual neighbor node of the target node v corresponding to the feature data output by the l-th graph neural network layer, D is the degree matrix of the nodes, A is the node adjacency matrix of the graph data, $X^{(l)}_{\neq y(v)}$ is the feature matrix composed of the feature vectors of the target neighbor nodes output by the l-th graph neural network layer, and y(v) is the class label information of the target node v.
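A sketch, under the same assumptions as the previous snippet, of the label-aware variant: only neighbors whose class label differs from the target node's label (the target neighbor nodes) contribute to the virtual neighbor node.

```python
import numpy as np

def virtual_neighbors_label_aware(adj, feats, labels):
    """Average only the neighbors whose class label differs from the target node's label."""
    diff_label = (labels[:, None] != labels[None, :]).astype(float)
    adj_masked = adj * diff_label                    # keep only 'target neighbor' edges
    deg = adj_masked.sum(axis=1)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)
    return (deg_inv[:, None] * adj_masked) @ feats

adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
feats = np.eye(3)                                    # toy feature vectors
labels = np.array([0, 0, 1])                         # class label information per node
print(virtual_neighbors_label_aware(adj, feats, labels))
```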
In a possible implementation manner, the similarity between the virtual neighboring node corresponding to the target node and the target node is determined as a neighboring difference value of the target node corresponding to the target feature data.
The model training device may calculate a similarity between the target node and the virtual neighboring node corresponding to the target node obtained through the calculation, and convert the calculated similarity value into a distance index.
Optionally, the method for calculating the similarity between the virtual neighbor node corresponding to the target node and the target node is a cosine distance similarity measurement algorithm, or a similarity measurement algorithm based on a Radial Basis Function (RBF) kernel function.
When the cosine distance similarity measurement algorithm is used and the class label information of the nodes is not considered, the neighbor difference value can be calculated by the following formula:

$s_v^{(l)} = 1 - \cos\left( x_v^{(l)}, \tilde{x}_v^{(l)} \right)$

where $s_v^{(l)}$ is the neighbor difference value of the target node v corresponding to the feature data output by the l-th graph neural network layer, $x_v^{(l)}$ is the feature vector of the target node v output by the l-th graph neural network layer, and $\tilde{x}_v^{(l)}$ is the virtual neighbor node of the target node v.
In one possible implementation manner, the neighbor difference degree information corresponding to the target feature data is determined in response to obtaining respective neighbor difference values of nodes corresponding to the target feature data.
For example, when the neighbor difference degree information is a neighbor difference vector composed of neighbor difference values, after the neighbor difference value of each node corresponding to the graph neural network layer is obtained by the above-mentioned calculation method, the neighbor difference vector corresponding to that graph neural network layer can be composed. That is, the neighbor difference vector corresponding to the l-th graph neural network layer can be

$s^{(l)} = \left[ s_1^{(l)}, s_2^{(l)}, \ldots, s_N^{(l)} \right]$

where N is the number of nodes.
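A minimal sketch of the neighbor difference computation at the node level: for each node, the cosine similarity between its feature vector and its virtual neighbor node is converted into a distance, and the per-node values form the neighbor difference vector s^(l). The specific conversion (1 minus cosine similarity) is an assumption consistent with the description above.

```python
import numpy as np

def neighbor_difference_vector(adj, feats, eps=1e-12):
    """Neighbor difference value per node: 1 - cosine similarity between the node's
    feature vector and its virtual neighbor node; the N values form s^(l)."""
    deg = adj.sum(axis=1)
    deg_inv = np.where(deg > 0, 1.0 / deg, 0.0)
    virt = (deg_inv[:, None] * adj) @ feats                   # virtual neighbor nodes
    num = (feats * virt).sum(axis=1)
    den = np.linalg.norm(feats, axis=1) * np.linalg.norm(virt, axis=1) + eps
    return 1.0 - num / den                                    # s^(l), length N

adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 1],
                [0, 1, 0, 1, 1],
                [0, 0, 1, 0, 0],
                [0, 1, 1, 0, 0]], dtype=float)
feats = np.random.default_rng(1).normal(size=(5, 3))          # feature data of one layer
print(neighbor_difference_vector(adj, feats))
```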
By way of example, the whole-graph-level representation of the feature data may be defined by the following formula:

$G^{(P)} = \frac{1}{M} \sum_{v=1}^{M} \tilde{x}_v^{(P)}$

where P is the number of graph neural network layers, M is the number of all nodes, $G^{(P)}$ is the graph-level feature matrix obtained by aggregating the feature vectors of all nodes on the graph output by the P-th graph neural network layer, and $\tilde{x}_v^{(P)}$ is the feature vector composed from the neighbor nodes of the target node v output by the P-th graph neural network layer. The graph-level feature matrix can be used as the neighbor difference degree information, i.e., the knowledge representation, for self-distillation training.
The feature data representation at the subgraph level can be defined by the following formula:

$G_{sub} = \left\{ G_{SG_1}, G_{SG_2}, \ldots, G_{SG_s} \right\}$

where s is the number of randomly sampled subgraphs, $SG_i$ represents the i-th subgraph, and each $G_{SG_i}$ is obtained over the corresponding subgraph in the same manner as the graph-level feature matrix.
For feature data at the edge level, the adjacency matrix of the edges can be generated based on the original node adjacency matrix. The edge adjacency matrix is defined as follows:

$\left[ A_e \right]_{i,j} = \begin{cases} 1, & \text{if edge } i \text{ and edge } j \text{ have a common vertex} \\ 0, & \text{otherwise} \end{cases}$

That is, $[A_e]_{i,j}$ is 1 if edge i and edge j have a common vertex, and 0 otherwise.
The neighbor difference degree information for edge-level self-distillation training may be given by the following formula:

$s_e^{(l)} = \left[ s_{e,1}^{(l)}, s_{e,2}^{(l)}, \ldots, s_{e,M}^{(l)} \right]$

where the neighbor difference value of each edge is computed from the edge feature matrix E and the edge adjacency matrix $A_e$ in the same way as in the node-level case, E represents the edge feature matrix, and M represents the total number of edges.
In one possible implementation, in order to reduce the difficulty of the distillation learning of the single model, the retention of the neighboring difference degree information adopts a progressive migration method. Namely, the neighbor difference degree information of the adjacent next graph neural network layer is matched with the neighbor difference degree information of the graph neural network layer of the previous layer.
The neighbor difference information of the next graph neural network layer is matched with the neighbor difference information of the previous graph neural network layer, and the neighbor difference information of the next graph neural network layer can be fitted with the neighbor difference information of the previous graph neural network layer.
Fig. 4 is a schematic diagram illustrating a method for calculating the neighbor difference degree information corresponding to a target node according to an embodiment of the present application. As shown in fig. 4, the sample graph data 41 input into the target graph neural network model includes node 1, node 2, node 3, node 4, and node 5, and the corresponding neighbor difference value 42 of each node under the feature data is calculated. If the target node is node 3, the first-order neighbors of node 3 are first determined as its neighbor nodes, i.e., node 2, node 4, and node 5 are determined to be neighbor nodes of node 3; a virtual neighbor node is then calculated from node 2, node 4, and node 5; and finally a similarity calculation is performed between the virtual neighbor node and node 3 to obtain the neighbor difference value S3. By performing this calculation for node 1, node 2, node 3, node 4, and node 5 respectively, the corresponding neighbor difference degree information 43 under the feature data, (S1, S2, S3, S4, S5), can be obtained.
Step 304, determining a first loss function value based on the at least two neighboring difference degree information.
In the embodiment of the application, the model training device applies an adaptive neighbor difference retaining (ADR) distillation strategy to the neighbor difference degree information and calculates, based on the determined neighbor difference degree information corresponding to each graph neural network layer, a first loss function value used for model updating of the target graph neural network model.
In a possible implementation manner, the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information are compared, the maximum value among the neighbor difference values is determined, the graph neural network layer to which the maximum neighbor difference value belongs is determined as the initial network layer, and the first loss function value is determined based on the neighbor difference degree information of the feature data corresponding to the target graph neural network layer.
The target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
Since the initial graph data input into the target graph neural network model may be highly sparse and contain considerable noise, the neighbor difference values of the nodes calculated by the graph neural network layers that perform feature extraction first have low accuracy. It is therefore necessary to select which layer's neighbor difference vector is taken as the supervision target.
In one possible implementation, the model training device automatically selects the graph neural network layer with the largest neighbor difference as the initial network layer, which may also be referred to as the initial supervision target.
The calculation formula for determining the layer index of the initial network layer is as follows:

$l^* = \arg\max_k \left\{ \left\| s^{(k)} \right\| : k \in \{1, \ldots, L-1\} \right\}$

where $l^*$ is the layer index of the initial network layer and L is the number of graph neural network layers.
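A sketch of the automatic selection of the initial network layer: among layers 1 to L-1, the layer whose neighbor difference vector has the largest norm becomes the initial supervision target. The 0-based indexing is an assumption.

```python
import numpy as np

def select_initial_layer(neighbor_diff_vectors):
    """Pick l*: the layer (among layers 1..L-1) whose neighbor difference vector
    has the largest norm; it becomes the initial supervision target."""
    norms = [np.linalg.norm(s) for s in neighbor_diff_vectors[:-1]]   # k in {1, ..., L-1}
    return int(np.argmax(norms))

s_per_layer = [np.array([0.2, 0.1, 0.3]),
               np.array([0.6, 0.5, 0.4]),
               np.array([0.3, 0.2, 0.2])]
print(select_initial_layer(s_per_layer))   # -> 1 (0-based index of the initial network layer)
```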
In one possible implementation, a difference average value of each of the at least two neighbor difference degree information is determined, a loss function sub-value between adjacent network layers of the at least two graph neural network layers is determined based on the difference average value of each of the at least two neighbor difference degree information, and a first loss function value is determined based on the loss function sub-value between adjacent network layers of the at least two graph neural network layers.
The difference average value is an average value of each neighboring difference value included in the corresponding neighboring difference degree information.
That is to say, when the neighbor difference values corresponding to the previous graph neural network layer are, in the average sense, greater than those corresponding to the next graph neural network layer, a distillation regularization constraint is applied to determine the loss function sub-value between the two adjacent graph neural network layers.
Illustratively, in response to the difference average value corresponding to the n-th graph neural network layer being greater than the difference average value corresponding to the (n+1)-th graph neural network layer, the neighbor difference degree information corresponding to the n-th graph neural network layer and the neighbor difference degree information corresponding to the (n+1)-th graph neural network layer are input into a weighted mean square error loss function to obtain the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer, where n is an integer greater than or equal to 1. In response to the difference average value corresponding to the n-th graph neural network layer being less than or equal to the difference average value corresponding to the (n+1)-th graph neural network layer, the loss function sub-value between the n-th graph neural network layer and the (n+1)-th graph neural network layer is determined to be zero.
In a possible implementation manner, if the difference average value corresponding to the l-th graph neural network layer is greater than the difference average value corresponding to the (l+1)-th graph neural network layer, the loss function sub-value between the l-th graph neural network layer and the (l+1)-th graph neural network layer is calculated as follows:

$d^2\left( s^{(l+1)}, \mathrm{SG}\left( s^{(l)} \right) \right)$

where $d^2(\cdot,\cdot)$ is the weighted mean square error between the two neighbor difference vectors, and SG(·) denotes the stop-gradient (gradient truncation) operation, i.e., during training, gradient back-propagation through the target neighbor difference vector is stopped so that it is treated as a supervision signal.
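A hedged sketch of one loss function sub-value between adjacent layers: a degree-weighted mean square error between the neighbor difference vectors of layer l+1 and layer l, with SG(·) realized by detaching the target vector from the computation graph. The exact weighting scheme is an assumption.

```python
import torch

def adr_sub_value(s_next, s_prev, degrees):
    """d^2(s^(l+1), SG(s^(l))): degree-weighted mean square error, with the gradient
    through the previous layer's neighbor difference vector truncated (stop-gradient)."""
    target = s_prev.detach()              # SG(.): treat s^(l) as a fixed supervision signal
    weights = degrees / degrees.sum()     # weight nodes by their degree (assumption)
    return (weights * (s_next - target) ** 2).sum()

s_prev = torch.tensor([0.6, 0.5, 0.4], requires_grad=True)
s_next = torch.tensor([0.3, 0.2, 0.2], requires_grad=True)
degrees = torch.tensor([2.0, 3.0, 1.0])
print(adr_sub_value(s_next, s_prev, degrees))
```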
In one possible implementation, the step of determining the loss function sub-values between adjacent network layers based on the difference average values of the neighbor difference degree information, and the step of determining the first loss function value based on those loss function sub-values, are performed after the target graph neural network layer has been determined.
That is, a difference average value of each of the neighbor difference degree information corresponding to the target graph neural network layer is determined, a loss function sub-value between adjacent network layers in the target graph neural network layer is determined based on the difference average value of each of the neighbor difference degree information corresponding to the target graph neural network layer, and the first loss function value is determined based on the loss function sub-value between adjacent network layers in the target graph neural network layer.
Wherein, the target graph neural network layer may refer to each adjacent network layer starting from the starting network layer.
In one possible implementation, since nodes in regions of different connection density are considered to have different smoothing rates, the degree of a node may be used to weight it when matching the neighbor difference vectors. Based on the above consideration, the first loss function corresponding to the target graph neural network model may be a distillation regularization term (ADR), which is expressed as the following formula,
$$L_{ADR}=\sum_{l=l^{*}}^{L-1}\mathbb{1}\big(\mu^{(l)}>\mu^{(l+1)}\big)\,d^{2}\big(s^{(l+1)},\,s^{(l)}\big)$$
wherein 1(·) is an indicator function for teacher selection under the knowledge distillation framework, which automatically selects the initial network layer as the initial supervision target (μ^(l) denotes the difference average value corresponding to the lth graph neural network layer); d²(s^(l+1), s^(l)) is the loss function sub-value between the lth graph neural network layer and the (l + 1)th graph neural network layer; and summing the loss function sub-values from l = l* to L − 1 yields the first loss function value L_ADR.
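A hedged sketch of assembling the first loss function value from these sub-values is shown below; it reuses the `adr_sub_loss` helper from the previous sketch, selects the initial network layer as the layer whose discrepancy is largest, and treats the indicator as a check that the difference average decreases from layer l to layer l + 1 — all of which are illustrative assumptions.

```python
import torch

def adr_regularizer(s_list, degrees):
    """First loss function value L_ADR over a list of per-layer discrepancy vectors (sketch)."""
    # The layer whose maximum neighbor difference value is largest acts as the initial supervision target.
    start = max(range(len(s_list)), key=lambda i: float(s_list[i].max()))
    means = [float(s.mean()) for s in s_list]          # difference average value of each layer
    loss = torch.zeros((), device=s_list[0].device)
    for l in range(start, len(s_list) - 1):
        if means[l] > means[l + 1]:                    # indicator 1(.): only penalize decreasing discrepancy
            loss = loss + adr_sub_loss(s_list[l + 1], s_list[l], degrees)
    return loss
```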
Illustratively, for feature data at the entire-graph scale, the first loss function, i.e., the self-distillation training loss function, may take the analogous form,

$$L_{ADR}^{graph}=\sum_{l=l^{*}}^{L-1}\mathbb{1}\big(s^{(l)}>s^{(l+1)}\big)\Big(s^{(l+1)}-\mathrm{SG}\big(s^{(l)}\big)\Big)^{2}$$

where s^(l) here denotes the graph-level discrepancy value of the lth graph neural network layer.
step 305, updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value.
In this embodiment, after the model training device determines the first loss function value and the second loss function value, the model training device updates the model parameters of the target graph neural network model based on the first loss function value and the second loss function value.
The second loss function value is a cross-entropy loss function value determined based on the labeling information of the sample graph data and the prediction information output by the target graph neural network model.
In one possible implementation, the sum of the first loss function value and the second loss function value is used as an overall training loss function value, and model parameters of the target graph neural network model are updated based on the overall training loss function value.
Illustratively, in response to the determined target graph neural network layers being the layer-2 to layer-4 networks, if the determined loss function sub-value between the layer-2 network and the layer-3 network is L23 and the loss function sub-value between the layer-3 network and the layer-4 network is 0, the first loss function value is L23 + 0, i.e., the first loss function value is L23. Following the conventional model training process, since the second loss function is a cross-entropy loss function, the second loss function value LCE can be determined based on the prediction information of the sample graph data output by the target graph neural network model and the label information of the sample graph data; the overall training loss function value can then be determined to be L23 + LCE, and the model parameters of the target graph neural network model are updated based on L23 + LCE.
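The worked example above can be mirrored by a few lines of training code; the model interface (returning both predictions and per-layer discrepancy vectors), the optimizer and the tensors are placeholders assumed for this sketch, not names defined by the patent.

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, x, adj, labels, degrees):
    logits, s_list = model(x, adj)                    # assumed: model also returns per-layer discrepancy vectors
    first_loss = adr_regularizer(s_list, degrees)     # e.g. L23 + 0 in the example above
    second_loss = F.cross_entropy(logits, labels)     # cross-entropy loss LCE
    total_loss = first_loss + second_loss             # overall training loss function value
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```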
In one possible implementation, when the model parameter of the target graph neural network model is updated based on the sum of the first loss function value and the second loss function value, the at least two graph neural network layers are updated by the sum of the first loss function value and the second loss function value, and other parts except the at least two graph neural network layers in the target graph neural network model are updated by the second loss function value.
Illustratively, if the target graph neural network model includes three graph neural network layers, a fully-connected layer, and a prediction output layer, then after the first loss function value and the second loss function value are calculated based on the above method, the three graph neural network layers may be updated by the sum of the first loss function value and the second loss function value, while the parts of the model other than the graph neural network layers, including the fully-connected layer and the prediction output layer, are updated directly by the second loss function value, since these parts do not involve the over-smoothing problem.
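Under the assumption that the discrepancy signals are computed solely from the graph neural network layers' outputs, this split falls out of ordinary back-propagation: the distillation term contributes gradients only to the graph neural network layers, while the fully-connected and prediction layers are driven by the cross-entropy term alone. A brief sketch continuing the previous example:

```python
# Continuing the training_step sketch above (illustrative, not the patent's implementation):
# first_loss is built only from the GNN layers' intermediate outputs, so its backward pass
# never reaches the fully-connected layer or the prediction output layer.
optimizer.zero_grad()
(first_loss + second_loss).backward()   # GNN layers: gradients from both losses
optimizer.step()                        # FC / prediction layers: gradients from second_loss only
```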
In one possible implementation, instead of directly taking the sum of the first loss function value and the second loss function value as the overall training loss function value, corresponding weights may be set for the first loss function value and the second loss function value, respectively, and the overall training loss function value may be determined by means of weighted summation based on the set weights.
For example, if the determined first loss function value is L23 and the second loss function value is LCE, and, according to the degree of influence of the non-smoothness of the feature data on model performance, the weight corresponding to L23 is set to 0.4 and the weight corresponding to LCE is set to 0.6, then based on the weighted-sum algorithm the overall training loss function value is obtained as 0.4 × L23 + 0.6 × LCE.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
The present application proposes a self-distillation training strategy for graph neural networks, which is based on the following idea: the over-smoothing problem of a graph neural network mainly arises in the deep layers of the network, i.e., it may appear after multiple information-passing iterations; therefore, the non-smooth characteristics of the shallow-layer node features can be used to supervise and constrain the deep layers of the graph neural network, guiding the learning algorithm to penalize a graph neural network model that produces over-smoothed node features. Based on this idea, the present application defines neighbor difference degree information used to quantitatively measure the degree of non-smoothness represented by the graph feature data extracted by each graph neural network layer, and then, based on the neighbor difference degree information, provides a self-distillation training algorithm that retains and migrates the non-smoothness layer by layer. FIG. 5 is a schematic diagram illustrating a model processing system for graph data according to an exemplary embodiment. As shown in FIG. 5, the target graph neural network model 52 included in the system is a GNN model containing four graph neural network layers. The sample graph data 51 input to the target graph neural network model 52, i.e., X^(0), passes through each graph neural network layer, and the finally output prediction information is a predicted probability distribution. A corresponding second loss function value, i.e., the cross-entropy loss function LCE, is calculated based on the predicted probability distribution and the labeling information of the sample graph data X^(0). The output feature X^(l) of each graph neural network layer is passed through a neighbor difference degree vector calculation module to compute the corresponding knowledge signal representation s^(l). Then, the knowledge signal representations s^(l) corresponding to the graph neural network layers are fed into the distillation regularization term calculation module 53 to calculate the first loss function value, i.e., the distillation loss function LADR. The distillation loss function LADR and the cross-entropy loss function LCE are added to obtain the overall training loss function value, and the target graph neural network model 52 is iteratively updated based on the overall training loss function value.
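To make the data flow of FIG. 5 concrete, the following self-contained sketch wires the pieces together for a toy four-layer GCN on a random graph; the architecture, the cosine-based discrepancy signal and every hyper-parameter are assumptions chosen for illustration, and the `adr_regularizer` helper from the earlier sketch is reused, so this is not the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfDistillGCN(nn.Module):
    """Toy 4-layer GCN that also returns a per-layer knowledge signal s^(l)."""

    def __init__(self, in_dim, hid_dim, num_classes, num_layers=4):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.layers = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(num_layers)])
        self.out = nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj_norm):
        s_list = []
        for layer in self.layers:
            x = F.relu(layer(adj_norm @ x))                  # one message-passing layer
            virtual_neighbor = adj_norm @ x                  # mean-aggregated neighbor feature
            s = 1.0 - F.cosine_similarity(x, virtual_neighbor, dim=-1)  # per-node discrepancy signal
            s_list.append(s)
        return self.out(x), s_list

# One illustrative training iteration on a random toy graph.
num_nodes, in_dim, num_classes = 100, 16, 3
x = torch.randn(num_nodes, in_dim)
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()
adj = ((adj + adj.t() + torch.eye(num_nodes)) > 0).float()   # symmetrize and add self-loops
deg = adj.sum(dim=1)
adj_norm = adj / deg.unsqueeze(1)                            # row-normalized adjacency D^-1 A
labels = torch.randint(0, num_classes, (num_nodes,))

model = SelfDistillGCN(in_dim, 32, num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

logits, s_list = model(x, adj_norm)
loss_ce = F.cross_entropy(logits, labels)                    # second loss function value
loss_adr = adr_regularizer(s_list, deg)                      # first loss function value (earlier sketch)
optimizer.zero_grad()
(loss_ce + loss_adr).backward()
optimizer.step()
```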
The graph neural network self-distillation model training method provided by this scheme is both general and efficient. On the one hand, the defined neighbor difference degree information can effectively indicate the generalization performance of the trained model; on the other hand, a graph neural network under self-distillation training can extract higher-quality graph characterization vectors, significantly improving model performance at the cost of only a small amount of additional training overhead.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
Fig. 6 is a block diagram illustrating a model processing apparatus for graph data according to an exemplary embodiment, and as shown in fig. 6, the model processing apparatus for graph data may be implemented as all or part of a computer device in hardware or a combination of hardware and software to perform all or part of the steps of the method shown in the corresponding embodiment of fig. 1 or 3. The model processing apparatus for graph data may include:
a sample input module 610 for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;

a feature obtaining module 620, configured to process the sample graph data based on at least two graph neural network layers to obtain at least two feature data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;

an information determining module 630, configured to determine, based on at least two of the feature data, neighbor difference degree information corresponding to at least two of the feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;

a loss value determining module 640, configured to determine a first loss function value based on at least two of the neighbor difference degree information;

a model updating module 650, configured to update model parameters of the target graph neural network model based on the first loss function value.
In a possible implementation manner, theinformation determining module 630 includes:
and the information determining submodule is used for determining the neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data and the granularity of the feature data.
In a possible implementation manner, the information determining sub-module includes:
a neighboring node determining unit, configured to determine, in response to that the granularity of the feature data is a node level, a neighboring node of each node corresponding to the target feature data; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
a virtual node obtaining unit, configured to obtain, based on the respective neighboring nodes of the respective nodes corresponding to the target feature data, virtual neighboring nodes of the respective nodes corresponding to the target feature data;
a difference value determining unit, configured to determine, based on the virtual neighboring nodes of the respective nodes corresponding to the target feature data, respective neighboring difference values of the respective nodes corresponding to the target feature data;
an information determining unit, configured to determine, based on respective neighbor difference values of the nodes corresponding to the target feature data, the neighbor difference information corresponding to the target feature data.
In a possible implementation manner, the virtual node obtaining unit is configured to,
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
In one possible implementation, the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix includes:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
In a possible implementation manner, the difference value determining unit is configured to,
determining the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
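Putting the virtual node obtaining unit and the difference value determining unit together, a rough sketch of one way such node-level neighbor difference values could be computed is shown below; it isolates the computation used inline in the FIG. 5 sketch earlier. Using the row-normalized product of the inverse degree matrix and the adjacency matrix as the virtual-neighbor aggregator, and one minus the cosine similarity as the difference value, are assumptions of this example rather than the exact computation specified by the text.

```python
import torch
import torch.nn.functional as F

def node_neighbor_difference(features, adj):
    """Node-level neighbor difference values for one layer's feature data (illustrative sketch).

    features: node feature matrix X^(l), shape [num_nodes, dim]
    adj:      binary adjacency matrix describing the neighbor relationship, shape [num_nodes, num_nodes]
    """
    deg = adj.sum(dim=1).clamp(min=1)                         # diagonal of the degree matrix
    virtual_neighbors = (adj @ features) / deg.unsqueeze(1)   # D^-1 A X: virtual neighbor node of each node
    similarity = F.cosine_similarity(features, virtual_neighbors, dim=-1)
    return 1.0 - similarity                                   # larger value = less smooth neighborhood
```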
In a possible implementation manner, the loss value determining module 640 includes:

a maximum value determining submodule, configured to compare the neighbor difference values corresponding to the at least two pieces of neighbor difference degree information, and determine a maximum value of the neighbor difference values;

an initial determining submodule, configured to determine the graph neural network layer to which the maximum value of the neighbor difference values belongs as an initial network layer;

a first loss value determining submodule, configured to determine the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one graph neural network layer adjacent after the initial network layer.
In a possible implementation manner, the lossvalue determining module 640 includes:
an average value determining sub-module, configured to determine a difference average value of each of at least two pieces of the neighbor difference degree information, where the difference average value is an average value of the neighbor difference values included in the corresponding neighbor difference degree information;
a sub-value determination sub-module, configured to determine a sub-value of a loss function between adjacent network layers of the at least two graph neural network layers based on a difference average value of each of the at least two neighboring difference degree information;
a second loss value determination submodule for determining the first loss function value based on the loss function sub-value between adjacent ones of the at least two graph neural network layers.
In one possible implementation, the sub-value determining sub-module includes:
a first sub-value determining unit, configured to, in response to the difference average value corresponding to the nth graph neural network layer being greater than the difference average value corresponding to the n +1 th graph neural network layer, input the neighbor difference degree information corresponding to the nth graph neural network layer and the neighbor difference degree information corresponding to the n +1 th graph neural network layer into a weighted mean square error loss function, and obtain the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer; n is an integer of 1 or more;
a second sub-value determining unit for determining that the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer is zero in response to the difference average value corresponding to the nth graph neural network layer being less than or equal to the difference average value corresponding to the n +1 th graph neural network layer.
In one possible implementation, the model updating module 650 includes:
a model updating submodule for updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
In one possible implementation, the model update sub-module includes:
an overall loss value determination unit configured to determine a sum of the first loss function value and the second loss function value as an overall training loss function value;
and the model updating unit is used for updating the model parameters of the target graph neural network model based on the overall training loss function value.
In one possible implementation, the granularity of the feature data includes at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
In summary, in the solution shown in the embodiment of the present application, feature data output after feature extraction is performed on sample graph data by each graph neural network layer is obtained, neighbor difference degree information corresponding to each feature data is obtained through calculation, a non-smoothness degree represented by a graph feature extracted by each graph neural network layer is quantitatively measured, and further, model training is performed by using a first loss function value obtained through calculation of the neighbor difference degree information, so that non-smoothness related information of each of a plurality of graph neural network layers in a model is used to guide a training process of each graph neural network layer, thereby solving an over-smoothness problem of the graph neural network, and further improving model performance of the graph neural network after being updated through training.
FIG. 7 illustrates a block diagram of a computer device 700 shown in an exemplary embodiment of the present application. The computer device may be implemented as a server in the above-mentioned aspects of the present application. The computer device 700 includes a Central Processing Unit (CPU) 701, a system memory 704 including a Random Access Memory (RAM) 702 and a Read-Only Memory (ROM) 703, and a system bus 705 connecting the system memory 704 and the CPU 701. The computer device 700 also includes a mass storage device 706 for storing an operating system 709, application programs 710, and other program modules 711.

The mass storage device 706 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 706 and its associated computer-readable media provide non-volatile storage for the computer device 700. That is, the mass storage device 706 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.

Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, Digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media are not limited to the foregoing. The system memory 704 and mass storage device 706 described above may be collectively referred to as memory.

According to various embodiments of the present disclosure, the computer device 700 may also operate through a network, such as the Internet, connected to a remote computer on the network. That is, the computer device 700 may be connected to the network 708 through the network interface unit 707 connected to the system bus 705, or the network interface unit 707 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes at least one instruction, at least one program, a code set, or a set of instructions, which is stored in the memory, and the central processing unit 701 implements all or part of the steps in the model processing method for graph data shown in the above embodiments by executing the at least one instruction, the at least one program, the code set, or the set of instructions.
Fig. 8 shows a block diagram of a computer device 800 provided in an exemplary embodiment of the present application. The computer device 800 may be implemented as the terminal described above, such as: a smartphone, a tablet, a laptop, or a desktop computer. The computer device 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, and the like.

Generally, the computer device 800 includes: a processor 801 and a memory 802.

The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 is used to store at least one instruction for execution by the processor 801 to implement all or part of the steps in the model processing method for graph data provided by the method embodiments herein.
In some embodiments, the computer device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to the peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.

The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.

In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.

Those skilled in the art will appreciate that the configuration illustrated in FIG. 8 is not intended to be limiting of the computer device 800 and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components may be employed.
In an exemplary embodiment, a computer-readable storage medium is also provided, for storing at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement all or part of the steps of the above-mentioned model processing method for graph data. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product or a computer program is also provided, which comprises computer instructions, which are stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform all or part of the steps of the method described in any of the embodiments of fig. 1 or 3.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A method of model processing for graph data, the method comprising:
inputting the sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
processing the sample graph data based on at least two graph neural network layers to obtain at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
determining a first loss function value based on at least two of the neighbor difference degree information;
updating model parameters of the target graph neural network model based on the first loss function values.
2. The method according to claim 1, wherein the determining neighboring difference degree information corresponding to at least two of the feature data based on at least two of the feature data comprises:
determining the neighbor difference degree information corresponding to at least two pieces of the feature data based on the at least two pieces of the feature data and the granularity of the feature data.
3. The method of claim 2, wherein the determining the neighbor difference information corresponding to at least two of the feature data based on the at least two of the feature data and the granularity of the feature data comprises:
determining respective neighbor nodes of each node corresponding to the target characteristic data in response to the granularity of the characteristic data being at the node level; the neighbor node is at least one other node directly connected with the corresponding node through an edge; the target feature data is any one of at least two of the feature data;
acquiring virtual neighbor nodes of each node corresponding to the target characteristic data based on the respective neighbor nodes of each node corresponding to the target characteristic data;
determining respective neighbor difference values of the nodes corresponding to the target characteristic data based on the virtual neighbor nodes of the nodes corresponding to the target characteristic data;
and determining the neighbor difference degree information corresponding to the target characteristic data based on the respective neighbor difference values of the nodes corresponding to the target characteristic data.
4. The method according to claim 3, wherein the obtaining a virtual neighbor node of each node corresponding to the target feature data based on the respective neighbor node of each node corresponding to the target feature data comprises:
determining a first adjacency matrix corresponding to a target node based on the neighbor node of the target node; the first adjacency matrix is used to indicate a neighboring relationship between the target node and the neighbor node; the target node is any one of the nodes corresponding to the target characteristic data;
obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix.
5. The method of claim 4, wherein the obtaining the virtual neighbor node of the target node based on the target feature data, the degree matrix of the target node, and the first adjacency matrix comprises:
acquiring category label information corresponding to the target node and the category label information corresponding to the neighbor nodes of the target node;
determining a target neighbor node of the target node from the neighbor nodes of the target node; the class label information of the target neighbor node is different from the class label information of the target node;
acquiring first characteristic data; the first feature data is partial feature data of the target feature data indicating the target node and the target neighbor node;
obtaining the virtual neighbor node of the target node based on the first characteristic data, the degree matrix of the target node, and the first adjacency matrix.
6. The method according to claim 3, wherein the determining respective neighbor difference values of the respective nodes corresponding to the target feature data based on the virtual neighbor nodes of the respective nodes corresponding to the target feature data comprises:
determining the similarity between the virtual neighbor node corresponding to a target node and the target node as the neighbor difference value of the target node corresponding to the target feature data.
7. The method of claim 1, wherein determining a first loss function value based on at least two of the neighbor difference degree information comprises:
comparing each neighbor difference value corresponding to at least two neighbor difference degree information to determine the maximum value of each neighbor difference value;
determining the graph neural network layer to which the maximum value of each neighbor difference value belongs as an initial network layer;
determining the first loss function value based on the neighbor difference degree information of the feature data corresponding to a target graph neural network layer; the target graph neural network layer is the initial network layer and at least one of the graph neural network layers that is adjacent after the initial network layer.
8. The method of claim 1, wherein determining a first loss function value based on at least two of the neighbor difference degree information comprises:
determining a difference average value of each of at least two pieces of the neighbor difference degree information, the difference average value being an average value of each neighbor difference value included in the corresponding neighbor difference degree information;
determining a loss function sub-value between adjacent network layers of the at least two graph neural network layers based on respective difference averages of the at least two neighbor difference degree information;
determining the first loss function value based on the loss function sub-value between adjacent ones of at least two of the graph neural network layers.
9. The method of claim 8, wherein determining a loss function sub-value between adjacent ones of the at least two graph neural network layers based on respective difference averages of the at least two neighbor difference degree information comprises:
in response to the difference average corresponding to the nth graph neural network layer being greater than the difference average corresponding to the n +1 th graph neural network layer, inputting the neighbor difference information corresponding to the nth graph neural network layer and the neighbor difference information corresponding to the n +1 th graph neural network layer into a weighted mean square error loss function, obtaining the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer; n is an integer of 1 or more;
determining that the loss function sub-value between the nth graph neural network layer and the n +1 th graph neural network layer is zero in response to the difference average corresponding to the nth graph neural network layer being less than or equal to the difference average corresponding to the n +1 th graph neural network layer.
10. The method of claim 1, wherein updating the model parameters of the target graph neural network model based on the first loss function value comprises:
updating model parameters of the target graph neural network model based on the first loss function value and the second loss function value; the second loss function value is a cross-entropy loss function value determined based on labeling information of the sample graph data and prediction information output by the target graph neural network model.
11. The method of claim 10, wherein updating model parameters of the target graph neural network model based on the first and second loss function values comprises:
taking the sum of the first loss function value and the second loss function value as an overall training loss function value;
updating the model parameters of the target graph neural network model based on the overall training loss function value.
12. The method of claim 2, wherein the granularity of the feature data comprises at least one of a node level, a connection edge level, a subgraph level, and an entire graph level.
13. A model processing apparatus for graph data, the apparatus comprising:
the sample input module is used for inputting sample graph data into the target graph neural network model; the target graph neural network model comprises at least two graph neural network layers;
the characteristic acquisition module is used for processing the sample graph data based on at least two graph neural network layers to acquire at least two characteristic data; at least two pieces of feature data are obtained by respectively carrying out feature extraction on at least two graph neural network layers;
the information determining module is used for determining neighbor difference degree information corresponding to at least two pieces of feature data based on the at least two pieces of feature data; the neighbor difference degree information is used for indicating the non-smoothness degree of at least two feature data;
a loss value determination module for determining a first loss function value based on at least two of the neighbor difference degree information;
and the model updating module is used for updating the model parameters of the target graph neural network model based on the first loss function value.
14. A computer device comprising a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the model processing method for graph data according to any one of claims 1 to 12.
15. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor to implement the model processing method for graph data according to any one of claims 1 to 12.

Priority Applications (1)

Application Number: CN202110636601.1A; Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Applications Claiming Priority (1)

Application Number: CN202110636601.1A; Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Publications (1)

Publication Number: CN113822293A (en); Publication Date: 2021-12-21

Family

ID=78912516

Family Applications (1)

Application Number: CN202110636601.1A (Pending; published as CN113822293A (en)); Priority Date / Filing Date: 2021-06-08; Title: Model processing method, device and equipment for graph data and storage medium

Country Status (1)

Country: CN; Link: CN113822293A (en)

Cited By (1)

CN114972772A (en)*; Priority date: 2022-06-23; Publication date: 2022-08-30; Assignee: Tsinghua University; Title: Method, device, equipment and storage medium for customizing graph neural network architecture

(* Cited by examiner, † Cited by third party)
Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
