CN115935941A - Electric power service system data alignment method based on graph convolution neural network - Google Patents

Publication number
CN115935941A
Authority
CN
China
Prior art keywords
node
graph
nodes
neural network
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211606845.6A
Other languages
Chinese (zh)
Other versions
CN115935941B (en)
Inventor
蔡宇翔
蒋鑫
付婷
倪文书
王川丰
杨启帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fujian Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd and Information and Telecommunication Branch of State Grid Fujian Electric Power Co Ltd
Priority to CN202211606845.6A
Publication of CN115935941A
Application granted
Publication of CN115935941B
Legal status: Active
Anticipated expiration

Abstract

The invention relates to a data alignment method for electric power business systems based on a graph convolutional neural network. The method comprises: cleaning and preprocessing equipment ledger information data; constructing knowledge graphs from the entities and the relationships between them, and obtaining pre-aligned entity pairs between the knowledge graphs; inputting the knowledge graphs into a graph self-attention convolutional neural network for training, with the entity pairs serving as alignment seeds that provide the supervision information for the network; obtaining the embedding vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of the nodes between the knowledge graphs, and taking the two most similar nodes as aligned nodes; and rewriting the attribute data of the entities of the power business system to be aligned according to the aligned nodes. Using the trained entity alignment model, the invention associates equivalent entities across different business systems, predicts which records in multiple power business systems refer to the same real-world object, and safeguards the reliability of data from different data sources.

Figure 202211606845

Description

A data alignment method for electric power business systems based on a graph convolutional neural network

Technical Field

The invention relates to a data alignment method for electric power business systems based on a graph convolutional neural network, and belongs to the technical field of electric power business data processing.

Background Art

The deepening reform of the power industry requires power companies to further improve their information systems, so as to achieve a greater degree of information interconnection and resource sharing. This allows power companies to manage their data resources effectively, fully exploit the value of those resources, and reduce costs while increasing returns, thereby further expanding the scope of the industry.

However, when power business systems are in use, the equipment information recorded by different business departments can be inconsistent. Over time, power network data grows increasingly complex, and data is frequently copied and modified as it propagates across different power business systems, which reduces the reliability of the data in those systems. The information that represents the same device in different business systems therefore needs to be corrected to ensure that the data is reliable and consistent.

Summary of the Invention

To overcome the above problems, the invention provides a data alignment method for electric power business systems based on a graph convolutional neural network. Using a trained entity alignment model, the method associates equivalent entities across different business systems, predicts which records in multiple power business systems refer to the same real-world object, and safeguards the reliability of data from different data sources.

The technical solution of the invention is as follows:

A data alignment method for electric power business systems based on a graph convolutional neural network, comprising:

acquiring equipment ledger information data of the two power business systems to be aligned, and cleaning and preprocessing the equipment ledger information data;

obtaining the entities in each power business system and the relationships between the entities, constructing a knowledge graph for each system from the entities and the relationships between them, and obtaining pre-aligned entity pairs between the two knowledge graphs, wherein the entities are the nodes of the knowledge graph and the relationships between the entities are its edges;

inputting the knowledge graphs into a graph self-attention convolutional neural network for training, with the entity pairs serving as alignment seeds that provide the supervision information during training;

obtaining the embedding vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of the nodes between the knowledge graphs, and taking the two most similar nodes as aligned nodes;

rewriting the attribute data of the entities of the power business system to be aligned according to the aligned nodes.

Further, cleaning and preprocessing the equipment ledger information data specifically comprises removing damaged data from the equipment ledger information data and converting the remaining data to CSV format.

Further, constructing the knowledge graphs from the entities and the relationships between them and obtaining the pre-aligned entity pairs between the two knowledge graphs specifically comprises:

extracting the nodes of the knowledge graph, specifically by taking the attributes in the equipment ledger information data that distinguish entities as the nodes of the knowledge graph, and taking the text information fields of each entity as the text features of its node;

extracting the semantic information of the text features with a language model to obtain the embedding vector representation of each node;

constructing the edges between the nodes from the relationships between the entities;

obtaining the two knowledge graphs G_k to be aligned:

G_k = {E_k, R_k, T_k};

where k = 1 or 2; E_k, R_k and T_k are, respectively, the set of nodes, the set of entity relations and the triples <e_1, r, e_2> of the knowledge graph, with e_1, e_2 ∈ E and r ∈ R; r is the relation between entity e_1 and entity e_2;

pre-aligning the entities according to an attribute whose value is unique to each entity to obtain the entity pairs, the entity pair set S being:

Figure BDA0003998902990000021

where x and y are entity nodes of the knowledge graphs G_1 and G_2, respectively.

Further, the language model is the LaBSE model.

Further, inputting the knowledge graphs into the graph self-attention convolutional neural network for training, with the entity pairs serving as alignment seeds that provide the supervision information during training, specifically comprises:

S1. Divide the entity pair set S into a training set and a validation set according to a preset ratio; the test set consists of the unaligned nodes.

A single-layer graph attention convolutional neural network is used to aggregate the neighbor information of each node in the knowledge graph, specifically:

S2. Randomly sample 20 neighbor nodes of the target node as the neighbor set N_i, where the neighbor nodes include the target node itself, and use the embedding vector representations (formula image BDA0003998902990000022) of the neighbor nodes in N_i to update the embedding vector representation (formula image BDA0003998902990000023) of the target node. The aggregation formula is as follows:

Figure BDA0003998902990000024

Figure BDA0003998902990000025

where W is a trainable weight matrix that maps the embedding vector representation of a node to higher-level features, and σ in both formulas is the nonlinear activation function Sigmoid, calculated as follows:

Figure BDA0003998902990000026

a_ij is the importance of node j to node i, calculated as follows:

Figure BDA0003998902990000031

where a is a linear layer that converts a vector into a scalar, and LeakyReLU is a nonlinear activation function, expressed as follows:

Figure BDA0003998902990000032

where p is a coefficient.

S3. Train the graph self-attention convolutional neural network on the entity pairs, updating the network parameters by gradient descent, with Bayesian personalized ranking as the objective function of the supervised learning, expressed as follows:

Figure BDA0003998902990000033

where (x, y, y^-) is the training triple constructed for node x of one knowledge graph, y is the node in the other knowledge graph that is pre-aligned with x, and y^- is any randomly sampled node other than x and y.

S4. Perform unsupervised training based on a graph-structure multi-view augmentation method, specifically:

The encoder network parameters θ are perturbed to obtain the perturbed network parameters θ′, and the same knowledge graph nodes are fed into both the original network and the perturbed network to obtain two view representations h and h′ of each node:

h = f(N; θ), h′ = f(N; θ′);

The encoder is perturbed as follows:

θ′_l = θ_l + η·Δθ_l;

Figure BDA0003998902990000034

where θ_l and θ′_l are the parameters of the l-th layer of the graph self-attention convolutional neural network and of the perturbed graph self-attention convolutional neural network, respectively, η is an adjustable hyperparameter that controls the perturbation strength, and Δθ_l is a perturbation term drawn from a Gaussian distribution with zero mean and variance (formula image BDA0003998902990000035).

InfoNCE is used as the optimization objective to pull the original representation and the perturbed representation of the same node closer together and to push the original representation away from the perturbed representations of other nodes:

Figure BDA0003998902990000036

where N is the number of nodes in a training batch, sim is the cosine similarity,

Figure BDA0003998902990000037
Figure BDA0003998902990000038

and τ is an adjustable parameter.

S5. Repeat steps S2 to S5 until the value of the objective function converges or the preset number of training iterations is reached.

Further, the entity pair set S is divided into the training set and the validation set in a ratio of 4:1.

Further, the unaligned nodes of whichever of the two knowledge graphs has fewer nodes are used as the test set.

Further, obtaining the embedding vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of the nodes between the knowledge graphs and taking the two most similar nodes as aligned nodes specifically comprises:

computing the embedding vector representation of each node of the knowledge graphs with the graph self-attention convolutional neural network;

calculating the similarity between the node representations of the two knowledge graphs using the Euclidean norm:

Figure BDA0003998902990000041

taking the two most similar nodes as aligned nodes.

Further, the method also comprises computing performance statistics on the labeled training-set and validation-set nodes and manually checking the unlabeled test-set nodes to judge whether the alignment is valid, specifically:

computing Hits@1 and Hits@10 for the labeled training-set and validation-set nodes:

Figure BDA0003998902990000042

where S is the set of aligned node pairs, |S| is the number of aligned node pairs, rank_i is the link prediction rank of the i-th aligned node pair, and I is the indicator function, with I = 1 if its argument is true and I = 0 otherwise;

manually checking the unlabeled test-set nodes to judge whether the alignment is valid.

The invention has the following beneficial effects:

Using a trained entity alignment model, the invention associates equivalent entities across different business systems, predicts which records in multiple power business systems refer to the same real-world object, and safeguards the reliability of data from different data sources. By combining a graph self-attention convolutional neural network, supervised learning and perturbed-view contrastive learning, the invention aligns the nodes of two knowledge graphs better than the prior art.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method of the invention.

FIG. 2 is a schematic diagram of the training process of the graph self-attention convolutional neural network according to an embodiment of the invention.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings and specific embodiments.

Referring to FIGS. 1 and 2, a data alignment method for electric power business systems based on a graph convolutional neural network comprises:

acquiring equipment ledger information data of the two power business systems to be aligned, and cleaning and preprocessing the equipment ledger information data;

obtaining the entities in each power business system and the relationships between the entities, constructing a knowledge graph for each system from the entities and the relationships between them, and obtaining pre-aligned entity pairs between the two knowledge graphs, wherein the entities are the nodes of the knowledge graph and the relationships between the entities are its edges;

inputting the knowledge graphs into a graph self-attention convolutional neural network for training, with the entity pairs serving as alignment seeds that provide the supervision information during training;

obtaining the embedding vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of the nodes between the knowledge graphs, and taking the two most similar nodes as aligned nodes;

rewriting the attribute data of the entities of the power business system to be aligned according to the aligned nodes.

In a specific embodiment, cleaning and preprocessing the equipment ledger information data specifically comprises removing damaged data from the equipment ledger information data and converting the remaining data to CSV format.
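The cleaning step can be illustrated with a minimal sketch. The patent does not define what counts as "damaged" data, so the sketch assumes a record is damaged when any required field is missing or empty. The field names (`device_id`, `name`, `station`) and the function `clean_ledger` are hypothetical and serve only as an illustration.

```python
import csv
import io

def clean_ledger(rows, required_fields):
    """Drop records with a missing or empty required field and render the rest as CSV.

    Assumption: "damaged" means a required field is None or blank.
    """
    def is_intact(row):
        return all(
            row.get(f) is not None and str(row.get(f)).strip() != ""
            for f in required_fields
        )

    kept = [r for r in rows if is_intact(r)]
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=required_fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(kept)
    return kept, buf.getvalue()

# Hypothetical ledger records from one business system.
rows = [
    {"device_id": "T-001", "name": "Main transformer 1", "station": "Substation A"},
    {"device_id": "", "name": "Breaker 12", "station": "Substation A"},
    {"device_id": "T-003", "name": None, "station": "Substation B"},
]
kept, csv_text = clean_ledger(rows, ["device_id", "name", "station"])
```

Only the first record survives in this example, because the other two each lack a required field.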

In one embodiment of the invention, constructing the knowledge graphs from the entities and the relationships between them and obtaining the pre-aligned entity pairs between the two knowledge graphs specifically comprises:

extracting the nodes of the knowledge graph, specifically by taking the attributes in the equipment ledger information data that distinguish entities as the nodes of the knowledge graph, and taking the text information fields of each entity as the text features of its node;

extracting the semantic information of the text features with a language model to obtain the embedding vector representation of each node;

constructing the edges between the nodes from the relationships between the entities;

obtaining the two knowledge graphs G_k to be aligned:

G_k = {E_k, R_k, T_k};

where k = 1 or 2; E_k, R_k and T_k are, respectively, the set of nodes, the set of entity relations and the triples <e_1, r, e_2> of the knowledge graph, with e_1, e_2 ∈ E and r ∈ R; r is the relation between entity e_1 and entity e_2;

pre-aligning the entities according to an attribute whose value is unique to each entity to obtain the entity pairs, the entity pair set S being:

Figure BDA0003998902990000051

where x and y are entity nodes of the knowledge graphs G_1 and G_2, respectively.
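Pre-alignment by a unique-valued attribute can be sketched as a join on that attribute. The sketch below is an assumption-laden illustration: the attribute name `serial`, the node identifiers and the helper functions are hypothetical, and values that occur more than once within a system are skipped because they no longer identify a single entity.

```python
from collections import defaultdict

def unique_index(entities, key):
    """Map each attribute value that occurs exactly once to its node id."""
    seen = defaultdict(list)
    for node_id, attrs in entities.items():
        value = attrs.get(key)
        if value:
            seen[value].append(node_id)
    return {value: ids[0] for value, ids in seen.items() if len(ids) == 1}

def seed_pairs(g1_entities, g2_entities, key):
    """Return the seed set S of (x, y) pairs sharing the same unique value of `key`."""
    idx1 = unique_index(g1_entities, key)
    idx2 = unique_index(g2_entities, key)
    return sorted((idx1[v], idx2[v]) for v in idx1.keys() & idx2.keys())

# Hypothetical entities of the two knowledge graphs G_1 and G_2.
g1 = {"a1": {"serial": "SN1"}, "a2": {"serial": "SN2"}, "a3": {"serial": "SN9"}}
g2 = {
    "b1": {"serial": "SN2"},
    "b2": {"serial": "SN1"},
    "b3": {"serial": "SN7"},
    "b4": {"serial": "SN7"},  # duplicated value, so not usable as a seed
}
S = seed_pairs(g1, g2, "serial")
```

Here `S` contains two seed pairs; `a3`, `b3` and `b4` stay unaligned and would be candidates for the test set.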

In a specific embodiment, the language model is the LaBSE model.

In one embodiment of the invention, inputting the knowledge graphs into the graph self-attention convolutional neural network for training, with the entity pairs serving as alignment seeds that provide the supervision information during training, specifically comprises:

S1. Divide the entity pair set S into a training set and a validation set according to a preset ratio; the test set consists of the unaligned nodes.

A single-layer graph attention convolutional neural network is used to aggregate the neighbor information of each node in the knowledge graph, specifically:

S2. Randomly sample 20 neighbor nodes of the target node as the neighbor set N_i, where the neighbor nodes include the target node itself, and use the embedding vector representations (formula image BDA0003998902990000052) of the neighbor nodes in N_i to update the embedding vector representation (formula image BDA0003998902990000053) of the target node. The aggregation formula is as follows:

Figure BDA0003998902990000054

Figure BDA0003998902990000061

where W is a trainable weight matrix that maps the embedding vector representation of a node to higher-level features, and σ in both formulas is the nonlinear activation function Sigmoid, calculated as follows:

Figure BDA0003998902990000062

a_ij is the importance of node j to node i, calculated as follows:

Figure BDA0003998902990000063

where a is a linear layer that converts a vector into a scalar, and LeakyReLU is a nonlinear activation function, expressed as follows:

Figure BDA0003998902990000064

where p is a coefficient.
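Because the formula images are not reproduced in this text, the following is only a sketch of one attention-aggregation step in the standard graph attention form, which is an assumption about what the images contain: scores e_ij = LeakyReLU(a · [W h_i ‖ W h_j]), weights a_ij obtained by a softmax over the neighbor set N_i, and the update σ(Σ_j a_ij W h_j) with σ the Sigmoid. The sketch also assumes that the sample size of 20 counts the target node itself. All function names and example values are hypothetical.

```python
import math
import random

def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, p=0.2):
    """LeakyReLU with negative-side coefficient p."""
    return x if x > 0 else p * x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aggregate(i, h, neighbors, W, a, k=20, p=0.2, rng=random):
    """Update the embedding of node i from a sampled neighbor set that includes i."""
    others = [j for j in neighbors[i] if j != i]
    n_i = [i] + rng.sample(others, min(k - 1, len(others)))
    wh = {j: matvec(W, h[j]) for j in n_i}
    # Unnormalised importance of each neighbor j to node i.
    scores = [
        leaky_relu(sum(c * v for c, v in zip(a, wh[i] + wh[j])), p) for j in n_i
    ]
    # Softmax over the neighbor set (shifted by the maximum for stability).
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    alpha = {j: w / total for j, w in zip(n_i, weights)}
    dim = len(wh[i])
    new_h = [sigmoid(sum(alpha[j] * wh[j][d] for j in n_i)) for d in range(dim)]
    return new_h, alpha

# Toy graph with three nodes and 2-dimensional embeddings.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W = [[0.5, -0.5], [0.25, 0.75]]
a = [0.1, -0.2, 0.3, 0.4]
new_h0, alpha0 = aggregate(0, h, neighbors, W, a, rng=random.Random(0))
```

The attention weights in `alpha0` sum to one over the sampled neighbor set, and every component of `new_h0` lies strictly between 0 and 1 because of the Sigmoid.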

S3. Train the graph self-attention convolutional neural network on the entity pairs, updating the network parameters by gradient descent, with Bayesian personalized ranking as the objective function of the supervised learning, expressed as follows:

Figure BDA0003998902990000065

where (x, y, y^-) is the training triple constructed for node x of one knowledge graph, y is the node in the other knowledge graph that is pre-aligned with x, and y^- is any randomly sampled node other than x and y.
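The exact expression is in a formula image that is not reproduced here. As a hedged illustration, the usual form of the Bayesian personalized ranking loss for a triple (x, y, y^-) is -ln σ(s(x, y) - s(x, y^-)), where s is a score that is higher for better matches. The sketch assumes s is the negative Euclidean distance between embeddings, which is a choice made for this example and is not stated in the source.

```python
import math

def score(u, v):
    """Assumed matching score: negative Euclidean distance (higher is better)."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def bpr_loss(triples, emb1, emb2):
    """Mean BPR loss over triples (x, y, y_neg); x indexes emb1, y and y_neg index emb2."""
    total = 0.0
    for x, y, y_neg in triples:
        margin = score(emb1[x], emb2[y]) - score(emb1[x], emb2[y_neg])
        total += -log_sigmoid(margin)
    return total / len(triples)

# Toy embeddings: "y" is close to "x", "n" is far away.
emb1 = {"x": [0.0, 0.0]}
emb2 = {"y": [0.1, 0.0], "n": [3.0, 4.0]}
good = bpr_loss([("x", "y", "n")], emb1, emb2)  # correct pair ranked first
bad = bpr_loss([("x", "n", "y")], emb1, emb2)   # roles swapped
```

The loss is small when the pre-aligned node is closer to x than the sampled negative and large when the order is reversed, which is the ranking behaviour that gradient descent would reinforce.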

S4. Perform unsupervised training based on a graph-structure multi-view augmentation method, specifically:

The encoder network parameters θ are perturbed to obtain the perturbed network parameters θ′, and the same knowledge graph nodes are fed into both the original network and the perturbed network to obtain two view representations h and h′ of each node:

h = f(N; θ), h′ = f(N; θ′);

The encoder is perturbed as follows:

θ′_l = θ_l + η·Δθ_l;

Figure BDA0003998902990000066

where θ_l and θ′_l are the parameters of the l-th layer of the graph self-attention convolutional neural network and of the perturbed graph self-attention convolutional neural network, respectively, η is an adjustable hyperparameter that controls the perturbation strength, and Δθ_l is a perturbation term drawn from a Gaussian distribution with zero mean and variance (formula image BDA0003998902990000069).
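The perturbation rule θ′_l = θ_l + η·Δθ_l can be sketched per layer as below. The variance of the Gaussian is given only in a formula image, so the sketch assumes, purely for illustration, that the standard deviation for each layer is the standard deviation of that layer's own parameters. The layer names and the flat-list parameter layout are hypothetical.

```python
import random
import statistics

def perturb(params, eta, rng):
    """Return a perturbed copy of per-layer parameters.

    params: dict mapping a layer name to a flat list of floats.
    Assumption: the noise scale of a layer is the population standard
    deviation of that layer's parameters.
    """
    perturbed = {}
    for layer, theta in params.items():
        sigma = statistics.pstdev(theta)
        perturbed[layer] = [t + eta * rng.gauss(0.0, sigma) for t in theta]
    return perturbed

# Hypothetical parameters of a small encoder.
params = {"W": [0.5, -0.5, 0.25, 0.75], "a": [0.1, -0.2, 0.3, 0.4]}
unchanged = perturb(params, eta=0.0, rng=random.Random(0))
noisy = perturb(params, eta=0.1, rng=random.Random(0))
```

With η = 0 the perturbed parameters equal the originals, and a larger η produces a perturbed encoder that is further from the original one. The original `params` dictionary is never modified.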

InfoNCE is used as the optimization objective to pull the original representation and the perturbed representation of the same node closer together and to push the original representation away from the perturbed representations of other nodes:

Figure BDA0003998902990000067

where N is the number of nodes in a training batch, sim is the cosine similarity,

Figure BDA0003998902990000068
Figure BDA0003998902990000071

and τ is an adjustable parameter.

A perturbed graph self-attention convolutional neural network is obtained by perturbing the parameters of the graph self-attention convolutional neural network. Each node yields an original-view representation under the original network and a perturbed-view representation under the perturbed network. Pulling the original-view representation of a node toward its own perturbed-view representation, and pushing it away from the perturbed-view representations of other nodes, yields a more robust graph self-attention convolutional neural network.
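The contrastive objective can be sketched in the common InfoNCE form, in which node i contributes -log(exp(sim(h_i, h′_i)/τ) / Σ_j exp(sim(h_i, h′_j)/τ)). Whether the patent's denominator includes the positive pair is not visible in this text; the sketch assumes it does. The function names and the value τ = 0.5 are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(h, h_pert, tau=0.5):
    """Mean InfoNCE loss; row i of h and row i of h_pert belong to the same node."""
    n = len(h)
    total = 0.0
    for i in range(n):
        logits = [cosine(h[i], h_pert[j]) / tau for j in range(n)]
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_denominator - logits[i]
    return total / n

# Two nodes with orthogonal original-view representations.
h = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(h, [[1.0, 0.0], [0.0, 1.0]])  # each node matches its own view
swapped = info_nce(h, [[0.0, 1.0], [1.0, 0.0]])  # views assigned to the wrong node
```

The loss is lower when each node's perturbed view is the one most similar to its original view, which is the behaviour the training step rewards.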

S5. Repeat steps S2 to S5 until the value of the objective function converges or the preset number of training iterations is reached.

In a specific embodiment, the entity pair set S is divided into the training set and the validation set in a ratio of 4:1.

In one embodiment of the invention, the unaligned nodes of whichever of the two knowledge graphs has fewer nodes are used as the test set.

The knowledge graph with more nodes has a large number of leftover unaligned nodes that have no true counterpart in the knowledge graph with fewer nodes, so the unaligned nodes of the smaller knowledge graph are used as the test set.
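The data split described above can be sketched as follows. The shuffling, the fixed random seed and the function names are assumptions made for this example; the source specifies only the 4:1 ratio and the choice of the smaller graph's unaligned nodes as the test set.

```python
import random

def split_seeds(pairs, ratio=(4, 1), seed=0):
    """Shuffle the seed pairs and split them into training and validation sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * ratio[0] // sum(ratio)
    return shuffled[:cut], shuffled[cut:]

def unaligned_nodes_of_smaller_graph(nodes_1, nodes_2, pairs):
    """Return the unaligned nodes of whichever graph has fewer nodes."""
    if len(nodes_1) <= len(nodes_2):
        return sorted(set(nodes_1) - {x for x, _ in pairs})
    return sorted(set(nodes_2) - {y for _, y in pairs})

# Ten hypothetical seed pairs; G_1 has 12 nodes and G_2 has 20.
pairs = [(f"a{i}", f"b{i}") for i in range(10)]
nodes_1 = [f"a{i}" for i in range(12)]
nodes_2 = [f"b{i}" for i in range(20)]
train, val = split_seeds(pairs)
test_set = unaligned_nodes_of_smaller_graph(nodes_1, nodes_2, pairs)
```

With ten seed pairs the split yields eight training pairs and two validation pairs, and the test set consists of the two nodes of the smaller graph that have no seed partner.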

In one embodiment of the invention, obtaining the embedding vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of the nodes between the knowledge graphs and taking the two most similar nodes as aligned nodes specifically comprises:

computing the embedding vector representation of each node of the knowledge graphs with the graph self-attention convolutional neural network;

calculating the similarity between the node representations of the two knowledge graphs using the Euclidean norm:

Figure BDA0003998902990000072

taking the two most similar nodes as aligned nodes.
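Alignment by Euclidean norm can be sketched as a nearest-neighbor search, under the assumption that a smaller distance ‖h_x - h_y‖ means a higher similarity. The exhaustive search and the example embeddings are illustrative only; a real system with many nodes would likely need an indexed search.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def align(emb1, emb2):
    """For each node of the first graph, return its closest node in the second graph."""
    result = {}
    for x, hx in emb1.items():
        y = min(emb2, key=lambda cand: euclidean(hx, emb2[cand]))
        result[x] = (y, euclidean(hx, emb2[y]))
    return result

# Hypothetical embeddings produced by the trained network.
emb1 = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
emb2 = {"p": [0.9, 1.1], "q": [0.1, -0.1]}
matches = align(emb1, emb2)
```

In this toy case node `a` is matched with `q` and node `b` with `p`, since those are the pairs with the smallest distance.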

In one embodiment of the invention, the method also comprises computing performance statistics on the labeled training-set and validation-set nodes and manually checking the unlabeled test-set nodes to judge whether the alignment is valid, specifically:

computing Hits@1 and Hits@10 for the labeled training-set and validation-set nodes:

Figure BDA0003998902990000073

where S is the set of aligned node pairs, |S| is the number of aligned node pairs, rank_i is the link prediction rank of the i-th aligned node pair, and I is the indicator function, with I = 1 if its argument is true and I = 0 otherwise;

manually checking the unlabeled test-set nodes to judge whether the alignment is valid.
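From the symbol definitions above, Hits@k is conventionally the fraction of labeled pairs whose true counterpart is ranked within the top k, that is (1/|S|) Σ_i I(rank_i ≤ k). The sketch below assumes that reading, since the formula image itself is not reproduced; the ranks are made-up example values.

```python
def hits_at_k(ranks, k):
    """Fraction of aligned pairs whose true counterpart is ranked k or better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the true counterpart for five labeled pairs.
ranks = [1, 3, 12, 1, 7]
hits1 = hits_at_k(ranks, 1)
hits10 = hits_at_k(ranks, 10)
```

For these ranks two of the five pairs are ranked first and four of the five fall within the top ten.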

Referring to Table 1, the method of the invention is compared with an entity alignment technique based on a relation-aware dual-graph convolutional network (RDGCN). The comparison shows that the method, which combines a graph self-attention convolutional neural network, supervised learning and perturbed-view contrastive learning, outperforms the prior art and aligns the nodes of the two knowledge graphs more accurately.

Table 1

Technique                        Hit@1   Hit@10
RDGCN                            0.45    0.65
Method of the present invention  0.711   0.872

The above are merely embodiments of the invention and do not limit the scope of the patent. Any equivalent structure made using the contents of the specification and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the invention.

Claims (9)

1. A data alignment method for a power service system based on a graph convolution neural network is characterized by comprising the following steps:
acquiring equipment ledger information data of two power business systems to be aligned, and cleaning and preprocessing the equipment ledger information data;
respectively obtaining entities in each electric power service system and the relation between the entities, constructing a knowledge graph according to the entities and the relation between the entities, and obtaining a pre-aligned entity pair between the two knowledge graphs, wherein the entities are nodes of the knowledge graph, and the relation between the entities is an edge of the knowledge graph;
training the knowledge graph input graph self-attention convolution neural network, and taking the entity pair as an alignment seed as supervision information during the graph self-attention convolution neural network training;
obtaining embedded vector representation of each node through the graph self-attention convolution neural network, calculating the similarity of each node between the knowledge graphs, and taking two most similar nodes as alignment nodes;
and rewriting the attribute data of the power service system entity to be aligned according to the alignment node.
2. The method according to claim 1, wherein cleaning and preprocessing the equipment ledger information data specifically comprises removing damaged data from the equipment ledger information data and converting the remaining data into CSV format.
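The cleaning step of claim 2 can be illustrated with a minimal sketch. The claim does not define "damaged data", so the sketch assumes a record is damaged when a required field is missing or blank; the function names `clean_ledger_rows` and `rows_to_csv` and the field names in the usage note are hypothetical.

```python
import csv
import io


def clean_ledger_rows(rows, required_fields):
    """Keep only rows in which every required field is present and non-blank.

    Assumption: a "damaged" record is one with a missing or empty required field.
    """
    cleaned = []
    for row in rows:
        values = [row.get(field) for field in required_fields]
        if all(v is not None and str(v).strip() for v in values):
            # Normalise surviving values by stripping surrounding whitespace.
            cleaned.append({f: str(row[f]).strip() for f in required_fields})
    return cleaned


def rows_to_csv(rows, fieldnames):
    """Serialise the cleaned rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, calling `clean_ledger_rows(records, ["device_id", "name"])` and then `rows_to_csv(cleaned, ["device_id", "name"])` would yield CSV text containing only the complete records.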
3. The method according to claim 1, wherein constructing a knowledge graph according to the entities and the relations between the entities, and obtaining pre-aligned entity pairs between the two knowledge graphs, specifically comprises:
extracting the nodes of the knowledge graph, specifically, taking the attributes used to distinguish entities in the equipment ledger information data as the nodes of the knowledge graph, and taking the text information fields of the entities as the text features of the nodes;
extracting semantic information from the text features through a language model to obtain embedded vector representations of the nodes;
constructing the edges between nodes according to the relations between the entities;
obtaining two knowledge graphs $G_k$ to be aligned:
$G_k = \{E_k, R_k, T_k\}$;
wherein $k = 1$ or $2$; $E_k$, $R_k$ and $T_k$ are respectively the set of nodes, the set of entity relations and the set of triples $\langle e_1, r, e_2 \rangle$ of the knowledge graph, with $e_1, e_2 \in E_k$ and $r \in R_k$, where $r$ is the relation between entity $e_1$ and entity $e_2$;
and pre-aligning the entities according to an entity attribute whose values are unique to obtain entity pairs, the entity pair set $S$ being as follows:
Figure FDA0003998902980000011
wherein $x$ and $y$ are entity nodes of the knowledge graphs $G_1$ and $G_2$ respectively.
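The pre-alignment step of claim 3 can be sketched as a join on a unique-valued attribute. This is only an illustration under assumptions: entities are represented as dictionaries with an `"id"` field, the matching attribute name is passed in as `key`, and a value counts as unique when it occurs exactly once within each graph. None of these names come from the patent.

```python
from collections import Counter


def pre_align(entities_1, entities_2, key):
    """Return the seed set S of (x, y) id pairs that share a unique attribute value."""

    def unique_index(entities):
        # Count attribute values so that ambiguous (repeated) values can be skipped.
        counts = Counter(e[key] for e in entities if e.get(key))
        return {
            e[key]: e["id"]
            for e in entities
            if e.get(key) and counts[e[key]] == 1
        }

    index_1 = unique_index(entities_1)
    index_2 = unique_index(entities_2)
    # A pair is formed only when the same value is unique in both graphs.
    return {(index_1[v], index_2[v]) for v in index_1.keys() & index_2.keys()}
```

Skipping repeated values keeps the seed set conservative, which matters because the seeds later act as supervision labels.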
4. The graph convolution neural network-based power service system data alignment method of claim 3, wherein the language model is a LaBSE model.
5. The method according to claim 4, wherein inputting the knowledge graphs into the graph self-attention convolutional neural network for training, with the entity pairs used as alignment seeds that serve as supervision information during training, specifically comprises:
S1, dividing the entity pair set $S$ into a training set and a validation set according to a preset ratio, the test set consisting of unaligned nodes;
aggregating the neighbor information of each node in the knowledge graph using a single-layer graph attention convolutional neural network, specifically:
S2, randomly sampling 20 neighbor nodes of a target node as a neighbor set $N_i$, the neighbor nodes including the target node itself, and using the embedded vector representation of each neighbor node in the neighbor set $N_i$
Figure FDA0003998902980000021
to update the embedded vector representation of the target node
Figure FDA0003998902980000022
The aggregation formula is as follows:
Figure FDA0003998902980000023
Figure FDA0003998902980000024
wherein $W$ is a trainable weight matrix that maps the embedded vector representation of a node to high-level features, and $\sigma$ is the nonlinear activation function Sigmoid, calculated as follows:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
$a_{ij}$ is the importance of node $j$ to node $i$, calculated as follows:
Figure FDA0003998902980000026
where $a$ is a linear layer that converts a vector into a scalar value, and LeakyReLU is a nonlinear activation function, expressed as follows:
$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0 \\ p\,x, & x \le 0 \end{cases}$$
wherein $p$ is an adjustable coefficient;
S3, training the graph self-attention convolutional neural network based on the entity pairs, updating the network parameters by gradient descent, and adopting Bayesian personalized ranking as the objective function of supervised learning, expressed as follows:
Figure FDA0003998902980000028
wherein $(x, y, y^-)$ is a training triplet constructed for a node $x$ of one knowledge graph, $y$ is the node in the other knowledge graph that is pre-aligned with node $x$, and $y^-$ is any randomly sampled node other than $x$ and $y$;
S4, carrying out unsupervised training based on a graph-structure multi-view augmentation method, specifically:
perturbing the encoder network parameters $\theta$ to obtain perturbed network parameters $\theta'$, and thereby obtaining the node representations $h$ and $h'$ of the same node $x$ of the knowledge graph under the original network and the perturbed network, expressed as follows:
$h = f(x; \theta)$, $h' = f(x; \theta')$;
the encoder is perturbed as follows:
$\theta_l' = \theta_l + \eta \cdot \Delta\theta_l$
Figure FDA0003998902980000031
wherein $\theta_l$ and $\theta_l'$ are respectively the parameters of the $l$-th layer of the graph self-attention convolutional neural network and of the perturbed graph self-attention convolutional neural network, $\eta$ is an adjustable perturbation-strength hyperparameter, and $\Delta\theta_l$ is a perturbation term with zero mean and variance
Figure FDA0003998902980000032
sampled from the Gaussian distribution
Figure FDA0003998902980000033
using InfoNCE as the objective function, pulling the original representation $h_n$ of a node $n$ close to its own perturbed representation $h'_n$ and pushing it away from the perturbed representations of the other nodes, expressed as follows:
Figure FDA0003998902980000034
wherein $N$ is the number of nodes in a training batch, $n'$ is a node other than node $n$, and sim is the cosine similarity,
$$\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}$$
$\tau$ is an adjustable parameter;
and S5, repeating steps S2 to S4 until the value of the objective function converges or a preset number of training iterations is reached.
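Steps S2 to S4 can be illustrated with a small pure-Python sketch. The exact formulas of the claim are only available as figure images, so every form below is an assumption based on the common textbook versions: a GAT-style attention score over the concatenation of the transformed target and neighbor vectors with softmax normalisation, a dot-product score inside the Bayesian personalized ranking loss, a user-supplied standard deviation `sigma` for the Gaussian perturbation, and an InfoNCE denominator that sums over all perturbed representations in the batch. All function names are hypothetical.

```python
import math


def sigmoid(x):
    # Numerically safe Sigmoid for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def leaky_relu(x, p=0.2):
    return x if x > 0 else p * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(W, h):
    return [dot(row, h) for row in W]


def aggregate(h_i, neighbors, W, a, p=0.2):
    """S2 (assumed form): attention-weighted sum of W*h_j, then Sigmoid."""
    z_i = matvec(W, h_i)
    z_js = [matvec(W, h_j) for h_j in neighbors]  # neighbors include the target
    # Linear layer `a` maps the concatenated vector [z_i ; z_j] to a scalar.
    scores = [leaky_relu(dot(a, z_i + z_j), p) for z_j in z_js]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # softmax over the neighbor set
    total = sum(exps)
    alphas = [e / total for e in exps]
    return [
        sigmoid(sum(alpha * z_j[d] for alpha, z_j in zip(alphas, z_js)))
        for d in range(len(z_i))
    ]


def bpr_loss(h_x, h_y, h_neg):
    """S3 (assumed form): -log Sigmoid(score(x, y) - score(x, y^-))."""
    return -math.log(sigmoid(dot(h_x, h_y) - dot(h_x, h_neg)))


def perturb(theta, eta, sigma, rng):
    """S4: theta' = theta + eta * delta, with delta ~ N(0, sigma^2) per parameter."""
    return [t + eta * rng.gauss(0.0, sigma) for t in theta]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def info_nce(H, H_pert, tau=0.5):
    """S4 (assumed form): mean over n of -log softmax_n(sim(h_n, h'_.) / tau)."""
    losses = []
    for n, h in enumerate(H):
        logits = [cosine(h, hp) / tau for hp in H_pert]
        log_denominator = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_denominator - logits[n])
    return sum(losses) / len(H)
```

In a real training loop these losses would be computed with an automatic-differentiation framework so that gradient descent can update `W` and `a`; the sketch only shows how each quantity is formed.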
6. The method for data alignment of a power service system based on a graph convolutional neural network as claimed in claim 5, wherein the entity pair set $S$ is divided into the training set and the validation set in a ratio of 4.
7. The graph convolutional neural network-based power service system data alignment method of claim 5, wherein the unaligned nodes of whichever of the two knowledge graphs has fewer nodes are used as the test set.
8. The method according to claim 5, wherein obtaining an embedded vector representation of each node through the graph self-attention convolutional neural network, calculating the similarity of nodes between the two knowledge graphs, and taking the two most similar nodes as aligned nodes, specifically comprises:
calculating an embedded vector representation of each node of the knowledge graphs through the graph self-attention convolutional neural network;
calculating the similarity between the node representations of the two knowledge graphs according to the Euclidean norm:
Figure FDA0003998902980000036
and taking the two most similar nodes as the aligned nodes.
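The matching step of claim 8 can be sketched as a nearest-neighbor search. The similarity formula itself is only available as a figure image, so the sketch assumes that similarity is the Euclidean distance between embeddings (smaller means more similar) and that each node of the first graph is matched greedily and independently; `align_nodes` and the dictionary layout are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean norm of the difference between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def align_nodes(emb_1, emb_2):
    """Map each node id of graph 1 to the closest node id of graph 2.

    emb_1 and emb_2 are dicts from node id to embedding vector.
    """
    return {
        x: min(emb_2, key=lambda y: euclidean(h_x, emb_2[y]))
        for x, h_x in emb_1.items()
    }
```

Because the matching is greedy, two nodes of the first graph may select the same node of the second graph; a one-to-one constraint would need an additional assignment step that the claim does not describe.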
9. The method for aligning data of a power service system based on a graph convolutional neural network of claim 5, further comprising computing performance statistics on the labeled training-set and validation-set nodes, and manually inspecting the unlabeled test-set nodes to determine whether their alignments are valid, specifically:
calculating Hits@1 and Hits@10 for the labeled training-set and validation-set nodes:
Figure FDA0003998902980000037
wherein $S$ is the set of aligned node pairs, $|S|$ is the number of aligned node pairs, $rank_i$ is the link-prediction rank of the $i$-th aligned node pair, and $I$ is an indicator function, with $I = 1$ if its condition is true and $I = 0$ otherwise;
and manually checking the unlabeled test-set nodes to determine whether their alignments are valid.
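The Hits@k statistic of claim 9 can be sketched as follows. The claim's formula is only available as a figure image, so the sketch assumes the usual definition, Hits@k = (1/|S|) Σ I(rank_i ≤ k), and assumes that the rank of a true match is one plus the number of candidates that are strictly closer; both function names are hypothetical.

```python
def rank_of_true_match(distances, true_index):
    """Rank (1 = best) of the true counterpart among all candidate distances."""
    true_distance = distances[true_index]
    return 1 + sum(1 for d in distances if d < true_distance)


def hits_at_k(ranks, k):
    """Fraction of aligned pairs whose true counterpart is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

With ranks `[1, 3, 12, 1]`, for instance, `hits_at_k(ranks, 1)` is 0.5 and `hits_at_k(ranks, 10)` is 0.75.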
CN202211606845.6A | 2022-12-14 | 2022-12-14 | A data alignment method for power business system based on graph convolutional neural network | Active | CN115935941B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211606845.6A, CN115935941B (en) | 2022-12-14 | 2022-12-14 | A data alignment method for power business system based on graph convolutional neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211606845.6A, CN115935941B (en) | 2022-12-14 | 2022-12-14 | A data alignment method for power business system based on graph convolutional neural network

Publications (2)

Publication Number | Publication Date
CN115935941A (en) | 2023-04-07
CN115935941B (en) | 2025-07-04

Family

ID=86553592

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202211606845.6A (Active), CN115935941B (en) | A data alignment method for power business system based on graph convolutional neural network | 2022-12-14 | 2022-12-14

Country Status (1)

Country | Link
CN (1) | CN115935941B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117194728A (en) * | 2023-09-07 | 2023-12-08 | 广东电网有限责任公司 | A business data degree distribution analysis method and device based on graph theory
CN119962436A (en) * | 2025-01-14 | 2025-05-09 | 山东大学 | Grouting simulation and pre-control decision-making method and system based on cross-scale data combination

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109902171A (en) * | 2019-01-30 | 2019-06-18 | 中国地质大学(武汉) | Method and system for text relation extraction based on hierarchical knowledge graph attention model
CN111931505A (en) * | 2020-05-22 | 2020-11-13 | 北京理工大学 | Cross-language entity alignment method based on subgraph embedding
CN113761221A (en) * | 2021-06-30 | 2021-12-07 | 中国人民解放军32801部队 | Knowledge graph entity alignment method based on graph neural network
CN113807520A (en) * | 2021-11-16 | 2021-12-17 | 北京道达天际科技有限公司 | Knowledge graph alignment model training method based on graph neural network
WO2022022045A1 (en) * | 2020-07-27 | 2022-02-03 | 平安科技(深圳)有限公司 | Knowledge graph-based text comparison method and apparatus, device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
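孟鹏博 (Meng Pengbo): "基于图神经网络的实体对齐研究综述" [A survey of entity alignment research based on graph neural networks], 现代计算机 (Modern Computer), no. 09, 25 March 2020 (2020-03-25) *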

Also Published As

Publication number | Publication date
CN115935941B (en) | 2025-07-04

Similar Documents

Publication | Publication Date | Title
CN112966114B (en) | | Literature classification method and device based on symmetrical graph convolutional neural network
CN110347932B (en) | | Cross-network user alignment method based on deep learning
CN111079847B (en) | | Remote sensing image automatic labeling method based on deep learning
CN110909926A (en) | | TCN-LSTM-based solar photovoltaic power generation prediction method
CN110210486A (en) | | A kind of generation confrontation transfer learning method based on sketch markup information
CN112862093B (en) | | Graphic neural network training method and device
CN110209859A (en) | | The method and apparatus and electronic equipment of place identification and its model training
CN109767312A (en) | | A credit evaluation model training and evaluation method and device
CN115935941A (en) | | Electric power service system data alignment method based on graph convolution neural network
CN113139586B (en) | | Model training method, device abnormality diagnosis method, electronic device, and medium
CN114677535A (en) | | Training method, image classification method and device for domain adaptive image classification network
CN111046961A (en) | | Fault classification method based on bidirectional long-and-short-term memory unit and capsule network
CN106569954A (en) | | Method based on KL divergence for predicting multi-source software defects
CN113869333B (en) | | Image identification method and device based on semi-supervised relationship measurement network
CN114116692B (en) | | Mask and bidirectional model-based missing POI track completion method
CN111768792A (en) | | Audio Steganalysis Method Based on Convolutional Neural Network and Domain Adversarial Learning
CN111488498A (en) | | "Node-Graph" Cross-layer Graph Matching Method and System Based on Graph Neural Network
CN113655341B (en) | | Fault positioning method and system for power distribution network
WO2025087218A1 (en) | | Method and system for detecting industrial internet abnormal node, medium, and device
CN114139593A (en) | | Training method and device for Deviational graph neural network and electronic equipment
CN117197451A (en) | | Remote sensing image semantic segmentation method and device based on domain self-adaption
CN111461229B (en) | | Deep neural network optimization and image classification method based on target transfer and line search
CN117079017A (en) | | Credible small sample image identification and classification method
CN114580388A (en) | | Data processing method, object prediction method, related device and storage medium
CN112307914A (en) | | A method for open domain image content recognition based on text information guidance

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
