Technical Field

The present invention relates to a cross-domain few-shot relation extraction method and device that learn discriminative semantics and multi-view context, and belongs to the field of natural language processing in artificial intelligence.
Background Art

In recent years, with the spread of the Internet and the rapid development of information technology, a huge amount of unstructured text data has emerged online: large volumes of digital text are generated in the form of news articles, research publications, blogs, question-and-answer forums, and social media. Text is a universal carrier of knowledge, and rich knowledge is embedded in these massive collections. Faced with so much data, extracting the information we want efficiently and accurately becomes meaningful. Relation extraction aims to identify the relation between two specified entities in a text; it helps us discover connections between things and concepts, thereby rapidly growing human knowledge.

Traditional relation extraction is mainly data-driven: a relation extraction model is trained on a large amount of labeled data to learn the meaning of various relations, and the trained model is then used to extract relations from text. However, data-driven models rely heavily on labeled data, and because annotation is time-consuming and laborious, there is often not enough of it to train a model. As a result, traditional models cannot learn enough knowledge to complete the relation extraction task. It therefore becomes meaningful to learn knowledge from the relation categories of existing labeled data and generalize it to new categories so that relations of the new categories can be extracted; few-shot relation extraction has accordingly become increasingly important. Moreover, in certain important specialized domains (medicine, finance, etc.), the labeled data that can be obtained is even more limited because of the expertise required. This calls for learning knowledge in public domains where labeled data is easy to obtain and generalizing it to new categories in a specific specialized domain. Different specialized domains, however, differ in their linguistic features, so a model trained on the source domain cannot effectively understand the relations in the target domain, degrading extraction performance. How to extract relations in a target domain by training on relation instances from a source domain has thus become an important research direction.

In recent years, with the development of deep learning, various cross-domain few-shot relation extraction methods built on the meta-learning framework have achieved certain breakthroughs. Meta-learning lets a model learn cross-task meta-knowledge from many relation extraction tasks sampled from a large-scale dataset (the source domain) and apply it in the target domain. The prototypical network is an effective meta-learning algorithm: it takes the class center of same-class samples as the prototype of a relation and assigns each sample to be predicted to the most similar prototype. To improve the quality of relation prototypes, some works learn a large amount of prior knowledge through pre-training, which helps in understanding the meaning of relations. However, such prior knowledge is not directly tied to the relations themselves and provides only limited help for relation extraction. Some works therefore directly use relation descriptions to learn high-quality prototypes.

Most of these works train the model to distinguish relations with an entity-centric matching paradigm, which focuses on learning entities rather than context. Entities in specialized domains, however, often share similar semantic backgrounds. In such a scenario, a model that downplays the relational knowledge in the context has difficulty distinguishing different relations whose entity semantics are similar. Existing methods therefore still need improvement to better separate such relations.
Summary of the Invention

Existing domain-adaptive few-shot relation extraction methods train the model to distinguish relations with an entity-centric matching paradigm, which focuses on learning entities rather than context. Because the model's learning centers on entity semantics, it determines the relation by understanding the head and tail entity classes. When it encounters different relations with similar entity semantics, its ability to tell them apart is limited; since the paradigm itself also weakens context learning, such methods ultimately struggle to distinguish different relations whose entity semantics are similar.

To solve the above problems, the present invention proposes a cross-domain few-shot relation extraction method that learns discriminative semantics and multi-view context. The method first reduces the interference caused by entities with similar semantics by learning discriminative entity semantics. In addition, to strengthen the model's ability to learn relational information from the context, it performs multi-view context learning: it explicitly extracts local and global information from the context to build a multi-view context representation that serves as the relation information. This makes effective use of contextual information to distinguish different relations with similar entity semantics. The method also extracts more comprehensive global information through an information filtering mechanism so as to learn the relational information in the context effectively.

The proposed method fully mines the relational information in the context by amplifying the semantic differences between different entities and by grounding the mining in entity information. It uses the BERT pre-trained language model as the feature extractor. To accurately extract the entity information in a sentence, it uses virtual markers to emphasize the entities and takes the hidden states corresponding to the markers as the entity information; the virtual markers can be BERT's reserved undefined tokens. To learn discriminative entity semantics, the method uses a semantic prompt template to elicit each entity's semantics, guiding the model to fully exploit its own pre-trained knowledge when representing entities and thus to capture entity semantics more accurately and comprehensively. After obtaining the entity semantics, the invention amplifies the semantic differences between different entities through semantic contrastive learning, giving the model the ability to learn discriminative entity semantics and thus to separate different relations with similar entity semantics.

In addition, based on the entity information, the method mines global and local information in the context that helps in understanding the relation. Local information is obtained by further mining the deep associations between the entity representations; global information is mined from the context using the head entity and the tail entity respectively. Global and local information reinforce each other, so the relational information in the context can be further exploited to distinguish relation categories. The entity information guides the mining of relational information in the context, preventing the extraction of relation-irrelevant content. Meanwhile, to avoid over-focusing on entity information during this guidance and to obtain more comprehensive global information, the method filters part of the entity information out of the global information through an information filtering mechanism. Specifically, the mechanism computes the similarity between the contextual relation information obtained from the head entity and that obtained from the tail entity and uses it as a weight: a large weight indicates that the extracted relation information contains much entity information, which should then be filtered out in a correspondingly large proportion so that the extracted relation information can play its role more effectively. Finally, a standard training procedure yields a relation classification loss that trains the model to extract relations of various categories.
The technical solution adopted by the present invention is as follows:

A cross-domain few-shot relation extraction method for learning discriminative semantics and multi-view context, comprising the following steps:

performing data preprocessing, including concatenating a semantic prompt template to the end of each sentence in the dataset;

constructing a cross-domain few-shot relation extraction model comprising a feature extraction network, a semantic contrastive learning network, a multi-view context learning network, and a relation classification network, the multi-view context learning network containing an information filtering mechanism;

training the cross-domain few-shot relation extraction model on the training set of the dataset with a semantic contrastive learning loss function and a relation classification loss function, and obtaining the optimal model on the validation set;

using the optimal model to extract the relations in sentences of the target domain.
Further, the data preprocessing also includes: using virtual markers to emphasize the role of entities in relation extraction, adding them immediately before and after the head entity and the tail entity respectively; the virtual markers use BERT's reserved undefined tokens.

Further, the processing of the semantic contrastive learning network includes: first extracting the semantics of each entity with the semantic prompt template, then amplifying the semantic differences between different entities through semantic contrastive learning, so that the model can learn discriminative entity semantics and effectively distinguish different relations with similar entity semantics.

Further, the multi-view context learning network contains two convolutional neural networks serving as feature extractors for global information and local information respectively. One network receives the entity information and the whole-sentence information and, guided by the entity information, extracts from the sentence the contextual information that helps in understanding the relation as the global information; an information filtering mechanism then filters part of the entity information out of the global information, with the amount filtered determined by the similarity weight between the relation information extracted from the head entity and from the tail entity (the higher the weight, the more entity information is filtered out). The other network receives only the head-entity and tail-entity information and obtains the latent connections between the entities through deep feature extraction as the local information. The global and local information are then fused into the final contextual relation information, i.e., the multi-view contextual relation information.

Further, the relation classification network takes the class center of the relation representations of each relation category as that category's relation prototype and assigns each relation to be predicted to the prototype most similar to it, measuring similarity with a parameter-free metric function.
Further, the semantic contrastive learning loss function gives the model the ability to learn discriminative entity representations: through contrastive learning it pulls each entity and its corresponding semantics closer in the representation space and pushes the semantics of different entities apart. The semantic contrastive learning loss function is expressed as follows:
$$\mathcal{L}_E=-\frac{1}{2K}\sum_{k=1}^{K}\sum_{i=1}^{N}\left[\log\frac{\exp\big(d(h_i^k,HS_i^k)\big)}{\sum_{j=1}^{N}\exp\big(d(h_i^k,HS_j^k)\big)}+\log\frac{\exp\big(d(t_i^k,TS_i^k)\big)}{\sum_{j=1}^{N}\exp\big(d(t_i^k,TS_j^k)\big)}\right]$$

where $h_i^k$ and $t_i^k$ denote the head-entity and tail-entity information of the k-th sample of the i-th relation category, $HS_i^k$ and $TS_i^k$ are the corresponding entity semantic information, $d$ is the normalized cosine similarity metric, and $\mathcal{L}_E$ takes the mean of the contrastive losses obtained from the K groups of data as the final semantic contrastive loss.
Further, the relation classification loss function gives the model the ability to extract relations of different categories: in a task formed from one sampled group of data, the samples in the query set are classified into one of the category prototypes in the support set, and the model is trained on many such repeatedly sampled tasks. The relation classification loss function is expressed as follows:
$$\mathcal{L}_{FS}=-\frac{1}{G}\sum_{j=1}^{G}\log\frac{\exp\big(d(q_j,p_q)/\tau\big)}{\sum_{i=1}^{N}\exp\big(d(q_j,p_i)/\tau\big)}$$

where $q_j$ is the j-th sample in the query set, $p_q$ is the relation prototype of the category to which that sample belongs (the denominator running over the N category prototypes $p_i$ in the support set), $\tau$ is a temperature parameter controlling the smoothness of the similarities, $d$ is the normalized cosine similarity metric, and $G$ is the total number of samples to be classified.
Further, the weights in the information filtering mechanism are computed as follows:
$$w=d\big(GI_h,\,GI_t\big),\qquad HGI=GI_h-w\cdot h,\qquad TGI=GI_t-w\cdot t,\qquad GI=HGI+TGI$$

where $GI_h$ and $GI_t$ denote the contextual relation information extracted by the convolutional neural network from the head entity and from the tail entity respectively, $h$ and $t$ are the head-entity and tail-entity information, $w$ is the similarity weight, and $HGI$ and $TGI$ are the relation information after filtering; combining the two yields the final global information $GI$.
A cross-domain few-shot relation extraction device for learning discriminative semantics and multi-view context, comprising:

a data preprocessing module, configured to perform data preprocessing, including concatenating a semantic prompt template to the end of each sentence in the dataset;

a model construction module, configured to construct a cross-domain few-shot relation extraction model comprising a feature extraction network, a semantic contrastive learning network, a multi-view context learning network, and a relation classification network, the multi-view context learning network containing an information filtering mechanism;

a model training module, configured to train the cross-domain few-shot relation extraction model on the training set of the dataset with a semantic contrastive learning loss function and a relation classification loss function, and to obtain the optimal model on the validation set;

a relation extraction module, configured to use the optimal model to extract the relations in sentences of the target domain.

The key points of the present invention include:

1. The method proposes discriminative entity semantic learning and multi-view context learning, integrating the learning of entity information and contextual information in one model. This effectively prevents the model from confusing different relations with similar entity semantics and lets it generalize more effectively from general domains to specialized ones.

2. The cross-domain few-shot relation extraction framework proposed here contains two modules. The discriminative semantic contrastive learning module amplifies the semantic differences between different entities, enabling the model to perceive subtle differences between the semantics of similar entities and thus to learn discriminative entity semantics. The multi-view context learning module mines the latent deep associations between entities on the basis of entity information and, combined with whole-sentence information, mines the contextual information that helps in understanding the relation. This further exploits context to help the model distinguish different relations with similar entity semantics. The information filtering mechanism in this module also keeps the model from over-attending to entity information when extracting contextual relation information, yielding more comprehensive contextual information.

3. The method proposes two interacting, mutually influencing loss functions: the semantic contrastive learning loss and the relation classification loss.
Compared with the prior art, the present invention has the following positive effects:

1. Addressing the fact that, in current cross-domain few-shot relation extraction methods, the high semantic similarity of entities in specialized domains leads to entity semantics that are not discriminative, the present invention proposes a discriminative entity semantic learning module that uses semantic contrastive learning to give the model the ability to learn discriminative entity representations, thereby improving its ability to recognize different relations with similar entity semantics.

2. Addressing the problem that the entity-centric way current methods learn the relation information in a sentence cannot fully learn the relational information in the context, the present invention proposes a multi-view context learning module: local relation information is extracted from the head and tail entities, and global relation information is extracted from the sentence information guided by the entities; the two reinforce each other and combine into multi-view contextual information, improving the model's ability to distinguish relations using context.

3. The present invention provides a weight-adaptive information filtering mechanism that filters out part of the entity information according to the similarity of the extracted contextual information, preventing excessive attention to entities while learning the context and thus obtaining more comprehensive contextual knowledge.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of the method of the present invention;

FIG. 2 is a schematic diagram of the framework proposed by the method of the present invention.
Detailed Description of Embodiments

To better present the cross-domain few-shot relation extraction method for learning discriminative semantics and multi-view context proposed in the present invention, the invention is further described below with reference to the drawings and specific embodiments.

FIG. 1 is the overall flowchart of a cross-domain few-shot relation extraction method for learning discriminative semantics and multi-view context according to the present invention, comprising four parts: data preprocessing, model framework initialization, model training, and relation extraction.
Step 1. Data preprocessing. A manually designed entity semantic prompt template is concatenated to the end of each sentence in the dataset.

Step 2. Model framework initialization. FIG. 2 shows the model framework designed in the present invention, which contains a semantic contrastive learning module, a multi-view context learning module, an information filtering mechanism, and a relation classification module.

Step 3. Model training. The present invention trains the model with the semantic contrastive loss and the relation classification loss. When the total loss function converges and the model achieves the best result on the validation set, its parameters are saved as the optimal model. During validation and testing, the method feeds the data only into the multi-view context module to extract multi-view contextual information.

Step 4. Relation extraction. Using the optimal model obtained in Step 3, the test set data are fed into the model. The relation in each sentence is obtained by combining the multi-view contextual information with the entity information. The similarity between the relation to be classified and each given relation category is then computed with the metric function, and the relation is assigned to the most similar category.

According to the solution provided by the present invention, the specific steps of a cross-domain few-shot relation extraction method for learning discriminative semantics and multi-view context in one embodiment of the present invention are as follows:
Step 1. Data preprocessing. Since the method needs a semantic prompt template to guide the model in giving each entity's semantics, the present invention designs such a template and appends it to every data item. In addition, the invention uses BERT's internally undefined tokens as virtual markers that emphasize the role of entities in relation extraction, adding them immediately before and after the head entity and the tail entity respectively.

Step 2. Model framework initialization. The method uses metric learning, one of the most popular meta-learning-based frameworks, as the basic model framework, and the BERT pre-trained language model as the feature encoder. The model's input is divided into two parts, a support set and a query set, and the model's ultimate goal is to classify each query-set sample into one of the relation categories in the support set.
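As a minimal sketch of how such an N-way K-shot task might be assembled (the sampling scheme is standard meta-learning practice; the function name and data layout below are illustrative assumptions, not the invention's actual data loader):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, q_query=1):
    """Sample one N-way K-shot task: a support set and a query set.

    `dataset` is assumed to map each relation label to a list of already
    preprocessed sentences (virtual markers and prompt template added).
    """
    relations = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for label, rel in enumerate(relations):
        picks = random.sample(dataset[rel], k_shot + q_query)
        support += [(s, label) for s in picks[:k_shot]]
        query += [(s, label) for s in picks[k_shot:]]
    return support, query
```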
The method proceeds as follows:

First, the sampled support-set and query-set data are fed into the feature encoder to obtain feature encodings. Because the virtual entity markers and the semantic prompt template have been added, the resulting encodings are the entity feature encodings and the semantic feature encodings.

Then semantic contrastive learning amplifies the semantic differences between different entities; this is accomplished through the loss function designed in the present invention (the semantic contrastive learning loss). The process runs only on the support set, because the category relationships among query-set data cannot be exploited.

For the multi-view context learning part, two convolutional neural networks are first initialized as the feature extractors for global and local information. Each network has four layers with 256, 128, 64, and 1 convolution kernels respectively, and the kernel size is set to 5. One network receives the entity information and the whole-sentence information and, guided by the entity information, extracts from the sentence the contextual information that helps in understanding the relation as the global information; all data in both the support set and the query set go through this extraction. During global information extraction the entity information may exert a marked influence, so the extracted global information may contain a large amount of entity information, which weakens the role of the relational information in the context. The method therefore filters part of the entity information out of the global information through the information filtering mechanism, which helps obtain more comprehensive global information. The amount filtered is determined by the similarity weight between the relation information extracted from the head entity and from the tail entity: the higher the weight, the more entity information is filtered out. The other network receives only the head-entity and tail-entity information and obtains the latent connections between the entities through deep feature extraction as the local information. The method then fuses the global and local information into the final contextual relation information, i.e., the multi-view contextual relation information, and combines it with the entity information again to obtain the relation representation of the sentence.
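The following PyTorch sketch shows one plausible reading of this module: the relevant vectors are stacked as input channels, and each four-layer CNN (256/128/64/1 kernels of size 5, as stated above) convolves along the hidden dimension; the subtraction-based filtering step follows the weight formula reconstructed later in this description. The channel layout and the exact filtering arithmetic are assumptions, not the patent's verified architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_cnn(in_channels):
    """Four-layer 1-D CNN information extractor: 256/128/64/1 kernels of
    size 5, each convolution followed by ReLU and dropout."""
    chans = [in_channels, 256, 128, 64, 1]
    layers = []
    for c_in, c_out in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv1d(c_in, c_out, kernel_size=5, padding=2),
                   nn.ReLU(), nn.Dropout(0.1)]
    return nn.Sequential(*layers)

class MultiViewContext(nn.Module):
    """Sketch of multi-view context learning: one CNN mines global
    information from the sentence guided by each entity, the other mines
    local information from the entity pair; an information filter then
    removes part of the entity signal from the global view."""

    def __init__(self):
        super().__init__()
        self.global_cnn = make_cnn(2)  # channels: (entity, sentence)
        self.local_cnn = make_cnn(2)   # channels: (head, tail)

    def forward(self, head, tail, sent):
        # head, tail, sent: [batch, hidden] vectors from the BERT encoder
        gi_h = self.global_cnn(torch.stack([head, sent], dim=1)).squeeze(1)
        gi_t = self.global_cnn(torch.stack([tail, sent], dim=1)).squeeze(1)
        li = self.local_cnn(torch.stack([head, tail], dim=1)).squeeze(1)

        # Information filter: the more similar the head- and tail-guided
        # global views, the more entity information they are assumed to
        # carry, and the larger the portion subtracted back out.
        w = F.cosine_similarity(gi_h, gi_t, dim=-1).unsqueeze(-1)
        hgi, tgi = gi_h - w * head, gi_t - w * tail
        gi = hgi + tgi
        return torch.cat([li, gi], dim=-1)  # multi-view context MC
```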
Finally, relation classification is performed: the method takes the class center of the relation representations of each relation category as that category's prototype and assigns each relation to be predicted to the most similar prototype. Similarity is measured with a parameter-free metric function such as cosine similarity. This process is driven by the classification loss function.

Step 3. Model training. The method trains the model by optimizing two loss functions.

The first, the semantic contrastive learning loss, gives the model the ability to learn discriminative entity representations. Specifically, it is obtained from a contrastive learning framework: contrastive learning pulls each entity and its corresponding semantics closer in the representation space and pushes the semantics of different entities apart, amplifying the semantic differences between entities so that discriminative entity representations are learned. Contrastive learning is carried out on grouped data, each group consisting of one data item from every relation category in the support set. The present invention averages the losses obtained over all groups as the final contrastive learning loss. Contrastive learning requires computing the distances, i.e., similarities, in feature space between entities and semantics and between the semantics of different entities; these distances are computed with a parameter-free function such as cosine similarity.

The second, the relation classification loss, gives the model the ability to extract relations of different categories. Specifically, it is obtained from the meta-learning framework: in a task formed from one sampled group of data, the model must classify the query-set samples into one of the category prototypes in the support set, and it is trained on many such repeatedly sampled tasks. The support-set samples must first yield a prototype for each relation category through prototype learning before taking part in classification; a relation prototype can be regarded as a standard for that relation category. Classification likewise uses a parameter-free distance function to compute the similarity between relations, with a temperature parameter controlling the smoothness of the probabilities. The resulting loss function represents the model's classification ability: optimizing it maximizes the probability that each sample to be classified is assigned to the correct relation category and minimizes the probability of assignment to the others.

The method combines the two loss functions into the model's total loss function by assigning them different weights. When the total loss function converges and the model achieves the best result on the validation set, its parameters are saved as the optimal model. During validation and testing, the method feeds the data only into the multi-view context module to extract multi-view contextual information.
Step 4. Relation extraction. Using the optimal model obtained in Step 3, the test set data are fed into the model. The relation in each sentence is obtained by combining the multi-view contextual information with the entity information. The similarity between the relation to be classified and each given relation category is then computed with the metric function, and the relation is assigned to the most similar category.
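A sketch of this nearest-prototype inference step, assuming the relation representations have already been produced by the trained model:

```python
import torch
import torch.nn.functional as F

def classify(query_reprs, support_reprs, support_labels, n_way):
    """Prototypes are class centroids of the support relation
    representations; each query goes to the most similar prototype
    under cosine similarity, a parameter-free metric."""
    prototypes = torch.stack([
        support_reprs[support_labels == c].mean(dim=0) for c in range(n_way)
    ])                                                   # [n_way, dim]
    sims = F.cosine_similarity(query_reprs.unsqueeze(1),
                               prototypes.unsqueeze(0), dim=-1)  # [q, n_way]
    return sims.argmax(dim=-1)
```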
Exemplarily, in Step 1, a sentence with the semantic prompt template added is: "[E1] Newton [\E1] served as the president of [E2] the Royal Society [\E2]. Newton means [M1], the Royal Society means [M2]". The first half is a sentence from the dataset, in which [E1], [\E1], [E2], and [\E2] are undefined virtual markers emphasizing the role of the entities in the relation. The second half is the semantic prompt template, with [M1] and [M2] standing for the semantics of the head entity and the tail entity respectively.
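An illustrative helper that produces exactly such a preprocessed sentence; the marker strings and span convention are assumptions standing in for BERT's reserved unused tokens:

```python
def preprocess(tokens, head_span, tail_span):
    """Wrap the head/tail entities in virtual markers and append the
    semantic prompt template. Spans are (start, end) token indices,
    end-exclusive and assumed non-overlapping."""
    (h0, h1), (t0, t1) = head_span, tail_span
    marked = list(tokens)
    # Insert from the rightmost position first so indices stay valid.
    inserts = [(h0, "[E1]"), (h1, "[\\E1]"), (t0, "[E2]"), (t1, "[\\E2]")]
    for pos, marker in sorted(inserts, key=lambda p: p[0], reverse=True):
        marked.insert(pos, marker)
    head = " ".join(tokens[h0:h1])
    tail = " ".join(tokens[t0:t1])
    return " ".join(marked) + f" {head} means [M1], {tail} means [M2]"
```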
Exemplarily, in Step 2, each convolutional neural network consists of multiple convolutional layers, activation layers, and dropout layers. Each convolutional layer extracts features from its input by convolving it with the kernels, the activation layers use activation functions to increase the model's nonlinear expressive power, and the dropout layers alleviate, to some extent, the overfitting caused by the shortage of samples.
Exemplarily, in Step 3, the semantic contrastive learning loss function is expressed as:
$$\mathcal{L}_E=-\frac{1}{2K}\sum_{k=1}^{K}\sum_{i=1}^{N}\left[\log\frac{\exp\big(d(h_i^k,HS_i^k)\big)}{\sum_{j=1}^{N}\exp\big(d(h_i^k,HS_j^k)\big)}+\log\frac{\exp\big(d(t_i^k,TS_i^k)\big)}{\sum_{j=1}^{N}\exp\big(d(t_i^k,TS_j^k)\big)}\right]$$

where $h_i^k$ and $t_i^k$ denote the head-entity and tail-entity information of the k-th sample of the i-th relation category, $HS_i^k$ and $TS_i^k$ are the corresponding entity semantic information, $d$ is the normalized cosine similarity metric, the contrastive losses obtained from the K groups of data are averaged as the final semantic contrastive loss, and $N$ is the number of relation categories.
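A PyTorch sketch consistent with the reconstruction above, treating the group-wise softmax over cosine similarities as a cross-entropy; the tensor layout and the exact normalization are assumptions:

```python
import torch
import torch.nn.functional as F

def semantic_contrastive_loss(h, t, hs, ts):
    """h, t: [K, N, dim] head/tail entity representations; hs, ts: their
    semantic representations. Each entity is pulled toward its own
    semantics and pushed from the other N-1 classes' semantics."""
    K, N, _ = h.shape
    labels = torch.arange(N, device=h.device)
    total = h.new_zeros(())
    for k in range(K):                       # one group per iteration
        for ent, sem in ((h[k], hs[k]), (t[k], ts[k])):
            sims = F.cosine_similarity(ent.unsqueeze(1),
                                       sem.unsqueeze(0), dim=-1)  # [N, N]
            total = total + F.cross_entropy(sims, labels)
    return total / (2 * K)
```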
Exemplarily, in Step 3, the weights in the information filtering mechanism are computed as follows:
$$w=d\big(GI_h,\,GI_t\big),\qquad HGI=GI_h-w\cdot h,\qquad TGI=GI_t-w\cdot t,\qquad GI=HGI+TGI$$

where $GI_h$ and $GI_t$ denote the contextual relation information extracted by the convolutional neural network from the head entity and from the tail entity respectively, $h$ and $t$ are the head-entity and tail-entity information, $w$ is the similarity weight, and $HGI$ and $TGI$ are the relation information after filtering; combining the two yields the final global information $GI$.
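A tiny numeric check of the filtering rule as reconstructed above, on made-up vectors (purely illustrative; real inputs would be the CNN outputs and encoder states):

```python
import torch
import torch.nn.functional as F

gi_h = torch.tensor([[0.8, 0.6, 0.0]])   # head-guided global info
gi_t = torch.tensor([[0.7, 0.7, 0.1]])   # tail-guided global info
head = torch.tensor([[1.0, 0.0, 0.0]])   # head entity information
tail = torch.tensor([[0.0, 1.0, 0.0]])   # tail entity information

w = F.cosine_similarity(gi_h, gi_t, dim=-1).unsqueeze(-1)  # ~0.98: similar
hgi, tgi = gi_h - w * head, gi_t - w * tail  # large w => strong filtering
gi = hgi + tgi
print(w.item(), gi)
```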
Exemplarily, in Step 3, the final representation of the relation contained in each sentence is as follows:
$$MC=\big[LI;\,GI\big],\qquad R=\big[\mathrm{Head};\,MC;\,\mathrm{Tail}\big]$$

where $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, $R$ is the relational information contained in the sentence, $\mathrm{Head}$ and $\mathrm{Tail}$ are the head entity and the tail entity respectively, and $MC$ is the multi-view contextual information obtained from the local information $LI$ and the global information $GI$.
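Under the concatenation reading assumed above, the fusion reduces to two concatenations:

```python
import torch

def relation_representation(head, tail, li, gi):
    """Combine entity information with the multi-view context; fusion by
    concatenation is an assumption consistent with the formula above."""
    mc = torch.cat([li, gi], dim=-1)            # multi-view context MC
    return torch.cat([head, mc, tail], dim=-1)  # relation representation R
```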
Exemplarily, in Step 3, the relation classification loss function is expressed as follows:
$$\mathcal{L}_{FS}=-\frac{1}{G}\sum_{j=1}^{G}\log\frac{\exp\big(d(q_j,p_q)/\tau\big)}{\sum_{i=1}^{N}\exp\big(d(q_j,p_i)/\tau\big)}$$

where $q_j$ is the j-th sample in the query set, $p_q$ is the relation prototype of the category to which that sample belongs (the denominator running over the N category prototypes $p_i$ in the support set), $\tau$ is a temperature parameter controlling the smoothness of the similarities, $d$ is the normalized cosine similarity metric, and $G$ is the total number of samples to be classified.
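A sketch matching this reconstruction; a temperature-scaled softmax over cosine similarities is equivalent to the distance form up to a constant shift, and the value of `tau` is a placeholder:

```python
import torch
import torch.nn.functional as F

def relation_classification_loss(query, prototypes, target, tau=0.1):
    """query: [G, dim] query representations; prototypes: [N, dim] class
    prototypes; target: [G] indices of the correct prototypes."""
    sims = F.cosine_similarity(query.unsqueeze(1),
                               prototypes.unsqueeze(0), dim=-1)  # [G, N]
    return F.cross_entropy(sims / tau, target)
```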
Exemplarily, in Step 3, the total loss function of the model is expressed as follows:
$$L=\lambda_1 L_E+\lambda_2 L_{FS}$$

where $\lambda_1$ and $\lambda_2$ are the weights of the semantic contrastive learning loss and the relation classification loss respectively; the two weights are set manually according to the model's performance on the validation set.
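Tying the pieces together, one meta-training step might look as follows; the model interface (`out["contrastive"]`, `out["classification"]`) and the weight values are assumptions, and the loss functions are the sketches given earlier:

```python
def training_step(model, optimizer, support, query, lam1=1.0, lam2=1.0):
    """One episode of training with the combined loss L = λ1·LE + λ2·LFS;
    λ1 and λ2 are hand-tuned on the validation set."""
    out = model(support, query)  # assumed to expose inputs for both losses
    loss = (lam1 * semantic_contrastive_loss(*out["contrastive"])
            + lam2 * relation_classification_loss(*out["classification"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```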
Table 1 shows comparative experimental data between the present invention and existing methods.
The above experimental results demonstrate that the present invention (MCDS) outperforms existing methods across a variety of task settings; the advantage is especially pronounced in tasks where each relation category has only one sample.
Another embodiment of the present invention provides a cross-domain few-shot relation extraction device for learning discriminative semantics and multi-view context, comprising:

a data preprocessing module, configured to perform data preprocessing, including concatenating a semantic prompt template to the end of each sentence in the dataset;

a model construction module, configured to construct a cross-domain few-shot relation extraction model comprising a feature extraction network, a semantic contrastive learning network, a multi-view context learning network, and a relation classification network, the multi-view context learning network containing an information filtering mechanism;

a model training module, configured to train the cross-domain few-shot relation extraction model on the training set of the dataset with a semantic contrastive learning loss function and a relation classification loss function, and to obtain the optimal model on the validation set;

a relation extraction module, configured to use the optimal model to extract the relations in sentences of the target domain.

For the specific implementation of each module, see the description of the method of the present invention above.
Another embodiment of the present invention provides a computer device (a computer, a server, a smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method of the present invention.

Another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, a magnetic disk, an optical disc) storing a computer program which, when executed by a computer, implements the steps of the method of the present invention.

Although the specific content, algorithms, and drawings of the present invention are disclosed for illustrative purposes to help in understanding and practicing the invention, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should not be limited to what is disclosed in the preferred embodiments and drawings of this specification; the scope of protection claimed is defined by the claims.