




CN110727745A - Vocabulary relevancy calculation method and device based on a thesaurus - Google Patents

Thesaurus-based vocabulary relevancy calculation method and device

Info

Publication number
CN110727745A
Authority
CN
China
Legal status
Pending
Application number
CN201910335423.1A
Other languages
Chinese (zh)
Inventor
杨雅萍
白燕
乐夏芳
Current Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Original Assignee
Institute of Geographic Sciences and Natural Resources of CAS
Application filed by Institute of Geographic Sciences and Natural Resources of CAS
Priority to CN201910335423.1A
Publication of CN110727745A


Abstract

The embodiments of the invention provide a thesaurus-based method and device for calculating lexical relevance. The method includes: obtaining a first descriptor, a first top term (family head word), a second descriptor, and a second top term from a thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree; calculating, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term; and combining these three relevance values according to a preset calculation method to obtain the relevance between the first descriptor and the second descriptor. This overcomes the defect in the prior art that the relevance of descriptors in different descriptor trees cannot be determined, and improves the accuracy of semantic relevance calculation.

Description

A thesaurus-based method and device for calculating lexical relevance

Technical Field

Embodiments of the present invention relate to the technical field of data processing, and in particular to a thesaurus-based method and device for calculating lexical relevance.

Background

The relevance of geographic terms has important applications in geographic information retrieval, geographic semantic interoperability, and geographic data recommendation. Existing algorithms for geographic term relevance fall mainly into two categories: knowledge-base methods and statistical methods. Knowledge-base methods use a manually built conceptual knowledge base, such as a classification dictionary or an ontology; they treat the knowledge network as a graph whose nodes are concepts, and assume that the semantic relevance between two concepts is inversely proportional to the length of the shortest path between them. Statistical methods compute relevance from the probability that words co-occur in a large document corpus. Because statistical methods do not take into account the semantics of words, phrases, and sentences, their accuracy is limited.
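The knowledge-base idea above (relevance falls as the shortest path between two concepts grows) can be sketched as follows. This is a minimal illustration, not any specific prior-art algorithm: it assumes a small hypothetical undirected concept graph built from example terms that appear later in this description, and it uses 1/(1 + d) as the inverse-path-length score so that a concept compared with itself scores 1 and unreachable pairs score 0.

```python
from collections import deque


def shortest_path_length(graph, start, goal):
    """Breadth-first search on an undirected adjacency dict; None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_relevance(graph, a, b):
    """Assumed scoring: 1 / (1 + d), and 0.0 when no path exists."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1 + d)


# Hypothetical concept graph; each edge is listed in both directions.
graph = {
    "severe weather": ["cold wave"],
    "cold wave": ["severe weather", "cold air mass"],
    "cold air mass": ["cold wave"],
}
print(path_relevance(graph, "severe weather", "cold air mass"))  # path length 2 -> 1/3
```

Other monotonically decreasing functions of d would serve equally well; 1/(1 + d) is chosen here only because it stays finite at d = 0.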

In the ISO international standard, a thesaurus is defined as "a controlled and structured vocabulary in which concepts are represented by terms and the relationships between concepts are made explicit"; a thesaurus is also regarded as a domain-knowledge ontology. A thesaurus consists of many term clusters, also called descriptor trees, each of which expresses one topic. Each descriptor tree has one top term (family head word), and all of its terms are organized by three explicitly declared relationships: equivalence, hierarchical, and associative (related-term) relationships.
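The structure just described (a thesaurus made of descriptor trees, each with one top term and three labelled relationship types) can be modelled with a couple of small classes. This is a sketch under assumptions: the class names, the relation labels, and the assignment of the example terms to trees are hypothetical, and the second tree ("desert", "sand dune") is invented purely to show a cross-tree lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Hypothetical labels for the three explicitly declared relationship types.
EQUIVALENCE, HIERARCHY, RELATED = "equivalence", "hierarchy", "related"


@dataclass
class DescriptorTree:
    """One term cluster: a top term plus labelled edges between terms."""
    top_term: str
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (term_a, relation, term_b)

    def add(self, a: str, relation: str, b: str) -> None:
        if relation not in (EQUIVALENCE, HIERARCHY, RELATED):
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((a, relation, b))

    def terms(self) -> Set[str]:
        found = {self.top_term}
        for a, _, b in self.edges:
            found.update((a, b))
        return found


@dataclass
class Thesaurus:
    trees: List[DescriptorTree] = field(default_factory=list)

    def tree_of(self, term: str) -> Optional[DescriptorTree]:
        """Return the descriptor tree that contains term, or None."""
        for tree in self.trees:
            if term in tree.terms():
                return tree
        return None


weather = DescriptorTree("severe weather")
weather.add("severe weather", HIERARCHY, "cold wave")
weather.add("cold wave", RELATED, "cold air mass")
weather.add("severe weather", HIERARCHY, "sandstorm")
weather.add("sandstorm", EQUIVALENCE, "dust storm")
desert = DescriptorTree("desert")
desert.add("desert", HIERARCHY, "sand dune")
thesaurus = Thesaurus([weather, desert])
print(thesaurus.tree_of("dust storm").top_term)  # severe weather
```

Storing edges as plain triples keeps the sketch small; a production thesaurus would more likely index terms and relations for fast lookup.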

Many practitioners have obtained good results in information retrieval by using semantic relevance. However, existing approaches can only compute the relevance of two terms in the same descriptor tree; the relevance of terms in different descriptor trees is simply set to 0, which does not match reality.

Summary of the Invention

Embodiments of the present invention provide a thesaurus-based method and device for calculating lexical relevance, to overcome the defect in the prior art that the relevance of descriptors located in different descriptor trees cannot be determined.

In a first aspect, an embodiment of the present invention provides a thesaurus-based method for calculating lexical relevance, including:

obtaining a first descriptor, a first top term, a second descriptor, and a second top term from a thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree;

calculating, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term; and

combining, according to a preset calculation method, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term, to obtain the relevance between the first descriptor and the second descriptor.

In a second aspect, an embodiment of the present invention provides a thesaurus-based device for calculating lexical relevance, including:

an acquisition module, configured to obtain a first descriptor, a first top term, a second descriptor, and a second top term from a thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree;

a first calculation module, configured to calculate, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term; and

a second calculation module, configured to combine, according to a preset calculation method, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term, to obtain the relevance between the first descriptor and the second descriptor.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory and a processor that communicate with each other through a bus. The memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect.

The thesaurus-based lexical relevance calculation method and device provided by the embodiments of the present invention calculate the relevance between each descriptor and its top term and the relevance between the two descriptors' top terms, and then take the product of these three relevance values as the relevance between the two descriptors. This overcomes the defect in the prior art that, because the relevance of descriptors in different descriptor trees could not be determined, the semantic relevance of terms in different descriptor trees was simply set to 0, and it improves the accuracy of semantic relevance calculation.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of the thesaurus-based lexical relevance calculation method provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of the thesaurus-based lexical relevance calculation device provided by an embodiment of the present invention; and

FIG. 3 is a structural block diagram of the computer device provided by an embodiment of the present invention.

Detailed Description of Embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

FIG. 1 is a schematic flowchart of the thesaurus-based lexical relevance calculation method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

S101: Obtain a first descriptor, a first top term, a second descriptor, and a second top term from the thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree.

S102: Calculate, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term.

S103: According to a preset calculation method, combine the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term, to obtain the relevance between the first descriptor and the second descriptor.
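Steps S101 to S103 can be expressed as one function with pluggable relevance calculators. This is a hedged sketch: the function and parameter names are hypothetical, the within-tree and top-term calculators are passed in as stubs because the patent delegates them to prior-art path methods and HowNet respectively, the product in S103 follows the optional embodiment, and raising an error for two descriptors in the same tree is a design choice of this sketch. The example terms and scores are made up.

```python
from typing import Callable, Dict

# Signature of a pluggable relevance calculator: (term_a, term_b) -> score in [0, 1].
Relevance = Callable[[str, str], float]


def cross_tree_relevance(
    c1: str,
    c2: str,
    top_term_of: Dict[str, str],
    within_tree: Relevance,
    between_top_terms: Relevance,
) -> float:
    """Relevance of two descriptors in different descriptor trees (S101-S103)."""
    # S101: look up the top term (family head word) of each descriptor.
    o1, o2 = top_term_of[c1], top_term_of[c2]
    if o1 == o2:
        raise ValueError("same descriptor tree; use a within-tree path method instead")
    # S102: the three component relevances.
    r_c1_o1 = within_tree(c1, o1)
    r_c2_o2 = within_tree(c2, o2)
    r_o1_o2 = between_top_terms(o1, o2)
    # S103: combine them; here, the product used in the optional embodiment.
    return r_c1_o1 * r_o1_o2 * r_c2_o2


# Hypothetical terms and stub scores, for illustration only.
top_term_of = {"cold wave": "severe weather", "sand dune": "desert"}
within_scores = {("cold wave", "severe weather"): 0.8, ("sand dune", "desert"): 0.9}
score = cross_tree_relevance(
    "cold wave", "sand dune", top_term_of,
    within_tree=lambda c, o: within_scores[(c, o)],
    between_top_terms=lambda a, b: 0.5,
)
print(round(score, 2))  # 0.36
```

Passing the calculators as arguments keeps the combination step independent of how each component relevance is computed, which mirrors the split between the first and second calculation modules.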

Specifically, a thesaurus contains multiple descriptor trees, and each descriptor tree contains multiple concept nodes, namely descriptors. The path between two terms in the same descriptor tree is one of four types: an equivalence path, a hierarchical path, a related path, or a compound path. An equivalence path indicates that the two descriptors are in an equivalence relationship. A hierarchical path indicates that the two descriptors lie in the same hierarchy of the same descriptor tree. A related path indicates that the two descriptors are in an associative (related-term) relationship. A compound path indicates that the two descriptors are connected by at least two of the preceding path types. For example, the path between "dust storm" and "sandstorm" is an equivalence path; the path between "severe weather" and "cold wave" is a hierarchical path; the path between "cold wave" and "cold air mass" is a related path; and the path between "sunny weather" and "cold air mass" is a compound path. Relevance along each type of path is calculated with a different algorithm; these algorithms are prior art and are not described in detail in the embodiments of the present invention.
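The four path types can be distinguished by the set of relation labels along a path. The following minimal sketch assumes hypothetical label strings. The first three example pairs come from the text above; the relation sequence given for the compound example is assumed, because the text does not spell out the path between "sunny weather" and "cold air mass".

```python
# Hypothetical relation labels for the edges along a path.
EQUIVALENCE, HIERARCHY, RELATED = "equivalence", "hierarchy", "related"

_PATH_TYPE = {
    EQUIVALENCE: "equivalence path",
    HIERARCHY: "hierarchical path",
    RELATED: "related path",
}


def classify_path(edge_relations):
    """Classify a path by the relation labels of the edges it traverses."""
    kinds = set(edge_relations)
    if not kinds:
        raise ValueError("a path needs at least one edge")
    unknown = kinds - set(_PATH_TYPE)
    if unknown:
        raise ValueError(f"unknown relation(s): {sorted(unknown)}")
    if len(kinds) > 1:
        return "compound path"  # at least two relation types are mixed
    return _PATH_TYPE[kinds.pop()]


print(classify_path([EQUIVALENCE]))         # pair: dust storm / sandstorm
print(classify_path([HIERARCHY]))           # pair: severe weather / cold wave
print(classify_path([RELATED]))             # pair: cold wave / cold air mass
print(classify_path([HIERARCHY, RELATED]))  # assumed shape of sunny weather / cold air mass
```

Once a path is classified, the corresponding prior-art scoring method for that type can be selected.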

To calculate the relevance of descriptors in different descriptor trees, this embodiment provides a thesaurus-based lexical relevance calculation method. Specifically, two descriptors in different descriptor trees are obtained. The relevance between each descriptor and its top term is calculated, followed by the relevance between the two descriptors' top terms. Then, according to the preset calculation method, the descriptor-to-top-term relevances and the top-term-to-top-term relevance are combined to obtain the relevance of the two descriptors in different descriptor trees.

The relevance between a descriptor and its top term is obtained with the path-based (prior-art) calculation described above and is not described in detail in the embodiments of the present invention.

The terms "first" and "second" only distinguish descriptors in different descriptor trees and do not imply any order.

The thesaurus-based lexical relevance calculation method provided by this embodiment calculates the relevance between each descriptor and its top term and the relevance between the two descriptors' top terms, and then takes the product of these three relevance values as the relevance between the two descriptors. This overcomes the defect in the prior art that, because the relevance of descriptors in different descriptor trees could not be determined, the semantic relevance of terms in different descriptor trees was simply set to 0, and it improves the accuracy of semantic relevance calculation.

On the basis of the above embodiment, optionally, the relevance between the first top term and the second top term is calculated with the HowNet method.
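The text names "the HowNet method" for the top-term relevance but gives no formula. A full HowNet word similarity combines sememe similarities across each word's concept definitions. The sketch below shows only the basic sememe-level step used in a widely cited HowNet-based approach: two sememes at distance d in the sememe hierarchy score α / (d + α), where α is a tunable constant (1.6 is a commonly quoted value). The sememe hierarchy here is hypothetical and is not taken from real HowNet data. The mapping from top terms to sememes is also assumed.

```python
from typing import Dict, List, Optional


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Node followed by its ancestors up to the root, using a child -> parent map."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def tree_distance(a: str, b: str, parent: Dict[str, str]) -> Optional[int]:
    """Number of edges between a and b through their lowest common ancestor."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for j, n in enumerate(ancestors(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + j
    return None  # no common ancestor (different roots)


def sememe_similarity(a: str, b: str, parent: Dict[str, str], alpha: float = 1.6) -> float:
    """HowNet-style sememe similarity alpha / (d + alpha); 0.0 if unrelated."""
    d = tree_distance(a, b, parent)
    return 0.0 if d is None else alpha / (d + alpha)


# Hypothetical sememe hierarchy, for illustration only.
parent = {"weather": "natural phenomenon", "landform": "natural phenomenon", "wind": "weather"}
print(sememe_similarity("weather", "landform", parent))  # 1.6 / (2 + 1.6)
```

The score is 1 for identical sememes and decays smoothly with distance; a larger α makes the decay slower.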

Optionally, the preset calculation method is to multiply the obtained relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term.

Specifically, the multiplication is:

$$\mathrm{Rel}(C_1, C_2) = \mathrm{Rel}(C_1, O_1) \times \mathrm{Rel}(O_1, O_2) \times \mathrm{Rel}(C_2, O_2)$$

where $O_1$ is the first top term;

$O_2$ is the second top term;

$C_1$ is the first descriptor (located in the first descriptor tree);

$C_2$ is the second descriptor (located in the second descriptor tree);

$\mathrm{Rel}(C_1, O_1)$ is the relevance between the first descriptor and the first top term;

$\mathrm{Rel}(C_2, O_2)$ is the relevance between the second descriptor and the second top term;

$\mathrm{Rel}(O_1, O_2)$ is the relevance between the first top term and the second top term; and $\mathrm{Rel}(C_1, C_2)$ is the resulting relevance between the first descriptor and the second descriptor.

$\mathrm{Rel}(C_1, O_1)$ and $\mathrm{Rel}(C_2, O_2)$ are calculated with prior-art (path-based) algorithms, and $\mathrm{Rel}(O_1, O_2)$ is calculated with the HowNet algorithm.
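As a purely illustrative worked example, suppose (hypothetically) that the three component relevances are $\mathrm{Rel}(C_1, O_1) = 0.8$, $\mathrm{Rel}(O_1, O_2) = 0.5$, and $\mathrm{Rel}(C_2, O_2) = 0.9$. If all three scores are normalized to $[0, 1]$, the product can never exceed its smallest factor, so the cross-tree relevance is bounded by its weakest link:

```latex
\begin{aligned}
\mathrm{Rel}(C_1, C_2) &= \mathrm{Rel}(C_1, O_1) \times \mathrm{Rel}(O_1, O_2) \times \mathrm{Rel}(C_2, O_2) \\
&= 0.8 \times 0.5 \times 0.9 \\
&= 0.36 \;\le\; \min(0.8,\ 0.5,\ 0.9) = 0.5
\end{aligned}
```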

Optionally, the method further includes evaluating the relevance between the first descriptor and the second descriptor.

On the basis of the above embodiments, an embodiment of the present invention further provides a geographic-term relevance dataset. The dataset consists of two equivalent subsets: one subset is used to fit (invert) the model parameters of the relevance calculation, and the other is used to evaluate the accuracy of the calculation.
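The two-subset design can be reproduced by shuffling the labelled term pairs with a fixed seed and splitting them in half. This sketch assumes that "equivalent subsets" means equal-sized random halves, which the text does not state. The record format and scores are hypothetical. If the count is odd, the sketch leaves one item out so that both halves have the same size.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_two_equal(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly and split into two equal-sized halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global state
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


# Hypothetical (term_a, term_b, expert_score) records.
pairs = [("cold wave", "cold air mass", 0.7), ("dust storm", "sandstorm", 0.95),
         ("severe weather", "cold wave", 0.8), ("sunny weather", "cold air mass", 0.3)]
calibration, evaluation = split_two_equal(pairs)
print(len(calibration), len(evaluation))  # 2 2
```

The calibration half would be used to fit any tunable parameters, and the held-out half would only be scored, so the reported accuracy is not inflated by fitting.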

Specifically, evaluating the relevance between the first descriptor and the second descriptor includes:

evaluating the calculated relevance by expert scoring.

That is, the relevance of two descriptors calculated with the method of this embodiment is compared against expert scores, and the comparison shows that the relevance obtained with the method has relatively high accuracy.
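The text does not say how computed scores are compared with expert scores. One common choice for relatedness benchmarks is the Pearson correlation coefficient, sketched below as an assumption. Spearman rank correlation would be an equally reasonable alternative. The numbers are hypothetical and do not represent results of the described method.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between computed relevance and expert scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical numbers, for illustration only.
computed = [0.36, 0.72, 0.10, 0.55]
expert = [0.40, 0.80, 0.15, 0.50]
print(round(pearson(computed, expert), 3))
```

A value close to 1 means the computed relevance ranks and spaces term pairs much as the experts do.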

FIG. 2 is a schematic structural diagram of the thesaurus-based lexical relevance calculation device provided by an embodiment of the present invention. As shown in FIG. 2, the device includes an acquisition module 10, a first calculation module 20, and a second calculation module 30, where:

the acquisition module 10 is configured to obtain a first descriptor, a first top term, a second descriptor, and a second top term from the thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree;

the first calculation module 20 is configured to calculate, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term; and

the second calculation module 30 is configured to combine, according to a preset calculation method, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term, to obtain the relevance between the first descriptor and the second descriptor.

To calculate the relevance of descriptors in different descriptor trees, this embodiment provides a thesaurus-based lexical relevance calculation device. Specifically, the acquisition module 10 obtains two descriptors from different descriptor trees. The first calculation module 20 calculates the relevance between each descriptor and its top term, and then the relevance between the two descriptors' top terms. The second calculation module 30 then combines, according to the preset calculation method, the descriptor-to-top-term relevances and the top-term-to-top-term relevance to obtain the relevance of the two descriptors in different descriptor trees.

The thesaurus-based lexical relevance calculation device provided by this embodiment calculates the relevance between each descriptor and its top term and the relevance between the two descriptors' top terms, and then takes the product of these three relevance values as the relevance between the two descriptors. This overcomes the defect in the prior art that, because the relevance of descriptors in different descriptor trees could not be determined, the semantic relevance of terms in different descriptor trees was simply set to 0, and it improves the accuracy of semantic relevance calculation.

On the basis of the above embodiment, optionally, the relevance between the first top term and the second top term is calculated with the HowNet method.

Optionally, the preset calculation method is to multiply the obtained relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term.

Specifically, the multiplication is:

$$\mathrm{Rel}(C_1, C_2) = \mathrm{Rel}(C_1, O_1) \times \mathrm{Rel}(O_1, O_2) \times \mathrm{Rel}(C_2, O_2)$$

where $O_1$ is the first top term;

$O_2$ is the second top term;

$C_1$ is the first descriptor (located in the first descriptor tree);

$C_2$ is the second descriptor (located in the second descriptor tree);

$\mathrm{Rel}(C_1, O_1)$ is the relevance between the first descriptor and the first top term;

$\mathrm{Rel}(C_2, O_2)$ is the relevance between the second descriptor and the second top term;

$\mathrm{Rel}(O_1, O_2)$ is the relevance between the first top term and the second top term; and $\mathrm{Rel}(C_1, C_2)$ is the resulting relevance between the first descriptor and the second descriptor.

$\mathrm{Rel}(C_1, O_1)$ and $\mathrm{Rel}(C_2, O_2)$ are calculated with prior-art (path-based) algorithms, and $\mathrm{Rel}(O_1, O_2)$ is calculated with the HowNet algorithm.

FIG. 3 is a structural block diagram of the computer device provided by an embodiment of the present invention. As shown in FIG. 3, the computer device may include a processor 810, a communications interface 820, a memory 830, and a communication bus 840, where the processor 810, the communications interface 820, and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the following method: obtaining a first descriptor, a first top term, a second descriptor, and a second top term from the thesaurus, where the first descriptor and the first top term are located in a first descriptor tree, and the second descriptor and the second top term are located in a second descriptor tree;

calculating, respectively, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term; and

combining, according to a preset calculation method, the relevance between the first descriptor and the first top term, the relevance between the second descriptor and the second top term, and the relevance between the first top term and the second top term, to obtain the relevance between the first descriptor and the second descriptor.

In addition, when the logic instructions in the memory 830 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. A person of ordinary skill in the art can understand and implement them without creative effort.

From the description of the above embodiments, a person skilled in the art can clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. Based on this understanding, the above technical solutions in essence, or the part that contributes to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for calculating word relatedness based on a thesaurus, comprising:
obtaining a first descriptor, a first family head word, a second descriptor and a second family head word from a thesaurus, wherein the first descriptor and the first family head word are located in a first descriptor tree, and the second descriptor and the second family head word are located in a second descriptor tree;
calculating, respectively, the relatedness between the first descriptor and the first family head word, the relatedness between the second descriptor and the second family head word, and the relatedness between the first family head word and the second family head word;
combining, according to a preset calculation method, the relatedness between the first descriptor and the first family head word, the relatedness between the second descriptor and the second family head word, and the relatedness between the first family head word and the second family head word, to obtain the relatedness between the first descriptor and the second descriptor.
2. The method according to claim 1, wherein calculating the relatedness between the first family head word and the second family head word specifically comprises: calculating the relatedness between the first family head word and the second family head word using the HowNet method.
3. The method according to claim 1, wherein the preset calculation method is: multiplying together the obtained relatedness between the first descriptor and the first family head word, the relatedness between the second descriptor and the second family head word, and the relatedness between the first family head word and the second family head word.
4. The method according to claim 3, wherein the multiplication is:
$$\mathrm{Rel}(C_1, C_2) = \mathrm{Rel}(C_1, O_1) \times \mathrm{Rel}(O_1, O_2) \times \mathrm{Rel}(C_2, O_2)$$
where $O_1$ is the first family head word; $O_2$ is the second family head word; $C_1$ is the first descriptor tree; $C_2$ is the second descriptor tree; $\mathrm{Rel}(C_1, O_1)$ is the relatedness between the first descriptor tree and the first family head word; $\mathrm{Rel}(C_2, O_2)$ is the relatedness between the second descriptor tree and the second family head word; and $\mathrm{Rel}(O_1, O_2)$ is the relatedness between the first family head word and the second family head word.
5. The method according to claim 3, further comprising: evaluating the relatedness between the first descriptor and the second descriptor.
6. The method according to claim 5, wherein evaluating the relatedness between the first descriptor and the second descriptor specifically comprises: evaluating the calculated relatedness by means of expert scoring.
7. A device for calculating word relatedness based on a thesaurus, comprising:
an obtaining module, configured to obtain a first descriptor, a first family head word, a second descriptor and a second family head word from a thesaurus, wherein the first descriptor and the first family head word are located in a first descriptor tree, and the second descriptor and the second family head word are located in a second descriptor tree;
a first calculation module, configured to calculate, respectively, the relatedness between the first descriptor and the first family head word, the relatedness between the second descriptor and the second family head word, and the relatedness between the first family head word and the second family head word;
a second calculation module, configured to combine, according to a preset calculation method, the relatedness between the first descriptor and the first family head word, the relatedness between the second descriptor and the second family head word, and the relatedness between the first family head word and the second family head word, to obtain the relatedness between the first descriptor and the second descriptor.
8. The device according to claim 7, wherein calculating the relatedness between the first family head word and the second family head word specifically comprises: calculating the relatedness between the first family head word and the second family head word using the HowNet method.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the synchronization method according to any one of claims 1 to 6.
10. A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the synchronization method according to any one of claims 1 to 6.
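The multiplicative combination in claims 1, 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. It reads C1 and C2 as the two descriptors (claim 1 states that the result is the relatedness of the first and second descriptors); it encodes each descriptor tree as hypothetical broader-term links; it uses a hypothetical geometric-decay measure (factor `alpha`) for the descriptor-to-family-head-word relatedness, which the claims leave unspecified; and it replaces the HowNet-based score of claim 2 with a toy lookup (`toy_top_term_rel`) whose values are invented for illustration.

```python
from typing import Callable, Dict, List, Optional

# HYPOTHETICAL encoding of a thesaurus: each term maps to its broader term (BT);
# a family head word (top term) maps to None.
Thesaurus = Dict[str, Optional[str]]


def path_to_top(thesaurus: Thesaurus, term: str) -> List[str]:
    """Return [term, its BT, the BT of that BT, ..., family head word]."""
    path = [term]
    parent = thesaurus[term]
    while parent is not None:
        path.append(parent)
        parent = thesaurus[parent]
    return path


def intra_tree_rel(thesaurus: Thesaurus, term: str, alpha: float = 0.8) -> float:
    """ASSUMED measure of Rel(C, O): decays by `alpha` per BT link to the head word.

    The claims do not specify this formula; it is only a placeholder.
    """
    depth = len(path_to_top(thesaurus, term)) - 1
    return alpha ** depth


def descriptor_rel(
    thesaurus: Thesaurus,
    c1: str,
    c2: str,
    top_term_rel: Callable[[str, str], float],
    alpha: float = 0.8,
) -> float:
    """Claim 4: Rel(C1, C2) = Rel(C1, O1) * Rel(O1, O2) * Rel(C2, O2)."""
    o1 = path_to_top(thesaurus, c1)[-1]  # first family head word
    o2 = path_to_top(thesaurus, c2)[-1]  # second family head word
    return (
        intra_tree_rel(thesaurus, c1, alpha)
        * top_term_rel(o1, o2)
        * intra_tree_rel(thesaurus, c2, alpha)
    )


# Toy data: two descriptor trees with invented terms.
thesaurus: Thesaurus = {
    "geomorphology": None,
    "landform": "geomorphology",
    "karst landform": "landform",
    "hydrology": None,
    "river": "hydrology",
}


def toy_top_term_rel(o1: str, o2: str) -> float:
    """Stand-in for a HowNet-based score; the 0.5 value is invented."""
    if o1 == o2:
        return 1.0
    scores = {frozenset({"geomorphology", "hydrology"}): 0.5}
    return scores.get(frozenset({o1, o2}), 0.0)


# 0.8**2 (karst landform -> geomorphology) * 0.5 * 0.8**1 (river -> hydrology)
print(round(descriptor_rel(thesaurus, "karst landform", "river", toy_top_term_rel), 3))  # → 0.256
```

Because the three factors are multiplied, each should lie in [0, 1] for the result to stay in that range, and a weak link at any step (a deeply nested descriptor or weakly related family head words) pulls the overall score down.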

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910335423.1A (CN110727745A (en)) | 2019-04-24 | 2019-04-24 | Vocabulary relevancy calculation method and device based on narrative table

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910335423.1A (CN110727745A (en)) | 2019-04-24 | 2019-04-24 | Vocabulary relevancy calculation method and device based on narrative table

Publications (1)

Publication Number | Publication Date
CN110727745A (en) | 2020-01-24

Family ID: 69217049

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910335423.1A (Pending, CN110727745A (en)) | Vocabulary relevancy calculation method and device based on narrative table | 2019-04-24 | 2019-04-24

Country Status (1)

Country | Link
CN (1) | CN110727745A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106202854A (en)* | 2015-05-28 | 2016-12-07 | Samsung SDS Co., Ltd. | Regulation management method, regulation management device and disease descriptor table generating method
CN107247780A (en)* | 2017-06-12 | 2017-10-13 | Beijing Institute of Technology | Knowledge-ontology-based method for measuring patent document similarity
CN108170840A (en)* | 2018-01-15 | 2018-06-15 | Zhejiang University | Text-oriented method for automatically learning domain classification relationships


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZUGANG C et al.: "An Approach to Measuring Semantic Relatedness of Geographic Terminologies Using a Thesaurus and Lexical Database Sources", International Journal of Geo-Information *

Similar Documents

Publication | Title
US11301637B2 (en) | Methods, devices, and systems for constructing intelligent knowledge base
JP5513624B2 (en) | Retrieving information based on general query attributes
CN103518187B (en) | Method and system for information modeling and applications thereof
US8620951B1 (en) | Search query results based upon topic
JP6124917B2 (en) | Method and apparatus for information retrieval
CN109033101B (en) | Label recommendation method and device
US9087044B2 (en) | Establishing “is a” relationships for a taxonomy
US11226960B2 (en) | Natural-language database interface with automated keyword mapping and join-path inferences
US11374882B2 (en) | Intelligent chat channel processor
CN105138511A (en) | Method and system for semantically analyzing search keyword
WO2017084362A1 (en) | Model generation method, recommendation method and corresponding apparatuses, device and storage medium
CN111400584A (en) | Method, apparatus, computer equipment and storage medium for recommending associative words
CN113326420A (en) | Question retrieval method, device, electronic equipment and medium
US20170046447A1 (en) | Information Category Obtaining Method and Apparatus
CN109992784A (en) | A Heterogeneous Network Construction and Distance Measurement Method for Fusing Multimodal Information
EP4404083A1 (en) | Automated indexing and extraction of information in digital documents
CN111914564A (en) | Method and device for determining text keywords
WO2023217019A1 (en) | Text processing method, apparatus, and system, and storage medium and electronic device
US8346775B2 (en) | Managing information
CN110727745A (en) | Vocabulary relevancy calculation method and device based on narrative table
CN114491232B (en) | Information query method and device, electronic equipment and storage medium
CN112115237B (en) | Construction method and device of tobacco science and technology literature data recommendation model
CN108427769A (en) | Method for extracting personal interest tags based on social networks
CN118093771A (en) | Method and device for extracting collapse index keywords
CN116431774A (en) | Question answering method and device

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-01-24
