CN116579297A - Entity linking method and device, storage medium and computer equipment - Google Patents

Entity linking method and device, storage medium and computer equipment

Info

Publication number
CN116579297A
Authority
CN
China
Prior art keywords
entity
candidate
input text
linked
description information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310460767.1A
Other languages
Chinese (zh)
Inventor
张倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310460767.1A
Publication of CN116579297A
Legal status: Pending

Abstract

The invention discloses an entity linking method and device, a storage medium and computer equipment. The invention belongs to the technical field of digital medical treatment and mainly addresses the low accuracy of entity linking in the prior art. The method comprises the following steps: determining a plurality of candidate entities of an entity to be linked in an input text, wherein the candidate entities are bound with entity description information and entity association relations; performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determining the context correlation degree between the input text and the entity description information; evaluating the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity; and determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and linking the entity to be linked to a target entity name based on the link evaluation result.

Description

Entity linking method and device, storage medium and computer equipment
Technical Field
The present invention relates to the field of digital medical technology, and in particular, to a method and apparatus for entity linking, a storage medium, and a computer device.
Background
Entity linking refers to the process of correctly pointing entity objects identified in free text (e.g., person names, place names, institution names, medicine names, disorder names) to target entities in a knowledge base. In practice, entity linking is usually achieved by linking an entity mention (the term as it appears in the text) to the correct entity in the knowledge base. In the medical field, the demand for entity linking is also increasing.
In the medical field, existing entity linking methods based on Chinese data score the candidate entities by similarity calculation, sort them, and select the most similar entity as the link target. However, these methods are often limited by insufficient Chinese data resources, such as insufficient digital case data and insufficient diagnosis and treatment record data. They also mine the features of entity information insufficiently, so the accuracy of entity linking based on Chinese data is low and its overall performance is reduced.
Disclosure of Invention
In view of the above, the present invention provides an entity linking method and apparatus, a storage medium, and a computer device, and aims to solve the problem that existing entity linking methods based on Chinese data mine the features of entity information insufficiently, which leads to low entity linking accuracy.
According to one aspect of the present invention, there is provided an entity linking method, comprising:
determining a plurality of candidate entities of an entity to be linked in an input text, wherein the candidate entities are bound with entity description information and entity association relations;
performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determining the context correlation degree between the input text and the entity description information;
evaluating the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity;
and determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and linking the entity to be linked to a target entity name based on the link evaluation result.
Further, the determining a plurality of candidate entities of the entity to be linked in the input text includes:
identifying all entities in the input text as the entities to be linked;
and constructing an index between the entity to be linked and the entity alias, and inquiring in a preset knowledge base based on the index to determine a plurality of candidate entities of the entity to be linked.
Further, before the text matching process is performed on the input text and the entity description information, the method further includes:
acquiring all the entities to be linked in the input text, and determining the word frequency of the entities to be linked in the input text;
and determining the word weight of the entity to be linked in the input text based on the word frequency, so that text matching processing is performed based on the word weight.
Further, the performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information includes:
acquiring the entity description information of the candidate entity;
adding the word weight to the input text to obtain a weighted input text;
and carrying out text matching processing on the weighted input text and the entity description information based on a text matching model to obtain the information matching degree between the weighted input text and the entity description information.
Further, the determining the contextual relevance between the input text and the entity description information includes:
coding the input text and the entity description information to obtain coding information representing text semantics;
and calculating the context correlation degree between the input text and the entity description information based on the coding information.
Further, the evaluating the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity includes:
acquiring the entity association relation of the candidate entity;
performing random access processing on the candidate entity based on a random walk model, so that the random access information of the candidate entity converges to a stationary distribution;
and determining the stability probability value of the candidate entity after convergence to the stationary distribution.
Further, the determining the link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree includes:
performing ranking scoring processing on the information matching degree, the stability probability value and the context correlation degree based on a ranking model to obtain an evaluation score of the candidate entity;
and determining a target candidate entity of the entity to be linked based on the evaluation score so that the entity to be linked is linked to the target candidate entity.
According to another aspect of the present invention, there is provided an entity linking apparatus comprising:
the candidate entity determining module is used for determining a plurality of candidate entities of the entity to be linked in the input text, wherein the candidate entities are bound with entity description information and entity association relations;
the text matching module is used for carrying out text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determining the context correlation degree between the input text and the entity description information;
the stability evaluation module is used for evaluating the stability of the candidate entity based on the entity association relation to obtain a stability probability value of the candidate entity;
and the entity link module is used for determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree and linking the entity to be linked to a target entity name based on the link evaluation result.
Further, the candidate entity determining module further includes:
the entity identification unit is used for identifying all entities in the input text as the entities to be linked;
and the query unit is used for constructing an index between the entity to be linked and the entity alias, and querying in a preset knowledge base based on the index to determine a plurality of candidate entities of the entity to be linked.
Further, the device further comprises:
the word weight determining module is used for acquiring all the entities to be linked in the input text and determining word frequency of the entities to be linked in the input text;
and determining the word weight of the entity to be linked in the input text based on the word frequency, so that text matching processing is performed based on the word weight.
Further, the text matching module further includes:
the matching degree determining unit is used for acquiring the entity description information of the candidate entity;
adding the word weight to the input text to obtain a weighted input text;
and carrying out text matching processing on the weighted input text and the entity description information based on a text matching model to obtain the information matching degree between the weighted input text and the entity description information.
Further, the text matching module further includes:
the relevancy determination unit is used for carrying out coding processing on the input text and the entity description information to obtain coding information representing text semantics;
and calculating to obtain the context correlation degree between the input text and the entity description information based on the coding information.
Further, the stability evaluation module further includes:
an association relation acquisition unit, configured to acquire the entity association relation of the candidate entity;
a random access processing unit, configured to perform random access processing on the candidate entity based on a random walk model, so that random access information of the candidate entity converges to a stationary distribution;
and the stability probability determining unit is used for determining the stability probability value of the candidate entity after convergence to the stationary distribution.
Further, the entity linking module further includes:
the scoring processing unit is used for performing ranking scoring processing on the information matching degree, the stability probability value and the context correlation degree based on a ranking model to obtain an evaluation score of the candidate entity;
and the entity linking unit is used for determining target candidate entities of the entity to be linked based on the evaluation score so that the entity to be linked is linked to the target candidate entities.
According to still another aspect of the present invention, there is provided a storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the entity linking method described above.
According to yet another aspect of the present invention, there is provided a computer device comprising a processor, a memory, a communication interface and a communication bus, said processor, said memory and said communication interface completing communication with each other via said communication bus;
the memory is configured to store at least one executable instruction, where the executable instruction causes the processor to perform operations corresponding to the entity linking method.
The technical scheme provided by the embodiment of the invention has at least the following advantages:
compared with the prior art in the medical field, in which candidate entities are ranked based on similarity calculation results, the entity linking method and device of the invention determine a plurality of candidate entities of the entity to be linked in an input text, the candidate entities being bound with entity description information and entity association relations; perform text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determine the context correlation degree between the input text and the entity description information; evaluate the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity; and determine a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and link the entity to be linked to a target entity name based on the link evaluation result. The invention not only uses text matching processing to fully mine the information matching degree between the entity to be linked and the candidate entity, but also fully mines the context correlation degree between the entity to be linked and its co-occurring entities. In addition, entity consistency features of the entities to be linked are mined based on the association relations of the candidate entities. Even when Chinese data resources in the medical field are insufficient, the accuracy of entity linking is improved based on these fully mined features.
The foregoing is only an overview of the technical scheme of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention become more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of an entity linking method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another entity linking method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of another entity linking method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another entity linking method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of another entity linking method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an entity linking device according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the invention provides an entity linking method, as shown in fig. 1, which comprises the following steps:
101. Determining a plurality of candidate entities of the entity to be linked in the input text;
In the embodiment of the invention, the current execution end searches the input text for entities and takes each entity found as an entity to be linked. For example, if the input text in the medical field is "the advantages of metformin in treating adult diabetes", the entities found are the drug "metformin" and the disease "adult diabetes", and both are taken as entities to be linked; the embodiment of the invention is not particularly limited in this respect. The current execution end then determines, by searching an existing knowledge base, a plurality of candidate entities for each entity to be linked. The candidate entities are entities related to the entity to be linked, and each is bound with entity description information and entity association relations. The entity description information includes a plurality of attributes of the entity and their attribute values. The attributes may be, for example, the name, category, indications and contraindications of the entity, and the embodiment of the invention is not particularly limited. An attribute value is the content corresponding to an attribute; for example, the value of a name attribute may be "metformin" and the value of a category attribute may be "drug".
It should be noted that the entity association relationship indicates whether candidate entities are related to one another and is generally represented as a knowledge graph. In the knowledge graph, two entities that have an association relationship are connected by an edge, and two entities that have no association relationship have no edge between them. Other forms of representing the entity association relationship also fall within the scope of protection of the present invention, and the embodiment of the present invention is not specifically limited.
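As a minimal sketch of the graph representation described above, the association relations can be held in an undirected adjacency map. The helper names `build_graph` and `related` and the example entities are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (entity_a, entity_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def related(graph, a, b):
    """Two entities are associated exactly when an edge connects them."""
    return b in graph.get(a, set())


# Hypothetical association relations between candidate entities.
edges = [("metformin", "adult diabetes"), ("adult diabetes", "insulin")]
graph = build_graph(edges)
print(related(graph, "metformin", "adult diabetes"))
print(related(graph, "metformin", "insulin"))
```

An adjacency map keeps the "edge or no edge" test cheap, which is all the description above requires of the representation.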
102. Performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determining the context correlation degree between the input text and the entity description information;
In the embodiment of the invention, because the entity description information includes a plurality of attributes and attribute values of the entity, the current execution end performs text matching processing on the input text and the entity description information of the candidate entity in order to fully mine that description information, thereby obtaining the information matching degree between the input text and the entity description information of the candidate entity. Text matching processing studies the relationship between two pieces of text. It calculates similarity based on text representations and includes two matching strategies: representation-based and interaction-based. Specific text matching methods include the deep structured semantic model DSSM (Deep Structured Semantic Model), the pre-trained language representation model BERT (Bidirectional Encoder Representations from Transformers), the enhanced sequential inference model ESIM (Enhanced Sequential Inference Model), and the like, and embodiments of the present invention are not limited in detail. In the embodiment of the invention, ESIM is preferred for performing text matching processing on the input text and the entity description information.
In addition, in order to further mine the entity description information of the candidate entity, the current execution end not only matches the text representations but also performs a correlation calculation that uses the contextual semantic information of the text in the medical field. The current execution end determines the context correlation degree between the input text and the entity description information from the contextual semantic information of the entity description information of the candidate entity. Specifically, the current execution end may perform feature fusion on the semantic information when acquiring the contextual semantic information; the fusion method is not particularly limited in the embodiment of the present invention.
103. Evaluating the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity;
In the embodiment of the invention, the entities to be linked in the same medical document are considered to have a certain semantic association. If the candidate entity corresponding to one entity to be linked in the document has more associations with the candidate entities of the other entities to be linked, that candidate entity has a higher probability of being the target entity. In addition, some entities in the document may be more important than others, and candidate entities that are more closely related to important entities have a greater probability of being target entities. Therefore, the current execution end evaluates the stability of each candidate entity based on the entity association relations between the candidate entities to obtain the stability probability value of the candidate entity. The stability probability value characterizes how closely a candidate entity is associated with the other candidate entities. The idea is analogous to ranking web pages: if a web page is linked to by many other web pages, the page is considered important; and if a highly important web page links to another web page, the linked page is also considered highly important. The embodiment of the invention is not particularly limited in this respect.
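The web page analogy above corresponds to a random walk whose visit probabilities settle into a stationary distribution. The following is a minimal sketch under stated assumptions: it uses a PageRank-style random walk with restart, and the damping factor, the tolerance, the function name `stationary_distribution` and the example graph are all illustrative choices that the patent does not specify.

```python
def stationary_distribution(graph, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for a random walk with restart.

    graph: dict mapping every node to the set of its neighbours
           (each neighbour must itself be a key of the dict).
    Returns a dict node -> probability; the values sum to 1.
    """
    nodes = sorted(graph)
    n = len(nodes)
    prob = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Restart mass is shared uniformly (assumed restart scheme).
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            neighbours = graph[v]
            if neighbours:
                share = damping * prob[v] / len(neighbours)
                for u in neighbours:
                    nxt[u] += share
            else:
                # A node without edges spreads its mass over all nodes.
                share = damping * prob[v] / n
                for u in nodes:
                    nxt[u] += share
        delta = sum(abs(nxt[v] - prob[v]) for v in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical candidate graph: "c_hub" is associated with three others.
graph = {
    "c_hub": {"c1", "c2", "c3"},
    "c1": {"c_hub"},
    "c2": {"c_hub"},
    "c3": {"c_hub"},
}
scores = stationary_distribution(graph)
print(max(scores, key=scores.get))
```

In this sketch the candidate with the most associations receives the largest stationary probability, which matches the intuition stated above that well-connected candidates are more likely to be the target entity.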
104. And determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and linking the entity to be linked to a target entity name based on the link evaluation result.
In the embodiment of the invention, the current execution end uses the 3 features mined in steps 101, 102 and 103, namely the information matching degree, the stability probability value and the context correlation degree, as the key features for evaluating the candidate entities and determining the target entity. The link evaluation result may be obtained directly by a weighted average calculation, or by building a mathematical model that calculates an evaluation score; the embodiment of the invention is not particularly limited. The current execution end links the entity to be linked to the target entity name based on the link evaluation result. For example, the entity to be linked is linked to the entity name of the candidate entity with the highest weighted average of the information matching degree, the stability probability value and the context correlation degree, or to the entity name of the candidate entity with the highest evaluation score from the mathematical model.
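The weighted average variant described above can be sketched as follows. The equal default weights, the function names `link_score` and `pick_target`, and the feature values are illustrative assumptions; the patent does not fix any of them.

```python
def link_score(match, stability, context, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three features (weights are assumptions)."""
    w_m, w_s, w_c = weights
    total = w_m + w_s + w_c
    return (w_m * match + w_s * stability + w_c * context) / total


def pick_target(candidates, weights=(1.0, 1.0, 1.0)):
    """Return the candidate name with the highest weighted average.

    candidates: dict name -> (match degree, stability probability,
                              context correlation degree)
    """
    return max(
        candidates,
        key=lambda name: link_score(*candidates[name], weights=weights),
    )


# Hypothetical feature values for two candidates of one mention.
candidates = {
    "metformin": (0.9, 0.5, 0.8),
    "metformin hydrochloride": (0.7, 0.3, 0.6),
}
print(pick_target(candidates))
```

A learned ranking model could replace `link_score` without changing `pick_target`, which is the second option the paragraph above mentions.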
Further, as a refinement and extension of the foregoing embodiment, in order to more conveniently and quickly find candidate entities of the entity to be linked, another entity linking method is provided, where the step of determining multiple candidate entities of the entity to be linked in the input text includes:
identifying all entities in the input text as the entities to be linked;
and constructing an index between the entity to be linked and the entity alias, and inquiring in a preset knowledge base based on the index to determine a plurality of candidate entities of the entity to be linked.
In the embodiment of the invention, the current execution end performs entity recognition processing on the input text and takes all entities recognized in the input text as entities to be linked. Entity recognition processing means that, after the input text is segmented into words, an entity recognition model is used to recognize the entities with specific meanings in the input text. It generally comprises two parts: (1) identifying entity boundaries; and (2) determining the entity category, which mainly includes person names, place names, organization names, medicine names, disease names and other proper nouns; embodiments of the present invention are not limited in detail. Therefore, the entity linking method in the embodiment of the invention is suitable for application scenarios such as financial services, medical services and electronic commerce, and the embodiment of the invention is not particularly limited.
It should be noted that, before the entity query is performed, the current execution end first constructs an index between the entity to be linked and entity aliases. An entity alias is another name by which the same entity is known; for example, in the medical field the drug name "metformin" has aliases such as "metformin hydrochloride", all of which denote the same entity, and the embodiment of the present invention is not limited specifically. The index may be constructed manually, for example by mapping "metformin" to aliases such as "metformin hydrochloride", or it may be constructed by means such as similarity calculation; the embodiment of the invention is not particularly limited. With the constructed index, the candidate entities corresponding to the entity to be linked can be found conveniently and quickly in the preset knowledge base.
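A manually constructed alias index of the kind described above might look like the following sketch. The knowledge base layout, the entity identifiers `E1` and `E2`, and the helper names are hypothetical; lowercasing is an added normalization step for illustration.

```python
from collections import defaultdict


def build_alias_index(knowledge_base):
    """Map every name and alias (lowercased) to the entity ids that use it."""
    index = defaultdict(set)
    for entity_id, record in knowledge_base.items():
        for name in [record["name"], *record.get("aliases", [])]:
            index[name.lower()].add(entity_id)
    return index


def candidates_for(mention, index):
    """Look up the candidate entity ids for a mention; empty if unknown."""
    return sorted(index.get(mention.lower(), set()))


# Hypothetical knowledge base entries.
knowledge_base = {
    "E1": {"name": "metformin", "aliases": ["metformin hydrochloride"]},
    "E2": {"name": "metformin hydrochloride tablets", "aliases": ["metformin"]},
}
index = build_alias_index(knowledge_base)
print(candidates_for("Metformin", index))
```

Because one alias can point to several entity ids, a single lookup already yields the plurality of candidate entities that the later scoring steps rank.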
Further, as a refinement and extension of the foregoing embodiment, in order to embody importance of each entity to be linked in the input text, another entity linking method is provided, as shown in fig. 2, before performing text matching processing on the input text and the entity description information, the method further includes:
201. Acquiring all the entities to be linked in the input text, and determining the word frequency of the entities to be linked in the input text;
202. and determining the word weight of the entity to be linked in the input text based on the word frequency, so that text matching processing is performed based on the word weight.
In the embodiment of the invention, the current execution end obtains the word frequency of the entity to be linked in the input text by counting the occurrence times of the entity to be linked in the input text, and the calculation formula is as follows:
where TF_S denotes the word frequency of the entity to be linked S.
It should be noted that the word frequency indicates how frequently the entity to be linked appears in the text and thereby reflects the importance of the entity to be linked. The current execution end assigns a weight to each entity to be linked according to its word frequency, that is, it determines the word weight of the entity to be linked in the input text. For example, an entity to be linked with a relatively high word frequency is assigned a larger weight, and an entity to be linked with a relatively low word frequency is assigned a smaller weight; the embodiment of the invention is not particularly limited.
In the embodiment of the invention, the TF-IDF weighting method is preferred for determining the word weight of the entity to be linked in the input text.
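The word frequency and TF-IDF weighting can be illustrated with a short sketch. The patent's own formula is not reproduced in this text, so the normalized count used for the word frequency and the smoothed inverse document frequency below are common textbook variants chosen as assumptions; the function names and the example data are hypothetical.

```python
import math
from collections import Counter


def term_frequency(mentions):
    """Share of all entity mentions in the text taken by each entity."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def tf_idf_weights(mentions, corpus):
    """TF-IDF weight per entity mention.

    corpus: list of mention lists from reference documents (assumed input).
    The smoothed IDF log((1 + N) / (1 + df)) + 1 is an assumed variant.
    """
    tf = term_frequency(mentions)
    n_docs = len(corpus)
    weights = {}
    for mention, freq in tf.items():
        df = sum(1 for doc in corpus if mention in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[mention] = freq * idf
    return weights


# Hypothetical mentions found in one input text and in a small corpus.
mentions = ["metformin", "adult diabetes", "metformin"]
corpus = [["metformin"], ["adult diabetes", "insulin"], ["adult diabetes"]]
print(tf_idf_weights(mentions, corpus))
```

An entity that appears often in the input text but rarely elsewhere receives the largest weight, consistent with the rule above that a higher word frequency yields a larger weight.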
Further, as a refinement and extension of the foregoing embodiment, in order to quickly and efficiently determine the information matching degree between the input text and the entity description information, another entity linking method is provided, as shown in fig. 3, where the step of performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information includes:
301. acquiring the entity description information of the candidate entity;
302. adding the word weight to the input text to obtain a weighted input text;
303. and carrying out text matching processing on the weighted input text and the entity description information based on a text matching model to obtain the information matching degree between the weighted input text and the entity description information.
In the embodiment of the present invention, the current execution end obtains the entity description information of the candidate entity, where the entity description information includes a plurality of attributes and attribute values of the entity, and the attributes may be, for example, the name, category, indications and contraindications of the entity. An attribute value is the content corresponding to an attribute; for example, the value of a name attribute may be "metformin" and the value of a category attribute may be "drug", and the embodiment of the invention is not particularly limited. The current execution end adds the word weights determined in steps 201 and 202 to the input text to obtain a weighted input text, and then performs text matching processing on the weighted input text and the entity description information of the candidate entity based on a text matching model to obtain the information matching degree between the weighted input text and the entity description information.
It should be noted that text matching processing studies the relationship between two pieces of text. It calculates similarity based on text representations and includes two matching strategies: representation-based and interaction-based. Specific text matching methods include the deep structured semantic model DSSM (Deep Structured Semantic Model), the pre-trained language representation model BERT (Bidirectional Encoder Representations from Transformers), the enhanced sequential inference model ESIM (Enhanced Sequential Inference Model), and the like, and embodiments of the present invention are not limited in detail. In the embodiment of the invention, ESIM is preferred for performing text matching processing on the input text and the entity description information.
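To make the representation-based strategy concrete, the sketch below scores a weighted input text against an entity description with a weighted bag-of-words cosine similarity. This is a deliberately simple stand-in and is not ESIM, DSSM or BERT; the tokenization, the word weights and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def weighted_vector(tokens, word_weights=None, default=1.0):
    """Bag-of-words vector in which each token adds its word weight."""
    word_weights = word_weights or {}
    vec = Counter()
    for token in tokens:
        vec[token] += word_weights.get(token, default)
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical tokenized input text, word weights and descriptions.
text = ["metformin", "treats", "adult", "diabetes"]
weights = {"metformin": 2.0}
desc_drug = ["metformin", "drug", "indication", "diabetes"]
desc_other = ["aspirin", "drug", "pain"]

text_vec = weighted_vector(text, weights)
print(cosine(text_vec, weighted_vector(desc_drug)))
print(cosine(text_vec, weighted_vector(desc_other)))
```

Raising the weight of an entity mention increases its influence on the score, which is the effect the weighted input text is meant to have on the matching model.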
Further, as a refinement and extension of the foregoing embodiment, in order to fully mine semantic features of the entity description information, another entity linking method is provided, as shown in fig. 4, where the step of determining a contextual relevance between the input text and the entity description information includes:
401. Encoding the input text and the entity description information to obtain coding information representing text semantics;
In the embodiment of the present invention, the current execution end encodes the input text and the entity description information of the corresponding candidate entity to obtain coding information representing text semantics, thereby semantically enhancing both the input text and the entity description information.
It should be noted that the pre-trained language representation model BERT is preferred for implementing semantic enhancement through encoding. The attention mechanism of the BERT model uses context information to enhance the semantic representation of a target word. It takes the semantic vector representations of the entity to be linked and of each context word as input. First, linear transformations yield the Query vector of the entity to be linked, the Key vector of each context word, and the Value vectors of the entity to be linked and of each context word. Then the similarity between the Query vector and each Key vector is calculated as a weight, and the Value vector of the entity to be linked and the Value vectors of the context words are fused by weighted summation. The resulting vector is the output of the attention mechanism, that is, the semantically enhanced coding information of the entity to be linked. The entity description information is semantically enhanced in the same way to obtain its coding information; the embodiment of the present invention is not specifically limited in this respect.
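The Query/Key/Value computation described above can be sketched for a single attention head. This is a minimal illustration under stated assumptions, not the BERT implementation: vectors are plain Python lists, the projection matrices `w_q`, `w_k`, `w_v` are hypothetical inputs, the entity to be linked attends to itself as well as to the context words, and the dot products are scaled by the square root of the vector size as in standard scaled dot-product attention.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(mention_vec, context_vecs, w_q, w_k, w_v):
    """Return the context-enhanced vector of the mention and the attention weights."""
    tokens = [mention_vec] + context_vecs
    query = matvec(w_q, mention_vec)             # Query of the entity to be linked
    keys = [matvec(w_k, t) for t in tokens]      # one Key per token
    values = [matvec(w_v, t) for t in tokens]    # one Value per token
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of the Value vectors is the enhanced representation.
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


# Toy 2-dimensional vectors with identity projections, for illustration only.
identity = [[1.0, 0.0], [0.0, 1.0]]
enhanced, weights = attend([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]],
                           identity, identity, identity)
print(enhanced, weights)
```

In a real encoder the projections are learned and many heads and layers are stacked; the sketch isolates the weighting-and-fusing step only.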
402. Calculating the context correlation degree between the input text and the entity description information based on the coding information.
In the embodiment of the present invention, the current execution end performs an integration operation on the coding information of the entity to be linked and of the entity description information to obtain the context correlation degree between the entity to be linked and the entity description information. The integration operation integrates the coding information representing text semantics into a single value, for example by means of a fully connected layer; the embodiment of the present invention is not specifically limited in this respect. In the embodiment of the present invention, the integrated value characterizes the context correlation degree between the two pieces of information.
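As a minimal sketch of such an integration operation, the two encoded vectors can be concatenated and passed through a single fully connected unit. The sigmoid squashing, the parameter names `weights` and `bias`, and the toy vectors are assumptions made for illustration; in practice the parameters would be learned.

```python
import math


def context_relevance(mention_code, description_code, weights, bias):
    """Integrate two encoded vectors into one relevance value in (0, 1)."""
    features = mention_code + description_code  # concatenate the two encodings
    if len(weights) != len(features):
        raise ValueError("weights must have one entry per feature")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid (assumed output activation)


# Toy encodings and hand-picked parameters, for illustration only.
print(context_relevance([0.2, 0.4], [0.1, 0.3], [0.5, 0.5, 0.5, 0.5], 0.0))
```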
Further, as a refinement and extension of the foregoing embodiment, in order to measure importance of a candidate entity according to a probability that the candidate entity is linked, another entity linking method is provided, as shown in fig. 5, where the step of evaluating stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity includes:
501. Acquiring the entity association relationship of the candidate entity;
In the embodiment of the present invention, the current execution end acquires the entity association relationships of the candidate entities. An entity association relationship indicates whether candidate entities are related to one another and is generally expressed in the form of a knowledge graph: if two entities in the knowledge graph are associated, they are connected by an edge; if they are not associated, there is no edge between them. Other forms of representing the entity association relationship also fall within the protection scope of the embodiment of the present invention, which is not specifically limited in this respect.
502. Performing random access processing on the candidate entity based on a random walk model, so that the random access information of the candidate entity converges to a stable distribution;
In the embodiment of the present invention, the current execution end performs random access processing on the candidate entities through a random walk model. For example, the entity association relationships are represented as an undirected graph in the knowledge graph, and an initial value is assigned to each edge of the undirected graph; the initial values of the edges may be equal, that is, all entities are considered equally important at the initial stage. The random walk model then randomly accesses the candidate entities in the undirected graph, that is, it jumps at random between candidate entities during access. The current execution end records the probabilities of the random jumps between candidate entities and calculates PageRank values as the weights of the edges in the undirected graph; once the weight values of all edges have converged to a stable distribution, the random-jump access in the undirected graph stops. The PageRank value is calculated as follows:
where e_{m,i} is any candidate entity i of the entity m to be linked in the input text; PR(e_{m,i}) is the PageRank value of candidate entity i of the entity m to be linked; L(e_m) is the candidate entity set of the entity m to be linked; PR(e) is the PageRank value of a candidate entity e in the set L(e_m); |L(e_m)| is the number of candidate entities in the candidate entity set L(e_m) of the entity m to be linked; and d is the preset damping coefficient.
503. Determining the stability probability value of the candidate entity after convergence to the stable distribution.
In the embodiment of the present invention, the current execution end takes the PageRank value calculated from the random-jump access probabilities, that is, the weight value after convergence to the stable distribution, as the stability probability value of the candidate entity.
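Steps 501 to 503 can be sketched as a power iteration over an undirected graph of candidate entities. The sketch uses the standard PageRank update with a damping coefficient, which may differ in detail from the formula of this embodiment, and it assigns the converged values to nodes (candidate entities). The function name `pagerank`, the default damping value 0.85, the tolerance, and the toy graph are assumptions.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Iterate PageRank over an undirected graph until the values stop changing.

    graph maps each candidate entity to the set of entities it shares an edge
    with; the mapping must be symmetric because the graph is undirected.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # equal importance at the start
    for _ in range(max_iter):
        # Entities without any edge spread their value evenly over all entities.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:  # converged to a stable distribution
            break
    return rank


# Hypothetical candidate graph: e1 is linked to e2 and e3, e4 is isolated.
candidate_graph = {"e1": {"e2", "e3"}, "e2": {"e1"}, "e3": {"e1"}, "e4": set()}
stability = pagerank(candidate_graph)
print(stability)
```

The well-connected candidate ends up with the largest value and the isolated one with the smallest, which is the behaviour the stability probability value is meant to capture.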
Further, as a refinement and extension of the foregoing embodiment, in order to quickly and accurately evaluate a candidate entity, further improve efficiency of entity linking, another entity linking method is provided, where the step of determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value, and the context correlation degree includes:
performing ranking and scoring processing on the information matching degree, the stability probability value and the context correlation degree based on a ranking model to obtain an evaluation score of the candidate entity;
and determining a target candidate entity of the entity to be linked based on the evaluation score, so that the entity to be linked is linked to the target candidate entity.
In the embodiment of the present invention, the current execution end evaluates the candidate entities by taking the information matching degree, the stability probability value and the context correlation degree as the key features for determining the target entity. Specifically, a ranking model is established, the candidate entities are ranked by the ranking model, and the current execution end obtains the evaluation score of each candidate entity from its rank. For example, the first-ranked candidate entity has the highest evaluation score and the last-ranked candidate entity has the lowest; the embodiment of the present invention is not specifically limited in this respect. The ranking model may be an artificial neural network model ANN (Artificial Neural Network), a recurrent neural network model RNN (Recurrent Neural Network), a deep neural network model DNN (Deep Neural Network), or the like; the embodiment of the present invention is not specifically limited in this respect. The current execution end determines the candidate entity with the highest evaluation score as the target candidate entity and links the entity to be linked to the target candidate entity.
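The ranking step can be sketched with a linear scorer in place of the neural ranking models listed above. The sketch assumes that each candidate already has its three feature values; the function name `rank_candidates`, the equal default weights, and the sample values are hypothetical.

```python
def rank_candidates(features, weights=(1.0, 1.0, 1.0)):
    """Score candidates and return (target_candidate, ranking).

    features maps a candidate name to a tuple of
    (information matching degree, stability probability value,
    context correlation degree).
    """
    if not features:
        raise ValueError("at least one candidate entity is required")
    scored = []
    for name, (match, stability, relevance) in features.items():
        score = weights[0] * match + weights[1] * stability + weights[2] * relevance
        scored.append((score, name))
    # Highest score first; ties are broken by name for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    ranking = [name for _, name in scored]
    return ranking[0], ranking


# Hypothetical feature values for three candidates of one entity to be linked.
candidates = {
    "cand_a": (0.9, 0.4, 0.8),
    "cand_b": (0.5, 0.3, 0.2),
    "cand_c": (0.7, 0.3, 0.9),
}
target, ranking = rank_candidates(candidates)
print(target, ranking)
```

A learned ranking model would replace the weighted sum, but the selection of the highest-scoring candidate as the link target stays the same.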
Compared with the prior art, in which candidate entities are ranked based on similarity calculation results, the entity linking method provided by the embodiment of the present invention determines a plurality of candidate entities of an entity to be linked in an input text, the candidate entities being bound with entity description information and entity association relationships; performs text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determines the context correlation degree between the input text and the entity description information; evaluates the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity; and determines a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and links the entity to be linked to a target entity name based on the link evaluation result. The invention not only uses text matching processing to fully mine the information matching degree between the entity to be linked and the candidate entities, but also fully mines the context correlation degree between the entity to be linked and co-occurring entities. In addition, entity consistency features of the entity to be linked are mined based on the association relationships of the candidate entities, so that the accuracy of entity linking is improved on the basis of these fully mined features even when Chinese data resources in the medical field are insufficient.
As an implementation of the method shown in fig. 1, an embodiment of the present invention provides an entity linking device, as shown in fig. 6, where the entity linking device includes:
a candidate entity determining module 61, configured to determine a plurality of candidate entities of an entity to be linked in an input text, where the candidate entities are bound with entity description information and entity association relationships;
a text matching module 62, configured to perform text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and to determine the context correlation degree between the input text and the entity description information;
a stability evaluation module 63, configured to evaluate the stability of the candidate entity based on the entity association relationship, to obtain a stability probability value of the candidate entity;
an entity linking module 64, configured to determine a link evaluation result of the candidate entity based on the information matching degree, the stability probability value, and the context correlation degree, and to link the entity to be linked to a target entity name based on the link evaluation result.
Further, the candidate entity determining module 61 further includes:
an entity identification unit, configured to identify all entities in the input text as the entities to be linked;
a query unit, configured to construct an index between the entity to be linked and entity aliases, and to query a preset knowledge base based on the index to determine a plurality of candidate entities of the entity to be linked.
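The behaviour of the query unit can be sketched as an alias index built over the knowledge base. The dictionary layout of the knowledge base (an `id` mapped to a record with `name` and `aliases`), the case-insensitive lookup, and the function names are assumptions for illustration only.

```python
from collections import defaultdict


def build_alias_index(knowledge_base):
    """Map every lower-cased name or alias to the ids of the entities using it."""
    index = defaultdict(set)
    for entity_id, record in knowledge_base.items():
        for surface in [record["name"], *record.get("aliases", [])]:
            index[surface.lower()].add(entity_id)
    return index


def candidates_for(mention, index):
    """Return the sorted candidate entity ids for one entity to be linked."""
    return sorted(index.get(mention.lower(), set()))


# Hypothetical two-entry knowledge base.
kb = {
    "E1": {"name": "Metformin", "aliases": ["Glucophage"]},
    "E2": {"name": "Metformin hydrochloride tablets", "aliases": ["metformin"]},
}
alias_index = build_alias_index(kb)
print(candidates_for("metformin", alias_index))
```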
Further, the device further comprises:
a word weight determining module, configured to acquire all the entities to be linked in the input text and determine the word frequency of the entities to be linked in the input text;
and to determine the word weight of the entity to be linked in the input text based on the word frequency, so that text matching processing is performed based on the word weight.
Further, the text matching module 62 further includes:
a matching degree determining unit, configured to acquire the entity description information of the candidate entity;
to add the word weight to the input text to obtain a weighted input text;
and to perform text matching processing on the weighted input text and the entity description information based on a text matching model, to obtain the information matching degree between the weighted input text and the entity description information.
Further, the text matching module 62 further includes:
a correlation degree determining unit, configured to encode the input text and the entity description information to obtain coding information representing text semantics;
and to calculate the context correlation degree between the input text and the entity description information based on the coding information.
Further, the stability evaluation module 63 further includes:
an association relationship acquisition unit, configured to acquire the entity association relationship of the candidate entity;
a random access processing unit, configured to perform random access processing on the candidate entity based on a random walk model, so that the random access information of the candidate entity converges to a stable distribution;
a stability probability determining unit, configured to determine the stability probability value of the candidate entity after convergence to the stable distribution.
Further, the entity linking module 64 further includes:
a scoring processing unit, configured to perform ranking and scoring processing on the information matching degree, the stability probability value and the context correlation degree based on a ranking model to obtain an evaluation score of the candidate entity;
an entity linking unit, configured to determine a target candidate entity of the entity to be linked based on the evaluation score, so that the entity to be linked is linked to the target candidate entity.
Compared with the prior art, in which candidate entities are ranked based on similarity calculation results, the entity linking device provided by the embodiment of the present invention determines a plurality of candidate entities of an entity to be linked in an input text, the candidate entities being bound with entity description information and entity association relationships; performs text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determines the context correlation degree between the input text and the entity description information; evaluates the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity; and determines a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and links the entity to be linked to a target entity name based on the link evaluation result. The invention not only uses text matching processing to fully mine the information matching degree between the entity to be linked and the candidate entities, but also fully mines the context correlation degree between the entity to be linked and co-occurring entities. In addition, entity consistency features of the entity to be linked are mined based on the association relationships of the candidate entities, so that the accuracy of entity linking is improved on the basis of these fully mined features even when Chinese data resources in the medical field are insufficient.
According to one embodiment of the present invention, there is provided a storage medium storing at least one executable instruction for performing the entity linking method in any of the above method embodiments.
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computer device.
As shown in FIG. 7, the computer device may include: a processor 702, a communication interface (Communications Interface) 704, a memory 706, and a communication bus 708.
The processor 702, the communication interface 704, and the memory 706 communicate with one another via the communication bus 708.
The communication interface 704 is configured to communicate with network elements of other devices, such as clients or other servers.
The processor 702 is configured to execute the program 710, and may specifically perform the relevant steps of the entity linking method embodiments described above.
In particular, the program 710 may include program code, and the program code includes computer operating instructions.
The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 706 is configured to store the program 710. The memory 706 may include high-speed RAM and may further include non-volatile memory, such as at least one disk memory.
The program 710 may be specifically configured to cause the processor 702 to:
determining a plurality of candidate entities of an entity to be linked in an input text, wherein the candidate entities are bound with entity description information and entity association relations;
performing text matching processing on the input text and the entity description information to obtain the information matching degree between the input text and the entity description information, and determining the context correlation degree between the input text and the entity description information;
evaluating the stability of the candidate entity based on the entity association relationship to obtain a stability probability value of the candidate entity;
and determining a link evaluation result of the candidate entity based on the information matching degree, the stability probability value and the context correlation degree, and linking the entity to be linked to a target entity name based on the link evaluation result.
It will be appreciated by those skilled in the art that the modules or steps of the present invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that shown or described here. They may also be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

CN202310460767.1A | 2023-04-23 | 2023-04-23 | Entity linking method and device, storage medium and computer equipment | Pending | CN116579297A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310460767.1A (CN116579297A (en)) | 2023-04-23 | 2023-04-23 | Entity linking method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310460767.1A (CN116579297A (en)) | 2023-04-23 | 2023-04-23 | Entity linking method and device, storage medium and computer equipment

Publications (1)

Publication Number | Publication Date
CN116579297A | 2023-08-11

Family

ID=87542383

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310460767.1A (Pending; CN116579297A (en)) | Entity linking method and device, storage medium and computer equipment | 2023-04-23 | 2023-04-23

Country Status (1)

Country | Link
CN (1) | CN116579297A (en)


Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
