CN113868406A - Search method, search system, and computer-readable storage medium - Google Patents

Search method, search system, and computer-readable storage medium

Info

Publication number
CN113868406A
Authority
CN
China
Prior art keywords
candidate
prediction
label
tags
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111450667.8A
Other languages
Chinese (zh)
Other versions
CN113868406B (en)
Inventor
余忠庆
冯大辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nocode Tech Co ltd
Original Assignee
Nocode Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nocode Tech Co ltd
Priority to CN202111450667.8A
Publication of CN113868406A
Application granted
Publication of CN113868406B
Active
Anticipated expiration

Abstract

The invention discloses a search method, a search system, and a computer-readable storage medium. The search method comprises the following steps: obtaining a query intention corresponding to a target question; predicting, based on the query intention, the entity tags corresponding to the search result to obtain prediction tags and their corresponding prediction probabilities; retrieving answer documents corresponding to the target question and taking the retrieved answer documents as candidate documents; extracting the entity tags in each candidate document based on the query intention to obtain the associated tags corresponding to that candidate document; determining candidate tags based on the prediction tags and the associated tags, and calculating an importance score for each candidate tag based on the prediction probabilities; and outputting each candidate tag and its corresponding candidate documents based on the importance scores. By fusing the retrieval result with the prediction result, the invention makes the final search result concise and interpretable, and better matched to the user's real intention.

Description

Search method, search system, and computer-readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a search method, a search system, and a computer-readable storage medium.
Background
Current search schemes perform keyword matching against each answer document based on the search content entered by a user, obtain the answer documents related to that content, and output them as the search result;
however, in practical use, the search content entered by a user is often a question rather than a search keyword, and the answer to the question often involves several entities. For example, in the vertical medical field, users often search for a possible disease according to symptoms: for the question "why does my finger hurt", the answer involves several disease entities such as tenosynovitis, gout, and arthritis.
Disclosure of Invention
The invention provides a fusion-based search technique that addresses the shortcomings of existing search schemes, which are poorly suited to question-based searching where the search result involves several different answers.
To solve this technical problem, the invention adopts the following technical solution:
a search method for searching based on a target question and obtaining a corresponding search result, comprising the following steps:
performing intention analysis on the target question to obtain a corresponding query intention, wherein the query intention indicates the entity type corresponding to the search result; for example, when a user wants to know which disease the symptoms may indicate, the query intention is disease prediction and the search result consists of the corresponding disease entities;
predicting, based on the query intention, the entity tags corresponding to the search result to obtain prediction tags and the prediction probability corresponding to each prediction tag, namely performing entity prediction;
retrieving answer documents corresponding to the target question and taking the retrieved answer documents as candidate documents, namely performing document retrieval;
extracting the entity tags in each candidate document based on the query intention to obtain the corresponding associated tags;
determining candidate tags based on the prediction tags and the associated tags, and calculating the importance score of each candidate tag based on the prediction probabilities; that is, every entity tag that is a prediction tag or an associated tag is taken as a candidate tag, and its importance score is calculated from the prediction probability corresponding to that candidate tag;
extracting the candidate documents corresponding to each candidate tag to obtain document lists in one-to-one correspondence with the candidate tags, taking each candidate tag together with its document list as a search result, and outputting the search results based on the importance scores.
Note: the written order of the steps does not represent the order of execution; those skilled in the art will understand that a step can be executed once its execution conditions are met, for example once the data it requires has been obtained.
Existing conventional search methods include a retrieval-based approach and a prediction-based approach:
the retrieval-based approach performs document retrieval: answer documents related to the target question are retrieved and returned to the user; however, the information in the retrieved answer documents is scattered, and the user must summarize and analyze the documents unaided to arrive at an exact answer;
the prediction-based approach performs entity prediction: a preset prediction model outputs the predicted entities corresponding to the search result and their probabilities; however, this approach only provides the user with a prediction result that lacks interpretability, so the user can hardly judge its accuracy;
to address the shortcomings of these two approaches, those skilled in the art usually search by predicting first and retrieving afterwards, that is, the entities corresponding to the search result are predicted first, and the answer documents corresponding to those entities are then extracted and output; however, this scheme depends heavily on the prediction result, and the serial processing accumulates errors, leading to a higher error rate.
In the present application, each candidate tag together with its document list is taken as a search result; compared with existing search schemes, the search results are concise and interpretable, and the output answer documents are all related to the target question, which makes it easier for the user to compare and analyze them.
In the present application, the importance score of each candidate tag is calculated based on the prediction probability and the distribution of the associated tags, so that the retrieval result and the prediction result are combined and the search result is more accurate and stable.
In summary, through the design of the entity tags, the retrieval result (the candidate documents and their associated tags) and the prediction result (the prediction tags and their prediction probabilities) are fused, so that the obtained search results are concise and interpretable, and ranking the search results by the obtained importance scores better matches the user's real intention.
As an implementable embodiment:
taking the candidate tags and the candidate documents as nodes and the relations between candidate tags and candidate documents as edges, an undirected graph is established, that is, each candidate document is connected to each of its associated tags;
the importance score corresponding to each node in the undirected graph is calculated based on the prediction probabilities;
one simple way to calculate the importance score is the formula s = (1 + a) × (n + 1) ÷ (N + 1), wherein s represents the importance score, N represents the total number of associated tags over all candidate documents, n represents the number of times the candidate tag occurs as an associated tag, and a represents the prediction probability corresponding to the candidate tag; that is, based on the distribution of the associated tags, the prediction probability is simply used as a weight on the corresponding candidate tag to calculate its importance score; this simple scheme can correct errors of entity prediction only through the distribution of the associated tags;
for example, when the prediction tags for "why does my finger hurt" are gout, tenosynovitis, and cold, the answer entity arthritis is missing and cold is an incorrectly predicted entity; in this case, the missing entity tag arthritis can be supplemented based on the associated tags of the candidate documents, and the incorrectly predicted entity tag cold can be corrected;
however, the number of associated tags is large, and their distribution is affected by the number of answer documents; for example, if few answer documents concern gout, the final importance score of gout remains low even when its prediction probability is high, so this scheme reduces the accuracy of the search result when the prediction is accurate;
in view of these defects, the present application constructs the undirected graph with the candidate tags and the candidate documents as nodes and propagates weights through the interaction between entities and documents, so that the obtained importance scores are more accurate and stable.
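The simple weighting scheme discussed above can be illustrated with a minimal sketch. It assumes that n is the per-tag occurrence count and N the total number of associated-tag occurrences, which is a reading of the symbol definitions rather than something the text states unambiguously; the function name, the input shapes, and all sample data are invented for illustration.

```python
from collections import Counter

def simple_importance(doc_tags, pred_prob):
    """Score candidate tags with s = (1 + a) * (n + 1) / (N + 1).

    doc_tags:  {doc_id: [associated tags]}  (hypothetical input shape)
    pred_prob: {tag: prediction probability a}; tags never predicted use a = 0
    """
    counts = Counter(tag for tags in doc_tags.values() for tag in tags)
    total = sum(counts.values())              # N: all associated-tag occurrences
    candidates = set(counts) | set(pred_prob)  # union of associated and predicted tags
    return {
        tag: (1 + pred_prob.get(tag, 0.0)) * (counts[tag] + 1) / (total + 1)
        for tag in candidates
    }

# Made-up example: gout is predicted with high probability but has one document.
docs = {
    "d1": ["tenosynovitis"],
    "d2": ["tenosynovitis", "arthritis"],
    "d3": ["arthritis"],
    "d4": ["gout"],
}
probs = {"gout": 0.9, "tenosynovitis": 0.6, "cold": 0.2}
scores = simple_importance(docs, probs)
for tag, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {s:.3f}")
```

With this toy data, arthritis is supplemented from the documents and cold falls to the bottom, but gout ranks below tenosynovitis despite its higher prediction probability, which is the weakness described above.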
Further:
a preset initial value is taken as the initial weight of the node corresponding to each candidate document;
when the prediction probability corresponding to a candidate tag is greater than or equal to a preset probability threshold, the prediction probability is taken as the initial weight of the node corresponding to that candidate tag; otherwise, the preset initial value is taken as the initial weight of that node;
personalized PageRank iteration is performed based on the undirected graph and the initial weight of each node to obtain the importance score corresponding to each node in the undirected graph.
Two weight evaluation algorithms are publicly known: the classical PageRank algorithm and the personalized PageRank algorithm. In the classical PageRank algorithm every node has an equal probability of being randomly visited, whereas the personalized PageRank algorithm allows that probability to be customized per node. Here the prediction probability corresponding to a candidate tag is used as the personalized weight of the corresponding node, so that the retrieval result and the prediction result are effectively fused and the accuracy of the importance scores is improved.
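The graph-based scoring can be sketched as a plain power iteration, shown below under several assumptions: the damping factor 0.85, the preset initial value 0.1, the probability threshold 0.5, the handling of isolated nodes, and the "tag:"/"doc:" node naming are all illustrative choices that the text does not specify.

```python
def personalized_pagerank(edges, init_weight, damping=0.85, iters=100):
    """Power-iteration personalized PageRank on an undirected graph.

    edges:       iterable of (node, node) pairs, treated as undirected
    init_weight: {node: initial weight}, normalized into the restart distribution
    """
    neighbors = {node: set() for node in init_weight}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = sum(init_weight.values())
    restart = {n: w / total for n, w in init_weight.items()}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in rank}
        for n, nbrs in neighbors.items():
            if nbrs:
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    new[m] += share
            else:
                # Isolated node (a predicted tag with no document): hand its
                # mass back through the restart distribution (an assumption).
                for m in new:
                    new[m] += damping * rank[n] * restart[m]
        rank = new
    return rank

PRESET, THRESHOLD = 0.1, 0.5   # hypothetical preset value and probability threshold
docs = {"d1": ["tenosynovitis"], "d2": ["tenosynovitis", "arthritis"],
        "d3": ["arthritis"], "d4": ["gout"]}
probs = {"gout": 0.9, "tenosynovitis": 0.6, "cold": 0.2}

edges = [("tag:" + t, "doc:" + d) for d, tags in docs.items() for t in tags]
init = {"doc:" + d: PRESET for d in docs}
for tag in sorted({t for tags in docs.values() for t in tags} | set(probs)):
    p = probs.get(tag, 0.0)
    init["tag:" + tag] = p if p >= THRESHOLD else PRESET

scores = personalized_pagerank(edges, init)
tag_scores = {n[4:]: s for n, s in scores.items() if n.startswith("tag:")}
for tag, s in sorted(tag_scores.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {s:.3f}")
```

In this toy graph the high-probability tag gout comes out on top even though only one document mentions it, while the unsupported, low-probability tag cold stays at the bottom.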
Furthermore, the candidate documents in each document list are arranged in descending order of importance score, and the search results are arranged in descending order of the importance scores of their candidate tags.
The retrieved answer documents are thus grouped and ranked based on the prediction of the search result, so that the search result is more comprehensive and better matches the user's real intention.
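The ordering described above might be assembled as in the following sketch. The function name, the result dictionary layout, and the scores are hypothetical; the scores are made-up numbers standing in for the output of the importance calculation.

```python
def build_results(tag_scores, doc_scores, doc_tags):
    """Group candidate documents under each candidate tag and sort both levels."""
    results = []
    # Search results: candidate tags in descending order of importance score.
    for tag in sorted(tag_scores, key=tag_scores.get, reverse=True):
        # Document list: documents carrying this tag, also in descending order.
        doc_list = sorted(
            (d for d, tags in doc_tags.items() if tag in tags),
            key=doc_scores.get,
            reverse=True,
        )
        results.append({"tag": tag, "score": tag_scores[tag], "documents": doc_list})
    return results

doc_tags = {"d1": ["tenosynovitis"], "d2": ["tenosynovitis", "arthritis"], "d4": ["gout"]}
tag_scores = {"gout": 0.26, "tenosynovitis": 0.16, "arthritis": 0.10}  # made-up values
doc_scores = {"d1": 0.07, "d2": 0.11, "d4": 0.23}                      # made-up values
results = build_results(tag_scores, doc_scores, doc_tags)
for r in results:
    print(r["tag"], r["documents"])
```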
As an implementable embodiment:
calling a preset multi-label classification model based on the query intention;
the input of the multi-label classification model is the target question, and the output is a plurality of prediction tags and the prediction probability corresponding to each prediction tag.
In the present application, the query intentions are classified by the entity type of the search result, the mapped entities are configured for each query intention, and a corresponding multi-label classification model is constructed for each query intention, which improves the accuracy of the prediction result.
Further:
the multi-label classification model comprises a plurality of binary classifiers in one-to-one correspondence with the entity tags mapped to the query intention; the input of each binary classifier is the target question, and its output is the probability that the corresponding entity tag belongs to the search result.
In the present application, each entity tag has an independent binary classifier, which further improves the accuracy of the prediction result; in subsequent maintenance, binary classifiers only need to be added or removed as required, which keeps maintenance convenient.
Further, the present application is applicable to medical vertical search scenarios:
the query intentions comprise disease prediction (the entity type corresponding to the search result is a disease), department prediction, examination prediction, and medicine prediction;
the preset multi-label classification models comprise a disease prediction model, a department prediction model, an examination prediction model, and a medicine prediction model.
The present application is also applicable to other question-search scenarios in which the entities corresponding to the answers to all questions can be enumerated, for example a plant disease and insect pest search scenario.
As an implementable embodiment:
acquiring a tag set corresponding to each candidate document, wherein the tag set comprises a plurality of entity tags;
and extracting entity tags mapped with the query intention from each tag set to obtain associated tags corresponding to the candidate documents.
Further:
an entity library is preset, wherein the entity library comprises a plurality of entity sub-libraries, each entity sub-library comprises a plurality of entities, and the entity sub-libraries are in one-to-one correspondence with the query intentions;
an answer document is obtained, and entity recognition is performed on it based on the entities in the entity library to obtain the entity tags corresponding to the answer document, which form the tag set of that answer document.
In the present application, entity recognition is performed on every answer document in advance to obtain its entity tags, so entity recognition does not need to be performed on each candidate document during the search, which reduces the computational load.
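Because the tag sets are prepared in advance, obtaining the associated tags at search time reduces to a filter over each tag set. The sketch below assumes a tag is a dictionary with entity_type and entity_name fields and that each query intention maps to exactly one entity type; the mapping table and function name are hypothetical.

```python
# Hypothetical mapping from prediction-class query intention to entity type.
INTENT_TO_ENTITY_TYPE = {
    "disease prediction": "disease",
    "department prediction": "department",
    "examination prediction": "examination",
    "medicine prediction": "medicine",
}

def associated_tags(tag_set, query_intent):
    """Keep only the entity tags whose type is mapped to the query intention."""
    wanted = INTENT_TO_ENTITY_TYPE[query_intent]
    return [tag["entity_name"] for tag in tag_set if tag["entity_type"] == wanted]

tag_set = [
    {"entity_id": "entity_1001", "entity_type": "department",
     "entity_name": "respiratory medicine"},
    {"entity_id": "entity_2003", "entity_type": "disease",
     "entity_name": "upper respiratory tract infection"},
]
print(associated_tags(tag_set, "department prediction"))
print(associated_tags(tag_set, "disease prediction"))
```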
The invention also provides a search system for searching based on a target question to obtain a corresponding search result, comprising:
an intention recognition module, used for performing intention analysis on the target question to obtain a corresponding query intention;
a retrieval module, used for retrieving answer documents corresponding to the target question, taking the retrieved answer documents as candidate documents, and extracting the entity tags in each candidate document based on the query intention to obtain the associated tags corresponding to the candidate documents;
a prediction module, used for predicting, based on the query intention, the entity tags corresponding to the search result to obtain prediction tags and the prediction probability corresponding to each prediction tag;
and a fusion ranking module, used for determining candidate tags based on the prediction tags and the associated tags, calculating the importance score of each candidate tag based on the prediction probabilities, extracting the candidate documents corresponding to each candidate tag to obtain document lists in one-to-one correspondence with the candidate tags, taking each candidate tag together with its document list as a search result, and outputting the search results based on the importance scores.
The invention also proposes a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of any of the methods described above.
Due to the adoption of the technical scheme, the invention has the remarkable technical effects that:
in the present invention, the answer entities, namely the candidate tags, are determined from the associated tags and the prediction tags, and the importance score of each candidate tag is calculated based on the prediction probability, so that the document retrieval result and the entity prediction result are fused; the obtained search results are concise and interpretable, and ranking them by the obtained importance scores better matches the user's real intention.
In the present invention, each candidate tag together with its document list is taken as a search result; compared with existing search schemes, the search results are concise and interpretable, and the output answer documents are all related to the target question, which makes it easier for the user to compare and analyze the search results.
In the present invention, an undirected graph is constructed with the candidate tags and the candidate documents as nodes, and the prediction probability is used as the initial weight of the node corresponding to each candidate tag when calculating the importance score of each node, so that the retrieval result and the prediction result are effectively fused and the final search result is more accurate and stable.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a search method of the present invention;
FIG. 2 is a schematic diagram of construction of an undirected graph;
FIG. 3 is a block diagram of a search system according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples, which are illustrative of the present invention and are not to be construed as being limited thereto.
Embodiment 1: a search method for retrieving answers to prediction-type medical questions, comprising the following steps:
S100, constructing an intention recognition model, specifically comprising the following steps:
S110, configuring the types of query intention;
the number and categories of the query intentions can be configured by those skilled in the art according to actual needs; in this embodiment, the query intentions are divided into a query class and a prediction class, wherein the prediction class comprises four types, namely disease prediction, department prediction, examination prediction, and medicine prediction.
S120, constructing and training an intention recognition model;
intention recognition methods generally fall into two kinds, namely template-based intention recognition and machine-learning-based intention recognition;
template-based intention recognition is easy to get started with and has high accuracy, but its coverage and generalization capability are low.
Machine-learning-based intention recognition has high coverage and generalization capability.
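For contrast with the machine-learning route, template-based intention recognition can be pictured as a small ordered list of patterns. The sketch below is purely illustrative: the English patterns, the intent names as string constants, and the fallback to a "query" class are assumptions, not templates given in this document.

```python
import re

# Hypothetical templates, checked in order; the first match wins.
TEMPLATES = [
    (re.compile(r"\b(which|what) department\b", re.I), "department prediction"),
    (re.compile(r"\bwhat (medicine|drug)\b", re.I), "medicine prediction"),
    (re.compile(r"\bwhat (test|examination)\b", re.I), "examination prediction"),
    (re.compile(r"\bwhy\b|\bwhat causes\b", re.I), "disease prediction"),
]

def recognize_intent(question, default="query"):
    """Return the first matching prediction-class intent, else the query class."""
    for pattern, intent in TEMPLATES:
        if pattern.search(question):
            return intent
    return default

print(recognize_intent("Why does my finger hurt?"))
print(recognize_intent("Which department should I visit for a cough?"))
print(recognize_intent("aspirin"))
```

Any question phrased outside the listed patterns falls through to the default, which is the low coverage that motivates a learned classifier.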
The intention recognition problem in this embodiment can be defined as a text classification problem and can be implemented with a publicly available classification model such as logistic regression, FastText, or TextCNN.
In this embodiment, historical query questions are obtained as sample questions, and the sample questions are labeled with the query intentions configured in step S110 to obtain first training data;
a FastText model is trained on the first training data to obtain the corresponding intention recognition model;
the input of the resulting intention recognition model is a target question, and its output is a query intention.
Note: since the intention recognition model is a FastText model and the first training data consists of historical query questions labeled with query intentions, those skilled in the art can obtain the intention recognition model of this embodiment with existing published training methods; the training steps are therefore not described in detail here.
S200, data preparation:
S210, configuring an entity library:
the entity library comprises a plurality of entity sub-libraries, each entity sub-library comprises a plurality of entities, and the entity sub-libraries are in one-to-one correspondence with the prediction-class query intentions; for example, the entity sub-library corresponding to disease prediction comprises a plurality of disease entities.
S220, document acquisition and preprocessing:
document data for the medical industry is crawled with a web crawler to obtain original documents;
the original documents are preprocessed to obtain answer documents with tag sets, wherein each tag set comprises at least one entity tag, and the preprocessing comprises title extraction, text extraction, data cleaning, tag extraction, entity recognition, and the like.
Since the entity types in this embodiment comprise diseases, medicines, examinations, and departments, and the entities of each type can be enumerated, this embodiment uses dictionary matching to extract the entities in each answer document, obtain the corresponding entity tags, and form the tag set corresponding to the answer document;
in this embodiment, the retrieval result and the prediction result are fused through the entities, so the tag set of each answer document is extracted offline in advance to reduce the online computational load during subsequent searches.
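Dictionary matching of the kind described can be sketched as a substring scan over the entity library. The library contents, the function name, and the case-insensitive substring test are illustrative assumptions; a production system would more likely use a trie or Aho-Corasick matcher and handle overlapping entity names.

```python
# Hypothetical entity library: entity type -> enumerated entity names.
ENTITY_LIBRARY = {
    "disease": ["upper respiratory tract infection", "gout", "arthritis"],
    "department": ["respiratory medicine"],
}

def extract_tag_set(text, library=ENTITY_LIBRARY):
    """Offline step: tag a document with every library entity it mentions."""
    lowered = text.lower()
    return [
        {"entity_type": entity_type, "entity_name": name}
        for entity_type, names in library.items()
        for name in names
        if name in lowered
    ]

content = ("Cough may be due to upper respiratory tract infection, "
           "can go to respiratory medicine for a visit")
tags = extract_tag_set(content)
for tag in tags:
    print(tag["entity_type"], "->", tag["entity_name"])
```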
S300, constructing a document index:
document index construction belongs to the prior art; a person skilled in the art can construct a forward index, an inverted index, and/or a vector index according to actual needs, and this embodiment does not specifically limit the choice;
S310, text indexing, comprising a forward index and an inverted index:
the forward index takes the document ID as the key, so that all field values of a document can be quickly found through its document ID;
the inverted index is used to quickly find the list of documents containing a given value in a given field; an inverted index can be created for the fields that need to be retrieved, such as the title or the body text of the document, and building it enables fast retrieval;
Elasticsearch can be used to construct the forward index and the inverted index; Elasticsearch is an open-source distributed search engine that can quickly index and retrieve massive amounts of text.
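The difference between the two text indexes can be shown with a toy in-memory version. This is not how Elasticsearch stores data internally; the whitespace tokenizer, the (field, token) key layout, and the sample documents are assumptions made only to show which lookup each index serves.

```python
from collections import defaultdict

def build_indexes(documents, fields=("title", "content")):
    """Build a forward index (doc ID -> document) and an inverted index
    ((field, token) -> set of doc IDs) over the given fields."""
    forward = {doc["doc_id"]: doc for doc in documents}
    inverted = defaultdict(set)
    for doc in documents:
        for field in fields:
            for token in doc[field].lower().split():
                inverted[(field, token.strip(".,;?!"))].add(doc["doc_id"])
    return forward, inverted

documents = [
    {"doc_id": "1000", "title": "Which department for a cough",
     "content": "A cough may be due to upper respiratory tract infection."},
    {"doc_id": "1001", "title": "Finger pain",
     "content": "Finger pain may be caused by gout or arthritis."},
]
forward, inverted = build_indexes(documents)
print(forward["1001"]["title"])               # forward: ID -> all field values
print(sorted(inverted[("content", "gout")]))  # inverted: field value -> doc IDs
```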
S320, vector indexing, namely vectorizing the title and/or the body text of each document and adding the vectors to the established index.
S400, constructing multi-label classification models in one-to-one correspondence with the prediction-class query intentions;
the query intentions are divided into a query class and a prediction class, and the search content entered by a user may be search keywords or a question; when search keywords are entered, for example when the user looks up information on a specific disease, medicine, or hospital, entity prediction is not needed and only the query result needs to be output; when a question is entered, entity prediction needs to be performed based on the question, for example, for a question about finger pain, the diseases that may cause the pain need to be predicted;
in this embodiment, the prediction class has four types, namely disease prediction, department prediction, examination prediction, and medicine prediction, so four corresponding multi-label classification models need to be trained, namely a disease prediction model, a department prediction model, an examination prediction model, and a medicine prediction model. The construction method specifically comprises the following steps:
S410, constructing binary classifiers in one-to-one correspondence with the entities:
the sample questions of step S120 are obtained and labeled with the entities in the entity library configured in step S210 to obtain second training data, wherein the second training data comprises each sample question and at least one entity tag corresponding to it; for example, the entity tags corresponding to the question "why does my finger hurt" are tenosynovitis, gout, and arthritis;
the entity corresponding to the binary classifier to be trained is obtained as the target entity;
third training data is extracted from the second training data based on the target entity, wherein the third training data comprises sample questions and the entity tag corresponding to the target entity; for example, when training the binary classifier corresponding to gout, the sample questions whose entity tags contain gout are extracted and labeled with gout to obtain the corresponding third training data;
the binary classifier to be trained is trained on the third training data to obtain the corresponding binary classifier.
Note: in this embodiment, the input of a binary classifier is a target question, and its output is the probability that the search result corresponding to the target question involves the target entity.
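Deriving per-entity training data from the multi-label annotations can be sketched as follows. The text only describes extracting the questions labeled with the target entity; treating the remaining questions as negative examples is an assumption added here so that the binary classifier has both classes to learn from, and the sample questions are invented.

```python
def third_training_data(second_training_data, target_entity):
    """Turn multi-label samples into binary samples for one target entity.

    second_training_data: list of (question, [entity tags]) pairs
    Returns (question, 1) when the target entity is among the tags, else
    (question, 0); using the rest as negatives is an assumption.
    """
    return [
        (question, 1 if target_entity in tags else 0)
        for question, tags in second_training_data
    ]

second = [
    ("why does my finger hurt", ["tenosynovitis", "gout", "arthritis"]),
    ("why does my big toe swell at night", ["gout"]),
    ("why do I keep coughing", ["upper respiratory tract infection"]),
]
gout_data = third_training_data(second, "gout")
for question, label in gout_data:
    print(label, question)
```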
S420, constructing a multi-label classification model:
the trained binary classifiers are grouped, and the binary classifiers whose target entities belong to the same entity sub-library are placed in the same group to obtain the corresponding multi-label classification model; since the entity sub-libraries are in one-to-one correspondence with the prediction-class query intentions, the resulting multi-label classification models are also in one-to-one correspondence with the prediction-class query intentions;
a disease prediction model, a department prediction model, an examination prediction model, and a medicine prediction model are thereby obtained.
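The grouping step can be pictured as follows. The class and function names are hypothetical, and the keyword-based callables are placeholders standing in for trained binary classifiers; only the grouping by entity sub-library follows the description above.

```python
class MultiLabelModel:
    """A group of binary classifiers sharing one entity sub-library."""

    def __init__(self, classifiers):
        self.classifiers = classifiers   # {entity: callable(question) -> probability}

    def predict(self, question):
        return {entity: clf(question) for entity, clf in self.classifiers.items()}

def group_classifiers(classifiers, entity_library):
    """Build one multi-label model per query intention.

    entity_library: {query intention: [entities in its sub-library]}
    """
    return {
        intent: MultiLabelModel({e: classifiers[e] for e in entities if e in classifiers})
        for intent, entities in entity_library.items()
    }

def keyword_classifier(keyword, hit=0.8, miss=0.1):
    """Placeholder for a trained binary classifier (illustration only)."""
    return lambda question: hit if keyword in question.lower() else miss

entity_library = {
    "disease prediction": ["gout", "arthritis"],
    "department prediction": ["respiratory medicine"],
}
classifiers = {
    "gout": keyword_classifier("toe"),
    "arthritis": keyword_classifier("joint"),
    "respiratory medicine": keyword_classifier("cough"),
}
models = group_classifiers(classifiers, entity_library)
print(models["disease prediction"].predict("Why does my toe joint hurt?"))
```

Adding or removing an entity then only touches one entry of the classifier table, which reflects the maintenance benefit of keeping the classifiers independent.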
S500, searching based on the target question to obtain a corresponding search result; with reference to FIG. 1, this specifically comprises the following steps:
S510, performing intention analysis on the target question to obtain a corresponding query intention;
the target question is obtained and input into the intention recognition model constructed in step S100, and the intention recognition model outputs the corresponding query intention;
when the query intention belongs to the query class, step S530 is performed; when the query intention belongs to the prediction class, step S520 and step S530 are performed.
Note: when only prediction-class query intentions are configured in step S110, that is, the intention recognition model outputs only prediction-class query intentions, steps S510 and S530 are performed directly after the target question is acquired.
S520, predicting, based on the query intention, the entity tags corresponding to the search result to obtain prediction tags and the prediction probability corresponding to each prediction tag;
S521, calling the corresponding multi-label classification model based on the query intention;
the multi-label classification model is one of those constructed in S400;
for example, if the query intention is disease prediction, the disease prediction model is called.
S522, inputting the target question into the multi-label classification model called in step S521 to obtain the prediction tags and the prediction probability corresponding to each prediction tag, and then executing step S550.
The target question is input into each corresponding binary classifier, and the prediction tags and their prediction probabilities are output based on the classification results of the binary classifiers, wherein each binary classifier predicts the probability that its entity is an entity corresponding to the search result;
according to actual needs, those skilled in the art can take every entity tag mapped to the query intention as a prediction tag, or take only the entity tags predicted to correspond to the search result as prediction tags;
taking every entity tag mapped to the query intention as a prediction tag means that the entity tag corresponding to each binary classifier is taken as a prediction tag and the predicted probability, a value between 0 and 1, is taken as its prediction probability; for example, if the binary classifier corresponding to cold outputs 0.2 for the input question "why does my finger hurt", that is, the probability that the finger pain is caused by a cold is 0.2, then under this scheme cold is taken as a prediction tag with a prediction probability of 0.2.
Taking only the entity tags predicted to correspond to the search result as prediction tags means that a prediction tag is generated for the entity corresponding to a binary classifier only when the predicted probability is greater than or equal to a preset confidence threshold, and that probability is taken as the prediction probability; for example, when the confidence threshold is set to 0.5, the probability of a cold causing finger pain is predicted to be 0.2, and the probability of gout causing finger pain is predicted to be 0.6, this scheme takes only gout as a prediction tag.
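The two ways of producing prediction tags differ only in whether a confidence threshold is applied, as the following minimal sketch shows. The function name and the use of None to select the first scheme are illustrative choices; the probabilities reuse the cold and gout figures from the example above.

```python
def prediction_labels(probabilities, confidence_threshold=None):
    """Select prediction tags from per-entity classifier outputs.

    confidence_threshold=None: every mapped entity becomes a prediction tag.
    Otherwise only entities with probability >= threshold are kept.
    """
    if confidence_threshold is None:
        return dict(probabilities)
    return {
        entity: p for entity, p in probabilities.items()
        if p >= confidence_threshold
    }

outputs = {"cold": 0.2, "gout": 0.6}
print(prediction_labels(outputs))        # scheme 1: cold and gout are both kept
print(prediction_labels(outputs, 0.5))   # scheme 2: only gout passes the threshold
```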
S530, retrieving answer documents corresponding to the target questions, taking the retrieved answer documents as candidate documents, and then executing a step S540;
those skilled in the art can retrieve according to the document index constructed in advance;
in this embodiment, for the text index built on Elasticsearch, queries are written in the DSL (Domain Specific Language) provided by Elasticsearch, with the query statement expressed in JSON format; for example, to search for "how to cough" in the title and the body text, the corresponding query statement is {"query": {"multi_match": {"query": "how to cough", "fields": ["title", "content"]}}}, and the answer documents with the highest relevance scores are taken as candidate documents.
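The query statement above can be built programmatically before it is sent to the search engine. The sketch below only constructs and serializes the request body; the helper name and the choice of size=10 as the number of candidate documents are assumptions, and sending the body to an Elasticsearch cluster (for example through the official client) is deliberately left out.

```python
import json

def build_multi_match(question, fields=("title", "content"), size=10):
    """Build a multi_match request body; `size` caps the number of hits."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": question,
                "fields": list(fields),
            }
        },
    }

body = build_multi_match("how to cough")
print(json.dumps(body, ensure_ascii=False))
```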
In this embodiment, for a vector index built on Faiss, queries are issued through the query interface provided by Faiss: the target question is first vectorized, the search function provided by Faiss is then called to perform vector retrieval, and finally the several answer documents with the highest relevance scores are taken as candidate documents.
Faiss (Facebook AI Similarity Search) is an open-source similarity search library.
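The vector retrieval step can be pictured with the following pure-Python stand-in. It is not Faiss itself: it performs a brute-force cosine-similarity search over a handful of vectors, which is the kind of exact nearest-neighbour result a flat index would return. The vectors, document IDs, and the function name `top_k` are all hypothetical, and the embedding model that would vectorize the question is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    q = normalize(query_vec)
    scored = [(doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical document embeddings keyed by document ID
doc_vectors = {"1000": [0.9, 0.1, 0.0], "1001": [0.0, 1.0, 0.0], "1002": [0.7, 0.7, 0.0]}
hits = top_k([1.0, 0.0, 0.0], doc_vectors, k=2)
print(hits)
```

In a real deployment the loop would be replaced by the index's own search call, which is what makes retrieval over large collections practical.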
Note: when the query intention is of the query class, retrieval is performed with the retrieval method disclosed in this step based on the query content input by the user, the obtained candidate documents are taken as the search results, and they are output in descending order of their relevance scores.
S540, extracting entity tags in each candidate document based on the query intention to obtain associated tags corresponding to the candidate documents;
S541, acquiring a label set corresponding to each candidate document, wherein the label set comprises a plurality of entity labels;
S542, extracting the entity tags mapped to the query intention from each tag set to obtain the associated tags corresponding to the candidate documents, and then executing step S550;
this step removes the entity tags in the tag set that are irrelevant to the target question. In this embodiment, each entity tag includes an entity ID, an entity type, and an entity name, and the entity tags mapped to the query intention are extracted based on the entity type, for example:
{"doc_id": "1000",
 "title": "Which department should I visit for a cough",
 "content": "A cough may be caused by an upper respiratory tract infection; you can visit the respiratory medicine department",
 "entities": [
   {"entity_id": "entity_1001", "entity_type": "department", "entity_name": "respiratory medicine"},
   {"entity_id": "entity_2003", "entity_type": "disease", "entity_name": "upper respiratory tract infection"}]}
where doc_id is the document ID, title is the target question, content is the answer document, entities is the tag set, and entity_id, entity_type, and entity_name are the entity ID, entity type, and entity name;
since the user's query intention is department prediction, the entity tag whose entity type is department is taken as the associated tag.
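The filtering in S542 can be sketched as a lookup from query intention to entity type followed by a filter over the tag set. The mapping `INTENT_TO_ENTITY_TYPE` is an assumption: the type names "department" and "disease" come from the example above, while "examination" and "medicine" are guessed names for the other two intentions.

```python
from typing import Any, Dict, List

# Assumed mapping from query intention to entity type
INTENT_TO_ENTITY_TYPE = {
    "department prediction": "department",
    "disease prediction": "disease",
    "examination prediction": "examination",  # assumed type name
    "medicine prediction": "medicine",        # assumed type name
}


def extract_associated_tags(document: Dict[str, Any], query_intent: str) -> List[Dict[str, str]]:
    """Keep only the entity tags whose type matches the query intention."""
    wanted_type = INTENT_TO_ENTITY_TYPE[query_intent]
    return [tag for tag in document["entities"] if tag["entity_type"] == wanted_type]


candidate_document = {
    "doc_id": "1000",
    "title": "Which department should I visit for a cough",
    "content": "A cough may be caused by an upper respiratory tract infection; "
               "you can visit the respiratory medicine department",
    "entities": [
        {"entity_id": "entity_1001", "entity_type": "department",
         "entity_name": "respiratory medicine"},
        {"entity_id": "entity_2003", "entity_type": "disease",
         "entity_name": "upper respiratory tract infection"},
    ],
}

associated_tags = extract_associated_tags(candidate_document, "department prediction")
print(associated_tags)
```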
S550, determining candidate tags based on the predicted tags and the associated tags, and calculating the importance scores of the candidate tags based on the predicted probability, specifically:
S551, taking each entity label that is a prediction label or an associated label as a candidate label;
S552, taking the candidate labels and the candidate documents as nodes, taking the relation between a candidate label and a candidate document as an edge, and establishing an undirected graph;
the retrieval result includes the candidate documents and the associated tags corresponding to each candidate document; as shown in Fig. 2, a document ID represents a candidate document, and an entity ID indicates an associated tag of that candidate document. The prediction result includes the prediction tags and the prediction probability corresponding to each prediction tag; as shown in Fig. 2, an entity ID indicates a prediction tag;
referring to Fig. 2, the candidate labels and the candidate documents are all taken as nodes, and each node corresponding to a candidate document is connected to the nodes corresponding to its associated labels to obtain the undirected graph.
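A minimal sketch of the graph construction in S552, using an adjacency-set dictionary. The input shapes (`doc_tags` mapping document IDs to associated tag IDs, `label_probs` mapping predicted tag IDs to probabilities) and all IDs are hypothetical. A label that was predicted but appears in no retrieved document is kept as a node without edges, which is one reading of the description rather than something the text states outright.

```python
from typing import Dict, Iterable, Set


def build_undirected_graph(doc_tags: Dict[str, Iterable[str]],
                           label_probs: Dict[str, float]) -> Dict[str, Set[str]]:
    """Connect every candidate document to its associated labels in both directions."""
    graph: Dict[str, Set[str]] = {}
    for doc_id, tag_ids in doc_tags.items():
        graph.setdefault(doc_id, set())
        for tag_id in tag_ids:
            graph[doc_id].add(tag_id)
            graph.setdefault(tag_id, set()).add(doc_id)
    for tag_id in label_probs:
        # Predicted labels with no supporting document stay as isolated nodes.
        graph.setdefault(tag_id, set())
    return graph


retrieval_result = {"doc_1": ["entity_a"], "doc_2": ["entity_a", "entity_b"]}
prediction_result = {"entity_b": 0.8, "entity_c": 0.9}
graph = build_undirected_graph(retrieval_result, prediction_result)
print(graph)
```

Distinct prefixes for document and entity IDs keep the two node kinds from colliding in a single dictionary.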
S553, calculating the importance degree score corresponding to each node in the undirected graph based on the prediction probability, and then executing the step S560;
taking a preset initial value as the initial weight of the node corresponding to each candidate document, wherein the initial value is set to 0.5 in the embodiment;
comparing the prediction probability corresponding to the candidate label with a preset probability threshold, when the prediction probability corresponding to the candidate label is greater than or equal to the preset probability threshold, taking the prediction probability as the initial weight of the node corresponding to the candidate label, otherwise, taking a preset initial value as the initial weight of the node corresponding to the candidate label;
when the confidence threshold set in S522 and the probability threshold are the same, the threshold comparison has already been performed in step S522; therefore, when a candidate tag has a corresponding prediction probability, that prediction probability is used directly as the initial weight of the corresponding node, and when a candidate tag has no corresponding prediction probability, the initial value is used as the initial weight of the node corresponding to the candidate tag;
in this embodiment, both the confidence threshold and the probability threshold are 0.75;
and carrying out personalized PageRank iterative computation based on the undirected graph and the initial weight of each node in the undirected graph to obtain the importance score corresponding to each node in the undirected graph.
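The weighting and ranking in S553 might look like the sketch below. Several points are assumptions rather than details given in the text: the initial weights are normalized and used as the personalization (teleport) distribution, the damping factor is 0.85, the iteration count is fixed at 50, and nodes without edges return their score through the teleport distribution. The graph and probabilities are hypothetical.

```python
from typing import Dict, Set


def initial_weights(graph: Dict[str, Set[str]],
                    label_probs: Dict[str, float],
                    probability_threshold: float = 0.75,
                    initial_value: float = 0.5) -> Dict[str, float]:
    """Use the prediction probability for confident labels and the preset value elsewhere."""
    weights = {}
    for node in graph:
        prob = label_probs.get(node)  # document nodes never have a probability
        if prob is not None and prob >= probability_threshold:
            weights[node] = prob
        else:
            weights[node] = initial_value
    return weights


def personalized_pagerank(graph: Dict[str, Set[str]],
                          weights: Dict[str, float],
                          damping: float = 0.85,
                          iterations: int = 50) -> Dict[str, float]:
    """Power iteration with the normalized initial weights as the teleport distribution."""
    total = sum(weights.values())
    teleport = {node: w / total for node, w in weights.items()}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {node: (1 - damping) * teleport[node] for node in graph}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for nb in neighbours:
                    new_rank[nb] += share
            else:
                # Isolated node: hand its score back through the teleport distribution.
                for target in graph:
                    new_rank[target] += damping * rank[node] * teleport[target]
        rank = new_rank
    return rank


graph = {"doc_1": {"entity_a"}, "entity_a": {"doc_1"},
         "doc_2": {"entity_b"}, "entity_b": {"doc_2"}}
weights = initial_weights(graph, {"entity_a": 0.9})
scores = personalized_pagerank(graph, weights)
print(scores)
```

In this toy graph the two labels are structurally identical, so the higher score of `entity_a` comes entirely from its larger initial weight, which is the effect the fusion step relies on.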
S560, extracting the candidate documents corresponding to each candidate label to obtain document lists in one-to-one correspondence with the candidate labels, taking each candidate label together with its document list as a search result, and outputting the search results based on the importance scores.
The document list is obtained by acquiring the candidate document nodes connected to a candidate label node and extracting the corresponding texts based on the document IDs of those candidate document nodes.
In this embodiment, the candidate documents in each document list are arranged in descending order of importance score, and the search results are arranged in descending order of the importance scores of their candidate tags. The search result corresponding to the target question "Why does my finger hurt?" is shown in the following table:
TABLE 1
[Table 1: image in the original publication]
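The grouping and ordering performed in S560 can be sketched as follows. The scores, node IDs, and the result layout (a list of dictionaries) are hypothetical, and ties are broken by ID only to make the output deterministic.

```python
from typing import Dict, Iterable, List, Set


def build_search_results(graph: Dict[str, Set[str]],
                         scores: Dict[str, float],
                         candidate_labels: Iterable[str]) -> List[dict]:
    """Group documents under each candidate label and sort both levels by score."""
    results = []
    for label in candidate_labels:
        documents = sorted(graph.get(label, ()), key=lambda d: (-scores[d], d))
        results.append({"label": label, "score": scores[label], "documents": documents})
    results.sort(key=lambda r: (-r["score"], r["label"]))
    return results


graph = {"entity_a": {"doc_1", "doc_2"}, "entity_b": {"doc_2"},
         "doc_1": {"entity_a"}, "doc_2": {"entity_a", "entity_b"}}
scores = {"entity_a": 0.30, "entity_b": 0.10, "doc_1": 0.20, "doc_2": 0.40}
results = build_search_results(graph, scores, ["entity_b", "entity_a"])
for entry in results:
    print(entry["label"], entry["documents"])
```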
Embodiment 2, a search system configured to perform a search based on a target question and obtain corresponding search results, as shown in Fig. 3, includes:
an intention identification module 100, configured to perform intention analysis on the target question to obtain a corresponding query intention;
a retrieval module 200, configured to retrieve answer documents corresponding to the target question, use the retrieved answer documents as candidate documents, and extract entity tags in each candidate document based on the query intention to obtain associated tags corresponding to the candidate documents;
a prediction module 300, configured to predict an entity tag corresponding to the search result based on the query intention, obtain a prediction tag, and obtain a prediction probability corresponding to the prediction tag;
a fusion sorting module 400, configured to determine candidate tags based on the prediction tags and the associated tags, calculate importance scores of the candidate tags based on the prediction probabilities, extract the candidate documents corresponding to each candidate tag to obtain document lists in one-to-one correspondence with the candidate tags, take each candidate tag together with its document list as a search result, and output the search results based on the importance scores.
In this embodiment, the fusion sorting module 400 includes a fusion unit and a sorting unit;
the fusion unit comprises a construction unit and a weight calculation unit;
the construction unit is configured to take the candidate labels and the candidate documents as nodes, take the relation between a candidate label and a candidate document as an edge, and establish an undirected graph;
the weight calculation unit is configured to calculate the importance score corresponding to each node in the undirected graph based on the prediction probability;
the sorting unit is configured to sort the candidate documents in each document list in descending order of importance score and to sort the search results in descending order of the importance scores of the candidate tags.
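How the four modules hand data to one another can be sketched as a single orchestration function. Everything here is illustrative: the function `run_search`, the callable signatures, and the stub implementations are assumptions that only show the order of the calls, not how any module works internally.

```python
from typing import Callable, Dict, List


def run_search(question: str,
               identify_intent: Callable[[str], str],
               retrieve: Callable[[str, str], Dict[str, List[str]]],
               predict: Callable[[str, str], Dict[str, float]],
               fuse_and_sort: Callable[[Dict[str, List[str]], Dict[str, float]], list]) -> list:
    """Call the modules in the order described: intent, retrieval, prediction, fusion."""
    intent = identify_intent(question)        # intention identification module 100
    doc_tags = retrieve(question, intent)     # retrieval module 200
    label_probs = predict(question, intent)   # prediction module 300
    return fuse_and_sort(doc_tags, label_probs)  # fusion sorting module 400


# Stub modules used only to exercise the wiring
results = run_search(
    "Why does my finger hurt?",
    identify_intent=lambda q: "disease prediction",
    retrieve=lambda q, intent: {"doc_1": ["gout"]},
    predict=lambda q, intent: {"gout": 0.6},
    fuse_and_sort=lambda doc_tags, probs: sorted(
        set(probs) | {tag for tags in doc_tags.values() for tag in tags}),
)
print(results)
```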
In this embodiment, the system further comprises a database;
the database comprises a text library and an entity library;
the text library comprises a plurality of answer texts and also comprises a label set corresponding to each answer text;
the entity library comprises a plurality of entity sub-libraries, each entity sub-library comprises a plurality of entities, and the entity sub-libraries correspond to the query intents one by one.
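One possible in-memory layout for the database described above is sketched with dataclasses. The class and field names are assumptions chosen to mirror the text (a text library of answer texts with their label sets, and an entity library holding one sub-library per query intention); the specification does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    entity_name: str


@dataclass
class AnswerText:
    doc_id: str
    title: str
    content: str
    entities: List[Entity] = field(default_factory=list)  # label set of this text


@dataclass
class Database:
    # Text library: document ID -> answer text with its label set
    text_library: Dict[str, AnswerText] = field(default_factory=dict)
    # Entity library: query intention -> entity sub-library (entity ID -> entity)
    entity_library: Dict[str, Dict[str, Entity]] = field(default_factory=dict)


respiratory = Entity("entity_1001", "department", "respiratory medicine")
db = Database()
db.text_library["1000"] = AnswerText(
    doc_id="1000",
    title="Which department should I visit for a cough",
    content="A cough may be caused by an upper respiratory tract infection",
    entities=[respiratory],
)
db.entity_library["department prediction"] = {respiratory.entity_id: respiratory}
print(db.entity_library["department prediction"]["entity_1001"].entity_name)
```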
Embodiment 3 is a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of embodiment 1.
Because the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in the present specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that:
reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. Thus, the appearances of the phrase "one embodiment" or "an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
All equivalent or simple changes of the structure, the characteristics and the principle of the invention which are described in the patent conception of the invention are included in the protection scope of the patent of the invention. Various modifications, additions and substitutions for the specific embodiments described may be made by those skilled in the art without departing from the scope of the invention as defined in the accompanying claims.

Claims (10)

1. A search method for performing a search based on a target question and obtaining corresponding search results, characterized by comprising the following steps:
performing intention analysis on the target question to obtain a corresponding query intention;
predicting an entity label corresponding to the search result based on the query intention, obtaining a prediction label, and obtaining a prediction probability corresponding to the prediction label;
searching answer documents corresponding to the target questions, and taking the searched answer documents as candidate documents;
extracting entity tags in each candidate document based on the query intention to obtain associated tags corresponding to the candidate documents;
determining candidate labels based on the predicted labels and the associated labels, and calculating the importance degree score of each candidate label based on the predicted probability;
extracting candidate documents corresponding to the candidate labels, obtaining document lists corresponding to the candidate labels one by one, taking the corresponding candidate labels and the document lists as search results, and outputting the search results based on the importance scores.
2. The searching method according to claim 1, wherein the step of calculating the importance score of each candidate tag based on the predicted probability comprises:
taking the candidate label and the candidate document as nodes, taking the relation between the candidate label and the candidate document as an edge, and establishing an undirected graph;
and calculating the importance degree score corresponding to each node in the undirected graph based on the prediction probability.
3. The searching method according to claim 2, wherein the specific step of calculating the importance score corresponding to each node in the undirected graph based on the prediction probability comprises:
taking a preset initial value as the initial weight of the node corresponding to each candidate document;
when the prediction probability corresponding to the candidate label is greater than or equal to a preset probability threshold value, taking the prediction probability as the initial weight of the node corresponding to the candidate label, otherwise, taking a preset initial value as the initial weight of the node corresponding to the candidate label;
and carrying out personalized PageRank iterative computation based on the undirected graph and the initial weight of each node in the undirected graph to obtain the importance score corresponding to each node in the undirected graph.
4. The search method according to claim 2 or 3, wherein the specific step of outputting each search result based on the importance score is:
and enabling the candidate documents in the document list to be arranged in a descending order according to the importance degree scores, and enabling the search results to be arranged in a descending order according to the importance degree scores of the candidate tags.
5. The search method according to any one of claims 1 to 3, characterized in that:
calling a preset multi-label classification model based on the query intention;
the input of the multi-label classification model is the target problem, and the output is a plurality of prediction labels and the prediction probability corresponding to each prediction label.
6. The search method according to claim 5, wherein:
the multi-label classification model comprises a plurality of binary classifiers, and the binary classifiers are in one-to-one correspondence with the entity labels corresponding to the query intention;
the input of each binary classifier is the target question, and its output is the probability that the corresponding entity label belongs to the search result.
7. The search method according to claim 5, wherein:
the query intention comprises disease prediction, department prediction, examination prediction and medicine prediction;
the preset multi-label classification model comprises a disease prediction model, a department prediction model, an inspection prediction model and a medicine prediction model.
8. The search method according to any one of claims 1 to 3, characterized in that:
acquiring a tag set corresponding to each candidate document, wherein the tag set comprises a plurality of entity tags;
and extracting entity tags mapped with the query intention from each tag set to obtain associated tags corresponding to the candidate documents.
9. A search system for searching based on a target problem to obtain a corresponding search result, comprising:
the intention identification module is used for carrying out intention analysis on the target problem to obtain a corresponding query intention;
the retrieval module is used for retrieving answer documents corresponding to the target questions, taking the retrieved answer documents as candidate documents, and extracting entity tags in the candidate documents based on the query intention to obtain associated tags corresponding to the candidate documents;
the prediction module is used for predicting the entity label corresponding to the search result based on the query intention, obtaining a prediction label and obtaining a prediction probability corresponding to the prediction label;
and the fusion sorting module is used for determining candidate tags based on the prediction tags and the associated tags, calculating the importance scores of the candidate tags based on the prediction probability, extracting candidate documents corresponding to the candidate tags, obtaining document lists corresponding to the candidate tags one by one, taking the corresponding candidate tags and the document lists as search results, and outputting the search results based on the importance scores.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
Application CN202111450667.8A (priority date 2021-12-01, filing date 2021-12-01): Search method, search system, and computer-readable storage medium; status: Active; published as CN113868406B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111450667.8A (CN113868406B (en)) | 2021-12-01 | 2021-12-01 | Search method, search system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111450667.8A (CN113868406B (en)) | 2021-12-01 | 2021-12-01 | Search method, search system, and computer-readable storage medium

Publications (2)

Publication Number | Publication Date
CN113868406A (en) | 2021-12-31
CN113868406B (en) | 2022-03-11

Family

ID=78985321

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111450667.8A (Active, CN113868406B (en)) | Search method, search system, and computer-readable storage medium | 2021-12-01 | 2021-12-01

Country Status (1)

Country | Link
CN (1) | CN113868406B (en)

Also Published As

Publication Number | Publication Date
CN113868406B (en) | 2022-03-11

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
