CN116049376B - Method, device and system for retrieving and replying information and creating knowledge - Google Patents

Method, device and system for retrieving and replying information and creating knowledge

Info

Publication number
CN116049376B
Authority
CN
China
Prior art keywords
question
target
model
similarity
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310330518.0A
Other languages
Chinese (zh)
Other versions
CN116049376A (en)
Inventor
杨家豪 (Yang Jiahao)
张洪明 (Zhang Hongming)
陈小鹏 (Chen Xiaopeng)
黄平 (Huang Ping)
李翠芳 (Li Cuifang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Taiji Information System Technology Co ltd
Original Assignee
Beijing Taiji Information System Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Taiji Information System Technology Co., Ltd.
Priority to CN202310330518.0A
Publication of CN116049376A
Application granted
Publication of CN116049376B
Legal status: Active (current)
Anticipated expiration

Abstract

The invention discloses a method, device and system for Xinchuang (IT application innovation) knowledge retrieval and reply, belonging to the field of intelligent question answering. A user question and its category are obtained, and the text similarity between the question and the saved questions of the same category is calculated; if the text similarity with any target question among the saved questions is greater than a preset text similarity, the reply is obtained from the answer corresponding to that target question, with no further processing needed, which greatly speeds up replies. If no such saved question exists, articles of the same category are retrieved to reduce the amount of screening; the paragraphs of those articles are then filtered by relevance score to obtain target paragraphs, reducing the number of paragraphs fed into the answer model; the target paragraphs and the question are input into the answer model to obtain candidate answers, and the reply is obtained from the candidate answers. The scheme of the application ensures that the user receives a quick and accurate reply under all conditions.

Description

Method, device and system for retrieving and replying information and creating knowledge
Technical Field
The invention relates to the field of intelligent question answering, in particular to a method, device and system for Xinchuang knowledge retrieval and reply.
Background
In recent years, Xinchuang (IT application innovation) engineering has gradually spread into finance, telecommunications, energy, transportation and other fields, and Xinchuang is becoming the norm across industries. Having been proven in real projects, Xinchuang vendors have greatly improved the performance of their products, which have developed from merely 'usable' to 'easy to use'. Vendors keep launching new Xinchuang products to meet the diversified needs of users, product versions iterate ever faster, and all kinds of Xinchuang knowledge grows explosively, so that Xinchuang practitioners cannot obtain relevant solutions quickly and accurately when searching.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method, device and system for Xinchuang knowledge retrieval and reply, aiming to solve the problem that Xinchuang practitioners cannot obtain relevant solutions quickly and accurately when searching.
The technical scheme adopted for solving the technical problems is as follows:
In a first aspect, a method for Xinchuang knowledge retrieval and reply is provided, including the following steps:
acquiring a user question and the category of the question;
calculating the text similarity between the question and the saved questions in the category;
if none of the saved questions has a text similarity greater than a preset text similarity, acquiring articles related to the category of the question, the articles being stored in a Xinchuang knowledge base by category in advance;
calculating a relevance score between each paragraph of each article and the question;
taking the paragraphs whose relevance scores are greater than a preset score as target paragraphs;
inputting the question and the target paragraphs into a pre-constructed answer model to obtain candidate answers;
obtaining a reply to the question from the candidate answers; or, if the text similarity with any target question among the saved questions is greater than the preset text similarity, obtaining the reply to the question from the answer corresponding to that target question.
Further, the obtaining of the category of the question includes:
performing word segmentation on the question through a pre-built dictionary to obtain a word segmentation result and parts of speech;
if the word segmentation result contains an unregistered word, i.e. a word not recorded in the dictionary, calculating the semantic similarity between the unregistered word and the dictionary words having the same part of speech, taking the word with the highest semantic similarity as a target word, and replacing the unregistered word in the question with the target word to obtain a target question; if all words in the word segmentation result are in the dictionary, obtaining the target question directly from the word segmentation result;
inputting the target question into a pre-trained classification model to obtain the probability of the target question belonging to each preset category;
and taking the preset category with the highest probability as the category of the question.
Further, the obtaining of the reply to the question from the answer corresponding to the target question includes:
taking the answer corresponding to the target question with the highest text similarity as the reply to the question.
Further, the calculating of a relevance score between each paragraph of each article and the question includes:
calculating the similarity between the paragraph and the question with the BM25 algorithm, and taking this similarity as the relevance score of the paragraph with respect to the question.
Further, training the pre-constructed answer model includes:
inputting training questions together with first training paragraphs that contain the answers to those questions into the answer model to be trained, so as to obtain an intermediate answer model;
acquiring second training paragraphs that do not contain the answers to the training questions;
and inputting the second training paragraphs and the training questions into the intermediate answer model for further training to obtain the final answer model.
Further, the obtaining of a reply to the question from the candidate answers includes:
calculating the similarity between each candidate answer and the question;
and taking the candidate answer with the highest similarity as the reply to the question.
Alternatively, the obtaining of a reply to the question from the candidate answers includes:
extracting features from each candidate answer to obtain answer features;
grouping similar answer features into an answer feature set;
and taking the candidate answer that contains the largest number of features from the answer feature set as the reply to the question.
Further, the method further comprises:
receiving the user's feedback on the reply, the feedback being either good or bad;
when the feedback is good and the question is not a saved question, saving the question and the reply; when the feedback is bad and the question is a saved question, multiplying the text similarity between the question and the reply by a preset weight coefficient, the weight coefficient being smaller than 1 and positively correlated with the feedback ratio, where the feedback ratio is the number of times the feedback on this question and reply is good divided by the total number of feedback occurrences; and when the feedback is bad and the question is not a saved question, adjusting the parameters of the answer model according to the question and the reply.
In a second aspect, an apparatus for Xinchuang knowledge retrieval and reply is provided, including:
a question category acquisition module, used for acquiring a user question and the category of the question;
a similarity calculation module, used for calculating the text similarity between the question and the saved questions in the category;
a related article acquisition module, used for acquiring articles related to the category of the question if none of the saved questions has a text similarity greater than a preset text similarity, the articles being stored in a Xinchuang knowledge base by category in advance;
a relevance score calculation module, used for calculating a relevance score between each paragraph of each article and the question;
a target paragraph acquisition module, used for taking paragraphs whose relevance scores are greater than a preset score as target paragraphs;
a candidate answer acquisition module, used for inputting the question and the target paragraphs into a pre-constructed answer model to obtain candidate answers;
a question reply acquisition module, used for obtaining a reply to the question from the candidate answers, or, if the text similarity with any target question among the saved questions is greater than the preset text similarity, obtaining the reply to the question from the answer corresponding to that target question.
In a third aspect, a system for Xinchuang knowledge retrieval and reply is provided, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured for performing the method of any of the first aspect solutions.
The beneficial effects are as follows:
The technical scheme of the application provides a method, device and system for Xinchuang knowledge retrieval and reply. First, a user question and its category are obtained; the text similarity between the question and the saved questions of the same category is then calculated, and if the text similarity with any target question among the saved questions is greater than the preset text similarity, the reply is obtained from the answer corresponding to that target question, so no subsequent operations are needed and the reply speed is greatly increased. If no such saved question exists, articles of the same category are retrieved to reduce the amount of screening; the paragraphs of those articles are filtered by relevance score to obtain target paragraphs, reducing the number of paragraphs fed into the answer model; the target paragraphs and the question are then input into the answer model to obtain candidate answers, and finally the reply is obtained from the candidate answers. If the text similarity is high, the answer corresponding to the saved question is used directly as the reply; if it is low, the reply is obtained from the articles, ensuring that the user receives a quick and accurate reply under all conditions.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for Xinchuang knowledge retrieval and reply provided by an embodiment of the invention;
FIG. 2 is a schematic structural diagram of an apparatus for Xinchuang knowledge retrieval and reply according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a system for Xinchuang knowledge retrieval and reply according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present invention will be described in detail with reference to the accompanying drawings and examples. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of protection of the present application.
First embodiment: referring to FIG. 1, an embodiment of the present invention provides a method for Xinchuang knowledge retrieval and reply, including the following steps:
S11: acquiring a user question and the category of the question;
S12: calculating the text similarity between the question and the saved questions in the category;
S13: if none of the saved questions has a text similarity greater than the preset text similarity, acquiring articles related to the category of the question, the articles being stored in a Xinchuang knowledge base by category in advance;
S14: calculating a relevance score between each paragraph of each article and the question;
S15: taking paragraphs whose relevance scores are greater than a preset score as target paragraphs;
S16: inputting the question and the target paragraphs into a pre-constructed answer model to obtain candidate answers;
S17: obtaining a reply to the question from the candidate answers;
S18: if the text similarity with any target question among the saved questions is greater than the preset text similarity, obtaining the reply to the question from the answer corresponding to that target question.
The method for Xinchuang knowledge retrieval and reply provided by the embodiment of the invention first obtains a user question and its category; it then calculates the text similarity between the question and the saved questions of the same category, and if the text similarity with any target question among the saved questions is greater than the preset text similarity, the reply is obtained from the answer corresponding to that target question, so no subsequent operations are needed and the reply speed is greatly increased. If no such saved question exists, articles of the same category are retrieved to reduce the amount of screening; the paragraphs of those articles are filtered by relevance score to obtain target paragraphs, reducing the number of paragraphs fed into the answer model; the target paragraphs and the question are then input into the answer model to obtain candidate answers, and finally the reply is obtained from the candidate answers. If the text similarity is high, the answer corresponding to the saved question is used directly as the reply; if it is low, the reply is obtained from the articles, ensuring that the user receives a quick and accurate reply under all conditions.
Second embodiment: as a supplementary explanation of the first embodiment, the present invention provides a method for Xinchuang knowledge retrieval and reply, including the following steps:
Acquiring a user question and the category of the question. Obtaining the category of the question includes: performing word segmentation on the question through a pre-built dictionary to obtain a word segmentation result and parts of speech; if the word segmentation result contains an unregistered word, i.e. a word not recorded in the dictionary, calculating the semantic similarity between the unregistered word and the dictionary words having the same part of speech, and taking the word with the highest semantic similarity as a target word; the unregistered word in the question is then replaced with the target word to obtain a target question. The dictionary corresponds to a pre-trained classification model, so after replacement every word in the target question is recorded in the dictionary and can be recognized by the classification model, which makes classification more accurate. If all words in the word segmentation result are already in the dictionary, the target question is obtained directly from the word segmentation result. The target question is input into the pre-trained classification model to obtain the probability of the target question belonging to each preset category, and the preset category with the highest probability is taken as the category of the question.
Inputting the target question into the pre-trained classification model to obtain the probability of the target question belonging to each preset category works as follows. The classification model extracts the feature words of the target question as input feature words. It calculates the semantic similarity between the first input feature word and all model feature words of any model question, and takes the model feature word with the highest semantic similarity as the target model feature word corresponding to that input feature word; it then calculates the semantic similarity between the next input feature word and the remaining model feature words of the model question, the remaining model feature words being those left after removing the target model feature words already matched. If the number of input feature words of the target question is greater than the number of model feature words of the model question, the similarities between all input feature words and their corresponding target model feature words are summed, and the sum is divided by the number of input feature words to obtain the target similarity between the target question and the model question; if the number of input feature words is smaller than the number of model feature words, the similarities are summed in the same way and the sum is divided by the number of model feature words. The target similarity is the probability of the target question belonging to the preset category corresponding to the model question. The number of feature words enters the calculation to handle cases where a model question contains the feature words of the target question but the actual meaning differs, for example when the target question is "what is the A standard" and the model question is "when was the A standard issued".
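As an illustration, a minimal Python sketch of this greedy feature-word matching is given below. The word-level similarity function, the example feature words and the division by the larger word count are assumptions used only to make the sketch runnable; any word-embedding similarity could stand in for word_similarity().

```python
import difflib

def word_similarity(w1: str, w2: str) -> float:
    # Placeholder semantic similarity; a real system would use word vectors.
    return difflib.SequenceMatcher(None, w1, w2).ratio()

def target_similarity(input_words: list[str], model_words: list[str]) -> float:
    """Greedily match each input feature word to its most similar unused model feature word."""
    remaining = list(model_words)
    total = 0.0
    for w in input_words:
        if not remaining:
            break
        best = max(remaining, key=lambda m: word_similarity(w, m))
        total += word_similarity(w, best)
        remaining.remove(best)
    # Divide by the larger word count so that extra or missing feature words lower the score.
    return total / max(len(input_words), len(model_words))

# Hypothetical example: "what is the A standard" vs "when was the A standard issued".
print(target_similarity(["A标准", "是什么"], ["A标准", "什么时候", "发布"]))
```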
Calculating the text similarity between the question and the saved questions in the category: because the category of the question has already been determined by classification, the amount of computation needed to compare it with the saved questions is greatly reduced. The text similarity is calculated as follows. The TF-IDF method is used to vectorize the text, converting it into a representation a computer can process. TF-IDF is short for Term Frequency-Inverse Document Frequency and is a common weighting technique in information retrieval and data mining. TF, the term frequency, counts how often a term occurs in a text. Terms that occur frequently in all texts are stop words, and stop words should not receive a high weight for any text; the weight here is the inverse document frequency (IDF), which is inversely related to how frequently a term occurs across documents. The TF-IDF value is the term frequency multiplied by the inverse document frequency: the more indispensable a term is to an article, the higher its TF-IDF value.
After the TF-IDF value of each term has been calculated, cosine similarity is computed. When there are K terms in total, each saved question and the user question are represented by K-dimensional vectors, where the value in each dimension is the number of occurrences of the term in the user question or the saved question multiplied by the IDF value of that term. The cosine of the angle between the vector of any saved question and the vector of the user question is calculated and taken as their text similarity.
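A minimal sketch of this step is shown below, assuming scikit-learn for TF-IDF and cosine similarity and jieba for Chinese tokenization; the sample questions are hypothetical.

```python
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

saved_questions = ["如何在国产操作系统上安装达梦数据库", "GB/T 39788标准适用于哪些软件"]
user_question = "国产操作系统怎么安装达梦数据库"

# Tokenize Chinese text with jieba so that each term becomes one TF-IDF dimension.
vectorizer = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
matrix = vectorizer.fit_transform(saved_questions + [user_question])

# Cosine similarity between the user question (last row) and every saved question.
query_vector = matrix[len(saved_questions)]
similarities = cosine_similarity(query_vector, matrix[:len(saved_questions)]).ravel()
best = similarities.argmax()
print(saved_questions[best], similarities[best])
```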
If the text similarity with any target question among the saved questions is greater than the preset text similarity, the reply is obtained from the answer corresponding to that target question; specifically, the answer corresponding to the target question with the highest text similarity is taken as the reply to the question.
If none of the saved questions has a text similarity greater than the preset text similarity, articles related to the category of the question are acquired; the articles are stored in the Xinchuang knowledge base by category in advance. A relevance score between each paragraph of each article and the question is calculated; specifically, the similarity between the paragraph and the question is computed with the BM25 algorithm and used as the relevance score. Paragraphs whose relevance scores are greater than a preset score are taken as target paragraphs, the question and the target paragraphs are input into the pre-constructed answer model to obtain candidate answers, and the reply to the question is obtained from the candidate answers. Filtering articles by category reduces the amount of paragraph-question similarity computation, and further filtering paragraphs by relevance score reduces the number of paragraphs the answer model has to process, which speeds up the reply.
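A minimal sketch of the BM25 paragraph-scoring step is shown below, assuming the rank_bm25 package and jieba tokenization; the paragraphs and the score threshold are hypothetical.

```python
import jieba
from rank_bm25 import BM25Okapi

paragraphs = [
    "达梦数据库在麒麟操作系统上的安装需要先配置依赖库。",
    "本章介绍信创适配的总体流程。",
]
question = "如何在麒麟系统上安装达梦数据库"

bm25 = BM25Okapi([jieba.lcut(p) for p in paragraphs])
scores = bm25.get_scores(jieba.lcut(question))

PRESET_SCORE = 1.0  # assumed threshold
target_paragraphs = [p for p, s in zip(paragraphs, scores) if s > PRESET_SCORE]
print(target_paragraphs)
```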
It should be noted that the answer model is constructed as follows: training questions together with first training paragraphs that contain the answers to those questions are input into the answer model to be trained, yielding an intermediate answer model. If training stopped here, the model would produce an answer for every input paragraph; in practice, however, a target paragraph may be related to the question without actually containing the answer, for example a paragraph that merely raises or describes the question. To make the answer model more practical, after the intermediate answer model is obtained, second training paragraphs that do not contain the answers to the training questions are acquired, and the second training paragraphs and the training questions are input into the intermediate answer model for further training to obtain the final answer model. Trained on the second training paragraphs, the final answer model can extract the answer accurately when the input paragraph contains one and refrains from giving an answer when it does not.
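A minimal sketch of this two-stage training schedule is given below, assuming the Hugging Face transformers library and a BERT-style extractive QA model; the patent does not name a concrete model, and the example texts are hypothetical. Stage 2 labels unanswerable examples with the [CLS] position so the final model learns to abstain.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-chinese")

def encode(question, paragraph, answer_start=None, answer_text=None):
    """Encode one example; unanswerable examples label both positions as 0 ([CLS])."""
    enc = tokenizer(question, paragraph, truncation=True, max_length=384, return_tensors="pt")
    start_tok = end_tok = 0
    if answer_text is not None:
        start_tok = enc.char_to_token(answer_start, sequence_index=1) or 0
        end_tok = enc.char_to_token(answer_start + len(answer_text) - 1, sequence_index=1) or 0
    enc["start_positions"] = torch.tensor([start_tok])
    enc["end_positions"] = torch.tensor([end_tok])
    return enc

def train_stage(examples, epochs=1, lr=3e-5):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for ex in examples:
            loss = model(**ex).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Stage 1: first training paragraphs that contain the answer -> intermediate answer model.
para = "A标准是一项信创软件测试标准。"
answer = "一项信创软件测试标准"
train_stage([encode("A标准是什么", para, para.index(answer), answer)])

# Stage 2: related paragraphs without the answer -> final answer model that can abstain.
train_stage([encode("A标准是什么", "本段仅介绍A标准的发布背景。")])
```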
As an optional implementation of the embodiment of the invention, obtaining the reply from the candidate answers includes: calculating the similarity between each candidate answer and the question, and taking the candidate answer with the highest similarity as the reply to the question; that is, the candidate answer most similar to the question is selected from all candidate answers as the reply.
As another optional implementation of the embodiment of the invention, obtaining the reply from the candidate answers includes: extracting features from each candidate answer to obtain answer features; grouping similar answer features into an answer feature set; and taking the candidate answer that contains the largest number of features from the answer feature set as the reply to the question. In this way a candidate answer is still selected as the reply, but instead of computing its similarity with the question, the features of all candidate answers are extracted to form a set of answer features related to the question, and the candidate answer covering the most features in that set is chosen, so that the reply covers the user's question more completely.
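A minimal sketch of this feature-counting selection is shown below, assuming jieba's TF-IDF keyword extractor as the answer-feature extractor; the candidate answers are hypothetical, and similar features are grouped here simply by string identity.

```python
import jieba.analyse

candidates = [
    "先安装依赖库，再运行达梦数据库安装脚本，最后初始化实例。",
    "运行安装脚本即可。",
]

# Extract answer features (keywords) from every candidate answer.
features_per_candidate = [set(jieba.analyse.extract_tags(c, topK=10)) for c in candidates]

# The answer feature set: the union of all features across the candidates.
feature_set = set().union(*features_per_candidate)

# Pick the candidate covering the largest number of features from the set.
best = max(range(len(candidates)), key=lambda i: len(features_per_candidate[i] & feature_set))
print(candidates[best])
```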
Preferably, the method further comprises: receiving the user's feedback on the reply, the feedback being either good or bad. When the feedback is good and the question is not a saved question, the question and the reply are saved. When the feedback is bad and the question is a saved question, the text similarity between the question and the reply is multiplied by a preset weight coefficient; the weight coefficient is smaller than 1 and positively correlated with the feedback ratio, where the feedback ratio is the number of times the feedback on this question and reply is good divided by the total number of feedback occurrences. When the feedback is bad and the question is not a saved question, the parameters of the answer model are adjusted according to the question and the reply.
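A small sketch of these feedback rules follows; the knowledge-base record layout, the concrete weight formula and adjust_answer_model() are assumptions used only for illustration.

```python
def handle_feedback(kb: dict, question: str, reply: str, feedback_good: bool) -> None:
    record = kb.get(question)
    if feedback_good and record is None:
        kb[question] = {"reply": reply, "similarity": 1.0, "good": 1, "total": 1}
    elif feedback_good:
        record["good"] += 1
        record["total"] += 1
    elif record is not None:                       # bad feedback on a saved question
        record["total"] += 1
        ratio = record["good"] / record["total"]   # share of good feedback, now < 1
        weight = 0.5 + 0.5 * ratio                 # assumed formula: < 1, grows with the ratio
        record["similarity"] *= weight             # demote the stored question-reply pairing
    else:                                          # bad feedback on an unsaved question
        adjust_answer_model(question, reply)

def adjust_answer_model(question: str, reply: str) -> None:
    pass  # placeholder: collect the error sample to fine-tune the answer model later
```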
The Xinchuang knowledge retrieval and reply method provided by the embodiment of the invention analyzes the question and selects different solution methods, so answers can be generated quickly from existing resources and the knowledge base, knowledge of the multidimensional Xinchuang adaptation field can be presented, and the error-prone, inefficient retrieval caused by lack of data, knowledge and experience is alleviated. It provides knowledge question answering, knowledge retrieval and knowledge reference replies to Xinchuang-related personnel quickly and accurately.
Third embodiment: the present invention provides an apparatus for Xinchuang knowledge retrieval and reply, as shown in FIG. 2, including:
a question category acquisition module 21, used for acquiring a user question and the category of the question;
a similarity calculation module 22, used for calculating the text similarity between the question and the saved questions in the category;
a related article acquisition module 23, used for acquiring articles related to the category of the question if none of the saved questions has a text similarity greater than the preset text similarity, the articles being stored in the Xinchuang knowledge base by category in advance;
a relevance score calculation module 24, used for calculating a relevance score between each paragraph of each article and the question;
a target paragraph acquisition module 25, used for taking paragraphs whose relevance scores are greater than a preset score as target paragraphs;
a candidate answer acquisition module 26, used for inputting the question and the target paragraphs into the pre-constructed answer model to obtain candidate answers;
a question reply acquisition module 27, used for obtaining a reply to the question from the candidate answers, or, if the text similarity with any target question among the saved questions is greater than the preset text similarity, obtaining the reply from the answer corresponding to that target question.
In the apparatus for Xinchuang knowledge retrieval and reply provided by the embodiment of the invention, the question category acquisition module acquires a user question and its category; the similarity calculation module calculates the text similarity between the question and the saved questions in the category; if none of the saved questions has a text similarity greater than the preset text similarity, the related article acquisition module acquires articles related to the category of the question, the articles being stored in the Xinchuang knowledge base by category in advance; the relevance score calculation module calculates the relevance score between each paragraph of each article and the question; the target paragraph acquisition module takes paragraphs whose relevance scores are greater than a preset score as target paragraphs; the candidate answer acquisition module inputs the question and the target paragraphs into the pre-constructed answer model to obtain candidate answers; and the question reply acquisition module obtains the reply from the candidate answers, or, if the text similarity with any target question among the saved questions is greater than the preset text similarity, obtains the reply from the answer corresponding to that target question. If the text similarity is high, the answer corresponding to the saved question is used directly as the reply; if it is low, the reply is obtained from the articles, ensuring that the user receives a quick and accurate reply under all conditions.
Fourth embodiment: the present invention provides a system for Xinchuang knowledge retrieval and reply, comprising:
a processor;
a memory for storing processor-executable instructions;
the processor is configured to perform the method for Xinchuang knowledge retrieval and reply of the first embodiment or the second embodiment.
Specifically, as shown in FIG. 3, the system comprises four modules: an intelligent question-answering robot, a question understanding module, a question solving module and a Xinchuang knowledge base.
1) Xinchuang knowledge base: the knowledge base comprises four parts: knowledge collection, document management, knowledge graph and expert system.
Knowledge collection uses non-automatic knowledge acquisition to gather documents such as standards, outlines, requirement analyses, document reviews, static analyses, code reviews and dynamic test results; it also adopts automatic knowledge acquisition, generalizing new knowledge from the knowledge base system, discovering errors in the knowledge and perfecting it. Machine learning can also be applied during the operation of the knowledge base through learning algorithms to discover new knowledge, which is then automatically incorporated into the knowledge base.
Document management addresses the sparsity of document data: related information is often distributed across multiple documents, and accessing and retrieving document content still requires a great deal of manual intervention, which makes document analysis inefficient. An ontology is used to map the text of a document to concepts, relations and instances, annotations are added to the document, and a multi-layer document management model is established for document management, so as to improve the understandability and accessibility of the data in the document, allowing the document content to be understood and operated on by a computer and enabling functions such as complex search and document interoperation. The general layer mainly contains the standards followed by Xinchuang software, such as GB/T 39788-2021, GB/T 38639-2020 and GB/T 38634-2020, and represents the common attributes and structure of software documents, providing guidance for adding annotations. The domain-specific layer edits different application scenarios based on different professional domains, and further represents the proprietary concepts, attributes and relations of different software application domains such as operating systems, databases, middleware, cloud platforms and office software, serving as annotation labels for semantic documents. Based on the ontology model of the Xinchuang domain, a document template with metadata labels is developed; the labels define the document as an operational unit for identification, structuring and information interaction, semantic documents are generated, and the life cycle of software documents is effectively controlled and managed.
The knowledge graph uses knowledge extraction and association techniques to accurately collect and deeply mine Xinchuang knowledge and construct Xinchuang knowledge triples, which mainly take two forms: (entity, attribute, attribute value) and (entity, relation, entity). An (entity, attribute, attribute value) triple is usually one-to-one, while an (entity, relation, entity) triple is usually one-to-many. Entities and attributes are extracted using both types of triples, association relations are built through semantic mining and knowledge reasoning, and the triples are stored in a graph database. A graph database is a database that can efficiently process complex relational networks and is implemented based on the ideas and algorithms of graph theory. In the graph, nodes represent entities, relations between entities are represented as directed edges, and every node and relation has attributes. A graph can be composed of sub-graphs, lists or other entities, and nodes can take any structure, connected by relations. The constructed Xinchuang knowledge graph can present knowledge visually; furthermore, knowledge reasoning over the relations can uncover knowledge hidden beneath the surface.
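A minimal sketch of storing the two triple types in a property graph is given below, using networkx as a stand-in for the graph database; the example entities and relations are hypothetical.

```python
import networkx as nx

g = nx.MultiDiGraph()

# (entity, attribute, attribute value) triples: stored as node attributes (one-to-one).
g.add_node("达梦数据库", category="国产数据库", version="DM8")

# (entity, relation, entity) triples: stored as directed, labelled edges (one-to-many).
g.add_edge("达梦数据库", "麒麟操作系统", relation="适配")
g.add_edge("达梦数据库", "统信UOS", relation="适配")

# Simple lookup over the graph: which systems does the entity adapt to?
adapted = [v for _, v, d in g.out_edges("达梦数据库", data=True) if d["relation"] == "适配"]
print(adapted)
```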
The expert system serves as a paid module of the platform and handles the difficult problems users encounter during the process. Cooperative relationships have been established with evaluation institutions such as the National Industrial Information Security Development Research Center, the China Electronic Information Industry Development Research Institute and the CEC Taiji evaluation and certification center, so that expert resources are shared. For user problems that cannot be solved accurately within the platform, the user can communicate with an expert online, obtain a corresponding solution, and evaluate the solution given by the expert.
2) Question understanding:
Word segmentation and part-of-speech tagging: the user question is first parsed to generate the word segmentation and part-of-speech tagging results of the sentence, which are input into the question template. The system uses jieba to perform word segmentation and part-of-speech tagging on the natural language input by the user. jieba is a Python-based Chinese word segmentation component consisting of a series of models and algorithms; it mainly uses a set of default dictionaries and user-defined dictionaries to analyze and process the natural language input by the user. By analyzing the input sentence with the jieba Chinese word segmentation component, the word segmentation result and the parts of speech of the sentence are obtained. When the jieba component decomposes the user's natural language input, the system automatically analyzes the complex components of the sentence so that the program of the next stage can run normally.
Question classification: the target question is input into the pre-trained classification model to obtain the probability of the target question belonging to each preset category, and the preset category with the highest probability is taken as the category of the question. Guided by the classification model, the system analyzes the correspondence between the sentence input by the user and the model questions, infers the user's intent, and classifies the user's question.
Solving-method selection: the solving method is chosen by calculating text similarity; the higher the similarity, the more likely the user's question matches a question-answer pair stored in the knowledge base. Specifically, the TF-IDF method is used to vectorize the text and convert it into a representation a computer can process. TF-IDF is short for Term Frequency-Inverse Document Frequency and is a common weighting technique in information retrieval and data mining. TF, the term frequency, counts how often a term occurs in a text. Terms that occur frequently in all texts are stop words and should not receive a high weight; the weight here is the inverse document frequency, which is inversely related to how frequently a term occurs across documents. The TF-IDF value is the term frequency multiplied by the inverse document frequency: the more indispensable a term is to an article, the higher its TF-IDF value.
After the TF-IDF value of each term has been calculated, cosine similarity is computed: when there are K terms in total, each template and the question are represented by K-dimensional vectors, where the value in each dimension is the number of occurrences of the term in the question or the template multiplied by the IDF value of that term. The cosine between any template and the question is calculated; the larger the cosine value, the higher the similarity between the question and the template. When the similarity reaches 80% or more, the answer of the saved question is selected as the reply; otherwise, the article-search mode is selected.
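A small sketch of this routing rule follows; the similarity values are assumed to come from the TF-IDF/cosine step sketched earlier, and the sample questions and answers are hypothetical.

```python
def choose_solver(similarities: dict[str, float], saved_answers: dict[str, str],
                  threshold: float = 0.8):
    """Return the stored answer when the best similarity reaches 80%, else fall back to article search."""
    best_question = max(similarities, key=similarities.get)
    if similarities[best_question] >= threshold:
        return "saved_answer", saved_answers[best_question]
    return "article_search", None

sims = {"GB/T 39788标准是什么": 0.86, "如何安装达梦数据库": 0.12}
answers = {"GB/T 39788标准是什么": "请参考知识库中该标准的说明条目。",
           "如何安装达梦数据库": "安装步骤见适配手册。"}
print(choose_solver(sims, answers))
```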
3) Question solving unit:
Saved questions:
Using the stored existing question-answer pairs, the system finds questions that are highly similar to the user's question and returns the corresponding answer directly. In this process, a simple question lookup lets the system skip the complex steps of question understanding and answer extraction, improving efficiency while keeping the answers accurate. Through word segmentation and text similarity analysis, the answering system selects the question with the highest similarity; if that similarity meets the confidence threshold, the stored question is judged to be a synonymous formulation of the user's question, and the answer preset for it is returned to the user as the reply.
Article search:
For a given question, the answer is found in related articles. Based on the category of the user's query, similar articles are retrieved; the relevance score between each paragraph of those articles and the user's question is calculated with the BM25 algorithm, and the paragraphs with the highest relevance are recalled for candidate answer extraction.
The answer model takes the question and the standard text data as input, predicts the start and end positions within the text, and extracts the span between the most probable start and end positions as the answer. When the answer model is trained, training questions and first training paragraphs containing their answers are first input into the model to obtain an intermediate answer model; second training paragraphs that do not contain the answers are then acquired, and the second training paragraphs and the training questions are input into the intermediate answer model for further training to obtain the final answer model. As a result, when a paragraph contains no answer, the system does not force an answer. Finally, features are extracted from each candidate answer to obtain answer features, similar answer features are grouped into an answer feature set, and the candidate answer containing the largest number of features from that set is taken as the reply to the question.
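A minimal inference sketch of this span-extraction step is given below, assuming the Hugging Face question-answering pipeline; the model name, the recalled paragraphs and the question are assumptions, not named in the patent.

```python
from transformers import pipeline

qa = pipeline("question-answering", model="uer/roberta-base-chinese-extractive-qa")

question = "GB/T 39788标准发布于哪一年"
recalled_paragraphs = [
    "GB/T 39788-2021于2021年发布，适用于信创软件测评。",
    "本段仅介绍信创软件的适配流程，不涉及标准发布时间。",
]

candidates = []
for para in recalled_paragraphs:
    # handle_impossible_answer lets the model return an empty span for paragraphs that are
    # related to the question but do not actually contain the answer.
    result = qa(question=question, context=para, handle_impossible_answer=True)
    if result["answer"]:
        candidates.append((result["score"], result["answer"]))

if candidates:
    print(max(candidates))  # highest-scoring candidate answer span
```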
4) Intelligent question-answering robot
The intelligent question-answering robot receives the user's question and returns an answer through human-computer interaction, providing functions such as hot question prompts, similar question prompts, question answering and an answer feedback mechanism.
Hot question prompts: the query rate of each question is calculated from the historical query records of all users, and questions with high query rates are returned as recommended hot questions.
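A minimal sketch of the hot-question prompt follows: count historical queries and recommend the most frequently asked ones; the query log is hypothetical.

```python
from collections import Counter

query_log = ["如何安装达梦数据库", "GB/T 39788标准是什么", "如何安装达梦数据库"]
hot_questions = [q for q, _ in Counter(query_log).most_common(10)]
print(hot_questions)
```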
Similar question prompts: when a user queries a question, similar question prompts are provided based on the saved questions, meeting the user's needs in time and greatly improving the user experience.
Question answering: the user's question is answered with the key intelligent question-answering techniques described above, which reduces the cost of manual service and improves answering efficiency and user experience.
Answer feedback mechanism: the user gives good or bad feedback on the result returned by the intelligent question-answering system, so the backend can store error samples and use them as a reference for optimizing replies.
The system provided by the embodiment of the invention analyzes the question and selects different solution methods, so answers can be generated quickly from existing resources and the knowledge base, knowledge of the multidimensional Xinchuang adaptation field can be presented, and the error-prone, inefficient retrieval caused by lack of data, knowledge and experience is alleviated. It provides Xinchuang-related personnel with a public platform for knowledge sharing and information exchange, and a Xinchuang adaptation knowledge platform integrating knowledge question answering, knowledge retrieval and knowledge reference, realizing persistent storage and sharing of knowledge data.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (9)

inputting the target question into a pre-trained classification model to obtain the probability of the target question in each preset category; the classification model extracts target question feature words as input feature words, calculates the semantic similarity of a first input feature word and all model feature words of any model question, takes the model feature word with the highest semantic similarity as a target model feature word corresponding to the input feature word, and then calculates the semantic similarity of the next input feature word and other model feature words in the model question, wherein the other model feature words are model feature words after all model feature words are removed from the corresponding target model feature words; if the number of the input feature words of the target question is larger than the number of the model feature words of the model question, firstly adding the similarity between all the input feature words and the corresponding target model feature words to obtain a sum of the similarity, and dividing the sum of the similarity by the number of the input feature words to obtain the target similarity between the target question and the model question; if the number of the input feature words of the target question is smaller than the number of the model feature words of the model question, firstly adding the similarity between all the input feature words and the corresponding target model feature words to obtain a sum of the similarity, and dividing the sum of the similarity by the number of the model feature words to obtain the target similarity between the target question and the model question; the target similarity is the probability of the target question in the preset category corresponding to the model question;
the problem category acquisition module is used for acquiring user problems and categories of the problems; the obtaining the category of the problem comprises: performing word segmentation on the problem through a pre-built dictionary to obtain a word segmentation result and a part of speech; if the word segmentation result contains the unregistered word which is not recorded in the dictionary, calculating the semantic similarity between the unregistered word and the word with the same part of speech as the unregistered word in the dictionary; and taking the word with the highest semantic similarity as a target word; replacing the unregistered words in the question with the target words to obtain a target question; if all words in the word segmentation result are in the dictionary, obtaining a target question according to the word segmentation result; inputting the target question into a pre-trained classification model to obtain the probability of the target question in each preset category; the classification model extracts target question feature words as input feature words, calculates the semantic similarity of a first input feature word and all model feature words of any model question, takes the model feature word with the highest semantic similarity as a target model feature word corresponding to the input feature word, and then calculates the semantic similarity of the next input feature word and other model feature words in the model question, wherein the other model feature words are model feature words after all model feature words are removed from the corresponding target model feature words; if the number of the input feature words of the target question is larger than the number of the model feature words of the model question, firstly adding the similarity between all the input feature words and the corresponding target model feature words to obtain a sum of the similarity, and dividing the sum of the similarity by the number of the input feature words to obtain the target similarity between the target question and the model question; if the number of the input feature words of the target question is smaller than the number of the model feature words of the model question, firstly adding the similarity between all the input feature words and the corresponding target model feature words to obtain a sum of the similarity, and dividing the sum of the similarity by the number of the model feature words to obtain the target similarity between the target question and the model question; the target similarity is the probability of the target question in the preset category corresponding to the model question; taking the preset category with the highest probability as the category of the problem;
CN202310330518.0A — filed 2023-03-31 (priority 2023-03-31) — Method, device and system for retrieving and replying information and creating knowledge — Active — granted as CN116049376B (en)

Priority Applications (1)

Application Number: CN202310330518.0A | Priority Date: 2023-03-31 | Filing Date: 2023-03-31 | Title: Method, device and system for retrieving and replying information and creating knowledge

Applications Claiming Priority (1)

Application Number: CN202310330518.0A | Priority Date: 2023-03-31 | Filing Date: 2023-03-31 | Title: Method, device and system for retrieving and replying information and creating knowledge

Publications (2)

Publication Number | Publication Date
CN116049376A (en) | 2023-05-02
CN116049376B (en) | 2023-07-25

Family

ID=86116804

Family Applications (1)

Application Number: CN202310330518.0A (Active, granted as CN116049376B) | Priority Date: 2023-03-31 | Filing Date: 2023-03-31 | Title: Method, device and system for retrieving and replying information and creating knowledge

Country Status (1)

Country: CN (1) — CN116049376B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116737913B (en)* | 2023-08-15 | 2023-11-03 | China Mobile (Suzhou) Software Technology Co., Ltd. | Reply text generation method, device, equipment and readable storage medium
CN119478953A (en)* | 2024-11-05 | 2025-02-18 | Zhejiang Humanoid Robot Innovation Center Co., Ltd. | Automatic labeling method, device, equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108491433B (en)* | 2018-02-09 | 2022-05-03 | Ping An Technology (Shenzhen) Co., Ltd. | Chat answering method, electronic device and storage medium
EP3822816A1 (en)* | 2019-11-15 | 2021-05-19 | 42 Maru Inc. | Device and method for machine reading comprehension question and answer
CN112417105B (en)* | 2020-10-16 | 2024-03-19 | Taikang Insurance Group Co., Ltd. | Question-answering processing method and device, storage medium and electronic equipment
CN112765306B (en)* | 2020-12-30 | 2024-06-07 | Kingdee Software (China) Co., Ltd. | Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
CN115080717A (en)* | 2022-06-02 | 2022-09-20 | Tezign (Shanghai) Information Technology Co., Ltd. | Question-answering method and system based on text understanding reasoning
CN115292469B (en)* | 2022-09-28 | 2023-02-07 | Zhejiang Lab | Question-answering method combining paragraph search and machine reading understanding

Also Published As

Publication number | Publication date
CN116049376A (en) | 2023-05-02

Similar Documents

PublicationPublication DateTitle
CN111475623B (en)Case Information Semantic Retrieval Method and Device Based on Knowledge Graph
CN109472033B (en)Method and system for extracting entity relationship in text, storage medium and electronic equipment
CN108664599B (en)Intelligent question-answering method and device, intelligent question-answering server and storage medium
CN117909466A (en)Domain question-answering system, construction method, electronic device and storage medium
CN116628173B (en)Intelligent customer service information generation system and method based on keyword extraction
CN111767716B (en)Method and device for determining enterprise multi-level industry information and computer equipment
CN111190997A (en) A Question Answering System Implementation Method Using Neural Networks and Machine Learning Sorting Algorithms
CN111159363A (en)Knowledge base-based question answer determination method and device
CN116049376B (en)Method, device and system for retrieving and replying information and creating knowledge
CN114840685B (en) A method for constructing knowledge graph of emergency plan
CN113282729B (en)Knowledge graph-based question and answer method and device
CN114942981B (en)Question and answer query method and device, electronic equipment and computer readable storage medium
CN117708157A (en)SQL sentence generation method and device
CN113761104A (en) Method, device and electronic device for detecting entity relationship in knowledge graph
Mustafa et al.Optimizing document classification: unleashing the power of genetic algorithms
CN117216221A (en)Intelligent question-answering system based on knowledge graph and construction method
CN118467690A (en)Knowledge question-answering method, device, equipment and storage medium
Putra et al.Document Classification using Naïve Bayes for Indonesian Translation of the Quran
CN118503454B (en)Data query method, device, storage medium and computer program product
Ma et al.Chinese text classification review
CN117291192B (en)Government affair text semantic understanding analysis method and system
CN113704422A (en)Text recommendation method and device, computer equipment and storage medium
CN118503381A (en)Method and system for searching and generating combined strong language dialogue
CN114842982B (en)Knowledge expression method, device and system for medical information system
Rybak et al.Machine learning-enhanced text mining as a support tool for research on climate change: theoretical and technical considerations

Legal Events

Date | Code | Title | Description
PB01 — Publication
SE01 — Entry into force of request for substantive examination
CB03 — Change of inventor or designer information
Inventors after the change: Yang Jiahao, Zhang Hongming, Chen Xiaopeng, Huang Ping, Li Cuifang
Inventors before the change: Yang Jiahao, Zhang Hongming, Chen Xiaopeng, Huang Ping, Li Cuifen
GR01 — Patent grant
