CN106649868B - Question and answer matching process and device - Google Patents

Question and answer matching process and device

Info

Publication number
CN106649868B
Authority
CN
China
Prior art keywords
text
question
preset
answer
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611271173.2A
Other languages
Chinese (zh)
Other versions
CN106649868A (en)
Inventor
周建设
袁家政
刘宏哲
刘琴
史金生
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University
Priority to CN201611271173.2A
Publication of CN106649868A
Application granted
Publication of CN106649868B
Legal status: Active
Anticipated expiration

Abstract

The invention provides a question and answer matching method and device, and relates to the technical field of intelligent question answering. The method includes: extracting keywords from an input question text; determining, according to the keywords, target matching question texts from a pre-established question base by means of index filtering; determining, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text from the target matching question texts; and outputting, according to the best matching question text, the answer text corresponding to the input question text. The invention can output the answer corresponding to the input question in a short time, which both shortens the question and answer matching time and improves accuracy.

Description

Question and answer matching process and device
Technical field
The present invention relates to the technical field of intelligent question answering, and more particularly to a question and answer matching method and device.
Background art
With the development of science and technology, convenient question answering systems have gradually entered people's daily lives. A question answering system can automatically provide an answer to a user's question, thereby realizing human-computer interaction.
In essence, a question answering system finds, in an existing set of "question-answer" pairs, the question text that matches the user's question, and presents the corresponding answer to the user. The core idea of such a system is to calculate the similarity between the question posed by the user and the questions recorded in the question base. Most existing question answering systems use a TF-IDF question similarity calculation method based on the vector space model. However, the questions users pose in human-computer interaction are mostly short, and for short questions this method extracts keywords with low accuracy and takes a long time to match. After posing a question, the user has to wait a long time to receive a matching answer, so the user experience is poor.
No effective solution has yet been proposed for the low accuracy and long processing time of the question and answer matching approaches used in the prior art.
Summary of the invention
In view of this, the purpose of the present invention is to provide a question and answer matching method and device that alleviate the low accuracy and long processing time of question and answer matching in the prior art.
To achieve the above purpose, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a question and answer matching method, comprising: extracting keywords from an input question text; determining, according to the keywords, target matching question texts from a pre-established question base by means of index filtering; determining, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text from the target matching question texts; and outputting, according to the best matching question text, the answer text corresponding to the input question text.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein extracting keywords from the input question text comprises: segmenting the input question text into words to generate a word sequence; removing the stop words from the word sequence to obtain terms; and calculating the weight of each term using an improved information entropy formula. The improved information entropy formula is:
[formula not reproduced in the source text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection. All terms are then sorted by the calculated weight to obtain a weight ranking table, and keywords are extracted from the weight ranking table according to a preset extraction ratio.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein determining, according to the keywords, target matching question texts from the pre-established question base by means of index filtering comprises: obtaining a matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between the preset keywords and the preset question texts in the pre-established question base; and determining the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.
With reference to the second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein obtaining the matching value between each preset question text and the input question text comprises: taking the preset keywords in the pre-established question base that are identical to keywords in the input question text as matching keywords; traversing the preset question texts in the question base according to the index relationship between the preset keywords and the preset question texts, to determine the number of matching keywords contained in each preset question text; and taking the number of matching keywords contained in a preset question text as the matching value between that preset question text and the input question text.
With reference to the second or third possible implementation of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein establishing the question base comprises: setting in advance the preset question texts and the standard answer text corresponding to each preset question text, and storing the preset question texts and standard answer texts in the question base; assigning a numeric identifier to each preset question text; extracting the preset keywords of each preset question text; and establishing the index relationship between the preset keywords and the preset question texts, wherein, in the index relationship, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain it.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein outputting, according to the best matching question text, the answer text corresponding to the input question text comprises: judging whether the similarity of the best matching question text reaches a preset similarity threshold; if so, looking up the standard answer text corresponding to the best matching question text in the question base and outputting it as the answer text corresponding to the input question text; if not, searching the Internet for a network answer text corresponding to the input question text and outputting it as the answer text corresponding to the input question text.
In a second aspect, an embodiment of the present invention further provides a question and answer matching device, comprising: an extraction module, configured to extract keywords from an input question text; a first determining module, configured to determine, according to the keywords, target matching question texts from a pre-established question base by means of index filtering; a second determining module, configured to determine, based on the Levenshtein distance algorithm, the best matching question text with the highest similarity to the input question text from the target matching question texts; and an answer output module, configured to output, according to the best matching question text, the answer text corresponding to the input question text.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the extraction module comprises: a word segmentation unit, configured to segment the input question text into words to generate a word sequence; a stop word removal unit, configured to remove the stop words from the word sequence to obtain terms; and a weight calculation unit, configured to calculate the weight of each term using an improved information entropy formula. The improved information entropy formula is:
[formula not reproduced in the source text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection. The extraction module further comprises: a sorting unit, configured to sort all terms by the calculated weight to obtain a weight ranking table; and a keyword extraction unit, configured to extract keywords from the weight ranking table according to a preset extraction ratio.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the first determining module comprises: a matching value acquisition unit, configured to obtain a matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between the preset keywords and the preset question texts in the pre-established question base; and a first determining unit, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the answer output module comprises: a judging unit, configured to judge whether the similarity of the best matching question text reaches a preset similarity threshold; a standard answer output unit, configured to, when the similarity of the best matching question text reaches the preset similarity threshold, look up the standard answer text corresponding to the best matching question text in the question base and output it as the answer text corresponding to the input question text; and a network answer output unit, configured to, when the similarity of the best matching question text does not reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output it as the answer text corresponding to the input question text.
The embodiments of the present invention provide a question and answer matching method and device. After the keywords in the input question text are extracted, target matching question texts are determined from the question base by index filtering, which narrows the range of questions in the question base that may match the input question text. The best matching question text with the highest similarity to the input question text is then determined based on the Levenshtein distance algorithm, and finally the answer text corresponding to the input question text is output. Compared with the low accuracy and long processing time of the question and answer matching approaches used in the prior art, the method and device provided by the embodiments of the present invention can output the answer corresponding to a question in a shorter time, which both shortens the matching time and improves accuracy.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the invention. The objectives and other advantages of the invention are realized and obtained by the structures particularly pointed out in the description, the claims and the drawings.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a question and answer matching method provided by an embodiment of the present invention;
Fig. 2 is a detailed flow chart of a question and answer matching method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of a method for establishing a question base provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of a question and answer matching device provided by an embodiment of the present invention;
Fig. 5 is a detailed structural block diagram of a question and answer matching device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Human-computer interaction is gradually becoming part of people's lives, and devices and application software that automatically respond to or answer users' questions are now commonplace. They generally perform question and answer matching through a question answering system that stores a set of "question-answer" pairs. However, most question and answer matching approaches in the prior art use a TF-IDF question similarity calculation method based on the vector space model, which has low accuracy and takes a long time. In view of this, the question and answer matching method and device provided by the embodiments of the present invention can improve matching accuracy while shortening matching time. The embodiments of the present invention are described in detail below.
Embodiment one:
Referring to the flow chart of a question and answer matching method shown in Fig. 1, the method comprises the following steps:
Step S102: extract keywords from the input question text. The input question text is the question text entered by the user through human-computer interaction. When the user uses voice input, the user's spoken question must first be converted into written text, which is then used as the input question text.
Step S104: according to the keywords, determine target matching question texts from the pre-established question base by means of index filtering. The target matching question texts comprise multiple texts. The purpose is to narrow in advance the range of preset texts in the question base that may match the user's input question text, which speeds up the subsequent matching.
Step S106: based on the Levenshtein distance algorithm, determine the best matching question text with the highest similarity to the input question text from the target matching question texts. The Levenshtein distance algorithm calculates the minimum number of edit operations (insertions, deletions and substitutions) needed to transform one character string into another, and thereby measures the similarity between the two strings. Using this algorithm, the question most similar to the input question text can be found quickly and accurately among the pre-screened target matching question texts, and that question is taken as the best matching question text.
Step S108: according to the best matching question text, output the answer text corresponding to the input question text.
In the above method of this embodiment, after the keywords in the input question text are extracted, target matching question texts are determined from the question base by index filtering, which narrows the range of questions in the question base that may match the input question text. The best matching question text with the highest similarity to the input question text is then determined based on the Levenshtein distance algorithm, and finally the answer text corresponding to the input question text is output. This method can output the answer corresponding to a question in a shorter time, which both shortens the matching time and improves accuracy.
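The edit-distance computation described in step S106 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and the normalization of distance into a similarity score (1 minus distance divided by the longer length) is an assumption, since the text does not state how similarity is derived from the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization (not given in the source): 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to the query, with its similarity."""
    best = max(candidates, key=lambda c: similarity(query, c))
    return best, similarity(query, best)
```

Because the candidates are assumed to have been pre-screened by index filtering, the quadratic cost of each distance computation is only paid for a small set of short questions. `best_match` raises `ValueError` on an empty candidate list, so a caller would need to handle the case where filtering leaves nothing.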
Specifically, the prior art mostly uses a TF-IDF question similarity calculation method based on the vector space model. That method is mainly suited to calculating the similarity of longer sentences or documents, and its keyword extraction accuracy for short questions is low. The questions users pose in human-computer interaction are usually short, so the prior-art method often fails to give the answer the user expects. In addition, the TF-IDF method requires a vector space model to be built, which is a complex and time-consuming process, so it takes a long time to find the answer in the question base (or question answering system) that matches the user's input question. Given the particular nature of speech recognition and human-computer question answering, matching speed is also an important factor in user experience. In short, the prior art results in a poor user experience. By contrast, the method provided by the embodiments of the present invention processes the input question text in a simple way, takes less time to match, is not limited by question length, and is suitable for short sentences. It can effectively improve matching accuracy and give the user a good experience.
For ease of understanding and implementation, refer to the detailed flow chart of a question and answer matching method shown in Fig. 2, which comprises the following steps:
Step S202: segment the input question text into words to generate a word sequence. That is, the input question text is split into individual words, and the segmented input question text is called a word sequence.
Step S204: remove the stop words from the word sequence to obtain terms. To save storage space and improve search efficiency, search engines automatically ignore certain characters or words when indexing pages or processing search requests. These characters or words are called stop words, for example modal particles and other words that carry no meaning by themselves. The stop words in the word sequence can be removed according to a pre-established stop word list.
Step S206: calculate the weight of each term using the improved information entropy formula. The improved information entropy formula is:
[formula not reproduced in the source text]
where H(t) is the weight of term t; f_tk is the frequency with which term t appears in text k; n_t is the frequency with which term t appears in the whole text collection; and N is the total number of texts in the text collection.
Calculating the weight of each term with the above improved information entropy formula makes it possible to identify keywords from the term weights, which improves the accuracy of keyword extraction. The calculation is also relatively simple and produces results quickly, which helps to increase matching speed.
Step S208: sort all terms by the calculated weight to obtain a weight ranking table. The terms may be sorted in descending or ascending order, as the actual situation requires.
Step S210: extract keywords from the weight ranking table according to a preset extraction ratio. For example, if the extraction ratio is set to 30 percent, the 30 percent of terms with the highest weights are extracted from the weight ranking table; if the table is sorted in descending order of weight and contains 100 terms in total, the first 30 terms are extracted as keywords. This effectively narrows the scope and helps to improve the efficiency of the subsequent matching.
For ease of understanding, an embodiment of the present invention provides a specific example of applying steps S202 to S210. Suppose the input question text is "the four great classical masterpieces of China". After word segmentation, the word sequence "China / four / great / masterpieces" is obtained. Stop words are then removed, the weight of each term is calculated using the average information entropy (that is, the improved information entropy formula above), and the keywords finally obtained are {China, masterpieces}.
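Stop word removal (step S204) and ratio-based extraction from the weight ranking (steps S208 and S210) can be sketched as below. This is an illustration under stated assumptions: word segmentation and the entropy weights are taken as given inputs, because the segmenter and the improved formula are not reproduced in this text. The function names, the integer-percent parameter and the weight values in the usage example are all hypothetical.

```python
def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Step S204: drop tokens that appear in a pre-established stop word list."""
    return [t for t in tokens if t not in stop_words]


def extract_keywords(weights: dict[str, float], percent: int = 30) -> list[str]:
    """Steps S208-S210: rank terms by weight (descending) and keep the top share.

    `percent` is the preset extraction ratio as an integer percentage;
    at least one term is kept whenever any term is available (an assumption).
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    if not ranked:
        return []
    count = max(1, len(ranked) * percent // 100)
    return ranked[:count]


# Usage with made-up weights for the example word sequence above.
terms = remove_stop_words(["China", "four", "great", "masterpieces"], stop_words=set())
example_weights = {"China": 0.9, "four": 0.2, "great": 0.3, "masterpieces": 0.8}
keywords = extract_keywords({t: example_weights[t] for t in terms}, percent=50)
```

An integer percentage is used instead of a float ratio so that the count for 100 terms at 30 percent is exactly 30, matching the example in step S210, without depending on floating-point rounding.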
Step S212: obtain a matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between the preset keywords and the preset question texts in the pre-established question base.
One specific implementation is as follows:
(1) Take the preset keywords in the pre-established question base that are identical to keywords in the input question text as matching keywords.
(2) According to the index relationship between the preset keywords and the preset question texts in the question base, traverse the preset question texts in the question base to determine the number of matching keywords contained in each preset question text, and take that number as the matching value between the preset question text and the input question text.
For ease of understanding, this embodiment gives an example of the above implementation. Suppose the input question text has m keywords. A one-dimensional array of length N, initialized to 0, can be used to record, for each text in the question base, the number k of the specified keywords that it contains. The index chains of the m keywords in the input question are then traversed, and each time a text appears, the corresponding array position is incremented by 1. When the traversal is complete, the k values of all texts have been obtained, and each k value is the matching value of the corresponding text.
Step S214: determine the preset question texts whose matching value is greater than the preset matching threshold as the target matching question texts.
Measuring the similarity between a preset question text and the input question text by the above matching value gives accurate and reliable results. Pre-screening the preset question texts in the question base by matching value effectively narrows the range of questions that may match the input question, which makes the subsequent determination of the matching text more efficient and shortens the matching time.
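The counting scheme of steps S212 and S214 can be sketched as follows, assuming the index is held as a dictionary that maps each preset keyword to the list of numeric identifiers (0-based here) of the questions containing it. The data layout and function names are illustrative assumptions, not taken from the source.

```python
def matching_values(keywords: list[str],
                    index: dict[str, list[int]],
                    num_questions: int) -> list[int]:
    """Step S212: count, per preset question, how many input keywords it contains."""
    counts = [0] * num_questions          # one-dimensional array initialized to 0
    for keyword in set(keywords):         # each input keyword is counted once
        for question_id in index.get(keyword, ()):  # walk the keyword's index chain
            counts[question_id] += 1
    return counts


def filter_candidates(counts: list[int], threshold: int) -> list[int]:
    """Step S214: keep questions whose matching value is greater than the threshold."""
    return [qid for qid, k in enumerate(counts) if k > threshold]


# Usage with a hypothetical three-question base.
example_index = {"China": [0, 2], "masterpieces": [0, 1], "capital": [2]}
values = matching_values(["China", "masterpieces"], example_index, num_questions=3)
candidates = filter_candidates(values, threshold=1)
```

Only the index chains of the m input keywords are traversed, so the cost depends on how many questions share those keywords rather than on a comparison against every question text.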
Step S216: based on the Levenshtein distance algorithm, determine the best matching question text with the highest similarity to the input question text from the target matching question texts. Using this algorithm, the question most similar to the input question text (that is, the one with the most keywords in common) can be found quickly and accurately among the pre-screened target matching question texts, and that question is taken as the best matching question text.
Step S218: judge whether the similarity of the best matching question text reaches the preset similarity threshold. If so, execute step S220; if not, execute step S222. After the best matching question text has been determined from the question base, this step makes a final check of whether that text is suitable as the matching result. This avoids the prior-art behavior of blindly outputting an answer once the closest match has been found, which leads to irrelevant answers and a poor user experience.
Step S220: look up the standard answer text corresponding to the best matching question text in the question base, and output it as the answer text corresponding to the input question text. Each preset question text and its corresponding standard answer text are stored in the question base in advance.
Step S222: search the Internet for a network answer text corresponding to the input question text, and output it as the answer text corresponding to the input question text. The user's input question can be submitted directly to the Internet, for example through a search engine, to find a network answer text. When no text matching the user's question is found in the question base, the network answer text meets the user's need and improves the user experience.
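The branch in steps S218 to S222 can be sketched as a small selection function. The threshold value of 0.8 and the `web_search` callable are assumptions for illustration only: the source gives no threshold value and names no particular search service, so the fallback is passed in as a parameter here.

```python
from typing import Callable


def select_answer(query: str,
                  best_question: str,
                  best_similarity: float,
                  standard_answers: dict[str, str],
                  web_search: Callable[[str], str],
                  threshold: float = 0.8) -> str:
    """Return the stored answer if the best match is similar enough, else fall back.

    `threshold` is a hypothetical preset similarity threshold;
    `web_search` is a hypothetical stand-in for an Internet lookup.
    """
    if best_similarity >= threshold:                 # S218: threshold reached
        return standard_answers[best_question]       # S220: standard answer text
    return web_search(query)                         # S222: network answer text


# Usage with a stubbed search function.
answers = {"What is the capital of China?": "Beijing"}
stub_search = lambda q: f"web result for: {q}"
hit = select_answer("capital of China?", "What is the capital of China?", 0.9,
                    answers, stub_search)
miss = select_answer("tallest mountain?", "What is the capital of China?", 0.2,
                     answers, stub_search)
```

Passing the fallback as a callable keeps the threshold check testable without network access.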
Steps S202 to S210 in Fig. 2 correspond to step S102 in Fig. 1; steps S212 and S214 in Fig. 2 correspond to step S104 in Fig. 1; step S216 in Fig. 2 corresponds to step S106 in Fig. 1; and steps S218 to S222 in Fig. 2 correspond to step S108 in Fig. 1.
By executing the above steps in Fig. 2, the answer text corresponding to the user's input question text can be obtained quickly and accurately, which improves the user experience.
Further, this embodiment gives a process for establishing the question base. Specifically, referring to the flow chart of a method for establishing a question base shown in Fig. 3, the question base can be established by the following steps:
Step S302: set in advance the preset question texts and the standard answer text corresponding to each preset question text, and store the preset question texts and standard answer texts in the question base.
Step S304: assign a numeric identifier to each preset question text.
Step S306: extract the preset keywords of each preset question text. For the specific way of extracting preset keywords, refer to steps S202 to S210 in Fig. 2.
Step S308: establish the index relationship between the preset keywords and the preset question texts. In the index relationship, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain it.
The question base provided by the embodiments of the present invention is not merely a collection of "question-answer" pairs as in prior-art question answering systems. The collection is also processed in greater depth: keywords are extracted from each question in advance, and an index is built between each keyword and the questions that contain it. Referring to questions by number helps to reduce storage space and improve search speed, which further shortens the time needed to look up texts in the question base during matching.
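Steps S302 to S308 amount to building an inverted index from keywords to question numbers. The sketch below shows one possible in-memory layout. The function name, the return structure and the `extract` callable (which stands in for the keyword extraction of steps S202 to S210) are assumptions made for illustration.

```python
from typing import Callable, Iterable


def build_question_base(
    qa_pairs: Iterable[tuple[str, str]],
    extract: Callable[[str], list[str]],
) -> tuple[list[str], list[str], dict[str, list[int]]]:
    """Store questions and answers, number them, and index keywords to numbers."""
    questions: list[str] = []
    answers: list[str] = []
    index: dict[str, list[int]] = {}
    for question_id, (question, answer) in enumerate(qa_pairs):  # S304: numeric id
        questions.append(question)                               # S302: store texts
        answers.append(answer)
        for keyword in set(extract(question)):                   # S306: preset keywords
            index.setdefault(keyword, []).append(question_id)    # S308: keyword -> ids
    return questions, answers, index


# Usage with a trivial whitespace-based extractor as a placeholder.
pairs = [("capital of China", "Beijing"), ("capital of France", "Paris")]
qs, ans, idx = build_question_base(pairs, extract=lambda q: q.split())
```

Storing only integer identifiers in the index, rather than the question texts themselves, reflects the storage and lookup benefit described above.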
In conclusion above-mentioned question and answer matching process provided in an embodiment of the present invention, can export in a relatively short period of time withThe corresponding answer of input question sentence of user can achieve and export answer in 1s, preferably shorten question and answer matching duration, but alsoImprove accuracy rate, comprehensive the user experience is improved degree.
Embodiment two:
Corresponding to the question-and-answer matching method provided in Embodiment one, this embodiment of the present invention provides a question-and-answer matching device. Referring to Fig. 4, the device comprises the following modules:
Extraction module 402, configured to extract keywords from the input question text;
First determining module 404, configured to determine, according to the keywords, target matching question texts from a pre-established question bank by index filtering;
Second determining module 406, configured to determine, based on the Levenshtein distance algorithm, the best matching question text having the highest similarity to the input question text from among the target matching question texts;
Answer output module 408, configured to output, according to the best matching question text, the answer text corresponding to the input question text.
In the device of this embodiment, after the extraction module 402 extracts the keywords of the input question text, the first determining module 404 determines the target matching question texts from the question bank by index filtering, which narrows the range of questions in the question bank that are compared against the input question text. The second determining module 406 then determines, based on the Levenshtein distance algorithm, the best matching question text having the highest similarity to the input question text, and finally the answer output module 408 outputs the answer text corresponding to the input question text. The device can output the answer corresponding to a question in a relatively short time, which both shortens the matching time and improves accuracy.
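The similarity step performed by the second determining module can be sketched as follows. The edit distance itself is the standard Levenshtein distance; the normalization to a similarity in [0, 1] (one minus the distance divided by the longer length) is an assumption, since this section does not state how distance is converted to similarity. The names `levenshtein`, `similarity` and `best_match` are illustrative.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]; this normalization is an assumption, not from the patent."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(query, candidates):
    """Return (candidate, similarity) for the most similar candidate, or None if empty."""
    if not candidates:
        return None
    return max(((c, similarity(query, c)) for c in candidates), key=lambda pair: pair[1])
```

Because the distance is computed only against the candidates that survive index filtering, the quadratic cost of each comparison is paid for a small subset of the question bank rather than for every stored question.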
For ease of understanding and implementation, on the basis of Fig. 4, reference may be made to the detailed structural block diagram of a question-and-answer matching device shown in Fig. 5, in which:
The extraction module 402 includes: a word segmentation unit 4021, configured to segment the input question text to generate a word sequence; a stop word removal unit 4022, configured to remove the stop words from the word sequence to obtain entries; and a weight calculation unit 4023, configured to calculate the weight of each entry using an improved information entropy formula. The improved information entropy formula is: [formula not reproduced in this text]
where H(t) is the weight of entry t; f_tk is the frequency with which entry t appears in text k; n_t is the frequency with which entry t appears in the whole text collection; and N is the total number of texts in the text collection.
The extraction module 402 further includes: a sorting unit 4024, configured to sort all entries by the calculated weights to obtain a weight ranking table; and a keyword extraction unit 4025, configured to extract keywords from the weight ranking table according to a preset extraction ratio.
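The sorting and ratio-based extraction performed by units 4024 and 4025 can be sketched as below. The weights are taken as given, because the improved information entropy formula is not reproduced in this text. The function name `extract_keywords`, the rounding rule (round up) and the choice to keep at least one keyword are assumptions.

```python
import math


def extract_keywords(weights, ratio):
    """Sort entries by weight (descending) and keep the top `ratio` share.

    weights: dict of entry -> weight, assumed to be computed already.
    ratio: preset extraction ratio in (0, 1]; rounding up and keeping at
    least one keyword are assumptions, not stated in the patent.
    """
    if not weights:
        return []
    ranking = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    count = max(1, math.ceil(len(ranking) * ratio))
    return [entry for entry, _ in ranking[:count]]


# Illustrative weights only.
weights = {"password": 0.9, "reset": 0.7, "please": 0.1, "help": 0.2}
keywords = extract_keywords(weights, 0.5)
```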
The first determining module 404 includes a matching value acquiring unit 4041, configured to obtain the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank. Specifically, the matching value acquiring unit 4041 may include a matching keyword determining subunit, configured to take, as matching keywords, the preset keywords in the pre-established question bank that are identical to keywords in the input question text; and a matching value determining subunit, configured to traverse the preset question texts in the question bank according to the index relationship between preset keywords and preset question texts, so as to determine the number of matching keywords contained in each preset question text, and to take that number as the matching value between the preset question text and the input question text. These subunits are not shown in Fig. 5.
The first determining module 404 further includes a first determining unit 4042, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.
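The index filtering performed by units 4041 and 4042 can be sketched as counting, for each question number, how many of the input keywords point to it, and keeping the questions whose count exceeds the threshold. This sketch walks the index entries of the matching keywords rather than traversing every preset question, which yields the same counts; the function name `filter_candidates` and the index layout (keyword to list of question numbers) are assumptions.

```python
from collections import Counter


def filter_candidates(input_keywords, keyword_index, threshold):
    """Return {question number: matching value} for questions above the threshold.

    keyword_index: dict of preset keyword -> list of question numbers
    (a hypothetical layout for the index relationship).
    """
    matching_values = Counter()
    for keyword in set(input_keywords):
        # Input keywords absent from the index are not matching keywords.
        for question_id in keyword_index.get(keyword, ()):
            matching_values[question_id] += 1
    # The text requires the matching value to be strictly greater than the threshold.
    return {qid: value for qid, value in matching_values.items() if value > threshold}


# Illustrative index only.
keyword_index = {"password": [1, 2], "reset": [1], "account": [3]}
targets = filter_candidates(["reset", "password"], keyword_index, 1)
```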
The answer output module 408 includes: a judging unit 4081, configured to judge whether the similarity of the best matching question text reaches a preset similarity threshold; a standard answer output unit 4082, configured to, when the similarity of the best matching question text is judged to reach the preset similarity threshold, look up the standard answer text corresponding to the best matching question text in the question bank and output the standard answer text as the answer text corresponding to the input question text; and a network answer output unit 4083, configured to, when the similarity of the best matching question text is judged not to reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output the network answer text as the answer text corresponding to the input question text.
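The branch between units 4082 and 4083 can be sketched as a single function. The names are illustrative, and `search_web` is a hypothetical callable standing in for the Internet lookup, which this section does not specify; "reaches the threshold" is read here as greater than or equal to.

```python
def output_answer(input_question, best_question, best_similarity,
                  standard_answers, similarity_threshold, search_web):
    """Return the standard answer if the best match is similar enough, else fall back.

    standard_answers: dict of preset question text -> standard answer text.
    search_web: hypothetical callable taking the input question and returning
    a network answer text; the real lookup is not described in this section.
    """
    if best_question is not None and best_similarity >= similarity_threshold:
        return standard_answers[best_question]
    return search_web(input_question)
```

Passing the fallback in as a callable keeps the threshold logic testable without any network access.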
The implementation principle and technical effects of the device provided in this embodiment are the same as those of the foregoing method embodiment. For brevity, for matters not mentioned in the device embodiment, refer to the corresponding content of the foregoing method embodiment.
In summary, with the question-and-answer matching method and device provided by the embodiments of the present invention, after the keywords of the input question text are extracted, the target matching question texts are determined from the question bank by index filtering, which narrows the range of questions in the question bank that are compared against the input question text. The best matching question text having the highest similarity to the input question text is then determined based on the Levenshtein distance algorithm, and finally the answer text corresponding to the input question text is output. Compared with prior-art question-and-answer matching approaches, which have low accuracy and take a long time, the method and device provided by the embodiments of the present invention can output the answer corresponding to a question in a relatively short time, which both shortens the matching time and improves accuracy.
The computer program product of the question-and-answer matching method and device provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions contained in the program code can be used to execute the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise expressly specified and limited, the terms "installed", "connected" and "coupled" shall be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or internal between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the description of the present invention, it should be noted that the orientations or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore shall not be construed as limiting the present invention. In addition, the terms "first", "second" and "third" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.

Claims (8)

Translated from Chinese

1. A question-and-answer matching method, characterized by comprising:
extracting keywords from an input question text;
determining, according to the keywords, target matching question texts from a pre-established question bank by index filtering;
determining, based on the Levenshtein distance algorithm, the best matching question text having the highest similarity to the input question text from among the target matching question texts;
outputting, according to the best matching question text, an answer text corresponding to the input question text;
wherein extracting keywords from the input question text comprises:
segmenting the input question text to generate a word sequence;
removing the stop words from the word sequence to obtain entries;
calculating the weight of each entry using an improved information entropy formula, the improved information entropy formula being: [formula not reproduced in this text]
where H(t) is the weight of entry t; f_tk is the frequency with which entry t appears in text k; n_t is the frequency with which entry t appears in the whole text collection; and N is the total number of texts in the text collection;
sorting all entries by the calculated weights to obtain a weight ranking table;
extracting keywords from the weight ranking table according to a preset extraction ratio.

2. The method according to claim 1, characterized in that determining, according to the keywords, target matching question texts from a pre-established question bank by index filtering comprises:
obtaining the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank;
determining the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.

3. The method according to claim 2, characterized in that obtaining the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank comprises:
taking, as matching keywords, the preset keywords in the pre-established question bank that are identical to keywords in the input question text;
traversing the preset question texts in the question bank according to the index relationship between preset keywords and preset question texts in the question bank, so as to determine the number of matching keywords contained in each preset question text; and taking the number of matching keywords contained in the preset question text as the matching value between the preset question text and the input question text.

4. The method according to claim 2 or 3, characterized in that establishing the question bank comprises:
presetting preset question texts and the standard answer texts corresponding to the preset question texts, and storing the preset question texts and the standard answer texts in the question bank;
establishing a numeric identifier for each preset question text;
extracting the preset keywords corresponding to each preset question text;
establishing an index relationship between the preset keywords and the preset question texts, wherein, in the index relationship, each preset keyword corresponds to the numeric identifiers of the one or more preset question texts that contain the preset keyword.

5. The method according to claim 1, characterized in that outputting, according to the best matching question text, an answer text corresponding to the input question text comprises:
judging whether the similarity of the best matching question text reaches a preset similarity threshold;
if so, looking up the standard answer text corresponding to the best matching question text in the question bank, and outputting the standard answer text as the answer text corresponding to the input question text;
if not, searching the Internet for a network answer text corresponding to the input question text, and outputting the network answer text as the answer text corresponding to the input question text.

6. A question-and-answer matching device, characterized by comprising:
an extraction module, configured to extract keywords from an input question text;
a first determining module, configured to determine, according to the keywords, target matching question texts from a pre-established question bank by index filtering;
a second determining module, configured to determine, based on the Levenshtein distance algorithm, the best matching question text having the highest similarity to the input question text from among the target matching question texts;
an answer output module, configured to output, according to the best matching question text, an answer text corresponding to the input question text;
wherein the extraction module comprises:
a word segmentation unit, configured to segment the input question text to generate a word sequence;
a stop word removal unit, configured to remove the stop words from the word sequence to obtain entries;
a weight calculation unit, configured to calculate the weight of each entry using an improved information entropy formula, the improved information entropy formula being: [formula not reproduced in this text]
where H(t) is the weight of entry t; f_tk is the frequency with which entry t appears in text k; n_t is the frequency with which entry t appears in the whole text collection; and N is the total number of texts in the text collection;
a sorting unit, configured to sort all entries by the calculated weights to obtain a weight ranking table;
a keyword extraction unit, configured to extract keywords from the weight ranking table according to a preset extraction ratio.

7. The device according to claim 6, characterized in that the first determining module comprises:
a matching value acquiring unit, configured to obtain the matching value between each preset question text and the input question text according to the keywords in the input question text and the index relationship between preset keywords and preset question texts in the pre-established question bank;
a first determining unit, configured to determine the preset question texts whose matching value is greater than a preset matching threshold as the target matching question texts.

8. The device according to claim 6, characterized in that the answer output module comprises:
a judging unit, configured to judge whether the similarity of the best matching question text reaches a preset similarity threshold;
a standard answer output unit, configured to, when the similarity of the best matching question text is judged to reach the preset similarity threshold, look up the standard answer text corresponding to the best matching question text in the question bank and output the standard answer text as the answer text corresponding to the input question text;
a network answer output unit, configured to, when the similarity of the best matching question text is judged not to reach the preset similarity threshold, search the Internet for a network answer text corresponding to the input question text and output the network answer text as the answer text corresponding to the input question text.

CN201611271173.2A | priority date 2016-12-30 | filing date 2016-12-30 | Question and answer matching process and device | Active | CN106649868B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201611271173.2A (CN106649868B (en)) | 2016-12-30 | 2016-12-30 | Question and answer matching process and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201611271173.2A (CN106649868B (en)) | 2016-12-30 | 2016-12-30 | Question and answer matching process and device

Publications (2)

Publication Number | Publication Date
CN106649868A (en) | 2017-05-10
CN106649868B (en) | 2019-03-26

Family

ID=58839104

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201611271173.2A (Active, CN106649868B (en)) | Question and answer matching process and device | 2016-12-30 | 2016-12-30

Country Status (1)

Country | Link
CN (1) | CN106649868B (en)

Also Published As

Publication number | Publication date
CN106649868A (en) | 2017-05-10

Legal Events

Code | Title | Description
PB01 | Publication | Publication
SE01 | Entry into force of request for substantive examination | Entry into force of request for substantive examination
GR01 | Patent grant | Patent grant
