Summary of the invention
The purpose of this invention is to provide a knowledge base for a man-machine interface system, and a method for constructing it, that can restrict the dialogue between a user and a chatbot to a specific domain.
To achieve this purpose, the invention provides a knowledge base for a man-machine interface system comprising a first corpus, a second corpus, a return-corpus extraction unit, a matching processing unit and a feedback unit. The first corpus stores the utterances with which users initiate dialogues. The second corpus stores, by domain, the corpus used to return dialogue responses. The return-corpus extraction unit is connected to the second corpus; it extracts the words of each domain from that domain's knowledge documents and sends the extracted domain words to the second corpus. The matching processing unit is connected to the first corpus and the second corpus; it matches a user's dialogue-initiating utterance against the entries in the first corpus to obtain a matched dialogue-initiation entry, and then matches that entry against the entries in the second corpus to obtain a matched dialogue-return entry. The feedback unit is connected to the matching processing unit and feeds the matched dialogue-return entry back to the user.
In one embodiment of the invention, the knowledge base further comprises a dialogue corpus collection unit connected to the first corpus. It conducts dialogue experiments with users, collects the dialogue-initiating utterances from those experiments, performs formal induction on the utterances whose frequency of use exceeds a preset frequency threshold, and sends the formally induced dialogue-initiation entries to the first corpus.
In another embodiment of the present invention, the return-corpus extraction unit comprises a first-level return-corpus extraction unit and a second-level return-corpus extraction unit. The first-level unit extracts the sentences of each domain from that domain's knowledge documents. The second-level unit is connected to the first-level unit and to the second corpus; it extracts the domain words from the sentences produced by the first-level unit, performs formal classification on the extracted words, and sends the formally classified words to the second corpus. These formally classified words constitute the dialogue-return corpus.
In yet another embodiment of the present invention, the categories of the formal classification are "item", "behavior and action", "modification", "orientation and time" and "pure grammar", and the second corpus stores the formally classified words of each domain by category.
In a further embodiment of the present invention, the knowledge base further comprises a natural language generation system connected to the matching processing unit and the feedback unit. It converts the matched dialogue-return entry into natural language, and the result of the conversion is fed back to the user.
A method for constructing a knowledge base for a man-machine interface system comprises the following steps: storing the utterances with which users initiate dialogues; extracting the words of each domain from that domain's knowledge documents; storing the extracted domain words by category, the domain words serving as the dialogue-return corpus; matching a user's dialogue-initiating utterance against the stored dialogue-initiation utterances to obtain a matched dialogue-initiation entry, and matching that entry against the stored dialogue-return corpus to obtain a matched dialogue-return entry; and feeding the matched dialogue-return entry back to the user.
In one embodiment of the invention, the method further comprises: conducting dialogue experiments with users, collecting the dialogue-initiating utterances from those experiments, and performing formal induction on the utterances whose frequency of use exceeds a preset frequency threshold. The step of storing the users' dialogue-initiating utterances then specifically consists of storing the formally induced dialogue-initiation entries.
In another embodiment of the present invention, the step of extracting the words of each domain from that domain's knowledge documents specifically comprises: extracting the sentences of each domain from its knowledge documents; extracting the domain words from the extracted sentences; and performing formal classification on the extracted domain words, the formally classified words constituting the dialogue-return corpus.
In yet another embodiment of the present invention, the step of performing formal classification on the extracted domain words specifically consists of classifying them into the categories "item", "behavior and action", "modification", "orientation and time" and "pure grammar". The step of storing the extracted domain words specifically consists of storing the formally classified words of each domain by category.
In a further embodiment of the present invention, the step of feeding the matched dialogue-return entry back to the user specifically comprises: converting the matched dialogue-return entry into natural language, and feeding the result of the conversion back to the user.
Compared with the prior art, the second corpus of the knowledge base of the present invention is organized by domain, so the conversation between the user and the chatbot is selective: the dialogue topic can be kept within a fairly specialized domain, and as many of that domain's professional knowledge points as possible can be conveyed to the user through dialogue.
In addition, the knowledge base of the present invention establishes the form of knowledge through the first corpus and the content of knowledge through the second corpus; the two corpora together form the knowledge base, so that form and content are separated.
The present invention will become clearer from the following description taken in conjunction with the accompanying drawings, which illustrate embodiments of the invention.
Embodiment
Embodiments of the invention are now described with reference to the accompanying drawings, in which like reference numerals denote like elements.
The knowledge base of the man-machine interface system of this embodiment comprises a first corpus 20, a dialogue corpus collection unit 10, a second corpus 30, a return-corpus extraction unit 40, a matching processing unit 50, a feedback unit 70 and a natural language generation system 60.
The first corpus 20 stores the utterances with which users initiate dialogues.
The dialogue corpus collection unit 10 is connected to the first corpus 20. It conducts dialogue experiments with users through chat tools such as a chatbot platform, FAQs (frequently asked questions) and user questionnaires, collects the dialogue-initiating utterances from those experiments, performs formal induction on the utterances whose frequency of use exceeds a preset frequency threshold, and sends the formally induced dialogue-initiation entries to the first corpus 20. The more trials are run with users, the more dialogue utterances are retained and the higher the later matching success rate.
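The frequency filter performed by the dialogue corpus collection unit 10 can be illustrated with a short Python sketch. The patent does not specify how formal induction is done, so `formalize` below is a hypothetical stand-in (lowercasing, dropping punctuation, collapsing whitespace); frequency is approximated by a raw occurrence count, and `collect_initiation_corpus` and `threshold` are illustrative names, not part of the original disclosure.

```python
import re
from collections import Counter


def formalize(utterance: str) -> str:
    """Hypothetical 'formal induction': lowercase, strip punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", "", utterance.lower())
    return " ".join(text.split())


def collect_initiation_corpus(logs: list[str], threshold: int) -> list[str]:
    """Keep formalized utterances whose occurrence count exceeds the threshold."""
    counts = Counter(formalize(u) for u in logs)
    return sorted(u for u, n in counts.items() if n > threshold)


if __name__ == "__main__":
    logs = ["What is a corpus?", "what is a corpus", "Hello!", "What is a corpus ?", "hello"]
    print(collect_initiation_corpus(logs, threshold=2))
```

With `threshold=2`, only the utterance that occurs three times after normalization survives; a real system would probably need a richer normalization, for example replacing domain terms with slots.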
The second corpus 30 stores, by domain, the corpus used to return dialogue responses.
The return-corpus extraction unit 40 is connected to the second corpus 30; it extracts the words of each domain from that domain's knowledge documents and sends the extracted domain words to the second corpus 30.
The return-corpus extraction unit 40 comprises a first-level return-corpus extraction unit and a second-level return-corpus extraction unit. The first-level unit extracts the sentences of each domain from that domain's knowledge documents. The second-level unit is connected to the first-level unit and to the second corpus 30; it extracts the domain words from the sentences produced by the first-level unit, performs formal classification on the extracted words, and sends the formally classified words to the second corpus 30. These formally classified words constitute the dialogue-return corpus. Here, formal classification means attaching additional attribute information to each extracted domain word.
In other words, the return-corpus extraction unit 40 breaks each complete domain knowledge document down into the sentences used in dialogue, breaks those sentences down further, extracts the words that belong to the formal classification categories, classifies them formally, and then sends them to the second corpus 30 for storage.
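A minimal sketch of the two extraction levels follows, assuming plain-text documents whose sentences end in Western or full-width punctuation. Tokenizing with the regular expression `\w+` only suits space-delimited languages such as English; Chinese text would need a word segmenter instead. The function names are hypothetical.

```python
import re

# A sentence is a run of non-terminator characters plus any trailing terminators.
SENTENCE_RE = re.compile(r"[^.!?。！？]+[.!?。！？]*")
WORD_RE = re.compile(r"\w+")


def extract_sentences(document: str) -> list[str]:
    """First level: split a domain knowledge document into sentences."""
    return [s.strip() for s in SENTENCE_RE.findall(document) if s.strip()]


def extract_words(sentences: list[str]) -> list[str]:
    """Second level: break sentences into lowercase word tokens.

    Duplicates are dropped while keeping first-seen order (dicts keep insertion order).
    """
    seen: dict[str, None] = {}
    for sentence in sentences:
        for word in WORD_RE.findall(sentence.lower()):
            seen.setdefault(word, None)
    return list(seen)


if __name__ == "__main__":
    sentences = extract_sentences("A corpus stores text. It is large! Is it?")
    print(sentences)
    print(extract_words(sentences))
```

The word list produced here is still unclassified; the category filter described next decides which of these words actually enter the second corpus.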
The categories of the formal classification are "item", "behavior and action", "modification", "orientation and time" and "pure grammar", and the second corpus 30 stores the formally classified words of each domain by category.
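The category-wise storage in the second corpus 30 might be organized as a mapping from domain to category to word list, as in the sketch below. The patent does not say how a word's category is determined, so `LEXICON` is a hypothetical hand-written lookup table standing in for a domain lexicon or part-of-speech tagger; words that match no category are skipped, mirroring the statement that only words belonging to the categories are extracted. `SecondCorpus` and its methods are illustrative names.

```python
from collections import defaultdict

CATEGORIES = ("item", "behavior and action", "modification",
              "orientation and time", "pure grammar")

# Hypothetical lexicon; a real system would need a domain lexicon or tagger.
LEXICON = {
    "corpus": "item", "robot": "item",
    "store": "behavior and action", "match": "behavior and action",
    "large": "modification",
    "after": "orientation and time", "today": "orientation and time",
    "the": "pure grammar", "of": "pure grammar",
}


class SecondCorpus:
    """Dialogue-return corpus stored per domain and, within a domain, per category."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, list[str]]] = defaultdict(
            lambda: {c: [] for c in CATEGORIES})

    def add_words(self, domain: str, words: list[str]) -> None:
        """Attach a category to each word (formal classification) and store it."""
        for word in words:
            category = LEXICON.get(word)
            if category is None:
                continue  # words belonging to no category are not extracted
            bucket = self._store[domain][category]
            if word not in bucket:
                bucket.append(word)

    def lookup(self, domain: str, category: str) -> list[str]:
        """Return a copy of the stored words; unknown domains yield an empty list."""
        return list(self._store.get(domain, {}).get(category, []))


if __name__ == "__main__":
    corpus = SecondCorpus()
    corpus.add_words("nlp", ["the", "corpus", "store", "large", "banana"])
    print(corpus.lookup("nlp", "item"))
```

Keeping each domain in its own bucket is also what makes later extension cheap: adding a new domain only adds a new top-level key.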
The matching processing unit 50 is connected to the first corpus 20 and the second corpus 30. It matches a user's dialogue-initiating utterance against the entries in the first corpus 20 to obtain a matched dialogue-initiation entry, and then matches that entry against the entries in the second corpus 30 to obtain a matched dialogue-return entry. The matching processing unit 50 defines its matching rules with XML (Extensible Markup Language) and RegExp (regular expressions), and performs matching according to these rules.
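The two-stage matching can be sketched with Python's standard `xml.etree.ElementTree` and `re` modules. The XML schema below (`<rule>` elements carrying `id`, `domain` and `category` attributes and a `<pattern>` child) is a hypothetical format invented for illustration, as is the shape of the second corpus (domain → category → term → knowledge text); the patent only states that the rules are written with XML and regular expressions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule format: each <rule> pairs a dialogue-initiation pattern
# (first corpus) with the domain and category of the second corpus to answer from.
RULES_XML = r"""
<rules>
  <rule id="define" domain="nlp" category="item">
    <pattern>^what is (?:an? |the )?(\w+)\??$</pattern>
  </rule>
  <rule id="howto" domain="nlp" category="behavior and action">
    <pattern>^how do i (\w+)</pattern>
  </rule>
</rules>
"""


def load_rules(xml_text: str) -> list[dict]:
    """Parse the XML rules and compile each pattern as a case-insensitive regex."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for rule in root.findall("rule"):
        rules.append({
            "id": rule.get("id"),
            "domain": rule.get("domain"),
            "category": rule.get("category"),
            "pattern": re.compile(rule.findtext("pattern").strip(), re.IGNORECASE),
        })
    return rules


def match(utterance: str, rules: list[dict], second_corpus: dict):
    """Stage 1: find the initiation rule the utterance fits.
    Stage 2: look the captured term up in that rule's domain and category.
    Returns (rule id, knowledge text), or None when nothing matches."""
    text = utterance.strip()
    for rule in rules:
        m = rule["pattern"].search(text)
        if m is None:
            continue
        term = m.group(1).lower()
        entries = second_corpus.get(rule["domain"], {}).get(rule["category"], {})
        if term in entries:
            return rule["id"], entries[term]
    return None


if __name__ == "__main__":
    corpus = {"nlp": {"item": {"corpus": "A corpus is a collection of texts."}}}
    print(match("What is a corpus?", load_rules(RULES_XML), corpus))
```

Because each rule names its domain, an utterance outside the configured domains simply returns `None`, which is one way the dialogue could be kept within a specialized field.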
The natural language generation system 60 is connected to the matching processing unit 50; it converts the matched dialogue-return entry into natural language and sends the result of the conversion to the feedback unit 70.
The feedback unit 70 is connected to the natural language generation system 60 and feeds the result of the conversion back to the user.
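Step S70 of the method describes generation as building a natural sentence from the dialogue form in the first corpus and the matched knowledge content in the second corpus. A template-based sketch of that idea is shown below using Python's `string.Template`; the template texts, their keying by rule id, the slot names and the print-based `feed_back` stand-in for the feedback unit 70 are all hypothetical.

```python
from string import Template

# Hypothetical answer templates (the dialogue "form"), keyed by the id of the
# matched initiation rule; $-slots are filled with second-corpus content.
TEMPLATES = {
    "define": Template("Good question! In $domain, $content"),
    "howto": Template("Here is how to $term: $content"),
}


def generate(rule_id: str, slots: dict[str, str]) -> str:
    """Fill the form for rule_id with the matched content.

    Falls back to the bare content when no template exists; safe_substitute
    leaves any unfilled $-slot in place instead of raising KeyError.
    """
    template = TEMPLATES.get(rule_id)
    if template is None:
        return slots.get("content", "")
    return template.safe_substitute(slots)


def feed_back(sentence: str) -> None:
    """Stand-in for the feedback unit: here it simply prints the sentence."""
    print(sentence)


if __name__ == "__main__":
    feed_back(generate("define", {"domain": "NLP",
                                  "content": "a corpus is a collection of texts."}))
```

Keeping the templates separate from the slot values mirrors the separation of form (first corpus) and content (second corpus) that the description emphasizes.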
As can be seen from the foregoing, the knowledge base of the present invention uses two separate corpora: the first corpus stores the form of the utterances that initiate dialogues (the dialogue-initiation corpus), and the second corpus stores the knowledge content conveyed during the dialogue (the dialogue-return corpus). Specifically, the present invention establishes the form of knowledge representation through the first corpus 20 and the content of knowledge through the second corpus 30; the two corpora together form the knowledge base, so that form and content are separated.
In addition, because the second corpus 30 of the knowledge base is organized by domain, the conversation between the user and the chatbot is selective: the dialogue topic can be kept within a fairly specialized domain, so that as many of that domain's professional knowledge points as possible are conveyed to the user through dialogue. It will be appreciated that the knowledge base built by the present invention can be used to develop various applications quickly, for example question-and-answer learning systems and advertisement recommendation systems. Unlike a general-purpose chatbot, the knowledge base generated by this invention applies only to a specific domain and addresses only specific topics; the user therefore cannot drift into random chatting, which keeps the user's attention on learning instead of letting it wander. Moreover, because the content of the knowledge base is collected per domain, knowledge from different domains can be added continuously at a later stage, so the knowledge base model is extensible.
As shown in Figure 2, a method for constructing a knowledge base for a man-machine interface system comprises the following steps:
Step S10: conduct dialogue experiments with users through chat tools such as a chatbot platform, FAQs (frequently asked questions) and user questionnaires, collect the dialogue-initiating utterances from those experiments, and perform formal induction on the utterances whose frequency of use exceeds a preset frequency threshold;
Step S20: store the formally induced dialogue-initiation entries;
Step S30: extract the sentences of each domain from that domain's knowledge documents;
Step S40: extract the domain words from the extracted sentences;
Step S50: perform formal classification on the extracted domain words according to the categories "item", "behavior and action", "modification", "orientation and time" and "pure grammar", and store the formally classified words of each domain, which serve as the dialogue-return corpus;
Step S60: match the user's dialogue-initiating utterance against the stored dialogue-initiation entries to obtain a matched dialogue-initiation entry, and match that entry against the stored dialogue-return corpus to obtain a matched dialogue-return entry;
Step S70: convert the matched dialogue-return entry into natural language, that is, construct a natural sentence from the dialogue form in the first corpus and the matched knowledge content in the second corpus;
Step S80: feed the result of the conversion back to the user.
The invention has been described above in connection with the preferred embodiments, but it is not limited to the embodiments disclosed above and is intended to cover the various modifications and equivalent combinations made according to the essence of the present invention.