Embodiment
The multi-scene management method and device for voice dialogue provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The multi-turn dialogue described in the embodiments of the present invention refers to helping the user complete a specific demand or task through multiple rounds of interaction between the user and a terminal. In the embodiments of the present invention, multi-turn dialogues are divided by domain (or scene) and may be task-oriented multi-turn dialogues. Fig. 1 is a schematic diagram of the relationships in multi-scene management to which the embodiments of the present invention are applicable. As shown in Fig. 1, the embodiments of the present invention may include multiple scenes, for example a travel scene, a catering scene, a weather scene and a stock scene; the travel scene may further be divided into multiple sub-scenes: an air ticket scene, a train ticket scene and a hotel scene.
Specifically, the embodiments of the present invention are illustrated with the following dialogue:
User: An air ticket to Qingdao.
Mobile terminal: OK, an air ticket to Qingdao. May I ask when you will depart?
User: Departing tomorrow.
Mobile terminal: OK, an air ticket to Qingdao for tomorrow. Where will you depart from?
User: Beijing.
At this point in the dialogue, the mobile terminal displays the relevant air ticket information to the user.
User: Is there good Sichuan food in Qingdao?
(Speech recognition produced "Is there Sichuan food that is nicely sour?")
Mobile terminal: Do you want to continue booking the air ticket?
User: No.
Mobile terminal: Did you mean Sichuan food that is nicely sour?
User: I'm going to Qingdao tomorrow.
At this point in the dialogue, the mobile terminal displays the corresponding Sichuan restaurants in Qingdao to the user.
User: Restaurants near Qingdao Airport.
Mobile terminal: The following restaurants were found for you (and the corresponding results are displayed).
User: Will it rain in Qingdao tomorrow?
Mobile terminal: Qingdao tomorrow: showers, 16 to 20 degrees Celsius, east wind force 5 to 6.
Embodiment one:
Fig. 2 is a flowchart of the multi-scene management method for voice dialogue provided by Embodiment 1 of the present invention. As shown in Fig. 2, the method includes the following steps:
Step 101: Obtain the demand information input by the user from text information, where the text information is obtained by performing text recognition on the user's voice information.
Step 102: Obtain at least one scene among the scenes according to the demand information, and obtain at least one feature vector corresponding to each of the at least one scene.
Step 103: Compute the inner product of each of the at least one feature vector with its corresponding weight vector, to obtain at least one inner product.
Step 104: Determine the scene-switching action to be performed according to the at least one inner product, and present the voice content corresponding to the scene after switching.
In Step 101, the user's voice information is converted into text information by speech recognition. According to one embodiment of the present invention, the user's demand information is obtained from the recognized text. For example, if the user says "an air ticket to Qingdao", then after this voice information is recognized as text, the demand information input by the user is obtained as "air ticket".
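The description does not specify how the demand information is extracted from the recognized text. As a minimal sketch, assuming a simple keyword lexicon (the `DEMAND_LEXICON` entries and the `extract_demand` function are hypothetical, not part of the described method), Step 101 could look like this:

```python
# Hypothetical keyword lexicon mapping surface phrases to demand labels.
DEMAND_LEXICON = {
    "air ticket": "air ticket",
    "plane ticket": "air ticket",
    "train ticket": "train ticket",
    "hotel": "hotel",
    "sichuan food": "restaurant",
    "restaurant": "restaurant",
    "rain": "weather",
    "weather": "weather",
}


def extract_demand(text: str) -> list:
    """Return the demand labels found in the recognized text, ordered by first occurrence."""
    lowered = text.lower()
    first_pos = {}
    for keyword, label in DEMAND_LEXICON.items():
        pos = lowered.find(keyword)
        # Keep the earliest position at which any keyword for this label occurs.
        if pos >= 0 and (label not in first_pos or pos < first_pos[label]):
            first_pos[label] = pos
    return sorted(first_pos, key=first_pos.get)


print(extract_demand("An air ticket to Qingdao"))
```

A production system would more likely use a trained slot-filling or intent model; the lexicon only stands in for "obtain the demand information from the text".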
In Step 102, at least one scene among the scenes is obtained according to the demand information obtained in Step 101. In one embodiment, the at least one scene may also be determined from the context information of the voice dialogue. The scenes are the multiple scenes preset in the dialogue system (for example, the travel scene, catering scene, weather scene and stock scene shown in Fig. 1). Specifically, when the demand information "air ticket" is obtained in Step 101, the travel scene can be obtained from this demand information (the travel scene may further include sub-scenes such as the air ticket scene, the train ticket scene and the hotel scene); more precisely, this demand information corresponds to the air ticket sub-scene of the travel scene. In one embodiment, at least one feature vector corresponding to the travel scene can be obtained from the voice information. For example, in the voice information "an air ticket to Qingdao", "to", "Qingdao" and "air ticket" constitute the features of the voice information, and quantizing these features forms the feature vector, which specifically contains the destination (Qingdao) and the air ticket (the demand information). In addition, the feature vector in the embodiments of the present invention may include, but is not limited to, information such as the departure place, date, seat type and departure time. In one embodiment, the departure place, destination and date are required information, while the seat type and departure time are optional information. Representing scenes by such feature vectors gives the embodiments of the present invention good generalization ability, and avoids the need in the prior art to provide corresponding labeled data and retrain the scene model every time a new scene is added.
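The quantization of slot information into a feature vector can be sketched as follows. This is an illustration under stated assumptions only: the slot names, the vector layout and the `air_ticket_features` function are hypothetical; the description fixes only which slots exist and which are required.

```python
# Slot names assumed for the air ticket sub-scene, following the description above.
REQUIRED_SLOTS = ("departure", "destination", "date")
OPTIONAL_SLOTS = ("seat_type", "departure_time")


def air_ticket_features(slots: dict) -> list:
    """Quantize the parsed slots into a fixed-length feature vector.

    Hypothetical layout: [demand is "air ticket", one indicator per required
    slot, one indicator per optional slot, fraction of required slots filled].
    """
    demand = 1.0 if slots.get("demand") == "air ticket" else 0.0
    required = [1.0 if slots.get(name) else 0.0 for name in REQUIRED_SLOTS]
    optional = [1.0 if slots.get(name) else 0.0 for name in OPTIONAL_SLOTS]
    completeness = sum(required) / len(REQUIRED_SLOTS)
    return [demand, *required, *optional, completeness]


# "An air ticket to Qingdao": only the demand and the destination are known.
vec = air_ticket_features({"demand": "air ticket", "destination": "Qingdao"})
```

Because the indicators describe slot filling rather than the content of a particular domain, the same layout could in principle be reused for other sub-scenes, which is in the spirit of the generalization argument above.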
In Step 103, the inner product of each of the at least one feature vector obtained in Step 102 with its corresponding weight vector is computed, giving at least one inner product (for example, the inner products A_1, A_2, A_3, ..., A_n, where n is the number of inner products). The weight vectors are the weight vectors of the scene features obtained by training on collected corpora. Those skilled in the art will understand that the embodiments of the present invention use the inner product as the score merely for illustration, and the specific way in which the score is computed does not limit the embodiments of the present invention.
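A minimal sketch of Step 103, assuming each candidate scene has its own feature vector x_i and weight vector w_i, so that A_i is their inner product. The scene names and numeric values are illustrative only.

```python
def inner_product(x, w):
    """Dot product of a feature vector and its weight vector."""
    if len(x) != len(w):
        raise ValueError("feature and weight vectors must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w))


def scene_scores(features, weights):
    """Return {scene: A_i} with A_i = <x_i, w_i> for every candidate scene i."""
    return {scene: inner_product(x, weights[scene]) for scene, x in features.items()}


# Illustrative feature and weight vectors for two candidate scenes.
features = {"air ticket": [1.0, 0.0, 1.0], "restaurant": [0.0, 1.0, 0.0]}
weights = {"air ticket": [0.5, 0.25, 0.25], "restaurant": [0.125, 0.5, 0.25]}
scores = scene_scores(features, weights)  # {"air ticket": 0.75, "restaurant": 0.5}
```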
In Step 104, the scene-switching action to be performed is determined according to the at least one inner product obtained in Step 103, and the voice content corresponding to the scene after switching is presented. According to one embodiment of the present invention, in Step 104 the at least one inner product is sorted to obtain its maximum, the scene-switching action corresponding to that inner product is taken as the decision action for the corresponding scene, and the decision is fed back to the user in the form of voice content. In one embodiment, the feature vectors of the scenes corresponding to the user's demand information "air ticket" are obtained and their inner products are computed as A_1, A_2, A_3, A_4; if the maximum after sorting is A_2, the voice content corresponding to A_2 (for example, "OK, an air ticket to Qingdao. May I ask when you will depart?") is output to the user.
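The selection in Step 104 can be sketched as below, assuming the scores are keyed by candidate and each candidate has a prepared voice response. The placeholder responses for A_1, A_3 and A_4 are hypothetical; only the A_2 response comes from the example above.

```python
def choose_response(scores, responses):
    """Sort the scores in descending order and return the top candidate with its voice content."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best, _ = ranked[0]
    return best, responses[best]


scores = {"A_1": 0.2, "A_2": 0.9, "A_3": 0.4, "A_4": 0.1}
responses = {key: f"(voice content for {key})" for key in scores}
responses["A_2"] = "OK, an air ticket to Qingdao. May I ask when you will depart?"
best, reply = choose_response(scores, responses)
```

The sketch sorts to mirror the description; since only the maximum is used, `max(scores, key=scores.get)` would select the same candidate in linear time.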
The multi-scene management method for voice dialogue provided by the embodiments of the present invention obtains the user's demand information from text information and obtains at least one scene among the scenes according to that demand information, thereby providing a scene-switching action that suits the user's demand, and presents the voice content corresponding to the scene after switching; this addresses the problem of switching among multiple scenes of voice dialogue in a dialogue system. In addition, representing scenes by feature vectors gives the dialogue system good generalization ability, so that new scenes can be added to the system quickly and multi-scene switching can be managed effectively. The method can also take full account of how users actually use the system and provide the user with the most reasonable action decision, improving the user experience.
Embodiment two:
Fig. 3 is a flowchart of the multi-scene management method for voice dialogue provided by Embodiment 2 of the present invention, and Fig. 4 is a schematic diagram of the scene-switching actions in Embodiment 2. As shown in Fig. 3, the method includes the following steps:
Step 201: Obtain the demand information input by the user from text information, where the text information is obtained by performing text recognition on the user's voice information.
Step 202: Perform scene classification on the voice dialogue according to the demand information obtained in Step 201, to obtain at least one scene, among the scenes, to which the demand information applies.
Step 203: Perform scene feature extraction on the demand information according to the at least one scene obtained in Step 202, to obtain at least one feature vector corresponding to each of the at least one scene.
Step 204: Compute the inner product of each of the at least one feature vector with its corresponding weight vector, to obtain at least one inner product.
Step 205: Sort the at least one inner product to obtain the maximum among all inner products.
Step 206: Perform the scene-switching action on the demand information according to the scene feature corresponding to the maximum, and present the voice response corresponding to the scene after switching.
For Step 201, refer to the description of Step 101 in Embodiment 1; details are not repeated here.
In Step 202, scene classification is performed on the voice dialogue according to the demand information obtained in Step 201, to obtain at least one applicable scene among the scenes. For example, if the user's demand information is "Qingdao" and "air ticket", the voice dialogue can be classified into the air ticket sub-scene of the travel scene. After the scenes are obtained by classification, in Step 203 scene feature extraction is performed on the demand information according to these scenes to obtain the corresponding feature vectors.
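A minimal sketch of the scene classification in Step 202, assuming demand labels are mapped onto the scene hierarchy of Fig. 1 through a lookup table. The `DEMAND_TO_SCENE` table and `classify` function are hypothetical; a deployed classifier would more likely be a trained model that also uses dialogue context.

```python
# Hypothetical mapping from demand labels to (scene, sub-scene), following Fig. 1.
# Only the travel scene is described as having sub-scenes.
DEMAND_TO_SCENE = {
    "air ticket": ("travel", "air ticket"),
    "train ticket": ("travel", "train ticket"),
    "hotel": ("travel", "hotel"),
    "restaurant": ("catering", None),
    "weather": ("weather", None),
    "stock": ("stock", None),
}


def classify(demands):
    """Return the distinct (scene, sub-scene) pairs applicable to the demand labels, in input order."""
    result = []
    for demand in demands:
        # Slot values such as "Qingdao" are not demand labels and are skipped here.
        pair = DEMAND_TO_SCENE.get(demand)
        if pair is not None and pair not in result:
            result.append(pair)
    return result


print(classify(["Qingdao", "air ticket"]))
```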
For Steps 203 and 204, refer to Steps 102 and 103 in Embodiment 1; details are not repeated here.
In Step 205, the at least one inner product obtained in Step 204 is sorted to obtain the maximum. For example, the feature vectors of the scenes corresponding to the user's demand information "air ticket" are obtained and their inner products are computed as A_1, A_2, A_3; the maximum after sorting is A_2.
In Step 206 (see Fig. 4, the schematic diagram of the scene-switching actions in Embodiment 2), a voice response adapted to the demand information is produced according to the scene feature corresponding to the maximum, and the voice content is fed back to the user. For example, if the voice content corresponding to the maximum inner product A_2 mentioned in Step 205 is "OK, an air ticket to Qingdao. May I ask when you will depart?", this voice content is fed back to the user during the voice dialogue.
Those skilled in the art will understand that, in practical applications, the scenes that are configured and learned cannot be exhaustive, so scene features outside the preset scenes (out-of-scene features) may occur. According to one embodiment of the present invention, a feature vector for the scene-confirmation action is generated from the out-of-scene feature and the at least one scene feature; this scene-confirmation feature vector is one of the at least one feature vector. Further, if the maximum obtained in Step 205 corresponds to one scene feature in the scenes, the demand information is responded to according to that scene feature; if the maximum corresponds to two or more feature vectors in the scenes, the demand information is clarified according to those feature vectors; and if the maximum corresponds to both an out-of-scene feature and a scene feature in the scenes, the out-of-scene feature and the scene feature are confirmed.
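The three cases above can be sketched as a small decision function. This is an illustration only: the input representation (a list of in-scene features behind the top score plus an out-of-scene flag) is an assumption, and the final `present(NULL)` branch, for a top score that involves only out-of-scene features, is taken from the action list of Embodiment 3 rather than from this paragraph.

```python
def decide_action(in_scene, out_of_scene):
    """Map the features behind the top score to an action.

    in_scene: names of the in-scene features the top score depends on.
    out_of_scene: True if an out-of-scene feature is also involved.
    """
    if out_of_scene and in_scene:
        return ("confirm", in_scene[0])                # confirm(d)
    if len(in_scene) >= 2:
        return ("clarify", in_scene[0], in_scene[1])   # clarify(d1, d2)
    if len(in_scene) == 1:
        return ("present", in_scene[0])                # present(d)
    return ("present", None)                           # present(NULL), assumed fallback


# Misrecognized "nicely sour" is out of scene while the air ticket scene is active.
print(decide_action(["air ticket"], out_of_scene=True))
```

In the sample dialogue, this first case corresponds to the prompt "Do you want to continue booking the air ticket?".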
In the scene clarification process, the difference between the scene vectors corresponding to the at least two scene features is obtained, an exponent operation is applied to this difference, and the feature vector for clarifying the two scenes is determined from the result of the exponent operation. For example, given the feature vectors f_1 and f_2 of two scenes, the difference f_1 - f_2 of the two scene features is computed, and then the corresponding exponent e^(f_1 - f_2), where e is the natural constant; other values may of course be used as the base of the exponent operation. The feature vector for clarifying the two scenes is determined from the result of this exponent operation. Specifically, the inner product of the scene-clarification feature vector and the scene-clarification weight vector is computed to obtain the clarification score of the two scenes, and when this score is the maximum, the two scenes are clarified.
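A minimal sketch of the clarification score, assuming the exponent is applied element-wise to f_1 - f_2 and that "maximum" means the clarification score is at least as large as every competing score. The weight values and the function names are hypothetical.

```python
import math


def clarification_features(f1, f2, base=math.e):
    """Element-wise base ** (f1 - f2); the description uses e but allows other bases."""
    if len(f1) != len(f2):
        raise ValueError("scene vectors must have the same length")
    return [base ** (a - b) for a, b in zip(f1, f2)]


def clarification_score(f1, f2, w_clarify, base=math.e):
    """Inner product of the clarification feature vector with the clarification weight vector."""
    phi = clarification_features(f1, f2, base)
    return sum(p * w for p, w in zip(phi, w_clarify))


def should_clarify(clar_score, other_scores):
    """Clarify the two scenes only when their clarification score is the maximum."""
    return clar_score >= max(other_scores)


# Identical scene vectors give exp(0) = 1 in every component.
score = clarification_score([0.0, 0.0], [0.0, 0.0], [0.5, 0.25])  # 0.75
```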
For example, in the multi-turn voice dialogue above, when the mobile terminal recognizes the user's voice input "Is there good Sichuan food in Qingdao?" as "Is there Sichuan food that is nicely sour?", by means of Embodiment 2 the mobile terminal combines the context information and the parsing information, adopts scene confirmation when performing the scene-switching action, and presents the voice response corresponding to scene confirmation, "Do you want to continue booking the air ticket?", so that the user confirms the scene.
Further, after the user answers "no", the mobile terminal combines the context information and the parsing information, adopts scene clarification when performing the scene-switching action, and presents the voice response corresponding to scene clarification, "Did you mean Sichuan food that is nicely sour?", so that the user clarifies the scene.
Embodiment three:
Fig. 5 is a flowchart of the multi-scene management method for voice dialogue provided by Embodiment 3 of the present invention. In this embodiment, the method is illustrated as executed by a mobile terminal. As shown in Fig. 5, the method includes the following steps:
In the offline learning process of Step 501, multiple scene goals can be set during crowdsourced testing, and users conduct multi-turn voice interaction with the mobile terminal, so that the mobile terminal acquires a certain statistical decision-making ability. The crowdsourced test data is one of the sources of the mobile terminal's training data in the embodiments of the present invention, and this training data enables the embodiments of the present invention to perform online prediction.
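The description does not state how the crowdsourced data yields initial weight vectors. One plausible reading, offered only as a sketch under that assumption, is a least-squares fit of a linear model so that the score <x, w> approximates the return observed in crowdsourced dialogues. The `(x, g)` sample format and the `fit_initial_weights` function are hypothetical.

```python
def fit_initial_weights(samples, dim, lr=0.1, epochs=500):
    """Least-squares fit of w so that <x, w> approximates the observed return g.

    samples: list of (feature vector x, observed return g) pairs, assumed to be
    extracted from crowdsourced dialogues. Uses plain batch gradient descent.
    """
    w = [0.0] * dim
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, g in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - g
            for j in range(dim):
                grad[j] += err * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


# Toy data generated by the (hypothetical) true weights [2.0, -1.0].
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w0 = fit_initial_weights(samples, dim=2)
```

The resulting `w0` would serve only as a starting point; the online stage described next is what refines it.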
In the online learning process of Step 502, if the voice dialogue involves multiple turns (that is, the user and the mobile terminal have exchanged multiple utterances), the context information and parsing information of the user and the mobile terminal can be collected to obtain feature vectors representing the state of the scenes, and the reinforcement learning model computes the inner products of the feature vectors and the weight vectors. Through this process, the embodiments of the present invention can maximize the global gain; in multiple groups of comparative experiments, the results all exceeded the rule-based multi-scene management of the prior art. In addition, by selecting feature vectors that are independent of the scene domain and using them to represent scene features, the embodiments of the present invention essentially cover the factors relevant to scene switching and improve generalization ability. An illustration of the feature vectors can be seen in Fig. 4.
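The description names a reinforcement learning model whose score is an inner product of feature and weight vectors, but not the update rule. As an assumption-labelled sketch, a standard Q-learning step with linear function approximation fits that description; `alpha`, `gamma` and the function names are hypothetical choices, not values from the source.

```python
def q_value(phi, w):
    """Linear action value Q(s, a) = <phi(s, a), w>."""
    return sum(p * x for p, x in zip(phi, w))


def td_update(w, phi_sa, reward, next_phis, alpha=0.1, gamma=0.9):
    """One Q-learning step with linear function approximation.

    phi_sa: feature vector of the action taken in the current turn.
    next_phis: feature vectors of the actions available in the next turn;
    an empty list marks the end of the dialogue.
    """
    target = reward
    if next_phis:
        # Bootstrap from the best action the current weights predict for the next turn.
        target += gamma * max(q_value(p, w) for p in next_phis)
    td_error = target - q_value(phi_sa, w)
    return [wi + alpha * td_error * pi for wi, pi in zip(w, phi_sa)]


# A final turn that the user rewards with +1 moves the weights toward that action's features.
w = td_update([0.0, 0.0], [1.0, 0.0], reward=1.0, next_phis=[])
```

Bootstrapping from the next turn's best action is what lets the weights reflect long-term gain over the whole dialogue rather than only the immediate reply.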
For the scene-switching actions of Step 503, the embodiments of the present invention are illustrated with the four classes of actions shown in Table 1, which include but are not limited to: out of scene (present(NULL)), present a scene (present(d)), scene confirmation (confirm(d)), and clarification between scenes (clarify(d1, d2)). Scene confirmation and scene clarification strengthen the human-machine interaction throughout the multi-turn dialogue.
Table 1
In the action selection process of Step 503, the reinforcement learning model trained and optimized in Step 502 can be used to predict, according to the current user's demand information, which class of action in Table 1 to perform.
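Action selection over the four action classes can be sketched as an argmax of the learned linear score. The `Action` type and the per-action feature vectors are hypothetical; the notation produced by `__str__` follows the action names listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str              # "present", "confirm" or "clarify"
    scenes: tuple = ()     # an empty tuple stands for NULL in present(NULL)

    def __str__(self):
        args = ", ".join(self.scenes) if self.scenes else "NULL"
        return f"{self.kind}({args})"


def select_action(candidates, w):
    """candidates: list of (Action, feature vector); returns the action with the highest linear score."""
    def score(item):
        _, phi = item
        return sum(p * x for p, x in zip(phi, w))

    best_action, _ = max(candidates, key=score)
    return best_action


candidates = [
    (Action("present", ("air ticket",)), [0.2, 1.0]),
    (Action("confirm", ("air ticket",)), [0.7, 0.0]),
]
print(select_action(candidates, [1.0, 0.0]))  # confirm(air ticket)
```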
Through the above process, the user's feedback can be fully utilized to predict the action that maximizes the user's long-term gain. In addition, because the feature vectors use features that are independent of specific scenes, new scene features can be introduced quickly, giving the scheme good extensibility.
Embodiment four:
Fig. 6 is a schematic structural diagram of the multi-scene management device for voice dialogue provided by Embodiment 4 of the present invention. As shown in Fig. 6, the device includes:
First acquisition module 41, configured to obtain the demand information input by the user from text information, where the text information is obtained by performing text recognition on the user's voice information;
Second acquisition module 42, configured to obtain, according to the demand information, at least one score corresponding to each of at least one scene among the scenes;
Switching module 43, configured to determine the scene-switching action to be performed according to the at least one score, and present the voice content corresponding to the scene after switching.
The second acquisition module 42 includes:
First acquisition unit 421, configured to obtain at least one scene among the scenes according to the demand information, and obtain at least one feature vector corresponding to each of the at least one scene;
Second acquisition unit 422, configured to compute the score of each of the at least one feature vector with its corresponding weight vector, to obtain at least one score.
Further, the first acquisition unit includes:
Scene classification subunit (not shown), configured to perform scene classification on the voice dialogue according to the demand information, to obtain at least one scene, among the scenes, to which the demand information applies;
Feature extraction subunit (not shown), configured to perform scene feature extraction on the demand information according to the at least one scene, to obtain at least one feature vector corresponding to each of the at least one scene.
For a detailed description and the beneficial technical effects of this embodiment, refer to the related description and beneficial effects in Embodiment 1; details are not repeated here.
Embodiment five:
Fig. 7 is a schematic structural diagram of the multi-scene management device for voice dialogue provided by Embodiment 5 of the present invention. As shown in Fig. 7, if an out-of-scene feature is also obtained from the demand information, the device of this embodiment further includes:
Third acquisition module 44, configured to obtain the feature vector of the scene-confirmation action according to the out-of-scene feature and the at least one scene feature.
The switching module 43 includes:
Sorting unit 431, configured to sort the at least one score to obtain the maximum among all scores;
Determining unit 432, configured to determine the scene-switching action to be performed according to the scene feature corresponding to the maximum, and present the voice content of the scene feature corresponding to the maximum.
Further, the determining unit includes:
First response subunit (not shown), configured to respond to the demand information according to a scene feature if the maximum corresponds to that one scene feature in the scenes;
Second response subunit (not shown), configured to clarify the demand information according to two or more feature vectors if the maximum corresponds to those two or more feature vectors in the scenes;
Third response subunit (not shown), configured to confirm the out-of-scene feature and the scene feature if the maximum corresponds to both an out-of-scene feature and a scene feature in the scenes.
Further, the 3rd response subelement (not shown) comprises:
Difference obtains subelement, for the difference of scene vector corresponding at least plural scene characteristic described in obtaining;
Clarification subelement, for obtaining the exponent arithmetic of described difference, determines to clarify described plural scene characteristic according to exponent arithmetic result.
Further, the device also includes:
Fourth acquisition module 45, configured to obtain the target features of the at least one scene during crowdsourced testing, and perform multi-turn voice training on the target features through a statistical model;
Fifth acquisition module 46, configured to obtain the initial values of the weight vectors once the statistical model has statistical decision-making ability.
For a detailed description and the beneficial technical effects of this embodiment, refer to the related description and beneficial effects in Embodiment 2; details are not repeated here.
In summary, the embodiments of the present invention can make full use of the user's feedback and predict the action that maximizes the user's long-term gain. In addition, because the feature vectors use features that are independent of specific scenes, new scene features can be introduced quickly, giving the scheme good extensibility.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.