Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The man-machine conversation interruption method of the present invention is applied to an electronic device, which may be a smart speaker with a man-machine dialog function, a smart robot (such as a doorphone robot, an inquiry robot, etc.), a smartphone, a tablet computer, an in-vehicle unit, etc.; the invention is not limited thereto.
Fig. 1 shows a man-machine conversation interruption method provided by an embodiment of the present invention, which is applied to an electronic device, and the method includes:
S10, determining a dialog intention space according to the completed dialog.
Illustratively, a completed conversation may be one or more rounds of conversations that the electronic device has completed with the user. The electronic equipment determines a plurality of dialogue intentions to form a dialogue intention space according to the completed dialogue turns in the process of man-machine dialogue with a user. The plurality of dialog intents may include a user's dialog intention determined from a dialog turn that has been completed and may also include a machine dialog intention communicated to the user by the electronic device determined from a dialog turn that has been completed.
S20, when a new dialog statement is detected, determining a new dialog intention according to the new dialog statement.
Illustratively, the new dialog statement detected by the electronic device may come from the user who took part in the preceding rounds of the man-machine dialog, or from another user. The new dialog statement may be a control instruction from the user to the electronic device (a continuation of the previous dialog or the start of a new dialog), or it may merely be a statement exchanged between the user and another person (unrelated to the previous dialog).
Illustratively, after determining a new dialog intention of a new dialog statement, it is determined whether the new dialog intention belongs to the dialog intention space as a basis for whether to respond to the new dialog intention.
S30, determining reply content corresponding to the new dialog intent at least when the new dialog intent belongs to the dialog intent space.
Illustratively, when it is determined that the new dialog intention belongs to the previously determined dialog intention space, this indicates that the new dialog statement is associated with the completed dialog. The detected new dialog statement can therefore be judged not to be noise, and a response is made to it, thereby avoiding false interruption of the man-machine dialog by noise.
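By way of illustration only, steps S10 to S30 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the names `IntentSpace`, `build_intent_space` and `handle_new_statement`, the layout of a completed turn, and the `classify` and `reply_for` callables are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class IntentSpace:
    """Dialog intention space determined from the completed dialog (S10)."""
    intents: set[str] = field(default_factory=set)

    def contains(self, intent: str) -> bool:
        return intent in self.intents


def build_intent_space(completed_turns: list[dict]) -> IntentSpace:
    # Assumption: each completed turn already carries the user intents and
    # machine intents that an upstream understanding component assigned to it.
    space = IntentSpace()
    for turn in completed_turns:
        space.intents.update(turn.get("user_intents", []))
        space.intents.update(turn.get("machine_intents", []))
    return space


def handle_new_statement(
    space: IntentSpace,
    statement: str,
    classify: Callable[[str], str],
    reply_for: Callable[[str], str],
) -> Optional[str]:
    new_intent = classify(statement)   # S20: intent of the new statement
    if space.contains(new_intent):     # S30: respond only inside the space
        return reply_for(new_intent)
    return None                        # outside the space: treated as noise
```

Returning `None` for an out-of-space intent corresponds to leaving the ongoing dialog undisturbed.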
In some embodiments, the dialog intent space includes a current dialog intent and an associated dialog intent. Fig. 2 is a flow chart of another embodiment of the man-machine conversation interruption method of the present invention, in which the determining of the conversation intention space according to the completed conversation includes:
S11, determining the current dialog intention according to the completed dialog;
S12, determining the associated dialog intention associated with the current dialog intention.
In the present embodiment, the dialog intention space includes not only the current dialog intention but also the associated dialog intention associated with it. Consequently, a new dialog statement is not directly regarded as noise merely because its new dialog intention does not match the current dialog intention. If the new dialog intention matches an associated dialog intention, the new dialog statement can likewise be determined to be non-noise. This avoids the problem of the target user receiving no response from the electronic device when the new dialog intention of the target user's statement does not exactly match the current dialog intention.
The above current dialog intention and associated dialog intention may be understood as expected intentions that fit the current dialog, i.e., the space of intentions the user may express at the current dialog node, for example:
a. The current dialog intention may be the intention of the current node: for example, in the process of registering an express delivery for sending, the robot asks the user for his or her address, and the expected intention of the current node is the address.
b. The associated dialog intention may be a possible intention of the current scene: at this node the user may also express intentions such as "I don't want to register", "I want to complain", "let me ask about the express delivery fee first", etc.
In some embodiments, the dialog intention space also includes dialog intentions for switching dialog tasks. Illustratively, a dialog intention for switching dialog tasks may be a cross-scene intention or an encyclopedia-knowledge query that triggers another dialog task, such as "how much tomorrow is", "how many kilometers is it from Beijing to Shanghai", and so on.
The above dialog intentions (the current dialog intention and/or the associated dialog intention and/or the dialog intention for switching dialog tasks) constitute the expected intentions of the current dialog context. If the new dialog intention corresponding to the detected new dialog statement is not within the expected intention range, then even if signals such as voice indicate that a person is speaking, intelligent interruption does not interrupt the current machine broadcast, thereby effectively avoiding false interruption.
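As a hedged sketch of how the three kinds of expected intention might be combined, the following Python fragment assembles an intention space for the express-registration example. The intent labels, the `ASSOCIATED` and `SWITCH_TASK` tables, and both function names are assumptions made for illustration; the embodiment does not prescribe how these sets are obtained.

```python
# Hypothetical table: intents associated with the intent of each dialog node.
ASSOCIATED = {
    "provide_address": {"cancel_registration", "complain", "ask_delivery_fee"},
}

# Hypothetical cross-scene intents that switch to another dialog task.
SWITCH_TASK = {"ask_distance", "encyclopedia_query"}


def expected_intents(current_intent: str, include_switch: bool = True) -> set:
    """Union of the current, associated and (optionally) task-switching intents."""
    space = {current_intent} | ASSOCIATED.get(current_intent, set())
    if include_switch:
        space |= SWITCH_TASK
    return space


def should_interrupt(new_intent: str, current_intent: str) -> bool:
    # The machine broadcast is interrupted only for intents inside the space,
    # even when the audio signal alone suggests that someone is speaking.
    return new_intent in expected_intents(current_intent)
```

For instance, at the node whose expected intention is the address, `should_interrupt("ask_delivery_fee", "provide_address")` is true, while an unrelated intent leaves the broadcast running.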
In some embodiments, a completed dialog may include: the preceding answers, the current question, the current scene, the domain, etc. In addition to the completed dialog, the factors considered in determining the dialog intention space may include speech signal features, speech noise features, speech confidence features, real-time user image features, features of the user's manner of expression in the user profile, and the like.
In some embodiments, before the new dialog statement is detected, the method further includes: broadcasting the broadcast content corresponding to the current dialog intention. Illustratively, during the man-machine dialog, full-duplex communication can be realized between the electronic device and the user, so that the electronic device can broadcast the determined broadcast content while simultaneously detecting whether a new user statement exists.
In some embodiments, step S30, determining reply content corresponding to the new dialog intent at least when the new dialog intent belongs to the dialog intent space, includes: interrupting the announcement and determining reply content corresponding to the new dialog intent at least when the new dialog intent belongs to the dialog intent space.
For example, determining reply content corresponding to the new dialog intent after interrupting the announcement of the electronic device may include: judging whether the new dialog intention is complete, and if so, determining reply content corresponding to the new dialog intention; if not, it is continuously detected whether the user is speaking continuously, if so, the previous new dialogue intention is updated according to the continuously detected user sentence, and the reply content is determined.
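The completeness check described above can be sketched as follows, purely for illustration. It assumes, hypothetically, that an intent is represented as a dictionary with a `slots` mapping, that it counts as complete once every slot is filled, and that a `parse` callable re-derives the intent from everything heard so far; none of these choices comes from the embodiment.

```python
from typing import Callable, Iterable, Optional


def is_complete(intent: dict) -> bool:
    # Assumption: an intent is complete once none of its slots is empty.
    return all(value is not None for value in intent["slots"].values())


def resolve_intent(
    fragments: Iterable[str],
    parse: Callable[[str], dict],
) -> Optional[dict]:
    """Keep listening and updating the intent until it is complete."""
    heard = []
    for fragment in fragments:            # user speech arriving piece by piece
        heard.append(fragment)
        intent = parse(" ".join(heard))   # update the earlier, partial intent
        if is_complete(intent):
            return intent                 # reply content can now be determined
    return None                           # user stopped before completing it
```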
Fig. 3 is a waveform diagram of the man-machine conversation interruption method of the present invention. In this embodiment, a waveform of the machine broadcast audio and a waveform of the ambient audio and the user's speech audio are shown. The user statement is "I want to look up that express delivery from yesterday". Before the user speaks, the electronic device detects only noise while broadcasting. When the user speaks, the electronic device determines, from the dialog statement detected in real time, that by the time the user has said "I want to look up" the corresponding dialog intention conforms to the expected intention; however, it further determines that the dialog intention is incomplete, so detection continues until a complete dialog intention is obtained, and the response content is then determined according to the complete dialog intention.
In some embodiments, before the new dialog statement is detected, the method further includes: broadcasting broadcast content corresponding to the current dialog intention. The man-machine conversation interruption method of this embodiment further includes: determining whether the broadcast content is mandatory broadcast content.
Fig. 4 is a flow chart of another embodiment of the man-machine conversation interruption method of the present invention, in which determining the reply content corresponding to the new dialog intention at least when the new dialog intention belongs to the dialog intention space includes:
S31, when the broadcast content is non-mandatory broadcast content and the new dialog intention belongs to the dialog intention space, interrupting the broadcast of the non-mandatory broadcast content and determining reply content corresponding to the new dialog intention;
S32, when the broadcast content is mandatory broadcast content and the new dialog intention belongs to the dialog intention space, determining reply content corresponding to the new dialog intention after the broadcast of the mandatory broadcast content is completed.
Illustratively, the broadcast content may be set to an attribute according to the service requirement, for example, the broadcast content may be set to be broadcast forcibly or broadcast non-forcibly. For example, some broadcast contents which are helpful to improve the communication efficiency can be set as forced broadcast contents, and interruption is not allowed in the broadcast process; some irrelevant additional introduction contents can be set to be non-compulsory broadcast contents, and the broadcast process can be interrupted at any time.
In this embodiment, when the detected new dialog statement is determined to express an intention related to the current node, but the current node is broadcasting a necessary service notification, intelligent interruption may leave the broadcast uninterrupted while recording the intention expressed by the user. After the broadcast is completed, the unprocessed intention is sent to dialog management for processing, so that the response to the user's intention is delayed rather than lost.
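A minimal sketch of the branching in steps S31 and S32, including the deferred handling described above, might look like this. The classes `Broadcast` and `InterruptController`, the `forced` attribute, and the string return values are hypothetical names chosen for the example, not terms of the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Broadcast:
    text: str
    forced: bool = False   # attribute set according to service requirements


@dataclass
class InterruptController:
    intent_space: set
    deferred: list = field(default_factory=list)

    def on_new_intent(self, intent: str, broadcast: Broadcast) -> str:
        if intent not in self.intent_space:
            return "ignore"               # outside the space: keep broadcasting
        if broadcast.forced:
            self.deferred.append(intent)  # S32: record it, answer afterwards
            return "defer"
        return "interrupt"                # S31: stop broadcasting and reply

    def on_broadcast_finished(self) -> list:
        # Hand the unprocessed intents over to dialog management.
        pending, self.deferred = self.deferred, []
        return pending
```

Keeping the deferred intents in a list preserves the order in which the user expressed them during the forced broadcast.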
As shown in fig. 5, which is a flowchart of another embodiment of the man-machine conversation interruption method of the present invention, the embodiment further includes:
S41, when the new dialog intention is determined not to belong to the dialog intention space, querying whether the new dialog intention has appeared before;
S42, if not, recording the new dialog intention;
S43, if yes, determining reply content corresponding to the new dialog intention.
Another key module of intelligent interruption is an interruption fault-tolerance mechanism, which is implemented through the interruption context. For example, in practical applications, when an intention space is constructed, it is often not possible to include every intention that should be included (e.g., dialog intention A is missing), so the intention space is incomplete. Thus, when dialog intention A is determined from the detected user statement, the electronic device determines that dialog intention A does not belong to the intention space, i.e., is not within the expected range, and does not interrupt; the user may then further emphasize or repeat his or her question. The method of this embodiment is designed around exactly this user habit: states such as the node and time of each historical interruption are recorded, and, based on the preceding context and the semantics of the current utterance, intelligent interruption determines that the broadcast should be interrupted and a response made to the user when the current intention, although not within the expected intention range, already exists in the record.
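Steps S41 to S43 can be illustrated with the following sketch, which is only one possible reading. The class `InterruptHistory` and its record layout (dialog node plus timestamp per out-of-space intent) are assumptions; the embodiment states only that states such as the node and time of each interruption are recorded.

```python
import time


class InterruptHistory:
    """Records out-of-space intents so that a repeated one gets a response."""

    def __init__(self):
        # intent -> list of (node, timestamp) pairs; hypothetical record layout
        self.records = {}

    def should_respond(self, intent, intent_space, node, now=None):
        if intent in intent_space:
            return True                          # ordinary in-space case
        now = time.time() if now is None else now
        seen_before = intent in self.records     # S41: has it appeared before?
        self.records.setdefault(intent, []).append((node, now))
        # S42: first occurrence is only recorded; S43: a repeat is answered.
        return seen_before
```

The first time a user expresses a missing intention A, the call returns `False` and the broadcast continues; when the user repeats it, the call returns `True`.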
In some embodiments, before the new dialog statement is detected, the method further includes: broadcasting broadcast content corresponding to the current dialog intention. The man-machine conversation interruption method further includes: before interrupting the broadcast, judging whether the number of interruptions within a preset time exceeds a preset number; if yes, continuing the broadcast to completion; if not, interrupting the broadcast.
The man-machine conversation interruption method of this embodiment prevents the broadcast of the electronic device from being interrupted too frequently, which would disrupt a normal, smooth man-machine dialog. This embodiment addresses another situation in which a fault-tolerance mechanism is needed: if the same node is interrupted frequently, the dialog is difficult to continue. For example, if the node was interrupted just 1 second ago and another interruption is about to be triggered, then even if all features indicate that an interruption is needed, the broadcast is not interrupted for the moment but continues; whether to interrupt is evaluated comprehensively the next time an interruption is judged to be needed.
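One way to sketch the frequency check is a sliding-window counter, shown below as an assumption-laden example rather than the embodiment's method. The class `InterruptLimiter` and its parameters `max_interrupts` and `window_seconds` are hypothetical, and the sketch interprets "exceeds the preset number" as: allowing one more interruption would bring the count within the window above the limit.

```python
from collections import deque


class InterruptLimiter:
    """Allows at most max_interrupts interruptions per window_seconds."""

    def __init__(self, max_interrupts: int, window_seconds: float):
        self.max_interrupts = max_interrupts
        self.window_seconds = window_seconds
        self.times = deque()   # timestamps of interruptions that were allowed

    def allow(self, now: float) -> bool:
        # Forget interruptions that have fallen out of the preset time window.
        while self.times and now - self.times[0] > self.window_seconds:
            self.times.popleft()
        if len(self.times) >= self.max_interrupts:
            return False       # too frequent: continue the broadcast
        self.times.append(now)
        return True            # interrupt the broadcast
```

With `InterruptLimiter(max_interrupts=1, window_seconds=5.0)`, an interruption requested 1 second after a previous one is refused, matching the 1-second example above.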
In the embodiment of the man-machine conversation interruption method, the robot monitors what the user says while it is itself speaking and judges whether the user is speaking to the robot. If the user is speaking to the robot (i.e., the speech is not noise), the robot stops broadcasting, listens until the user has finished, and then responds according to the semantics of what the user said.
In the embodiment of the man-machine conversation interruption method, when the user finishes speaking, the robot can determine this promptly; it does not need to wait for a period of silence in the voice signal to confirm that the user has finished before it starts speaking.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In some embodiments, the present invention provides a computer-readable storage medium, in which one or more programs including execution instructions are stored, and the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any of the above man-machine conversation interruption methods of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the human-machine conversation interruption methods described above.
In some embodiments, the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the man-machine conversation interruption method.
Fig. 6 is a schematic hardware structure diagram of an electronic device for performing a man-machine conversation interruption method according to another embodiment of the present application, where as shown in fig. 6, the device includes:
one or more processors 610 and a memory 620, with one processor 610 being taken as an example in fig. 6.
The apparatus for performing the man-machine conversation interruption method may further include: an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, with a bus connection taken as the example in fig. 6.
The memory 620, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the man-machine conversation interruption method in the embodiments of the present application. The processor 610 executes various functional applications of the server and data processing by running the non-volatile software programs, instructions and modules stored in the memory 620, that is, implements the man-machine conversation interruption method of the above-described method embodiments.
The memory 620 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the man-machine conversation interruption device, and the like. Further, the memory 620 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, which may be connected to the man-machine conversation interruption device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 630 may receive entered numeric or character information and generate signals related to user settings and function control of the man-machine conversation interruption device. The output device 640 may include a display device such as a display screen.
The one or more modules are stored in the memory 620 and, when executed by the one or more processors 610, perform the man-machine conversation interruption method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication capabilities and primarily aim to provide voice and data communication. Such terminals include smartphones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices, which belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPods), handheld game consoles, e-book readers, as well as smart toys and portable car navigation devices.
(4) Other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions, in essence or in the part that contributes to the related art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.