Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example one:
Fig. 1 is a schematic flowchart of a human-computer interaction method according to the first embodiment of the present invention. For a clear description of the man-machine interaction method provided by the first embodiment of the present invention, please refer to Fig. 1.
The man-machine interaction method provided by the embodiment of the invention comprises the following steps:
S101, receiving a voice signal.
In an embodiment, before receiving a voice signal, the device/apparatus applying the man-machine interaction method provided in this embodiment is in a mute detection state, in which its power consumption is extremely low, so that it can sustain long periods of operation.
In one embodiment, step S101 may further include: when the volume of the received voice signal reaches a certain threshold, proceeding to step S102.
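To make the volume gating concrete, the following is a minimal sketch of the transition from the mute detection state (S101) to feature detection (S102); it assumes little-endian 16-bit mono PCM frames, and the frame size and RMS threshold are illustrative values, not taken from the specification:

```python
import math
import struct

FRAME_BYTES = 2048        # illustrative frame size: little-endian 16-bit mono PCM
VOLUME_THRESHOLD = 500.0  # illustrative RMS level for leaving the mute detection state

def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one 16-bit PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

def volume_reaches_threshold(frame: bytes) -> bool:
    """True when the received signal is loud enough to proceed to step S102."""
    return frame_rms(frame) >= VOLUME_THRESHOLD
```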
S102, detecting the characteristics of the voice signal source.
Specifically, the voice signal source includes, but is not limited to, a user who utters a voice signal. The characteristics of the source of the voice signal may include the face orientation of the user originating the voice signal or the relative orientation of the user originating the voice signal and the controlled device.
In an embodiment, the detection of the face orientation of the user who sends out the voice signal may be performed by the control apparatus or the controlled device through an image capturing apparatus, wherein the image capturing apparatus may be, but is not limited to being, integrated in the control apparatus or the controlled device.
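One practical way to approximate the face-orientation check is a frontal-face detector, since a frontal cascade only fires when the face is turned roughly toward the camera. Below is a sketch using OpenCV's bundled Haar cascade; the camera index and detector parameters are illustrative choices, not values from the specification:

```python
import cv2

# Haar cascade bundled with OpenCV; it detects *frontal* faces only, so a hit
# implies the speaker's face is turned roughly toward the capturing device.
_frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_toward_device(camera_index: int = 0) -> bool:
    """Grab one frame and report whether a frontal face is visible."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return False
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = _frontal.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0
```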
In one embodiment, the detection of the relative position of the user sending the voice signal and the controlled device may be performed by the control device or the controlled device through an image acquisition device and/or a sound source positioning device, wherein the image acquisition device and/or the sound source positioning device may be integrated in the controlled device or the control device.
In one embodiment, the control device may perform unified control on a plurality of controlled devices, where the controlled devices may be, for example, electronic curtains, televisions, electronic doors, air conditioners, electric lamps, and the like. In other embodiments, the control device may control only one controlled device, and the control device may be integrated in the controlled device.
S103, judging whether the voice signal comprises a wake-up word.
In one embodiment, the wake-up word is a specific word or phrase used to wake up the control device or the controlled device. The wake-up word may be the name of the device or the name of a voice recognition program in the device, such as "Tmall Genie," "Xiao AI," "Voice Assistant," or the like.
In other embodiments, before step S103 (judging whether the voice signal includes the wake-up word), the method may include: acquiring the face of the user and judging whether the face of the user matches a specific face stored in advance. When the face of the user matches the pre-stored specific face, the process proceeds to step S103; when it does not match, the process returns to step S101. The pre-stored specific faces may be acquired and stored in advance by the controlled device or the control apparatus through the image acquirer, and when a plurality of specific faces are stored, one or more titles (e.g., a person's name, a relationship title, etc.) may be set for and associated with each specific face.
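As an illustration of this matching step, the sketch below assumes the acquired face image has already been mapped to an embedding vector by some face-recognition model (not specified here), and compares embeddings by cosine similarity; the threshold and the stored titles are illustrative:

```python
import numpy as np

MATCH_THRESHOLD = 0.8  # illustrative cosine-similarity cutoff

# Pre-stored specific faces gathered in advance by the image acquirer:
# title (person name, relationship title, ...) -> embedding vector.
stored_faces: dict[str, np.ndarray] = {}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_face(face_vec: np.ndarray) -> str | None:
    """Return the title of the best-matching stored face, or None (back to S101)."""
    best_title, best_sim = None, MATCH_THRESHOLD
    for title, vec in stored_faces.items():
        sim = cosine(face_vec, vec)
        if sim >= best_sim:
            best_title, best_sim = title, sim
    return best_title
```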
S104, if the voice signal includes the wake-up word, performing voice instruction recognition on the voice signal to acquire the voice instruction.
In one embodiment, after the step of performing voice instruction recognition on the voice signal, the method includes: entering a man-machine conversation mode according to the voice instruction, outputting corresponding conversation voice, and/or performing corresponding control according to the voice instruction. For example, if the voice instruction is to turn on the television, the control apparatus controls the television to turn on or the television turns on automatically; at this time, the control apparatus or the television may ask "What program do you want to watch?" and jump to the program after the user names the desired one.
In other embodiments, after the face of the user is acquired and matches a specific face stored in advance, steps S103 and S104 are entered, and a man-machine conversation mode is then entered according to the voice instruction. For example, when the acquired face matches a pre-stored specific face, the control apparatus or the controlled device may greet the user by voice, giving the user a more intimate experience; further, a personalized man-machine conversation can be conducted according to the title associated with that specific face, so that different users receive different human-computer interaction experiences.
In one embodiment, while the voice instruction is obtained from the voice signal, corresponding analysis is performed through multiple modalities such as voice recognition, semantic understanding, and image detection and recognition, and a learning model is established, so that a more intelligent and personalized man-machine conversation mode can be realized and user experience improved. For example, after exercising, the user says "So hot, help me turn on the air conditioner"; the device/apparatus performs voice recognition and semantic understanding on the voice signal, analyzes image information of the user, and concludes that the user feels hot because of the exercise rather than because the room is hot, so it can reply with a voice prompt such as "You have just exercised; it is suggested to rest for a while before turning on the air conditioner."
In one embodiment, when the voice signal includes only the wake-up word, the user may be actively prompted to issue a voice command; further, when the voice signal includes only the wake-up word and no voice command is detected within a preset time period, a voice prompt to the user may be configured (e.g., "What would you like to say?").
S105, if the voice signal does not include the wake-up word, judging whether the characteristics of the voice signal source conform to the preset characteristics; if so, executing step S104, and if not, returning to step S101.
Specifically, the preset feature includes that the face of the user faces the front of the controlled device/control apparatus, or that the user is located in front of the controlled device.
In one embodiment, the step of detecting the characteristics of the voice signal source includes: after detecting the face orientation of the user, when the eye features of the user can be detected, detecting the time for which the user's eyes are focused on the controlled device/control apparatus. The preset feature in step S105 may further include that this focus time is greater than a threshold. Thus, when the voice signal does not include the wake-up word but the feature of the voice signal source conforms to the preset feature, step S104 is executed. For example, when it is detected that the face of the user faces the controlled device/control apparatus and that the time for which the user's eyes are focused on it exceeds the threshold, voice instruction recognition is performed on the voice signal to obtain the voice instruction. In this way, even when the voice signal contains no wake-up word, the face orientation and eye-focus state of the user determine whether the voice signal should be recognized, which effectively prevents the user from falsely triggering the controlled device/control apparatus while speaking in natural language (e.g., chatting) and greatly improves the accuracy of the man-machine interaction method.
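Combining S103–S105, the wake decision might be sketched as follows; the wake-word list, the dwell threshold, and the `eyes_on_device` probe stand in for the detectors described above and are all illustrative assumptions:

```python
import time
from typing import Callable

WAKE_WORDS = ("tmall genie", "xiao ai", "voice assistant")  # illustrative
GAZE_DWELL_THRESHOLD = 1.5  # seconds; illustrative focus-time threshold

def contains_wake_word(transcript: str) -> bool:
    """S103: does the (already transcribed) voice signal include a wake-up word?"""
    text = transcript.lower()
    return any(word in text for word in WAKE_WORDS)

def gaze_dwell_seconds(eyes_on_device: Callable[[], bool],
                       window: float = 3.0) -> float:
    """Accumulate how long the user's eyes stay focused on the device."""
    focused = 0.0
    start = last = time.monotonic()
    while (now := time.monotonic()) - start < window:
        if eyes_on_device():
            focused += now - last
        last = now
        time.sleep(0.05)  # poll the eye tracker at ~20 Hz
    return focused

def should_recognize(transcript: str, face_toward: bool,
                     eyes_on_device: Callable[[], bool]) -> bool:
    """S103-S105: recognize on wake word, or on face orientation plus gaze dwell."""
    if contains_wake_word(transcript):  # -> S104
        return True
    return face_toward and gaze_dwell_seconds(eyes_on_device) >= GAZE_DWELL_THRESHOLD
```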
In other embodiments, when the face of the user faces the front of the control apparatus, the control apparatus can determine whether a control object is included in the voice instruction (the voice instruction being acquired from the voice signal), where the control object may include at least one controlled device (i.e., the control apparatus may control at least one controlled device). Further, when the voice instruction does not include a control object, the control apparatus may automatically control the corresponding controlled device according to the control big data. Furthermore, when the voice instruction includes a control object, the control apparatus can automatically control the control object according to its historical control information, making the man-machine interaction method provided by the present invention more intelligent.
In one embodiment, the step of detecting the characteristics of the voice signal source includes: detecting the relative orientation of the user emitting the voice signal and the controlled device. Thus, when the voice signal does not include the wake-up word but the relative orientation of the user and the controlled device conforms to the preset feature, the step of acquiring the voice instruction is entered. The relative orientation conforming to the preset feature may mean, for example, that the user is located in front of the controlled device (e.g., in front of a television), that the distance between the user and the controlled device is less than a threshold (e.g., less than 5 meters from a fan), and the like.
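Once a sound-source localizer or camera reports the user's position in the device's own frame, this relative-orientation check reduces to simple geometry. A sketch under that assumption (the 5 m limit echoes the fan example above; the ±30° "in front" cone is an illustrative choice, not a value from the specification):

```python
import math

MAX_DISTANCE_M = 5.0     # echoes the "less than 5 meters from a fan" example
FRONT_HALF_ANGLE = 30.0  # degrees; illustrative cone for "in front of the device"

def meets_relative_orientation(user_x: float, user_y: float) -> bool:
    """User position in the device's frame, +y pointing straight out of its front."""
    distance = math.hypot(user_x, user_y)
    bearing = abs(math.degrees(math.atan2(user_x, user_y)))  # 0 deg = dead ahead
    return distance < MAX_DISTANCE_M and bearing <= FRONT_HALF_ANGLE
```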
The man-machine interaction method provided by the first embodiment of the present invention detects the characteristics of the voice signal source after receiving the voice signal, and judges whether the voice signal includes a wake-up word. If the voice signal includes the wake-up word, voice instruction recognition is performed on the voice signal to acquire the voice instruction. If the voice signal does not include the wake-up word, the step of performing voice instruction recognition to acquire the voice instruction is entered when the characteristics of the voice signal source conform to the preset characteristics, where the preset characteristics include that the face of the user faces the front of the controlled device/control apparatus or that the user is located in front of the controlled device.
Example two:
Fig. 2 is a flowchart illustrating a human-computer interaction method according to the second embodiment of the present invention. For a clear description of the man-machine interaction method provided by the second embodiment of the present invention, please refer to Fig. 2.
The man-machine interaction method provided by the second embodiment of the present invention is applied to a control device and includes the following steps:
S201, receiving a voice signal.
S202, detecting the characteristics of the voice signal source.
In particular, the characteristics of the voice signal source may include the face orientation of the user who sends the voice signal or the relative orientation of that user and the controlled device, where the voice signal source includes, but is not limited to, the user who sends the voice signal. Specifically, the characteristics of the voice signal source are detected immediately after the voice signal is received, so that the user's characteristics at the moment of speaking are captured in time; this prevents the detected characteristics from becoming inaccurate because the user's posture changes after speaking, and thus helps ensure the accuracy of the subsequent steps.
In an embodiment, the detecting of the face orientation of the user who sends the voice signal may be performed by the control device or the controlled device through an image capturing device (e.g., a camera), for example, when the control device receives the voice signal, the image capturing device of the control device is turned on to detect the face orientation of the user. For another example, when the control device receives the voice signal, the controlled device may be controlled to turn on the image capturing device to detect the face orientation of the user.
In an embodiment, the image capturing device may be built into the control device or the controlled device, or may be an external image capturing device connected to either of them.
In one embodiment, the control apparatus may perform unified management on a plurality of controlled devices, such as electronic curtains, televisions, electronic doors, air conditioners, electric lights, and the like.
S203, judging whether the voice signal comprises a wake-up word.
S204, if the voice signal includes the wake-up word, performing voice instruction recognition on the voice signal to acquire the voice instruction.
S205, if the voice signal does not include the awakening word, determining whether the characteristic of the voice signal source accords with the preset characteristic, if so, executing the step S204, and if not, returning to the step S201.
Specifically, the preset feature includes that the face of the user faces the front of the controlled device/control device, or that the user is located in front of the controlled device.
In an embodiment, the preset feature may be that the face of the user faces the front of the control device, and when the face of the user faces the front of the control device, step S206 is executed.
S206, judging whether the voice instruction includes a control object.
Specifically, step S206 follows the step of performing voice instruction recognition on the voice signal to acquire the voice instruction (step S204).
In one embodiment, the control object includes at least one controlled device (e.g., a television, a smart speaker, an air conditioner, a washing machine, an electric lamp, an electronic curtain, an electronic door, a floor sweeping robot, or other devices).
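A minimal realization of the S206 check is keyword matching of the instruction against the set of controlled devices known to the control device; the device names below are illustrative:

```python
KNOWN_DEVICES = {"television", "air conditioner", "electric lamp",
                 "electronic curtain", "electronic door", "sweeping robot",
                 "washing machine", "smart speaker"}

def find_control_object(instruction: str) -> str | None:
    """S206: return the controlled device named in the instruction, else None (-> S207)."""
    text = instruction.lower()
    for device in KNOWN_DEVICES:
        if device in text:
            return device
    return None
```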
S207, if the voice command does not include a control object, acquiring the household appliance control big data according to the current environment information and/or the current time information, and acquiring, according to the big data, at least one household appliance and the control information corresponding to each household appliance, so as to control the corresponding household appliances according to their respective control information.
In one embodiment, the voice command includes no control object when, for example, the user issues a fuzzy voice command (e.g., "turn on") to the control device. Specifically, after acquiring the fuzzy voice command, the control device may acquire at least one household appliance according to the current environment information and/or the current time information and perform the corresponding control.
In one embodiment, the current environment information includes at least one of indoor temperature information, indoor brightness information, floor cleanliness information, and the number of people in the room, but is not limited to these.
In an embodiment, the control device may obtain, from a cloud server, the household appliance control big data corresponding to the current environment information and/or the current time information. The big data stored in the cloud server may be household appliance control data, associated with environment and/or time information, uploaded by this user through the control device, or uploaded by other users through other control terminals; in other words, it may be the user's own control data stored in the cloud or control data commonly used by other users.
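A sketch of the cloud lookup is given below; the endpoint URL and the request/response shapes are hypothetical, standing in for whatever interface the cloud server actually exposes:

```python
import requests

CLOUD_ENDPOINT = "https://example.com/api/appliance-control"  # hypothetical URL

def fetch_control_big_data(env: dict, time_iso: str, user_id: str) -> list[dict]:
    """Ask the cloud server for control records matching the current environment
    and time; per the description above, the server may return the user's own
    records or records commonly used by other users."""
    resp = requests.get(
        CLOUD_ENDPOINT,
        params={"user": user_id, "time": time_iso, **env},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. [{"device": "air conditioner", "action": "..."}]
```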
Specifically, the control device may obtain the household appliance control big data corresponding to the current environment information. For example, if the illumination intensity is less than 50 lux (i.e., the room is dark), the indoor temperature is higher than 35 °C, or there is rubbish on the floor, the control device obtains the user's household appliance control big data from the cloud server according to the current environment information, and accordingly turns on the electric lamp or opens the curtain, turns on the air conditioner and sets its temperature, or starts the sweeping robot.
In addition, the control device can obtain the household appliance control big data corresponding to the current time information. For example, the user's control big data for the period from 5 am to 9 am may be to open the curtains, turn on a music player, or start the water dispenser heating; the control big data for the period from 6 pm to 8 pm may be to turn on the television or the computer, turn on the lights, and the like.
In addition, the control device may further obtain the household appliance control big data corresponding to both the current environment information and the current time information. For example, if the indoor temperature is higher than 35 °C and the time is between 7:00 pm and 7:30 pm, the control device may, according to the big data obtained for this environment and time, turn on the air conditioner and set its temperature, turn on the electric lamp, or turn on the television to broadcast a program (for example, the 7 pm news broadcast on China Central Television).
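Consolidating the three examples above, the selection step might be sketched as a rule table keyed on environment and time; in a real system these rules would come from the household appliance control big data rather than being hard-coded:

```python
import datetime

def select_appliances(env: dict, now: datetime.datetime) -> list[tuple[str, str]]:
    """S207 sketch: map current environment/time to (appliance, action) pairs."""
    actions: list[tuple[str, str]] = []
    if env.get("illuminance_lux", 1000) < 50:   # the room is dark
        actions.append(("electric lamp", "turn on"))
        actions.append(("electronic curtain", "open"))
    if env.get("indoor_temp_c", 20) > 35:       # the room is hot
        actions.append(("air conditioner", "turn on and set temperature"))
    if env.get("floor_has_rubbish", False):
        actions.append(("sweeping robot", "start cleaning"))
    if 5 <= now.hour < 9:                        # morning routine
        actions.append(("water dispenser", "start heating"))
    if 18 <= now.hour < 20:                      # evening routine
        actions.append(("television", "turn on"))
    return actions
```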
In another embodiment, the control device obtains, according to the current environment information and/or the current time information, the user's corresponding household appliance control big data stored locally in the control device.
In one embodiment, the obtained household appliance control big data includes at least one household appliance to be controlled and the control information corresponding to each such appliance. Therefore, when a voice instruction is obtained that does not include a control object, the man-machine interaction method provided by this embodiment can intelligently select at least one household appliance for corresponding control according to the current environment information and/or the current time, which not only greatly improves the accuracy of human-computer interaction but also makes the method more intelligent.
S208, if the voice command includes the control object, correspondingly controlling the control object according to the voice command.
In one embodiment, when the voice command includes only the control object but not the control information for that object, the face of the user is detected and historical control information of the control object corresponding to that face is acquired, so that the control object is controlled according to the historical control information. For example, when the voice command is to turn on the television without specifying channel or program information, the control device may acquire, according to the user's face, the program and viewing-progress information from the user's last viewing session (i.e., the historical control information), and after turning on the television control it to resume the corresponding multimedia content.
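The resume-playback behavior can be sketched as a per-face history lookup; the identifiers and record shapes below are illustrative:

```python
# Per-user history: face identifier -> device -> last control state, recorded
# whenever an interaction ends so playback can resume where it stopped.
history: dict[str, dict[str, dict]] = {
    "face_001": {"television": {"program": "documentary channel",
                                "position_s": 1320}},
}

def historical_control_info(face_id: str, device: str) -> dict:
    """Return the stored state for this user and device, or an empty default."""
    return history.get(face_id, {}).get(device, {})

# e.g. turning on the TV for a recognized face resumes the last program:
state = historical_control_info("face_001", "television")
print(state)  # {'program': 'documentary channel', 'position_s': 1320}
```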
In an embodiment, the face of the user may be detected at the same time as the features of the voice signal source are detected in step S202.
The man-machine interaction method provided by the second embodiment of the present invention includes the following steps. A voice signal is received. The characteristics of the voice signal source are detected. Whether the voice signal includes a wake-up word is judged. If the voice signal includes the wake-up word, voice instruction recognition is performed on the voice signal to acquire the voice instruction. If the voice signal does not include the wake-up word, whether the characteristics of the voice signal source conform to the preset characteristics is judged; if so, the voice instruction is acquired, and if not, the method returns to the step of receiving a voice signal. After the voice instruction is acquired, whether it includes a control object is judged. If the voice instruction does not include a control object, the household appliance control big data is acquired according to the current environment information and/or the current time information, and at least one household appliance and the control information corresponding to each appliance are obtained from the big data, so that the corresponding appliances are controlled according to their respective control information. If the voice instruction includes a control object, the control object is controlled according to the voice instruction. Therefore, when the voice signal does not include the wake-up word, the characteristics of the user who sends the voice signal are detected, and the step of acquiring the voice instruction is entered only when these characteristics conform to the preset characteristics, which improves the accuracy of human-computer interaction. Moreover, after the voice instruction is acquired, by judging whether it includes a control object, at least one control object can be obtained according to the current environment information and/or the current time information and controlled accordingly when no control object is specified, which greatly improves the intelligence of the man-machine interaction method provided by this embodiment.
Example three:
Fig. 3 is a schematic structural diagram of a control apparatus according to the third embodiment of the present invention. For a clear description of the control apparatus 1 provided by the third embodiment of the present invention, please refer to Fig. 3.
Referring to Fig. 3, the control apparatus 1 provided by the third embodiment of the present invention includes: a voice signal receiving module 101, a feature detection module 102, a wake-up recognition module 103, and a voice instruction acquisition module 104.
Specifically, the voice signal receiving module 101 is configured to receive a voice signal.
In an embodiment, before the voice signal receiving module 101 receives the voice signal, the voice signal receiving module 101 is in a mute detection state, in which the power consumption of the control apparatus 1 is very low, so that the control apparatus 1 can sustain long periods of operation.
Specifically, the feature detection module 102 is connected to the voice signal receiving module 101 and is configured to detect the features of the voice signal source, where the features of the voice signal source include the face orientation of the user who sends the voice signal or the relative orientation of the user and the controlled device.
In one embodiment, the feature detection module 102 includes an image acquisition device. In other embodiments, the feature detection module 102 may include an image acquisition device and/or a sound source localization device. The image acquisition device can collect image information of the voice signal source so that its features can be recognized, and the sound source localization device can judge the direction of the voice signal source from the received voice signal.
Specifically, the wake-up recognition module 103 is connected to the voice signal receiving module 101 and is configured to judge whether the voice signal includes a wake-up word.
Specifically, the voice instruction acquisition module 104 is configured to perform voice instruction recognition on the voice signal to obtain a voice instruction when the voice signal includes a wake-up word, and, when the voice signal does not include the wake-up word, to perform voice instruction recognition on the voice signal to obtain the voice instruction only if the features of the voice signal source conform to the preset features, where the preset features include that the face of the user faces the front of the controlled device/control apparatus 1, or that the user is located in front of the controlled device.
In an embodiment, the voice instruction acquisition module 104 is further configured to determine whether the voice instruction includes a control object. If the voice instruction does not include a control object, the household appliance control big data is acquired according to the current environment information and/or the current time information, and at least one household appliance and the control information corresponding to each appliance are obtained from the big data, so that the corresponding appliances are controlled according to their respective control information. If the voice instruction includes a control object, the control object is controlled according to the voice instruction.
In one embodiment, when the voice instruction includes only the control object but not the control information for that object, the voice instruction acquisition module 104 detects the face of the user and acquires historical control information of the control object corresponding to that face, so as to control the control object according to the historical control information.
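The cooperation of modules 101–104 might be sketched as follows; the class, method names, and stub bodies are illustrative, not an implementation prescribed by the specification:

```python
class ControlApparatus:
    """Sketch of control apparatus 1 with modules 101-104 as methods."""

    def receive_voice_signal(self) -> bytes:            # module 101
        raise NotImplementedError("microphone front end")

    def detect_source_features(self) -> dict:           # module 102
        # image acquisition and/or sound source localization
        return {"face_toward_device": False, "user_in_front": False}

    def has_wake_word(self, signal: bytes) -> bool:     # module 103
        raise NotImplementedError("wake-word detector")

    def recognize(self, signal: bytes) -> str:
        raise NotImplementedError("voice instruction recognizer")

    def acquire_instruction(self, signal: bytes) -> str | None:  # module 104
        """Recognize on wake word, or when source features meet the preset ones."""
        features = self.detect_source_features()
        if (self.has_wake_word(signal)
                or features["face_toward_device"]
                or features["user_in_front"]):
            return self.recognize(signal)
        return None  # back to receiving the next voice signal
```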
In the control apparatus 1 provided by the third embodiment of the present invention, the voice signal receiving module 101 is configured to receive a voice signal. The feature detection module 102 is connected to the voice signal receiving module 101 and is configured to detect the features of the voice signal source, which include the face orientation of the user who sends the voice signal or the relative orientation of that user and the controlled device. The wake-up recognition module 103 is connected to the voice signal receiving module 101 and is configured to judge whether the voice signal includes a wake-up word. The voice instruction acquisition module 104 is configured to perform voice instruction recognition on the voice signal to acquire a voice instruction when the voice signal includes the wake-up word, and, when the voice signal does not include the wake-up word, to do so only if the features of the voice signal source conform to the preset features, where the preset features include that the face of the user faces the front of the controlled device/control apparatus 1 or that the user is located in front of the controlled device. Therefore, with the control apparatus 1 provided by this embodiment, when a received voice signal does not include the wake-up word, whether the voice signal should undergo voice instruction recognition can be determined from the face orientation of the user or the relative orientation of the user and the controlled device, and the controlled device is then controlled according to the voice instruction. This effectively prevents the controlled device or the control apparatus 1 from being falsely triggered while the user speaks in natural language (e.g., chatting), and greatly improves the accuracy of the control apparatus 1 during human-computer interaction.
Example four:
Fig. 4 is a schematic structural diagram of a controlled device according to the fourth embodiment of the present invention. For a clear description of the controlled device 2 provided in the fourth embodiment of the present invention, please refer to Fig. 4.
Referring to Fig. 4, the controlled device 2 includes a control apparatus provided by the present invention (for example, the control apparatus 1 provided by the third embodiment of the present invention). Specifically, the control apparatus 1 can implement the human-computer interaction method provided by the present invention (for example, the human-computer interaction method provided by the first embodiment and/or the second embodiment).
Therefore, in the process of human-computer interaction, when a received voice signal does not include the wake-up word, the controlled device 2 provided in this embodiment can determine from the face orientation of the user or the relative orientation of the user and the controlled device 2 whether to perform voice instruction recognition on the voice signal, and then control itself accordingly. This effectively prevents the controlled device 2 from being falsely triggered while the user speaks in natural language (e.g., chatting), and greatly improves the accuracy of the controlled device 2 during human-computer interaction.
Example five:
Fig. 5 is a schematic structural diagram of a control device according to the fifth embodiment of the present invention. For a clear description of the control device provided in the fifth embodiment of the present invention, please refer to Fig. 5.
The control device provided by the fifth embodiment of the present invention includes a processor A101, where the processor A101 is configured to execute the computer program A6 stored in the memory A201 to implement the steps of the human-computer interaction method described in the first embodiment or the second embodiment.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, apparatus, or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining software and hardware, which may be referred to herein as a "circuit," a "module," or a "system."
In an embodiment, the control device provided in this embodiment may include at least one processor A101 and at least one memory A201, where the at least one processor A101 may be referred to as a processing unit A1 and the at least one memory A201 as a storage unit A2. Specifically, the storage unit A2 stores a computer program A6 which, when executed by the processing unit A1, causes the control device provided by this embodiment to implement the steps of the human-computer interaction method described above, such as step S206 shown in Fig. 2 (judging whether the voice instruction includes a control object), or step S105 shown in Fig. 1 (judging whether the characteristics of the voice signal source conform to the preset characteristics).
In one embodiment, the control device further includes a bus connecting its components (e.g., the processor A101 and the memory A201). In one embodiment, the bus may represent one or more of several types of bus structures, including a memory bus or memory-controller bus, a peripheral bus, and the like.
Referring to Fig. 5, in an embodiment, the control device provided in this embodiment includes a plurality of memories A201 (collectively referred to as the storage unit A2), and the storage unit A2 may include, for example, a random access memory (RAM), a cache memory, and/or a read-only memory (ROM).
Referring to Fig. 5, in an embodiment, the control device in this embodiment may further include a communication interface (e.g., an I/O interface A4), which may be used to communicate with an external device (e.g., a computer, a smart terminal, etc.).
Referring to Fig. 5, in an embodiment, the control device in this embodiment may further include a display device and/or an input device (e.g., the illustrated touch display screen A3).
Referring to Fig. 5, in an embodiment, the control device provided in this embodiment may further include a network adapter A5, where the network adapter A5 may be used to communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network). As shown in Fig. 5, the network adapter A5 may communicate with the other components of the control device through wires.
The control device provided in this embodiment can implement the steps of the human-computer interaction method provided in the present invention, and for specific implementation and beneficial effects, reference may be made to the first embodiment and the second embodiment of the present invention, which will not be described herein again.
Example six:
In an embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, can implement the steps of the human-computer interaction method of, for example, the first embodiment or the second embodiment. Alternatively, the computer program, when executed by the processor, can realize the functions of the control apparatus and the controlled device.
In this embodiment, the computer program in the computer-readable storage medium, when executed by a processor, implements the steps of the human-computer interaction method or the functions of the control apparatus or the controlled device; for specific implementations and beneficial effects, reference may be made to embodiments one to four of the present invention, which are not repeated here.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.