Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of artificial intelligence technology, the performance of natural speech processing technology has improved greatly. Speech recognition is increasingly used in intelligent voice output devices such as smart speakers, smartphones, tablets, and Internet of Things (IoT) devices. Applying natural speech processing to human-computer interaction has become the path of choice for a growing number of intelligent voice output devices, and natural voice interaction is becoming the new mode of human-computer interaction after the touch screen.
Summary
Embodiments of the present disclosure provide a voice interaction method, a voice interaction apparatus, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a voice interaction method.
Specifically, the voice interaction method includes:
outputting preset voice information in response to a preset event that activates voice interaction;
acquiring feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information; and
outputting voice interaction information when the feedback information of the target object satisfies a preset condition.
Optionally, outputting the preset voice information in response to the preset event that activates voice interaction includes at least one of the following:
outputting the preset voice information in response to a preset time being reached;
outputting the preset voice information in response to preset information being received;
outputting the preset voice information in response to sensing that the target object is within a voice interaction range.
Optionally, outputting the preset voice information in response to sensing that the target object is within the voice interaction range includes:
acquiring first image data within the voice interaction range; and
outputting the preset voice information when the target object is recognized from the first image data.
Optionally, acquiring the feedback information of the target object of the voice interaction on the preset voice information includes:
acquiring second image data after the preset voice information is output; and
determining, according to the second image data, whether the target object has received the preset voice information.
Optionally, determining, according to the second image data, whether the target object has received the preset voice information includes:
determining that the target object has received the preset voice information when it is determined according to the second image data that the target object is within the voice interaction range; or
determining that the target object has received the preset voice information when the orientation indicated by the facial information of the target object in the second image data and the orientation of the output device of the preset voice information agree within a first preset error range.
Optionally, acquiring the feedback information of the target object of the voice interaction on the preset voice information further includes:
acquiring second image data after the preset voice information is output; and
determining whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information was output.
Optionally, determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information was output includes:
recognizing a first face of the target object in the first image data and a second face of the target object in the second image data; and
determining whether the target object has received the preset voice information by comparing facial information of the first face and the second face.
Optionally, acquiring the feedback information of the target object of the voice interaction on the preset voice information further includes:
determining whether location information indicating that the target object is within the voice interaction range has been received.
Optionally, the voice interaction method further includes:
resending the preset voice information when the feedback information of the target object does not satisfy the preset condition.
Optionally, resending the preset voice information includes:
delaying the sending of the preset voice information when it is determined that the target object is not within the voice interaction range; or
sending the preset voice information at an increased volume when it is determined that the target object is within the voice interaction range.
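The two resend options above can be sketched as a small decision function. This is only an illustrative sketch: the name `plan_resend`, the `ResendPlan` structure, the 60-second delay, and the 0.2 volume step are assumptions made for the example and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ResendPlan:
    """How to resend the preset voice information (hypothetical structure)."""
    delay_s: float   # seconds to wait before resending
    volume: float    # playback volume in the range 0.0 to 1.0


def plan_resend(target_in_range: bool, volume: float,
                delay_s: float = 60.0, step: float = 0.2) -> ResendPlan:
    """Choose between a delayed resend and a louder resend.

    delay_s and step are assumed example values, not values from the disclosure.
    """
    if not target_in_range:
        # Target is out of range: wait, then resend at the same volume.
        return ResendPlan(delay_s=delay_s, volume=volume)
    # Target is in range but did not react: resend immediately, louder,
    # capping the volume at the maximum of 1.0.
    return ResendPlan(delay_s=0.0, volume=min(1.0, volume + step))
```

A caller would apply the returned plan to whatever playback mechanism the device uses; that mechanism is outside the scope of this sketch.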
In a second aspect, an embodiment of the present disclosure provides a voice interaction apparatus, including:
a first output module configured to output preset voice information in response to a preset event that activates voice interaction;
a first acquisition module configured to acquire feedback information of a target object of the voice interaction on the preset voice information, the feedback information being non-voice information; and
a second output module configured to output voice interaction information when the feedback information of the target object satisfies a preset condition.
Optionally, the first output module includes at least one of the following:
a first response submodule configured to output the preset voice information in response to a preset time being reached;
a second response submodule configured to output the preset voice information in response to preset information being received;
a third response submodule configured to output the preset voice information in response to sensing that the target object is within a voice interaction range.
Optionally, the first output module includes:
a first acquisition submodule configured to acquire first image data within the voice interaction range; and
a first output submodule configured to output the preset voice information when the target object is recognized from the first image data.
Optionally, the first acquisition module includes:
a second acquisition submodule configured to acquire second image data after the preset voice information is output; and
a first determining submodule configured to determine, according to the second image data, whether the target object has received the preset voice information.
Optionally, the first determining submodule includes:
a second determining submodule configured to determine that the target object has received the preset voice information when it is determined according to the second image data that the target object is within the voice interaction range; or
a third determining submodule configured to determine that the target object has received the preset voice information when the orientation indicated by the facial information of the target object in the second image data and the orientation of the output device of the preset voice information agree within a first preset error range.
Optionally, the first acquisition module further includes:
a third acquisition submodule configured to acquire second image data after the preset voice information is output; and
a fourth determining submodule configured to determine whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information was output.
Optionally, the fourth determining submodule includes:
a recognition submodule configured to recognize a first face of the target object in the first image data and a second face of the target object in the second image data; and
a fifth determining submodule configured to determine whether the target object has received the preset voice information by comparing facial information of the first face and the second face.
Optionally, the first acquisition module further includes:
a sixth determining submodule configured to determine whether location information indicating that the target object is within the voice interaction range has been received.
Optionally, the voice interaction apparatus further includes:
a sending module configured to resend the preset voice information when the feedback information of the target object does not satisfy the preset condition.
Optionally, the sending module includes:
a first sending submodule configured to delay the sending of the preset voice information when it is determined that the target object is not within the voice interaction range; or
a second sending submodule configured to send the preset voice information at an increased volume when it is determined that the target object is within the voice interaction range.
The functions may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the voice interaction apparatus includes a memory and a processor. The memory stores one or more computer instructions that support the voice interaction apparatus in performing the voice interaction method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The voice interaction apparatus may further include a communication interface through which it communicates with other devices or a communication network.
In a third aspect, an embodiment of the present disclosure provides an electronic device including a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method steps of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium for storing computer instructions used by a voice interaction apparatus, including the computer instructions involved in performing the voice interaction method of the first aspect.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
In the embodiments of the present disclosure, a preset event inside the intelligent voice output device triggers the device to actively output preset voice information. After outputting the preset voice information, the device acquires the target object's feedback information on it, and outputs the subsequent voice interaction information when the feedback information satisfies a preset condition. Because the device actively initiates the voice interaction based on a preset event and outputs the subsequent, specific voice interaction information only when the user's feedback satisfies the preset condition, the device can be used in more scenarios, and it outputs voice only once it has determined that the user is ready for voice interaction. This avoids missing important voice information and improves the user experience.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.
Brief Description of the Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of step S101 according to the embodiment shown in FIG. 1;
FIG. 3 shows a flowchart of step S102 according to the embodiment shown in FIG. 1;
FIG. 4 shows another flowchart of step S102 according to the embodiment shown in FIG. 1;
FIG. 5 shows a flowchart of step S402 according to the embodiment shown in FIG. 4;
FIG. 6 shows a structural block diagram of a voice interaction apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device suitable for implementing a voice interaction method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure are described in detail with reference to the accompanying drawings so that those skilled in the art can readily implement them. For clarity, parts unrelated to the description of the exemplary embodiments are omitted from the drawings.
In the present disclosure, terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.
It should also be noted that, where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows a flowchart of a voice interaction method according to an embodiment of the present disclosure. As shown in FIG. 1, the voice interaction method includes the following steps S101-S103:
In step S101, preset voice information is output in response to a preset event that activates voice interaction;
In step S102, feedback information of a target object of the voice interaction on the preset voice information is acquired, the feedback information being non-voice information;
In step S103, voice interaction information is output when the feedback information of the target object satisfies a preset condition.
At present, natural voice interaction generally begins only after the user activates it by voice or with a physical button, after which the subsequent human-computer interaction proceeds. This approach suits most scenarios, but because of processing-capacity and power-consumption constraints, natural voice interaction cannot be activated autonomously. If an intelligent voice output device does activate on its own, it cannot guarantee that the user actually receives the interaction voice, and it may end up in a confused state. For example, the device may repeatedly activate itself and then enter an unknown state because it receives no feedback from the user. Alternatively, the device may activate itself and, although the user did not receive the information, assume by default that the information was received, so that important information is missed.
To address these problems, embodiments of the present disclosure propose the voice interaction method described above. When the intelligent voice output device actively triggers a natural voice interaction, it first sends a probe voice, that is, a piece of preset voice information prepared in advance. The preset voice information may contain identification information of the target user, such as "Mr. Wang". The device then acquires the target object's feedback information on the probe voice, which may be non-voice information, and decides from that feedback which voice information to send next. The device may obtain the feedback information through external sensors. After the probe voice is output there are two possible outcomes. First, the target object received the voice information, so the device should proceed to the subsequent natural voice interaction. Second, the target object did not receive the voice information, so the device should suspend the subsequent natural voice interaction. The device can therefore use external sensor data to judge whether the target object received the voice information and choose how to continue accordingly. For example, the device acquires an image of the target object through an image sensor and determines from the image whether the target object's attention satisfies the preset condition. If, after hearing the probe voice, the target object focuses attention on the device, the target object can be considered ready to interact, and the device may send the subsequent voice interaction information. If the device recognizes that the target object's attention does not satisfy the preset condition, it may take other measures, such as replaying the probe voice or playing it again after a period of time. The target object may be a specific person or thing, or any person or thing.
For example, a smartphone can use internal information such as its calendar or the arrival of an e-mail to decide when to start a natural voice interaction, who the target object is, and what the content is. When a time marked in the calendar arrives, the smartphone initiates a natural voice interaction as a reminder. The target of the interaction is the person marked in the calendar entry or the phone's owner, and the content is the reminder text. The smartphone may first output the identification information of the target object, for example "Hello, Mr. Wang". The smartphone may also combine internal information with external sensors to choose the moment: for example, it initiates the natural voice interaction at the time marked in the calendar, once its front camera captures the user operating the phone. As another example, a smart speaker can use its wireless distance-detection sensors (RFID, Bluetooth, Wi-Fi) or an image sensor to recognize that the user has entered the speaker's interaction range, and then actively initiate a natural voice interaction that delivers a weather reminder. Whatever the approach, and whether or not external sensors are used, the defining feature is that the intelligent voice output device actively initiates the natural voice interaction without the user having initiated it. Here, "actively initiates" means that within one natural voice interaction, the intelligent voice output device is the first to emit a natural voice signal.
With the embodiments of the present disclosure, the intelligent voice output device can actively initiate a natural voice interaction according to its presets, use the target object's feedback information to decide whether that interaction succeeded, and choose how to continue the natural voice interaction.
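The overall flow of steps S101-S103 can be illustrated with a minimal sketch. Everything named here is an assumption made for the example: the `Feedback` structure, the `speak` and `sense` callables standing in for the audio output and the external sensor, the particular preset condition (in range and facing the device), and the `max_attempts` retry limit. The disclosure does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feedback:
    """Non-voice feedback about the target object (hypothetical structure)."""
    in_range: bool        # target object is within the voice interaction range
    facing_device: bool   # target object's face is oriented toward the device


def feedback_satisfied(fb: Feedback) -> bool:
    # Assumed preset condition: the target is in range and attending to the device.
    return fb.in_range and fb.facing_device


def run_interaction(
    speak: Callable[[str], None],
    sense: Callable[[], Feedback],
    probe: str,
    message: str,
    max_attempts: int = 3,   # assumed retry limit, not from the disclosure
) -> bool:
    """Output the probe, read non-voice feedback, then output the message."""
    for _ in range(max_attempts):
        speak(probe)                 # S101: output the preset voice information
        fb = sense()                 # S102: acquire non-voice feedback
        if feedback_satisfied(fb):   # S103: condition met, continue interaction
            speak(message)
            return True
    return False                     # condition never met; message not delivered
```

Passing `speak` and `sense` as callables keeps the sketch independent of any particular audio or camera API; a real device would supply its own implementations.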
In an optional implementation of this embodiment, step S101, that is, the step of outputting preset voice information in response to a preset event that activates voice interaction, further includes at least one of the following:
outputting the preset voice information in response to a preset time being reached;
outputting the preset voice information in response to preset information being received;
outputting the preset voice information in response to sensing that the target object is within the voice interaction range.
In this optional implementation, the preset event may be an event configured in advance in the intelligent voice output device to trigger voice interaction, including at least one of the following:
a preset time is reached, for example a schedule reminder time or an alarm time set by the user;
preset information is received, for example a new e-mail, an important e-mail, or a new message;
the target object is sensed within the voice interaction range.
The preset event may be configured according to the usage scenario and is not limited here.
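The three kinds of preset event listed above can be sketched as a simple check over a snapshot of device state. The `DeviceState` fields, the event names, and the order in which the events are checked are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DeviceState:
    """Snapshot of hypothetical device inputs used to evaluate the triggers."""
    now: datetime
    reminder_time: Optional[datetime] = None   # e.g. schedule reminder or alarm
    new_message: bool = False                  # e.g. a new e-mail has arrived
    target_in_range: bool = False              # target sensed within range


def triggered_event(state: DeviceState) -> Optional[str]:
    """Return the name of the first preset event that fired, or None."""
    if state.reminder_time is not None and state.now >= state.reminder_time:
        return "time_reached"
    if state.new_message:
        return "info_received"
    if state.target_in_range:
        return "target_in_range"
    return None
```

A device could poll this function periodically and output the preset voice information whenever it returns a value other than `None`.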
In an optional implementation of this embodiment, as shown in FIG. 2, step S101, that is, the step of outputting preset voice information in response to a preset event that activates voice interaction, further includes the following steps S201-S202:
In step S201, first image data within the voice interaction range is acquired;
In step S202, the preset voice information is output when the target object is recognized from the first image data.
In this optional implementation, before emitting the preset voice information (the probe voice), the intelligent voice output device first acquires first image data within its voice interaction range, and outputs the preset voice information only when the target object of the voice interaction is recognized in the first image data. This approach suits cases in which the device actively initiates voice interaction once the target object appears within the voice interaction range. For example, according to the user's settings, when the user appears within the voice interaction range the device may play a song for the user, or ask whether the user wants other linked appliances switched on. The details may be set according to the application scenario.
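Steps S201-S202 can be sketched as follows. The recognizer is passed in as a callable because the disclosure does not name a recognition technique; `detect_identities`, `target_id`, and the `Image` placeholder type are hypothetical names introduced for this example.

```python
from typing import Callable, Iterable, Set

Image = bytes  # placeholder type for one captured frame (assumption)


def probe_if_target_present(
    first_image: Image,
    detect_identities: Callable[[Image], Iterable[str]],
    target_id: str,
    speak: Callable[[str], None],
    probe: str,
) -> bool:
    """Output the probe only if the target is recognized in the first image."""
    # S201 has already produced first_image; run the (hypothetical) recognizer.
    identities: Set[str] = set(detect_identities(first_image))
    if target_id in identities:
        speak(probe)   # S202: target recognized, output the preset voice
        return True
    return False       # target not recognized, stay silent
```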
In an optional implementation of this embodiment, as shown in FIG. 3, step S102, that is, the step of acquiring the feedback information of the target object of the voice interaction on the preset voice information, further includes the following steps S301-S302:
In step S301, second image data is acquired after the preset voice information is output;
In step S302, whether the target object has received the preset voice information is determined according to the second image data.
In this optional implementation, after outputting the preset voice information (the probe voice), the intelligent voice output device determines the target object's feedback information by acquiring second image data within the voice interaction range. For example, if the target object is recognized in the second image data, the target object can be considered to have received the probe voice, and the device continues with the subsequent voice information. As another example, the device uses the second image data to recognize whether the target object is paying attention to the device; if so, the target object can be considered to have received the probe voice, and the device continues with the subsequent voice information. The details may be set according to the actual application scenario.
In an optional implementation of this embodiment, step S302, that is, the step of determining whether the target object has received the preset voice information according to the second image data, further includes the following steps:
determining that the target object has received the preset voice information when it is determined according to the second image data that the target object is within the voice interaction range; or
determining that the target object has received the preset voice information when the orientation indicated by the facial information of the target object in the second image data and the orientation of the output device of the preset voice information agree within a first preset error range.
In this optional implementation, whether the target object's feedback information satisfies the preset condition can be determined in two ways. In the first, the second image data acquired within the voice interaction range shows that the target object is in the second image; the target object can then be considered to have received the probe voice, and the intelligent voice output device may continue to output the subsequent voice interaction information. In the second, the facial information of the target object is recognized from the second image data. This requires not only that the target object appear in the second image data but also that the direction the target object is facing roughly coincide with the direction of the intelligent voice output device; when both hold, the target object can be considered to have received the probe voice, and the device may continue to output the subsequent voice interaction information. The direction the target object is facing may be determined from facial information, which includes but is not limited to the facial contour, the face orientation, and the pupil focus. The first preset error range is set so as to capture whether the target object's facing direction roughly coincides with the direction of the device, and its size may be determined from empirical values.
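One way to read the second check is as an angle comparison: the angle between the direction the face points and the direction from the face to the device must fall within a tolerance. The sketch below assumes 2-D positions and direction vectors are already available from an upstream face-analysis step; the function names and the 20-degree default tolerance are illustrative assumptions, since the disclosure only says the error range is chosen empirically.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def angle_between_deg(a: Vec, b: Vec) -> float:
    """Unsigned angle between two non-zero 2-D vectors, in degrees."""
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    if norm == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos))


def is_facing_device(face_pos: Vec, face_dir: Vec, device_pos: Vec,
                     tolerance_deg: float = 20.0) -> bool:
    """True if the face direction points at the device within the tolerance.

    tolerance_deg plays the role of the first preset error range; 20 degrees
    is an assumed example value.
    """
    to_device = (device_pos[0] - face_pos[0], device_pos[1] - face_pos[1])
    return angle_between_deg(face_dir, to_device) <= tolerance_deg
```

A larger tolerance makes the device more willing to continue the interaction, at the cost of more false positives when the target is only glancing past it.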
In an optional implementation of this embodiment, as shown in FIG. 4, step S102, that is, the step of acquiring the feedback information of the target object of the voice interaction on the preset voice information, further includes the following steps S401-S402:
In step S401, second image data is acquired after the preset voice information is output;
In step S402, whether the target object has received the preset voice information is determined by comparing the second image data with first image data obtained before the preset voice information was output.
In this optional implementation, whether the target object has received the preset voice information is determined by comparing the first image data and the second image data acquired within the voice interaction range before and after the preset voice information is output. For example, if the target object is absent from the first image data but appears in the second image data, it can be considered that the target object, having heard the preset voice information, moved into the voice interaction range to receive the subsequent voice information. As another example, if both the first image data and the second image data include the target object, and the target object changes from not paying attention to the smart voice output device to paying attention to it, it can be considered that the target object has heard the preset voice information and is ready to receive the subsequent voice interaction information.
In an optional implementation of this embodiment, as shown in FIG. 5, step S402, that is, the step of determining whether the target object has received the preset voice information by comparing the second image data with the first image data obtained before the preset voice information was output, further includes the following steps S501-S502:
In step S501, a first face of the target object is identified in the first image data and a second face of the target object is identified in the second image data;
In step S502, whether the target object has received the preset voice information is determined by comparing the facial information of the first face and the second face.
In this optional implementation, whether the target object has received the preset voice information is determined from the change in the target object's facial information, as captured in image data, before and after the preset voice information (the probe voice) is output. An image sensor acquires the first image data and the second image data before and after the smart voice output device emits the probe voice, the change in the state of the target object's facial information is recognized from them, and that change is then used to determine whether the target object has received the preset voice information. For example, if recognition of the target object's face orientation, facial contour, pupil focal length, and the like shows that the target object's attention has shifted from somewhere other than the smart voice output device to the smart voice output device, the target object can be considered to have received the preset voice information. In practice, training samples can be collected and a corresponding artificial intelligence model trained, and the model then identifies from the target object's before-and-after state change whether the target object has received the preset voice information. This can improve the accuracy of judging the target object's feedback information.
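The before-and-after comparison can be sketched as a simple transition test. This is an illustrative rule-based stand-in, not the trained model the text mentions; the yaw-angle representation, the `None` convention for "no face found", and the 20-degree threshold are assumptions.

```python
from typing import Optional


def is_attentive(yaw_deg: Optional[float], max_error_deg: float = 20.0) -> bool:
    """A face counts as attentive when it is detected (not None) and turned
    towards the device within the assumed threshold."""
    return yaw_deg is not None and abs(yaw_deg) <= max_error_deg


def attention_gained(first_yaw_deg: Optional[float],
                     second_yaw_deg: Optional[float],
                     max_error_deg: float = 20.0) -> bool:
    """Return True only for a transition from 'not attentive' in the first
    image data to 'attentive' in the second image data."""
    before = is_attentive(first_yaw_deg, max_error_deg)
    after = is_attentive(second_yaw_deg, max_error_deg)
    return (not before) and after
```

Requiring a transition, rather than attentiveness in the second image alone, is a design choice of this sketch: it treats someone who was already facing the device before the probe voice as providing no new evidence of having heard it.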
In an optional implementation of this embodiment, step S102, that is, the step of acquiring the feedback information of the target object of the voice interaction on the preset voice information, further includes:
Determining whether position information indicating that the target object is within the voice interaction range has been received.
In this optional implementation, the feedback information of the target object can also be determined by acquiring the target object's position information. The position information can be determined by position sensors such as Wi-Fi devices, Bluetooth devices, ZigBee devices, and radar devices. For example, if the target object carries a Wi-Fi device, the smart voice output device obtains the target object's Wi-Fi information to determine whether the target object is within the voice interaction range; if so, the target object can be considered to have received the preset voice information. This approach uses a position sensor to determine whether a specific target object is within the voice interaction range, and it is relatively simple and inexpensive to implement.
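A position-based check might look like the sketch below. How a position is actually derived from Wi-Fi, Bluetooth, ZigBee, or radar signals is outside the scope of the text, so the example assumes a hypothetical upstream component that already reports 2D coordinates in metres (or `None` when nothing was received); the 5-metre default range is likewise an assumed value.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def in_voice_range(device_xy: Point,
                   target_xy: Optional[Point],
                   range_m: float = 5.0) -> bool:
    """Return True when a position report was received and it lies within
    the voice interaction range around the device."""
    if target_xy is None:
        # No position information received: treat the target as out of range.
        return False
    return math.dist(device_xy, target_xy) <= range_m
```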
In an optional implementation of this embodiment, the voice interaction method further includes:
Resending the preset voice information when the feedback information of the target object does not satisfy the preset condition.
In this optional implementation, if the feedback information of the target object does not satisfy the preset condition, that is, if it is determined that the target object has not received the preset voice information, the preset voice information can be resent. It can be resent immediately or after a period of time; this can be set according to the actual situation and is not limited here.
In an optional implementation of this embodiment, the above step of resending the preset voice information includes:
When it is determined that the target object is not within the voice interaction range, delaying the sending of the preset voice information; or,
When it is determined that the target object is within the voice interaction range, sending the preset voice information at a higher volume.
In this optional implementation, if it is determined that the target object is not within the voice interaction range, sending of the preset voice information can be delayed so that it is sent again once the target object appears within the voice interaction range. If the target object is within the voice interaction range but did not hear the preset voice information, the preset voice information can be sent again at a higher volume to attract the target object's attention. In this way, the target object can be reliably given the voice interaction information, and important information is not missed, without putting the smart voice output device into a confused state.
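The two resend branches can be sketched as a small planning function. The concrete numbers (a 30-second delay, a volume scale of 0.0 to 1.0, a step of 0.25) and the `ResendPlan` type are assumptions for illustration; the text leaves these to the actual deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResendPlan:
    delay_s: float  # how long to wait before resending
    volume: float   # playback volume on an assumed 0.0-1.0 scale


def plan_resend(in_range: bool,
                volume: float,
                delay_s: float = 30.0,
                volume_step: float = 0.25,
                max_volume: float = 1.0) -> ResendPlan:
    """Choose how to resend the preset voice information after a failed check."""
    if not in_range:
        # Target is not within the voice interaction range: wait, keep volume.
        return ResendPlan(delay_s=delay_s, volume=volume)
    # Target is in range but did not react: resend now, louder, but capped.
    return ResendPlan(delay_s=0.0, volume=min(max_volume, volume + volume_step))
```

Capping the volume at `max_volume` keeps repeated retries from escalating without bound, which is one way to read the stated goal of not putting the device into a confused state.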
Exemplary application scenarios of the embodiments of the present disclosure are described in detail below through specific examples.
Embodiment 1:
This embodiment gives an example of natural voice interaction based on a smart speaker. By retrieving locally stored schedule information, the smart speaker obtains an entry reading "remind me in the afternoon to prepare a birthday present for my son". In one implementation, the smart speaker finds the target object through a camera: for example, a smart camera mounted on the smart speaker performs target recognition in the observation area, processing of the facial information identifies the user as "owner user Mr. Wang", and the operating system reports the time as "3 p.m.". The natural language processing module then correlates the schedule information with this element information, concludes that a natural voice interaction process needs to be activated, and sends voice information through the smartphone. In this embodiment, the smart terminal first sends the voice message "Good afternoon, Mr. Wang" and begins monitoring the state of the target object, Mr. Wang. Before sending the voice message, the smart terminal continuously tries to recognize the target object's facial information through the smart camera, and the result is that no facial information of the target object is detected. After the voice message is sent, the target object turns around or turns his head to face the smart device; the smart terminal performs recognition again, finds the target object's facial information, and goes on to evaluate the target object's face orientation. The smart terminal further detects that the target object's face is oriented towards the smart terminal, that is, the facial image captured by the image sensor is a frontal view. Having recognized this change in the target object's posture, the smart terminal decides to proceed to the subsequent human-computer interaction. At this point, without having received any spoken feedback from the target object, the smart terminal issues the follow-up voice message "Remember to prepare a birthday present for your son". In another case, the smart device does not detect a change in the target object's posture after sending the voice message, so it suspends the subsequent human-computer interaction; the smart terminal then raises the volume, sends the message "Good afternoon, Mr. Wang" again, and returns to monitoring the target object's posture. In yet another case, the smart device recognizes facial information after sending the voice message, but that facial information does not match the owner, Mr. Wang; it then ends the natural voice interaction process and records in the system that the reminder has not been completed.
Embodiment 2:
This embodiment gives an example of natural voice interaction based on a smartphone. In this embodiment, the smartphone is asleep with its screen off, so it cannot initiate an information exchange with the user through the display. The smartphone system detects that an email has arrived and that the email is marked "urgent". At this moment the user is using a computer that connects to the network through the phone, so the smartphone detects the user's presence from the network-sharing state and proactively initiates a natural voice interaction. The smartphone says "Hello, Mr. Wang" and then starts the front camera to monitor the user's state. When the smartphone detects the user's facial image, it sends the further voice message "You have an urgent email, please check it". If the front camera does not detect the target object's facial information within the expected time period, the subsequent voice interaction is terminated.
The following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure.
FIG. 6 shows a structural block diagram of a voice interaction device according to an embodiment of the present disclosure. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of the two. As shown in FIG. 6, the voice interaction device includes a first output module 601, a first acquisition module 602, and a second output module 603:
The first output module 601 is configured to output preset voice information in response to a preset event that activates voice interaction;
The first acquisition module 602 is configured to acquire feedback information of the target object of the voice interaction on the preset voice information, the feedback information being non-voice information;
The second output module 603 is configured to output voice interaction information when the feedback information of the target object satisfies a preset condition.
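The division of work among the three modules can be sketched as one small class. This is only an illustration of the data flow under stated assumptions: the class name, the injected `speak` and `check_feedback` callables, and the method names are hypothetical and do not come from the disclosure.

```python
from typing import Callable


class VoiceInteractionDevice:
    """Sketch of modules 601-603; audio output and sensing are injected."""

    def __init__(self,
                 speak: Callable[[str], None],
                 check_feedback: Callable[[], bool]) -> None:
        self._speak = speak                    # assumed audio output hook
        self._check_feedback = check_feedback  # assumed non-voice feedback check

    def first_output(self, preset_voice: str) -> None:
        """Module 601: output the preset (probe) voice information."""
        self._speak(preset_voice)

    def first_acquire(self) -> bool:
        """Module 602: acquire non-voice feedback and evaluate the condition."""
        return self._check_feedback()

    def second_output(self, feedback_ok: bool, interaction_voice: str) -> bool:
        """Module 603: output the interaction voice only if feedback passed."""
        if feedback_ok:
            self._speak(interaction_voice)
        return feedback_ok

    def run(self, preset_voice: str, interaction_voice: str) -> bool:
        """Run the three modules in order after a triggering event."""
        self.first_output(preset_voice)
        return self.second_output(self.first_acquire(), interaction_voice)
```

For instance, passing `print` as `speak` and a camera-based check as `check_feedback` would speak the greeting first and the reminder only after the check succeeds.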
In an optional implementation of this embodiment, the first output module includes at least one of the following:
A first response submodule, configured to output the preset voice information in response to a preset time being reached;
A second response submodule, configured to output the preset voice information in response to preset information being received;
A third response submodule, configured to output the preset voice information in response to sensing that the target object is within the voice interaction range.
In an optional implementation of this embodiment, the first output module 601 includes:
A first acquisition submodule, configured to acquire first image data within the voice interaction range;
A first output submodule, configured to output the preset voice information when the target object is recognized from the first image data.
In an optional implementation of this embodiment, the first acquisition module 602 includes:
A second acquisition submodule, configured to acquire second image data after the preset voice information is output;
A first determination submodule, configured to determine from the second image data whether the target object has received the preset voice information.
In an optional implementation of this embodiment, the first determination submodule includes:
A second determination submodule, configured to determine that the target object has received the preset voice information when it is determined from the second image data that the target object is within the voice interaction range; or,
A third determination submodule, configured to determine that the target object has received the preset voice information when it is determined that the orientation of the target object's facial information in the second image data and the orientation of the device outputting the preset voice information are within a first preset error range.
In an optional implementation of this embodiment, the first acquisition module 602 further includes:
A third acquisition submodule, configured to acquire second image data after the preset voice information is output;
A fourth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the second image data with first image data obtained before the preset voice information was output.
In an optional implementation of this embodiment, the fourth determination submodule includes:
An identification submodule, configured to identify a first face of the target object in the first image data and a second face of the target object in the second image data;
A fifth determination submodule, configured to determine whether the target object has received the preset voice information by comparing the facial information of the first face and the second face.
In an optional implementation of this embodiment, the first acquisition module 602 further includes:
A sixth determination submodule, configured to determine whether position information indicating that the target object is within the voice interaction range has been received.
In an optional implementation of this embodiment, the voice interaction device further includes:
A sending module, configured to resend the preset voice information when the feedback information of the target object does not satisfy the preset condition.
In an optional implementation of this embodiment, the sending module includes:
A first sending submodule, configured to delay sending the preset voice information when it is determined that the target object is not within the voice interaction range; or,
A second sending submodule, configured to send the preset voice information at a higher volume when it is determined that the target object is within the voice interaction range.
The voice interaction device described above corresponds to the voice interaction method described in the embodiments shown in FIG. 1 to FIG. 5 and the related embodiments. For details, refer to the description of the voice interaction method above, which is not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device suitable for implementing the voice interaction method according to an embodiment of the present disclosure.
As shown in FIG. 7, the electronic device 700 includes a central processing unit (CPU) 701, which can perform the various processes of the embodiment shown in FIG. 1 above according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the electronic device 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 710 as needed so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to an embodiment of the present disclosure, the method described above with reference to FIG. 1 can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method of FIG. 1. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure can be implemented in software or in hardware. The described units or modules can also be provided in a processor, and in some cases the names of these units or modules do not limit the units or modules themselves.
In another aspect, the present disclosure also provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the device described in the above embodiments, or it may be a standalone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the methods described in the present disclosure.
The above description is merely a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features; it also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present disclosure.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810186219.3A | 2018-03-07 | 2018-03-07 | Voice interactive method, device, electronic equipment and computer readable storage medium |
| Publication Number | Publication Date |
|---|---|
| CN108510986A | 2018-09-07 |
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN201810186219.3A | Pending | 2018-03-07 | 2018-03-07 | Voice interactive method, device, electronic equipment and computer readable storage medium |
| Country | Link |
|---|---|
| CN (1) | CN108510986A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109584877A (en)* | 2019-01-02 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | Interactive voice control method and device |
| CN110262767A (en)* | 2019-06-03 | 2019-09-20 | 清华大学 | Based on voice input Rouser, method and the medium close to mouth detection |
| CN111309198A (en)* | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Information output method and information output equipment |
| CN111540383A (en)* | 2019-02-06 | 2020-08-14 | 丰田自动车株式会社 | Voice conversation device and its control device, control program and control method |
| CN111625094A (en)* | 2020-05-25 | 2020-09-04 | 北京百度网讯科技有限公司 | Interaction method and device for intelligent rearview mirror, electronic equipment and storage medium |
| CN111724772A (en)* | 2019-03-20 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Interaction method, device and smart device for a smart device |
| CN112002317A (en)* | 2020-07-31 | 2020-11-27 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
| CN112866066A (en)* | 2021-01-07 | 2021-05-28 | 上海喜日电子科技有限公司 | Interaction method, device, system, electronic equipment and storage medium |
| CN113674749A (en)* | 2021-08-26 | 2021-11-19 | 珠海格力电器股份有限公司 | Control method, control device, electronic equipment and storage medium |
| US11205431B2 (en) | 2019-01-02 | 2021-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for presenting state of voice interaction device, and storage medium |
| CN114721310A (en)* | 2022-04-08 | 2022-07-08 | 北京有竹居网络技术有限公司 | Intelligent device control method, expansion device control device and storage medium |
| CN116224874A (en)* | 2023-03-10 | 2023-06-06 | 四川长虹电器股份有限公司 | Control method based on multi-mode interaction |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060036478A1 (en)* | 2004-08-12 | 2006-02-16 | Vladimir Aleynikov | System, method and computer program for interactive voice recognition scheduler, reminder and messenger |
| CN103869945A (en)* | 2012-12-14 | 2014-06-18 | 联想(北京)有限公司 | Information interaction method, information interaction device and electronic device |
| CN104879882A (en)* | 2015-04-30 | 2015-09-02 | 广东美的制冷设备有限公司 | Method and system for controlling air conditioner |
| CN106225174A (en)* | 2016-08-22 | 2016-12-14 | 珠海格力电器股份有限公司 | Air conditioner control method and system and air conditioner |
| CN107480851A (en)* | 2017-06-29 | 2017-12-15 | 北京小豆儿机器人科技有限公司 | A kind of intelligent health management system based on endowment robot |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060036478A1 (en)* | 2004-08-12 | 2006-02-16 | Vladimir Aleynikov | System, method and computer program for interactive voice recognition scheduler, reminder and messenger |
| CN103869945A (en)* | 2012-12-14 | 2014-06-18 | 联想(北京)有限公司 | Information interaction method, information interaction device and electronic device |
| CN104879882A (en)* | 2015-04-30 | 2015-09-02 | 广东美的制冷设备有限公司 | Method and system for controlling air conditioner |
| CN106225174A (en)* | 2016-08-22 | 2016-12-14 | 珠海格力电器股份有限公司 | Air conditioner control method and system and air conditioner |
| CN107480851A (en)* | 2017-06-29 | 2017-12-15 | 北京小豆儿机器人科技有限公司 | A kind of intelligent health management system based on endowment robot |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111309198A (en)* | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Information output method and information output equipment |
| CN109584877A (en)* | 2019-01-02 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | Interactive voice control method and device |
| US11205431B2 (en) | 2019-01-02 | 2021-12-21 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus and device for presenting state of voice interaction device, and storage medium |
| CN111540383A (en)* | 2019-02-06 | 2020-08-14 | 丰田自动车株式会社 | Voice conversation device and its control device, control program and control method |
| CN111724772A (en)* | 2019-03-20 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Interaction method, device and smart device for a smart device |
| CN111724772B (en)* | 2019-03-20 | 2024-12-24 | 阿里巴巴集团控股有限公司 | Interaction method and device of intelligent device and intelligent device |
| CN110262767A (en)* | 2019-06-03 | 2019-09-20 | 清华大学 | Based on voice input Rouser, method and the medium close to mouth detection |
| CN110262767B (en)* | 2019-06-03 | 2022-03-11 | 交互未来(北京)科技有限公司 | Voice input wake-up apparatus, method, and medium based on near-mouth detection |
| CN111625094A (en)* | 2020-05-25 | 2020-09-04 | 北京百度网讯科技有限公司 | Interaction method and device for intelligent rearview mirror, electronic equipment and storage medium |
| CN111625094B (en)* | 2020-05-25 | 2023-07-14 | 阿波罗智联(北京)科技有限公司 | Interaction method and device of intelligent rearview mirror, electronic equipment and storage medium |
| CN112002317A (en)* | 2020-07-31 | 2020-11-27 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
| CN112002317B (en)* | 2020-07-31 | 2023-11-14 | 北京小米松果电子有限公司 | Voice output method, device, storage medium and electronic equipment |
| CN112866066A (en)* | 2021-01-07 | 2021-05-28 | 上海喜日电子科技有限公司 | Interaction method, device, system, electronic equipment and storage medium |
| CN113674749A (en)* | 2021-08-26 | 2021-11-19 | 珠海格力电器股份有限公司 | Control method, control device, electronic equipment and storage medium |
| CN114721310A (en)* | 2022-04-08 | 2022-07-08 | 北京有竹居网络技术有限公司 | Intelligent device control method, expansion device control device and storage medium |
| CN116224874A (en)* | 2023-03-10 | 2023-06-06 | 四川长虹电器股份有限公司 | Control method based on multi-mode interaction |

| Publication | Publication Date | Title |
|---|---|---|
| CN108510986A (en) | | Voice interaction method, device, electronic equipment and computer-readable storage medium |
| EP3567584B1 (en) | | Electronic apparatus and method for operating same |
| US10423235B2 (en) | | Primary device that interfaces with a secondary device based on gesture commands |
| EP3010015B1 (en) | | Electronic device and method for spoken interaction thereof |
| KR20180083587A (en) | | Electronic device and operating method thereof |
| KR102324074B1 (en) | | Method for controlling sound output and an electronic device thereof |
| CN110199350A (en) | | Method for sensing end of speech and electronic device implementing the method |
| WO2015130859A1 (en) | | Performing actions associated with individual presence |
| US12225427B2 (en) | | Tracking proximities of devices and/or objects |
| CN107582028A (en) | | Sleep monitoring method and device |
| US10334100B2 (en) | | Presence-based device mode modification |
| CN107452383B (en) | | Information processing method, server, terminal and information processing system |
| US20170064084A1 (en) | | Method and Apparatus for Implementing Voice Mailbox |
| EP4080862B1 (en) | | Intelligent reminding method and device |
| WO2016145855A1 (en) | | Event prompt method and device |
| WO2016173300A1 (en) | | Electronic device alarm clock reminder method and system, electronic device and terminal |
| CN108418975A (en) | | Alarm clock reminding method, device, terminal and computer-readable storage medium |
| US10219127B2 (en) | | Information processing apparatus and information processing method |
| EP3001652B1 (en) | | Method for providing information and an electronic device thereof |
| CN103986819B (en) | | Terminal control method and relevant apparatus |
| CN107800883A (en) | | Calendar reminder anomaly detection method, device and mobile terminal |
| WO2017049481A1 (en) | | Information reminder method and smart wristband |
| CN112365899B (en) | | Voice processing method, device, storage medium and terminal equipment |
| CN110995932A (en) | | Memorandum setting method and device, storage medium and electronic equipment |
| CN111479060B (en) | | Image acquisition method and device, storage medium and electronic equipment |

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-09-07 |