CN104820556A - Method and device for waking up voice assistant - Google Patents

Method and device for waking up voice assistant

Info

Publication number
CN104820556A
Authority
CN
China
Prior art keywords
face image
ambient sound
voice assistant
preset condition
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510227622.2A
Other languages
Chinese (zh)
Inventor
张绍儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201510227622.2A
Publication of CN104820556A

Abstract

The invention relates to a method and a device for waking up a voice assistant. The method comprises: acquiring a face image and the corresponding ambient sound; detecting whether the face image satisfies a first preset condition; if it does, detecting whether the ambient sound satisfies a second preset condition; and if it does, waking up the voice assistant and inputting the ambient sound into the voice assistant as a voice operation instruction. Because the voice assistant is woken automatically when the face image and the ambient sound satisfy their respective preset conditions, and the ambient sound is passed to it as the instruction, the user can issue an operation instruction directly. This removes the redundant voice-trigger step, simplifies the interaction, and saves power on the device on which the voice assistant is installed.

Description

Translated from Chinese
Method and device for waking up a voice assistant

Technical field

The invention relates to the field of communication technology, and in particular to a method and a device for waking up a voice assistant.

Background

Speech recognition technology first appeared in the 1950s. It developed slowly at first and could recognize only a small number of isolated words. In the 1990s the technology began to achieve significant breakthroughs in application and commercialization and became a focus of technical research. Widely used examples include Apple's Siri (Apple's intelligent voice assistant) and, in China, iFlytek, Baidu Voice, and Sogou Voice Assistant.

Existing voice assistants generally require a specific spoken phrase as a trigger, which puts the assistant into a state of waiting for voice input. For example, when the device is connected to power, saying "Hey Siri" to an iOS (Apple's mobile operating system) device with Siri wakes the Siri service. As the action that opens voice input, a voice trigger requires no contact with the device, which solves the problem of voice input in certain environments.

However, existing voice assistants need a specific wake-up phrase, and keeping the voice wake-up function enabled for a long time consumes considerable power. In addition, the user must first wake the assistant with the specific phrase and only then speak the actual voice operation instruction, so the process is redundant.

Summary of the invention

In view of the above problems, it is necessary to provide a method and a device for waking up a voice assistant that are simple to operate.

A method for waking up a voice assistant comprises the steps of:

acquiring a face image and the corresponding ambient sound;

detecting whether the face image satisfies a first preset condition;

if the face image satisfies the first preset condition, detecting whether the ambient sound satisfies a second preset condition;

if the ambient sound satisfies the second preset condition, waking up the voice assistant and inputting the ambient sound into the voice assistant as a voice operation instruction.

A device for waking up a voice assistant comprises:

a face image acquisition module, configured to acquire a face image;

an ambient sound acquisition module, configured to acquire the ambient sound corresponding to the face image;

a face image detection module, configured to detect whether the face image satisfies a first preset condition;

an ambient sound detection module, configured to detect, when the face image satisfies the first preset condition, whether the ambient sound satisfies a second preset condition;

a wake-up module, configured to wake up the voice assistant when the ambient sound satisfies the second preset condition and to input the ambient sound into the voice assistant as a voice operation instruction.

With the method and device of the invention, the voice assistant is woken automatically when the face image and the ambient sound satisfy their respective preset conditions, and at the same time the ambient sound is input into the voice assistant as a voice operation instruction. The user can therefore issue an operation instruction directly, which removes the redundant voice-trigger step, simplifies the interaction, and saves power on the device on which the voice assistant is installed.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a method embodiment of the invention;

Fig. 2 is a schematic flowchart of a specific embodiment of step S120 of the invention;

Fig. 3 is a schematic flowchart of a specific embodiment of step S130 of the invention;

Fig. 4 is a schematic structural diagram of a device embodiment of the invention;

Fig. 5 is a schematic structural diagram of an embodiment of the face image detection module of the invention;

Fig. 6 is a schematic structural diagram of an embodiment of the first judging unit of the invention;

Fig. 7 is a schematic structural diagram of an embodiment of the ambient sound detection module of the invention.

Detailed description

Specific embodiments of the method for waking up a voice assistant are described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, a method for waking up a voice assistant comprises the steps of:

S110. Acquire a face image and the corresponding ambient sound;

S120. Detect whether the face image satisfies a first preset condition;

S130. If the face image satisfies the first preset condition, detect whether the ambient sound satisfies a second preset condition;

S140. If the ambient sound satisfies the second preset condition, wake up the voice assistant and input the ambient sound into the voice assistant as a voice operation instruction.
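
Steps S110 to S140 can be sketched as a short control loop. This is a minimal illustration rather than the patented implementation: the function name `try_wake`, the callback parameters, and the `max_attempts` cut-off are all hypothetical, and the two condition checks are passed in as plain callables because the description leaves their exact form open.

```python
from typing import Any, Callable, Tuple


def try_wake(
    acquire: Callable[[], Tuple[Any, Any]],
    face_ok: Callable[[Any], bool],
    sound_ok: Callable[[Any], bool],
    wake_with: Callable[[Any], None],
    max_attempts: int = 10,
) -> bool:
    """Run the S110-S140 flow; return True if the assistant was woken.

    All parameter names are illustrative assumptions, not names from the patent.
    """
    for _ in range(max_attempts):
        face, sound = acquire()      # S110: face image plus matching ambient sound
        if not face_ok(face):        # S120: first preset condition
            continue                 # not met: go back to S110
        if not sound_ok(sound):      # S130: second preset condition
            continue                 # not met: go back to S110
        wake_with(sound)             # S140: wake and pass the sound on as the instruction
        return True
    return False
```

The sound that satisfied the second condition is the same object handed to the assistant, which is what lets the user skip a separate wake-up phrase.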

The face image can be acquired by a camera, and the ambient sound can be acquired by a microphone or similar device. The acquired camera images and ambient sound need to be stored for the subsequent condition checks. To save storage space, a retention time can be set for the face images and ambient sound, so that once an operation is finished the data that is no longer needed is cleared promptly and the storage space is freed.
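
One way to realize the retention time mentioned above is a buffer that drops samples older than a configured age. The sketch below is an assumption about how this might be done; the class name `RetentionBuffer` and its methods are hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class RetentionBuffer:
    """Keeps captured samples only for `retention_seconds` (hypothetical helper)."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._items: Deque[Tuple[float, Any]] = deque()

    def add(self, timestamp: float, sample: Any) -> None:
        """Store a sample and discard anything that has outlived the retention time."""
        self._items.append((timestamp, sample))
        self.purge(timestamp)

    def purge(self, now: float) -> None:
        """Drop samples from the front while they are older than the retention time."""
        while self._items and now - self._items[0][0] > self.retention_seconds:
            self._items.popleft()

    def samples(self) -> List[Any]:
        """Return the samples that are still retained, oldest first."""
        return [sample for _, sample in self._items]
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the buffer easy to test and lets image frames and audio chunks share one time base.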

After the face image and the ambient sound have been acquired, the face image can first be checked against the first preset condition, which can take various specific forms according to the user's needs. For example, as shown in Fig. 2, step S120 may include the steps of:

S1201. Judge whether the face image is a frontal face image; if so, go to step S1202, otherwise return to step S110;

S1202. Judge whether the mouth in the face image is moving; if so, go to step S1203, otherwise return to step S110;

S1203. Determine that the face image satisfies the first preset condition.

When using a voice assistant, a user generally faces the smartphone, tablet, or similar device and speaks the voice instruction. In other words, when the face image captured by the camera is a frontal face image and mouth movement is detected, experience suggests that the user is likely to be using the voice assistant. There are many ways to judge whether a face image is a frontal face image. For example, step S1201 may include the steps of:

obtaining the distance between the two eyes in the face image;

judging whether the distance between the eyes is within a preset range;

if so, determining that the face image is a frontal face image; otherwise, determining that it is not a frontal face image.
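
The eye-distance test can be sketched as follows, assuming an upstream face-landmark detector (not shown, and not specified by the patent) already supplies the pixel coordinates of both eye centres. The function names and the range limits are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of an eye centre


def eye_distance(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance between the two eye centres, in pixels."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])


def is_frontal_face(
    left_eye: Point, right_eye: Point, min_dist: float, max_dist: float
) -> bool:
    """Treat the face as frontal when the eye distance lies in the preset range."""
    return min_dist <= eye_distance(left_eye, right_eye) <= max_dist
```

As the head turns away from the camera, the projected eye distance shrinks and drops below the lower bound. Because the raw pixel distance also changes with how far the user is from the camera, a practical range would have to be tuned for the expected working distance.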

Because a user of a voice assistant cannot in practice face the camera perfectly head-on, the invention allows a certain deviation during detection. That is, the invention is not limited to determining that the face image is exactly a frontal face image; it may also judge whether the face image is approximately frontal. Likewise, determining whether a face image is frontal is not limited to the method given above and may be implemented in other ways known in the prior art.

The mouth is a person's main sound source, so issuing a voice instruction to the voice assistant is accompanied by mouth movement. Adding a mouth-movement check after the face image has been judged frontal therefore improves accuracy. Judging from the acquired face images whether the mouth is moving can be implemented with methods already available in the prior art.
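
The description leaves mouth-movement detection to the prior art. One simple possibility, given here purely as an assumption, is to measure the gap between the upper and lower lip in several consecutive frames and report movement when that gap varies by more than a threshold. The function name, the per-frame `lip_gaps` input, and the `min_change` threshold are hypothetical.

```python
from typing import Sequence


def mouth_is_moving(lip_gaps: Sequence[float], min_change: float) -> bool:
    """Report movement when the lip gap varies by at least `min_change`.

    `lip_gaps` holds one measurement per consecutive frame (for example in
    pixels); how the gap is measured is outside the scope of this sketch.
    """
    if len(lip_gaps) < 2:
        return False  # a single frame cannot show movement
    return max(lip_gaps) - min(lip_gaps) >= min_change
```

A check of this kind needs a short sequence of frames rather than a single image, which fits the description's requirement that acquired images be stored for the later condition checks.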

Once the face image is detected to be a frontal face image with mouth movement, the ambient sound detection step can begin. The user may of course add other face verification conditions as needed to further improve accuracy; these are not described in detail here.

As shown in Fig. 3, step S130 may include the steps of:

S1301. Judge whether the volume of the ambient sound is within a preset range;

S1302. Judge whether the distance to the sound source of the ambient sound is smaller than a preset threshold;

S1303. If the volume is within the preset range and the distance to the sound source is smaller than the preset threshold, determine that the ambient sound satisfies the second preset condition; otherwise return to step S110.

When an ordinary user speaks to a voice assistant, the voice is neither too loud nor too quiet, and the user is not too far from the device, so the invention adds checks on volume and distance. The volume can be obtained with a prior-art volume meter or similar, and the distance between the device and the sound source can be determined from a qualitative attenuation formula for sound in air. Once the volume and the distance have been obtained from the ambient sound, it can be judged whether the volume is within the preset range and whether the distance is smaller than the preset threshold, and hence whether the ambient sound satisfies the second preset condition. Both the preset range and the preset threshold can be set empirically.
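
The description does not state which attenuation formula is used. The sketch below assumes the free-field inverse-distance law, under which the level falls by about 6 dB for each doubling of distance, so that r = r_ref · 10^((L_ref − L) / 20). It further assumes that typical speech has a known level `ref_level_db` at `ref_distance_m`, and that all levels are expressed on the same dB scale. Every function and parameter name here is hypothetical.

```python
import math
from typing import Sequence


def rms_level_db(samples: Sequence[float]) -> float:
    """RMS level of audio samples in dB relative to full scale (amplitude 1.0)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def estimate_distance_m(
    level_db: float, ref_level_db: float, ref_distance_m: float = 1.0
) -> float:
    """Distance implied by the inverse-distance law (an assumed model)."""
    return ref_distance_m * 10 ** ((ref_level_db - level_db) / 20.0)


def meets_second_condition(
    level_db: float,
    min_db: float,
    max_db: float,
    ref_level_db: float,
    max_distance_m: float,
) -> bool:
    """S1301-S1303: volume within range and estimated source distance below threshold."""
    if not (min_db <= level_db <= max_db):  # S1301
        return False
    return estimate_distance_m(level_db, ref_level_db) < max_distance_m  # S1302
```

Because the distance is inferred from the level alone, a quiet speaker nearby and a loud speaker farther away are indistinguishable under this model, so the estimate is only a rough filter.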

When the ambient sound satisfies the second preset condition, it is determined that the user intends to give voice input to the smart device. The voice assistant is woken up and, at the same time, the ambient sound is input into it as a voice operation instruction, so the voice assistant performs the corresponding operation directly. This removes the step of waking the assistant with a specific phrase and keeps operation simple for the user.

It should be noted that the invention does not limit the order in which the face image and the ambient sound are detected. In addition, the user may screen for other conversational features according to the actual situation; for example, the second preset condition may be deemed not satisfied when the user is detected to be humming or talking to themselves. The invention does not limit the determination conditions.

For a better understanding of how the invention is implemented, a specific application scenario is described below.

When cooking in the kitchen, we may want a tablet computer to display a recipe so that we can follow its steps to prepare an unfamiliar dish. After finishing one step, we want the software on the tablet to show the next one, which requires giving it an instruction. At that moment both hands may be busy, or greasy and unsuitable for touching the tablet. We then only need to turn our face toward the tablet and say "next step". The voice wake-up software provided by the invention judges that we are giving an instruction to the tablet's voice assistant rather than talking to someone else. The wake-up step is thus completed directly, and the voice assistant can immediately parse the speech and advance the recipe display to the next step.

Based on the same inventive concept, the invention also provides a device for waking up a voice assistant. Specific embodiments of the device are described in detail below with reference to the accompanying drawings.

As shown in Fig. 4, a device for waking up a voice assistant comprises:

a face image acquisition module 410, configured to acquire a face image;

an ambient sound acquisition module 420, configured to acquire the ambient sound corresponding to the face image;

a face image detection module 430, configured to detect whether the face image satisfies a first preset condition;

an ambient sound detection module 440, configured to detect, when the face image satisfies the first preset condition, whether the ambient sound satisfies a second preset condition;

a wake-up module 450, configured to wake up the voice assistant when the ambient sound satisfies the second preset condition and to input the ambient sound into the voice assistant as a voice operation instruction.

The face image acquisition module 410 can acquire the face image through a camera, and the ambient sound acquisition module 420 can acquire the ambient sound through a microphone or similar device. The camera images and ambient sound acquired by modules 410 and 420 need to be stored for the subsequent condition checks. To save storage space, a retention time can be set for the face images and ambient sound, so that once an operation is finished the data that is no longer needed is cleared promptly and the storage space is freed.

After the face image and the ambient sound have been acquired, the face image detection module 430 detects whether the face image satisfies the first preset condition, which can take various specific forms according to the user's needs. For example, as shown in Fig. 5, the face image detection module 430 may include:

a first judging unit 4301, configured to judge whether the face image is a frontal face image;

a second judging unit 4302, configured to judge, when the face image is a frontal face image, whether the mouth in the face image is moving;

a determination unit 4303, configured to determine, when the mouth is moving, that the face image satisfies the first preset condition.
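
The unit structure of module 430 can be sketched as a small class that composes two pluggable checks. This is an illustrative assumption about one possible software layout: the class, field, and method names are hypothetical, and the two judging units are injected as callables because the description allows them to be implemented in various ways.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FaceImageDetectionModule:
    """Hypothetical software counterpart of module 430."""

    is_frontal: Callable[[Any], bool]    # role of the first judging unit 4301
    mouth_moves: Callable[[Any], bool]   # role of the second judging unit 4302

    def meets_first_condition(self, face: Any) -> bool:
        """Role of determination unit 4303: frontal face and moving mouth."""
        if not self.is_frontal(face):
            return False  # the mouth check runs only for frontal faces
        return self.mouth_moves(face)
```

Running the mouth check only after the frontal check has passed mirrors the order of units 4301 and 4302 and avoids the second computation for faces that are turned away.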

There are many ways for the first judging unit 4301 to judge whether the face image is a frontal face image. For example, as shown in Fig. 6, the first judging unit 4301 may include:

an eye distance obtaining unit 43011, configured to obtain the distance between the two eyes in the face image;

an eye distance judging unit 43012, configured to judge whether the distance between the eyes is within a preset range;

a face image determination unit 43013, configured to determine that the face image is a frontal face image when the distance between the eyes is within the preset range, and otherwise to determine that it is not a frontal face image.

The first judging unit 4301 is not limited to determining that the face image is exactly a frontal face image; it may also judge whether the face image is approximately frontal. Likewise, the way the first judging unit 4301 determines whether the face image is frontal is not limited to that shown in Fig. 6 and may be implemented in other ways known in the prior art. After the face image has been judged frontal, the second judging unit 4302 adds a mouth-movement check, which improves accuracy. The second judging unit 4302 can judge from the acquired face images whether the mouth is moving by using methods already available in the prior art.

When the face image satisfies the first preset condition, the ambient sound detection module 440 detects whether the ambient sound satisfies the second preset condition. It should be noted that the invention does not limit the order in which the ambient sound detection module 440 and the face image detection module 430 are executed. The ambient sound detection module 440 can be implemented in many ways. For example, as shown in Fig. 7, the ambient sound detection module 440 may include:

a volume judging unit 4401, configured to judge whether the volume of the ambient sound is within a preset range, where the volume can be obtained with a prior-art volume meter or similar;

a distance judging unit 4402, configured to judge whether the distance to the sound source of the ambient sound is smaller than a preset threshold, where the distance judging unit 4402 can determine the distance to the sound source from a qualitative attenuation formula for sound in air;

a sound determination unit 4403, configured to determine that the ambient sound satisfies the second preset condition when the volume is within the preset range and the distance to the sound source is smaller than the preset threshold; otherwise the face image acquisition module 410 and the ambient sound acquisition module 420 reacquire the camera image and the ambient sound. Both the preset range and the preset threshold can be set empirically.

When the ambient sound satisfies the second preset condition, the wake-up module 450 determines that the user intends to give voice input to the smart device, wakes up the voice assistant, and at the same time inputs the ambient sound into the voice assistant as a voice operation instruction. The voice assistant then performs the corresponding operation directly, which removes the step of waking the assistant with a specific phrase and keeps operation simple for the user.

The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be regarded as falling within the scope of this specification.

The embodiments described above represent only several implementations of the invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and all of these fall within the protection scope of the invention. The protection scope of this patent is therefore defined by the appended claims.

Claims (10)

Translated from Chinese

1. A method for waking up a voice assistant, characterized by comprising the steps of:
acquiring a face image and the corresponding ambient sound;
detecting whether the face image satisfies a first preset condition;
if the face image satisfies the first preset condition, detecting whether the ambient sound satisfies a second preset condition;
if the ambient sound satisfies the second preset condition, waking up the voice assistant and inputting the ambient sound into the voice assistant as a voice operation instruction.

2. The method for waking up a voice assistant according to claim 1, characterized in that the step of detecting whether the face image satisfies the first preset condition comprises:
judging whether the face image is a frontal face image;
if the face image is a frontal face image, judging whether the mouth in the face image is moving, and if the face image is not a frontal face image, returning to the step of acquiring a face image and the corresponding ambient sound;
if the mouth is moving, determining that the face image satisfies the first preset condition, and otherwise returning to the step of acquiring a face image and the corresponding ambient sound.

3. The method for waking up a voice assistant according to claim 2, characterized in that the step of judging whether the face image is a frontal face image comprises:
obtaining the distance between the two eyes in the face image;
judging whether the distance between the eyes is within a preset range;
if so, determining that the face image is a frontal face image, and otherwise determining that the face image is not a frontal face image.

4. The method for waking up a voice assistant according to claim 1, characterized in that the step of detecting whether the ambient sound satisfies the second preset condition comprises:
judging whether the volume of the ambient sound is within a preset range;
judging whether the distance to the sound source of the ambient sound is smaller than a preset threshold;
if the volume is within the preset range and the distance to the sound source is smaller than the preset threshold, determining that the ambient sound satisfies the second preset condition, and otherwise returning to the step of acquiring a face image and the corresponding ambient sound.

5. The method for waking up a voice assistant according to claim 4, characterized in that the distance to the sound source is determined according to an attenuation formula for sound in air.

6. A device for waking up a voice assistant, characterized by comprising:
a face image acquisition module, configured to acquire a face image;
an ambient sound acquisition module, configured to acquire the ambient sound corresponding to the face image;
a face image detection module, configured to detect whether the face image satisfies a first preset condition;
an ambient sound detection module, configured to detect, when the face image satisfies the first preset condition, whether the ambient sound satisfies a second preset condition;
a wake-up module, configured to wake up the voice assistant when the ambient sound satisfies the second preset condition and to input the ambient sound into the voice assistant as a voice operation instruction.

7. The device for waking up a voice assistant according to claim 6, characterized in that the face image detection module comprises:
a first judging unit, configured to judge whether the face image is a frontal face image;
a second judging unit, configured to judge, when the face image is a frontal face image, whether the mouth in the face image is moving;
a determination unit, configured to determine, when the mouth is moving, that the face image satisfies the first preset condition.

8. The device for waking up a voice assistant according to claim 7, characterized in that the first judging unit comprises:
an eye distance obtaining unit, configured to obtain the distance between the two eyes in the face image;
an eye distance judging unit, configured to judge whether the distance between the eyes is within a preset range;
a face image determination unit, configured to determine that the face image is a frontal face image when the distance between the eyes is within the preset range, and otherwise to determine that the face image is not a frontal face image.

9. The device for waking up a voice assistant according to claim 6, characterized in that the ambient sound detection module comprises:
a volume judging unit, configured to judge whether the volume of the ambient sound is within a preset range;
a distance judging unit, configured to judge whether the distance to the sound source of the ambient sound is smaller than a preset threshold;
a sound determination unit, configured to determine that the ambient sound satisfies the second preset condition when the volume is within the preset range and the distance to the sound source is smaller than the preset threshold.

10. The device for waking up a voice assistant according to claim 9, characterized in that the distance judging unit determines the distance to the sound source according to an attenuation formula for sound in air.
CN201510227622.2A2015-05-062015-05-06Method and device for waking up voice assistantPendingCN104820556A (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201510227622.2ACN104820556A (en)2015-05-062015-05-06Method and device for waking up voice assistant

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510227622.2A / CN104820556A (en) | 2015-05-06 | 2015-05-06 | Method and device for waking up voice assistant

Publications (1)

Publication Number | Publication Date
CN104820556A (en) | 2015-08-05

Family

ID=53730864

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510227622.2A (Pending) / CN104820556A (en) | Method and device for waking up voice assistant | 2015-05-06 | 2015-05-06

Country Status (1)

Country | Link
CN (1) | CN104820556A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2000347692A (en)* | 1999-06-07 | 2000-12-15 | Sanyo Electric Co Ltd | Person detecting method, person detecting device, and control system using it
EP1215658A2 (en)* | 2000-12-05 | 2002-06-19 | Hewlett-Packard Company | Visual activation of voice controlled apparatus
US6970824B2 (en)* | 2000-12-05 | 2005-11-29 | Hewlett-Packard Development Company, L.P. | Enabling voice control of voice-controlled apparatus using a head mounted camera system
CN102298443A (en)* | 2011-06-24 | 2011-12-28 | 华南理工大学 | Smart home voice control system combined with video channel and control method thereof
CN104428832A (en)* | 2012-07-09 | 2015-03-18 | Lg电子株式会社 | Speech recognition apparatus and method
CN102945672A (en)* | 2012-09-29 | 2013-02-27 | 深圳市国华识别科技开发有限公司 | Voice control system for multimedia equipment, and voice control method
US20140222436A1 (en)* | 2013-02-07 | 2014-08-07 | Apple Inc. | Voice trigger for a digital assistant
CN104103274A (en)* | 2013-04-11 | 2014-10-15 | 纬创资通股份有限公司 | Speech processing apparatus and speech processing method
CN104423992A (en)* | 2013-09-03 | 2015-03-18 | 冠捷投资有限公司 | Starting method for voice recognition of display
CN103472994A (en)* | 2013-09-06 | 2013-12-25 | 乐得科技有限公司 | Operation control achieving method, device and system based on voice
CN104078041A (en)* | 2014-06-26 | 2014-10-01 | 美的集团股份有限公司 | Voice recognition method and system

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105204628A (en)* | 2015-09-01 | 2015-12-30 | 涂悦 | Voice control method based on visual awakening
WO2017035768A1 (en)* | 2015-09-01 | 2017-03-09 | 涂悦 | Voice control method based on visual wake-up
CN105700363A (en)* | 2016-01-19 | 2016-06-22 | 深圳创维-Rgb电子有限公司 | Method and system for waking up smart home equipment voice control device
CN105700363B (en)* | 2016-01-19 | 2018-10-26 | 深圳创维-Rgb电子有限公司 | A kind of awakening method and system of smart home device phonetic controller
CN105912092A (en)* | 2016-04-06 | 2016-08-31 | 北京地平线机器人技术研发有限公司 | Voice waking up method and voice recognition device in man-machine interaction
CN108098767A (en)* | 2016-11-25 | 2018-06-01 | 北京智能管家科技有限公司 | Method and device for waking up a robot
CN106782524A (en)* | 2016-11-30 | 2017-05-31 | 深圳讯飞互动电子有限公司 | One kind mixing awakening method and system
CN106847285A (en)* | 2017-03-31 | 2017-06-13 | 上海思依暄机器人科技股份有限公司 | A kind of robot and its audio recognition method
CN106847285B (en)* | 2017-03-31 | 2020-05-05 | 上海思依暄机器人科技股份有限公司 | Robot and voice recognition method thereof
CN107315561A (en)* | 2017-06-30 | 2017-11-03 | 联想(北京)有限公司 | A kind of data processing method and electronic equipment
CN107517313A (en)* | 2017-08-22 | 2017-12-26 | 珠海市魅族科技有限公司 | Awakening method and device, terminal and readable storage medium storing program for executing
CN107678793A (en)* | 2017-09-14 | 2018-02-09 | 珠海市魅族科技有限公司 | Voice assistant starts method and device, terminal and computer-readable recording medium
CN107679506A (en)* | 2017-10-12 | 2018-02-09 | Tcl通力电子(惠州)有限公司 | Awakening method, intelligent artifact and the computer-readable recording medium of intelligent artifact
CN108055617B (en)* | 2017-12-12 | 2020-12-15 | 广东小天才科技有限公司 | A wake-up method, device, terminal device and storage medium for a microphone
CN108154878A (en)* | 2017-12-12 | 2018-06-12 | 北京小米移动软件有限公司 | Control the method and device of monitoring device
CN108055617A (en)* | 2017-12-12 | 2018-05-18 | 广东小天才科技有限公司 | Microphone awakening method and device, terminal equipment and storage medium
CN114860187B (en)* | 2018-01-03 | 2025-03-18 | 腾讯科技(深圳)有限公司 | Intelligent voice device control method, device, computer device and storage medium
CN114860187A (en)* | 2018-01-03 | 2022-08-05 | 腾讯科技(深圳)有限公司 | Intelligent voice equipment control method and device, computer equipment and storage medium
CN109992237B (en)* | 2018-01-03 | 2022-04-22 | 腾讯科技(深圳)有限公司 | Intelligent voice equipment control method and device, computer equipment and storage medium
CN109992237A (en)* | 2018-01-03 | 2019-07-09 | 腾讯科技(深圳)有限公司 | Intelligent voice device control method, device, computer equipment and storage medium
CN108154140A (en)* | 2018-01-22 | 2018-06-12 | 北京百度网讯科技有限公司 | Voice awakening method, device, equipment and computer-readable medium based on lip reading
US20190228212A1 (en)* | 2018-01-22 | 2019-07-25 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Wakeup method, apparatus and device based on lip reading, and computer readable medium
JP2019128938A (en)* | 2018-01-22 | 2019-08-01 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Lip reading based voice wakeup method, apparatus, arrangement and computer readable medium
US10810413B2 (en) | 2018-01-22 | 2020-10-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Wakeup method, apparatus and device based on lip reading, and computer readable medium
WO2019149160A1 (en)* | 2018-02-02 | 2019-08-08 | 刘国华 | Human-machine interaction method and device, computer apparatus, and storage medium
CN108363557A (en)* | 2018-02-02 | 2018-08-03 | 刘国华 | Man-machine interaction method, device, computer equipment and storage medium
US11483657B2 (en) | 2018-02-02 | 2022-10-25 | Guohua Liu | Human-machine interaction method and device, computer apparatus, and storage medium
JP7066877B2 (en) | 2018-02-02 | 2022-05-13 | 國華 劉 | Human-machine interaction methods, devices, computer devices and storage media
JP2021513123A (en)* | 2018-02-02 | 2021-05-20 | 劉 國華 LIU, Guohua | Human-machine interaction methods, devices, computer devices and storage media
CN108363557B (en)* | 2018-02-02 | 2020-06-12 | 刘国华 | Human-computer interaction method and device, computer equipment and storage medium
CN110164444A (en)* | 2018-02-12 | 2019-08-23 | 优视科技有限公司 | Voice input starting method, apparatus and computer equipment
CN110277094A (en)* | 2018-03-14 | 2019-09-24 | 阿里巴巴集团控股有限公司 | Awakening method, device and the electronic equipment of equipment
US11158314B2 (en) | 2018-06-04 | 2021-10-26 | Pegatron Corporation | Voice control device and method
CN109671426A (en)* | 2018-12-06 | 2019-04-23 | 珠海格力电器股份有限公司 | Voice control method and device, storage medium and air conditioner
CN109741738A (en)* | 2018-12-10 | 2019-05-10 | 平安科技(深圳)有限公司 | Voice control method, device, computer equipment and storage medium
CN109710131A (en)* | 2018-12-28 | 2019-05-03 | 联想(北京)有限公司 | A kind of information control method and device
WO2020187050A1 (en)* | 2019-03-15 | 2020-09-24 | 海信视像科技股份有限公司 | Display device
CN110188179A (en)* | 2019-05-30 | 2019-08-30 | 浙江远传信息技术股份有限公司 | Speech-oriented identifies exchange method, device, equipment and medium
CN110941455B (en)* | 2019-11-27 | 2024-02-20 | 北京声智科技有限公司 | Active wake-up method and device and electronic equipment
CN110941455A (en)* | 2019-11-27 | 2020-03-31 | 北京声智科技有限公司 | Active wake-up method and device and electronic equipment
CN111243583B (en)* | 2019-12-31 | 2023-03-10 | 深圳市瑞讯云技术有限公司 | System awakening method and device
CN111243583A (en)* | 2019-12-31 | 2020-06-05 | 深圳市瑞讯云技术有限公司 | System awakening method and device
CN111651135A (en)* | 2020-04-27 | 2020-09-11 | 珠海格力电器股份有限公司 | Sound awakening method and device, storage medium and electrical equipment
CN111651135B (en)* | 2020-04-27 | 2021-05-25 | 珠海格力电器股份有限公司 | Sound awakening method and device, storage medium and electrical equipment
CN114187904A (en)* | 2020-08-25 | 2022-03-15 | 广州华凌制冷设备有限公司 | Similarity threshold acquisition method, voice household appliance and computer readable storage medium
WO2025091960A1 (en)* | 2023-10-31 | 2025-05-08 | 华为技术有限公司 | Voice assistant interaction method and electronic device

Legal Events

Date | Code | Title | Description
- | C06 | Publication | -
- | PB01 | Publication | -
- | EXSB | Decision made by SIPO to initiate substantive examination | -
- | SE01 | Entry into force of request for substantive examination | -
- | RJ01 | Rejection of invention patent application after publication | Application publication date: 20150805
