CN109346075A

Info

Publication number: CN109346075A
Application number: CN201811199154.2A
Authority: CN
Inventors: 林金锋; 仇存收
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2019-02-15

Abstract

An embodiment of the present invention provides a system for recognizing a user's voice through human body vibration to control an electronic device, comprising: a human body vibration sensor for sensing the user's body vibration; a processing circuit, coupled to the human body vibration sensor, for controlling a sound pickup device to start picking up sound when it determines that the output signal of the human body vibration sensor includes a user voice signal; and a communication module, coupled to the processing circuit and the sound pickup device, for communication between the processing circuit and the sound pickup device. Another embodiment of the present invention provides a method for recognizing a user's voice through human body vibration to control an electronic device, comprising: detecting human body vibration; and, when it is determined that the human body vibration includes vibration caused by the user speaking, controlling a sound pickup device to start picking up sound.

Description

Translated from Chinese

Method and system for recognizing a user's voice through human body vibration to control an electronic device

Technical Field

The present invention relates to the field of communication technology, and in particular to a method and device for recognizing a user's voice through human body vibration.

Background

With advances in artificial intelligence (AI) technology, voice control is increasingly used in consumer electronics such as mobile phones and tablet computers. Many voice assistant products are now on the market, such as Apple's Siri, Google's Google Assistant, and Microsoft's Xiaoice. These voice assistants are installed in terminal devices such as mobile phones and tablet computers, or in smart products such as smart speakers and robots, and perform the corresponding operations by recognizing the user's voice commands, which makes the devices much more convenient to use.

However, existing voice assistants cannot reliably distinguish the source of a voice command. For example, if another person nearby speaks a voice command, the voice assistant may be falsely triggered. In addition, to monitor the user's voice and respond at any time, a device with a voice assistant has to keep its microphone on for long periods, which increases power consumption.

Summary of the Invention

Embodiments of the present invention provide a method and system for recognizing a user's voice through human body vibration to control an electronic device, so as to reduce false triggering of voice commands and to reduce power consumption.

According to a first aspect of the present invention, there is provided a system for recognizing a user's voice through human body vibration to control an electronic device, comprising:

a human body vibration sensor, for sensing the user's body vibration;

a processing circuit, coupled to the human body vibration sensor, for controlling a sound pickup device to start picking up sound when it is determined that the output signal of the human body vibration sensor includes a user voice signal;

a communication module, coupled to the processing circuit and the sound pickup device, for communication between the processing circuit and the sound pickup device.

Optionally, the system further includes an amplifier whose input is coupled to the output of the human body vibration sensor, for amplifying the output signal of the human body vibration sensor.

Optionally, the system includes a high-pass filter whose input is coupled to the output of the human body vibration sensor, for filtering out the low-frequency components of the output signal of the human body vibration sensor.

Optionally, the system includes a history buffer, coupled to the human body vibration sensor, for buffering the output signal of the human body vibration sensor over a certain period before the current moment.

Optionally, the processing circuit includes a voice analyzer for determining whether the output signal of the human body vibration sensor includes a user voice signal.

Optionally, the voice analyzer determines whether the output signal of the human body vibration sensor includes a user voice signal by analyzing the envelope of the output signal; or,

the voice analyzer determines whether the output signal of the human body vibration sensor includes a user voice signal by analyzing the frequency spectrum of the output signal.

Optionally, the system includes an ambient noise detector for detecting ambient noise in the output signal of the human body vibration sensor.

Optionally, the system includes a false alarm filter whose input is coupled to the human body vibration sensor and whose output is coupled to the voice analyzer, for filtering out noise from inside the human body in the output signal of the human body vibration sensor.

Optionally, the processing circuit further includes a training module, coupled to the voice analyzer, for training the voice analysis model used by the voice analyzer.

Optionally, the voice analyzer includes a software module running on the processing circuit or a hardware module in the processing circuit.

Optionally, the training module includes a software module running on the processing circuit or a hardware module in the processing circuit.

Optionally, the system further includes a memory, coupled to the training module and the voice analyzer, for storing the voice analysis model generated by the training module.

Optionally, the human body vibration sensor includes a bone vibration sensor.

According to another aspect of the present invention, there is provided a method for recognizing a user's voice through human body vibration to control an electronic device, comprising:

detecting human body vibration;

when it is determined that the human body vibration includes vibration caused by the user speaking, controlling a sound pickup device to start picking up sound.

Optionally, the sound pickup device is normally off, and the method further includes:

when it is determined that the human body vibration does not include vibration caused by the user speaking, keeping the sound pickup device off.

Optionally, after detecting the human body vibration, and before controlling the sound pickup device to start picking up sound upon determining that the human body vibration includes vibration caused by the user speaking, the method further includes:

filtering out noise in the human body vibration.

Optionally, the human body vibration includes bone vibration.

With the system and method provided by the embodiments of the present invention, the voice recognition system can start picking up sound only when the user speaks, which reduces the cases in which the voice recognition system mistakenly recognizes other people's voice commands. Furthermore, the sound pickup device can be turned on only when the user speaks, which reduces the power consumption of the system.

Brief Description of the Drawings

FIG. 1A is a diagram of the physical form of an embodiment of the present invention;

FIG. 1B is a schematic structural diagram of an embodiment of the present invention;

FIG. 2A shows a time-domain plot and a spectrogram of a sound signal picked up by a microphone;

FIG. 2B shows a time-domain plot and a spectrogram of a sound signal picked up by a bone conduction sensor;

FIG. 3 is a schematic structural diagram of another embodiment of the present invention;

FIG. 4 compares the signal detected by a bone conduction sensor before and after noise reduction;

FIG. 5 is a spectrogram of human speech;

FIG. 6 is a time-domain plot of human chewing noise;

FIG. 7 is a flowchart of a method embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the accompanying drawings.

As shown in FIG. 1B, an embodiment of the present invention provides a system for recognizing a user's voice. The system may be located in a single device, such as an earphone, a hearing aid, or another dedicated device; it may also be distributed across different devices, for example with one part on an earphone and another part on an electronic device such as a mobile phone or a smart speaker. The embodiments of the present invention do not limit this. The system includes:

a human body vibration sensor 110 (bone sensor), for sensing the user's body vibration;

a processing circuit 120, coupled to the human body vibration sensor 110, for controlling a sound pickup device to start picking up sound when it is determined that the output signal of the human body vibration sensor includes a user voice signal;

a communication module 130, coupled to the processing circuit 120 and the sound pickup device 140, for communication between the processing circuit 120 and the sound pickup device 140;

a sound pickup device 140, for picking up sound.

As shown in FIG. 1A, the system for recognizing a user's voice can be used together with an electronic device 200 (such as a mobile phone, a tablet computer, a smart speaker, or a robot). A voice recognition system is installed on the electronic device 200; it can recognize the user's voice and process it further, for example by treating it as a voice command that controls the electronic device 200 to perform a corresponding operation.

The human body vibration sensor 110 is a sensor for sensing human body vibration. Many kinds of sensors can serve as the human body vibration sensor 110, for example a bone conduction sensor. A bone conduction sensor is a device that senses the vibration of bone and converts it into an electrical, optical, or other signal. As a medium, bone can transmit sound waves just as air can, and sound waves propagating through bone cause the bone to vibrate. There are many types of bone conduction sensor, and the embodiments of the present invention may use an existing one, such as the 13x2 sensor from Sonion. Preferably, a bone conduction sensor with a sampling bandwidth of 2 kHz or more is chosen, and the sensitivity can reach -34 dB. The bone conduction sensor can be mounted close to a bone of the human body. As shown in FIG. 1A, it can be mounted on the earbud of an earphone: when the user wears the earphone, the earbud extends into the ear canal, and the sensor can detect the bone vibration transmitted from the ear canal. The sensor may of course also be mounted elsewhere, which the embodiments of the present invention do not limit. In the course of realizing the present invention, the inventors found that because sound propagates faster in bone than in air, a bone conduction sensor senses the vibration produced by the user's speech earlier, so the system can determine promptly that the user is speaking and keep the delay in turning on the microphone as small as possible.

In addition, except for very strong ambient noise, ordinary ambient noise can hardly produce strong vibration in bone, so compared with an ordinary microphone, the signal detected by a bone vibration sensor is a purer voice signal, as shown in FIG. 2A and FIG. 2B. FIG. 2A is a sound signal recorded by an ordinary microphone, and it contains a large amount of ambient noise, such as many high-frequency components. FIG. 2B is a sound signal recorded by a bone conduction sensor: compared with the microphone, the signal recorded by the bone conduction sensor is "cleaner", and the various high-frequency noise signals are visibly absent. A bone vibration sensor is therefore less susceptible to interference from ambient noise.

Of course, in the embodiments of the present invention the human body vibration sensor may also be another kind of sensor, for example an acceleration sensor attached to the skin, which senses the vibration of the skin; or a bioelectric sensor connected to the body, such as various electrodes, which senses bioelectric changes in the body and thereby detects the bioelectric changes caused by body vibration. The embodiments of the present invention do not limit this.

The processing circuit 120 may be any circuit with a processing function, such as a central processing unit (CPU), a digital signal processor (DSP), or a dedicated processor. In one embodiment, the processor is a DSP, such as the DA14195 DSP from Dialog Semiconductor. The processing circuit 120 may be integrated on one chip, distributed over several chips, or built entirely from discrete circuit elements. The processing circuit 120 is responsible for processing the signal detected by the bone conduction sensor and for controlling the entire system.

The communication module 130 is used by the processing circuit 120 to communicate with devices inside and/or outside the system, in particular with the sound pickup device 140. For example, when the sound pickup device 140 is located on another device (for instance, the processor 120 is on an earphone and the microphone 140 is on a mobile phone), the processor and the microphone can communicate through the communication module 130. The communication module 130 may be a wired communication module or a wireless communication module, such as a wireless fidelity (WiFi) module, a Bluetooth module, or a near field communication (NFC) module, which the embodiments of the present invention do not limit.

In some embodiments, the communication module 130 may also be a simple wire that carries signals between the processing circuit 120 and devices inside and/or outside the system.

The sound pickup device 140 is a device for picking up sound, such as a microphone.

In some embodiments, the sound pickup device may not be part of the system provided by the present invention.

In some embodiments of the present invention, the human body vibration sensor passes the detected body vibration signal to the processing circuit, and when the processing circuit determines that the body vibration signal is vibration caused by the user speaking, it controls the sound pickup device to pick up sound. With this flow, the voice recognition system can start picking up sound only when the user speaks, which reduces the cases in which it mistakenly recognizes other people's voice commands. In some embodiments, the sound pickup device is turned on only when the user speaks, which reduces the power consumption of the system.

The following describes in detail how the system provided by the embodiments of the present invention discriminates the user's voice.

As shown in FIG. 3, in some embodiments of the present invention the human body vibration sensor 110 is specifically a bone conduction sensor 110. The bone conduction sensor 110 is coupled to an amplifier 310, the amplifier 310 to an analog-to-digital converter 320, the analog-to-digital converter 320 to a high-pass filter 330, and the high-pass filter 330 to a history buffer 340. The output of the history buffer 340 is coupled to an ambient noise detector 350, which is coupled to a voice analyzer (envelope detector) 390. In some embodiments, the output of the history buffer is coupled to an automatic gain controller 370, which is coupled to a false alarm filter (false alert filter) 380, which in turn is coupled to the voice analyzer 390.

The processing of the signal is described below by following the signal flow. The signal produced by the bone conduction sensor 110 is amplified by the amplifier 310 and passed to the analog-to-digital converter 320, which converts the analog signal produced by the bone conduction sensor 110 into a digital signal and passes it to the high-pass filter 330. The high-pass filter 330 removes the DC component and low-frequency noise; after this filtering, the high-frequency signal is extracted and enters the history buffer 340. The history buffer 340 buffers the signal from a certain period before the current moment, so that subsequent stages only need to process the signal within that period. This buffering in effect divides the signal into frames: the signal is cut into short segments, and processing is carried out segment by segment. In some embodiments, the history buffer 340 buffers the signal from the preceding 2 milliseconds. In the course of realizing the present invention, the inventors found that this value does well at ensuring the microphone starts in time, for example within 50 milliseconds after the user begins to speak.
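The filtering and framing stages described above can be illustrated with a minimal sketch. It assumes a first-order high-pass filter and a fixed-length sample queue; the sample rate (8000 Hz) and cutoff (100 Hz) used in the usage note are illustrative assumptions, and the names `high_pass` and `HistoryBuffer` are hypothetical. Only the 2-millisecond window comes from the description.

```python
import math
from collections import deque


def high_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order high-pass filter: removes DC and attenuates low frequencies."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    filtered = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Standard discrete RC high-pass recurrence.
        y = alpha * (prev_out + x - prev_in)
        filtered.append(y)
        prev_in, prev_out = x, y
    return filtered


class HistoryBuffer:
    """Holds only the most recent `window_ms` milliseconds of samples."""

    def __init__(self, sample_rate_hz, window_ms):
        self.capacity = max(1, int(sample_rate_hz * window_ms / 1000))
        self._samples = deque(maxlen=self.capacity)

    def push(self, sample):
        self._samples.append(sample)  # the oldest sample is dropped when full

    def frame(self):
        """Return the buffered samples as one frame for later stages."""
        return list(self._samples)
```

With an assumed 8000 Hz sample rate, `HistoryBuffer(8000, 2)` holds 16 samples, and a constant (DC) input passed through `high_pass(..., 8000, 100)` decays toward zero.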

In some embodiments, the signal output by the history buffer 340 enters the ambient noise detector 350. The ambient noise detector 350 detects the ambient noise in the signal and filters it out. Alternatively, the ambient noise detector 350 may leave the ambient noise in place and simply pass information about the ambient noise signal to subsequent stages. Here, ambient noise means noise in the surrounding environment, such as the voices of other people near the user or sounds made by other objects. In general, bone-conducted vibration is mostly caused by vibration sources inside the body, such as speaking, heartbeat, chewing, and walking. At low volume, external noise does not easily cause detectable vibration in the bones, but when the environment is very noisy (for example, when external noise reaches 80 dB), external noise may also cause detectable vibration in the bones and thus interfere with the voice signal.

The ambient noise detector 350 can reduce noise in many ways. Conventional technology offers many techniques for microphone noise reduction, such as dual-microphone noise reduction. In that technique, two microphones placed some distance apart each pick up sound, one close to the voice source and the other far from it, so the former picks up more of the voice signal and the latter picks up more of the noise; comparing the two allows the noise to be filtered out. In some embodiments of the present invention, a microphone for picking up noise, or a bone conduction sensor for picking up noise, can be placed away from the bone conduction sensor 110, so that dual-microphone noise reduction can be used to filter out ambient noise.
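The comparison step of the two-sensor arrangement can be sketched as follows. This is only a frame-level energy comparison between a primary channel (near the voice source) and a reference channel (far from it), not a full dual-microphone noise canceller; the function names and the 6 dB margin are assumptions made for illustration.

```python
import math


def mean_energy(frame):
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def is_user_side_signal(primary_frame, reference_frame, min_ratio_db=6.0):
    """Return True when the primary channel is clearly stronger than the reference.

    A frame dominated by ambient noise reaches both channels at similar levels,
    while the user's own speech is much stronger in the primary channel.
    """
    primary = mean_energy(primary_frame)
    reference = mean_energy(reference_frame)
    if primary == 0.0:
        return False
    if reference == 0.0:
        return True
    ratio_db = 10.0 * math.log10(primary / reference)
    return ratio_db >= min_ratio_db
```

A frame in which both channels carry the same level yields a ratio of 0 dB and is treated as ambient noise under this assumed margin.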

Another, lower-cost implementation filters out ambient noise by analyzing its characteristics, for example by detecting the spectrum, intensity, and duration of the ambient noise and whether it is stationary or non-stationary. Generally, the fundamental frequency of human speech lies roughly in the range of 500-1000 Hz, and the upper limit of its harmonics is about 3000 Hz; higher harmonics are unnecessary for speech processing and can be treated as noise. In terms of duration, ambient noise is usually continuous: if the bone conduction sensor is kept on, it receives the ambient noise signal without interruption, whereas the voice signal produced by the user speaking is intermittent, because the user is not always speaking. Optionally, a model of the ambient noise can be used for pattern recognition to separate the noise from the signal. The noise signal can also be separated by treating every signal other than the voice signal as ambient noise. In some embodiments, the system further includes a threshold selector 355, coupled to the ambient noise detector 350 and the training module 360, for selecting the "threshold" used to judge noise, for example the critical value of a model parameter at which a signal is judged to be noise.
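The observation that ambient noise is continuous while speech is intermittent suggests tracking a slowly varying noise floor and flagging frames that rise well above it. The sketch below is one assumed way to do this and is not taken from the patent; `NoiseFloorTracker`, `margin`, and `smoothing` are hypothetical names and values, with `margin` standing in for the kind of threshold a threshold selector might choose.

```python
def frame_energy(frame):
    """Mean squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


class NoiseFloorTracker:
    """Tracks a slowly varying noise floor and flags frames well above it."""

    def __init__(self, margin=4.0, smoothing=0.95):
        self.margin = margin        # assumed threshold: energy ratio over the floor
        self.smoothing = smoothing  # closer to 1.0 means a slower-moving floor
        self.floor = None

    def is_above_floor(self, frame):
        energy = frame_energy(frame)
        if self.floor is None:
            # The first frame initializes the floor and is treated as noise.
            self.floor = energy
            return False
        above = energy > self.margin * self.floor
        if not above:
            # Only frames judged to be noise update the floor estimate.
            self.floor = self.smoothing * self.floor + (1.0 - self.smoothing) * energy
        return above
```

Frames flagged by `is_above_floor` are candidates for speech only; deciding whether they actually contain the user's voice is left to the later analysis stages.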

In a specific embodiment of the present invention, the ambient noise signal is greatly weakened after noise reduction by the ambient noise detector 350, as shown in FIG. 4. In that figure, the first row is the time-domain plot of the signal before noise reduction, the third row is the spectrogram of the signal before noise reduction, the second row is the time-domain plot after noise reduction, and the fourth row is the spectrogram after noise reduction. The signals inside the two rectangular boxes were recorded while the user was speaking. After noise reduction, the noise between the two utterances is removed, the noise during speech is greatly weakened, and the voice signal is relatively enhanced.

In some embodiments, the voice analyzer 390 analyzes the envelope of the signal to identify whether the signal is the user's voice. As those skilled in the art know, the waveform envelope of human speech differs from that of noise, and in general the speech waveform envelopes of different people also differ. The voice analyzer 390 can identify whether the signal is speech or noise from its envelope.
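As a minimal sketch of how an envelope might be extracted before it is classified, the following uses a common attack/release envelope follower. The coefficients are illustrative assumptions, and the classification of the resulting envelope as speech or noise, which the description leaves open, is not shown.

```python
def envelope(samples, attack=0.5, release=0.01):
    """Attack/release envelope follower over the rectified signal.

    The level rises quickly when the magnitude exceeds it (attack) and
    decays slowly otherwise (release), tracing the outline of the waveform.
    """
    levels = []
    level = 0.0
    for s in samples:
        magnitude = abs(s)
        coeff = attack if magnitude > level else release
        level += coeff * (magnitude - level)
        levels.append(level)
    return levels
```

For a burst of full-scale samples surrounded by silence, the envelope rises close to 1.0 during the burst and then falls off gradually instead of dropping to zero at once.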

In other embodiments, the voice analyzer 390 analyzes the frequency spectrum of the signal to identify whether the signal is the user's voice. As those skilled in the art know, the spectrum of human speech differs from the spectrum of noise. As shown in FIG. 5, the vowels and voiced consonants of human language have characteristic formants in the frequency domain, and by recognizing the pattern of these formants it is possible to determine whether the signal is human speech or noise.
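One lightweight way to look for spectral peaks on a small processor is to measure signal power at a few candidate frequencies with the Goertzel algorithm. The sketch below shows that measurement only; it is an assumed illustration, not the patent's formant recognizer, and the candidate frequencies in the usage note are placeholders rather than formant values from the description.

```python
import math


def goertzel_power(frame, sample_rate_hz, freq_hz):
    """Power of `frame` at `freq_hz`, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = 0.0
    s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def strongest_frequency(frame, sample_rate_hz, candidates_hz):
    """Return the candidate frequency at which the frame has the most power."""
    return max(candidates_hz, key=lambda f: goertzel_power(frame, sample_rate_hz, f))
```

For example, `strongest_frequency(frame, 8000, [200, 700, 1500, 3000])` picks out which of the four assumed frequencies dominates a frame sampled at an assumed 8000 Hz.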

There are many ways to identify whether a signal is a voice signal or to identify whose voice it is. In some embodiments, a model trained on the user can be used to pattern-match the signal and decide whether it is the user's voice. The pattern matching can use various techniques from the field of speech recognition, such as hidden Markov models, which are not detailed here.

As described above, when the voice analyzer 390 determines that the signal from the bone vibration sensor includes the user's voice, the processing circuit 120 generates a control signal that controls the sound pickup device 140 to start picking up sound.
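The control step can be sketched as a small gate that starts the sound pickup device when the analyzer reports user speech. The callbacks `start_pickup` and `stop_pickup` are hypothetical stand-ins for whatever control signal the communication module carries, and switching the device off again after a number of silent frames (`hangover_frames`) is an assumption added here, not something the description specifies.

```python
class PickupGate:
    """Starts pickup on detected user speech; stops after sustained silence."""

    def __init__(self, start_pickup, stop_pickup, hangover_frames=25):
        self._start = start_pickup        # hypothetical callback: turn pickup on
        self._stop = stop_pickup          # hypothetical callback: turn pickup off
        self._hangover = hangover_frames  # assumed silent frames before stopping
        self._silent = 0
        self.active = False

    def update(self, user_speech_detected):
        """Feed one analyzer decision per frame; returns whether pickup is on."""
        if user_speech_detected:
            self._silent = 0
            if not self.active:
                self.active = True
                self._start()
        elif self.active:
            self._silent += 1
            if self._silent >= self._hangover:
                self.active = False
                self._stop()
        return self.active
```

While no user speech is detected and the gate has never been activated, `update` leaves the sound pickup device untouched, matching the case in which the device is normally off.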

In some embodiments, the signal output by the history buffer 340 enters the false alarm filter 380, which filters out noise from inside the human body, such as noise produced by chewing or walking. The signal then enters the voice analyzer 390, whose processing has been described above and is not repeated here.

Noise signals from inside the human body can be filtered based on characteristics such as spectrum, intensity, and period. For example, the signal characteristics of chewing are shown in FIG. 6; a model of the chewing signal can be built from them, and chewing noise can be identified by pattern matching.
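As one assumed illustration of using the "period" characteristic, the sketch below measures how strongly a sequence of per-frame energies repeats at a given lag using normalized autocorrelation. It is not the chewing model of the description or of the cited reference; the function name and the choice of lag are hypothetical.

```python
def periodicity(values, lag):
    """Normalized autocorrelation of `values` at `lag`, in the range [-1, 1].

    Values near 1 mean the sequence repeats with that period; values near 0
    or below mean it does not.
    """
    n = len(values)
    if lag <= 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    if denom == 0.0:
        return 0.0  # a constant sequence has no periodic structure
    num = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return num / denom
```

A sequence of energy bursts recurring every four frames scores high at lag 4 and negative at lag 2; a filter could treat a strongly periodic burst pattern as a hint of chewing or walking rather than speech.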

For the characteristics of chewing noise and methods of filtering it out, see for example "Mastication noise reduction method for fully implantable hearing aid using piezo-electric sensor" (Sung Dae Na et al., Technology and Health Care 25 (2017) S29–S34).

As shown in FIG. 3, an automatic gain controller (AGC) 370 can be placed before the false alarm filter 380. The AGC 370 adjusts the strength of the signal, normalizing signals of different strengths to a standard strength. Differences in how loudly the user speaks, where the bone conduction sensor contacts the body, and so on cause the bone conduction sensor to output signals of different strengths, so the strength can be normalized to simplify subsequent processing.
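The normalization performed by the AGC can be sketched as scaling each frame to a target RMS level. The target level and the gain cap below are illustrative assumptions, and a practical AGC would typically smooth the gain across frames rather than recompute it independently for each one.

```python
import math


def rms(frame):
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def normalize_level(frame, target_rms=0.1, max_gain=100.0):
    """Scale a frame so that its RMS level matches `target_rms`.

    The gain is capped so that near-silent frames are not amplified without limit.
    """
    level = rms(frame)
    if level == 0.0:
        return list(frame)
    gain = min(target_rms / level, max_gain)
    return [s * gain for s in frame]
```

With these assumed defaults, a quiet frame and a loud frame both leave `normalize_level` at the same RMS level, which is the property the later stages rely on.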

作为一种更优的实施例，如图3所示，可以在系统中设置训练模块360，该模块与环境噪声检测器350和/或语音分析器390相耦合，用于对环境噪声和/或用户语音的模型进行训练。通过训练，可以提升上述模型的准确性，从而提高对环境噪声和/或用户语音的判断精度。可选的，训练模块还可以与虚警过滤器380相耦合，用于训练人体内部噪声的模型。对模型的训练可以使用机器学习的方法，例如通过神经网络进行训练。As a more preferred embodiment, as shown in FIG. 3, a training module 360 may be set in the system, and the module is coupled with the environmental noise detector 350 and/or the speech analyzer 390, and is used for detecting environmental noise and/or The model of the user's speech is trained. Through training, the accuracy of the above model can be improved, thereby improving the judgment accuracy of environmental noise and/or user speech. Optionally, the training module can also be coupled with the false alarm filter 380 for training a model of the internal noise of the human body. The training of the model can use machine learning methods, such as training through neural networks.

Optionally, the system further includes a memory 365, coupled to the training module 360 and the speech analyzer 390, for storing the models generated by training.

Those skilled in the art will understand that not every part of the above system is required. For example, if the output signal of the bone conduction sensor 110 is strong enough for subsequent signal processing, the amplifier 310 is not needed. If the requirement for filtering out environmental noise is not strict (as noted above, a bone conduction sensor generally picks up little external noise except in very noisy environments), the environmental noise detector 350 is not needed either.

The amplifier, analog-to-digital converter, high-pass filter, history buffer, environmental noise detector, threshold selector, automatic gain controller, false alarm filter, training module, and speech analyzer in the above system may each be a discrete device, or some or all of them may be integrated into one or more chips. The environmental noise detector, threshold selector, automatic gain controller, false alarm filter, training module, and speech analyzer may each be a hardware circuit or a software module running on the processing circuit 120. In a specific embodiment, the analog-to-digital converter, high-pass filter, history buffer, environmental noise detector, threshold selector, automatic gain controller, false alarm filter, training module, and speech analyzer are all located in a single DSP. Optionally, the environmental noise detector, threshold selector, automatic gain controller, false alarm filter, training module, and speech analyzer are all software modules running on that DSP.

As shown in FIG. 7, another embodiment of the present invention provides a method for recognizing a user's voice to control an electronic device, including:

710. Detect human body vibration;

The method for detecting human body vibration has been described above and is not repeated here. In some embodiments, bone vibration is detected by a bone vibration sensor.

750. When it is determined that the human body vibration includes vibration caused by the user speaking, control the sound pickup device to start picking up sound.

In some embodiments, the sound pickup device is normally turned off, and the method further includes:

760. When it is determined that the human body vibration does not include vibration caused by the user speaking, keep the sound pickup device turned off.

The specific method for determining whether the human body vibration includes vibration caused by the user speaking was described in detail above in connection with the speech analyzer 390 and is not repeated here.

As mentioned above, in some embodiments, after step 710 and before step 750, the method further includes:

720. Filter out noise in the human body vibration;

The method for filtering out noise has been described above and is not repeated here.
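
The flow of steps 710, 720, 750, and 760 can be sketched as follows. This is only an illustration under stated assumptions: `PickupDevice`, `handle_vibration`, `remove_dc`, and `energy_above` are hypothetical names, the mean-removal step merely stands in for the noise filtering of step 720, and the energy threshold merely stands in for the model-based decision made by the speech analyzer 390.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Frame = List[float]


@dataclass
class PickupDevice:
    """Hypothetical stand-in for the sound pickup device (e.g. a microphone)."""
    active: bool = False  # normally off, as in the embodiments above

    def start(self) -> None:
        self.active = True


def remove_dc(frame: Frame) -> Frame:
    """Placeholder for step 720: subtract the mean to drop a constant offset."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def energy_above(threshold: float) -> Callable[[Frame], bool]:
    """Placeholder speech test: mean energy of the frame exceeds a threshold."""
    def check(frame: Frame) -> bool:
        return sum(x * x for x in frame) / len(frame) > threshold
    return check


def handle_vibration(
    frame: Frame,
    pickup: PickupDevice,
    contains_speech: Callable[[Frame], bool],
    denoise: Optional[Callable[[Frame], Frame]] = None,
) -> bool:
    """Apply steps 720 (optional), 750 and 760 to one detected frame (step 710)."""
    if denoise is not None:       # 720: filter out noise in the vibration
        frame = denoise(frame)
    if contains_speech(frame):    # 750: speech found, start sound pickup
        pickup.start()
    # 760: otherwise the pickup device is left as it is (off by default)
    return pickup.active
```

In this sketch, a frame that contains only a constant offset leaves the pickup device off once `remove_dc` has been applied, while a frame with alternating samples of sufficient amplitude starts it. The optional `denoise` argument reflects the fact that step 720 is present only in some embodiments.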

In addition, the technologies, systems, devices, and methods described in the above embodiments, and the technical features described in each embodiment, may be combined to form other modules, methods, devices, systems, and technologies that do not depart from the spirit and principles of the present invention. Such modules, methods, devices, systems, and technologies, formed by combining what is described in the embodiments of the present invention, all fall within the protection scope of the present invention.

Those skilled in the art will understand that the units or steps of the present invention described above can be implemented with general-purpose computing devices, and can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be fabricated as a separate circuit module, or several of the units or steps can be fabricated as a single circuit module. The present invention is therefore not limited to any particular combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.