CN118264946A - Sound signal processing method and mobile device - Google Patents


Info

Publication number
CN118264946A
CN118264946A
Authority
CN
China
Prior art keywords
target
sound
signal
algorithm
microphone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211693374.7A
Other languages
Chinese (zh)
Inventor
杜博仁
张嘉仁
曾凯盟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc
Priority to CN202211693374.7A
Publication of CN118264946A
Status: Pending

Abstract

Translated from Chinese

The present invention provides a sound signal processing method and a mobile device. In the method, a target direction among multiple sound-receiving directions, and a target distance corresponding to that direction, are determined from multiple first sound signals received by a built-in microphone in those directions. A target algorithm is selected from multiple blind signal separation algorithms based on the target direction and the target distance. The first sound signal received by the built-in microphone in the target direction is set as the secondary signal of the target algorithm, the second sound signal received by an external microphone is set as the primary signal, and the sound signal of the main sound source is separated from the primary and secondary signals by the target algorithm. In this way, the microphone path outputs only the single sound signal of the main sound source.

Description

Sound signal processing method and mobile device

Technical Field

The present invention relates to signal processing technology, and more particularly, to a sound signal processing method and a mobile device.

Background

Laptop computers generally include noise-reduction mechanisms in the microphone transmission path for conferencing applications. Examples include steady-state noise reduction for a single microphone, and beamforming for a microphone array to adjust the beam's sound-receiving direction (the beam angle cannot be too narrow, to accommodate user movement). Back-end artificial intelligence (AI) noise reduction may also be used to preserve the voice signal.

For example, FIG. 1A and FIG. 1B are schematic diagrams illustrating an example of a three-dimensional microphone array based on AI noise reduction processing. FIG. 1A and FIG. 1B show laptops equipped with two and three microphones (mic), respectively. Adding microphones increases the directivity of the beam and helps suppress other people's sound signals.

In practice, when other people are talking near the user, their voice signals are often not filtered out and may even be transmitted along the microphone path together with the user's voice. In addition, when the user moves out of the direction covered by the microphone array, the received sound signal is also affected.

On the other hand, most users in meetings use an external microphone (for example, a headset microphone). However, some external microphones are omnidirectional and pick up surrounding sounds, which degrades the noise-reduction effect.

Summary of the Invention

The present invention is directed to a sound signal processing method and a mobile device that use blind signal separation (BSS) technology to improve noise reduction.

According to an embodiment of the present invention, a sound signal processing method is applicable to a mobile device and an external microphone, where the mobile device is communicatively connected to the external microphone and includes a built-in microphone. The method includes (but is not limited to) the following steps. A target direction among multiple sound-receiving directions, and a target distance corresponding to the target direction, are determined from multiple first sound signals received by the built-in microphone in those directions. The main sound source is located in the target direction at the target distance from the built-in microphone; the target direction is determined from the correlation between the first sound signals and a second sound signal received by the external microphone, and the target distance is determined from the signal power of the first sound signal in the target direction. A target algorithm is selected from multiple blind signal separation (BSS) algorithms according to the target direction and the target distance; the selection is based on the angle between the target direction and an interference-source direction (corresponding to an interfering sound source) and on the magnitude of the target distance. The first sound signal received by the built-in microphone in the target direction is set as the secondary signal of the target algorithm, the second sound signal received by the external microphone is set as the primary signal, and the sound signal of the main sound source is separated from the primary and secondary signals by the target algorithm.

According to an embodiment of the present invention, a mobile device includes (but is not limited to) a built-in microphone, a communication transceiver, and a processor. The built-in microphone receives sound. The communication transceiver is communicatively connected to an external microphone and receives signals from it. The processor, coupled to the built-in microphone and the communication transceiver, is configured to: determine a target direction among multiple sound-receiving directions, and a target distance corresponding to the target direction, from multiple first sound signals received by the built-in microphone in those directions; select a target algorithm from multiple blind signal separation algorithms according to the target direction and the target distance; set the first sound signal received by the built-in microphone in the target direction as the secondary signal of the target algorithm and the second sound signal received by the external microphone as the primary signal; and separate the sound signal of the main sound source from the primary and secondary signals by the target algorithm. The main sound source is located in the target direction at the target distance from the built-in microphone; the target direction is determined from the correlation between the first sound signals and the second sound signal, the target distance is determined from the signal power of the first sound signal in the target direction, and the target algorithm is determined from the angle between the target direction and the interference-source direction and from the magnitude of the target distance.

Based on the above, the sound signal processing method and mobile device according to embodiments of the present invention can separate the sound signal of the main sound source from mixed signals (e.g., the first and second sound signals) using a target algorithm chosen according to the location of the main sound source. Therefore, when the user uses an external microphone, only the single voice signal of the main user is transmitted along the microphone path.

Brief Description of the Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain its principles.

FIG. 1A and FIG. 1B are schematic diagrams illustrating an example of a three-dimensional microphone array based on AI noise reduction processing;

FIG. 2 is a block diagram of a mobile device and an external microphone according to an embodiment of the present invention;

FIG. 3 is a flow chart of a sound signal processing method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of locating a main sound source according to an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating blind signal separation according to an embodiment of the present invention;

FIG. 6A to FIG. 6D are schematic diagrams of sparse component analysis according to an embodiment of the present invention.

Description of Reference Numerals

mic: microphone;

10: mobile device;

11: built-in microphone;

12: communication transceiver;

13: memory;

14: processor;

15: external microphone;

S310~S330: steps;

S1, S2: users;

θ1, θ2: sound-receiving directions;

v1, v2: first sound signals;

R1: first correlation;

R2: second correlation;

X1: second sound signal;

Px, Pv: signal power;

s1, s2, y1, y2: sound signals;

A: spatial transfer function matrix;

W: inverse transfer function matrix;

x1, x2: mixed signals;

E1, E2: time-frequency domain signals;

W1, W2: direction vectors;

t: time.

Detailed Description of the Embodiments

Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the description to refer to the same or similar parts.

FIG. 2 is a block diagram of the mobile device 10 and the external microphone 15 according to an embodiment of the present invention. Referring to FIG. 2, the mobile device 10 includes (but is not limited to) a built-in microphone 11, a communication transceiver 12, a memory 13, and a processor 14. The mobile device 10 may be a laptop, smartphone, tablet computer, desktop computer, smart TV, smart speaker, smart assistant, in-vehicle system, or other electronic device.

The built-in microphone 11 may be a dynamic, condenser, or electret condenser microphone, or any other combination of electronic components, analog-to-digital converters, filters, and audio processors capable of receiving sound waves (e.g., human voices, ambient sounds, or machine operation sounds) and converting them into sound signals (i.e., picking up or recording sound). The built-in microphone 11 is integrated into the body of the mobile device 10. In one embodiment, two or more built-in microphones 11 form a microphone array that provides a directional beam. In one embodiment, the built-in microphone 11 picks up/records a speaker's voice to obtain a voice signal. In some embodiments, this voice signal may include the speaker's voice, sound emitted by a loudspeaker (not shown), and/or other ambient sounds.

The communication transceiver 12 may support Bluetooth, Universal Serial Bus (USB), optical fiber, S/PDIF, 3.5 mm, or other audio transmission interfaces. In one embodiment, the communication transceiver 12 receives (sound) signals from the external microphone 15.

The memory 13 may be any type of fixed or removable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid-state drive (SSD), or similar component. In one embodiment, the memory 13 stores program code, software modules, configurations, data (e.g., sound signals, algorithm parameters), or files, embodiments of which are detailed later.

The processor 14 is coupled to the built-in microphone 11, the communication transceiver 12, and the memory 13. The processor 14 may be a central processing unit (CPU), a graphics processing unit (GPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), neural-network accelerator, or a similar component or combination of the above. In one embodiment, the processor 14 executes all or part of the operations of the mobile device 10 and can load and execute the program code, software modules, files, and data stored in the memory 13. In some embodiments, the functions of the processor 14 may be implemented in software or in a chip.

The external microphone 15 may likewise be a dynamic, condenser, or electret condenser microphone, or any other combination of electronic components, analog-to-digital converters, filters, and audio processors capable of receiving sound waves and converting them into sound signals. The external microphone 15 may be omnidirectional or directional. In one embodiment, the external microphone 15 is a headset microphone or the microphone of a wearable device. In one embodiment, the external microphone 15 picks up/records a speaker's voice to obtain a voice signal. In some embodiments, this voice signal may include the speaker's voice, sound emitted by a loudspeaker (not shown), and/or other ambient sounds.

Hereinafter, the method described in the embodiments of the present invention is explained with reference to the components and modules of the mobile device 10 and the external microphone 15. Each step of the method may be adjusted according to the implementation and is not limited to the order described.

FIG. 3 is a flow chart of a sound signal processing method according to an embodiment of the present invention. Referring to FIG. 3, the processor 14 determines a target direction among multiple sound-receiving directions, and a target distance corresponding to the target direction, from multiple first sound signals received by the built-in microphone 11 in those directions (step S310). Specifically, the main sound source is located in the target direction at the target distance from the built-in microphone 11. The main sound source may be a person, another animal, a machine, or a loudspeaker. For example, FIG. 4 is a schematic diagram of locating the main sound source according to an embodiment of the present invention. Referring to FIG. 4, assume the main sound source is the user S1 of the mobile device 10, who wears/uses the external microphone 15, while another user S2 does not wear/use the external microphone 15.

There are many ways to determine the sound-receiving directions. In one embodiment, the processor 14 forms beams in multiple sound-receiving directions (or pointing angles) with the built-in microphones 11, such as the beams in directions θ1 and θ2 shown in FIG. 4. The built-in microphones 11 can form beams using beamforming: by adjusting the parameters (e.g., phase and amplitude) of the elements of a phased array, signals from certain angles interfere constructively while signals from other angles interfere destructively. Different parameters therefore form different beam patterns, whose main beams may point in different directions. The processor 14 may predefine the sound-receiving directions or generate them based on user input, for example one direction every 10° from -90° to 90°.
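The beam steering described above can be sketched with a simple delay-and-sum beamformer. This is a minimal illustration, not the patent's implementation: the array geometry, sampling rate, and function names are assumptions.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_deg, fs, c=343.0):
    """Steer a linear microphone array toward angle_deg by delaying each
    channel (fractional delays applied as FFT phase shifts) and summing."""
    angle = np.deg2rad(angle_deg)
    delays = mic_positions * np.sin(angle) / c      # far-field arrival offsets (s)
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for channel, tau in zip(signals, delays):
        # Apply the steering delay in the frequency domain, then sum.
        out += np.fft.irfft(np.fft.rfft(channel) * np.exp(-2j * np.pi * freqs * tau), n)
    return out / len(signals)

# Candidate sound-receiving directions, e.g. every 10 degrees from -90 to 90.
directions = np.arange(-90, 91, 10)
```

Steering the beam toward the true arrival direction sums the channels coherently; steering elsewhere partially cancels them, which is what makes per-direction beam signals usable for the comparisons in the following steps.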

In one embodiment, the target direction is determined from the correlation between the first sound signals and the second sound signal received by the external microphone 15. For example, the processor 14 computes the orthogonal cross-correlation between each first sound signal and the second sound signal. If a particular first sound signal has the largest correlation with the second sound signal, the processor 14 sets that signal's sound-receiving direction as the target direction.

Taking FIG. 4 as an example, the processor 14 picks one of the first sound signals as the initial evaluation signal according to an initial direction, in sequence, or at random: for example, the first sound signal v1 in direction θ1. The processor 14 then compares the first correlation R1, between the current candidate signal and the second sound signal X1, with the second correlation R2, between the evaluation signal (e.g., the first sound signal v2 in direction θ2) and the second sound signal X1. In response to the first correlation R1 being greater than the second correlation R2, the processor 14 keeps the candidate signal as the candidate for the target direction and continues comparing the remaining first sound signals. Once all first sound signals have been compared, the processor 14 takes the sound-receiving direction of the final candidate signal as the target direction.

On the other hand, in response to the first correlation R1 not being greater than the second correlation R2, the processor 14 makes the evaluation signal the (new) candidate for the target direction. In this way, the first sound signal with the maximum correlation is found, and its sound-receiving direction is taken as the target direction.

Note that if two or more directions tie for the maximum correlation, the processor 14 may determine a target direction between the corresponding sound-receiving directions using a difference method.
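The pairwise comparisons above amount to picking the beam direction whose signal correlates most strongly with the external microphone. A sketch, using normalized correlation as a stand-in for the patent's orthogonal cross-correlation (the function names and data are assumptions):

```python
import numpy as np

def pick_target_direction(beam_signals, directions, external_signal):
    """Return the sound-receiving direction whose beam signal has the
    largest normalized correlation with the external-microphone signal."""
    def correlation(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return abs(float(np.dot(a, b))) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    scores = [correlation(v, external_signal) for v in beam_signals]
    return directions[int(np.argmax(scores))]
```

The iterative candidate-versus-evaluation comparison in the text converges to the same argmax; computing all scores and taking the maximum is just the batch form of that loop.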

In another embodiment, the direction of the main sound source relative to the mobile device 10 may be estimated using angle-of-arrival (AOA) or direction-of-arrival (DOA) positioning. For example, the processor 14 may determine the direction from the time difference between the sound waves of the main sound source arriving at two built-in microphones 11 and from the distance between those microphones, and set it as the target direction.

On the other hand, the target distance is determined from the signal power of the first sound signal in the target direction: the stronger the signal power, the closer the target; the weaker the signal power, the farther the target. For example, signal power is inversely proportional to the square of the target distance, although it may still be affected by the environment, receiver sensitivity, and other factors.

Taking FIG. 4 as an example, if the processor 14 knows the distance between the main sound source and the external microphone 15, the signal power Px of the second sound signal can serve as a reference. The processor 14 can then determine the target distance from the ratio between Px and the signal power Pv of the first sound signal (e.g., v1) in the target direction (e.g., θ1), together with the relationship between signal power and distance (e.g., path loss or signal attenuation).

As another example, the relationship between signal power and distance may be predefined in a lookup table or conversion formula that the processor 14 loads to estimate the target distance.
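Under the inverse-square relationship mentioned above, the power ratio Px/Pv directly yields a distance estimate. A sketch under a free-field assumption (the function names and the example distances are illustrative, not from the patent):

```python
import numpy as np

def signal_power(x):
    """Mean-square power of a sound signal frame."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))

def estimate_target_distance(p_builtin, p_external, d_external):
    """Free-field assumption: power falls off as 1/d^2, so the built-in
    microphone's distance to the source scales as sqrt(P_external / P_builtin)
    times the known source-to-external-microphone distance."""
    return d_external * (p_external / p_builtin) ** 0.5
```

In practice the mapping would be corrected by a calibration table or path-loss model, as the text notes, since real rooms are not free-field.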

Referring to FIG. 3, the processor 14 selects a target algorithm from multiple blind signal separation (BSS) algorithms according to the target direction and the target distance (step S320). Specifically, in practical scenarios multiple sound sources often occur simultaneously. "Blind" refers to the mixed signal formed by receiving the sound signals of multiple sources; one goal of a blind signal separation algorithm is to separate the sound signal of the main sound source given only the mixed signal.

Blind signal separation algorithms include independent component analysis (ICA) and sparse component analysis (SCA).

Independent component analysis assumes that the sound sources are mutually independent and that mixing does not change the nature of their signals; the estimated inverse transfer function matrix (i.e., the separation matrix) is therefore multiplied by the mixed signal to obtain the separated sound signals.

For example, FIG. 5 is a schematic diagram illustrating blind signal separation according to an embodiment of the present invention. Referring to FIG. 5, the sound signals s1 and s2 of two sound sources pass through the spatial transfer function matrix A to produce the mixed signals x1 and x2 (with x1 as the primary signal and x2 as the secondary signal). Assume the second sound signal received by the external microphone 15 is the mixed signal x1, and the first sound signal received by the built-in microphone 11 is the mixed signal x2. The blind signal separation algorithm separates the two sources' sound signals y1 and y2 through the inverse transfer function matrix W; for example, y1 approximates s1 and y2 approximates s2.
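The FIG. 5 signal flow can be written in a few lines of numpy. As a sketch, W is computed directly from a known mixing matrix A to show the relationship y = W·x = W·A·s; a real BSS algorithm must instead estimate W blindly from x alone. The matrix values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal((2, 1000))        # source signals s1, s2
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # spatial transfer function matrix A
x = A @ s                                 # mixed signals x1 (primary), x2 (secondary)
W = np.linalg.inv(A)                      # inverse transfer function matrix W
y = W @ x                                 # separated signals y1, y2
assert np.allclose(y, s)                  # y1 recovers s1, y2 recovers s2
```

The whole difficulty of blind separation lies in estimating W without access to A, which is what the ICA and SCA techniques below address.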

Sparse component analysis assumes that the sound signals of the sources are sparse in some domain, meaning that most of their values are close to zero, so that each component point of the mixed signal usually contains only one dominant source. For example, a voicegram (or spectrogram) shows how the frequency content of speech varies over time; different people's voices have different characteristics (e.g., fundamental frequency, harmonics, speaking rate, or phrasing), so the spectrograms of different sources overlap very little (they are disjoint). Each time-frequency cell of the mixed signal's spectrogram therefore comes from only one of the sources; this is the sparsity property.

The target algorithm is selected based on the angle between the target direction and the interference-source direction (which corresponds to the interfering sound source) and on the magnitude of the target distance.

Given the Gaussian-distribution property of speech (i.e., the first and second sound signals tend toward a Gaussian distribution), independent component analysis is used to separate the speech signals initially; the objective function specified during the computation (i.e., the target algorithm) changes according to the target direction and target distance of the main sound source relative to the mobile device 10.

Negentropy is a measure of non-Gaussianity. In information theory, the entropy of a random variable is related to its information content, and negentropy can be defined as:

J(y) = H(y_gauss) - H(y) … (1)

where y_gauss is a Gaussian-distributed random variable, y is the random variable corresponding to the primary and secondary signals, and

H(y) = -∫ p_y(τ) log{p_y(τ)} dτ … (2)

where p_y(τ) is the probability density function of the random variable y. Function (1) can be approximated as:

J(y) ≈ [E{G(y)} - E{G(y_gauss)}]^2 … (3)

where E{} is the expectation operator and the function G may be chosen from G1, G2, and G3:

G1(y) = (1/a1) log cosh(a1·y) … (4)

G2(y) = -exp(-y^2/2) … (5)

G3(y) = y^4 … (6)

where a1 is a constant.

In one embodiment, the processor 14 compares the target distance with a distance threshold (e.g., 10 cm, 15 cm, or 30 cm). In response to the target distance not being less than the threshold, the processor 14 sets the target algorithm to the first independent component analysis algorithm, which uses G1; since in typical use the user is not very close to the mobile device 10, G1 is usually adopted. In response to the target distance being less than the threshold, the processor 14 sets the target algorithm to the second independent component analysis algorithm, which uses G2, to obtain higher stability.
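The negentropy approximation (3) and the distance-based choice among G1, G2, and G3 can be sketched as follows. The threshold value and function names are illustrative, and G1 and G2 are written in their standard log-cosh and Gaussian forms as commonly paired with y^4 in negentropy-based ICA; treat those exact forms as an assumption where the source does not spell them out.

```python
import numpy as np

def G1(y, a1=1.0):
    return np.log(np.cosh(a1 * y)) / a1      # general-purpose contrast (eq. 4)

def G2(y):
    return -np.exp(-y ** 2 / 2.0)            # favored for stability (eq. 5)

def G3(y):
    return y ** 4                            # cheapest to compute (eq. 6)

def negentropy(y, G, n_ref=100_000, seed=0):
    """Approximate J(y) = (E[G(y)] - E[G(y_gauss)])^2 for standardized y,
    with the Gaussian expectation estimated from a reference sample."""
    y = (y - y.mean()) / y.std()
    y_gauss = np.random.default_rng(seed).standard_normal(n_ref)
    return float((G(y).mean() - G(y_gauss).mean()) ** 2)

def choose_contrast(target_distance, threshold=0.15, low_compute=False):
    """Mirror the embodiment: G3 under compute limits, G2 when the source
    is closer than the distance threshold, otherwise G1."""
    if low_compute:
        return G3
    return G2 if target_distance < threshold else G1
```

A Gaussian signal yields negentropy near zero under any of these contrasts, while non-Gaussian (e.g., speech-like) signals score higher, which is what lets ICA steer its separation toward maximally non-Gaussian outputs.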

In one embodiment, the processor 14 may evaluate the software and hardware resources of the mobile device 10 and the corresponding computational load. In response to a computational constraint (for example, the access speed or bandwidth of the memory 13, or the processing speed of the processor 14), the processor 14 sets the target algorithm to the third independent component analysis algorithm, which uses the parameter G3; that is, the processor 14 selects the third independent component analysis algorithm using the parameter G3 as the target algorithm, so as to meet the requirement of a small computational load.
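The two selection embodiments above can be condensed into one decision rule. The sketch below is a hypothetical illustration — the function name, the precedence of the computational-load check, and the default threshold are our own assumptions, not stated in the patent:

```python
# Hypothetical selection rule combining the two embodiments:
# a computational-load limit forces G3; otherwise the target distance
# chooses between G1 (far, general-purpose) and G2 (near, more stable).
def choose_contrast(distance_cm: float,
                    distance_threshold_cm: float = 30.0,
                    low_compute: bool = False) -> str:
    """Return which ICA contrast parameter to use as the target algorithm."""
    if low_compute:
        return "G3"   # kurtosis-based: cheapest, for constrained devices
    if distance_cm < distance_threshold_cm:
        return "G2"   # exp-based: higher stability for close sources
    return "G1"       # log-cosh: the usual general-purpose choice

assert choose_contrast(100.0) == "G1"
assert choose_contrast(10.0) == "G2"
assert choose_contrast(100.0, low_compute=True) == "G3"
```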

In addition, based on individual voice characteristics, sparse component analysis can be used to separate speech signals fairly completely. In the embodiments of the present invention, the target algorithm is selected according to the target direction and the target distance of the main sound source relative to the mobile device 10.

For example, FIG. 6A to FIG. 6D are schematic diagrams of sparse component analysis according to an embodiment of the present invention. FIG. 6A shows a scatter plot of the spectrogram margins of the mixed signals (e.g., the first sound signal or the second sound signal), where the time-frequency domain signals E1 and E2 correspond to the mixed signals x1 and x2, respectively; here it is difficult to distinguish the sound signals of different sound sources. FIG. 6B shows the mixed signals x1 and x2 projected into a sparse domain, in which the two uncorrelated signals can be distinguished.

To project the mixed signals x1 and x2 into the sparse domain, the processor 14 may find their two dominant directions (e.g., the target direction and the direction of the interference source). Referring to FIG. 6C (t denotes time), the principal component analysis (PCA) algorithm finds the direction vector W1 that maximizes the expected value, and estimates the target direction and the interference-source direction accordingly. Referring to FIG. 6D, the nonlinear projection column masking (NPCM) algorithm finds the direction vector W2 from the samples whose projection magnitude exceeds a corresponding threshold, and estimates the target direction and the interference-source direction accordingly.
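The PCA step can be sketched numerically. The toy example below is our own construction (the mixing matrix, the sparse Laplace source model, and the dominance of one source are assumptions for illustration): it builds a sparse two-source mixture and recovers W1, the direction maximizing the expected squared projection, as the leading eigenvector of the covariance of the mixtures. When one source dominates, W1 approximately aligns with that source's mixing column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
# Sparse, speech-like sources: mostly zero, occasionally active.
active = rng.random((2, n)) < 0.1
s = rng.laplace(size=(2, n)) * active
s[0] *= 3.0                      # make source 1 clearly dominant
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])       # unknown mixing matrix (columns = directions)
x = A @ s                        # observed mixtures x1, x2

# PCA: W1 maximizes E{(W1·x)^2}, i.e. it is the leading eigenvector
# of the covariance matrix of the observed scatter.
eigvals, eigvecs = np.linalg.eigh(np.cov(x))
w1 = eigvecs[:, np.argmax(eigvals)]

# W1 should point (up to sign) roughly along the dominant mixing column.
a0 = A[:, 0] / np.linalg.norm(A[:, 0])
cos_sim = abs(w1 @ a0)
assert cos_sim > 0.95
```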

In one embodiment, the larger the angle between the target direction and the interference-source direction, the more the directions estimated by the nonlinear projection column masking algorithm may deviate from the actual directions. The processor 14 may compare the angle between the target direction and the interference-source direction with an angle threshold (e.g., 45, 60, or 90 degrees). In response to this angle being greater than the angle threshold, the processor 14 sets the target algorithm to the principal component analysis algorithm; that is, the processor 14 selects the principal component analysis algorithm as the target algorithm. In response to this angle not being greater than the angle threshold, the processor 14 sets the target algorithm to the nonlinear projection column masking algorithm; that is, the processor 14 selects the nonlinear projection column masking algorithm as the target algorithm.
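This angle-based selection can be sketched as follows; the function name, the wrap-around handling, and the 60-degree default are illustrative assumptions rather than the patent's exact procedure:

```python
# Hypothetical rule: a wide angle between the target and interferer
# directions favors PCA; a narrow angle favors NPCM.
def choose_sparse_algorithm(target_dir_deg: float,
                            interferer_dir_deg: float,
                            angle_threshold_deg: float = 60.0) -> str:
    angle = abs(target_dir_deg - interferer_dir_deg) % 360.0
    angle = min(angle, 360.0 - angle)   # smallest angle between the two directions
    return "PCA" if angle > angle_threshold_deg else "NPCM"

assert choose_sparse_algorithm(0.0, 90.0) == "PCA"
assert choose_sparse_algorithm(0.0, 30.0) == "NPCM"
assert choose_sparse_algorithm(350.0, 10.0) == "NPCM"   # wraps around 0 degrees
```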

Referring to FIG. 3, the processor 14 sets the first sound signal received by the built-in microphone 11 in the target direction as the secondary signal of the target algorithm and the second sound signal received by the external microphone 15 as the primary signal of the target algorithm, and separates the sound signal of the main sound source from the primary signal and the secondary signal through the target algorithm (step S330). Specifically, since the external microphone 15 is usually closer to the main sound source, the sound signal of the main sound source is likely to account for a higher proportion of the primary signal, whereas it accounts for a lower proportion of the secondary signal. The blind signal separation may therefore, for example, assign a higher priority to the primary signal and a lower priority to the secondary signal. For an introduction to the blind signal separation algorithms, refer to the description of step S320, which is not repeated here. Finally, the processor 14 can transmit only the sound signal of the main sound source on the sound-receiving path of the microphone, thereby enhancing the sound signal of the main sound source.
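A minimal end-to-end sketch of this separation step is given below. It is an illustration under our own assumptions — the mixing coefficients, Laplace source model, and the FastICA variant (log-cosh contrast, as in G1 above) are not the patent's exact implementation. The primary channel is mostly voice, the secondary channel mostly interference, and the two independent components are recovered from the whitened mixtures:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
speech = rng.laplace(size=n)            # stand-in for the main voice source
noise = rng.laplace(size=n)             # stand-in for ambient interference
# Primary (external mic): mostly voice; secondary (built-in mic): mostly noise.
primary = 1.0 * speech + 0.3 * noise
secondary = 0.2 * speech + 1.0 * noise
x = np.vstack([primary, secondary])

# Whiten the mixtures (zero mean, identity covariance).
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = E @ np.diag(d ** -0.5) @ E.T @ x

# Minimal FastICA with the log-cosh contrast (g = tanh), deflation for 2 units.
W = np.zeros((2, 2))
w_rng = np.random.default_rng(0)
for i in range(2):
    w = w_rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(100):
        wx = np.tanh(w @ z)
        w = (z * wx).mean(axis=1) - (1.0 - wx ** 2).mean() * w
        w -= W[:i].T @ (W[:i] @ w)      # deflate against components already found
        w /= np.linalg.norm(w)
    W[i] = w

recovered = W @ z
# Each recovered component should match one original source (up to sign/scale),
# so the voice source can be passed on alone while the interference is dropped.
corr_speech = max(abs(np.corrcoef(speech, recovered[k])[0, 1]) for k in range(2))
assert corr_speech > 0.9
```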

In summary, in the sound signal processing method and mobile device of the embodiments of the present invention, when an external microphone is used, the sound signal received by the external microphone serves as the primary signal. At the same time, the built-in microphone of the mobile device is enabled, and its sound signal serves as the secondary signal. According to the direction and distance of the main sound source relative to the mobile device, a suitable blind signal separation technique is applied so that only the single sound signal of the main sound source is output on the microphone path, thereby enhancing the sound signal of the main sound source.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

CN202211693374.7A, filed 2022-12-28: Sound signal processing method and mobile device (Pending)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211693374.7A | 2022-12-28 | 2022-12-28 | Sound signal processing method and mobile device


Publications (1)

Publication Number: CN118264946A
Publication Date: 2024-06-28

Family ID: 91607319



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
