Technical Field
The present invention belongs to the technical field of active noise control, and specifically relates to an interactive active noise control system and method based on sound event detection.
Background
Active noise control technology suppresses noise by exploiting destructive interference of sound waves. In daily life and work, however, what counts as noise is a subjective judgment: not every sound is noise. When walking in a park, the distant sound of roadside traffic is noise, while birdsong is a pleasant sound that brings people closer to nature. When walking along a street, a noisy conversation nearby is noise, while the bell of a bicycle behind is a sound that warns of danger. Conversely, during a conversation with someone, that same bicycle bell becomes noise.
Compared with traditional headphones, some current headphones add an active noise cancellation mode and a transparency mode. The active noise cancellation mode is realized through active noise control technology, while the transparency mode equalizes external sound through a hearing equalizer so that the wearer feels as if no headphones are worn. However, one of these modes suppresses all external sound, and the other passes all external sound through to the eardrum; neither is selective, so neither can retain the sounds the user wants to hear while suppressing the unwanted sounds (noise).
Some work has attempted selective retention of sound. For example, in "Comb-partitioned frequency-domain constraint adaptive algorithm for active noise control", only the low-frequency band of the sound is suppressed while the high-frequency band is retained, so that the active noise control system preserves some high-frequency alarm sounds and people can still perceive surrounding danger signals while wearing headphones; however, this scheme can only retain sounds with a single, fixed frequency characteristic. In "Design and Implementation of an Active Noise Control Headphone With Directional Hear-Through Capability", active noise control first suppresses all surrounding sound, and superdirective beamforming then plays the sound from directly ahead through the headphones, suppressing sounds from other directions while retaining one direction. This scheme has two problems: on the one hand, noise may also arrive from the retained direction; on the other hand, because of the limited array size, superdirective beamforming on a headphone cannot achieve satisfactory performance. Active noise control can also be combined with sound separation, but current sound separation methods with good quality have large latency, while low-latency methods cannot achieve satisfactory separation. Most importantly, existing solutions offer no selectivity over sound events and cannot meet users' personalized requirements.
Summary of the Invention
In view of the problem in the background art that existing noise control methods cannot control sound selectively, the purpose of the present invention is to provide an interactive active noise control system and method based on sound event detection. The control system of the present invention comprises a newly constructed conditional sound event detection neural network and a bank of sub-band noise control filters. The control method is implemented on this system: a sound event category is preselected, and the conditional sound event detection neural network outputs a spectrum mask for the preselected category; the spectrum mask adjusts in real time the control signal output by the adjustable sub-band noise control filters, so that the control sound wave emitted by the secondary loudspeaker contains no component of the preselected category. The sound waves of the preselected sound event are therefore not suppressed, while all remaining sound is controlled.
To achieve the above object, the technical solution of the present invention is as follows:
An interactive active noise control system based on sound event detection, comprising a selection port 1, a conditional sound event detection network 2, a reference microphone 3, controllable sub-band filters 4 and a secondary loudspeaker 5.

The selection port 1 is used to select a sound event category and to transmit the category index of the sound event to the conditional sound event detection network 2. The reference microphone 3 transmits the sound wave signal in the environment in real time to the conditional sound event detection network 2 and the controllable sub-band filters 4. The conditional sound event detection network 2 obtains in real time the spectrum mask of the preselected category from the category index and the reference signal, and transmits the spectrum mask to the controllable sub-band filters 4. The controllable sub-band filters 4 output a control signal based on the spectrum mask and the reference signal, so that the control sound wave emitted by the secondary loudspeaker contains no component of the preselected category and therefore does not suppress the sound waves of the preselected sound event. The secondary loudspeaker 5 converts the control signal into a control sound wave that cancels the interfering sound waves at the ear, so that only the sound waves of the user's preselected sound event category remain at the ear.
Further, the conditional sound event detection network 2 comprises a condition feature generation module, a feature extraction module, a local feature analysis module, a feature fusion module, a sequence feature analysis module and an output module.

The condition feature generation module performs a preliminary encoding of the preselected category index to obtain a high-dimensional condition feature, and outputs it to the feature fusion module. The feature extraction module frames and windows the reference signal and applies a feature transform to obtain input features, which are output to the local feature analysis module. The local feature analysis module performs local feature analysis on the input features to obtain high-dimensional local features, which are output to the feature fusion module. The feature fusion module fuses the high-dimensional condition feature with the high-dimensional local features to obtain high-dimensional fused features, which are output to the sequence analysis module. The sequence analysis module performs sequential analysis on the high-dimensional fused features to obtain serialized fused features, which are output to the output module. The output module applies a dimension transform to the serialized fused features and streams, for each frame, the activity state $\hat{p}_m$ and the spectral distribution $\hat{\mathbf{q}}_m$ of the preselected sound event category, and obtains the spectrum mask $\mathbf{M}_m$ from $\hat{p}_m$ and $\hat{\mathbf{q}}_m$, where $m$ is the frame index of a piece of audio.
Further, the loss function of the conditional sound event detection network 2 is:

$\mathcal{L} = \sum_{m=1}^{M}\big[\alpha\,\ell(p_m,\hat{p}_m) + \beta\,\ell(\mathbf{q}_m,\hat{\mathbf{q}}_m)\big]$

where $m$ is the frame index of a piece of audio, $M$ is the total number of frames, $p_m$ is the label of the activity state of the preselected sound event and $\hat{p}_m$ is the predicted activity state, $\mathbf{q}_m$ is the label of the spectral distribution of the preselected sound event and $\hat{\mathbf{q}}_m$ is the predicted spectral distribution, $\ell(\cdot,\cdot)$ denotes a per-frame loss (e.g., the binary cross-entropy), and $\alpha$ and $\beta$ are loss weights with $\alpha+\beta=1$.
Further, the data set used to train the conditional sound event detection network 2 includes preselectable single-event sound data and non-preselectable background noise data. The preselectable single-event sound data are, for example, "laughter", "birdsong", "alarm", "speech" and "music"; the non-preselectable background noise data are, for example, "traffic noise", "engine noise", "pink noise" and "subway noise".
Further, the controllable sub-band filters comprise a sub-band filter bank $\{w_g\}_{g=1}^{G}$ and an amplitude adjustment array $\mathbf{a}$.

The sub-band filter bank comprises $G$ sub-band filters $w_g$, $g = 1, 2, \ldots, G$. Each sub-band filter filters the reference signal, i.e., the reference signal is linearly convolved with each sub-filter. The amplitude adjustment array $\mathbf{a}$ adjusts the output amplitude of each sub-filter: the filter outputs are multiplied by the corresponding entries of $\mathbf{a}$, where an entry of 1 means the corresponding sub-band is suppressed and an entry of 0 means the corresponding sub-band is retained.
Further, the sub-band filter bank $\{w_g\}_{g=1}^{G}$ is obtained as follows:

Step 1: Train a full-band noise control filter on white noise with the FxLMS algorithm, whose update equation is:

$w(n+1) = w(n) + \mu\, e(n)\, x'(n)$

where $w(n)$ is the full-band noise control filter at the $n$-th sample, $\mu$ is the step size, $e(n)$ is the error signal, and $x'(n)$ is the filtered reference signal.

The trained full-band noise control filter is denoted $w$.
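A minimal numerical sketch of Step 1, assuming illustrative primary and secondary paths, filter length and step size (none of these values come from the invention):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed acoustic paths; in a real system these are physical transfer paths.
primary = np.array([0.0, 0.9, 0.5, 0.2])    # sound source -> ear
secondary = np.array([0.8, 0.3])            # secondary loudspeaker -> ear

L = 16        # control-filter length (assumed)
mu = 0.01     # step size mu
N = 20000     # white-noise training samples

x = rng.standard_normal(N)                  # white reference noise
d = np.convolve(x, primary)[:N]             # disturbance at the ear
xf = np.convolve(x, secondary)[:N]          # filtered reference x'(n)

w = np.zeros(L)                             # full-band control filter w(n)
xbuf, xfbuf = np.zeros(L), np.zeros(L)
ybuf = np.zeros(len(secondary))
err = np.zeros(N)
for n in range(N):
    xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
    ybuf = np.roll(ybuf, 1); ybuf[0] = w @ xbuf   # control signal y(n)
    e = d[n] - secondary @ ybuf                   # residual error at the ear
    xfbuf = np.roll(xfbuf, 1); xfbuf[0] = xf[n]
    w = w + mu * e * xfbuf                        # w(n+1) = w(n) + mu*e(n)*x'(n)
    err[n] = e

before, after = np.mean(d[:2000] ** 2), np.mean(err[-2000:] ** 2)
print(10 * np.log10(after / before))        # attenuation in dB (well below 0)
```

With an exactly known secondary path, the update converges to the filter whose cascade with the secondary path matches the primary path, so the residual error at the ear decays to a small steady-state value.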
Step 2: Decompose $w$ using a filter reconstruction technique.

Step 2.1. Apply the discrete Fourier transform to the full-band noise control filter: $W = F_L\, w$,

where $F_L$ is the discrete Fourier transform matrix, $L$ is the length of the control filter, and $W$ is the frequency-domain vector of the full-band noise control filter,

$W = [W_0, W_1, \ldots, W_{L-1}]^{T}$, where $W_l$ is the $l$-th frequency-domain value of the noise control filter, $l = 0, 1, \ldots, L-1$.
Step 2.2. Partition the full-band frequency-domain vector $W$ into $G$ sub-band frequency-domain vectors. Writing $f(l) = \min(l, L-l)$ for the frequency index of bin $l$ (so that each bin and its conjugate-symmetric counterpart stay together), the $G$ sub-band filters are assigned frequency-domain coefficients as follows:

When $g < G$:

$W_{g,l} = W_l$ if $(g-1)I \le f(l) < gI$, and $W_{g,l} = 0$ otherwise;

When $g = G$:

$W_{G,l} = W_l$ if $f(l) \ge (G-1)I$, and $W_{G,l} = 0$ otherwise,

where $I = \lfloor L/(2G) \rfloor$ and $I$ denotes the bandwidth of a sub-band noise control filter.

The frequency-domain vector of the $g$-th sub-band noise control filter is then:

$W_g = [W_{g,0}, W_{g,1}, \ldots, W_{g,L-1}]^{T}$, where $W_{g,l}$ is the $l$-th frequency-domain value of the $g$-th sub-band noise control filter, $l = 0, 1, \ldots, L-1$.
Step 2.3. Apply the inverse discrete Fourier transform to each $W_g$ to obtain the $g$-th sub-band filter $w_g = F_L^{-1}\, W_g$, $g = 1, 2, \ldots, G$.

The sub-band filter bank is then $\{w_1, w_2, \ldots, w_G\}$, where $G$ is the number of sub-bands and $F_L^{-1}$ is the inverse of the discrete Fourier transform matrix $F_L$.
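Steps 2.1 to 2.3 can be sketched as follows. The bin assignment keeps each DFT bin together with its conjugate-symmetric partner so every sub-band filter stays real-valued, and the bandwidth $I = L/(2G)$ is an assumption; because the bins are partitioned, the sub-band filters sum back exactly to the full-band filter:

```python
import numpy as np

def subband_decompose(w, G):
    # Step 2.1: DFT of the full-band control filter, W = F_L w.
    L = len(w)
    W = np.fft.fft(w)
    I = L // (2 * G)                    # assumed sub-band bandwidth in bins
    # f(l) = min(l, L - l): mirror-symmetric frequency index of bin l.
    f = np.minimum(np.arange(L), L - np.arange(L))
    subs = []
    for g in range(1, G + 1):
        if g < G:                       # Step 2.2: bins with (g-1)I <= f(l) < gI
            keep = (f >= (g - 1) * I) & (f < g * I)
        else:                           # last sub-band takes all remaining bins
            keep = f >= (G - 1) * I
        Wg = np.where(keep, W, 0.0)
        subs.append(np.fft.ifft(Wg).real)   # Step 2.3: w_g = F_L^{-1} W_g
    return subs

rng = np.random.default_rng(1)
w = rng.standard_normal(64)             # stand-in for the trained full-band filter
subs = subband_decompose(w, G=4)
print(np.allclose(np.sum(subs, axis=0), w))   # True: sub-bands tile the spectrum
```

Gating individual sub-band outputs to zero therefore removes only the corresponding frequency bands from the control signal, which is what lets the preselected event's bands pass uncancelled.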
Further, the amplitude adjustment array $\mathbf{a}$ is obtained by taking the element-wise complement of the spectrum mask $\mathbf{M}_m$, i.e., $\mathbf{a} = \mathbf{1} - \mathbf{M}_m$, where $a_g$ is the amplitude adjustment value of the $g$-th sub-band, $g = 1, 2, \ldots, G$.
An interactive active noise control method based on sound event detection, comprising the following steps:

Step 1. The user preselects the sound event category to be retained.

Step 2. Acquire the reference signal $x(n)$ in real time.
Step 3. From the sound event category index preselected in Step 1 and the reference signal acquired in real time in Step 2, use the trained conditional sound event detection network to obtain the activity state $\hat{p}_m$ and the spectral distribution $\hat{\mathbf{q}}_m$ of the preselected sound event, where $m$ is the frame index. Binarize $\hat{p}_m$ and $\hat{\mathbf{q}}_m$ with thresholds $\theta_p$ and $\theta_q$, respectively, and multiply the two to obtain the spectrum mask $\mathbf{M}_m$.
Step 4. Obtain the amplitude adjustment array from the spectrum mask of Step 3, i.e., $\mathbf{a} = \mathbf{1} - \mathbf{M}_m$.
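A toy numeric sketch of Steps 3 and 4, with assumed network outputs and thresholds ($G = 4$ sub-bands here):

```python
import numpy as np

# Hypothetical network outputs for one frame m.
p_hat = 0.92                             # predicted activity state
q_hat = np.array([0.1, 0.8, 0.7, 0.2])   # predicted spectral distribution
theta_p, theta_q = 0.5, 0.5              # assumed binarization thresholds

p_bin = float(p_hat > theta_p)           # 1.0: the event is active in this frame
q_bin = (q_hat > theta_q).astype(float)  # sub-bands occupied by the event
mask = p_bin * q_bin                     # spectrum mask M_m = [0. 1. 1. 0.]
a = 1.0 - mask                           # amplitude array a = [1. 0. 0. 1.]
print(mask, a)
```

Sub-bands with $a_g = 1$ are cancelled by the control filter, while the two bands occupied by the preselected event ($a_g = 0$) reach the ear untouched.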
Step 5. Obtain the control signal $y(n)$ from the reference signal $x(n)$, the amplitude adjustment array $\mathbf{a}$ and the sub-band filter bank $\{w_g\}_{g=1}^{G}$, specifically:

$y(n) = \sum_{g=1}^{G} a_g\, w_g^{T}\, \mathbf{x}(n)$, with $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^{T}$,

where $n$ is the sample index, $G$ is the number of sub-bands, determined by the spectral distribution of the preselected sound events, and $T$ denotes transposition.
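Step 5 is a gated sum of sub-band filter outputs; the filters, gates and reference signal below are illustrative:

```python
import numpy as np

L, G = 8, 2
rng = np.random.default_rng(2)
w = [rng.standard_normal(L) for _ in range(G)]  # sub-band filters w_g
a = np.array([1.0, 0.0])       # suppress sub-band 1, retain sub-band 2
x = rng.standard_normal(100)   # reference signal x(n)

def control_signal(n):
    # x(n) = [x(n), x(n-1), ..., x(n-L+1)]^T, zero before the first sample
    xn = np.array([x[n - k] if n - k >= 0 else 0.0 for k in range(L)])
    return sum(a[g] * (w[g] @ xn) for g in range(G))  # y(n) = sum_g a_g w_g^T x(n)

y = np.array([control_signal(n) for n in range(100)])
# With a = [1, 0], only sub-band 1 contributes, so y is that band's convolution:
print(np.allclose(y, np.convolve(x, w[0])[:100]))     # True
```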
Step 6. The control signal drives the secondary loudspeaker to emit a control sound wave that cancels the interfering sound waves at the ear, so that only the sound waves of the user's preset sound event category remain at the ear.
In summary, owing to the adoption of the above technical solutions, the beneficial effects of the present invention are:

1. The conditional sound event detection neural network constructed in the present invention is a streaming detection network that does not depend on information from future frames, so the sound signal of the preselected sound event category can be delivered to the ear in real time.

2. The adjustable sub-band noise control filters used in the present invention are obtained by decomposing a full-band noise control filter, and therefore introduce no additional delay and no degradation of the noise suppression performance.

3. The control method of the present invention lets the user freely choose which sound events to hear in a noisy environment, realizing personalized active noise control.
Description of the Drawings

Figure 1 is a schematic structural diagram of the interactive active noise control system of the present invention.

Figure 2 is a schematic structural diagram of the conditional sound event detection network of the present invention.

Figure 3 is a schematic flow diagram of the noise control method of the present invention.

Figure 4 is a time-domain plot of the sound wave signal before control in Embodiment 1 of the present invention.

Figure 5 is a time-domain plot of the sound wave signal after control in Embodiment 1 of the present invention.
Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the drawings.
An interactive active noise control system based on sound event detection, whose structure is shown in Figure 1, comprises a selection port 1, a conditional sound event detection network 2, a reference microphone 3, controllable sub-band filters 4 and a secondary loudspeaker 5.

The selection port is used to select a sound event category and to transmit the category index of the sound event to the conditional sound event detection network 2. The reference microphone 3 transmits the sound wave signal in the environment in real time to the conditional sound event detection network 2 and the controllable sub-band filters 4. The conditional sound event detection network 2 obtains in real time the spectrum mask of the preselected category from the category index and the reference signal, and transmits the spectrum mask to the controllable sub-band filters 4. The controllable sub-band filters 4 output a control signal based on the spectrum mask and the reference signal, so that the control sound wave emitted by the secondary loudspeaker contains no component of the preselected category and therefore does not suppress the sound waves of the preselected sound event. The secondary loudspeaker 5 converts the control signal into a control sound wave that cancels the interfering sound waves at the ear, so that only the sound waves of the user's preselected sound event category remain at the ear.
The structure of the conditional sound event detection network 2 is shown in Figure 2; it comprises a condition feature generation module, a feature extraction module, a local feature analysis module, a feature fusion module, a sequence feature analysis module and an output module.

The condition feature generation module performs a preliminary encoding of the preselected category index to obtain a high-dimensional condition feature, and outputs it to the feature fusion module. The feature extraction module frames and windows the audio sequence and applies a feature transform to obtain input features, which are output to the local feature analysis module. The local feature analysis module performs local feature analysis on the input features to obtain high-dimensional local features, which are output to the feature fusion module. The feature fusion module fuses the high-dimensional condition feature with the high-dimensional local features to obtain high-dimensional fused features, which are output to the sequence analysis module. The sequence analysis module performs sequential analysis on the high-dimensional fused features to obtain serialized fused features, which are output to the output module. The output module applies a dimension transform to the serialized fused features and streams, for each frame, the activity state $\hat{p}_m$ and the spectral distribution $\hat{\mathbf{q}}_m$ of the preselected sound event category, and obtains the spectrum mask $\mathbf{M}_m$ from $\hat{p}_m$ and $\hat{\mathbf{q}}_m$.
In application, the system of the present invention has two stages: a training stage and a control stage. The training stage comprises training the category-conditional sound event detection neural network and the adjustable sub-band noise control filters.

The specific training process of the conditional sound event detection network is as follows:

First, a data set for training the conditional sound event detection network is prepared. The data set can be obtained by synthesis; it contains sound data of selectable single events and sound data of some common background noises, where the categories of the single-event data are predefined. For example, "laughter", "birdsong", "alarm", "speech" and "music" are defined as preselectable sound events, while "traffic noise", "engine noise", "pink noise" and "subway noise" are defined as background noise; background noise is a sound event that cannot be preselected.
The selection port contains the sound event categories that can be preselected by the user. After preselection, the chosen category is converted into a preliminary category code, such as a one-hot vector.
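As a toy example of the preliminary category code (the category table here is hypothetical):

```python
import numpy as np

classes = ["laughter", "birdsong", "alarm", "speech", "music"]

def one_hot(name):
    # one-hot encoding of the preselected sound event category
    vec = np.zeros(len(classes))
    vec[classes.index(name)] = 1.0
    return vec

print(one_hot("alarm"))   # [0. 0. 1. 0. 0.]
```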
The condition feature generation module converts the preliminary category code into a high-dimensional condition feature vector. The feature extraction module frames and windows the sound waveform from the reference microphone and converts it into the input features of the neural network. The input features may be the log-mel energy spectrum, which matches the auditory characteristics of the human ear: the sound waveform is first transformed with the short-time Fourier transform; the result is multiplied by a mel filter bank and log-transformed to obtain the log-mel energy spectrum; the log-mel energy spectrum is then normalized to give the input features of the network. The local feature analysis module performs local feature analysis on the input features to obtain high-dimensional local features. The local feature analysis module may be built from a convolutional neural network (CNN); specifically, the convolutional part of networks such as AlexNet, VGG, GoogLeNet or ResNet can be used, and for lightweight deployment and faster inference, lightweight CNN models such as MobileNet, SqueezeNet or ShuffleNet may be adopted. The feature fusion module fuses the high-dimensional condition feature vector with the high-dimensional local features; the condition feature makes the local features retain only those of the preselected sound event. The feature fusion module may be implemented by concatenation, addition, multiplication, attention-based interaction, and so on. The sequence analysis module performs sequential analysis on the high-dimensional fused features and outputs serialized fused features. To capture the sequential information of the local features, the sequence feature analysis module must have memory: it must combine past features with current features for analysis while not depending on future features. It may be implemented with a unidirectional long short-term memory network (LSTM), a unidirectional gated recurrent unit (GRU), a causal temporal convolutional network (TCN), and so on. Finally, the output module outputs, for every frame, the activation state of the preselected sound event and the spectral distribution of the preselected sound event, where the spectral distribution is a binary vector.
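A minimal sketch of the log-mel feature extraction described above; the sampling rate, FFT size, hop length and number of mel bands are assumed parameters:

```python
import numpy as np

def log_mel(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame + Hann window + power spectrum (short-time Fourier transform).
    win = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frames.append(np.abs(np.fft.rfft(x[start:start + n_fft] * win)) ** 2)
    power = np.array(frames)                 # shape (n_frames, n_fft // 2 + 1)

    # Triangular mel filter bank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, c):
            fb[i, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i, k] = (hi - k) / max(hi - c, 1)

    feat = np.log(power @ fb.T + 1e-10)      # log-mel energies
    return (feat - feat.mean()) / (feat.std() + 1e-10)   # normalized features

x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)   # 1 s test tone
print(log_mel(x).shape)                      # (61, 40)
```

Each row of the returned matrix is the normalized log-mel feature vector of one frame, which is what the local feature analysis module consumes.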
The loss function of the category-conditional sound event detection network is:

$\mathcal{L} = \sum_{m=1}^{M}\big[\alpha\,\ell(p_m,\hat{p}_m) + \beta\,\ell(\mathbf{q}_m,\hat{\mathbf{q}}_m)\big]$

where $m$ is the frame index of a piece of audio, $M$ is the total number of frames, $p_m$ is the label of the activity state of the preselected sound event and $\hat{p}_m$ is the predicted activity state, $\mathbf{q}_m$ is the label of the spectral distribution of the preselected sound event, for which the ideal binary mask (IBM) can be used, $\hat{\mathbf{q}}_m$ is the predicted spectral distribution, $\ell(\cdot,\cdot)$ denotes a per-frame loss (e.g., the binary cross-entropy), and $\alpha$ and $\beta$ are loss weights with $\alpha+\beta=1$.
The sub-band noise control filters are trained next. The adjustable sub-band noise control filters comprise a sub-band noise control filter bank $\{w_g\}_{g=1}^{G}$ and an amplitude adjustment array $\mathbf{a}$.

The sub-band filter bank comprises $G$ sub-band filters $w_g$, $g = 1, 2, \ldots, G$. Each sub-band filter filters the reference signal, i.e., the reference signal is linearly convolved with each sub-filter. The amplitude adjustment array $\mathbf{a}$ adjusts the output amplitude of each sub-filter: the filter outputs are multiplied by the corresponding entries of $\mathbf{a}$, where an entry of 1 means the corresponding sub-band is suppressed and an entry of 0 means the corresponding sub-band is retained.
The sub-band filter bank $\{w_g\}_{g=1}^{G}$ is obtained as follows:

Step 1: Train a full-band noise control filter on white noise with the FxLMS algorithm, whose update equation is:

$w(n+1) = w(n) + \mu\, e(n)\, x'(n)$

where $w(n)$ is the full-band noise control filter at the $n$-th sample, $\mu$ is the step size, $e(n)$ is the error signal, and $x'(n)$ is the filtered reference signal.

The trained full-band noise control filter is denoted $w$.
Step 2: Decompose $w$ using a filter reconstruction technique.

Step 2.1. Apply the discrete Fourier transform to the full-band noise control filter: $W = F_L\, w$,

where $F_L$ is the discrete Fourier transform matrix, $L$ is the length of the control filter, and $W$ is the frequency-domain vector of the full-band noise control filter,

$W = [W_0, W_1, \ldots, W_{L-1}]^{T}$.
Step 2.2. Partition the full-band frequency-domain vector $W$ into $G$ sub-band frequency-domain vectors. Writing $f(l) = \min(l, L-l)$ for the frequency index of bin $l$, the $G$ sub-band filters are assigned frequency-domain coefficients as follows:

When $g < G$:

$W_{g,l} = W_l$ if $(g-1)I \le f(l) < gI$, and $W_{g,l} = 0$ otherwise;

When $g = G$:

$W_{G,l} = W_l$ if $f(l) \ge (G-1)I$, and $W_{G,l} = 0$ otherwise,

where $I = \lfloor L/(2G) \rfloor$ and $I$ denotes the bandwidth of a sub-band noise control filter.

The frequency-domain vector of the $g$-th sub-band noise control filter is then:

$W_g = [W_{g,0}, W_{g,1}, \ldots, W_{g,L-1}]^{T}$, where $W_{g,l}$ is the $l$-th frequency-domain value of the $g$-th sub-band noise control filter, $l = 0, 1, \ldots, L-1$.
Step 2.3. Apply the inverse discrete Fourier transform to each $W_g$ to obtain the $g$-th sub-band filter $w_g = F_L^{-1}\, W_g$.

The sub-band filter bank is then $\{w_1, w_2, \ldots, w_G\}$, where $G$ is the number of sub-bands.
The training of the sub-band noise control filters of the present invention involves a primary path and a secondary path, where the primary path is the acoustic path from the sound source to the ear and the secondary path is the acoustic path from the secondary loudspeaker to the ear. When the full-band noise control filter designed in the present invention is trained, the filter obtained with the FxLMS algorithm reduces the energy of white noise by about 10 dB, meaning that it can effectively suppress environmental noise. In addition, the filter decomposition method of the present invention can effectively decompose the full-band control filter into the individual sub-bands, so that the sound waves of the preset sound events can be retained by controlling the output amplitudes of the sub-band filters; the larger the number of sub-band filters, the finer the gating of the preset sound events.
Once the conditional sound event detection network and the sub-band noise control filters have been trained, they can be used directly; no further training is needed during subsequent use.
Embodiment 1

An interactive active noise control method based on sound event detection, whose flow is shown in the schematic block diagram of Figure 3, comprises the following steps:
Step 1. The reference microphone acquires the reference signal $x(n)$ in real time and feeds it to the conditional sound event detection network and to the $G$ sub-filters decomposed from the full-band noise control filter.
Step 2. From the preselected sound event category and the reference signal acquired in real time in Step 1, the conditional sound event detection network obtains the activity state $\hat{p}_m$ and the spectral distribution $\hat{\mathbf{q}}_m$ of the preselected sound event, where $m$ is the frame index. $\hat{p}_m$ and $\hat{\mathbf{q}}_m$ are binarized with thresholds $\theta_p$ and $\theta_q$, respectively, and multiplied to obtain the spectrum mask $\mathbf{M}_m$.
Step 3. The amplitude adjustment array is obtained by taking the element-wise complement of the spectrum mask of Step 2, i.e., $\mathbf{a} = \mathbf{1} - \mathbf{M}_m$; an entry of 1 in $\mathbf{a}$ means the corresponding sub-band is suppressed, and an entry of 0 means the corresponding sub-band is retained.
Step 4. The control signal y(n) is obtained from the reference signal x(n), the amplitude adjustment array, and the sub-band filter bank {w_g}, specifically:

y(n) = Σ_{g=1}^{G} a_g · [x(n) * w_g(n)],
where n denotes the sampling index, G is the number of sub-bands (determined by the spectral distribution of the preselected sound events), and a_g is the amplitude adjustment element of the g-th sub-band.
In the above process the sub-band filters perform point-to-point processing of the reference signal, that is, the reference signal is linearly convolved with each sub-filter, without the frame segmentation required for sound event detection.
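Step 4 can be sketched as a direct time-domain realization; `subband_filters` would come from decomposing the trained full-band filter, and the names are illustrative:

```python
import numpy as np

def control_signal(x, subband_filters, a):
    """Sum, over sub-bands, of the reference signal convolved with each
    sub-band filter, gated by the amplitude adjustment array.

    x               : reference signal x(n)
    subband_filters : list of G sub-band filter tap arrays w_g
    a               : length-G amplitude array (1 -> suppress, 0 -> retain)
    """
    y = np.zeros(len(x))
    for a_g, w_g in zip(a, subband_filters):
        if a_g:  # only suppressed sub-bands contribute anti-noise
            y += np.convolve(x, w_g)[:len(x)]
    return y
```

Sub-bands whose element is 0 contribute no anti-noise, so sound in those bands passes to the ear unaffected.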
Step 5. The control signal drives the secondary loudspeaker to emit a control sound wave that cancels the interfering sound wave at the human ear, so that only the sound waves of the user-preselected category of sound events remain at the ear.
Figures 4 and 5 show time-domain plots of the sound wave signal before and after control in Embodiment 1, respectively. As shown in Figure 4, the signal lasts 10 seconds in total; the background noise is car-cabin noise that persists throughout the 10-second audio, the sound event from 3.3 s to 3.5 s is glass breaking, the event from 5.1 s to 5.7 s is a man speaking, and the event from 5.9 s to 6.9 s is laughter. Assume the preselected sound event is the man's speech. When the signal is controlled with the active noise control method of the present invention, the controlled signal is as shown in Figure 5: while the preselected sound event is absent, the signal is suppressed, and when the preselected event occurs, the signal of the selected event is left unsuppressed.
The energy change of the sound wave signal before and after control in Embodiment 1 was also verified: when the preselected sound event is the man's speech, only the energy of the selected event's signal is retained, while signals in other time periods and frequency bands are suppressed.
The above are only specific embodiments of the present invention. Unless otherwise stated, any feature disclosed in this specification may be replaced by an equivalent or alternative feature serving a similar purpose; all disclosed features, and all steps of any method or process, may be combined in any way, except for mutually exclusive features and/or steps.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311554080.0A (CN117275446B) | 2023-11-21 | 2023-11-21 | An interactive active noise control system and method based on sound event detection |
| Publication Number | Publication Date |
|---|---|
| CN117275446A | 2023-12-22 |
| CN117275446B | 2024-01-23 |