CN100483509C


Info

Publication number: CN100483509C
Application number: CN 200610164456
Authority: CN
Inventors: 严勤; 邓浩江; 王珺; 许剑峰; 许丽净; 李伟; 张清; 桑盛虎; 杜正中
Original assignee: Huawei Technologies Co Ltd; Institute of Acoustics CAS
Current assignee: Huawei Technologies Co Ltd; Institute of Acoustics CAS
Priority date: 2006-12-05
Filing date: 2006-12-05
Publication date: 2009-04-29
Anticipated expiration: 2026-12-05
Also published as: EP2096629B1; EP2096629A4; EP2096629A1; WO2008067735A1; CN101197135A

Abstract

Translated from Chinese

The invention discloses a sound signal classification method, comprising: receiving a sound signal; determining an update rate of the background noise according to a background noise spectral distribution parameter and a spectral distribution parameter of the sound signal; updating a noise parameter according to the update rate; and classifying the sound signal according to a sub-band energy parameter and the updated noise parameter. The invention also discloses a sound signal classification device, comprising: a background noise parameter update module, configured to determine the update rate of the background noise according to the background noise spectral distribution parameter and the spectral distribution parameter of the current sound signal, and to send the determined update rate; and a PSC module, configured to receive the update rate from the background noise parameter update module, update the noise parameter, classify the current sound signal according to the sub-band energy parameter and the updated noise parameter, and send the sound signal type determined by the classification.

Description

Sound signal classification method and device

Technical field

The invention relates to the technical field of speech coding, and in particular to a sound signal classification method and a sound signal classification device.

Background

In voice communication, only about 40% of the signal contains speech; the rest of the time is silence or background noise. To save transmission bandwidth, speech coding in the field of speech signal processing uses voice activity detection (VAD), which lets the encoder code background noise and active speech at different rates: background noise at a lower rate and active speech at a higher rate. This reduces the average bit rate and has greatly promoted the development of variable-rate speech coding.

Existing signal detectors (VADs) were all developed for speech signals and divide the input audio signal into only two types: noise and non-noise. Newer encoders such as AMR-WB+ and SMV include music signal detection as a correction and supplement to the VAD decision. An important feature of the AMR-WB+ encoder is that, after VAD detection, it encodes in different modes depending on whether the input audio signal is speech or music, so as to minimize the bit rate while ensuring coding quality.

The two coding modes in AMR-WB+ correspond to two core coding algorithms: algebraic code excited linear prediction (ACELP) and transform coded excitation (TCX). ACELP builds a speech production model and makes full use of the characteristics of speech, so it codes speech signals very efficiently, and its technology is already quite mature; extending a general audio encoder with ACELP can therefore greatly improve its speech coding quality. Similarly, extending a low-bit-rate speech encoder with TCX coding improves its coding quality for wideband music.

The AMR-WB+ coding algorithm has two algorithms for selecting between the ACELP and TCX modes, distinguished by complexity: open-loop selection and closed-loop selection. Closed-loop selection corresponds to high complexity and is the default option. It is an exhaustive search based on the perceptually weighted signal-to-noise ratio. This selection method is very accurate, but its computational complexity is very high and its code size is large.

Open-loop selection includes the following steps:

First, in step 101, the VAD module determines whether the signal is a non-useful signal or a useful signal according to the tone flag (Tone_flag) and the sub-band energy parameter (Level[n]).

Then, in step 102, a preliminary mode selection (EC) is performed.

In step 103, the mode preliminarily determined in step 102 is corrected and the mode selection is refined (ESC) to determine the coding mode, based on the open-loop pitch parameters and the ISF parameters.

In step 104, TCXS processing is performed: when the number of consecutive selections of the speech signal coding mode is less than three, a small-scale closed-loop exhaustive search is performed to finally determine the coding mode. The speech signal coding mode is ACELP and the music signal coding mode is TCX.

The above AMR-WB+ speech signal selection algorithm has the following disadvantages:

1. When classifying signals, the existing VAD module does not distinguish well between noise and some kinds of music signals, which reduces the accuracy of sound signal classification.

2. Calculating the open-loop pitch parameters is necessary for the ACELP coding mode but unnecessary for the TCX coding mode. In the AMR-WB+ design, the VAD and the open-loop mode selection algorithm both use the open-loop pitch parameters, so the open-loop pitch has to be calculated for every frame. For non-ACELP coding modes such as TCX this is redundant complexity; it increases the computation needed for coding mode selection and reduces efficiency.

3. Although the VAD algorithm's speech detection and noise immunity are among the better ones in current encoders, it may misjudge the trailing part of certain music signals as noise, which truncates the tail of the music and makes it sound unnatural.

4. The AMR-WB+ mode selection algorithm does not take into account the signal-to-noise ratio environment of the signal, so its performance in distinguishing speech from music deteriorates further at low signal-to-noise ratios.

Summary of the invention

In view of this, the invention provides a sound signal classification method and a sound signal classification device that can improve the accuracy of sound signal classification and detection.

The sound signal classification and detection method provided by the invention comprises:

receiving a sound signal, and determining an update rate of the background noise according to a background noise spectral distribution parameter and a spectral distribution parameter of the sound signal; updating a noise parameter according to the update rate, and classifying the sound signal into useful signals and non-useful signals according to a sub-band energy parameter and the updated noise parameter.

The sound signal classification device provided by the invention comprises a background noise parameter update module and an initial signal classification (PSC) module.

The background noise parameter update module is configured to determine the update rate of the background noise according to the background noise spectral distribution parameter and the spectral distribution parameter of the current sound signal, and to send the determined update rate.

The PSC module is configured to receive the update rate from the background noise parameter update module, update the noise parameter, classify the current sound signal according to the sub-band energy parameter and the updated noise parameter, and send the sound signal type determined by the classification.

As can be seen from the above scheme, the invention determines the update rate of the background noise, updates the noise parameter according to that update rate, and then performs an initial classification of the signal according to the sub-band energy parameter and the updated noise parameter to determine the non-useful and useful signals in the received speech signal. This reduces the misjudgment of useful signals as noise and improves the accuracy of sound signal classification.

Brief description of the drawings

Fig. 1 is a schematic diagram of open-loop selection in the prior-art AMR-WB+ coding algorithm;

Fig. 2 is an overall flow chart of the sound signal classification and detection method of the invention;

Fig. 3 is a schematic diagram of the composition of the sound signal classification device of the invention;

Fig. 4 is a schematic diagram of the composition of the system on which a specific embodiment of the invention is based;

Fig. 5 is a flow chart of one way in which the encoder parameter extraction module calculates the parameters in a specific embodiment of the invention;

Fig. 6 is a flow chart of another way in which the encoder parameter extraction module calculates the parameters in a specific embodiment of the invention;

Fig. 7 is a schematic diagram of the composition of the PSC module in a specific embodiment of the invention;

Fig. 8 is a schematic diagram of the signal classification decision module determining feature parameters in a specific embodiment of the invention;

Fig. 9 is a schematic diagram of the signal classification decision module making the speech decision in a specific embodiment of the invention;

Fig. 10 is a schematic diagram of the signal classification decision module making the music decision in a specific embodiment of the invention;

Fig. 11 is a schematic diagram of the signal classification decision module correcting the initial decision result in a specific embodiment of the invention;

Fig. 12 is a schematic diagram of the signal classification decision module performing the preliminary corrected classification of uncertain signals in a specific embodiment of the invention;

Fig. 13 is a schematic diagram of the signal classification decision module performing the final classification correction in a specific embodiment of the invention;

Fig. 14 is a schematic diagram of the signal classification decision module updating parameters in a specific embodiment of the invention.

Detailed description

To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

The main idea of the invention is to determine the update rate of the background noise according to the spectral distribution parameter of the current sound signal and the background noise spectral distribution parameter, and to update the noise parameter according to that update rate. The useful and non-useful signals in the received speech signal are then determined using the updated noise parameter. Because the noise parameter is more accurate at the time the decision is made, the accuracy of sound signal classification improves.

As shown in Fig. 2, the invention first provides a sound signal classification and detection method, comprising:

Step 201: Receive a sound signal, and determine the update rate of the background noise according to the background noise spectral distribution parameter and the spectral distribution parameter of the sound signal.

Step 202: Update the noise parameter according to the update rate, and classify the sound signal according to the sub-band energy parameter and the updated noise parameter.

In step 202, the sound signal is classified mainly into a useful signal type and a non-useful signal type. After that, the type of the useful signal, either speech or music, can be further determined. Depending on whether the noise has converged, this determination is based either on the open-loop pitch parameters, the immittance spectral frequency (ISF) parameters and the sub-band energy parameters, or on the ISF parameters and the sub-band energy parameters only.

In addition, to prevent the tail of a music signal from being misjudged as a non-useful signal, which would degrade the sound, the invention also obtains the determined useful signal type, determines the signal hangover length according to that type, and further uses the hangover length when determining the useful and non-useful signals in the received speech signal. The hangover for music signals can be set longer, which improves how music signals sound.

When determining whether a useful signal is speech or music, signals that cannot be determined with high accuracy can first be marked as an uncertain type; the uncertain type is then corrected according to other parameters to finally determine the type of the useful signal.

Because not all coding schemes for non-useful signals require the ISF parameters, the amount of computation in classification can be reduced and classification efficiency improved: for a signal determined to be non-useful, the ISF parameters are not calculated if the corresponding coding scheme does not need them.

As shown in Fig. 3, the invention also provides a sound signal classification device comprising a background noise parameter update module and an initial signal classification (PSC) module. The background noise parameter update module is configured to determine the update rate of the background noise according to the spectral distribution parameter of the current sound signal and the background noise spectral distribution parameter, and to send the determined update rate to the PSC module. The PSC module is configured to update the noise parameter according to the update rate from the background noise parameter update module, perform an initial classification of the signal according to the sub-band energy parameter and the updated noise parameter, and determine whether the received speech signal is of the useful signal type or the non-useful signal type.

The sound signal classification device may further comprise a signal classification decision module. In that case the PSC module also sends the determined signal type to the signal classification decision module, which determines the type of the useful signal, speech or music, based either on the open-loop pitch parameters, the immittance spectral frequency (ISF) parameters and the sub-band energy parameters, or on the ISF parameters and the sub-band energy parameters.

The device may further comprise a classification parameter extraction module. In that case the PSC module sends the determined signal type to the signal classification decision module through the classification parameter extraction module. The classification parameter extraction module also obtains the ISF parameters and the sub-band energy parameters, and optionally the open-loop pitch parameters; processes the obtained parameters into signal classification feature parameters and sends them to the signal classification decision module; and processes the obtained parameters into the spectral distribution parameter of the sound signal and the background noise spectral distribution parameter and sends these to the background noise parameter update module. The signal classification decision module then determines the type of the useful signal, speech or music, according to the signal classification feature parameters and the signal type determined by the PSC module.

The PSC module may further send the signal-to-noise ratio of the sound signal, calculated while determining the signal type, to the signal classification decision module, which additionally uses the signal-to-noise ratio when determining whether the useful signal is speech or music.

The device may further comprise an encoder mode and rate selection module. The signal classification decision module sends the determined signal type to the encoder mode and rate selection module, which determines the coding mode and rate of the sound signal according to the received signal type.

The device may further comprise an encoder parameter extraction module, configured to extract the ISF parameters and the sub-band energy parameters, and optionally the open-loop pitch parameters, to send the extracted parameters to the classification parameter extraction module, and to send the extracted sub-band energy parameters to the PSC module.

The sound signal classification and detection method and the sound signal classification device provided by the invention are described below through a specific embodiment.

Fig. 4 is a schematic diagram of the system on which the specific embodiment is based. It includes a sound activity detector (SAD), which divides the input digital audio signal into classes according to the needs of the encoder. The classes are non-useful signal, speech, and music, and they give the encoder the basis for selecting its coding mode and rate.

As Fig. 4 shows, the SAD module contains four submodules: a background noise estimation control module, an initial signal classification module, a classification parameter extraction module, and a signal classification decision module. The SAD is a signal classifier used inside the encoder, so to reduce resource usage and computational complexity it makes full use of parameters the encoder already has: the encoder parameter extraction module in the encoder calculates the sub-band energy parameters and the encoder parameters and provides them to the SAD module. The final output of the SAD module is the signal decision type (non-useful signal, speech, or music), which is provided to the encoder mode and rate selection module so that it can select the encoder mode and rate.

The encoder modules related to the SAD, the submodules of the SAD, and the interaction between the modules are described in detail below.

The encoder parameter extraction module in the encoder calculates the sub-band energy parameters and the encoder parameters and provides them to the SAD module. The sub-band energy parameters can be calculated by filter-bank filtering. The number of sub-bands depends on the requirements for computational complexity and classification accuracy; this embodiment is described with 12 sub-bands.

In this embodiment, the encoder parameter extraction module can calculate the parameters required by the SAD module as shown in Fig. 5 or Fig. 6.

The flow shown in Fig. 5 includes the following steps:

Step 501: The encoder parameter extraction module first calculates the sub-band energy parameters.

Step 502: The encoder parameter extraction module decides, according to the initial signal decision result (Vad_flag) from the PSC module, whether the immittance spectral frequency (ISF) calculation is needed. If so, step 503 is executed; otherwise, step 504 is executed.

The decision in this step works as follows. If the current frame is a non-useful signal, the decision follows the encoder's mechanism: if the encoder needs ISF parameters to code non-useful signals, the ISF calculation is performed; if not, the encoder parameter extraction module ends. If the current frame is a useful signal, the ISF calculation is performed. Calculating the ISF parameters for useful signals is required by most coding modes, so it adds no redundant complexity to the encoder. For the ISF calculation itself, refer to the documentation of the various encoders; it is not described here.

Step 503: The encoder parameter extraction module calculates the ISF parameters, and then step 504 is executed.

Step 504: The encoder parameter extraction module calculates the open-loop pitch parameters.

The sub-band energy parameters calculated by the flow of Fig. 5 are provided to the PSC module and the classification parameter extraction module in the SAD; the remaining parameters are provided to the classification parameter extraction module in the SAD.
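
The Fig. 5 flow can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the encoder's actual implementation: the function name `extract_encoder_parameters`, the three `calc_*` callables and the `encoder_needs_isf_for_noise` switch are hypothetical names introduced for illustration, and a truthy `vad_flag` is assumed to mean that the PSC module judged the frame to be a useful signal. The sketch follows the numbered steps, in which step 504 runs whether or not the ISF calculation was performed.

```python
def extract_encoder_parameters(frame, vad_flag, encoder_needs_isf_for_noise,
                               calc_level, calc_isf, calc_pitch):
    """Illustrative control flow for steps 501 to 504 (all names hypothetical)."""
    params = {}

    # Step 501: sub-band energies are always calculated.
    params["level"] = calc_level(frame)

    # Step 502: ISF is needed for useful frames, and for non-useful frames
    # only if the encoder's coding of non-useful signals requires it.
    isf_needed = bool(vad_flag) or encoder_needs_isf_for_noise

    # Step 503: calculate ISF parameters when needed.
    if isf_needed:
        params["isf"] = calc_isf(frame)

    # Step 504: open-loop pitch parameters.
    params["open_loop_pitch"] = calc_pitch(frame)
    return params
```

Passing the three calculations in as callables keeps the sketch independent of any particular codec; in a real encoder they would be the codec's own filter-bank, ISF and open-loop pitch routines.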

The flow shown in Fig. 6 adds, on top of the flow of Fig. 5, a step that decides whether to calculate the open-loop pitch parameters according to whether the initial noise has converged. Steps 601 to 603 are essentially the same as steps 501 to 503 in Fig. 5. In step 604, it is judged whether the initial noise parameters, that is, the noise estimate, have converged; if so, the open-loop pitch parameters are calculated in step 605; otherwise they are not calculated.

For some coding modes, such as TCX, calculating the open-loop pitch parameters is redundant. To reduce computational complexity, once the noise estimate has converged it can essentially be determined that the coding mode for the signal does not need the open-loop pitch parameters, so they are no longer calculated.

Before the noise estimate converges, the open-loop pitch parameters must be calculated to ensure that the noise estimate can converge and to ensure its convergence speed; this calculation belongs to the start-up phase, so its complexity can be ignored. For the open-loop pitch calculation itself, refer to ACELP-based coding; it is not described here. The noise estimate can be judged to have converged when the number of consecutive frames judged to be noise exceeds the noise convergence threshold (THR1). In one example of this embodiment, THR1 is 20.
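
The convergence test described above can be sketched as a small counter. This is an illustrative sketch only: the class name `NoiseConvergenceTracker` is hypothetical, and two behaviours the text does not specify are assumed here, namely that a non-noise frame resets the consecutive count and that convergence stays latched once it has been reached.

```python
THR1 = 20  # noise convergence threshold used in the example in the text


class NoiseConvergenceTracker:
    """Tracks whether the noise estimate is considered converged."""

    def __init__(self, threshold=THR1):
        self.threshold = threshold
        self.consecutive_noise_frames = 0
        self.converged = False

    def update(self, is_noise_frame):
        # Assumption: any non-noise frame resets the consecutive count.
        if is_noise_frame:
            self.consecutive_noise_frames += 1
        else:
            self.consecutive_noise_frames = 0
        # Converged once the count exceeds (not merely reaches) the threshold.
        # Assumption: convergence is latched after that.
        if self.consecutive_noise_frames > self.threshold:
            self.converged = True
        return self.converged
```

With THR1 = 20, the tracker reports convergence on the 21st consecutive noise frame.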

The extracted sub-band energy parameter is level[i], where i is the element index of the vector. In this embodiment i takes the values 1 to 12, corresponding to the bands 0-200 Hz, 200-400 Hz, 400-600 Hz, 600-800 Hz, 800-1200 Hz, 1200-1600 Hz, 1600-2000 Hz, 2000-2400 Hz, 2400-3200 Hz, 3200-4000 Hz, 4000-4800 Hz and 4800-6400 Hz.
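
The 12-band layout can be written down as a list of band edges, which makes it easy to look up the sub-band index for a given frequency. This is a sketch for illustration: the names `BAND_EDGES_HZ`, `band_index` and `band_widths_hz` are hypothetical, the tenth band is assumed to end at 4000 Hz so that it joins the eleventh band, and each band is assumed to include its lower edge and exclude its upper edge, which the text does not specify.

```python
from bisect import bisect_right

# Edges of the 12 sub-bands in Hz; band i spans
# BAND_EDGES_HZ[i - 1] (inclusive) to BAND_EDGES_HZ[i] (exclusive).
BAND_EDGES_HZ = [0, 200, 400, 600, 800, 1200, 1600,
                 2000, 2400, 3200, 4000, 4800, 6400]


def band_index(freq_hz):
    """Return the 1-based sub-band index i of level[i] for a frequency in Hz."""
    if not 0 <= freq_hz < BAND_EDGES_HZ[-1]:
        raise ValueError("frequency outside the 0-6400 Hz range")
    return bisect_right(BAND_EDGES_HZ, freq_hz)


def band_widths_hz():
    """Return the width of each of the 12 sub-bands in Hz."""
    return [hi - lo for lo, hi in zip(BAND_EDGES_HZ, BAND_EDGES_HZ[1:])]
```

The widths grow from 200 Hz in the lowest bands to 1600 Hz in the highest, so the low frequencies are resolved more finely.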

The extracted ISF parameter is Is_n[i], where n is the frame index and i, taking the values 1 to 16, is the element index of the vector.

The extracted open-loop pitch parameters include:

the open-loop pitch gain (ol_gain), the open-loop pitch lag (ol_lag), and the tone flag (tone_flag). If the value of ol_gain is greater than the tone threshold (TONE_THR), tone_flag is set to 1.

The initial signal classification (PSC) module can be implemented with any of the existing VAD algorithm schemes. It comprises a background noise estimation submodule, a signal-to-noise ratio calculation submodule, a useful signal estimation submodule, a decision threshold adjustment submodule, a comparison submodule, and a hangover submodule that protects useful signals. In this embodiment, as shown in Fig. 7, the PSC module can also differ from existing VAD algorithm modules in the following three respects:

I. The signal-to-noise ratio calculation submodule calculates the signal-to-noise ratio according to this parameter and the sub-band energy parameters. In addition to being used inside the PSC module, the calculated signal-to-noise ratio parameter (snr) is sent to the signal classification decision module, so that it can distinguish speech from music more accurately under low signal-to-noise ratio conditions.

II. Because existing VADs do not distinguish noise from certain kinds of music well, this embodiment improves the VAD as follows. First, the calculation of the background noise parameters is controlled by the update rate acc provided by the background noise parameter update module. The background noise estimation submodule receives the update rate from the background noise parameter update module, updates the noise parameters, and sends the background noise sub-band energy estimate calculated from the updated noise parameters to the signal-to-noise ratio calculation submodule. For how the update rate is calculated, see the later description of the background noise parameter update module. In one example of this embodiment the update rate can take four levels: acc1, acc2, acc3 and acc4. Each update rate determines its own upward update parameter (update_up) and downward update parameter (update_down), which are the rates at which the background noise estimate is updated upward and downward, respectively.

The noise parameters can then be updated using the scheme from AMR-WB+:

if (bckr_est_m[n] < level_{m-1}[n])
    update = update_up
else
    update = update_down

The noise estimate is then updated as:

bckr_est_{m+1}[n] = (1 - update) * bckr_est_m[n] + update * level_{m-1}[n]

The noise spectral distribution parameter vector is updated as:

$\tilde{p}_{m+1}[i] = (1 - update) * \tilde{p}_{m}[i] + update * p_{m}[i]$

where:

m: frame index

n: sub-band index

i: element index of the spectral distribution parameter vector, i = 1, 2, 3, 4

bckr_est: background noise sub-band energy estimate

$\tilde{p}$: estimate of the background noise spectral distribution parameter vector

p: spectral distribution parameter vector of the current signal
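The two recursions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, sub-band energies are plain lists of floats, and the text does not say which of update_up or update_down applies to the spectral distribution vector, so the caller passes a single `update` value for it.

```python
def update_noise_estimate(bckr_est, level_prev, update_up, update_down):
    """Per-sub-band first-order update of the background noise energy estimate.

    bckr_est    -- current estimates bckr_est_m[n], one per sub-band
    level_prev  -- sub-band levels level_{m-1}[n]
    """
    updated = []
    for est, lvl in zip(bckr_est, level_prev):
        # Estimate below the observed level: update upward; otherwise downward.
        update = update_up if est < lvl else update_down
        updated.append((1.0 - update) * est + update * lvl)
    return updated


def update_noise_spectrum(p_tilde, p, update):
    """Same recursion applied to the spectral distribution parameter vector.

    Assumption: a single update factor is supplied by the caller.
    """
    return [(1.0 - update) * pt + update * pc for pt, pc in zip(p_tilde, p)]
```

A larger update factor makes the estimate follow the current frame more closely, which is how the update rate acc steers the adaptation speed.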

III. Existing VADs generally use a hangover to keep useful signal from being misclassified as noise. The hangover length is a trade-off between protecting the signal and improving transmission efficiency. For a conventional speech coder, the hangover length can be a constant obtained by training. A multi-rate coder, however, targets audio signals that include music. Such signals often have long, low-energy tails that a conventional VAD has difficulty detecting, so a longer hangover is needed to protect them. In this embodiment, the hangover length in the hangover protection sub-module is made adaptive to the SAD decision: if the signal is judged to be music (SAD_flag = MUSIC), a longer hangover is set (hang_len = HANG_LONG); if it is judged to be speech (SAD_flag = SPEECH), a shorter hangover is set (hang_len = HANG_SHORT). Specifically:

if (SAD_flag == MUSIC)
    hang_len = HANG_LONG
else if (SAD_flag == SPEECH)
    hang_len = HANG_SHORT
else
    hang_len = 0

where:

SAD_flag: SAD decision flag

hang_len: hangover protection length

In one example of this embodiment, HANG_LONG = 100 and HANG_SHORT = 20, where the unit may be frames.
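As a minimal runnable sketch of the adaptive hangover selection, using the example values above and assuming (for illustration only) that the SAD decision is represented as a string:

```python
HANG_LONG = 100   # example value from the text, in frames
HANG_SHORT = 20   # example value from the text, in frames


def select_hang_len(sad_flag):
    """Return the hangover length for the given SAD decision."""
    if sad_flag == "MUSIC":
        return HANG_LONG
    if sad_flag == "SPEECH":
        return HANG_SHORT
    return 0  # any other decision: no hangover protection
```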

The classification parameter extraction module uses the Vad_flag parameter determined by the initial signal classification (PSC) module, together with the sub-band energy, ISF and open-loop pitch parameters supplied by the encoder parameter extraction module, to compute the parameters required by the signal classification decision module and the background noise parameter update module. It also supplies the sub-band energy, ISF and open-loop pitch parameters and the computed parameters to the signal classification decision module and the background noise parameter update module as appropriate. The parameters computed by the classification parameter extraction module include:

1. Pitch parameter (pitch)

The differences between consecutive open-loop pitch lags are compared. If the change in open-loop pitch lag is below a set threshold, a lag counter is incremented. If the sum of the lag counts over two consecutive frames is large enough, pitch = 1; otherwise pitch = 0. The computation of the open-loop pitch lag is given in the AMR-WB+/AMR-WB standard documents.

2. Long-term signal correlation parameter (meangain)

meangain is the moving average of tone over three adjacent frames, where tone = 1000 * tone_fig and tone_fig is defined as in AMR-WB+.

3. Zero-crossing rate (zcr)

$zcr = \frac{1}{T} \sum_{i=1}^{T-1} II\{x(i)\, x(i-1) < 0\}$

where II{A} is 1 when A is true and 0 when A is false.

4. Temporal fluctuation of sub-band energy (t_flux)

$t\_flux = \frac{\sum_{i=1}^{12} | level_{m}(i) - level_{m-1}(i) |}{short\_mean\_level\_energy}$

where short_mean_level_energy denotes the short-term mean energy.

5. High-to-low sub-band energy ratio (ra)

$ra = \frac{sublevel\_high\_energy}{sublevel\_low\_energy}$

where, in one example of the invention:

sublevel_high_energy = level[10] + level[11];

sublevel_low_energy = level[0] + level[1] + level[2] + level[3] + level[4] + level[5] + level[6] + level[7] + level[8] + level[9];

6. Spectral fluctuation of sub-band energy (f_flux)

$f\_flux = \frac{\sum_{i=2}^{12} | level_{m}(i) - level_{m}(i-1) |}{short\_mean\_level\_energy}$

7. Short-term mean ISF distance (isf_meanSD): the mean of the immittance spectral frequency (ISF) distance Isf_SD over five adjacent frames, where

$Isf\_SD = \sum_{i=1}^{16} | Isf_{m}(i) - Isf_{m-1}(i) |$

8. Mean sub-band energy standard deviation (level_meanSD): the mean of the sub-band energy standard deviation (level_SD) over two adjacent frames. level_SD is computed analogously to Isf_SD above.

Of the eight parameters above, zcr, ra, f_flux and t_flux are supplied to the background noise parameter update module, and pitch, meangain, isf_meanSD and level_meanSD are supplied to the signal classification decision module.
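The four features passed to the background noise parameter update module follow directly from the formulas above. The sketch below is illustrative: the function names are hypothetical, inputs are plain Python lists (0-indexed, so sub-band i in the formulas is index i-1), and guarding against a zero denominator is left to the caller.

```python
def zero_crossing_rate(x):
    """zcr: fraction of adjacent sample pairs whose product is negative."""
    T = len(x)
    crossings = sum(1 for i in range(1, T) if x[i] * x[i - 1] < 0)
    return crossings / T


def t_flux(level_m, level_prev, short_mean_level_energy):
    """Temporal fluctuation: frame-to-frame change summed over sub-bands."""
    diff = sum(abs(a - b) for a, b in zip(level_m, level_prev))
    return diff / short_mean_level_energy


def f_flux(level_m, short_mean_level_energy):
    """Spectral fluctuation: change between neighbouring sub-bands of one frame."""
    diff = sum(abs(level_m[i] - level_m[i - 1]) for i in range(1, len(level_m)))
    return diff / short_mean_level_energy


def band_energy_ratio(level):
    """ra: energy of sub-bands 10-11 over energy of sub-bands 0-9."""
    return sum(level[10:12]) / sum(level[0:10])
```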

The signal classification decision module uses snr and Vad_flag from the PSC module, together with the sub-band energy parameter, pitch, meangain, Isf_meanSD and level_meanSD from the classification parameter extraction module, to make the final classification of the signal as non-useful signal (NOISE), speech (SPEECH) or music (MUSIC). The signal classification decision module may comprise a parameter update sub-module and a decision sub-module. The parameter update sub-module updates the thresholds used in the classification decision according to the signal-to-noise ratio and supplies the updated thresholds to the decision sub-module. The decision sub-module receives the sound signal type from the PSC module and, for a useful signal, determines its type based on the open-loop pitch parameter, the ISF parameter, the sub-band energy parameter and the updated thresholds, or based on the ISF parameter, the sub-band energy parameter and the updated thresholds, and sends the determined type to the encoder mode and rate selection module.

Determining whether a useful signal is speech or music proceeds as follows. First, the speech flag and the music flag are both set to 0. Then, based on the pitch parameter flag, the long-term signal correlation value, the short-term mean ISF distance and the mean sub-band energy standard deviation, the signal is preliminarily classified as speech, music or uncertain, and the speech flag or music flag is modified accordingly. Next, the preliminary type is corrected based on the sub-band energy, the long-term signal correlation value, the mean sub-band energy standard deviation, speech_flag, music_flag, whether the number of consecutive frames with pitch = 1 exceeds a preset hangover frame-count threshold, the number of consecutive music frames, the number of consecutive speech frames, and the type of the previous frame. This yields the type of the useful signal, which is either speech or music.

The detailed procedure for determining whether a useful signal is speech or music is described below.

To keep the decision stable and avoid frequent switching of the decision result, this embodiment provides a flag hangover mechanism for the parameters: the values of the feature flags pitch_flag, level_meanSD_high_flag, ISF_meanSD_high_flag, ISF_meanSD_low_flag, level_meanSD_low_flag and meangain_flag are determined using a hangover mechanism, as shown in Figure 8.

The length of the hangover period in Figure 8 is determined by the parameter hangover flag value. This embodiment provides two hangover schemes, that is, two ways of determining the parameter hangover flag value.

In the first scheme, when a parameter value is above or below a given threshold, the corresponding parameter hangover counter is incremented by one; otherwise the counter is reset to 0. The parameter hangover flag is then set according to the counter value: the larger the counter value, the longer the hangover. The exact mapping from counter value to hangover flag value depends on the application and is not detailed here.

In the second scheme, the hangover length is controlled by the error rate (ER) of each internal node of the decision tree obtained for the training parameters: a parameter with a low error rate gets a short hangover, and a parameter with a high error rate gets a long hangover.

Then, if the current signal is classified as a useful signal, the initial speech/music classification is performed.

The initial speech decision is made first, as shown in Figure 9. Step 901 sets the speech flag to 0. Step 902 then checks whether Isf_meanSD is greater than a preset first ISF speech threshold (for example 1500); if so, the speech flag is set to 1; otherwise,

step 903 checks whether pitch is 1 and the pitch lag t_top_mean obtained from the open-loop pitch search is less than the pitch speech threshold (for example 40); if so, the speech flag is set to 1; otherwise,

step 904 checks whether the number of consecutive frames with pitch = 1 exceeds a preset hangover frame-count threshold (for example 2 frames); if so, the speech flag is set to 1; otherwise,

step 905 checks whether meangain is greater than a preset long-term correlation speech threshold (for example 8000); if so, the speech flag is set to 1; otherwise,

step 906 checks whether at least one of level_meanSD_high_flag and ISF_meanSD_high_flag is 1; if so, the speech flag is set to 1; otherwise the speech flag is left unchanged.

The initial music decision is then made, as shown in Figure 10.

Step 1001 sets the music flag to 0. Step 1002 then checks whether ISF_meanSD_low_flag = 1 and level_meanSD_low_flag = 1 both hold; if so, the music flag music_flag is set to 1; otherwise the music flag is left unchanged.
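The initial speech and music decisions of Figures 9 and 10 can be sketched as two small functions. This is a reading of the steps as described in the text, not the reference implementation: the function and argument names are hypothetical, `pitch_run` stands for the number of consecutive frames with pitch = 1, and the default thresholds are the example values quoted above.

```python
def initial_speech_flag(isf_mean_sd, pitch, t_top_mean, pitch_run, meangain,
                        level_mean_sd_high_flag, isf_mean_sd_high_flag,
                        th_isf=1500, th_pitch=40, th_run=2, th_gain=8000):
    """Steps 901-906: return 1 as soon as any speech cue fires, else 0."""
    if isf_mean_sd > th_isf:                        # step 902
        return 1
    if pitch == 1 and t_top_mean < th_pitch:        # step 903
        return 1
    if pitch_run > th_run:                          # step 904
        return 1
    if meangain > th_gain:                          # step 905
        return 1
    if level_mean_sd_high_flag == 1 or isf_mean_sd_high_flag == 1:  # step 906
        return 1
    return 0                                        # flag stays at its initial 0


def initial_music_flag(isf_mean_sd_low_flag, level_mean_sd_low_flag):
    """Steps 1001-1002: music needs both low-variation flags set."""
    return 1 if (isf_mean_sd_low_flag == 1 and level_mean_sd_low_flag == 1) else 0
```

Note that the two flags are computed independently, so both can be 1 (or both 0) for the same frame; the correction stage handles those cases.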

The initial decision is then corrected, as shown in Figure 11.

Step 1101 checks whether the instantaneous sub-band energy is less than the sub-band energy threshold (for example 5000); if so, step 1102 is executed; otherwise the signal is classified as uncertain (UNCERTAIN).

Step 1102 checks whether meangain_flag = 1 and the music duration counter is less than the music-duration speech decision threshold (for example 3); if so, the signal is classified as speech; otherwise,

step 1103 checks whether ISF_meanSD is greater than a preset second ISF speech threshold (for example 2000); if so, the signal is classified as speech; otherwise,

step 1104 checks whether level_energy is less than 10000 and more than five preceding frames were judged to be noise; if so, the current signal is set to uncertain, which reduces the chance of misclassifying noise as music; otherwise,

step 1105 checks whether the music flag and the speech flag are both 1; if so, the current signal is classified as uncertain; otherwise,

step 1106 checks whether the music flag and the speech flag are both 0; if so, the current signal is classified as uncertain; otherwise,

step 1107 checks whether the music flag is 0 and the speech flag is 1; if so, the current signal is classified as speech; otherwise,

in step 1108, since the music flag is 1 and the speech flag is 0, the current signal is classified as music.

After the signal has been classified as uncertain in step 1104, 1105 or 1106, step 1109 is executed: it checks whether pitch_flag = 1, ISF_meanSD is less than the ISF music threshold (for example 900) and the number of consecutive speech frames is less than 3; if so, the signal is classified as music; otherwise it remains uncertain.

After the signal has been classified as speech in step 1103 or 1107, step 1110 is executed: it checks whether the number of consecutive music frames is greater than 3 and ISF_meanSD is less than the ISF music threshold; if so, the signal is classified as music; otherwise it is classified as speech.
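The correction flow of Figure 11 is easier to follow as code. The sketch below transcribes steps 1101 to 1110 literally from the text; Figure 11 itself is not reproduced here, so the exact branching is an assumption. In particular, only the outcomes that the text explicitly routes to steps 1109 and 1110 pass through them. All names are hypothetical and the thresholds are the example values quoted above.

```python
SPEECH, MUSIC, UNCERTAIN = "SPEECH", "MUSIC", "UNCERTAIN"
ISF_MUSIC_TH = 900  # example ISF music threshold from the text


def correct_initial_decision(level_energy, meangain_flag, music_run, speech_run,
                             noise_run, isf_mean_sd, pitch_flag,
                             speech_flag, music_flag):
    """Literal transcription of steps 1101-1110 (assumed branching)."""
    if not level_energy < 5000:                       # step 1101
        return UNCERTAIN
    if meangain_flag == 1 and music_run < 3:          # step 1102
        return SPEECH
    if isf_mean_sd > 2000:                            # step 1103
        label = SPEECH
    elif level_energy < 10000 and noise_run > 5:      # step 1104
        label = UNCERTAIN
    elif speech_flag == music_flag:                   # steps 1105 and 1106
        label = UNCERTAIN
    elif speech_flag == 1:                            # step 1107
        label = SPEECH
    else:                                             # step 1108
        return MUSIC
    if label == UNCERTAIN:                            # step 1109
        if pitch_flag == 1 and isf_mean_sd < ISF_MUSIC_TH and speech_run < 3:
            return MUSIC
        return UNCERTAIN
    if music_run > 3 and isf_mean_sd < ISF_MUSIC_TH:  # step 1110
        return MUSIC
    return SPEECH
```

Here `music_run`, `speech_run` and `noise_run` stand for the counts of consecutive preceding music, speech and noise frames.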

After speech and music signals have been determined by the procedure above, signals that are still uncertain go through the preliminary correction shown in Figure 12. Step 1201 checks whether level_energy is less than the sub-band energy uncertainty threshold (for example 5000); if so, the signal remains uncertain. Otherwise, step 1202 checks whether the number of consecutive music frames is greater than 1 and ISF_meanSD is less than the ISF music threshold; if so, the signal is classified as music. Otherwise:

The speech and music hangover flags are cleared. If the frames preceding the current frame form a sufficiently long run of speech, the signal is tested against the speech feature parameters and, if the speech conditions are met, the speech hangover flag is set (speech_hangover_flag = 1); this corresponds to steps 1203 to 1206 in Figure 12. If the preceding frames form a sufficiently long run of music, the signal is tested against the music feature parameters and, if the music conditions are met, the music hangover flag is set (music_hangover_flag = 1); this corresponds to steps 1207 to 1210 in Figure 12.

Then, as shown in steps 1211 to 1216 of Figure 12: if the speech hangover flag is 1 and the music hangover flag is 0, the current signal is classified as speech; if the music hangover flag is 1 and the speech hangover flag is 0, the current signal is classified as music; if the speech and music hangover flags are both 1 or both 0, the signal is set to uncertain. In that case, if the preceding run of music exceeds 20 frames, the signal is classified as music, and if the preceding run of speech exceeds 20 frames, the signal is classified as speech.
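The last stage of Figure 12 (steps 1211 to 1216) resolves the two hangover flags into a class. A minimal sketch, with hypothetical names and with `music_run` and `speech_run` standing for the lengths of the preceding runs of music and speech frames:

```python
SPEECH, MUSIC, UNCERTAIN = "SPEECH", "MUSIC", "UNCERTAIN"


def resolve_by_hangover(speech_hangover_flag, music_hangover_flag,
                        speech_run, music_run, run_threshold=20):
    """Map the two hangover flags (and run lengths on a tie) to a class."""
    if speech_hangover_flag == 1 and music_hangover_flag == 0:
        return SPEECH
    if music_hangover_flag == 1 and speech_hangover_flag == 0:
        return MUSIC
    # Flags agree (both 1 or both 0): fall back to the preceding context.
    if music_run > run_threshold:
        return MUSIC
    if speech_run > run_threshold:
        return SPEECH
    return UNCERTAIN
```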

After the preliminary correction, the useful signal type undergoes a final correction, shown in Figure 13, which continues to correct the class according to the current context. In step 1301, if the current context is music and is strongly persistent, lasting more than 3 seconds (that is, more than 150 consecutive music frames), a forced correction may be made according to the value of ISF_meanSD and the signal determined to be music. In step 1302, if the current context is speech and is strongly persistent, lasting more than 3 seconds (that is, more than 150 consecutive speech frames), a forced correction may be made according to the value of ISF_meanSD and the signal determined to be speech. If the class is still uncertain after that, step 1303 corrects it according to the previous context, that is, the uncertain signal is assigned the previous signal class.

Once the class of the useful signal has been determined, the three class counters and the thresholds in the signal classification decision module are updated. For the class counters, if the current classification is music (signal_sort = music), the music counter music_countinue_counter is incremented by 1; otherwise it is reset to zero. The other class counters are handled similarly, as shown in Figure 14, and are not detailed here. The thresholds are updated according to the SNR output by the PSC module; the example thresholds given in this embodiment were learned under a 20 dB SNR condition.
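The counter update amounts to "increment the counter of the current class, reset the others". A small sketch follows; the dictionary representation and the key names are assumptions for illustration, since the text names only music_countinue_counter explicitly.

```python
def update_class_counters(counters, signal_sort):
    """Increment the counter of the current class and reset all others to 0."""
    return {name: (count + 1 if name == signal_sort else 0)
            for name, count in counters.items()}
```

For example, starting from `{"music": 2, "speech": 0, "noise": 0}`, a frame classified as `"music"` gives a music count of 3, while a frame classified as `"speech"` resets the music count to 0 and sets the speech count to 1.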

The background noise parameter update module uses some of the spectral distribution parameters computed by the classification parameter extraction module of the SAD to control the update rate of the background noise. In real environments the energy level of the background noise may rise suddenly; the signal is then continuously judged to be useful, and the background noise estimate can no longer be updated. The background noise parameter update module is provided to solve this problem.

From the parameters supplied by the classification parameter extraction module, the background noise parameter update module computes a spectral distribution parameter vector with the following elements:

short-term mean of the zero-crossing rate zcr

short-term mean of the high-to-low sub-band energy ratio ra

short-term mean of the spectral fluctuation of sub-band energy f_flux

short-term mean of the temporal fluctuation of sub-band energy t_flux

The short-term mean zcr_mean is computed as follows; the others are computed similarly:

zcr_mean_m = ALPHA * zcr_mean_{m-1} + (1 - ALPHA) * zcr_m

where ALPHA = 0.96 and m is the frame index.
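This recursion is an exponential moving average. A one-function sketch (the function name is hypothetical):

```python
ALPHA = 0.96  # smoothing constant from the text


def short_term_mean(prev_mean, current, alpha=ALPHA):
    """One step of the recursion mean_m = alpha * mean_{m-1} + (1 - alpha) * x_m."""
    return alpha * prev_mean + (1.0 - alpha) * current
```

With ALPHA = 0.96 each new frame contributes 4% of the updated mean, so the average responds slowly to a single outlier frame.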

This embodiment exploits the fact that the spectral characteristics of background noise are relatively stable; the elements of the spectral distribution parameter vector are not limited to the four listed above. The update rate of the current background noise is controlled by the difference d_cb between the current spectral distribution parameters and the estimated background noise spectral distribution parameters. This difference can be computed with a distance measure such as the Euclidean distance or the Manhattan distance. One example of the invention uses the Manhattan distance (the sum of absolute element-wise differences):

$d_{cb} = \sum_{i=1}^{4} | p(i) - \tilde{p}(i) |$

where p is the spectral distribution parameter vector of the current signal and $\tilde{p}$ is the estimate of the background noise spectral distribution parameter vector.

In one example of this embodiment, when d_cb < TH1 the module outputs update rate acc1, the fastest update rate; otherwise, when d_cb < TH2, it outputs acc2; otherwise, when d_cb < TH3, it outputs acc3; otherwise it outputs acc4. TH1, TH2, TH3 and TH4 are update thresholds whose values are chosen according to the actual environment.
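The distance computation and the threshold ladder can be sketched together. The threshold values are not given in the text, so they are plain arguments here; the function names and the string labels for the rates are hypothetical.

```python
def manhattan_distance(p, p_tilde):
    """d_cb: sum of absolute differences between current and noise vectors."""
    return sum(abs(a - b) for a, b in zip(p, p_tilde))


def select_update_rate(d_cb, th1, th2, th3,
                       rates=("acc1", "acc2", "acc3", "acc4")):
    """Smaller distance to the noise model selects a faster update rate."""
    if d_cb < th1:
        return rates[0]  # closest to the noise estimate: fastest update
    if d_cb < th2:
        return rates[1]
    if d_cb < th3:
        return rates[2]
    return rates[3]      # farthest from the noise estimate
```

The ladder as described compares d_cb against TH1, TH2 and TH3 only, so three thresholds suffice to select among the four rates.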

The above describes specific embodiments of the invention. In practice the method may be adapted as appropriate to meet the needs of a particular situation. The specific embodiments described here are therefore illustrative only and are not intended to limit the scope of protection of the invention.

Claims

1. A method of classifying a sound signal, the method comprising:

A. receiving a sound signal, and determining the update rate of background noise according to the background noise spectrum distribution parameter and the sound signal spectrum distribution parameter;

B. updating the noise parameters according to the update rate, and classifying the sound signal according to the sub-band energy parameters and the updated noise parameters to obtain useful signals and non-useful signals.

2. The method of claim 1, further comprising, after step B:

C. determining the type of the useful signal obtained by the classification based on the open-loop pitch parameter, the immittance spectral frequency (ISF) parameter and the sub-band energy parameter, wherein the type comprises a speech signal and a music signal.

3. The method of claim 2, further comprising, before step C:

C0. detecting whether the noise estimate has converged; if yes, executing step C1; otherwise, executing step C;

C1. determining the type of the useful signal obtained by the classification based on the ISF parameter and the sub-band energy parameter, wherein the type comprises a speech signal and a music signal.

4. The method according to claim 3, wherein detecting in step C0 whether the noise estimate has converged comprises: judging whether the number of consecutive noise frames before the received sound signal exceeds a preset noise convergence threshold; if so, determining that the noise estimate has converged; otherwise, determining that the noise estimate has not converged.

5. The method according to claim 2, wherein step B further comprises obtaining the determined type of the useful signal, determining a signal hangover length according to the type of the useful signal, and further classifying the sound signal according to the signal hangover length.

6. The method of claim 2, wherein step C comprises:

initializing a speech flag and a music flag; preliminarily determining the type of the useful signal as speech, music or uncertain according to the pitch parameter flag, the long-term signal correlation parameter, the short-term mean ISF distance parameter and the mean sub-band energy standard deviation parameter; and modifying the speech flag and the music flag according to the preliminarily determined speech type or music type;

correcting the preliminarily determined speech, music or uncertain type according to the sub-band energy, the long-term signal correlation parameter, the mean sub-band energy standard deviation parameter, the speech flag, the music flag, whether the number of consecutive frames whose pitch parameter flag is 1 exceeds a preset hangover frame-count threshold, the number of consecutive music frames, the number of consecutive speech frames and the type of the previous frame, thereby finally determining the type of the useful signal, the type comprising a speech signal and a music signal.

7. The method of claim 6, wherein the hangover frame number threshold is adjusted based on a signal-to-noise ratio of the audio signal.

8. The method of claim 1, further comprising, after step B:

D. determining the coding mode corresponding to the classified non-useful signal, and determining according to the determined coding mode whether the ISF parameters need to be calculated.

9. The method of claim 1, wherein the noise parameters in step B comprise: a noise estimation parameter and a noise spectral distribution parameter.

10. The method according to claim 1 or 9, wherein step A comprises: calculating a difference parameter between the sound signal spectral distribution parameter and the background noise spectral distribution parameter, and then determining the update rate according to the difference parameter.

11. The method of claim 10, wherein the spectral distribution parameters used in calculating the difference parameter comprise: a short-term mean zero-crossing rate parameter, a short-term mean high-to-low sub-band energy ratio parameter, a short-term mean parameter of the spectral fluctuation of sub-band energy, and a short-term mean parameter of the temporal fluctuation of sub-band energy.

12. An apparatus for classifying a sound signal, the apparatus comprising: a background noise parameter updating module and a signal initial classification PSC module;

the background noise parameter updating module is used for determining the updating rate of the background noise according to the background noise spectrum distribution parameters and the spectrum distribution parameters of the current sound signal and sending the determined updating rate;

the PSC module is used for receiving the updating rate from the background noise parameter updating module, updating the noise parameters, classifying the current sound signals according to the sub-band energy parameters and the updated noise parameters, and sending the sound signal types determined by classification.

13. The apparatus of claim 12, further comprising: a signal classification decision module, used for receiving the sound signal type from the PSC module, determining the type of the useful signal based on the open-loop pitch parameter, the ISF parameter and the sub-band energy parameter, or based on the ISF parameter and the sub-band energy parameter, and transmitting the determined type of the useful signal.

14. The apparatus of claim 13, further comprising: a classification parameter extraction module, used for receiving the sound signal type from the PSC module and transmitting the sound signal type to the signal classification decision module; acquiring an ISF parameter and a sub-band energy parameter, or acquiring an open-loop pitch parameter, an ISF parameter and a sub-band energy parameter, processing the acquired parameters into signal classification feature parameters, and transmitting the signal classification feature parameters to the signal classification decision module; and processing the acquired parameters into the spectral distribution parameters of the sound signal and the spectral distribution parameters of the background noise, and transmitting the spectral distribution parameters to the background noise parameter update module;

the signal classification decision module determines the type of the useful signal according to the signal classification characteristic parameters and the type of the sound signal determined by the PSC module, wherein the type of the useful signal comprises a speech signal and a music signal.

15. The apparatus of claim 13 or 14, wherein the signal classification decision module comprises a parameter updating sub-module and a decision sub-module;

the parameter updating sub-module is configured to update the thresholds used in the signal classification decision according to the signal-to-noise ratio, and to provide the updated thresholds to the decision sub-module; and

the decision sub-module is configured to receive the sound signal type from the PSC module, determine the type of the useful signal based either on the open-loop pitch parameter, the immittance spectral frequency (ISF) parameter, the sub-band energy parameter and the updated thresholds, or on the ISF parameter, the sub-band energy parameter and the updated thresholds, and send the determined type of the useful signal.

16. The apparatus of claim 13, further comprising an encoder mode and rate selection module configured to receive the type of the useful signal from the signal classification decision module and to determine the encoding mode and rate of the sound signal according to the received type of the useful signal.

17. The apparatus of claim 14, further comprising an encoder parameter extraction module configured to: either extract sub-band energy parameters and transmit them to the classification parameter extraction module, or extract the sub-band energy parameters together with encoder parameters and transmit both to the classification parameter extraction module; and extract the sub-band energy parameters and transmit them to the PSC module; wherein the encoder parameters comprise an immittance spectral frequency (ISF) parameter and an open-loop pitch parameter.
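
The data flow of claim 12 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the claims do not specify how the update rate is computed or how the classification is thresholded, so the distance-based rate rule, the exponential smoothing, the `snr_threshold` value, and the names `NoiseState`, `update_rate` and `psc_classify` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class NoiseState:
    """Running background-noise estimate, one value per sub-band."""
    subband_noise: List[float]


def update_rate(noise_dist: Sequence[float], signal_dist: Sequence[float]) -> float:
    """Background noise parameter updating module (hypothetical rule).

    Assumption: the rate falls as the current frame's spectral
    distribution moves away from that of the background noise.
    The claims do not give this formula.
    """
    distance = sum(abs(n - s) for n, s in zip(noise_dist, signal_dist))
    return 1.0 / (1.0 + distance)  # always in (0, 1]


def psc_classify(state: NoiseState, subband_energy: Sequence[float],
                 rate: float, snr_threshold: float = 2.0) -> str:
    """PSC module: update the noise parameters, then classify the frame."""
    # Assumed update: exponential smoothing toward the current frame,
    # weighted by the rate received from the updating module.
    state.subband_noise = [
        (1.0 - rate) * n + rate * e
        for n, e in zip(state.subband_noise, subband_energy)
    ]
    # Assumed decision: mean per-band energy-to-noise ratio, computed
    # against the updated noise estimate, compared with a fixed threshold.
    eps = 1e-12
    ratios = [e / max(n, eps) for e, n in zip(subband_energy, state.subband_noise)]
    snr = sum(ratios) / len(ratios)
    return "useful" if snr > snr_threshold else "noise"
```

Under these assumptions, a frame whose spectral distribution matches the background noise yields a rate of 1, so the noise estimate tracks the frame completely and the frame is classified as noise. A frame with a very different distribution yields a small rate, the noise estimate barely moves, and a high-energy frame is classified as a useful signal that would then be passed to the signal classification decision module of claim 13.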