CN107068155A


Info

Publication number: CN107068155A
Application number: CN201710051981.6A
Authority: CN
Inventors: 张涛; 唐伟; 丁碧云
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2017-01-23
Filing date: 2017-01-23
Publication date: 2017-08-18

Abstract


A multi-level audio transient/steady-state decision method based on variance and time-domain peak values, comprising: dividing the original audio signal into frames of 1024 samples; computing the mean and variance of each frame and comparing the variance with a preset variance threshold, where a frame whose variance is less than or equal to the threshold has its decision flag set to 1 and is output as a steady-state frame; applying a peak detection algorithm to frames whose variance exceeds the threshold; and evaluating the result of the peak detection algorithm, outputting a steady-state frame flag if the decision flag is 1 and a transient frame flag if it is 0. The invention uses the variance and the time-domain peak values of the signal to decide between transient and steady states, and the resulting decisions are used for adaptive window switching. The decision has low complexity and high accuracy: it is cheaper in time complexity, and it avoids running detection on low-energy signals, which improves detection accuracy.

Description


A Multi-level Audio Transient/Steady-State Decision Method Based on Variance and Time-Domain Peak Values

Technical field

The invention relates to a method for deciding whether audio is transient or steady-state, and in particular to a multi-level audio transient/steady-state decision method based on variance and time-domain peak values.

Background art

Most existing audio coding standards are built on psychoacoustic models: they exploit the masking effect of the human ear to compress the original audio signal, an approach known as perceptual audio coding. Examples include AC-3, AAC and MPEG-2, which are used widely around the world, as well as AVS and DRA, for which China holds independent intellectual property rights. Current mainstream audio coding standards generally process the signal with window functions: the window divides the signal into data blocks, each block is processed separately, and quantization and entropy coding produce the final output bitstream.

Pre-echo has long been a hard problem in block-based audio coding. Its root cause is transient content in the audio signal. When a transient is transformed from the time domain to the frequency domain it produces a large number of high-frequency components, and at a fixed output bit rate quantization noise is unavoidable. After the inverse transform back to the time domain, this quantization noise spreads out. Because pre-masking (the masking of sound that precedes a louder sound) acts only for a very short time, part of the noise is not masked. The result is often clearly audible noise in the low-energy samples, which seriously degrades the sound quality of the signal.

As living standards rise, expectations for digital audio and video keep rising too. Pre-echo shows up in the decoded signal as an audible crackling sound that seriously degrades the quality of the whole signal; this runs directly counter to the demand for ever better sound quality and is therefore hard to accept. In addition, newer audio standards represent signal detail more finely, which calls for more precise algorithms to tell transient and steady-state signals apart. Algorithms that can accurately distinguish transient from steady-state signals, and accurately determine the position and strength of a transient, are therefore very important to the whole audio encoding process.

One way to address pre-echo is adaptive window switching: the audio signal is classified before encoding, and different window functions are used for different signal types. Accurate detection of transient signals is a prerequisite for adaptive window switching, so a method that accurately distinguishes transient from steady-state signals and accurately determines the position and strength of transients is of real value.
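The idea of adaptive window switching can be sketched as a mapping from per-frame decisions to window choices. The sketch below is a simplification under stated assumptions: the names `select_windows`, `LONG_WINDOW` and `SHORT_WINDOW` are hypothetical, the flag convention (1 for steady state, 0 for transient) follows the decision flags used by the method described in this document, and the start and stop transition windows that codecs such as AAC place between long and short blocks are omitted.

```python
LONG_WINDOW = "long"    # hypothetical label: one long window for a steady-state frame
SHORT_WINDOW = "short"  # hypothetical label: several short windows for a transient frame


def select_windows(frame_flags):
    """Map per-frame decision flags (1 = steady state, 0 = transient) to window labels."""
    return [LONG_WINDOW if flag == 1 else SHORT_WINDOW for flag in frame_flags]


# Example: the third frame was flagged as transient.
print(select_windows([1, 1, 0, 1]))
```

Short windows confine quantization noise to a shorter time span around the transient, which is why the quality of the transient decision matters for pre-echo.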

MPEG uses a transient detection method based on perceptual entropy. The principle is as follows: if the signal is transient, the transformed spectrum contains a large number of high-frequency components, which raise the perceptual entropy of the signal. When the perceptual entropy exceeds a threshold (the reference value in the MPEG series is 1800), the current frame is judged to contain a transient component and is classified as a transient frame.
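The final comparison in the perceptual-entropy approach can be written in a few lines. This sketch assumes the perceptual entropy of the frame has already been computed by a psychoacoustic model, which is the expensive part and is not shown; the function name `is_transient_by_pe` is hypothetical.

```python
PE_THRESHOLD = 1800  # reference value quoted above for the MPEG series


def is_transient_by_pe(perceptual_entropy, threshold=PE_THRESHOLD):
    """Return True when the frame's perceptual entropy exceeds the threshold."""
    return perceptual_entropy > threshold
```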

The AVS coding standard detects the transient characteristics of the audio signal with a transient/steady-state detection algorithm that uses time-domain energy and frequency-domain unpredictability as its decision criteria.

Current transient/steady-state decision methods all have shortcomings. Perceptual-entropy-based transient detection produces many redundant decisions, and its algorithm is complex, which lowers coding efficiency. The method based on time-domain energy and frequency-domain unpredictability suffers from a problem in which a high-energy preceding frame interferes with accurate detection of the following frame, causing false detections.

Summary of the invention

The technical problem to be solved by the invention is to provide a multi-level audio transient/steady-state decision method based on variance and time-domain peak values that avoids running detection on low-energy signals and improves detection accuracy.

The technical solution adopted by the invention is a multi-level audio transient/steady-state decision method based on variance and time-domain peak values, comprising the following steps:

1) Divide the original audio signal into frames of 1024 samples;

2) Compute the mean and variance of each frame and compare the variance with a preset variance threshold; for a frame whose variance is less than or equal to the variance threshold, set the frame's decision flag to 1 and output a steady-state frame flag; otherwise go to the next step;

3) For frames whose variance is greater than the variance threshold, apply a peak detection algorithm;

4) Evaluate the result of the peak detection algorithm: if the decision flag is 1, output a steady-state frame flag; if the decision flag is 0, output a transient frame flag.
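Steps 1) to 4) can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the patent's implementation: the function names are hypothetical, the variance threshold value is left to the caller because the text does not give one, a trailing partial frame is simply dropped, and the peak detection algorithm of step 3) is passed in as a callable that returns 1 or 0.

```python
from statistics import fmean, pvariance

FRAME_LEN = 1024  # frame size stated in step 1)


def split_frames(samples, frame_len=FRAME_LEN):
    """Step 1: split into full frames; a trailing partial frame is dropped (assumption)."""
    n_full = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_full)]


def variance_stage(frame, var_threshold):
    """Step 2: return 1 (steady) if variance <= threshold, else None (undecided)."""
    mean = fmean(frame)
    variance = pvariance(frame, mu=mean)
    return 1 if variance <= var_threshold else None


def classify_frames(samples, var_threshold, peak_detector):
    """Steps 2-4: label every frame as 'steady' or 'transient'."""
    labels = []
    for frame in split_frames(samples):
        flag = variance_stage(frame, var_threshold)
        if flag is None:
            # Step 3: only frames with high variance reach the peak detector.
            flag = peak_detector(frame)
        # Step 4: translate the decision flag into the output label.
        labels.append("steady" if flag == 1 else "transient")
    return labels
```

Putting the cheap variance test first means low-variance frames never reach the peak detector, which matches the stated goal of not running detection on low-energy signals.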

2. The multi-level audio transient/steady-state decision method based on variance and time-domain peak values according to claim 1, wherein step 3) comprises:

(1) Split the 1024 samples of each frame into first-level blocks of 256 samples, giving 4 data blocks;

(2) Compute the maximum peak of each data block and compare it with a preset quiet threshold; if the maximum peaks of all data blocks are less than or equal to the quiet threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(3) Split the same frame into second-level blocks of 128 samples, giving 8 data blocks;

(4) Compute the maximum-peak change rates between the 8 data blocks and compare them with a preset first threshold for the maximum-peak change rate; if the change rates of all data blocks are less than or equal to the first threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(5) Split the same frame into third-level blocks of 64 samples, giving 16 data blocks;

(6) Compute the maximum-peak change rates between the 16 data blocks and compare them with a preset second threshold for the maximum-peak change rate; if the change rates of all data blocks are less than or equal to the second threshold, set the decision flag of the corresponding frame to 1; otherwise set the decision flag of the corresponding frame to 0.
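The six sub-steps can be sketched as follows. This is a minimal illustration, not a reference implementation: the text does not define how a block's maximum peak or the peak change rate is computed, so the sketch assumes the peak is the largest absolute sample value in a block and the change rate is the relative difference between the peaks of adjacent blocks. The function names and all threshold values are hypothetical.

```python
def block_peaks(frame, block_len):
    """Largest absolute sample value in each consecutive block (assumed peak definition)."""
    return [max(abs(x) for x in frame[i:i + block_len])
            for i in range(0, len(frame), block_len)]


def max_peak_change(peaks, eps=1e-12):
    """Largest relative change between adjacent block peaks (assumed definition)."""
    return max(abs(b - a) / max(a, eps) for a, b in zip(peaks, peaks[1:]))


def peak_detection(frame, quiet_thr, rate_thr1, rate_thr2):
    """Return decision flag 1 (steady state) or 0 (transient) for a 1024-sample frame."""
    # Sub-steps (1)-(2): 4 blocks of 256 samples, quiet check.
    if all(p <= quiet_thr for p in block_peaks(frame, 256)):
        return 1
    # Sub-steps (3)-(4): 8 blocks of 128 samples, first change-rate threshold.
    if max_peak_change(block_peaks(frame, 128)) <= rate_thr1:
        return 1
    # Sub-steps (5)-(6): 16 blocks of 64 samples, second change-rate threshold.
    if max_peak_change(block_peaks(frame, 64)) <= rate_thr2:
        return 1
    return 0
```

Each level returns as soon as it can declare the frame steady, so only frames that still look suspicious at a coarse resolution pay for the finer-grained checks.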

The multi-level audio transient/steady-state decision method based on variance and time-domain peak values of the invention uses the variance and the time-domain peak values of the signal to decide between transient and steady states, and the resulting decisions are used for adaptive window switching. The decision has low complexity and high accuracy: it is cheaper in time complexity, and it avoids running detection on low-energy signals, which improves detection accuracy.

Brief description of the drawings

Fig. 1 is a flowchart of the multi-level audio transient/steady-state decision method based on variance and time-domain peak values according to the invention;

Fig. 2 is a flowchart of the peak detection algorithm of the invention;

Fig. 3 shows the results of the transient/steady-state decision method.

Detailed description

The multi-level audio transient/steady-state decision method based on variance and time-domain peak values of the invention is described in detail below with reference to an embodiment and the drawings.

As shown in Fig. 1, the multi-level audio transient/steady-state decision method based on variance and time-domain peak values of the invention comprises the following steps:

2) Compute the mean and variance of each frame and compare the variance with a preset variance threshold; for a frame whose variance is less than or equal to the variance threshold, set the frame's decision flag to 1; otherwise go to the next step;

3) For frames whose variance is greater than the variance threshold, apply a peak detection algorithm, which includes:

(3) Split each data block whose maximum peak is greater than the quiet threshold into second-level blocks of 128 samples, giving 2 data blocks;

Claims

1. A multi-level audio transient/steady-state decision method based on variance and time-domain peak values, characterized in that it comprises the following steps:

1) dividing the original audio signal into frames of 1024 samples;

2) computing the mean and variance of each frame and comparing the variance with a preset variance threshold; for a frame whose variance is less than or equal to the variance threshold, setting the decision flag of the frame to 1 and outputting a steady-state frame flag; otherwise proceeding to the next step;

3) for frames whose variance is greater than the variance threshold, applying a peak detection algorithm;

4) evaluating the result of the peak detection algorithm: if the decision flag is 1, outputting a steady-state frame flag; if the decision flag is 0, outputting a transient frame flag.

2. The multi-level audio transient/steady-state decision method based on variance and time-domain peak values according to claim 1, characterized in that step 3) comprises:

(1) splitting the 1024 samples of each frame into first-level blocks of 256 samples, giving 4 data blocks;

(2) computing the maximum peak of each data block and comparing it with a preset quiet threshold; if the maximum peaks of all data blocks are less than or equal to the quiet threshold, setting the decision flag of the corresponding frame to 1; otherwise proceeding to the next step;

(3) splitting the same frame into second-level blocks of 128 samples, giving 8 data blocks;

(4) computing the maximum-peak change rates between the 8 data blocks and comparing them with a preset first threshold for the maximum-peak change rate; if the change rates of all data blocks are less than or equal to the first threshold, setting the decision flag of the corresponding frame to 1; otherwise proceeding to the next step;

(5) splitting the same frame into third-level blocks of 64 samples, giving 16 data blocks;

(6) computing the maximum-peak change rates between the 16 data blocks and comparing them with a preset second threshold for the maximum-peak change rate; if the change rates of all data blocks are less than or equal to the second threshold, setting the decision flag of the corresponding frame to 1; otherwise setting the decision flag of the corresponding frame to 0.