CN107068155A - A multi-level audio transient/steady-state decision method based on variance and time-domain peak - Google Patents

A multi-level audio transient/steady-state decision method based on variance and time-domain peak

Publication number
CN107068155A
Authority
CN
China
Prior art keywords
peak
variance
frame
signal
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710051981.6A
Other languages
Chinese (zh)
Inventor
张涛
唐伟
丁碧云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201710051981.6A
Publication of CN107068155A
Legal status: Pending

Abstract

A multi-level audio transient/steady-state decision method based on variance and time-domain peak, comprising: dividing the original audio signal into frames of 1024 samples; computing the mean and variance of each frame and comparing the variance with a preset variance threshold, where a frame whose variance is less than or equal to the variance threshold has its decision flag set to 1 and is output as a steady-state frame; applying a peak detection algorithm to each frame whose variance is greater than the variance threshold; and deciding on the result of the peak detection algorithm, outputting the steady-state frame flag if the decision flag is 1 and the transient frame flag if the decision flag is 0. The invention uses the variance and time-domain peak of the signal to decide between transient and steady-state audio, and the resulting decision is used for adaptive window switching; the decision has low complexity and high accuracy. The method is simpler in terms of time complexity, avoids running detection on low-energy signals, and thereby improves detection accuracy.

Description

A Multi-level Audio Transient/Steady-State Decision Method Based on Variance and Time-Domain Peak

Technical Field

The invention relates to a method for deciding between transient and steady-state audio, and in particular to a multi-level audio transient/steady-state decision method based on variance and time-domain peak.

Background Art

Most existing audio coding standards are based on a psychoacoustic model and exploit the masking effect of the human ear to compress the original audio signal, which is known as perceptual audio coding. Examples include AC-3, AAC, and MPEG-2, which are widely used around the world, and AVS and DRA, for which China holds independent intellectual property rights. Current mainstream audio coding standards generally process the signal with window functions: the window function divides the signal into data blocks, each block is processed separately, and quantization and entropy coding produce the final output bitstream.

In block-based audio coding, pre-echo has long been a difficult problem. Its root cause is transient content in the audio signal. When a transient is transformed from the time domain to the frequency domain, it produces a large number of high-frequency components, and at a fixed output bit rate quantization noise is unavoidable. After the inverse transform, this quantization noise spreads out in the time domain. Because temporal pre-masking acts over only a very short time, part of the noise is not masked, so noise that the human ear can clearly perceive often appears in the low-energy sample segment and seriously degrades the sound quality of the signal.

As living standards rise, expectations for digital audio and video keep rising as well. At decoding, pre-echo shows up as a clicking noise that the human ear can recognize, which seriously degrades the sound quality of the whole signal. This runs directly against the demand for ever better sound quality and is therefore hard to accept. In addition, newer audio standards represent the detail of the audio signal more finely, which calls for more precise algorithms to distinguish transient from steady-state signals. Algorithms that accurately separate transient and steady-state signals, and that accurately determine the position and strength of a transient, are therefore very important to the whole audio encoding process.

One way to address pre-echo is adaptive window switching: the audio signal is classified before encoding, and different window functions are applied to different types of signal. Accurate detection of transient signals is a prerequisite for adaptive window switching, so a method that accurately separates transient from steady-state signals and accurately determines the position and strength of a transient is of real value.
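As a minimal sketch of how a per-frame decision could drive window switching, the snippet below maps decision flags to window labels. The function name `window_sequence` and the labels "long" and "short" are hypothetical and introduced here only for illustration; the source does not specify which window functions are used or how transitions between them are handled.

```python
def window_sequence(flags):
    """Map per-frame decision flags to hypothetical window labels.

    flags: iterable of 1 (steady-state frame) or 0 (transient frame).
    A steady-state frame gets one long window; a transient frame gets
    short windows, which confine quantization noise to a shorter span.
    """
    return ["long" if flag == 1 else "short" for flag in flags]


if __name__ == "__main__":
    print(window_sequence([1, 1, 0, 1]))
```

A real encoder would typically also need transition windows between the two types; that detail is outside what the text describes and is left out here.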

MPEG uses a transient detection method based on perceptual entropy. The principle is as follows: if the signal is transient, the transformed spectrum contains a large number of high-frequency components, and these raise the perceptual entropy of the signal. When the perceptual entropy exceeds a threshold (the reference value in the MPEG series is 1800), the current frame is judged to contain a transient component and is classified as a transient frame.

The AVS coding standard detects the transient characteristics of the audio signal with a transient/steady-state detection algorithm that uses time-domain energy and frequency-domain unpredictability as its decision metrics.

Current transient/steady-state decision methods all have shortcomings. Transient detection based on perceptual entropy produces many redundant decisions, and the algorithm is complex and its coding efficiency is low. The method based on time-domain energy and frequency-domain unpredictability has the problem that a high-energy frame interferes with accurate detection of the next frame, causing false detections.

Summary of the Invention

The technical problem to be solved by the invention is to provide a multi-level audio transient/steady-state decision method based on variance and time-domain peak that avoids running detection on low-energy signals and improves detection accuracy.

The technical solution adopted by the invention is a multi-level audio transient/steady-state decision method based on variance and time-domain peak, comprising the following steps:

1) Divide the original audio signal into frames of 1024 samples;

2) Compute the mean and variance of each frame and compare the variance with a preset variance threshold; if the variance is less than or equal to the variance threshold, set the decision flag of the frame to 1 and output the steady-state frame flag; otherwise go to the next step;

3) For frames whose variance is greater than the variance threshold, apply a peak detection algorithm;

4) Decide on the result of the peak detection algorithm: if the decision flag is 1, output the steady-state frame flag; if the decision flag is 0, output the transient frame flag.

2. The multi-level audio transient/steady-state decision method based on variance and time-domain peak according to claim 1, characterized in that step 3) comprises:

(1) Split the 1024 samples of each frame into first-level blocks of 256 samples, giving 4 data blocks;

(2) Compute the maximum peak of each data block and compare it with a preset quiet threshold; if the maximum peaks of all data blocks are less than or equal to the quiet threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(3) Split the same frame into second-level blocks of 128 samples, giving 8 data blocks;

(4) Compute the maximum peak change rate between the 8 data blocks and compare it with a preset first threshold for the maximum peak change rate; if the maximum peak change rates of all data blocks are less than or equal to the first threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(5) Split the same frame into third-level blocks of 64 samples, giving 16 data blocks;

(6) Compute the maximum peak change rate between the 16 data blocks and compare it with a preset second threshold for the maximum peak change rate; if the maximum peak change rates of all data blocks are less than or equal to the second threshold, set the decision flag of the corresponding frame to 1; otherwise set the decision flag of the corresponding frame to 0.
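The three-level peak check in sub-steps (1) to (6) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not define "maximum peak" or "maximum peak change rate", so the sketch assumes the peak is the largest absolute sample value in a block and the change rate is the relative jump between neighbouring block peaks. All function names, the `eps` guard, and the threshold parameters are hypothetical.

```python
import numpy as np


def block_peaks(frame, block_len):
    """Largest absolute sample value of each consecutive block (assumed peak definition)."""
    blocks = np.abs(np.asarray(frame, dtype=float)).reshape(-1, block_len)
    return blocks.max(axis=1)


def max_peak_change_rate(peaks, eps=1e-12):
    """Largest relative jump between neighbouring block peaks (assumed definition)."""
    prev, nxt = peaks[:-1], peaks[1:]
    # eps avoids division by zero when a block is completely silent.
    return float(np.max(np.abs(nxt - prev) / (prev + eps)))


def peak_detection(frame, quiet_thr, rate_thr1, rate_thr2):
    """Return the decision flag for one 1024-sample frame: 1 = steady, 0 = transient."""
    if len(frame) != 1024:
        raise ValueError("expected a frame of 1024 samples")
    # Level 1: four 256-sample blocks; a frame that is quiet everywhere is steady.
    if np.all(block_peaks(frame, 256) <= quiet_thr):
        return 1
    # Level 2: eight 128-sample blocks, compared against the first rate threshold.
    if max_peak_change_rate(block_peaks(frame, 128)) <= rate_thr1:
        return 1
    # Level 3: sixteen 64-sample blocks, compared against the second rate threshold.
    if max_peak_change_rate(block_peaks(frame, 64)) <= rate_thr2:
        return 1
    return 0


if __name__ == "__main__":
    click = np.zeros(1024)
    click[600:] = 0.8  # abrupt onset in the middle of the frame
    print(peak_detection(click, quiet_thr=0.01, rate_thr1=0.5, rate_thr2=0.5))
```

Each level exits early with flag 1, so the finer 64-sample analysis only runs on frames that the coarser levels could not clear. The threshold values in the usage lines are placeholders; the source gives no numeric values.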

The multi-level audio transient/steady-state decision method of the invention uses the variance and time-domain peak of the signal to decide between transient and steady-state audio, and the resulting decision is used for adaptive window switching. The decision has low complexity and high accuracy: the method is simpler in terms of time complexity, and it avoids running detection on low-energy signals, which improves detection accuracy.
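Written out, the per-frame statistics of step 2) and the block peak used by the peak detection algorithm might look as follows. This is a sketch under assumptions: the source does not give the formulas, so the symbols $x_k(n)$ (sample $n$ of frame $k$), $N$, $\mu_k$, $\sigma_k^2$, $T_{\mathrm{var}}$, $P_j$ and $L$ are introduced here for illustration, the variance is assumed to divide by $N$, and the peak is assumed to be the largest absolute sample value in a block.

\[
\mu_k = \frac{1}{N}\sum_{n=0}^{N-1} x_k(n), \qquad
\sigma_k^2 = \frac{1}{N}\sum_{n=0}^{N-1}\bigl(x_k(n)-\mu_k\bigr)^2, \qquad N = 1024
\]

\[
\sigma_k^2 \le T_{\mathrm{var}} \;\Rightarrow\; \text{decision flag} = 1, \qquad
P_j = \max_{jL \le n < (j+1)L} \lvert x_k(n) \rvert, \quad L \in \{256,\ 128,\ 64\}
\]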

Brief Description of the Drawings

Fig. 1 is a flowchart of the multi-level audio transient/steady-state decision method based on variance and time-domain peak of the invention;

Fig. 2 is a flowchart of the peak detection algorithm of the invention;

Fig. 3 shows the results of the transient/steady-state decision method.

Detailed Description

The multi-level audio transient/steady-state decision method based on variance and time-domain peak of the invention is described in detail below with reference to an embodiment and the drawings.

As shown in Fig. 1, the multi-level audio transient/steady-state decision method based on variance and time-domain peak of the invention comprises the following steps:

1) Divide the original audio signal into frames of 1024 samples;

2) Compute the mean and variance of each frame and compare the variance with a preset variance threshold; if the variance is less than or equal to the variance threshold, set the decision flag of the frame to 1; otherwise go to the next step;

3) For frames whose variance is greater than the variance threshold, apply a peak detection algorithm, which comprises:

(1) Split the 1024 samples of each frame into first-level blocks of 256 samples, giving 4 data blocks;

(2) Compute the maximum peak of each data block and compare it with a preset quiet threshold; if the maximum peaks of all data blocks are less than or equal to the quiet threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(3) Split each data block whose maximum peak is greater than the quiet threshold into second-level blocks of 128 samples, giving 2 data blocks from each;

(4) Compute the maximum peak change rate between the 8 data blocks and compare it with a preset first threshold for the maximum peak change rate; if the maximum peak change rates of all data blocks are less than or equal to the first threshold, set the decision flag of the corresponding frame to 1; otherwise go to the next step;

(5) Split the same frame into third-level blocks of 64 samples, giving 16 data blocks;

(6) Compute the maximum peak change rate between the 16 data blocks and compare it with a preset second threshold for the maximum peak change rate; if the maximum peak change rates of all data blocks are less than or equal to the second threshold, set the decision flag of the corresponding frame to 1; otherwise set the decision flag of the corresponding frame to 0.

4) Decide on the result of the peak detection algorithm: if the decision flag is 1, output the steady-state frame flag; if the decision flag is 0, output the transient frame flag.
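The outer flow of steps 1), 2) and 4) can be sketched as below. It is a minimal illustration, not the patent's implementation: `split_frames`, `classify_frames`, `var_thr` and the `peak_detector` callable are hypothetical names, dropping a trailing partial frame is an assumption (the source does not say how leftover samples are handled), and the variance is assumed to be the population variance that `numpy.var` computes by default.

```python
import numpy as np

FRAME_LEN = 1024  # frame size stated in step 1)


def split_frames(signal, frame_len=FRAME_LEN):
    """Cut the signal into consecutive non-overlapping frames.

    Assumption: samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def classify_frames(signal, var_thr, peak_detector):
    """Return one flag per frame: 1 = steady-state frame, 0 = transient frame.

    peak_detector is a hypothetical callable taking a frame and returning 0 or 1;
    it stands in for the peak detection algorithm of step 3).
    """
    flags = []
    for frame in split_frames(signal):
        if np.var(frame) <= var_thr:
            # Step 2): low-variance frames are steady and skip peak detection.
            flags.append(1)
        else:
            # Steps 3) and 4): the peak detector decides the remaining frames.
            flags.append(int(peak_detector(frame)))
    return flags


if __name__ == "__main__":
    silence = np.zeros(FRAME_LEN)
    busy = np.tile([1.0, -1.0], FRAME_LEN // 2)
    # A detector that always answers "transient", used only to show the gating.
    print(classify_frames(np.concatenate([silence, busy]), 0.01, lambda f: 0))
```

Because the variance test runs first, the more expensive block analysis is only reached by frames that carry enough energy variation, which matches the text's claim that low-energy signals are kept out of detection.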

Claims (2)

CN201710051981.6A | 2017-01-23 | 2017-01-23 | A multi-level audio transient/steady-state decision method based on variance and time-domain peak | Pending | CN107068155A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710051981.6A (published as CN107068155A (en)) | 2017-01-23 | 2017-01-23 | A multi-level audio transient/steady-state decision method based on variance and time-domain peak

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710051981.6A (published as CN107068155A (en)) | 2017-01-23 | 2017-01-23 | A multi-level audio transient/steady-state decision method based on variance and time-domain peak

Publications (1)

Publication Number | Publication Date
CN107068155A (en) | 2017-08-18

Family

ID=59598441

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710051981.6A (Pending; published as CN107068155A (en)) | A multi-level audio transient/steady-state decision method based on variance and time-domain peak | 2017-01-23 | 2017-01-23

Country Status (1)

Country | Link
CN (1) | CN107068155A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20070192102A1 (en) * | 2006-01-24 | 2007-08-16 | Samsung Electronics Co., Ltd. | Method and system for aligning windows to extract peak feature from a voice signal
CN101136202A (en) * | 2006-08-29 | 2008-03-05 | 华为技术有限公司 | Audio signal processing system, method and audio signal transceiving device
CN101308605A (en) * | 2008-05-23 | 2008-11-19 | 陈汇鑫 | Method implementing traffic dynamic management by vehicle information recognition
CN101894557A (en) * | 2010-06-12 | 2010-11-24 | 北京航空航天大学 | Method for discriminating window type of AAC codes
CN102280103A (en) * | 2011-08-02 | 2011-12-14 | 天津大学 | Audio signal transient-state segment detection method based on variance

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张丽娜: "Master's Degree Thesis", 30 May 2015 *

Similar Documents

Publication | Title
Chou et al. | Robust singing detection in speech/music discriminator design
CN108831499A | Utilize the sound enhancement method of voice existing probability
CN102446504B (en) | Speech/music recognition method and device
CN104021789A | Self-adaption endpoint detection method using short-time time-frequency value
RU2015136792A (en) | AUDIO CODER, AUDIO DECODER, METHOD FOR PROCESSING ENCODED AUDIO INFORMATION, METHOD FOR PROVIDING DECODED AUDIO INFORMATION, COMPUTER PROGRAM AND ENCODED REPRESENTATION USING SIGNAL-RESISTANCE
JP6493889B2 (en) | Method and apparatus for detecting an audio signal
WO2005034080A3 (en) | A method of making a window type decision based on mdct data in audio encoding
CN101763856A | Signal classifying method, classifying device and coding system
CN101930746A | An Adaptive Noise Reduction Method for MP3 Compressed Domain Audio
CN110265065A | A kind of method and speech terminals detection system constructing speech detection model
CN102610232B | An Adaptive Audio Perceptual Loudness Adjustment Method
CN105869658B | A Speech Endpoint Detection Method Using Nonlinear Features
CN103680509B | A kind of voice signal discontinuous transmission and ground unrest generation method
JPH07505732A | Method and apparatus for encoding/decoding background sound
CN101256772A | Method and device for determining the category of non-noise audio signal
JP2015537254A | Encoding method, decoding method, encoding device, and decoding device
CN103295577B | Analysis window switching method and device for audio signal coding
CN105741853A | Digital speech perception hash method based on formant frequency
CN107068155A | A kind of temporary stable state decision method of multistage audio based on variance and time domain peak
CN118298827A | Edge intelligent voice recognition method and system device
McClellan et al. | Spectral entropy: An alternative indicator for rate allocation?
Jiao et al. | MDCT-based perceptual hashing for compressed audio content identification
CN104134443A | Symmetrical ternary string represented voice perception Hash sequence constructing and authenticating method
CN101388213B | A kind of pre-echo control method
Qiuyu et al. | An efficient speech perceptual hashing authentication algorithm based on DWT and symmetric ternary string

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2017-08-18
