CN106601249B - A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics - Google Patents


Info

Publication number
CN106601249B
CN106601249B (application CN201611026399.6A)
Authority
CN
China
Prior art keywords
speech
gamma
filter
channel
expression
Prior art date
Legal status
Active
Application number
CN201611026399.6A
Other languages
Chinese (zh)
Other versions
CN106601249A (en)
Inventor
李冬梅
杨有为
贾瑞
刘润生
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201611026399.6A
Publication of CN106601249A
Application granted
Publication of CN106601249B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese

The invention discloses a real-time digital-speech decomposition/synthesis method based on auditory perception characteristics, in the field of speech signal processing. The method builds an N-th order gammatone filter from N cascaded second-order band-pass filters and, from this, constructs a gammatone digital filter model of arbitrary order together with its parameters. In the speech decomposition stage, M gammatone filter channels, implemented with either floating-point or fixed-point arithmetic, split the input speech into M signals. In the speech synthesis stage, a delay is introduced into the gammatone filter bank to better match the human ear, whose basilar-membrane delay is inversely proportional to frequency, and the speech is then resynthesized. By referring to the equal-loudness curves of the human ear, the invention improves the speech decomposition/synthesis method so that the final synthesized speech approaches the effect of an ideal band-pass filter. The invention can be applied in speech devices such as mobile phones, cochlear implants and hearing aids.

Description

A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics

Technical Field

The invention belongs to the field of digital speech signal processing, and in particular relates to a real-time decomposition/synthesis method of digital speech based on auditory perception characteristics.

Background

Everyday environments contain many kinds of noise. The performance of devices such as speech-enhancement and speech-recognition systems deteriorates markedly in noisy environments, which limits their applications. The human ear, by contrast, still works normally in noise and shows strong sensitivity to sound and strong resistance to interference. There is therefore a pressing need to reproduce the auditory perception characteristics of the human ear, and of the basilar membrane in particular, in speech signal processing systems. The perceptual characteristics of the basilar membrane of the human ear are:

1. Frequency selectivity: every frequency has a corresponding resonance point on the basilar membrane. Higher-frequency sounds cause larger vibrations near the base of the basilar membrane, while lower-frequency sounds produce their strongest response at the apex.

2. Frequency analysis: the basilar membrane decomposes the frequencies in a sound and maps them to different positions along the membrane, producing a frequency map; it also converts sound intensity into vibration amplitude at the corresponding position. The membrane thus separates sound components of different amplitudes and frequencies and generates the corresponding neural information, effectively encoding frequency and intensity, so that the brain can analyze and summarize this information to form different auditory percepts.

3. Bandwidth characteristics: the filtering characteristics differ at every position along the basilar membrane. The apex of the basilar membrane is sensitive to low frequencies, with high resolution and small bandwidth at low frequencies; the base is sensitive to high frequencies, with high resolution and large bandwidth at high frequencies.

The filtering characteristic at each position of the basilar membrane can be described by an auditory filter, so the way the auditory system processes speech can be simulated by a bank of auditory filters. Auditory filters are a class of filters derived by fitting psychoacoustic experimental data on the auditory system. Using such a filter bank, speech can be decomposed into different subbands, enabling both the decomposition and the synthesis of speech.

To describe the bandwidth of an auditory filter, research often uses the concept of the equivalent rectangular bandwidth (ERB): for the same white-noise input, the ERB is the bandwidth of a rectangular filter that passes the same energy as the filter under test. The ERB is roughly linear in the auditory filter's center frequency fc, as described by expression (1-1):

ERB(fc) = 24.7(1 + 4.37fc/1000)    (1-1)

The center frequencies fc of the M filters correspond to M positions on the basilar membrane of the human ear, where they are uniformly distributed. To describe this distribution, the ERB-rate domain (ERBs) is used: values on the ERBs scale are first obtained from expression (1-2), the ERBs range is then divided evenly, and finally the center frequencies fc are recovered by inverting the mapping.

ERBs(fc) = 21.4·log10(1 + 4.37fc/1000)    (1-2)

A typical auditory filter bank consists of M gammatone filters (a gammatone filterbank); the time-domain expression of each gammatone filter is:

g(t) = A·t^(N-1)·e^(-2πbt)·cos(2πfc·t + φ)·u(t)    (1-3)

Here u(t) is the unit step function. The parameter A is generally a fixed value used mainly for normalization. N is the filter order and controls the relative shape of the gammatone envelope; it is usually set to N = 4. b is the bandwidth of the function and controls its time-domain spread: the larger b, the narrower the range over which the function oscillates, with b = ERB(fc). fc is the filter's center frequency. φ is the initial phase; since φ has little effect on filter performance and the human ear is insensitive to phase, φ is usually set to 0.

Applying the Laplace transform to expression (1-3) gives the s-domain expression:

[Expressions (1-4)-(1-8): the s-domain transfer function GT(s) of the fourth-order gammatone filter and its factorization into cascaded second-order band-pass sections; formula images not recoverable.]

where Bc = 2πb and wc = 2πfc;

gn is the normalization parameter;

Applying the impulse-invariance method to expression (1-4) yields the z-domain expression of the digital filter:

Hn(z) = (1 + an·z^-1) / (1 + b1·z^-1 + b2·z^-2)    (1-9)

[Expression (1-10): the stage tap coefficients an, computed from the zeros introduced by the impulse-invariance mapping; formula image not recoverable.]

b1 = -2·e^(-Bc·T)·cos(wc·T),  b2 = e^(-2Bc·T),  with T = 1/fs    (1-11)

From expression (1-9) the time-domain iterative equation (1-13) is obtained. The filter structure, shown in Figure 1, consists of a four-stage cascade, where a1-a4, b1, b2 are the tap coefficients of each stage and g1-g4 are the per-stage normalization coefficients; the boxes denote unit-delay operations in the z-domain. At each stage the input signal is weighted by the tap coefficients, delayed and summed before being passed to the next stage.

x1(k) = x(k)    (1-12)

yn(k) = xn(k) + an·x(k-1) - b1·y(k-1) - b2·y(k-2)    (1-13)

xn+1(k) = gn·yn(k)    (1-14)

y(k) = g4·y4(k)    (1-15)

After passing through the M gammatone filters above, the input speech is decomposed into M speech signals, with channel outputs ym(k), where m indexes the gammatone filters; the index m is omitted in formulas (1-12) to (1-15).
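The per-channel recursion (1-12)-(1-15) can be sketched in pure Python as below. The coefficient values passed in are placeholders for illustration; real values would come from (1-10), (1-11) and the normalization parameters gn. Each gn is applied once per stage, so the stage-4 scaling of (1-15) is the last iteration of (1-14).

```python
# Minimal sketch of one channel's four-stage gammatone cascade, following
# the recursion (1-12)-(1-15). Coefficients are caller-supplied placeholders.
def gammatone_channel(x, a, b1, b2, g):
    # x: input samples; a, g: length-4 per-stage taps/normalizers;
    # b1, b2: denominator taps shared by all four stages.
    xn = list(x)                          # x1(k) = x(k)            (1-12)
    for n in range(4):
        yn, x1, y1, y2 = [], 0.0, 0.0, 0.0
        for xk in xn:
            # yn(k) = xn(k) + an*x(k-1) - b1*y(k-1) - b2*y(k-2)     (1-13)
            yk = xk + a[n] * x1 - b1 * y1 - b2 * y2
            yn.append(yk)
            x1, y1, y2 = xk, yk, y1
        xn = [g[n] * v for v in yn]       # x_{n+1}(k) = gn*yn(k)   (1-14)
    return xn                             # channel output y(k)     (1-15)
```

With all taps zero and unit normalizers the cascade reduces to the identity, which makes a convenient sanity check.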

In practice, the system sometimes also needs to restore the decomposed speech (after noise reduction, recognition or other processing) to the original speech. Since every channel has a group delay, the group delay Dm can be obtained, the delay of each channel adjusted accordingly, and the speech finally synthesized, as expressed below:

y(k) = Σ_{m=1..M} ym(k - Dm)

The amplitude response of this method's overall transfer function is shown in Figure 2: the amplitude is high at low frequencies and falls off slowly as frequency increases. Here all channels carry equal weight in the synthesized speech, the number of channels is M = 64, and the channel center frequencies span 50 Hz to 7500 Hz.

The disadvantages of the above method are:

1. The method fixes the order of the gammatone function at N = 4, which is only a special case of the gammatone filter; it gives no implementation for other values of N.

2. Some key parameters of the method, chiefly the parameter b, the normalization parameters gn and the channel group delays Dm, are obtained by simulation and lack a theoretical basis, which reduces the method's operability and repeatability.

3. The gammatone filters all have equal amplitude, i.e. every channel weight is set to 1 during synthesis. However, the loudness the human ear perceives differs across channels, as shown by the equal-loudness curves of the human ear in Figure 3 (abscissa: frequency in Hz; ordinate: sound pressure level in dB): to reach the same loudness, high frequencies require larger amplitudes and low frequencies smaller ones. As a result, speech at some frequencies is suppressed in the synthesized output.

Summary of the Invention

To address the shortcomings of the prior art, the purpose of the present invention is to propose a real-time digital-speech decomposition/synthesis method based on auditory perception characteristics. The method gives an implementation of the gammatone filter for arbitrary order, derives the normalization parameters gn of the gammatone filter, and, from the delay characteristics of the human basilar membrane, gives each channel delay Dm. Finally, by referring to the equal-loudness curves of the human ear, the invention improves the speech decomposition/synthesis method so that the final synthesized speech approaches the effect of an ideal band-pass filter.

The real-time digital-speech decomposition/synthesis method based on auditory perception characteristics proposed by the present invention is characterized by the following specific steps:

1) Build a gammatone digital filter model of arbitrary order:

Assume the number of filters is M. These M filters correspond to M positions on the basilar membrane of the human ear; they are uniformly distributed along the basilar membrane and logarithmically distributed in the frequency domain. Specifically:

1.1) The sampling rate of the input speech is known to be fs;

Let the range of speech frequencies passed by the filters be [fL, fH], with 0 ≤ fL < fH ≤ fs/2;

1.2) From expression (1-2):

ERBs(fc) = 21.4·log10(1 + 4.37fc/1000)    (1-2)

the center frequencies fc span [ERBs(fL), ERBs(fH)] on the ERBs scale; dividing this range into M-1 equal parts gives M equally spaced ERBs values, as in formula (1):

ERBs_m = ERBs(fL) + (m-1)·[ERBs(fH) - ERBs(fL)]/(M-1)    (1)

where m ∈ [1, M] is the channel number;

1.3) From the result of formula (1), the center frequencies fc of the M filters are recovered from their values on the ERBs scale, as in formula (2):

fc_m = (10^(ERBs_m/21.4) - 1)·1000/4.37    (2)
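Steps 1.2)-1.3) can be sketched as follows. The sketch assumes the standard ERB-rate mapping ERBs(f) = 21.4·log10(1 + 4.37f/1000) for expression (1-2) and its algebraic inverse; the example values (50-7500 Hz, M = 64) are the channel layout the background section mentions.

```python
import math

def erbs(f):
    # ERB-rate value of frequency f (Hz), per the assumed form of (1-2)
    return 21.4 * math.log10(1.0 + 4.37 * f / 1000.0)

def erbs_inv(e):
    # Inverse mapping: ERB-rate value back to frequency in Hz
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_low, f_high, M):
    # Split [ERBs(fL), ERBs(fH)] into M-1 equal intervals (M points),
    # then map each point back to Hz, per formulas (1) and (2).
    lo, hi = erbs(f_low), erbs(f_high)
    return [erbs_inv(lo + m * (hi - lo) / (M - 1)) for m in range(M)]

fc = center_frequencies(50.0, 7500.0, 64)
```

The endpoints map back exactly to fL and fH, and the spacing widens with frequency, matching the logarithmic distribution described above.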

1.4) On the relationship between b(fc) and ERB(fc): starting from b = ERB(fc) and applying Parseval's theorem, the bandwidth function b(fc) at center frequency fc of the N-th order gammatone filter is obtained as in formula (2a):

b(fc) = [2^(2N-2)·((N-1)!)^2 / (π·(2N-2)!)]·ERB(fc)    (2a)

where b is the bandwidth of the function and N is an arbitrary positive integer;
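A small sketch of step 1.4), under the assumption that (2a) has the standard form b = 2^(2N-2)·((N-1)!)² / (π·(2N-2)!) · ERB(fc); for N = 4 this factor is the familiar 1.019 scaling of the ERB.

```python
import math

def erb(fc):
    # Equivalent rectangular bandwidth at center frequency fc, expression (1-1)
    return 24.7 * (1.0 + 4.37 * fc / 1000.0)

def bandwidth(fc, N=4):
    # b(fc) for an N-th order gammatone filter, per the assumed form of (2a)
    factor = (2 ** (2 * N - 2)) * math.factorial(N - 1) ** 2 \
             / (math.pi * math.factorial(2 * N - 2))
    return factor * erb(fc)
```

For N = 4 the factor evaluates to about 1.0186, so b(fc) ≈ 1.019·ERB(fc), consistent with the common fourth-order gammatone parameterization.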

1.5) Construct an N-th order gammatone filter from N cascaded second-order band-pass filters. Applying the Laplace transform to the time-domain expression (1-3) of each gammatone filter gives the s-domain expression (2b):

[Expression (2b): the s-domain transfer function GT(s) of the N-th order gammatone filter; formula image not recoverable.]

Decomposing expression (2b) into a product of pole-zero sections gives expression (2c):

GT(s) = Π_{n=1..N} (s - sn) / [(s + Bc)^2 + wc^2]    (2c)

Using the impulse-invariance method, the z-domain expression (2d) of the N-th order gammatone digital filter is obtained:

Hn(z) = (1 + an·z^-1) / (1 + b1·z^-1 + b2·z^-2),  n = 1, 2, ..., N    (2d)

where n = 1, 2, ..., N; sn is the zero of the numerator of the expression; and an, b1, b2 are the tap coefficients of each filter stage;

The expression for an is given by (1-10):

[Expression (1-10): the stage tap coefficients an, computed from the zeros sn of the impulse-invariance mapping; formula image not recoverable.]

The expressions for b1 and b2 are given by (1-11):

b1 = -2·e^(-Bc·T)·cos(wc·T),  b2 = e^(-2Bc·T),  with T = 1/fs    (1-11)

1.6) Compute the normalization parameters gn. The maximum gain of each second-order stage of the gammatone filter is given by formula (2e):

Gn = |Hn(e^(j·wc·T))| = |1 + an·e^(-j·wc·T)| / |1 + b1·e^(-j·wc·T) + b2·e^(-j·2wc·T)|    (2e)

The normalization parameter gn is given by formula (2f):

gn = 1/Gn    (2f)
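As a concrete illustration of steps 1.5)-1.6), the sketch below computes the denominator taps of one second-order stage by impulse invariance on the pole pair s = -Bc ± j·wc and normalizes the stage's gain at the center frequency. The sampling rate, center frequency, bandwidth and the numerator tap `a` are assumed illustrative values, not the patent's derived parameters (the patent obtains its numerator taps from (1-10)).

```python
import cmath
import math

fs = 16000.0          # sampling rate in Hz (assumed for illustration)
fc = 1000.0           # channel center frequency in Hz (assumed)
b = 125.0             # channel bandwidth in Hz (hypothetical value)
T = 1.0 / fs
Bc = 2.0 * math.pi * b
wc = 2.0 * math.pi * fc

# Denominator taps from impulse invariance on the pole pair s = -Bc +/- j*wc,
# matching the recursion y(k) = x(k) + a*x(k-1) - b1*y(k-1) - b2*y(k-2):
b1 = -2.0 * math.exp(-Bc * T) * math.cos(wc * T)
b2 = math.exp(-2.0 * Bc * T)

a = 0.1               # placeholder numerator tap (the patent derives a_n)

def stage_gain(f):
    # |H(e^{jw})| of one stage H(z) = (1 + a*z^-1) / (1 + b1*z^-1 + b2*z^-2)
    z = cmath.exp(1j * 2.0 * math.pi * f * T)
    return abs((1.0 + a / z) / (1.0 + b1 / z + b2 / z ** 2))

# Per-stage normalization (cf. (2e)/(2f)): scale so the gain at the
# center frequency becomes 1.
g = 1.0 / stage_gain(fc)
```

With these taps the stage's poles lie inside the unit circle and its response peaks near fc, so the normalized stage passes the center frequency at unit gain while attenuating distant frequencies.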

1.7) Use the N cascaded second-order band-pass sections obtained in step 1.5) to form the gammatone digital filter model of arbitrary order, and obtain the model's parameter values. Denoting the m-th gammatone filter bank by m, expressions (1-10), (1-11), (2e) and (2f) respectively yield each filter bank's parameters an^m, b1^m, b2^m and gn^m, where an^m, b1^m, b2^m are the filter tap coefficients of each channel and gn^m are the per-channel normalization coefficients, as shown in formulas (3)-(6):

[Formulas (3)-(6): the per-channel parameters an^m, b1^m, b2^m and gn^m, i.e. the channel-m versions of (1-10), (1-11), (2e) and (2f); formula images not recoverable.]

2) Speech decomposition stage.

Using the gammatone digital filter model built in step 1), decompose the speech in imitation of the human basilar membrane: the input speech is decomposed in real time onto M subbands, with M gammatone filter channels, implemented in floating-point or fixed-point arithmetic, splitting the input speech into M signals;

3) Speech synthesis stage.

A delay is introduced into the gammatone filter bank to better match the human ear, whose basilar-membrane delay is inversely proportional to frequency. The group delay of a gammatone filter is described by expression (16):

[Expression (16): the group delay tm of the m-th gammatone filter as a function of its center frequency fc; formula image not recoverable.]

where the group delay tm of channel m is in seconds and the center frequency fc of the m-th filter is in Hz;

The specific steps include:

3.1) Compute each channel's delay. With speech sampling rate fs, the per-channel delay dm after sampling is computed from expressions (17) and (18):

dm = D - [fs·tm]    (17)

D = max_m [fs·tm]    (18)

where D is the maximum value of [fs·tm] over all channels;
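A hypothetical sketch of step 3.1). Since the patent's expression (16) is not recoverable from this copy, the sketch assumes the common gammatone delay estimate tm = (N-1)/(2π·bm), the envelope-peak time of an order-N gammatone with bandwidth bm, which is indeed inversely related to frequency once bm grows with fc.

```python
import math

def channel_delays(bandwidths, fs, N=4):
    # Assumed group-delay model tm = (N-1)/(2*pi*b_m), cf. (16)
    t = [(N - 1) / (2.0 * math.pi * bm) for bm in bandwidths]
    samples = [round(fs * tm) for tm in t]   # [fs * t_m] in samples
    D = max(samples)                         # (18)
    return [D - s for s in samples]          # d_m, (17)
```

The narrowest (lowest-frequency) channel has the largest group delay, so its alignment delay dm is zero and wider channels are padded up to it.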

3.2) Weight each channel's share of the overall filter to compute the synthesized speech. Let wm be the weight of channel m; this weight is folded into gN, and the adjusted gN is computed with the following expression:

gN^m (adjusted) = wm·gN^m    (19)

The final synthesized speech output is then given by formula (20):

y(k) = Σ_{m=1..M} ym(k - dm)    (20)

where ym(k - dm) = 0 when k ≤ dm; this completes the real-time decomposition and synthesis of the speech.
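The final synthesis step of formula (20) is a delayed sum over channels; a minimal 0-indexed sketch (with ym treated as zero before its delay dm) is:

```python
# Sum each channel's output after its alignment delay d_m, per (20);
# samples before the delay contribute zero.
def synthesize(channels, delays):
    K = len(channels[0])
    y = [0.0] * K
    for ym, dm in zip(channels, delays):
        for k in range(dm, K):
            y[k] += ym[k - dm]
    return y
```

For example, two three-sample channels with delays 0 and 1 overlap with the second channel shifted right by one sample.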

The features and beneficial effects of the present invention are:

1) The present invention has a systematic and detailed theoretical derivation and gives theoretical calculation methods for every parameter, enhancing the operability of the algorithm's implementation.

2) The present invention not only performs the speech decomposition operation but also provides its inverse transform, i.e. it supports the subsequent speech synthesis operation.

3) All operations of the present invention are carried out in the time domain, avoiding Fourier transforms, inverse transforms and similar operations.

4) The present invention solves the real-time problem: speech can be decomposed and synthesized in real time, broadening its range of applications.

5) To address the high computational complexity that hinders hardware implementation, the present invention proposes a complete fixed-point scheme, saving substantial resources in the algorithm's hardware implementation. Pipelining is also used, reducing the critical-path delay and the method's computational complexity.

Description of the Drawings

Figure 1 is a block diagram of the gammatone digital filter used in the speech decomposition stage of the existing method.

Figure 2 is the overall amplitude response curve of the speech synthesis stage in the existing method.

Figure 3 shows the equal-loudness curves of the human ear.

Figure 4 is a block diagram of the fixed-point filtering algorithm used in the present invention.

Figure 5 is the overall amplitude response curve of the speech synthesis stage in the present invention.

Detailed Description

The real-time digital-speech decomposition/synthesis method based on auditory perception characteristics proposed by the present invention is further described below with reference to the accompanying drawings and specific embodiments:

The main difference between this method and the prior art is the use of a bank of gammatone filters to model the basilar membrane of the human ear: the filtering characteristic at each position on the basilar membrane can be described by a gammatone filter. The method also draws on the delay characteristics and equal-loudness-curve characteristics of the basilar membrane, and on that basis realizes the decomposition and synthesis of speech.

The specific steps of this method are as follows:

1) Build a gammatone digital filter model of arbitrary order (including each filter's bandwidth and center frequency, i.e. its position parameters):

Assume the number of filters is M. These M filters correspond to M positions on the basilar membrane of the human ear; they are uniformly distributed along the basilar membrane and logarithmically distributed in the frequency domain. Specifically:

1.1) The sampling rate of the input speech is known to be fs;

Let the range of speech frequencies passed by the filters be [fL, fH], with 0 ≤ fL ≤ fH ≤ fs/2;

1.2) From expression (1-2):

ERBs(fc) = 21.4·log10(1 + 4.37fc/1000)    (1-2)

the center frequencies fc span [ERBs(fL), ERBs(fH)] on the ERBs scale; dividing this range into M-1 equal parts gives M equally spaced ERBs values, as in formula (1):

ERBs_m = ERBs(fL) + (m-1)·[ERBs(fH) - ERBs(fL)]/(M-1)    (1)

where m ∈ [1, M] is the channel number;

1.3) From the result of formula (1), the center frequencies fc of the M filters are recovered from their values on the ERBs scale, as in formula (2):

fc_m = (10^(ERBs_m/21.4) - 1)·1000/4.37    (2)

1.4) On the relationship between b(fc) and ERB(fc): starting from b = ERB(fc) and applying Parseval's theorem, the bandwidth function b(fc) at center frequency fc of the N-th order gammatone filter is obtained as in formula (2a):

b(fc) = [2^(2N-2)·((N-1)!)^2 / (π·(2N-2)!)]·ERB(fc)    (2a)

where b is the bandwidth of the function and N is an arbitrary positive integer;

1.5) Construct an N-th order gammatone filter from N cascaded second-order band-pass filters. Applying the Laplace transform to the time-domain expression (1-3) of each gammatone filter gives the s-domain expression (2b):

[Expression (2b): the s-domain transfer function GT(s) of the N-th order gammatone filter; formula image not recoverable.]

Decomposing expression (2b) into a product of pole-zero sections gives expression (2c):

GT(s) = Π_{n=1..N} (s - sn) / [(s + Bc)^2 + wc^2]    (2c)

Using the impulse-invariance method, the z-domain expression (2d) of the N-th order gammatone digital filter is obtained:

Hn(z) = (1 + an·z^-1) / (1 + b1·z^-1 + b2·z^-2),  n = 1, 2, ..., N    (2d)

where n = 1, 2, ..., N; sn is the zero of the numerator of the expression; and an, b1, b2 are the tap coefficients of each filter stage;

The expression for an is given by (1-10):

[Expression (1-10): the stage tap coefficients an, computed from the zeros sn of the impulse-invariance mapping; formula image not recoverable.]

The expressions for b1 and b2 are given by (1-11):

b1 = -2·e^(-Bc·T)·cos(wc·T),  b2 = e^(-2Bc·T),  with T = 1/fs    (1-11)

This generalizes expressions (1-4) and (1-9) to the case where N is an arbitrary positive integer. With the above results, an N-th order gammatone filter is constructed from N cascaded second-order band-pass filters.

1.6) Compute the normalization parameters gn. (Since the amplitude response of the gammatone filter is approximately symmetric, its maximum amplitude is attained at the center frequency fc.) The maximum gain of each second-order stage of the gammatone filter is therefore given by formula (2e):

Gn = |Hn(e^(j·wc·T))| = |1 + an·e^(-j·wc·T)| / |1 + b1·e^(-j·wc·T) + b2·e^(-j·2wc·T)|    (2e)

The normalization parameter gn is given by formula (2f):

gn = 1/Gn    (2f)

1.7) Use the N cascaded second-order band-pass sections obtained in step 1.5) to form the gammatone digital filter model of arbitrary order, and obtain the model's parameter values. Denoting the m-th gammatone filter bank by m, expressions (1-10), (1-11), (2e) and (2f) respectively yield each filter bank's parameters an^m, b1^m, b2^m and gn^m, where an^m, b1^m, b2^m are the filter tap coefficients of each channel and gn^m are the per-channel normalization coefficients, as shown in formulas (3)-(6):

[Formulas (3)-(6): the per-channel parameters an^m, b1^m, b2^m and gn^m, i.e. the channel-m versions of (1-10), (1-11), (2e) and (2f); formula images not recoverable.]

2) Speech decomposition stage.

Using the gammatone digital filter model built in step 1), decompose the speech in imitation of the human basilar membrane: the input speech is decomposed in real time onto M subbands, with a single speech sample as the minimum processing unit; the whole process is carried out in the time domain (no transform of the speech to the frequency domain is needed), yielding M channels of speech data;

First assume the input speech is x(k) with sampling rate fs. M gammatone filter channels, implemented with a floating-point or fixed-point algorithm, decompose the input speech into M signals, the output of each channel being denoted ym(k). Specifically:

For software simulation, a floating-point algorithm passes the input speech through the M gammatone filter channels in turn to obtain M speech output signals, as shown in formulas (7)-(10):

[Equations (7)-(10): the difference equations of the four cascaded second-order sections for channel m (equation images not reproduced)]

where m ∈ [1, M] denotes the channel number and n ∈ [1, 4] identifies the stage of the four-stage cascade described by each expression; ym(k) is the speech output of each channel; the remaining symbols (shown in the equation images) are the speech input of each channel, the filter tap coefficients of each channel, and the normalization coefficient of each channel;
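As a concrete illustration of the floating-point decomposition, the sketch below runs one channel of a four-stage cascade of second-order recursions. The patent's exact coefficient expressions (1-10)/(1-11) and difference equations (7)-(10) are only available as equation images, so the bandwidth and tap coefficients here use the standard impulse-invariance gammatone forms (Glasberg-Moore ERB, b = 1.019·ERB(fc)) as an assumption, and the per-stage normalization gn is omitted.

```python
import math

def channel_coeffs(fc, fs):
    """Tap coefficients of one second-order stage for center frequency fc.

    Standard gammatone forms (assumed; the patent's (1-10)/(1-11) are
    image-only): b1 = 2*exp(-2*pi*b*T)*cos(2*pi*fc*T), b2 = exp(-4*pi*b*T).
    """
    T = 1.0 / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # ERB bandwidth in Hz
    b = 1.019 * erb                            # gammatone bandwidth parameter
    b1 = 2.0 * math.exp(-2.0 * math.pi * b * T) * math.cos(2.0 * math.pi * fc * T)
    b2 = math.exp(-4.0 * math.pi * b * T)
    return b1, b2

def gammatone_channel(x, fc, fs, stages=4):
    """Pass x through `stages` cascaded resonators y(k) = x(k) + b1*y(k-1) - b2*y(k-2)."""
    b1, b2 = channel_coeffs(fc, fs)
    y = list(x)
    for _ in range(stages):
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            w = v + b1 * y1 - b2 * y2
            out.append(w)
            y1, y2 = w, y1
        y = out
    return y

def rms(sig):
    """Root-mean-square level of a signal segment."""
    return math.sqrt(sum(v * v for v in sig) / len(sig))
```

Decomposing into M channels is then just running this once per center frequency fc(m): a tone at the channel's center frequency passes with far more gain than an off-band tone, which is the band-splitting behavior the text describes.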

For hardware implementation, the fixed-point algorithm passes the input speech through the M gammatone filters in turn to obtain M groups of speech output signals.

To address the high computational complexity that hinders a hardware implementation, the invention proposes a complete fixed-point scheme, saving substantial resources in the hardware implementation of the algorithm; this algorithm likewise passes the input speech through the M gammatone filters in turn to obtain M groups of speech output signals. Figure 4 is a block diagram of the fixed-point filtering algorithm used by the invention; the flow is similar to Figure 1, but all parameters are the results of fixed-point processing. After the improvement, the computation time is shortened to 1/4 of the original, i.e. the throughput of the algorithm is increased fourfold, reducing both computing-resource consumption and power consumption. The specific steps are as follows:

2.1) Convert the parameters of each filter group to fixed point, that is, scale each parameter by E = 2^p and then round to the nearest integer, as shown in equations (11)-(14):

[Equations (11)-(14): the fixed-point parameters obtained by scaling each coefficient by E and rounding (equation images not reproduced)]

In each formula, [·] denotes the integer nearest to the argument;

2.2) Convert to fixed point the input speech signal of the n-th stage of the m-th gammatone filter and the intermediate computation data on it (symbols shown in the equation image): from expression (2e), obtain how the maximum gain Gain varies with the center frequency fc, and hence the overall maximum gain Gainmax; thus, when the input speech is L bits and the bit width of the intermediate results is set to Q bits, the value of Q is:

Q = L + [log2(Gainmax)]  (15)

where [·] denotes the smallest integer not less than the argument; this yields the fixed-point filtering algorithm shown in Figure 4, and the input and output of each speech channel;
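A minimal sketch of steps 2.1)-2.2), assuming equations (11)-(14) amount to "multiply by E = 2^p and round" as the text states; the scale factor p and the example gain values are illustrative, not taken from the patent.

```python
import math

def to_fixed_point(coeffs, p=14):
    """Equations (11)-(14) as described in the text: scale each filter
    coefficient by E = 2**p, then round to the nearest integer."""
    E = 1 << p
    return [int(round(c * E)) for c in coeffs]

def intermediate_bitwidth(L, gain_max):
    """Equation (15): Q = L + ceil(log2(Gain_max)), the bit width needed so
    the intermediate results of an L-bit input cannot overflow."""
    return L + math.ceil(math.log2(gain_max))
```

With p = 14, for example, a coefficient 0.5 becomes the integer 8192, and the filter recursion can then run entirely in integer arithmetic, which is what shortens the computation cycle in hardware.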

3) Speech synthesis stage;

In step 2) the speech signal was decomposed onto M subbands by the N-th order gammatone filters; the decomposed signals can then be processed for speech enhancement, speech recognition and so on (for example with common enhancement algorithms such as beamforming or computational auditory scene analysis); after processing, the channel signals can be resynthesized by direct summation, restoring the speech more faithfully.

In the synthesis stage the invention draws on the neural delay characteristic of the human basilar membrane and gives the channel delay (time-domain delay) of the gammatone filters. The basilar-membrane neural delay refers to the fact that the time from the basilar membrane receiving a speech signal to passing it on to the brain differs for sounds of different frequencies; introducing a corresponding delay into the gammatone filter bank therefore matches the human ear better. The basilar-membrane delay is inversely proportional to frequency. Based on this analysis, the group delay of the gammatone filter (how fast the phase changes with frequency) is described by expression (16):

[Expression (16): the group delay tm of channel m as a function of the center frequency fc (equation image not reproduced)]

where the group delay tm of channel m is in seconds and the center frequency fc of the m-th filter group is in Hz.

The speech synthesis process of the invention draws on the delay characteristic of the human basilar membrane: before synthesis an appropriate delay is applied to the output of each channel, and the channels are then summed directly. This greatly reduces the mutual interference between channels and lets the decomposition and synthesis of the digital speech be computed sample by sample, achieving real-time processing. The specific steps are:

3.1) Compute the delay of each channel: with speech sampling rate fs, the post-sampling delay dm of each channel is computed from expressions (17) and (18):

dm = D − [fs·tm]  (17)

[Expression (18): D defined as the maximum of [fs·tm] over the M channels (equation image not reproduced)]

where D is the maximum value of [fs·tm] over all channels.

3.2) According to the equal-loudness contours of the human ear shown in Figure 3, reaching the same loudness requires a larger amplitude at high frequencies and a smaller one at low frequencies. The share of each channel in the overall filter is therefore weighted, and the synthesized speech is computed with expression (20). Let the weight of channel m be wm; in practice this weight can be merged into gN, and the adjusted gN is computed with the following expression:

[Expression (19): the adjusted gN with the channel weight wm merged in (equation image not reproduced)]

The final synthesized speech output is then given by equation (20):

[Equation (20): the synthesized output as the sum over the M channels of ym(k − dm) (equation image not reproduced)]

where ym(k − dm) = 0 for k ≤ dm; this completes the real-time speech decomposition and synthesis.
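The delay alignment and summation of steps 3.1)-3.2) can be sketched as follows. Expression (16) is only available as an image, so the group delay here uses the envelope-peak delay (N−1)/(2π·b(fc)) of an N-th order gammatone as an assumed stand-in; it has the inverse-frequency behavior the text describes. The channel weights wm are taken as 1.

```python
import math

def group_delay(fc, order=4):
    """Assumed channel group delay in seconds: (order-1)/(2*pi*b(fc)), with
    b(fc) = 1.019*ERB(fc); inversely related to frequency, as in the text."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    return (order - 1) / (2.0 * math.pi * 1.019 * erb)

def channel_delays(center_freqs, fs):
    """Expressions (17)-(18): d_m = D - [fs*t_m], with D = max_m [fs*t_m],
    so the slowest (lowest-frequency) channel gets zero extra delay."""
    samples = [round(fs * group_delay(fc)) for fc in center_freqs]
    D = max(samples)
    return [D - s for s in samples]

def synthesize(channels, delays):
    """Equation (20): s(k) = sum over channels of y_m(k - d_m), where a
    channel contributes nothing before its delay d_m has elapsed."""
    length = max(len(y) + d for y, d in zip(channels, delays))
    s = [0.0] * length
    for y, d in zip(channels, delays):
        for k, v in enumerate(y):
            s[k + d] += v
    return s
```

Because low-frequency channels have the largest intrinsic delay, they receive dm = 0 and the higher channels are held back to match, which is exactly the alignment that reduces inter-channel interference before the direct summation.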

Figure 5 shows the amplitude response of the speech synthesis stage after the improvement of the method of the invention. After the channel weights are adjusted according to the equal-loudness contours of the human ear, the overall amplitude response of the synthesis method approaches that of an ideal band-pass filter. Here the number of channels is M = 64 and the channel center frequencies span 50 Hz to 7500 Hz; the amplitude response is large within this frequency range and decays rapidly above the upper frequency limit.
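For reference, the 64-channel layout quoted for Figure 5 can be reproduced with the ERB-scale spacing of claim 1 (steps 1.2-1.3). The patent's expression (1-2) is image-only, so the sketch assumes the common Glasberg-Moore ERB-rate form ERBs(f) = 21.4·log10(4.37·f/1000 + 1).

```python
import math

def erbs(f):
    """Assumed ERB-rate scale (expression (1-2) in the patent is image-only)."""
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

def inv_erbs(e):
    """Inverse of erbs(): map an ERB-rate value back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def center_frequencies(f_low, f_high, M):
    """Formulas (1)-(2): split [ERBs(fL), ERBs(fH)] into M-1 equal parts
    and map the M grid points back to Hz, giving log-like channel spacing."""
    lo, hi = erbs(f_low), erbs(f_high)
    return [inv_erbs(lo + (hi - lo) * m / (M - 1)) for m in range(M)]
```

center_frequencies(50, 7500, 64) yields 64 frequencies from 50 Hz to 7500 Hz, dense at low frequencies and sparse at high ones, consistent with the uniform basilar-membrane spacing described in claim 1.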

Claims (3)

1. A real-time digital speech decomposition/synthesis method based on auditory perception characteristics, characterized in that the specific steps of the method are as follows:
1) Build a gammatone digital filter model of arbitrary order:
Assume the number of filter groups is M; the M filter groups correspond to M positions on the human basilar membrane, uniformly distributed along the basilar membrane and logarithmically distributed in the frequency domain; specifically:
1.1) The sampling rate of the input speech is known to be fs;
let the range of speech frequencies passed by the filters be [fL, fH], 0 ≤ fL < fH ≤ fs/2;
1.2) According to expression (1-2):
[Expression (1-2): the ERBs scale as a function of frequency (equation image not reproduced)]
the distribution of the center frequency fc on the ERBs scale is [ERBs(fL), ERBs(fH)]; dividing this interval into M−1 equal parts gives M equally spaced ERBs values, as shown in formula (1):
[Formula (1): the M equally spaced ERBs values (equation image not reproduced)]
where m ∈ [1, M] denotes the channel number;
1.3) From the result of formula (1), the ERBs-domain values of the center frequencies fc of the M filter groups are obtained as shown in formula (2):
[Formula (2): the channel center frequencies fc(m) (equation image not reproduced)]
1.4) On the relationship between b(fc) and ERB(fc): based on b = ERB(fc) and Parseval's theorem, the expression for the bandwidth function b(fc) at the center frequency fc of the N-th order gammatone filter is obtained as shown in formula (2a):
[Formula (2a): the bandwidth function b(fc) (equation image not reproduced)]
where b denotes the bandwidth of the function and N is any positive integer;
1.5) Form an N-th order gammatone filter from N cascaded second-order band-pass sections; applying the Laplace transform to the time-domain expression (1-3) of each gammatone filter:
[Expression (1-3): the gammatone impulse response (equation image not reproduced)]
gives the s-domain expression shown in formula (2b):
[Formula (2b) (equation image not reproduced)]
Factoring formula (2b) into a product over its zeros and poles gives expression (2c):
[Expression (2c) (equation image not reproduced)]
Using the impulse-invariance method, the z-domain expression (2d) of the N-th order gammatone digital filter is obtained:
[Expression (2d) (equation image not reproduced)]
where n = 1, 2, …, N, sn is a zero of the numerator of the expression, and an, b1, b2 are the tap coefficients of the respective filter stages;
the expression for an is shown in (1-10):
[Expression (1-10) (equation image not reproduced)]
the expressions for b1 and b2 are shown in (1-11):
[Expression (1-11) (equation image not reproduced)]
1.6) Compute the normalization parameter gn: the maximum gain of the second-order filter at each stage of the gammatone filter is shown in formula (2e):
[Formula (2e): the per-stage maximum gain (equation image not reproduced)]
The normalization parameter gn is shown in formula (2f):
[Formula (2f) (equation image not reproduced)]
1.7) Use the N-stage cascade of second-order band-pass sections obtained in step 1.5) to form the gammatone digital filter model of arbitrary order, and obtain the parameter values of the model: let m denote the m-th gammatone filter group; the values of its parameters (the filter tap coefficients of each channel and the normalization coefficient of each channel) are obtained from expressions (1-10), (1-11), (2e) and (2f) respectively, as shown in equations (3)-(6):
[Equations (3)-(6): the per-channel values of the tap coefficients an, b1, b2 and the normalization coefficient gn, evaluated at the channel center frequency fc(m) (equation images not reproduced)]
2)语音分解阶段;2) Speech decomposition stage;利用步骤1)构建的伽马通数字滤波器模型,模仿人耳基底膜对语音进行分解:将输入语音实时地分解到M个子带上,使用M路伽马通滤波器采用浮点算法或定点算法将输入语音分解为M路信号;Use the gamma-pass digital filter model constructed in step 1) to decompose speech by imitating the basilar membrane of the human ear: decompose the input speech into M sub-bands in real time, and use M-way gamma-pass filters to use floating-point arithmetic or fixed-point The algorithm decomposes the input speech into M signals;3)语音合成阶段;3) Speech synthesis stage;在伽马通滤波器组中引入延时,以更加符合人耳特性,人耳基底膜延时与频率成反比关系,伽马通滤波器的群延时用表达式(16)来描述:A delay is introduced into the gamma-pass filter bank to be more in line with the characteristics of the human ear. The delay of the basilar membrane of the human ear is inversely proportional to the frequency. The group delay of the gamma-pass filter is described by expression (16):
Figure FDA0001156792210000031
Figure FDA0001156792210000031
where the group delay tm of channel m is in seconds and the center frequency fc of the m-th filter group is in Hz;
The specific steps include:
3.1) Compute the delay of each channel: with speech sampling rate fs, the post-sampling delay dm of each channel is computed from expressions (17) and (18):
dm = D − [fs·tm]  (17)
[Expression (18): D defined as the maximum of [fs·tm] over the M channels (equation image not reproduced)]
where D is the maximum value of [fs·tm];
3.2) The share of each channel in the overall filter is weighted, and the synthesized speech is computed with expression (20); let the weight of channel m be wm; this weight is merged into gN, and the adjusted gN is computed with the following expression:
[Expression (19): the adjusted gN with the channel weight wm merged in (equation image not reproduced)]
The final synthesized speech output is then given by equation (20):
[Equation (20): the synthesized output as the sum over the M channels of ym(k − dm) (equation image not reproduced)]
where ym(k − dm) = 0 for k ≤ dm; the real-time speech decomposition and synthesis tasks are completed.
2. The real-time digital speech decomposition/synthesis method of claim 1, characterized in that step 2) uses the floating-point algorithm for software simulation, specifically comprising:
passing the input speech through the M gammatone filters in turn to obtain M groups of speech output signals, as shown in equations (7)-(10):
[Equations (7)-(10): the difference equations of the four cascaded second-order sections for channel m (equation images not reproduced)]
where m ∈ [1, M] denotes the channel number and n ∈ [1, 4] identifies the stage of the four-stage cascade described by each expression; ym(k) is the speech output of each channel; the remaining symbols (shown in the equation images) are the speech input of each channel, the filter tap coefficients of each channel, and the normalization coefficient of each channel.
3. The real-time digital speech decomposition/synthesis method of claim 1, characterized in that, when step 2) is used for hardware implementation, the fixed-point algorithm passes the input speech through the M gammatone filters in turn to obtain M groups of speech output signals, specifically comprising the following steps:
2.1) Convert the parameters of each filter group to fixed point, that is, scale each parameter by E = 2^p and then round to the nearest integer, as shown in equations (11)-(14):
[Equations (11)-(14): the fixed-point parameters obtained by scaling each coefficient by E and rounding (equation images not reproduced)]
In each formula, [·] denotes the integer nearest to the argument;
2.2) Convert to fixed point the input speech signal of the n-th stage of the m-th gammatone filter and the intermediate computation data on it (symbols shown in the equation image): from expression (2e), obtain how the maximum gain Gain varies with the center frequency fc, and hence the overall maximum gain Gainmax; thus, when the input speech is L bits and the bit width of the intermediate results is set to Q bits, the value of Q is:
Q = L + [log2(Gainmax)]  (15)
where [·] denotes the smallest integer not less than the argument; this yields the input and output of each speech channel.
CN201611026399.6A2016-11-182016-11-18 A real-time decomposition/synthesis method of digital speech based on auditory perception characteristicsActiveCN106601249B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201611026399.6A | 2016-11-18 | 2016-11-18 | A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201611026399.6A | 2016-11-18 | 2016-11-18 | A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics

Publications (2)

Publication Number | Publication Date
CN106601249A (en) | 2017-04-26
CN106601249B | 2020-06-05

Family

ID=58592464

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201611026399.6A (Active, CN106601249B (en)) | A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics | 2016-11-18 | 2016-11-18

Country Status (1)

Country | Link
CN (1) | CN106601249B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110459235A (en)* | 2019-08-15 | 2019-11-15 | Shenzhen Lexin Software Technology Co., Ltd. | A reverberation elimination method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN103325381A (en)* | 2013-05-29 | 2013-09-25 | Jilin University | Speech separation method based on fuzzy membership function
CN103440871A (en)* | 2013-08-21 | 2013-12-11 | Dalian University of Technology | Method for suppressing transient noise in voice
CN103714810A (en)* | 2013-12-09 | 2014-04-09 | Northwest Institute of Nuclear Technology | Vehicle model feature extraction method based on Gammatone filter bank

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US8311812B2* | 2009-12-01 | 2012-11-13 | Eliza Corporation | Fast and accurate extraction of formants for speech recognition using a plurality of complex filters in parallel
CN102456351A* | 2010-10-14 | 2012-05-16 | Tsinghua University | Voice enhancement system
CN102438189B* | 2011-08-30 | 2014-07-09 | Southeast University | Dual-channel acoustic signal-based sound source localization method
US20130297299A1* | 2012-05-07 | 2013-11-07 | Board Of Trustees Of Michigan State University | Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech and Speaker Recognition
CN102881289B* | 2012-09-11 | 2014-04-02 | Chongqing University | Hearing perception characteristic-based objective voice quality evaluation method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Realtime Analysis/Synthesis Gammatone Filterbank"; Youwei Yang et al.; 2015 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC); 2015-09-19; Sections II-V *
"FPGA Implementation of a Real-Time Gammatone Auditory Perception Filter Bank"; Jia Rui et al.; Microelectronics & Computer; 2015-09-30; Vol. 32, No. 1; abstract and Section 3 (system verification and structure analysis) *

Also Published As

Publication number | Publication date
CN106601249A (en) | 2017-04-26

Similar Documents

Publication | Title
CN115410583B (en) | Perceptual based loss function for audio encoding and decoding based on machine learning
CN102017402B (en) | System for adjusting perceived loudness of audio signals
Lyon | A computational model of filtering, detection, and compression in the cochlea
CN102157156B (en) | Single-channel voice enhancement method and system
CN111986660B (en) | A single-channel speech enhancement method, system and storage medium based on neural network sub-band modeling
CN103714810B (en) | Vehicle feature extracting method based on Gammatone bank of filters
CN105931649A (en) | Ultra-low time delay audio processing method and system based on spectrum analysis
CN102447993A (en) | Sound scene manipulation
CN105788607A (en) | Speech enhancement method applied to dual-microphone array
CN105679330B (en) | Digital hearing-aid noise reduction method based on improved subband SNR estimation
US7787640B2 | System and method for spectral enhancement employing compression and expansion
CN103456312A (en) | Single channel voice blind separation method based on computational auditory scene analysis
Thakur et al. | FPGA implementation of the CAR model of the cochlea
CN107274887A (en) | Speaker's Further Feature Extraction method based on fusion feature MGFCC
CN116386654B (en) | Wind noise suppression method, device, equipment and computer readable storage medium
Barros et al. | Estimation of speech embedded in a reverberant and noisy environment by independent component analysis and wavelets
CN106034274A (en) | 3D sound device based on sound field wave synthesis and synthetic method
CN106601249B (en) | A real-time decomposition/synthesis method of digital speech based on auditory perception characteristics
CN110010150A (en) | A method for extracting speech feature parameters of auditory perception based on multi-resolution
CN112397090B (en) | A real-time sound classification method and system based on FPGA
CN112863517B (en) | Speech recognition method based on convergence rate of perceptual spectrum
Drgas et al. | Dynamic processing neural network architecture for hearing loss compensation
Agcaer et al. | Optimization of amplitude modulation features for low-resource acoustic scene classification
Yang et al. | A realtime analysis/synthesis Gammatone filterbank
Paatero et al. | Modeling and equalization of audio systems using Kautz filters

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
