

Technical Field
The present invention relates to a method for separating monophonic (single-channel) aliased speech using the fractional Fourier transform, and belongs to the technical field of audio signal processing.
Background Art
In the field of speech and auditory signal processing, an important problem is how to separate the speech of interest from an aliased (mixed) speech signal. Aliased-speech separation has significant theoretical and practical value in speech communication, acoustic target detection, and sound-signal enhancement. However, because the individual source signals that make up the aliased speech overlap completely in both the time and frequency domains, conventional speech-enhancement methods have difficulty separating the speech of interest (called the target speech) from the interfering speech.
The fractional Fourier transform (FrFT) has excellent properties for analyzing certain non-stationary signals and has attracted wide attention in the signal-processing community in recent years. Since speech is a non-stationary signal, applications of the FrFT and similar transforms in speech processing currently concentrate on the following areas: speech analysis, where it offers higher time-frequency resolution than traditional Fourier methods; pitch estimation, where it yields more accurate estimates than traditional methods; speech enhancement; speech recognition; and speaker recognition.
Research on aliased-speech separation falls mainly into two categories: auditory scene analysis (ASA) and blind source separation (BSS). Auditory scene analysis is studied in two ways: one starts from human auditory physiology and psychology and investigates the regularities of human sound recognition, i.e., auditory scene analysis proper; the other uses findings on human auditory perception to build a model, analyze it mathematically, and implement it on a computer, which is the subject of computational auditory scene analysis (CASA). Blind source separation is the process of estimating the individual source components from the observed signals and some prior knowledge of the sources (such as their probability densities) when both the source signals and the transmission-channel characteristics are unknown. The independent component analysis approach to blind source separation, first proposed by P. Comon, is a technique developed on the foundations of neural networks and statistics and remains a very active research frontier.
The existing aliased-speech separation methods mainly have the following deficiencies:
(1) Research on auditory scene analysis and computational auditory scene analysis is still in its infancy. In computational auditory scene analysis in particular, the models built so far can only be used to verify some still-unclear theories of auditory scene analysis, namely the mechanisms by which the human brain processes auditory signals.
Research on blind source separation is very active, but the problem has not yet been solved satisfactorily: it involves the stability and phase-uncertainty issues of multichannel convolutive mixing systems and blind deconvolution systems, in particular blind deconvolution when the number of sources is unknown and when noise is present.
(2) Separating and extracting the fundamental frequencies of aliased speech is the key to aliased-speech separation within auditory scene analysis, but existing methods consider only voiced-voiced mixtures and ignore unvoiced-voiced mixtures. This is because the excitation in an unvoiced speech frame is aperiodic, so estimating a fundamental frequency for an unvoiced frame has no physical meaning. Moreover, fundamental frequencies estimated from unvoiced frames are typically highly random and lack continuity, whereas the fundamental frequencies separated from aliased speech are assigned to their sources on the basis of pitch continuity. Fundamental frequencies estimated from unvoiced frames therefore corrupt the pitch-attribution decision and, in turn, degrade the smoothing of the pitch tracks.
Summary of the Invention
The purpose of the present invention is to overcome the defects of the prior art, solve the problem of separating the target speech from a monophonic aliased speech signal, and propose a new monophonic aliased-speech separation method based on the fractional Fourier transform.
The technical scheme adopted by the present invention is as follows:
A monophonic aliased-speech separation method based on the fractional Fourier transform comprises the following steps:
Step 1. Preprocess the aliased speech signal: remove the silent segments and identify the voiced frames.
First, perform endpoint detection on the aliased speech signal, remove the silent segments, and take the remaining aliased segments as the object of processing.
Then divide the remaining aliased segments into frames, classify each frame as voiced or unvoiced, and mark the voiced frames.
Step 2. Based on the fractional Fourier transform, perform pitch detection on the voiced frames obtained in Step 1 and separate the pitch tracks of the aliased speech, i.e., the fundamental frequency of each source signal. The process is as follows:
First, compute the FrFT order from the frame-to-frame continuity of the signal. Then apply the FrFT with the computed orders to the voiced frames, compute the harmonic product spectrum, and extract the fundamental frequency of one speaker, i.e., of one source signal, by dynamic programming.
Once one speaker's fundamental frequency has been found, subtract the spectral components corresponding to that speaker's fundamental and harmonics from the harmonic product spectrum, then run dynamic programming once more to obtain the other speaker's fundamental frequency, i.e., the fundamental frequency of the other source signal.
Repeating this process yields the fundamental frequency of every source signal.
Step 3. Since a speech signal can be represented as a superposition of sinusoids, synthesize speech from each fundamental frequency obtained in Step 2 using the sinusoidal model of speech, thereby obtaining the separated speech signals.
The positive effects and advantages of the present invention are:
1. The method effectively separates and extracts the fundamental frequencies of multiple aliased voices, thereby achieving effective separation of the aliased speech.
2. Using the FrFT instead of the traditional short-time Fourier transform (computed with the FFT) to extract the pitch frequency reduces the smearing of the harmonic spectrum.
3. Since each frame of the signal has its own inherent frequency-modulation (chirp) rate, the FrFT order can be chosen to match that rate, yielding a more accurate estimate of the original signal's fundamental frequency.
The invention is particularly suitable for separating monophonic aliased speech containing the voices of two speakers.
Brief Description of the Drawings
Fig. 1 is a block diagram of the implementation flow of the method of the present invention.
Fig. 2 is a flow chart of the FrFT-based pitch detection of aliased speech in the method of the present invention.
Detailed Description of the Embodiments
Preferred embodiments of the present invention are further described below with reference to the accompanying drawings.
A monophonic aliased-speech separation method based on the fractional Fourier transform is implemented as shown in Fig. 1 and comprises the following steps:
Step 1. Preprocess the aliased speech signal: remove the silent segments and identify the voiced frames.
First, perform endpoint detection on the aliased speech signal, remove the silent segments, and take the remaining aliased segments as the object of processing. Endpoint detection may combine short-time energy with the zero-crossing rate.
Then divide the remaining aliased segments into frames, with a frame length of 20 ms and a frame shift of 10 ms, and classify each frame as voiced or unvoiced, marking the voiced frames. Voicing classification for an aliased signal differs slightly from that for a single voice: for two aliased voices there are three cases, both voiced, one voiced and one unvoiced, and both unvoiced. The classification proceeds in two steps: first decide whether the two aliased signals are both unvoiced; if so, the decision ends; if not, decide whether they are one voiced and one unvoiced or both voiced. In the one-voiced-one-unvoiced case, only the voiced frames receive further processing; unvoiced frames are skipped. Frames in which both signals are unvoiced are likewise not processed.
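As an illustration of Step 1, a minimal Python/NumPy sketch of the preprocessing stage is given below; the energy and zero-crossing-rate thresholds are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def preprocess(x, fs, frame_ms=20, hop_ms=10):
    """Step-1 sketch: short-time energy / zero-crossing-rate endpoint
    detection, 20 ms frames with a 10 ms hop, and a simple voiced-frame
    flag. Thresholds (0.05, 0.1, 0.25) are illustrative assumptions."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    frames = np.stack([x[i:i + frame]
                       for i in range(0, len(x) - frame + 1, hop)])
    energy = (frames ** 2).sum(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    # silence: both low energy and low zero-crossing rate
    active = (energy > 0.05 * energy.max()) | (zcr > 0.25)
    frames, energy, zcr = frames[active], energy[active], zcr[active]
    # voiced: strong, periodic (low-ZCR) frames; unvoiced frames are skipped
    voiced = (energy > 0.1 * energy.max()) & (zcr < 0.25)
    return frames, voiced
```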
Step 2. Using the fractional Fourier transform, perform pitch detection on the voiced frames obtained in Step 1 and separate the pitch tracks of the aliased speech, i.e., the fundamental frequency of each source signal. The flow is shown in Fig. 2.
First, compute the FrFT order from the frame-to-frame continuity of the signal. Since the goal is to estimate the fundamental frequency of the speech signal, and the search exploits inter-frame continuity, the FrFT order α_i of frame i is closely tied to the fundamental frequencies of the neighboring frames: it is computed from p_{i-1}, p_i, and p_{i+1}, the estimated fundamental frequencies of the previous, current, and next frames, which can be obtained with the short-time Fourier transform.
Then apply the FrFT to the voiced frames obtained in Step 1, compute the harmonic product spectrum, and extract one of the pitch tracks, i.e., one speaker's fundamental frequency, by dynamic programming. The specific process is as follows:
(1) Apply an N-point (for example, 1024-point) fractional Fourier transform to the voiced frame signal x(n) to obtain its magnitude spectrum X(α,k):
X(α,k) = FrFT_N{x(n)}    (1.2)
Then transform the magnitude spectrum X(α,k) into the logarithmic domain to obtain the log-magnitude spectrum SLog(α,k):
SLog(α,k) = log10(|X(α,k)|^2)    (1.3)
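For concreteness, a minimal Python/NumPy sketch of Equations 1.2 and 1.3 follows. The direct O(N²) quadrature of the FrFT kernel and the 1/√N grid normalization are implementation choices of this sketch, not prescriptions of the patent, and the small epsilon in the logarithm is added only to avoid log(0):

```python
import numpy as np

def frft(x, alpha):
    """Fractional Fourier transform of rotation angle alpha, approximated by
    direct O(N^2) quadrature of the continuous FrFT kernel. A minimal
    sketch; production code would use an O(N log N) chirp decomposition.
    alpha = pi/2 reduces (approximately) to the ordinary Fourier transform."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    s, c = np.sin(alpha), np.cos(alpha)
    if abs(s) < 1e-10:                        # alpha ~ 0 or pi
        return x if c > 0 else x[::-1]
    d = 1.0 / np.sqrt(N)                      # dimensionless grid spacing
    t = (np.arange(N) - N // 2) * d
    u = t
    cot = c / s
    A = np.sqrt(1.0 - 1j * cot)               # kernel amplitude factor
    # K(u,t) = A * exp( j*pi*( (u^2 + t^2)*cot(alpha) - 2*u*t/sin(alpha) ) )
    K = A * np.exp(1j * np.pi * ((u[:, None] ** 2 + t[None, :] ** 2) * cot
                                 - 2.0 * u[:, None] * t[None, :] / s))
    return (K @ x) * d

def log_mag_spectrum(frame, alpha, N=1024):
    """Log-magnitude spectrum of one voiced frame (Eqs. 1.2-1.3)."""
    f = np.zeros(N)
    f[:min(N, len(frame))] = frame[:N]        # zero-pad the frame to N points
    X = frft(f, alpha)                        # Eq. 1.2
    return np.log10(np.abs(X) ** 2 + 1e-12)   # Eq. 1.3 (epsilon avoids log 0)
```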
Summing the log spectrum SLog(α,k) over all harmonics within one frame gives the harmonic product spectrum ρ(α,f):

ρ(α,f) = Σ_{h=1..H} SLog(α, h·f)    (1.4)

In Equation 1.4, H is the number of harmonics within the sampling bandwidth, h is the harmonic index, f is the candidate fundamental frequency of the frame, and α is the FrFT order of the frame.
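A corresponding sketch of Equation 1.4, evaluated on a grid of candidate fundamentals; mapping FrFT bins to a Hz frequency axis is itself an approximation and is assumed to be supplied by the transform stage here:

```python
import numpy as np

def harmonic_product_spectrum(slog, freqs, f_grid, H):
    """Harmonic (log-)product spectrum of Eq. 1.4: for every candidate
    fundamental f on f_grid, sum SLog at the harmonic frequencies h*f.
    slog  : log-magnitude spectrum of one frame (previous sketch)
    freqs : frequency axis of slog, in Hz (assumed given)"""
    rho = np.zeros(len(f_grid))
    for m, f in enumerate(f_grid):
        harmonics = f * np.arange(1, H + 1)
        # nearest spectral bin for each harmonic frequency
        bins = np.abs(freqs[None, :] - harmonics[:, None]).argmin(axis=1)
        rho[m] = slog[bins].sum()              # Eq. 1.4
    return rho
```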
(2) Allowing for the aliasing of two voices, extract from the harmonic product spectrum ρ(α,f) M candidate peaks that may contain fundamental-frequency components. To keep the computation manageable, M is set to at least 3; for M ≥ 3 the results remain essentially unchanged.
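A simple peak picker for this step might look as follows (a sketch; rho is the harmonic product spectrum sampled on the candidate grid f_grid):

```python
import numpy as np

def candidate_peaks(rho, f_grid, M=3):
    """Pick the M strongest local maxima of the harmonic product spectrum
    as candidate fundamentals for one frame (M >= 3, as stated above)."""
    inner = (rho[1:-1] > rho[:-2]) & (rho[1:-1] >= rho[2:])  # local maxima
    idx = np.where(inner)[0] + 1
    top = idx[np.argsort(rho[idx])[-M:]][::-1]               # strongest M
    return f_grid[top], rho[top]
```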
The dynamic-programming method requires an index function: its value is accumulated along every path, and the path with the maximum value is one of the desired pitch tracks. To prevent half-frequency and frequency-doubling (octave) errors during pitch estimation, the index function c(α,f) is set to:
c(α,f) = k(f)·(ρ(α,f) − ρ(α,f/2))    (1.5)
In Equation 1.5, f is the estimated fundamental frequency of the frame, k(f) is a function that decreases as f increases, and ρ is the harmonic product spectrum of Equation 1.4. The weighting k(f) prevents octave errors, and the term ρ(α,f/2) prevents half-frequency errors. Writing (α_i, f_i) as μ_i, the scoring function S_i(μ_i) of a path is defined by Equations 1.6 and 1.7, in which i is the frame number and μ_{i-1} denotes the parameters used when selecting the order and estimating the fundamental frequency of frame i-1. Since the fundamental frequency of normal speech lies between 50 Hz and 400 Hz, the search is confined to this range; among the candidate peaks of each frame, the value of f that maximizes S_i(μ_i) is taken to be the fundamental frequency of one speaker in that frame. Likewise, after all frames have been searched, the per-frame values are linked into a pitch track, yielding one speaker's pitch track (i.e., that speaker's fundamental frequency).
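The following sketch shows one way to realize the dynamic-programming search over the candidate peaks. Because the source does not reproduce Equations 1.6 and 1.7, the transition term is modeled here as a simple pitch-continuity penalty and k(f) = 1/f is used as the decreasing weight; both are assumptions of this sketch:

```python
import numpy as np

def track_pitch(cands, rho_f, rho_half, lam=0.02):
    """DP pitch tracking over per-frame candidates.
    cands[i]    : array of M candidate fundamentals (Hz) of frame i
    rho_f[i]    : rho(alpha_i, f)   at each candidate (Eq. 1.4)
    rho_half[i] : rho(alpha_i, f/2) at each candidate
    lam         : assumed weight of the pitch-continuity penalty"""
    n = len(cands)
    # index function c(alpha, f) of Eq. 1.5, with k(f) = 1/f (assumed)
    c = [(1.0 / cands[i]) * (rho_f[i] - rho_half[i]) for i in range(n)]
    S, back = c[0].copy(), []
    for i in range(1, n):
        # penalize large pitch jumps between consecutive frames (assumed
        # stand-in for the transition term of Eqs. 1.6-1.7)
        jump = np.abs(cands[i][:, None] - cands[i - 1][None, :])
        total = S[None, :] - lam * jump
        back.append(total.argmax(axis=1))
        S = c[i] + total.max(axis=1)
    path = [int(S.argmax())]            # backtrack the best-scoring path
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    path.reverse()
    return np.array([cands[i][p] for i, p in enumerate(path)])
```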
Once one speaker's fundamental frequency has been found, subtract the spectral components corresponding to that speaker's fundamental and harmonics from the harmonic product spectrum ρ(α,f), then run the dynamic-programming method once more to obtain the other speaker's pitch track (i.e., that speaker's fundamental frequency), thereby separating the pitch tracks of the aliased speech.
The spectral components corresponding to the harmonics are found as follows:
Before subtracting the harmonic spectral components from the harmonic product spectrum, the number of harmonics H_i must be known, since it determines how many spectral components to remove. The number of harmonics of the i-th frame, i.e., the number that fit within the sampling bandwidth, follows from Equation 1.8:

H_i = floor( (f_s / 2) / f_i )    (1.8)

In Equation 1.8, f_i is the fundamental frequency of frame i and f_s is the sampling rate. The harmonic frequencies f′ are then related to the fundamental f by:
f′ = h·f,  h = 2, 3, 4, …, H    (1.9)
In Equation 1.9, H is the number of harmonics. Once the harmonic frequencies f′ are known, the corresponding spectral components are known.
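A sketch of this removal step; suppressing a few bins around each harmonic stands in for "subtracting the corresponding spectral components", whose exact form the text leaves open. The second pass then recomputes ρ(α,f) from the notched spectrum and reruns the dynamic-programming search:

```python
import numpy as np

def remove_speaker(slog, freqs, f0, fs, notch_bins=2):
    """Remove the spectral components attributed to the first speaker (the
    fundamental f0 and its harmonics) from the log-magnitude spectrum
    before recomputing the harmonic product spectrum for the second pass."""
    H = int((fs / 2) // f0)             # harmonics below Nyquist (Eq. 1.8)
    slog = slog.copy()
    for h in range(1, H + 1):           # f' = h * f0 (Eq. 1.9, incl. h = 1)
        k = np.abs(freqs - h * f0).argmin()
        lo, hi = max(0, k - notch_bins), k + notch_bins + 1
        slog[lo:hi] = slog.min()        # suppress this harmonic's bins
    return slog
```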
Step 3. Since a speech signal can be represented as a superposition of sinusoids, synthesize speech from each fundamental frequency f_i obtained in Step 2 using the sinusoidal model of speech, thereby obtaining the separated speech signals.
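Finally, a minimal sketch of the sinusoidal-model resynthesis of Step 3. The per-harmonic amplitudes amps are taken as given here; in the full method they would be estimated from the mixture's spectrum at the harmonic frequencies:

```python
import numpy as np

def synthesize(f0_track, amps, fs, hop_ms=10):
    """Step-3 sketch: resynthesize one separated voice as a sum of harmonic
    sinusoids driven by its extracted fundamental-frequency track.
    f0_track : one f0 value per frame, Hz (0 marks unvoiced/silent frames)
    amps     : (n_frames, n_harmonics) harmonic amplitudes (assumed given)"""
    hop = int(fs * hop_ms / 1000)
    n_frames, n_harm = amps.shape
    out = np.zeros(n_frames * hop)
    phase = np.zeros(n_harm)
    for i in range(n_frames):
        if f0_track[i] > 0:
            n = np.arange(hop)
            for h in range(n_harm):
                w = 2 * np.pi * (h + 1) * f0_track[i] / fs  # rad per sample
                out[i * hop:(i + 1) * hop] += amps[i, h] * np.cos(w * n + phase[h])
                phase[h] = (phase[h] + w * hop) % (2 * np.pi)  # keep phase continuity
    return out
```

Carrying the phase across frame boundaries, as above, avoids audible clicks at the 10 ms hop joints.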