CN111599372B - Stable on-line multi-channel voice dereverberation method and system - Google Patents

Stable on-line multi-channel voice dereverberation method and system

Info

Publication number
CN111599372B
Authority
CN
China
Prior art keywords
signal
frequency domain
voice
covariance matrix
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010256507.9A
Other languages
Chinese (zh)
Other versions
CN111599372A (en)
Inventor
李妍文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202010256507.9A
Publication of CN111599372A
Application granted
Publication of CN111599372B
Active
Anticipated expiration


Abstract

The invention provides a stable on-line multi-channel voice dereverberation method and system. The method comprises the following steps: performing a first preprocessing on an input voice signal and converting it from the time domain to the frequency domain; calculating a covariance matrix of the input voice signal; calculating a regularization vector corresponding to each frame of the signal; treating each frequency band independently, estimating the filter coefficients corresponding to the frequency-domain signal with a recursive least squares method; calculating an auxiliary covariance matrix of the covariance among channels and correcting it based on the regularization vector; updating the filter coefficients based on the covariance matrix and the corrected auxiliary covariance matrix to obtain new filter coefficients; and filtering the frequency-domain signal with the new filter coefficients to obtain the dereverberated frequency-domain signal, converting it from the frequency domain back to the time domain, and transmitting it to a voice recognition system. By regularizing the covariance matrix, the range of its eigenvalues can be controlled, the matrix is prevented from becoming ill-conditioned, and the stability of the algorithm is enhanced.

Description

Stable on-line multi-channel voice dereverberation method and system
Technical Field
The invention relates to the technical field of voice processing, in particular to a stable on-line multi-channel voice dereverberation method and system.
Background
In the prior art, the signals received by an indoor microphone array are affected by reverberation, which degrades speech recognition performance. Online speech dereverberation is currently usually implemented with recursive least squares filtering, which improves recognition accuracy to a large extent; however, that method has poor stability and is prone to divergence. In practice, because of the transient variability and diversity of speech, the processed speech may be corrupted, which in turn harms the speech recognition result.
Disclosure of Invention
The invention provides a stable on-line multi-channel voice dereverberation method and system, which are used to solve the above technical problem.
A stable online multi-channel speech dereverberation method, comprising:
Step 1: performing a first preprocessing on an input voice signal, and converting the voice signal after the first preprocessing from the time domain to the frequency domain to obtain a frequency-domain signal; meanwhile, calculating a covariance matrix of the input voice signal;
the first preprocessing comprises framing;
Step 2: calculating a regularization vector corresponding to each frame of the frequency-domain signal;
Step 3: estimating the filter coefficients corresponding to the frequency-domain signal with a recursive least squares method, treating each frequency band independently;
Step 4: calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector calculated in step 2;
Step 5: updating the filter coefficients based on the covariance matrix and the corrected auxiliary covariance matrix to obtain new filter coefficients, and outputting the new filter coefficients to a filtering module;
Step 6: the filtering module filters the frequency-domain signal with the new filter coefficients to obtain the dereverberated frequency-domain signal, converts the dereverberated signal from the frequency domain to the time domain, and transmits it to a voice recognition system.
Preferably, before step 1, the method further comprises: acquiring a voice signal with a microphone array and converting the voice signal into a digital signal;
in step 1, the voice signal after the first preprocessing is converted from the time domain to the frequency domain by a short-time Fourier transform;
in step 2, the regularization vector corresponding to each frame of the signal is calculated according to the number of microphones and the length of the filter;
in step 6, the dereverberated signal is converted from the frequency domain to the time domain by an inverse short-time Fourier transform.
Preferably, the microphone array is a linear array or a circular array or a spherical array.
Preferably, in the framing of step 1, the frame length is 512 sampling points and the frame shift is half of the frame length.
Preferably, in step 4, the auxiliary covariance matrix of the covariance between channels is calculated using an auxiliary orthogonal transformation.
Preferably, the first preprocessing comprises, performed in sequence: pre-emphasis, framing, windowing, and endpoint detection, wherein the endpoint detection determines the effective signal within the digital signal, and the effective signal part is extracted as the output of the first preprocessing.
Preferably, after the microphone array acquires the voice signal, a second preprocessing is performed first and the voice signal is then converted into a digital signal, where the second preprocessing comprises denoising;
the denoising processing comprises the following steps:
calculating the similarity between adjacent segments of the voice signal, and judging from the similarity whether noise is present;
when noise exists, acquiring characteristic parameters of the noise contained in the voice signal;
denoising the voice signal according to the characteristic parameters;
and storing the denoised voice signal.
Preferably, the second preprocessing further includes a speech enhancement process, and the speech enhancement process includes:
determining the position and direction of the voice source according to the positions of the microphones and the strength of the voice signal;
enhancing the speech in the direction of the voice source while attenuating speech from other directions.
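Before turning to the system, the flow of steps 1 to 6 can be summarized in a minimal sketch. The sketch below is purely illustrative and assumes NumPy/SciPy; `process_frame` is a hypothetical placeholder standing in for steps 2 to 5 and the per-frame filtering of step 6, and is not part of the disclosed implementation.

```python
import numpy as np
from scipy.signal import stft, istft

def process_frame(frame):
    # Placeholder for steps 2-5 (regularization vector, RLS estimate,
    # corrected auxiliary covariance, filter-coefficient update) and the
    # per-frame filtering of step 6; the frame passes through unchanged
    # here so that the skeleton runs end to end.
    return frame

def dereverberate_online(x, fs, frame_len=512):
    """x: (num_mics, num_samples) multi-channel time-domain speech."""
    hop = frame_len // 2                          # frame shift = half the frame length
    # Step 1: framing + short-time Fourier transform (time -> frequency domain)
    _, _, X = stft(x, fs, nperseg=frame_len, noverlap=frame_len - hop)
    num_mics, num_bins, num_frames = X.shape
    Y = np.zeros_like(X)
    for t in range(num_frames):                   # online: frame by frame
        for k in range(num_bins):                 # each frequency band is independent
            Y[:, k, t] = process_frame(X[:, k, t])
    # Step 6 (continued): inverse STFT back to the time domain
    _, y = istft(Y, fs, nperseg=frame_len, noverlap=frame_len - hop)
    return y                                      # handed to the speech recognition system
```

A concrete, hedged version of the per-band update that would sit inside `process_frame` is sketched later in the detailed description.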
A system for use with any of the above dereverberation methods, the system comprising:
the first preprocessing module is used for performing the first preprocessing;
a first transformation module, configured to transform the first preprocessed voice signal from a time domain to a frequency domain;
a first calculation module for performing the calculation of a covariance matrix of the input speech signal;
a second calculation module, configured to perform the step 2;
a recursion module for performing said step 3;
a third calculation module for performing the step 4;
a filter coefficient update module for performing the step 5;
a filtering module, configured to perform filtering processing on the frequency domain signal in step 6;
a second transform module for converting the dereverberated signal from the frequency domain to the time domain.
Preferably, the system comprises:
the microphone array is used for acquiring a voice signal;
the input end of the second preprocessing module is connected with the output end of the microphone array;
and the input end of the audio coding and decoding chip is connected with the output end of the second preprocessing module, and the output end of the audio coding and decoding chip is connected with the first preprocessing module.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flow chart of a stable on-line multi-channel speech dereverberation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a stable on-line multi-channel speech dereverberation system according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In addition, the descriptions involving "first", "second", etc. in the present invention are used for descriptive purposes only; they do not denote a particular order or sequence, do not limit the invention, and merely distinguish components or operations described with the same technical terms; they are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions and technical features of the various embodiments may be combined with one another provided that the combination can be realized by a person skilled in the art; where the combined technical solutions are contradictory or cannot be realized, the combination should be considered not to exist and to fall outside the protection scope of the present invention.
An embodiment of the present invention provides a stable online multi-channel speech dereverberation method, as shown in fig. 1, including:
Step 1: performing a first preprocessing on an input voice signal, and converting the voice signal after the first preprocessing from the time domain to the frequency domain to obtain a frequency-domain signal; meanwhile, calculating a covariance matrix of the input voice signal;
the first preprocessing comprises framing;
Step 2: calculating a regularization vector corresponding to each frame of the frequency-domain signal;
Step 3: estimating the filter coefficients corresponding to the frequency-domain signal with a recursive least squares method, treating each frequency band independently;
Step 4: calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector calculated in step 2, wherein a regularization control factor is introduced to adjust the degree of regularization of the matrix;
Step 5: updating the filter coefficients based on the covariance matrix and the corrected auxiliary covariance matrix to obtain new filter coefficients, and outputting the new filter coefficients to a filtering module;
Step 6: the filtering module filters the frequency-domain signal with the new filter coefficients to obtain the dereverberated frequency-domain signal, converts the dereverberated signal from the frequency domain to the time domain, and transmits it to a voice recognition system.
Preferably, in step 4, the auxiliary covariance matrix of the covariance between channels is calculated using an auxiliary orthogonal transformation.
The working principle of the technical scheme is as follows: online speech dereverberation is usually implemented with recursive least squares filtering, and solving the covariance matrix is a key step of that recursive process. In this scheme, the regularization vector computed for each frame of the signal is used to correct the auxiliary covariance matrix of the covariance among channels; the corrected auxiliary covariance matrix and the signal are then used to calculate the covariance matrix, and the filter coefficients are updated.
The beneficial effects of the above technical scheme are: by regularizing the covariance matrix, the range of its eigenvalues can be controlled, the matrix is prevented from becoming ill-conditioned, and the stability of the algorithm is enhanced so that it does not readily diverge; at the same time, the dereverberation performance of the algorithm is not affected, correctly processed speech is obtained, and the accuracy of speech recognition is guaranteed.
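The description above states the regularization only functionally, so the sketch below should be read as one plausible per-band realization under stated assumptions, not as the invention's published update equations. It layers a diagonal regularization term (`reg_vec`, an assumed form sized by the number of microphones times the filter length and scaled by an assumed control factor `reg`) on a standard multi-channel linear-prediction recursive least squares dereverberation update; the filter length, prediction delay, and forgetting factor are likewise illustrative.

```python
import numpy as np

def rls_dereverb_band(X_k, num_mics, filt_len=10, delay=2, forget=0.99, reg=1e-3):
    """X_k: (num_frames, num_mics) complex STFT frames of one frequency band."""
    dim = num_mics * filt_len
    Phi = np.zeros((dim, dim), dtype=complex)       # auxiliary covariance of stacked past frames
    psi = np.zeros((dim, num_mics), dtype=complex)  # cross-covariance with the current frame
    reg_vec = reg * np.ones(dim)                    # step 2: regularization vector (assumed form)
    Y = X_k.copy()
    for t in range(delay + filt_len, X_k.shape[0]):
        # stack `filt_len` delayed past frames (most recent first)
        buf = X_k[t - delay - filt_len + 1 : t - delay + 1][::-1].reshape(-1)
        power = max(np.mean(np.abs(X_k[t]) ** 2), 1e-10)
        # steps 3-4: recursive, power-weighted covariance updates
        Phi = forget * Phi + np.outer(buf, buf.conj()) / power
        psi = forget * psi + np.outer(buf, X_k[t].conj()) / power
        # step 4 (correction): diagonal loading bounds the eigenvalue range and
        # keeps the matrix away from an ill-conditioned state
        G = np.linalg.solve(Phi + np.diag(reg_vec), psi)   # step 5: new filter coefficients
        # step 6: subtract the predicted late reverberation from the current frame
        Y[t] = X_k[t] - G.conj().T @ buf
    return Y
```

Called once per frequency bin on the STFT frames of step 1, the diagonal loading keeps `Phi + diag(reg_vec)` invertible and its eigenvalues bounded away from zero even in low-energy bands, which is precisely the ill-conditioning that makes the unregularized recursion diverge.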
In one embodiment, step 1 is preceded by: acquiring a voice signal with a microphone array and converting the voice signal into a digital signal;
in step 1, the voice signal after the first preprocessing is converted from the time domain to the frequency domain by a short-time Fourier transform;
in step 2, the regularization vector corresponding to each frame of the signal is calculated according to the number of microphones and the length of the filter;
in step 6, the dereverberated signal is converted from the frequency domain to the time domain by an inverse short-time Fourier transform.
The microphone array is a linear, circular, or spherical array; preferably, the array element spacing is 3.5 cm.
The beneficial effects of the above technical scheme are: the microphone array conveniently collects voice signals from different spatial directions, and compared with the ordinary Fourier transform, the short-time Fourier transform better captures the instantaneous frequency content of the signal.
In the framing of step 1, the frame length is 512 sampling points and the frame shift is half of the frame length.
The beneficial effects of the above technical scheme are: selecting the appropriate frame length and frame shift facilitates accurate signal processing.
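For concreteness, the framing parameters of this embodiment map directly onto code. The Hann window is an assumption; the patent text fixes only the frame length of 512 samples and the half-frame shift.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping, windowed frames.
    frame_len = 512 samples and hop = frame_len // 2, as in this embodiment."""
    num_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * np.hanning(frame_len)   # windowing (Hann window assumed)
```

At an assumed sampling rate of 16 kHz, 512 samples corresponds to 32 ms, which lies within the usual short-time stationarity range for speech, and the 50% overlap supports smooth overlap-add reconstruction after the inverse STFT.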
In one embodiment, the first preprocessing comprises, performed in sequence: pre-emphasis, framing, windowing, and endpoint detection, wherein the endpoint detection determines the effective signal within the digital signal, and the effective signal part is extracted as the output of the first preprocessing.
Voice endpoint detection accurately determines the start and end points of speech within a signal segment containing speech, distinguishing the voice signal (i.e., the effective signal) from non-voice signals (including silence and noise segments).
Effective endpoint detection not only reduces the amount of data handled by the voice recognition system and saves processing time, but also eliminates the interference of silence and noise segments and improves the performance of the voice recognition system.
The beneficial effects of the above technical scheme are: the pre-emphasis can be performed with a first-order high-pass digital filter; because the voice signal is short-time stationary, framing and windowing divide it into short segments that are easier to process; endpoint detection determines the effective signal within the digital signal, and the effective part is extracted as the output of the first preprocessing. This ensures the reliability of the signal processing and facilitates the subsequent steps.
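The pre-emphasis and endpoint detection mentioned above can be illustrated as follows. The pre-emphasis coefficient of 0.97 and the short-time-energy threshold detector are common textbook choices used here as assumptions; the patent specifies only that pre-emphasis may use a first-order high-pass digital filter and that endpoint detection extracts the effective signal.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass digital filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def endpoint_detect(frames, ratio=0.1):
    """Mark frames as effective speech by short-time energy (assumed criterion).
    frames: (num_frames, frame_len); returns a boolean mask per frame."""
    energy = np.sum(frames ** 2, axis=1)
    threshold = ratio * np.max(energy)        # simple adaptive energy threshold
    return energy > threshold                 # True = effective (speech) frame
```

Only the frames flagged as effective would be passed on as the output of the first preprocessing, which is what reduces the data volume and keeps silence and noise segments out of the recognizer's input.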
In one embodiment, after the microphone array acquires the voice signal, a second preprocessing is performed and the voice signal is then converted into a digital signal, where the second preprocessing comprises denoising;
the denoising processing comprises the following steps:
calculating the similarity between adjacent segments of the voice signal, and judging from the similarity whether noise is present;
when noise exists, acquiring characteristic parameters of the noise contained in the voice signal;
denoising the voice signal according to the characteristic parameters;
and storing the denoised voice signal.
The working principle of the technical scheme is as follows: the denoising first calculates the similarity between adjacent segments of the voice signal and judges from the similarity whether noise is present; when noise is present, the characteristic parameters of the noise contained in the voice signal are obtained; the voice signal is denoised according to these characteristic parameters; finally, the denoised voice signal is stored.
the beneficial effects of the above technical scheme are: the technical scheme can ensure the noise processing effect and is more convenient to ensure the accuracy of the signal processing of the invention.
In one embodiment, the second pre-processing further comprises speech enhancement processing comprising:
determining the position and direction of the voice source according to the positions of the microphones and the strength of the voice signal;
enhancing the speech in the direction of the voice source while attenuating speech from other directions.
The working principle of the technical scheme is as follows: the position and direction of the voice source are determined according to the positions of the microphones and the strength of the voice signal; based on the determined position and direction, the speech in the source direction is enhanced while speech from other directions is attenuated.
The beneficial effects of the above technical scheme are: the speech from the source direction is enhanced, which makes it easier to guarantee the quality of the voice signal processing.
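One standard way to realize "enhance the source direction, attenuate the rest" is to scan candidate directions with a delay-and-sum beamformer and keep the steering that maximizes output power. The sketch below is only such an illustration: the linear array geometry, far-field assumption, speed of sound, and angular grid are assumptions not fixed by the patent.

```python
import numpy as np

def delay_and_sum(X, f, mic_pos, angles=np.linspace(0, np.pi, 37), c=343.0):
    """X: (num_mics, num_bins, num_frames) STFT; f: (num_bins,) bin frequencies in Hz;
    mic_pos: (num_mics,) positions along a linear array in metres."""
    best_power, best_out = -np.inf, None
    for theta in angles:
        delays = mic_pos * np.cos(theta) / c                         # per-mic delay (s)
        steer = np.exp(-2j * np.pi * f[None, :] * delays[:, None])   # (mics, bins)
        out = np.mean(steer.conj()[:, :, None] * X, axis=0)          # align and average
        power = np.sum(np.abs(out) ** 2)                             # steered response power
        if power > best_power:
            best_power, best_out = power, out
    return best_out   # output steered to the strongest (assumed source) direction
```

Averaging the phase-aligned channels reinforces speech arriving coherently from the selected direction, while signals from other directions add incoherently and are attenuated.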
A system for use in any of the above methods, as shown in fig. 2, comprising:
the first preprocessing module is used for performing the first preprocessing;
a first transformation module, configured to transform the first preprocessed voice signal from a time domain to a frequency domain;
a first calculation module for performing said calculating a covariance matrix of the input speech signal;
a second calculation module, configured to perform the step 2;
a recursion module for performing said step 3;
a third calculation module for performing the step 4;
a filter coefficient update module for performing the step 5;
a filtering module, configured to perform the filtering processing on the frequency domain signal in step 6;
a second transform module for converting the dereverberated signal from the frequency domain to the time domain.
The working principle of the technical scheme is as follows: the first preprocessing module performs the first preprocessing and passes the preprocessed voice signal to the first transformation module; the first transformation module converts the voice signal from the time domain to the frequency domain and passes the frequency-domain signal to the first calculation module, the second calculation module, and the recursion module; the first calculation module calculates the covariance matrix of the input voice signal and passes it to the filter coefficient updating module; the second calculation module performs step 2 and passes its result to the third calculation module, which performs step 4 and passes its result to the filter coefficient updating module; the recursion module performs step 3 and passes its result to the filter coefficient updating module; the filter coefficient updating module performs step 5 to obtain the new filter coefficients and passes them to the filtering module; the filtering module filters according to the updated filter coefficients to obtain the dereverberated frequency-domain signal, which is converted from the frequency domain to the time domain and sent to the voice recognition system.
The beneficial effects of the above technical scheme are: by regularizing the covariance matrix, the range of its eigenvalues can be controlled, the matrix is prevented from becoming ill-conditioned, and the stability of the algorithm is enhanced so that it does not readily diverge; at the same time, the dereverberation performance of the algorithm is not affected, correctly processed speech is obtained, and the accuracy of speech recognition is ensured.
In one embodiment, as shown in FIG. 2, the system comprises:
the microphone array is used for acquiring a voice signal;
the input end of the second preprocessing module is connected with the output end of the microphone array;
and the input end of the audio coding and decoding chip is connected with the output end of the second preprocessing module, and the output end of the audio coding and decoding chip is connected with the first preprocessing module.
The working principle of the technical scheme is as follows: the (analog) voice signal is acquired by the microphone array and passed to the second preprocessing module for the second preprocessing; the second preprocessing module passes the preprocessed (analog) voice signal to the audio codec chip, which converts it into a digital signal and passes it to the first preprocessing module.
The beneficial effects of the above technical scheme are: the microphone array and the audio codec chip acquire the voice signal and convert it into a digital signal, which facilitates the subsequent processing, and the second preprocessing module preprocesses the signal, which guarantees the reliability of the signal transmission.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

CN202010256507.9A | 2020-04-02 | 2020-04-02 | Stable on-line multi-channel voice dereverberation method and system | Active | CN111599372B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010256507.9A (CN111599372B) | 2020-04-02 | 2020-04-02 | Stable on-line multi-channel voice dereverberation method and system

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010256507.9A (CN111599372B) | 2020-04-02 | 2020-04-02 | Stable on-line multi-channel voice dereverberation method and system

Publications (2)

Publication Number | Publication Date
CN111599372A (en) | 2020-08-28
CN111599372B (en) | 2023-03-21

Family

ID=72185460

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010256507.9A (CN111599372B, Active) | Stable on-line multi-channel voice dereverberation method and system | 2020-04-02 | 2020-04-02

Country Status (1)

Country | Link
CN (1) | CN111599372B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN112700787B (en)* | 2021-03-24 | 2021-06-25 | 深圳市中科蓝讯科技股份有限公司 | Noise reduction method, nonvolatile readable storage medium and electronic device
CN113299301A (en)* | 2021-04-21 | 2021-08-24 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing
CN113823314B | 2021-08-12 | 2022-10-28 | 北京荣耀终端有限公司 | Voice processing method and electronic equipment
CN115762541B (en)* | 2022-11-02 | 2025-08-08 | 紫光展锐(重庆)科技有限公司 | Audio data processing method and related device

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN108172231A (en)* | 2017-12-07 | 2018-06-15 | 中国科学院声学研究所 | A method and system for removing reverberation based on Kalman filter
CN110915233A (en)* | 2017-04-20 | 2020-03-24 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for multi-channel interference cancellation

Family Cites Families (1)

Publication number | Priority date | Publication date | Assignee | Title
US10403299B2 (en)* | 2017-06-02 | 2019-09-03 | Apple Inc. | Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition

Patent Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN110915233A (en)* | 2017-04-20 | 2020-03-24 | 弗劳恩霍夫应用研究促进协会 | Apparatus and method for multi-channel interference cancellation
CN108172231A (en)* | 2017-12-07 | 2018-06-15 | 中国科学院声学研究所 | A method and system for removing reverberation based on Kalman filter

Non-Patent Citations (2)

Title
何冲; 王冬霞; 王旭东; 蒋茂松. A multi-channel linear prediction speech dereverberation method based on orthogonal non-negative matrix factorization. 2018, (05), full text.*
王旭东; 王冬霞; 周城旭. A far-field speech recognition method based on improved BFDNN. 2018, (15), full text.*

Also Published As

Publication number | Publication date
CN111599372A (en) | 2020-08-28

Similar Documents

Publication | Publication Date | Title
CN111599372B (en) | Stable on-line multi-channel voice dereverberation method and system
US11056130B2 (en) | Speech enhancement method and apparatus, device and storage medium
CN110322891B (en) | Voice signal processing method and device, terminal and storage medium
EP2416315A1 (en) | Noise suppression device
CN109643554A (en) | Adaptive voice enhancement method and electronic equipment
CN102097095A (en) | Speech endpoint detecting method and device
CN112017682B (en) | Single-channel voice simultaneous noise reduction and reverberation removal system
JP2006079079A (en) | Distributed speech recognition system and method
CN108597505A (en) | Audio recognition method, device and terminal device
CN108847253B (en) | Vehicle model identification method, device, computer equipment and storage medium
CN112002307B (en) | Voice recognition method and device
CN111312275B (en) | An online sound source separation enhancement system based on subband decomposition
JP4050350B2 (en) | Speech recognition method and system
CN108053842B (en) | Short wave voice endpoint detection method based on image recognition
CN117711419B (en) | Intelligent data cleaning method for data center
CN112599148A (en) | Voice recognition method and device
CN120148484B (en) | Speech recognition method and device based on microcomputer
US6678656B2 (en) | Noise reduced speech recognition parameters
CN103745729A (en) | Audio de-noising method and audio de-noising system
CN111968651A (en) | WT-based voiceprint recognition method and system
CN119418712A (en) | A noise reduction method for real-time speech at the edge
TWI396186B (en) | Speech enhancement technique based on blind source separation for far-field noisy speech recognition
CN118016079A (en) | Intelligent voice transcription method and system
CN113611314A (en) | A method and system for speaker recognition
JPS628800B2 (en)

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
