Background art
With the spread of audio/video conferencing systems and hands-free telephones, and in particular the development of high-definition audio communication systems and the demand for virtual telepresence in conferencing, the far-end speech signal is played through a loudspeaker so that all participants can hear it. While picking up the local talker's voice, the local microphone also picks up the sound played by the loudspeaker, which is amplified and sent back to the far end. The far-end participant therefore hears his or her own voice returned from the other side. This phenomenon is called echo. Echo disturbs the acoustics of the meeting room and distracts the talker; in severe cases it causes feedback howling and ruins the meeting. Echo cancellation therefore plays an important role in conference telephones and audio/video conferencing systems.
Traditional echo elimination is based mainly on gain suppression. Its basic principle is to compare the signal energy of the far-end and local sides, treat the side with the stronger energy as the speech signal, raise its gain so that it passes, and lower the gain of the other side to suppress its signal, thereby removing the echo. Because this technique controls the gain in both directions and lets the signal pass in only one direction at any given moment, it is called a half-duplex echo canceller. Its advantage is that it is simple and inexpensive. Its shortcomings are as follows: because only one direction conducts at a time, when both sides talk simultaneously one side's voice is cut off (Cut Off) and cannot reach the other side, which impairs communication; and in environments with high ambient noise in particular, the mechanism cannot distinguish speech from noise and cannot achieve a satisfactory result. As the demands placed on modern conferencing systems and on echo cancellation keep rising, traditional gain-suppression echo suppression can no longer meet the requirements of high-end equipment and users.
As digital signal processors become more powerful and digital signal processing techniques improve, echo cancellation based on adaptive filters has gradually been adopted and applied. However, adaptive filtering requires a large amount of computation and can usually achieve echo cancellation only for a narrow-band frequency response, as in telephone and mobile-phone communication; for example, Nokia's patent No. 97195672.3 is aimed mainly at telephones and mobile phones. It is difficult or impossible for such systems to meet the newer demands for wideband (Wide Band: 50 Hz-7000 Hz), super-wideband (Super Wide Band: 50 Hz-14000 Hz) and full-band (Whole Band: 20 Hz-20000 Hz) frequency response. For example, the ClearOne XAP400 has a frequency response of only 200 Hz-7000 Hz, and the Polycom VORTEX has a frequency response of only 200 Hz-10000 Hz. Furthermore, because of background noise during echo cancellation, the difficulty of double-talk detection, and the insufficient accuracy of adaptive-filter adaptation, echo cancellation is degraded and neither sufficient echo attenuation nor true full-duplex operation is achieved.
Summary of the invention
The technical problem to be solved by the present invention is to provide an adaptive full-duplex full-band echo cancellation method that can cancel echo in full duplex over the full frequency band.
To solve the above technical problem, the adaptive full-duplex full-band echo cancellation method of the present invention comprises the following steps:
The reference signal, i.e. the local loudspeaker output signal, is filtered by a subband filter bank to obtain the subband reference signal;
The microphone input signal is filtered by a subband filter bank to obtain the subband microphone input signal;
The subband reference signal is filtered by the subband echo filter to obtain the estimated echo;
The subband echo canceller subtracts the estimated echo from the subband microphone input signal to obtain the residual signal;
Weighted energy and spectrum analysis is performed on the subband reference signal, the subband microphone input signal, the estimated echo and the residual signal to decide the mode of nonlinear processing;
According to the subband signal analysis result, nonlinear processing is applied to the residual signal to further reduce the echo. Specifically: when only the local side is talking, the nonlinear processor passes the residual signal; when only the far end is talking, the nonlinear processor blocks the residual echo; when both sides talk simultaneously, the nonlinear processor suppresses the residual echo;
The signals of each subband, together with the signal analysis results, are passed to a double-talk detector, which analyzes and detects the talk activity of both sides to decide how the subband echo filter is to be updated: when only echo is present, the adaptive filter coefficients are updated with the correction term, using the PNLMS coefficient update formula; when both sides talk simultaneously, the update of the adaptive filter coefficients is frozen;
The subband echo filter is updated according to the results of the double-talk detector and the signal analysis;
The echo-cancelled signals of all subbands are recombined by the subband synthesis filter bank into the echo-cancelled full-band audio signal;
The method further comprises a step of keeping the reference signal synchronized with the echo signal.
After the nonlinear processing, noise reduction, automatic gain control and comfort noise techniques may be added.
With the above adaptive full-duplex full-band echo cancellation method, the signal is split into subbands by filtering, which reduces the correlation between subband signals. This improves the convergence of the adaptive filter, reduces the amount of data the system must process, and raises the efficiency of echo cancellation. It achieves full-duplex conversation, extends the audio frequency response of the communication, improves communication quality, and achieves echo cancellation with a full-band response.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings. Echo is a complicated problem. Besides travelling directly from the loudspeaker to the microphone, the sound emitted by the loudspeaker also reaches the microphone after being reflected by the tabletop, floor, ceiling, walls and so on, and multiply reflected waves exist as well; the echo received by the microphone is the sum of all direct and reflected sound waves. This makes the echo complex and causes it to persist for a considerable time; in echo cancellation this duration is called the tail length (Tail Length). Because the acoustic reflectivity of materials differs, the echo duration varies from one meeting room to another; a tail length of 128 ms (milliseconds) meets the needs of most meeting rooms. In addition, the transmission and reflection of sound vary with frequency, so the echo is also a function of frequency. Furthermore, because the individual sound waves reinforce or cancel one another when they are summed at the microphone, the echo problem becomes very complicated.
The present invention relates to a method for an adaptive full-duplex full-band echo canceller and to techniques for implementing it. The invention is based on digital signal processing (DSP-Digital Signal Processing) and combines digital signal analysis and input, digital filtering, adaptive filtering, nonlinear processing, double-talk detection and subband techniques to achieve full-duplex full-band echo cancellation. Noise reduction (Noise Reduction), automatic gain control (AGC-Automatic Gain Control), comfort noise (Comfort Noise) and the like can be embedded to work with the invention and further improve the acoustics of the system. Noise reduction, automatic gain control and comfort noise, however, are not part of the present invention.
As shown in Fig. 1, the signals of the echo canceller consist of the input signal S_fin received from the far end, the output signal S_fout sent to the far end, the output signal S_speaker delivered to the local loudspeaker, and the microphone input signal S_mic received from the microphone. When the far-end input signal is fed to the local loudspeaker for playback, the echo canceller samples this signal as the reference for echo cancellation (because it is the basis for echo estimation, it is called the reference signal). The microphone input signal S_mic received from the microphone is processed by the echo canceller and then sent to the far end as the output signal S_fout.
As shown in Fig. 2, the echo canceller consists mainly of the following functional parts: subband filtering by the subband filter bank (Sub-Band Filter Bank), echo estimation by the subband echo filter (Sub-Band Echo Estimation), subband echo cancellation (Echo Subtraction), double-talk detection (Double Talk Detection), system signal weighted energy analysis and control (Weighted Energy Analysis and Control), nonlinear processing of the subband residual signal (NLP-Nonlinear Processing), update of the subband echo filter (Coefficient Update), and signal synthesis by the subband synthesis filter bank (Synthesis Filter Bank). The main processing steps of the echo canceller are:
The reference signal, i.e. the local loudspeaker output signal, is filtered by the subband filter bank to obtain the subband reference signal.
The microphone input signal is filtered by the subband filter bank to obtain the subband microphone input signal.
The subband reference signal is filtered by the subband echo filter to obtain the estimated echo.
The subband echo canceller subtracts the estimated echo from the subband microphone input signal; the result is called the residual signal.
Weighted energy and spectrum analysis is performed on the subband reference signal, the subband microphone input signal, the estimated echo, the residual signal and so on, to decide the mode of nonlinear processing.
According to the subband signal analysis result, nonlinear processing is applied to the residual signal to further reduce the echo.
The signals of each subband, together with the signal analysis results, are passed to the double-talk detector, which analyzes and detects the talk activity of both sides to decide how the subband echo filter is to be updated.
The subband echo filter is updated according to the results of the double-talk detector and the signal analysis.
The echo-cancelled signals of all subbands are recombined by the subband synthesis filter bank into the echo-cancelled full-band audio signal, which is sent to the far end as the output signal S_fout.
Because of the echo tail length, the most recent 128 ms of reference-signal data is retained. The tail length (Tail Length) can be adjusted to suit the application. In addition, in some situations the reference signal reaches the loudspeaker only after a certain delay; in that case a delay memory can be added to keep the reference signal and the echo signal synchronized.
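The buffering described here can be illustrated with a minimal Python sketch. It assumes a simple per-sample ring buffer; the names `tail_samples`, `ReferenceBuffer` and `delay_samples` are hypothetical and do not come from the source, which does not say how the delay memory is organized.

```python
from collections import deque


def tail_samples(sample_rate_hz: int, tail_ms: int) -> int:
    """Number of samples needed to cover the echo tail length."""
    return sample_rate_hz * tail_ms // 1000


class ReferenceBuffer:
    """Keeps the newest tail of the reference signal, plus an optional delay.

    delay_samples models the case where the reference reaches the
    loudspeaker only after a known delay (an assumed, fixed value here).
    """

    def __init__(self, sample_rate_hz: int, tail_ms: int = 128, delay_samples: int = 0):
        self.tail = tail_samples(sample_rate_hz, tail_ms)
        size = self.tail + delay_samples
        self.buf = deque([0.0] * size, maxlen=size)

    def push(self, sample: float) -> None:
        """Store the newest reference sample; the oldest one drops out."""
        self.buf.append(sample)

    def window(self) -> list:
        """Tail-length window aligned with the delayed loudspeaker output, oldest first."""
        return list(self.buf)[: self.tail]
```

At a 48 kHz sampling rate, `tail_samples(48000, 128)` gives 6144 samples for the full-band signal; in a subband system each band would typically hold far fewer samples because of decimation, a detail this sketch leaves out.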
Likewise, the microphone input signal is filtered by the same subband filter bank to obtain its subband components. The subband signals are used in the subsequent analysis and processing.
The sampling rate of the digital signal and the number of bands in the subband filter bank can be chosen according to the requirements and specification of the signal actually being processed. For example, for the full-band specification of 20 Hz-20000 Hz, a sampling rate of 48 kHz can be used in accordance with the Nyquist-Shannon sampling theorem. Depending on the sampling rate and the requirements of the application, the number of bands can range from a few to several thousand.
The subband filters are finite impulse response filters (FIR-Finite Impulse Response filter). Filters designed with Hamming or Kaiser windows offer deep stopband attenuation and a steep transition band, and are good choices when designing the filters.
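As an illustration of a Hamming-window FIR design, the following Python sketch builds a windowed-sinc low-pass prototype and shifts it to a subband by cosine modulation. The modulation step, the tap count and the function names (`lowpass_prototype`, `modulate`, `gain`) are assumptions chosen for the example; the source does not specify how its filter bank is constructed.

```python
import cmath
import math


def hamming(n_taps: int) -> list:
    """Hamming window of length n_taps."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1)) for n in range(n_taps)]


def lowpass_prototype(n_taps: int, cutoff: float) -> list:
    """Windowed-sinc low-pass FIR; cutoff is in cycles per sample (0 < cutoff < 0.5)."""
    mid = (n_taps - 1) / 2
    window = hamming(n_taps)
    h = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        h.append(ideal * window[n])
    total = sum(h)
    return [c / total for c in h]  # normalize to unity gain at DC


def modulate(h: list, center: float) -> list:
    """Shift the low-pass prototype to a band centred at `center` cycles per sample."""
    mid = (len(h) - 1) / 2
    return [2 * c * math.cos(2 * math.pi * center * (n - mid)) for n, c in enumerate(h)]


def gain(h: list, f: float) -> float:
    """Magnitude response of the FIR filter h at frequency f (cycles per sample)."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(h)))
```

For instance, `modulate(lowpass_prototype(65, 1 / 32), 0.25)` yields a band-pass filter centred at a quarter of the sampling rate, and `gain` can be used to check its passband and stopband levels.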
Fig. 3 illustrates the operation of the adaptive filter of the present invention on each subband. The adaptive filter on each subband consists mainly of the following three parts: 1. The reference signal is passed through the echo finite impulse response filter to obtain the estimated echo signal. 2. The estimated echo signal is subtracted from the subband component of the microphone input signal to cancel the echo, yielding the residual signal. 3. The parameters of the echo filter are updated using the reference signal and the residual signal.
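The three parts can be sketched for a single subband as follows. This is a minimal illustration using a plain NLMS update on real-valued samples, not the simplified PNLMS the invention uses; the step size `mu`, the regularizer `eps` and the `simulate` helper with its synthetic echo path are assumptions made for the example.

```python
import random


def nlms_step(w, x_hist, mic, mu=0.5, eps=1e-8):
    """One adaptive-filter iteration on one subband.

    w      : current echo filter coefficients
    x_hist : recent reference samples, newest first, same length as w
    mic    : current microphone sample in this subband
    Returns (residual, updated coefficients).
    """
    estimated_echo = sum(wi * xi for wi, xi in zip(w, x_hist))          # part 1: estimate
    residual = mic - estimated_echo                                     # part 2: subtract
    norm = sum(xi * xi for xi in x_hist) + eps
    new_w = [wi + mu * residual * xi / norm for wi, xi in zip(w, x_hist)]  # part 3: update
    return residual, new_w


def simulate(echo_path, n_samples=2000, seed=0):
    """Adapt to a known synthetic echo path with far-end-only, noise-free input."""
    rng = random.Random(seed)
    taps = len(echo_path)
    w = [0.0] * taps
    x_hist = [0.0] * taps
    for _ in range(n_samples):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        mic = sum(h * x for h, x in zip(echo_path, x_hist))
        _, w = nlms_step(w, x_hist, mic)
    return w
```

Calling `simulate([0.5, -0.3, 0.1])` returns coefficients close to the synthetic echo path, which is the behaviour expected when only echo is present and no local speech or noise disturbs the update.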
The microphone input signal S_mic is filtered by the subband filters to obtain its subband components M_1 ... M_N. These pass through the subband adaptive filters to give the signals M'_1 ... M'_N, and then through nonlinear processing to give M''_1 ... M''_N. Finally, the subband synthesis filter recombines them into S'_mic, the full-band microphone signal after echo cancellation. Likewise, the loudspeaker output signal S_speaker is filtered by the subband filters to obtain its subband components S_1 ... S_N, which are sent to the subband adaptive filters. The digital signal processor also sends the control signals control_2 and control_1 to control the subband filters and the nonlinear processing.
Under ideal conditions, the estimated echo signal matches the echo signal actually present at the microphone input, so the residual echo after echo cancellation is zero. In practice this is not the case. Because of ambient noise, even if the estimated echo matches the actual echo at the microphone input, the residual signal after cancellation is not zero: at the very least the ambient noise remains, and an echo filter updated with this residual will deviate from the actual echo impulse response. Local speech further disturbs the update of the echo filter parameters. Residual echo is therefore always present, and nonlinear processing is needed to reduce it further.
The present invention controls the nonlinear processor according to accurate signal analysis and detection results. The nonlinear processor handles residual echo with a combination of three modes: pass (Pass), suppression (Suppression) and blocking (Center Cut). Specifically: when only the local side is talking, the nonlinear processor passes the residual signal; when only the far end is talking, it blocks the residual echo; when both sides talk simultaneously, it suppresses the residual echo. Because the nonlinear processor handles the signal separately on each subband, it can suppress echo effectively while still letting local speech through. This resolves the dilemma between residual echo and cut-off speech and markedly improves the performance of the echo canceller.
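The three modes can be expressed as a per-subband gain, as in the Python sketch below. The `TalkState` enum, the function names and the suppression gain of 0.5 are hypothetical; the source does not state how strongly the residual is attenuated during double talk.

```python
from enum import Enum


class TalkState(Enum):
    NEAR_ONLY = 1    # only the local side is talking
    FAR_ONLY = 2     # only the far end is talking
    DOUBLE_TALK = 3  # both sides are talking


def nlp_gain(state: TalkState, suppress_gain: float = 0.5) -> float:
    """Gain applied to the residual signal of one subband."""
    if state is TalkState.NEAR_ONLY:
        return 1.0            # pass
    if state is TalkState.FAR_ONLY:
        return 0.0            # block
    return suppress_gain      # suppress (assumed attenuation value)


def nonlinear_process(residuals, states, suppress_gain: float = 0.5):
    """Apply the per-subband decision to one residual sample per subband."""
    return [r * nlp_gain(s, suppress_gain) for r, s in zip(residuals, states)]
```

Because each subband carries its own state, a band dominated by local speech can pass untouched while a band containing only far-end echo is blocked in the same frame.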
The subband signals processed by the nonlinear processor are fed to the subband synthesis filter bank, where they are recombined into the full-band speech signal. The reconstructed signal is sent to the far end. The characteristics and impulse response of the subband synthesis filters are determined by those of the subband analysis (splitting) filters. In principle, a signal that passes through both filter banks should be restored to the original signal. In practice, because of the limits of perfect-reconstruction filter design and the effects of echo cancellation and nonlinear processing, there is usually some error between the reconstructed signal and the original. The filter bank must therefore be designed so that its effect on the signal stays below a certain level; for example, the error should be less than -30 decibels.
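One way to check such an error bound is to compare the energy of the reconstruction error with the energy of the original signal. The sketch below assumes that definition of the error and assumes the two signals are already time-aligned; the source gives only the -30 decibel figure, not the metric, and the function name is hypothetical.

```python
import math


def reconstruction_error_db(original, reconstructed) -> float:
    """Error energy relative to the original signal energy, in decibels."""
    error_energy = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    signal_energy = sum(o * o for o in original)
    if signal_energy == 0:
        raise ValueError("original signal has zero energy")
    if error_energy == 0:
        return float("-inf")  # perfect reconstruction
    return 10 * math.log10(error_energy / signal_energy)
```

A filter-bank design would meet the example target when this value is below -30 for representative test signals passed through analysis and synthesis with no echo processing in between.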
System signal energy analysis performs spectrum analysis and weighted energy measurement on the reference signal, the microphone input signal and the residual echo signal, and estimates the background noise, the average speech energy and the peak energy. The talk state of both sides is assessed from the relation between the signal energy and each estimated energy, and the result is used to control the nonlinear processor and the double-talk detector. For example, the average speech energy can be computed with a first-order infinite impulse response filter (IIR-Infinite Impulse Response filter) to reduce the amount of computation.
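A first-order IIR energy estimate can be written as a one-line recursion, as in the following sketch. The smoothing factor `alpha` and the function name are assumptions; the source does not give the filter coefficient.

```python
def smooth_energy(samples, alpha: float = 0.01, initial: float = 0.0):
    """Running average energy using a first-order IIR (one-pole) filter.

    e[n] = (1 - alpha) * e[n-1] + alpha * x[n]**2
    Only one multiply-accumulate state is kept per signal, which is why
    this is cheaper than averaging over a stored window.
    """
    energy = initial
    out = []
    for x in samples:
        energy = (1.0 - alpha) * energy + alpha * x * x
        out.append(energy)
    return out
```

A smaller `alpha` gives a slower, smoother estimate suited to background noise, while a larger one tracks speech onsets more quickly.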
The double-talk detector detects the talk state of both sides in order to control the update of the echo adaptive filter. According to the echo canceller model and the adaptive filter algorithm (LMS), updating the echo cancellation adaptive filter while the far end is talking and the local side is silent makes it converge and improves echo cancellation; conversely, updating it while the local side is talking makes it diverge and degrades echo cancellation.
Therefore, when the local side is talking, the update of the echo cancellation adaptive filter must be frozen. Similarly, when there is no speech signal from the far end, the update should also be frozen. In addition to using the data from the system signal energy analysis, the system computes and analyzes the cross-correlation between the reference signal and the microphone signal, and from this judges the talk state of both sides to control the update of the echo cancellation adaptive filter. In summary, when only echo is present, the cross-correlation coefficient between the reference signal and the microphone signal is large; otherwise it is small.
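The decision rule can be sketched with a normalized cross-correlation coefficient over one frame. This simplified version compares the two signals at zero lag only, whereas a real echo is a filtered and delayed copy of the reference; the threshold of 0.8, the energy floor and the function names are assumptions for illustration.

```python
import math


def cross_correlation(ref, mic) -> float:
    """Normalized zero-lag cross-correlation coefficient, in the range 0 to 1."""
    num = sum(r * m for r, m in zip(ref, mic))
    den = math.sqrt(sum(r * r for r in ref) * sum(m * m for m in mic))
    return 0.0 if den == 0 else abs(num) / den


def allow_adaptation(ref, mic, threshold: float = 0.8, min_ref_energy: float = 1e-6) -> bool:
    """Return True only when the frame looks like echo alone."""
    ref_energy = sum(r * r for r in ref) / len(ref)
    if ref_energy < min_ref_energy:
        return False  # far end silent: freeze the update
    # A high coefficient suggests echo only; a low one suggests local speech.
    return cross_correlation(ref, mic) >= threshold
```

When `allow_adaptation` returns False, the coefficient update for that subband would be skipped, matching the freeze behaviour described above.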
The coefficient update is one of the most important parts of an adaptive filter. Depending on the coefficient update formula, the methods include LMS, NLMS and PNLMS. LMS fluctuates considerably; NLMS is stable but converges more slowly than PNLMS; PNLMS is stable but requires more computation. The present invention uses a simplified PNLMS method but does not exclude the other methods. The result of double-talk detection is used to control the coefficient update: for example, when only echo is present, a larger correction is used to update the adaptive filter coefficients, and when both sides talk simultaneously the update of the adaptive filter coefficients is frozen.
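The source does not give the formula of its simplified PNLMS, so the sketch below shows a textbook-style proportionate NLMS update only to illustrate the idea: each coefficient receives a step proportional to its current magnitude, and the whole update is skipped when double talk is detected. The parameters `mu`, `rho`, `delta` and `eps` and the `freeze` flag are assumptions.

```python
def pnlms_update(w, x_hist, residual, mu=0.5, rho=0.01, delta=0.01, eps=1e-8, freeze=False):
    """One proportionate NLMS coefficient update for one subband.

    w        : current coefficients
    x_hist   : recent reference samples, newest first, same length as w
    residual : error sample after echo subtraction
    freeze   : True during double talk (or far-end silence); no update is made
    """
    if freeze:
        return list(w)
    largest = max(abs(c) for c in w)
    floor = rho * max(delta, largest)            # keeps small taps adapting
    gamma = [max(floor, abs(c)) for c in w]
    mean_gamma = sum(gamma) / len(gamma)
    g = [gm / mean_gamma for gm in gamma]        # per-tap gains with mean 1
    norm = sum(gi * xi * xi for gi, xi in zip(g, x_hist)) + eps
    return [c + mu * gi * xi * residual / norm for c, gi, xi in zip(w, g, x_hist)]
```

With all coefficients at zero the gains are equal and the update reduces to NLMS; once a few taps dominate, they receive most of the step, which is what speeds up convergence on sparse echo paths.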
In addition, depending on the system specification, noise reduction, automatic gain control, comfort noise and so on can be added after the nonlinear processing and integrated simply into the system of the present invention. These functions, however, are not included in the present invention.
The above embodiments do not limit the present invention in any way; all technical solutions obtained by equivalent replacement or equivalent transformation fall within the protection scope of the present invention.