EP0852052B1

Movatterモバイル変換

Info

Publication number: EP0852052B1
Application number: EP96931552A
Authority: EP
Inventors: Torbjörn W. SÖLVE
Original assignee: Ericsson Inc
Current assignee: Ericsson Inc
Priority date: 1995-09-14
Filing date: 1996-09-13
Publication date: 2001-06-13
Anticipated expiration: 2016-09-13
Also published as: AU724111B2; KR100423029B1; AU7078496A; KR19990044659A; EE9800068A; WO1997010586A1; PL325532A1; DE69613380D1; MX9801857A; RU2163032C2; CN1121684C; EE03456B1; TR199800475T1; NO981074D0; PL185513B1; CN1201547A; EP0852052A1; JPH11514453A; CA2231107A1; NO981074L

Description

The present invention relates to noise reductionsystems, and in particular, to an adaptive speechintelligibility enhancement system for use in portabledigital radio telephones.
The cellular telephone industry has made phenomenalstrides in commercial operations in the United States as well as the rest of the world. Demand for cellularservices in major metropolitan areas is outstrippingcurrent system capacity. Assuming this trendcontinues, cellular telecommunications will reach eventhe smallest rural markets. Consequently, cellularcapacity must be increased while maintaining highquality service at a reasonable cost. One importantstep towards increasing capacity is the conversion ofcellular systems from analog to digital transmission.This conversion is also important because the firstgeneration of personal communication networks (PCNs),employing low cost, pocket-size, cordless telephonesthat can be easily carried and used to make or receivecalls in the home, office, street, car, etc., willlikely be provided by cellular carriers using the nextgeneration digital cellular infrastructure.
Digital communication systems take advantage ofpowerful digital signal processing techniques. Digitalsignal processing refers generally to mathematical andother manipulation of digitized signals. For example,after converting (digitizing) an analog signal intodigital form, that digital signal may be filtered,amplified, and attenuated using simple mathematicalroutines in a digital signal processor (DSP).Typically, DSPs are manufactured as high speedintegrated circuits so that data processing operationscan be performed essentially in real time. DSPs mayalso be used to reduce the bit transmission rate ofdigitized speech which translates into reduced spectraloccupancy of the transmitted radio signals and increased system capacity. For example, if speechsignals are digitized using 14-bit linear Pulse CodeModulation (PCM) and sampled at an 8 KHz rate, a serialbit rate of 112 Kbits/sec is produced. Moreover, bytaking mathematical advantage of redundancies and otherpredicable characteristics of human speech, voicecoding techniques can be used to compress the serialbit rate from 112 Kbits/sec to 7.95 Kbits/sec toachieve a 14:1 reduction in bit transmission rate.Reduced transmission rates translate into moreavailable bandwidth.
One popular speech compression technique adopted inthe United States by the TIA for use as the digitalstandard for the second generation of cellulartelephone systems (i.e., IS-54) is vector sourcebookexcited linear predictive coding (VSELP).Unfortunately, when audio signals including speech,mixed with high levels of ambient noise (particularly"colored noise"), are coded/compressed using VSELP,undesirable audio signal characteristics may be part ofthe result. For example, if a digital mobile telephoneis used in a noisy environment (e.g. inside a movingautomobile), both ambient noise and desired speech arecompressed using the VSELP encoding algorithm andtransmitted to a base station where the compressedsignal is decoded and reconstituted into audiblespeech. When the background noise is reconstitutedinto an analog format, undesirable, audible distortionof the noise, and occasionally in the speech, is introduced. This distortion is very annoying to theaverage listener.
The distortion is caused in large part by theenvironment in which the mobile telephones are used.Mobile telephones are typically used in a vehicle'sinterior where there is often ambient noise produced bythe vehicle's engine and surrounding vehicular traffic.This ambient noise in the vehicle's interior istypically concentrated in the low audible frequencyrange and the magnitude of the noise can vary due tosuch factors as the speed and acceleration of thevehicle and the extent of the surrounding vehiculartraffic. This type of low frequency noise also has thetendency of significantly decreasing theintelligibility of the speech coming from the speakingperson in the car environment. The decrease in speechintelligibility caused by low frequency noise can beparticularly significant in communication systemsdeploying a VSELP vocoder, but can also occur incommunication systems that do not include a VSELPvocoder.
The influence of the ambient noise on the mobiletelephone can also be affected by the manner in whichthe mobile telephone is used. In particular, themobile telephone may be used in a hands-free mode wherethe telephone user talks on the telephone while themobile telephone is in a cradle. This frees thetelephone user's hands to drive but also increases thedistance that the telephone user's audible words musttravel before reaching the microphone input of the mobile telephone. This increased distance between theuser and the mobile telephone, along with the varyingambient noise, can result in noise being a significantportion of the total power spectral energy of the audiosignal inputted into the mobile telephone.
Prior art disclosure contained inEP 0 645 756,EP0 558 312,EP 0 665 530, DE 4 012 349, U.S. Patent Nos.4,811,404, 4,461,025, and 5,251,263 all disclose mannersby which to filter unwanted signal components.
In theory, various signal processing algorithms couldbe implemented using digital signal processors to filterthe VSELP encoded background noise. These solutions,however, often require significant digital signalprocessing overhead, measured in terms of millions ofinstructions executed per second (MIPS), which consumesvaluable processing time, memory space, and powerconsumption. Each of these signal processing resources,however, is limited in portable radiotelephones. Hence,simply increasing the processing burden of the DSP is notan optimal solution for minimizing VSELP encoded and othertypes of background noise.
The present invention provides a method and an apparatus for selectively altering a frame of a digital signal according toclaims 1 and 9.
The present invention provides an adaptive noisereduction system that reduces the undesirablecontributions of encoded background noise while bothminimizing any negative impact on the quality of theencoded speech and minimizing any increased drain ondigital signal processor resources. The method and systemof the present invention increases the intelligibility ofthe speech in a digitized audio signal by passing framesof the digitized audio signal through a filter circuit.The filter circuit functions as an adjustable, high-pass filter which filters aportion of the digitized signal in a low audiblefrequency range and passes the portion of the digitizedsignal falling in higher frequency ranges. Because thenoise in a vehicle tends to be concentrated in a lowaudible frequency range and only a relatively smallportion of the intelligibility content of speech fallswithin this low frequency range, the filter circuitfilters a large segment of the noise in the digitizedaudio signal while only filtering less importantsegments of the speech. This results in a relativelylarger portion of the noise energy being removedcompared to the portion of the speech energy removed.By adaptively adjusting and selecting the frequencyresponse curve of the filter circuit, the amount ofspeech filtered is limited and has a minimal affect onthe intelligibility of the speech outputted by theradio.
A filter control circuit is used to adjust thefilter circuit to exhibit different frequency responsecurves as a function of a noise estimate and/or aspectral profile result corresponding to the noise inthe audio signal. The noise estimate and/or thespectral profile result are adjusted on a frame-by-framebasis for the digital signal and as a function ofspeech detection. If speech is not detected, the noiseestimate and/or spectral profile result is updated forthe current frame. If speech is detected, the noiseestimate and/or spectral profile result is leftunadjusted.
In a first embodiment, the filter circuitcalculates noise estimates for the frames of thedigitized audio signals. The noise estimatescorrespond to the amount of background noise in theframes of the digitized audio signals. As the relativeamount of background noise to speech in a low frequencyrange of speech increases, the noise estimatesincrease. The filter control circuit uses the noiseestimates to adjust the filter circuit to filter largerportions of the low frequency range of speech as therelative amount of background noise to speech in a lowfrequency range of speech increases. When nobackground noise is present, no portion of the speechsignal is filtered. Larger portions of noise andspeech information are extracted when there is a higherlevel of background noise. Because noise tends to beconcentrated in a low frequency range and only arelatively small portion of the intelligibility contentof speech falls within this low frequency range, theoverall intelligibility of the audio signal can beincreased by increasing the portion of low frequencyenergy being filtered as the noise estimates increase.
In a second embodiment, a modified filter controlcircuit is used to adjust the filter circuit to exhibitdifferent frequency response curves as a function of anoise profile of the noise estimate over a selectedfrequency range in the audio signal. The filtercontrol circuit includes a spectral analyzer fordetermining a noise profile estimate as a function ofthe detection speech. A noise profile estimate is determined for a current frame and compared to areference noise profile. Based on this comparison, thefilter circuit is adaptively adjusted to extractvarying amounts of low frequency energy from thecurrent frame.
The adaptive noise reduction system according tothe present invention may be advantageously applied totelecommunication systems in which portable/mobileradio transceivers communicate over RF channels witheach other or with fixed telephone line subscribers.Each transceiver includes an antenna, a receiver forconverting radio signals received over an RF channelvia the antenna into analog audio signals, and atransmitter. The transmitter includes a coder-decoder(codec) for digitizing analog audio signals to betransmitted into frames of digitized speechinformation, the speech information including bothspeech and background noise. A digital signalprocessor processes a current frame based on anestimate of the background noise and the detection ofspeech in the current frame to minimize backgroundnoise. A modulator modulates an RF carrier with theprocessed frame of digitized speech information forsubsequent transmission via the antenna.
These and other features and advantages of thepresent invention will be readily apparent to one ofordinary skill in the art from the following written description, read in conjunction with the drawings, inwhich:
FIGURE 1 is a general functional block diagram ofthe present invention;
FIGURE 2 illustrates the frame and slot structureof the U.S. digital standard IS-54 for cellular radiocommunications;
FIGURE 3 is a block diagram of a first preferredembodiment of the present invention implemented using adigital signal processor;
FIGURE 4 is a functional block diagram of anexemplary embodiment of the present invention in one ofplural portable radio transceivers in atelecommunication system;
FIGURES 5A and 5B is a flow chart which illustratesfunctions/operations performed by the digital signalprocessor in implementing the first preferredembodiment of the present invention;
FIGURE 6A is a graph illustrating a first exampleof an attenuation vs. frequency characteristic of afilter circuit according to the first preferredembodiment of the present invention;
FIGURE 6B is a graph illustrating a second exampleof an attenuation vs. frequency characteristic of afilter circuit according to the first preferredembodiment of the present invention;
FIGURE 7 is an example look-up table accessible bythe filter control circuit of the first preferredembodiment of the present invention;
FIGURES 8A and 8B are graphs illustrating theamplitude vs. frequency characteristics of exampleinput audio signals;
FIGURES 9A and 9B are graphs illustrating theamplitude vs. frequency characteristics of the inputaudio signals of Figures 8A and 8B, respectively, afterhaving been filtered by the filter circuit of thepresent invention;
FIGURE 10 is a block diagram of a second preferredembodiment of the present invention implemented using adigital signal processor;
FIGURE 11 is a flow chart, corresponding to theflow chart of Figure 5B, which illustratesfunctions/operations performed by the digital signalprocessor in implementing the second preferredembodiment of the present invention; and
FIGURE 12 is an example look-up table accessible bythe filter control circuit of the second preferredembodiment of the present invention.
In the following description, for purposes ofexplanation and not limitation, specific details areset forth, such as particular circuits, circuitcomponents, techniques, flow charts, etc. in order toprovide a thorough understanding of the invention.However, it will be apparent to one skilled in the artthat the present invention may be practiced in otherembodiments that depart from these specific details.In other instances, detailed descriptions of well known methods, devices, and circuits are omitted so as not toobscure the description of the present invention withunnecessary details.
Figure 1 is a general block diagram of the adaptivenoise reduction system 100 according to the presentinvention. Adaptivenoise reduction system 100includes afilter control circuit 105 connected to afilter circuit 115.Filter control circuit 105generates a filter control signal for a current frameof a digitized audio signal. The filter control signalis outputted to thefilter circuit 115, and thefiltercircuit 115 adjusts in response to the filter controlsignal to exhibit a high-pass frequency response curveselected based on the filter control signal. Theadjustedfilter circuit 115 filters the current frameof the digitized audio signal. The filtering signal isprocessed by avoice coder 120 to produce a codedsignal representing the digitized audio signal.
In an exemplary embodiment of the invention appliedto portable/mobile radio telephone transceivers in acellular telecommunications system, Figure 2illustrates the time division multiple access (TDMA)frame structure employed by the IS-54 standard fordigital cellular telecommunications. A "frame" is atwenty millisecond time period which includes onetransmit block TX, one receive block RX, and a signalstrength measurement block used for mobile-assistedhand-off (MAHO). The two consecutive frames shown inFigure 2 are transmitted in a forty millisecond timeperiod. Digitized speech and background noise information is processed and filtered on a frame-by-framebasis as further described below.
Preferably, the functions of thefilter controlcircuit 105,filter circuit 115, andvoice coder 120shown in Figure 1 are implemented with a high speeddigital signal processor. One suitable digital signalprocessor is the TMS320C53 DSP available from TexasInstruments. The TMS320C53 DSP includes on a singleintegrated chip a sixteen-bit microprocessor, on-chipRAM for storing data such as speech frames to beprocessed, ROM for storing various data processingalgorithms including the VSELP speech compressionalgorithm, and other algorithms to be described belowfor implementing the functions performed by thefiltercontrol circuit 105 and thefilter circuit 115.
A first embodiment of the present invention isshown in Figure 3. In the first embodiment, thefiltercircuit 115 is adjusted as a function of backgroundnoise estimates determined by the filter controlcircuit. Frames of pulse code modulated (PCM) audioinformation are sequentially stored in the DSP's on-chipRAM. The audio information could be digitizedusing other digitization techniques. Each PCM frame isretrieved from a DSP on-chip RAM and processed byframeenergy estimator 210, and stored temporarily intemporary frame store 220. The energy of the currentframe determined byframe energy estimator 210 isprovided tonoise estimator 230 andspeech detector 240function blocks.Speech detector 240 indicates thatspeech is present in the current frame when the frame energy estimate exceeds the sum of the previous noiseestimate and a speech threshold. If thespeechdetector 240 determines that no speech is present, thedigital signal processor 200 calculates an updatednoise estimate as a function of the previous noiseestimate and the current frame energy (block 230).
The updated noise estimate is outputted to afilterselector 235.Filter selector 235 generates a filtercontrol signal based on the noise estimate. In thepreferred embodiment, thefilter selector 235 accessesa look-up table in generating the filter controlsignal. The look-up table includes a series of filtercontrol values that are each matched with a noiseestimate or range of noise estimates. A filter controlvalue from a look-up table is selected based on theupdated noise estimate and this filter control value isrepresented by a filter control signal outputted to afilter bank 265 for thefilter circuit 115. Tostabilize the process and avoid accessive switchingbetween different filters a hangover time of N framesis set upon the selection of a new filter. A newfilter can only be selected every N frames, where N isan integer greater than one and preferably greater than10.
Thefilter circuit 115 is adjusted in response tothe filter control signal to exhibit a high-passfrequency response curve that corresponds with theinputted filter control signal and noise estimate.Various different types of filter circuits well knownin prior art can be utilized to exhibit selected frequency response curves in response to the filtercontrol signal. These prior art filters include IIRfilters such as Butterworth, Chebyshev (Tschebyscheff) orelliptic filters. IIR filters are preferable to FIRfilters, which also can be used, due to lower processingrequirements.
The filtered signal is processed by avoice coder 120which is used to compress the bit rate of the filteredsignal. In the preferred embodiments, thevoice coder 120uses vector sourcebook excited linear predictive coding(VSELP) to code the audio signal. Other voice codingtechniques and algorithms such as code excited linearpredictive (CELP) codings, residual pulse excited linearpredictive (RPE-LTP) coding, improved multiband excited(IMBE) coding can be used. By filtering the frames ofaudio signals in accordance with the present inventionbefore voice coding, background noise is minimized whichsubstantially reduces any undesired noise effects in thespeech when it is reconstituted. It also prevents thespeech from being "drowned" in low frequent noise.
Thedigital signal processor 200 described inconjunction with Figure 3 can be used, for example, in thetransceiver of a digital portable/mobile radiotelephoneused in a radio telecommunications system. Figure 4illustrates one such digital radio transceiver which maybe used in a cellular telecommunications network.
Audio signals including speech and background noiseare input in amicrophone 400 to a coder-decoder (codec)402 which preferably is an application specific integratedcircuit (ASIC). The band limited audio signals detectedatmicrophone 400 are sampled by thecodec 402 at a rateof 8,000 samples per second and blocked into frames.Accordingly, each twenty millisecond frame includes 160speech samples. These samples are quantized and convertedinto a coded digital format such as 14-bit linear PCM.Once 160 samples of digitized speech for a current frameare stored in a transmitDSP 200 in on-chip RAM 202, thetransmitDSP 200 performs channel encoding functions, theframe energy estimation, noise estimation, speechdetection, FFT, filter functions and digital speechcoding/compression in accordance with the VSELP algorithm,as described above in conjunction with Figure 3.
Asupervisory microprocessor 432 controls the overalloperation of all of the components in the transceivershown in Figure 4. The filtered PCM data stream generatedby transmitDSP 200 is provided for quadrature modulationand transmission. To this end, anASIC gate array 404generates in-phase (I) and quadrature (Q) channels ofinformation based upon the filtered PCM data stream fromDSP 200. The I and Q bit streams are processed by matched, low pass filters 406and 408 and passed onto IQ mixers inbalanced modulator410. Areference oscillator 412 and amultiplier 414provide a transmit intermediate frequency (IF). The Isignal is mixed with in-phase IF, and the Q signal ismixed with quadrature IF (i.e., the in-phase IF delayedby 90 degrees by phase shifter 416). The mixed I and Qsignals are summed, converted "up" to an RF channelfrequency selected bychannel synthesizer 430, andtransmitted via duplexer 420 andantenna 422 over theselected radio frequency channel.
On the receive side, signals received viaantenna422 and duplexer 420 are down converted from theselected receive channel frequency in amixer 424 to afirst IF frequency using a local oscillator signalsynthesized bychannel synthesizer 430 based on theoutput ofreference oscillator 428. The output of thefirst IFmixer 424 is filtered and down converted infrequency to a second IF frequency based on anotheroutput fromchannel synthesizer 430 anddemodulator426. A receivegate array 434 then converts the secondIF signal into a series of phase samples and a seriesof frequency samples. The receiveDSP 436 performsdemodulation, filtering, gain/attenuation, channeldecoding, and speech expansion on the received signals.The processed speech data are then sent tocodec 402and converted to baseband audio signals for drivingloudspeaker 438.
The operations performed by thedigital signalprocessor 200 for implementing the functions offilter control circuit 105,filter circuit 115, andvoicecoder 120 will now be described in conjunction with theflow chart illustrated in Figures 5A and 5B.Frameenergy estimator 210 determines the energy in eachframe of audio signals.Frame energy estimator 210determines the energy of the current frame bycalculating the sum of the squared values of each PCMsample in the frame (step 505). Since there are 160samples per twenty millisecond frame for an 8000samples per second sampling rate, 160 squared PCMsamples are summed. Expressed mathematically, theframe energy estimate is determined according toequation 1 below:
The frame energy value calculated for the current frameis stored in the on-chip RAM 202 of DSP 200 (step 510).
The functions ofspeech detector 240 includefetching a noise estimate previously determined bynoise estimator 230 from the on-chip RAM of DSP 200(step 515). Of course, when the transceiver isinitially powered up, no noise estimate will exist.Decision block 520 anticipates this situation andassigns a noise estimate instep 525. Preferably, anarbitrarily high value, e.g. 20 dB above normal speechlevels, is assigned as the noise estimate in order toforce an update of the noise estimate value as will bedescribed below. The frame energy determined byframe energy estimator 210 is retrieved from the on-chip RAM202 of DSP 200 (block 530). A decision is made inblock 535 as to whether the frame energy estimateexceeds the sum of the retrieved noise estimate plus apredetermined speech threshold value, as shown inequation 2 below:frame energy estimate > (noise estimate + speech threshold)
The speech threshold value may be a fixed valuedetermined empirically to be larger than short termenergy variations of typical background noise and may,for example, be set to 9 dB. In addition, the speechthreshold value may be adaptively modified to reflectchanging speech conditions such as when the speakerenters a noisier or quieter environment. If the frameenergy estimate exceeds the sum inequation 2, a flagis set inblock 570 that speech exists. Ifspeechdetector 240 detects that speech exists, thennoiseestimator 230 is bypassed and the noise estimatecalculated for the previous frame in the digitizedaudio is retrieved and used as the current noiseestimate. Conversely, if the frame energy estimate isless than the sum inequation 2, the speech flag isreset inblock 540.
Other systems for detecting speech in a currentframe can also be used. For example, the EuropeanTelecommunications Standards Institute (ETSI) hasdeveloped a standard for voice activity detection (VAD)in the Global System for Mobile communications (GSM)system and is described in the ETSI Reference: RE/SMG-020632P which is incorporated by reference. Thisstandard could be used for speech detection in thepresent invention and is incorporated by reference.
If speech does not exist, the noise estimationupdate routine ofnoise estimator 230 is executed. Inessence, the noise estimate is a running average of theframe energy during periods of no speech. As describedabove, if the initial start-up noise estimate is chosensufficiently high, speech is not detected, and thespeech flag will be reset thereby forcing an update ofthe noise estimate.
In the noise estimation routine followed bynoiseestimator 230, a difference/error delta (Δ) isdetermined inblock 545 between the frame noise energygenerated byframe energy estimator 210 and a noiseestimate previously calculated bynoise estimator 230in accordance with the following equation:Δ = current frame energy - previous noise estimateA determination is made indecision block 550 whether Δexceeds zero. If Δ is negative, as occurs for highvalues of the noise estimate, then the noise estimateis recalculated inblock 560 in accordance with thefollowing equation:noise estimate = previous noise estimate + Δ/2Since Δ is negative, this results in a downwardcorrection of the noise estimate. The relatively largestep size of Δ/2 is chosen to rapidly correct for decreasing noise levels. However, if the frame energyexceeds the noise estimate, providing a Δ greater thanzero, the noise is updated inblock 555 in accordancewith the following equation:noise estimate = previous noise estimate + Δ/256Since Δ is positive, the noise estimate must beincreased. However, a smaller step size of Δ/256 (ascompared to Δ/2) is chosen to gradually increase thenoise estimate and provide substantial immunity totransient noise.
The noise estimate calculated for the current frameis outputted to thefilter selector 235. In the firstpreferred embodiment,filter selector 235 accesses alook-up table and uses the current noise estimate toselect a filter control value (Step 572). The filtercircuit 115 (in Step 574) is then adjusted as a functionof the selected filter control value to exhibit afrequency response curve intended to increase the amountof noise filtered as the noise estimate and backgroundnoise increases. The PCM samples stored in DSP RAM arethen passed through the adjustedfilter circuit 265 tofilter the PCM samples in order to remove noise (Step576). The filtered PCM samples are then processed byvoice coder 120 (step 578), and the coded samples arethen outputted to RF transmit circuits (Step 580).
Figures 6A and 6B show examples of how thefiltercircuit 115 adjusts to exhibit different frequencyresponse curves F1-F4 for different filter control signals inputted to thefilter circuit 115. As shown inFigure 6A, thefilter circuit 115 can be selected toexhibit a series of different frequency response curveswith the frequency response curves F1-F4 having cut-offfrequencies F1c-F4c, respectively. The cut-offfrequencies offilter circuit 115 may range in thepreferred embodiment from 300 Hz to 800 Hz. As thenoise estimates increase, thefilter circuit 115 isdesigned to exhibit frequency response curves havinghigher cut-off frequencies. The higher cut-offfrequencies result in a larger portion of frame energyfalling within the lower frequency range of speech beingextracted by thefilter circuit 115.
Likewise, as shown in Figure 6B, thefilter circuit115 can be selected to exhibit a series of differentfrequency response curves F1-F4 with each frequencyresponse curve having a different slope and the samecut-off frequencies. The cut-off frequency forfrequency response curves F1-F4 is in the above-mentionedrange. As the noise estimate increases, thefilter circuit 115 is adjusted to exhibit frequencyresponse curves having steeper slopes. The steeperslopes result in a larger portion of frame energyfalling within the lower frequency range of speech beingextracted by thefilter circuit 115.
Thefilter circuit 115 filters the current frames asa function of the noise estimate calculated for thecurrent frame. The current frame is filtered so thatthe noise is reduced and a major portion of the speechis passed. The major portion of speech which is passed unfiltered provides for recognizable speech output withonly a minimal reduction in the quality of the speechsignal. A combination of different cutoff frequenciesand different slopes could be used for adaptivelyextracting selected portions of frame energy fallingwithin a low frequency range of speech.
Figure 7 depicts an example look-up table accessedbyfilter selector 235 in order to select one of thefilter response curves F1-F4 forfilter circuit 115.The look-up table includes a series of potential noiseestimates N1-Nn and filter control values F1-Fn thatcorrespond with potential response curves that areexhibitable by thefilter circuit 115. Noise estimatesN1-Nn can each represent a range of noise estimates andare each matched with a particular filter control valueF1-F4. Thefilter control circuit 105 generates afilter control signal by calculating a noise estimateand retrieving from the look-up table the filter controlvalue associated therewith.
Figures 8A & B and 9A & B show how the audio signalfor two frames are each adaptively filtered to providean improved audio signal outputted to the RFtransmitter. Figures 8A and 8B show a first frame and asecond frame of an audio signal containing speechcomponents s1 and s2 and noise components n1 and n2,respectively. As shown, the noise energy n1 and n2 inboth frames is concentrated in a low audible frequencyrange, while the speech energy s1 and s2 is concentratedin a higher audible frequency range. Figure 9A showsthe noise signal n1 and speech signal s1 for the first frame after filtering. Figure 9B shows the noise signaln2 and speech signal s2 for the second frame afterfiltering.
The adaptive audionoise reduction system 100, asdiscussed, is designed to account for the difference innoise level between the first frame and the second frameby adjusting thefilter control circuit 105 based on acalculated noise estimate for the current frame. Forexample, a noise estimate N1 and a spectral profile S1is calculated byfilter control circuit 105 and a filtercontrol value of F1 is selected for the first frame. Inthe preferred embodiment, thefilter circuit 115 isadjusted based on filter control value F1 and exhibits afrequency response curve F1 having a cut-off frequencyFlc, as shown in Figure 6A. The first frame is passedthrough this adjustedfilter circuit 115. Thefiltercircuit 115 is selected so that a large portion of thenoise n1 and only a small portion of speech s1 fallsbelow the cut-off frequency F1c of the frequencyresponse curve F1. This results in noise n1 beingeffectively filtered and only a relatively insignificantportion of speech s1 being filtered. The filtered audiosignal of the first frame is shown in Figure 9A.
In the second frame shown in Figure 8b, a higherbackground noise is present, and assuming speech is notdetected, a higher noise estimate n2 is calculated byfilter control circuit 105. A higher correspondingfilter control value F2 is determined for the secondframe based on the higher noise estimate. In the firstpreferred embodiment, thefilter circuit 115 is adjusted in response to the higher filter control value F2 toexhibit a frequency response curve having a higher cut-offfrequency F2c, as shown in Figure 6A. Thesubsequent frame of audio signal is passed through theadjustedfilter circuit 115. Because the cut-offfrequency F2c of the frequency response curve F2 ishigher for the subsequent frame, a larger portion ofboth the noise n2 and speech s2 is filtered. Theportion of speech s2 filtered is still relativelyinsignificant to the intelligibility informationcontained by the frame so that there is only minimalaffect on the speech. The disadvantage of filtering alarger portion of the speech s2 is offset by theadvantage of the increased removal of noise n2 from thesecond frame. The filtered spectral portion of thespeech does not significantly contribute to theintelligibility of the speech. The filtered audiosignal of the second frame is shown in Figure 9B.
A second preferred embodiment of adaptivenoisereduction system 100 is shown in Figures 10-12. In thesecond preferred embodiment, thefilter control circuit105 adjusts thefilter circuit 115 as a function ofnoise profile estimates. A noise profile estimate iscalculated for each frame and is compared to a referencenoise profile. Based on this comparison, thefiltercircuit 115 is adaptively adjusted to extract varyingamounts of low frequency energy from the current frame.
Referring to Figure 10, aDSP 200 configuredaccording to the second preferred embodiment is shown.As shown, thefilter control circuit 105 includes aspectral analyzer 270, in addition toframe energyestimator 210,noise estimator 230,speech detector 240,andfilter selector 235 which are described with respectto the first preferred embodiment. Thefilter controlcircuit 105 determines noise estimates and detectsspeech for the received frames as described for thefirst embodiment and shown in flow charts 5A and 5B.Upon speech detection for a current frame, thespectralanalyzer 270 updates the noise profile estimate and usesthe noise profile estimate in adjusting thefiltercircuit 115.
Referring to Figure 11, the steps of updating thenoise profile estimate and adjusting thefilter circuit115 is shown. Figure 11 shows the steps performed byspectral analyzer 270 incorporated into the overallprocess previously described in the flow charts ofFigures 5A and 5B for the first preferred embodiment.
When speech is not detected for the current frame,thespectral analyzer 270 first determines a noiseprofile for the current frame (step 600). The noiseprofile determined for the current frame includesenergy calculations for different frequencies (i.e.,frequency bins) within a selected low frequency rangeof speech for the current frame. In the preferredembodiment, the selected frequency range isapproximately 300 to 800 hertz. The noise profile ofthe current frame can be determined by processing thecurrent frame using a Fast Fourier Transform (FFT)having N frequency bins. Processing digital signalsusing an FFT is well-known in the prior art and is advantageous in that very little processing power isrequired where the FFT is limited to a relatively smallnumber of frequency bins such as 32. An FFT having Nfrequency bins produces energy calculations at Ndifferent frequencies. The energy calculations for thefrequency bins falling within the selected frequencyrange form the noise profile for the current frame.
To determine the noise profile estimate for thecurrent frame (step 604), the noise profile for thecurrent frame is averaged with a noise profile estimatedetermined for the previous frame of the audio signal.Where no previous noise profile estimate is available,such as after initialization, a stored, initial noiseprofile estimate can be used. The noise profileestimate includes noise energy estimates e_i (where i =1,2,...n) located at successively lower frequencies(i.e., e₁ is the noise energy estimate for the highestfrequency and e_n is the noise energy estimate for thelowest frequency in the selected frequency range). Inthe preferred embodiment, each noise energy estimate e_icorresponds to an average of the energy calculations ata particular frequency in the selected frequency rangeover a plurality of successive frames in which nospeech was detected. By using a plurality of frames indetermining the noise profile estimate, thefiltercircuit 115 is adjusted on a more gradual basis. Inalternate embodiments, the noise profile estimate canbe equated to the noise profile of the current frame.
The energy estimates e_i of the noise profileestimate are then compared with a reference noise profile (step 604). The reference noise profileincludes reference energy thresholds e_ri (where i =1,2,...n) at frequencies corresponding to thefrequencies for noise energy estimates e_i of the noiseprofile estimate. The reference energy thresholds e_rican be determined empirically. The noise energyestimates e_i are successively compared to correspondingreference energy thresholds e_ri from the highestfrequency energy estimate e₁ to the lowest frequencyenergy estimate e_n.
More specifically, noise energy estimate e₁ isfirst compared to reference noise threshold e_r1. If e₁is greater than reference noise threshold e_r1, then acomparison value c₁ is selected and inputted intofilter selector 235. If noise energy estimate e₁ isless than reference noise threshold e_r1, then noiseenergy estimate e₂ (which is a noise energy estimatetaken at a lower frequency than e₁) is compared toreference noise threshold e_r2. If noise energyestimate e₂ is greater than reference noise thresholde_r2, then a comparison value c₂ is selected and inputtedto filterselector 235. This comparison process iscontinued until a comparison value c_i (where i =1,2,...n) is selected.
Thefilter circuit 235 uses the determinedcomparison value c_i to determine a filter control value.The filter control value is selected from a look-uptable such as that shown in Figure 12. The look-uptable includes a series of comparison values c_i and corresponding filter control values F_i. Thefiltercircuit 115 is adjusted as a function of the selectedfilter control value. Thefilter circuit 115 isadjusted to exhibit a frequency response curve forextracting low frequency energy from the current frame.Thefilter circuit 115 is adjusted to extract increasingamounts of low frequency energy as noise energyestimates at successively higher frequencies surpasstheir corresponding reference energy thresholds. Figure6A and 6B show example frequency response curves forselected filter control values.
Use of noise profile estimates helps improve theability to adaptively adjust the filter circuit toextract low frequency energy in a manner to improve theoverall quality of speech. Since the car environment isnot the only environment where a mobiletelecommunications device is used, and therefore thenoise profile in certain situations could be tiltedmore towards higher frequencies, thespectral analyzer270 can be selectively disabled when noise energy inthe low frequencies is small. Also, when a significantportion of the noise frequency spectrum resides inlower frequencies a steeper filtering slope could beapplied even though some processing power may besacrificed. This extra processing requirement is stillfairly small.
As is evident from the description above, theadaptive noise filter system of the present invention isimplemented simply and without significant increase inDSP calculations. More complex methods of reducing noise, such as "spectral subtraction," require severalcalculation-relates MIPS and a large amount of memory fordata and program code storage. By comparison, the presentinvention may be implemented using only a fraction of theMIPS and memory required for the "spectral subtraction"algorithm which also introduces more speech distortion.Reduced memory reduces the size of the DSP integratedcircuits; decreased MIPS decreases power consumption.Both of these attributes are desirable for battery-poweredportable/mobile radiotelephones.
While the invention has been particularly shown anddescribed with reference to the preferred embodimentsthereof, it is not limited to those embodiments. Forexample, although a DSP is disclosed as performing thefunctions of theframe energy estimator 210,noiseestimator 230,speech detector 240,filter selector 235andfilter circuit 265, these functions could beimplemented using other digital and/or analog components.In addition, anadaptive filtering system 100 could beimplemented where thefilter circuit 115 is adjusted asa function of both noise estimates and noise profileestimates.

Claims

A method for selectively altering a frame of adigital signal formed of a plurality of successive frames,the digital signal representative of an audio signalreceived at a transmitter, the audio signal formedalternately of a speech component, a noise component, andthe speech component together with the noise component,said method comprising the steps of:
estimating an energy level (505) of a frame ofthe digital signal;
determining (535), responsive to the estimatemade during said step of estimating, whether the frame ofthe digital signal includes a speech component;
updating a noise estimate as a function of apreceding noise estimate and the energy level estimatedduring said step of estimating when said step ofdetermining determines that the frame does not contain a speech component;
accessing (572) an entry in a look-up tablehaving filter characteristics indexed against levels ofnoise estimates, the entry accessed associated with thenoise estimate updated during said step of updating;
selecting (574) filter characteristics for afilter such that the filter exhibits a frequency responsecurve having variable gain over different frequencyranges, the filter characteristic selected responsive tothe stored filter characteristics of the entry accessedduring said step of accessing; and
filtering (576) the frame of the digital datawith the filter which exhibits the filter characteristics,thereby to alter the frame of the digital data responsiveto the filter characteristics.
The method of claim 1 further characterized bythe additional intermediary step of determining (600) anoise profile estimate of the frame of the digital signalif the frame of the digital data is determined not toinclude the speech component.
The method of claim 2 wherein the noise profileestimate determined during said step of determining (600)the noise profile estimate is used during said step ofupdating to update the noise estimate.
The method of claim 1 wherein the look-up tableaccessed during said step of accessing is characterizedby a plurality of entries (C1-CN, F4-FN), each entry ofthe plurality including a separate filter characteristic.
The method of claim 4 wherein the separatefilter characteristics of the plurality of entries of thelook-up table comprise separate high pass filtercharacteristics, each high pass characteristic defined bya separate cut-off frequency (F1_c, F2_c, F3_c, F4_c).
The method of claim 4 wherein the separatefilter characteristics of the plurality of entries of thelook-up table comprise separate high pass filtercharacteristics, each high pass filter characteristicdefined by a separate frequency response curve slope (F1,F2, F3, F4).
The method of claim 1 characterized by thefurther step of incrementing a counter value to count eachframe for which an energy level is estimated during saidstep of estimating.
The method of claim 7 wherein said step ofselecting the filter-circuit filter characteristics isperformed when the counter value is incremented each Nthtime, N forming an integer value greater than one.
An apparatus (100; 200) for selectively alteringa frame of a digital signal formed of a plurality ofsuccessive frames, the digital signal representative ofan audio signal received at a transmitter, the audio signal formed alternately of a speech component, a noisecomponent, and the speech component together with thenoise component, said apparatus comprising:
an energy level estimator (210) coupled toreceive indications of a frame of the digital signal, saidenergy level estimator for estimating an energy level ofthe frame of the digital signal;
a speech detector (240) coupled to said energylevel estimator, said speech detector for determiningwhether the frame of the digital signal includes a speechcomponent;
a noise estimator (230) operable when saidspeech detector determines that a frame does not containa speech component, said noise estimator for updating anoise estimate as a function of a preceding noiseestimate, and the energy level estimated by saidestimator;
a look-up table (FIG. 12) containing a pluralityof entries, each entry indexed against levels of noiseestimates, an entry of said look-up table accessedresponsive to a noise estimate formed by said noiseestimator; and
a filter (265) coupled to receive the frame ofthe digital data, said filter exhibiting selectable filtercharacteristics enabling the filter to exhibit a frequencyresponse curve having variable gain over differentfrequency ranges, selection of the filter characteristicsof the filter determined responsive to the entry of thelook-up table accessed responsive to the noise estimateupdated by said noise estimator.
The apparatus (100; 200) of claim 9 furthercharacterized by a noise profile estimator (270) fordetermining a noise profile estimate of the frame of thedigital data if the frame of the digital data isdetermined by said speech component determiner not toinclude the speech component.