CN108141694B


Info

Publication number: CN108141694B
Application number: CN201680058340.7A
Authority: CN
Inventors: 山缪尔·王·帕尔玛·爱贝耐泽尔
Original assignee: Cirrus Logic International Semiconductor Ltd
Current assignee: Cirrus Logic International Semiconductor Ltd
Priority date: 2015-08-07
Filing date: 2016-08-05
Publication date: 2021-03-16
Anticipated expiration: 2036-08-05
Also published as: EP3332558B1; CN108141694A; EP3332558A2

Abstract

According to an embodiment of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal representative of ambient sound external to the audio device, detecting near-field sound in the ambient sound from the at least one input signal, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detecting the near-field sound.

Description

Event detection for playback management in audio devices

Cross reference to related applications

The present disclosure claims priority to U.S. non-provisional patent application serial No. 15/229,429, filed on August 5, 2016, which in turn claims priority to U.S. provisional patent application serial No. 62/202,303, filed on August 7, 2015, U.S. provisional patent application serial No. 62/237,868, filed on October 6, 2015, and U.S. provisional patent application serial No. 62/351,499, filed on June 17, 2016, each of which is incorporated herein by reference.

Technical Field

The field of representative embodiments of the present disclosure relates to methods, apparatuses, or implementations concerning or relating to playback management in audio devices. Applications include, but are not limited to, the detection of certain ambient events, such as near-field sound detection, proximity sound detection, and tonal alarm detection using spatial processing based on signals received from multiple microphones.

Background

Personal audio devices have become commonplace and are used in a wide variety of ambient environments. The headphones used with these audio devices have advanced to the point that occlusion, whether achieved by passive or active methods, prevents the user from tracking the ambient sound field outside the audio device. Although increased isolation and uninterrupted listening are preferred in most cases, sometimes, for safety or an enhanced user experience, it is essential that the user hear certain ambient events and take appropriate action in response. For example, if a user is listening to music through headphones and is interrupted by someone attempting to start a conversation with him or her, it may be difficult to carry on the conversation unless the user pauses the playback signal or reduces its volume. For example, U.S. patent No. 7,903,825 proposes an audio device in which the playback signal is modified according to the ambient sound field. As another example, U.S. patent No. 8,804,974 teaches ambient event detection in a personal audio device, which can then be used to implement event-based modifications to the played back content. The above references also teach the use of microphones to detect a variety of acoustic events. As another example, U.S. application serial No. 14/324,286, filed 7/2014, teaches the use of a voice detector as an event detector to adjust the playback signal during a conversation. As another example, U.S. patent No. 8,565,446 teaches the use of direction of arrival (DOA) estimates and interference-to-desired (near-field) speech signal ratio estimates from a set of multiple microphones to detect desired speech in the presence of non-stationary background noise to control speech enhancement algorithms in Noise Reduction Echo Cancellation (NREC) systems. Likewise, U.S. application serial No. 13/199,593 teaches that the maximum value of the normalized cross-correlation statistic obtained by cross-correlation analysis of multiple microphones may be an effective discriminator for detecting near-field speech. A music detector based on a spectral flatness measure for NREC systems is proposed in U.S. patent No. 8,126,706 to distinguish the presence of background noise from background music. U.S. patent No. 7,903,825, U.S. patent No. 8,804,974, U.S. application serial No. 14/324,286, U.S. patent No. 8,565,446, U.S. application serial No. 13/199,593, and U.S. patent No. 8,126,706 are incorporated herein by reference.

Disclosure of Invention

In accordance with the teachings of the present disclosure, one or more disadvantages and problems associated with existing approaches to event detection for playback management in personal audio devices may be reduced or eliminated.

In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may comprise: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of an audio device; a microphone input configured to receive an input signal representing ambient sound external to the audio device; a processor configured to detect near-field sound in the ambient sound from the input signal, and to modify a characteristic of the audio information in response to detecting the near-field sound.

In accordance with these and other embodiments of the present disclosure, a method for processing audio information in an audio device may include reproducing audio information by generating an audio output signal for delivery to at least one transducer of the audio device, receiving at least one input signal representative of ambient sound external to the audio device, detecting an audio event from the at least one input signal, and modifying a characteristic of the audio information reproduced to the at least one transducer in response to detecting the audio event for at least a predetermined time.

In accordance with these and other embodiments of the present disclosure, an integrated circuit for implementing at least a portion of an audio device may comprise: an audio output configured to reproduce audio information by generating an audio output signal for delivery to at least one transducer of an audio device; a microphone input configured to receive an input signal representing ambient sound external to the audio device; a processor configured to detect an audio event from the input signal, and in response to detecting the audio event for at least a predetermined time, modify a characteristic of audio information reproduced to the at least one transducer.

The technical advantages of the present disclosure will be readily apparent to one skilled in the art from the figures, descriptions, and claims included herein. The objects and advantages of the embodiments will be realized and attained by at least the elements, features, and combinations particularly pointed out in the claims.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the scope of the claims as set forth in the disclosure.

Drawings

A more complete understanding of embodiments of the present invention and certain advantages thereof may be acquired by referring to the following description in conjunction with the accompanying drawings, in which like reference numbers indicate like features, and wherein:

FIG. 1 illustrates an example of a use case scenario in which such a detector may be used in conjunction with a playback management system to enhance a user experience, in accordance with an embodiment of the present disclosure;

FIG. 2 illustrates an exemplary playback management system that modifies a playback signal based on a decision of an event detector according to an embodiment of the present disclosure;

FIG. 3 illustrates an exemplary event detector according to an embodiment of the present disclosure;

FIG. 4 shows a functional block of a system for obtaining near-field spatial statistics that may be used to detect audio events, according to an embodiment of the present disclosure;

FIG. 5 illustrates exemplary fusion logic for detecting near-field sounds according to an embodiment of the present disclosure;

FIG. 6 illustrates exemplary fusion logic for detecting proximity sounds in accordance with embodiments of the present disclosure;

FIG. 7 illustrates an embodiment of a proximity speech detector according to an embodiment of the present disclosure;

FIG. 8 illustrates exemplary fusion logic for detecting tone alarm events according to embodiments of the present disclosure;

FIG. 9 illustrates an exemplary timing diagram showing delay and hysteresis logic that may be applied to a transient audio event detection signal to generate a validated audio event signal in accordance with an embodiment of the present disclosure;

FIG. 10 illustrates different audio event detectors with delay and hysteresis logic according to embodiments of the disclosure.

Detailed Description

According to embodiments of the present disclosure, systems and methods are presented that may use at least three different audio event detectors in an automatic playback management framework. Such audio event detectors of an audio device may include: a near-field detector that may detect near-field sound at the audio device, such as when a user of the audio device (e.g., a user wearing or otherwise using the audio device) is speaking; a proximity detector that may detect proximity sound at the audio device, such as when another person near the user of the audio device speaks; and a tonal alarm detector that may detect an acoustic alarm occurring in the vicinity of the audio device. Fig. 1 illustrates an example of a use case scenario in which such a detector may be used in conjunction with a playback management system to enhance a user experience, in accordance with an embodiment of the present disclosure.

Fig. 2 illustrates an exemplary playback management system that modifies the playback signal based on the decision of the event detector 2 according to an embodiment of the present disclosure. The signal processing functions in the processor 50 may include an acoustic echo canceller 1, which may cancel acoustic echoes received at a microphone 52 due to echo coupling between an output audio transducer 51 (e.g., a speaker) and the microphone 52. The echo-reduced signal may be passed to an event detector 2, which may detect one or more different ambient events, including but not limited to near-field events detected by a near-field detector 3 (e.g., including but not limited to speech from a user of the audio device), proximity events detected by a proximity detector 4 (e.g., including but not limited to speech or other ambient sounds other than near-field sounds), and/or tonal alarm events detected by a tonal alarm detector 5. If an audio event is detected, the event-based playback control 6 may modify the characteristics of the audio information (shown as "playback content" in FIG. 2) reproduced to the output audio transducer 51. The audio information may include any information that may be reproduced at the output audio transducer 51, including, but not limited to, downlink speech associated with a telephone conversation received via a communication network (e.g., a cellular network) and/or internal audio from an internal audio source (e.g., a music file, a video file, etc.).

Fig. 3 illustrates an example event detector in accordance with an embodiment of the present disclosure. As shown in fig. 3, an exemplary event detector may include a voice activity detector 10, a music detector 9, a direction of arrival estimator 7, a near-field spatial information extractor 8, a background noise sound pressure level estimator 11, and a decision fusion logic device 12. The decision fusion logic device 12 uses information from the voice activity detector 10, the music detector 9, the direction of arrival estimator 7, the near-field spatial information extractor 8, and the background noise sound pressure level estimator 11 to detect audio events including, but not limited to, near-field sounds, proximity sounds other than near-field sounds, and tonal alarms.

The near-field detector 3 may detect near-field sounds including speech. When such near-field sounds are detected, it may be desirable to modify the audio information reproduced to the output audio transducer 51, since the detection of near-field sounds may indicate that the user is participating in a conversation. Such near-field detection may need to detect near-field sounds in noisy acoustic conditions and to be robust against false detection of near-field sounds in very diverse background noise conditions (e.g., background noise in restaurants, noise while driving a car, etc.). As explained in more detail below, near-field detection may require spatial sound processing using multiple microphones 52. In some embodiments, such near-field sound detection may be implemented in the same or similar manner as described in U.S. patent No. 8,565,446 and/or U.S. application serial No. 13/199,593.

The proximity detector 4 may detect ambient sounds other than near-field sounds (e.g., speech from a person near the user, background music, etc.). As explained in more detail below, because it may be difficult to distinguish proximity sounds from non-stationary background noise and background music, the proximity detector 4 may use the music detector and the noise sound pressure level estimate to disable proximity detection, thereby avoiding a poor user experience due to false detection of proximity sounds. In some embodiments, such proximity sound detection may be accomplished in the same or similar manner as described in U.S. patent No. 8,126,706, U.S. patent No. 8,565,446, and/or U.S. application serial No. 13/199,593.

The tonal alarm detector 5 may detect a tonal alarm (e.g., a siren) near the audio device. To provide the best user experience, it may be desirable for the tonal alarm detector 5 to ignore certain alarms (e.g., weak or low-volume alarms). As described in more detail below, tonal alarm detection may require spatial sound processing using multiple microphones 52. In some embodiments, such tonal alarm detection may be accomplished in the same or similar manner as described in U.S. patent No. 8,126,706 and/or U.S. application serial No. 13/199,593.

FIG. 4 shows functional blocks of a system for obtaining near-field spatial statistics that may be used to detect audio events, according to an embodiment of the present disclosure. Sound pressure level analysis 41 may be performed on the signals from microphones 52 by estimating an inter-microphone sound pressure level difference imd between the near and far microphones (e.g., as described in U.S. application serial No. 13/199,593). Cross-correlation analysis 13 may be performed on the signals received by microphones 52 to obtain direction of arrival information DOA of ambient sound impinging on microphones 52 (e.g., as described in U.S. patent No. 8,565,446). In the cross-correlation analysis 13, a maximum normalized correlation value normMaxCorr (e.g., as described in U.S. application serial No. 13/199,593) may also be obtained. The voice activity detector 10 may detect the presence of speech and generate a signal speechDet indicative of the presence or absence of speech in the ambient sound (e.g., as described in the probability-based speech presence/absence method of U.S. patent No. 7,492,889). The beamformer 15 may generate near-field signal estimates and interference signal estimates based on the signals from the microphones 52, which may be used by the noise analysis 14 to determine the noise sound pressure level noiseLevel and the interference-to-near-field signal ratio idr in the ambient sound. U.S. patent No. 8,565,446 describes an example method of estimating the interference-to-near-field signal ratio idr using a pair of beamformers 15. The voice activity detector 36 may use the interference estimate to detect any voice signals that do not originate from the desired signal direction (proxSpeechDet). The noise analysis 14 may be performed by updating the interfering signal energy based on the direction of arrival estimate DOA whenever the direction of arrival estimate DOA of the ambient sound is outside the acceptance angle of the near-field sound. The direction of arrival of near-field sound may be known a priori for a given microphone array configuration in the industrial design of a personal audio device.
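As an illustration only, two of the statistics named above may be sketched for a pair of microphone frames as follows. The disclosure defers the actual estimators to the incorporated references, so the formulas here (a textbook normalized cross-correlation searched over a lag range, and a power ratio in dB) are assumptions, and the function names `norm_max_corr` and `level_difference_db` are hypothetical. The lag at the correlation peak is only a proxy for the direction of arrival; converting it to an angle would require the microphone spacing and sample rate, which are not given.

```python
import math


def norm_max_corr(x, y, max_lag):
    """Return (peak normalized cross-correlation, lag at the peak).

    x and y are equal-length frames from two microphones. A positive lag
    means y is delayed relative to x by that many samples.
    """
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if energy == 0.0:
        return 0.0, 0
    n = len(x)
    best_val, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x[i] * y[j]
        val = acc / energy
        if val > best_val:
            best_val, best_lag = val, lag
    return best_val, best_lag


def level_difference_db(near, far, floor=1e-12):
    """Assumed form of imd: near-microphone power over far-microphone power, in dB."""
    p_near = sum(v * v for v in near) / len(near)
    p_far = sum(v * v for v in far) / len(far)
    return 10.0 * math.log10((p_near + floor) / (p_far + floor))
```

A source close to one microphone would be expected to yield both a high correlation peak and a clearly positive level difference, whereas a distant source would tend toward a level difference near 0 dB.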

The presence of near-field sound may then be detected using a variety of statistics generated by the system of fig. 4. Fig. 5 illustrates exemplary fusion logic for detecting near-field sounds according to embodiments of the present disclosure. As shown in fig. 5, near-field speech may be detected when all of the following criteria are met:

the direction of arrival estimate DOA of the ambient sound is within the acceptance angle of the near-field sound (block 16);

the maximum normalized cross-correlation statistic normMaxCorr is greater than the threshold normMaxCorrThres1 (block 17);

the interference-to-near-field desired signal ratio idr is less than the threshold idrThres1 (block 18);

voice activity is detected, as represented by the signal speechDet (block 19);

the inter-microphone pressure level difference statistic imd is greater than the threshold imdTh (block 42).

In some embodiments, the thresholds idrThres1 and imdTh may be dynamically adjusted based on the background noise sound pressure level estimate.
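The fusion logic of fig. 5 amounts to a logical AND of the five criteria listed above, which may be sketched as follows. The container types `SpatialStats` and `NearFieldThresholds`, their field names, and the representation of the acceptance angle as a minimum and maximum in degrees are assumptions made for illustration; the disclosure does not specify data structures or threshold values.

```python
from dataclasses import dataclass


@dataclass
class SpatialStats:
    doa_deg: float        # direction of arrival estimate DOA
    norm_max_corr: float  # normMaxCorr
    idr: float            # interference-to-near-field desired signal ratio
    speech_det: bool      # speechDet
    imd_db: float         # inter-microphone level difference imd


@dataclass
class NearFieldThresholds:
    doa_min_deg: float            # assumed lower edge of the acceptance angle
    doa_max_deg: float            # assumed upper edge of the acceptance angle
    norm_max_corr_thres1: float   # normMaxCorrThres1
    idr_thres1: float             # idrThres1
    imd_th: float                 # imdTh


def detect_near_field(stats, th):
    """Return True only when every fig. 5 criterion is satisfied."""
    return bool(
        th.doa_min_deg <= stats.doa_deg <= th.doa_max_deg   # block 16
        and stats.norm_max_corr > th.norm_max_corr_thres1   # block 17
        and stats.idr < th.idr_thres1                       # block 18
        and stats.speech_det                                # block 19
        and stats.imd_db > th.imd_th                        # block 42
    )
```

Because the thresholds are held in a separate object, a caller could swap in a different `NearFieldThresholds` instance as the background noise estimate changes, which is one possible way to realize the dynamic adjustment mentioned above.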

Proximity detection by the proximity detector 4 may differ from near-field sound detection by the near-field detector 3, because the signal characteristics of proximity speech may be very similar to ambient signals such as music and noise. Therefore, the proximity detector 4 must avoid false detection of proximity speech to achieve an acceptable user experience. Thus, whenever music is present in the background, the music detector 9 may be used to disable proximity detection. Likewise, the proximity detector 4 may be disabled whenever the background noise sound pressure level is above a certain threshold. The background noise threshold may be determined a priori such that false detections below the threshold sound pressure level are very unlikely. Furthermore, there may be many sources of ambient noise that produce transient acoustic stimuli. These noise types may be erroneously detected as voice signals by the voice activity detector. To reduce the likelihood of false detections, Spectral Flatness Measure (SFM) statistics from the music detector 9 may be used to distinguish speech from transient noise. For example, the SFM may be tracked over a period of time and the difference between the maximum SFM value and the minimum SFM value over the same period may be calculated; this difference is defined as sfmSwing. The value of sfmSwing may typically be small for transient noise signals because the spectral content of these signals is broadband and tends to remain stable over short time intervals (300 ms to 500 ms). The value of sfmSwing may be higher for a voice signal because the spectral content of the voice signal may change faster than that of a transient signal. Fig. 6 illustrates exemplary fusion logic for detecting proximity sounds (e.g., speech) in accordance with embodiments of the present disclosure. As shown in fig. 6, a proximity sound (e.g., speech) may be detected when all of the following criteria are met:

no music is detected in the background (block 20);

the direction of arrival estimate DOA is within the acceptance angle of the proximity sound (block 21);

the maximum normalized cross-correlation statistic normMaxCorr is greater than the threshold normMaxCorrThres2 (block 22);

the background noise sound pressure level noiseLevel is below the threshold noiseLevelTh (block 23);

proximity speech activity is detected, as represented by the signal proxSpeechDet (block 19);

the SFM swing statistic sfmSwing is greater than the threshold sfmSwingTh (block 37);

the interference-to-near-field desired signal ratio idr is greater than a threshold idrThres2 (block 40);

the inter-microphone pressure level difference statistic imd is close to 0 dB (block 43).
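A minimal sketch of the sfmSwing statistic and of the fig. 6 criteria, under stated assumptions, might look as follows. The sliding-window length in frames, the dictionary keys `musicDet`, `doaInRange`, and `imdTolDb`, and the interpretation of "close to 0 dB" as an absolute-value tolerance are all hypothetical choices made for illustration and are not taken from the disclosure.

```python
from collections import deque


class SfmSwingTracker:
    """Tracks max(SFM) - min(SFM) over the most recent window of frames."""

    def __init__(self, window_frames):
        # window_frames would be chosen to cover the tracking period (assumption).
        self._history = deque(maxlen=window_frames)

    def update(self, sfm):
        self._history.append(sfm)
        return max(self._history) - min(self._history)


def detect_proximity(s, th):
    """s holds per-frame statistics, th holds thresholds; both are plain dicts."""
    return bool(
        not s["musicDet"]                                  # block 20
        and s["doaInRange"]                                # block 21
        and s["normMaxCorr"] > th["normMaxCorrThres2"]     # block 22
        and s["noiseLevel"] < th["noiseLevelTh"]           # block 23
        and s["proxSpeechDet"]                             # proximity speech activity
        and s["sfmSwing"] > th["sfmSwingTh"]               # block 37
        and s["idr"] > th["idrThres2"]                     # block 40
        and abs(s["imd"]) < th["imdTolDb"]                 # block 43, assumed tolerance
    )
```

In use, `SfmSwingTracker.update` would be called once per frame with the latest SFM value, and its return value would populate the `sfmSwing` entry passed to `detect_proximity`.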

In some embodiments, the music detector 9 used to detect the presence of background music may be implemented using a music detector as taught in U.S. patent No. 8,126,706. Another embodiment of a proximity speech detector according to an embodiment of the present disclosure is shown in fig. 7. According to this embodiment, proximity speech may be detected if the following conditions are satisfied:

the interference-to-near-field desired signal ratio idr is greater than a threshold idrThres2 (block 39);

proximity voice activity is detected (block 27);

the maximum normalized cross-correlation statistic normMaxCorr is greater than the threshold normMaxCorrThres3 (block 28);

the direction of arrival estimate DOA is within the acceptance angle of the proximity sound (block 29);

no music is detected in the background (block 30);

low or medium sound pressure level background noise is present, or background noise is absent (block 31). This condition is verified by comparing the estimated background noise sound pressure level with a threshold noiseLevelThLo. If a low noise sound pressure level is detected, the following two conditions are also tested to confirm the presence of proximity speech:

the SFM swing statistic sfmSwing is greater than the threshold sfmSwingTh (block 38);

the inter-microphone pressure level difference statistic imd is close to 0 dB (block 44).

If the above-described background noise sound pressure level condition is not met at block 31, then the following conditions may indicate proximity speech, improving the detection rate of proximity speech without increasing the occurrence of false alarms (e.g., due to background noise conditions):

stationary background noise is present (block 32). Stationary background noise may be detected by calculating the peak-to-root-mean-square ratio of the SFM generated by the music detector 9 over a period of time. In particular, if this ratio is high, non-stationary noise may be present, because the spectral flatness measure of non-stationary noise tends to vary faster than that of stationary noise;

a high noise sound pressure level is present (block 32). A high noise condition may be detected if the estimated background noise sound pressure level is greater than the threshold noiseLevelThLo and less than the threshold noiseLevelThHi. If the stationary noise and direction of arrival conditions are not met at block 32, then the presence of the following set of two conditions may indicate the presence of proximity speech:

a close-talking proximity talker is present (block 33). A close-talking proximity talker may be detected when the maximum normalized cross-correlation statistic normMaxCorr is greater than a threshold normMaxCorrThres4 (the threshold normMaxCorrThres4 may be greater than normMaxCorrThres3 to indicate the presence of a close-talking talker);

low, medium, or high sound pressure level background noise is present, or background noise is absent (block 34). This condition may be detected if the estimated background noise sound pressure level is less than the threshold noiseLevelThHi.

If the above direction-of-arrival condition is not met at block 29, then the presence of the following condition may indicate proximity speech:

music is not present (block 35).

The tonal alarm detector 5 may be configured to detect tonal alarm signals, where the acoustic bandwidth of such alarm signals is narrow (e.g., a siren or a beep). In some embodiments, the tonality of the ambient sound may be detected by dividing the time-domain signal into a plurality of sub-bands by a time-frequency transformation and calculating a spectral flatness measure in each sub-band, shown in fig. 6 as the signal sfm[] generated by the music detector 9. The spectral flatness measure sfm[] may be estimated for all sub-bands, and a tonal alarm may be detected if the spectrum is flat in most but not all sub-bands. Furthermore, in a playback management system, it may not be necessary to detect far-field alarm signals. Thus, the near-field spatial statistics 8 of FIG. 3 may be used to distinguish far-field alarm signals from near-field signals. Fig. 8 illustrates exemplary fusion logic for detecting tonal alarm events (e.g., a siren or a beep), in accordance with embodiments of the present disclosure. As shown in FIG. 8, a tonal alarm event may be detected when all of the following criteria are met:

the direction of arrival estimate DOA is within the acceptance angle of the alarm signal (block 24);

the maximum normalized cross-correlation statistic normMaxCorr is greater than the threshold normMaxCorrThres5 (block 25);

the spectral flatness measure sfm[] indicates that the noise spectrum is flat in most but not all sub-bands (block 26).
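The sub-band flatness test of block 26 may be illustrated with the following sketch. It uses the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is an assumption, since the disclosure defers the actual measure to the incorporated music detector reference. The names `spectral_flatness` and `is_tonal`, the flatness threshold, and the reading of "most but not all" as "more than half but fewer than all" are likewise hypothetical.

```python
import math


def spectral_flatness(power_bins, floor=1e-12):
    """Geometric mean over arithmetic mean of power values; 1.0 means perfectly flat."""
    vals = [max(p, floor) for p in power_bins]
    log_mean = sum(math.log(v) for v in vals) / len(vals)
    return math.exp(log_mean) / (sum(vals) / len(vals))


def is_tonal(subbands, flat_thres=0.5):
    """subbands is a list of per-sub-band power spectra (lists of bin powers).

    Returns True when the spectrum is flat in more than half of the
    sub-bands but not in all of them, i.e. a few sub-bands hold narrow peaks.
    """
    flags = [spectral_flatness(band) > flat_thres for band in subbands]
    n_flat = sum(flags)
    return 0.5 * len(flags) < n_flat < len(flags)
```

For a siren-like input, most sub-bands would contain only broadband noise and score near 1.0, while the sub-band holding the tone would score near 0.0, so the test passes; pure broadband noise would be flat everywhere and would be rejected.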

In practice, the instantaneous audio event detections of the near-field detector 3, the proximity detector 4, and the tonal alarm detector 5 as shown in figs. 5, 6, 7, and 8 may represent false audio events. Therefore, it may be desirable to validate the instantaneous audio event detection signal before passing it to the playback control 6. FIG. 9 illustrates an exemplary timing diagram showing delay and hysteresis logic that may be applied to an instantaneous audio event detection signal to generate a validated audio event signal, according to an embodiment of the disclosure. As shown in fig. 9, in response to the instantaneous detection of an audio event (e.g., near-field sound, tonal alarm event) lasting at least a predetermined time, the delay logic may generate a validated audio event signal, while the hysteresis logic may continue to assert the validated audio event signal until the instantaneous detection of the audio event has ceased for a second predetermined time.

The following pseudo-code may demonstrate the application of delay and hysteresis logic to reduce false detection of audio events, according to embodiments of the present disclosure.

/* If the instantaneous detect is true, increment the hold-off counter and reset the hang-over counter */
if (instDet == TRUE)
{
    holdOffCntr = holdOffCntr + 1;
    hangOverCntr = 0;
}
/* If the instantaneous detect is false, increment the hang-over counter and reset the hold-off counter */
else
{
    hangOverCntr = hangOverCntr + 1;
    holdOffCntr = 0;
}
/* Valid detect will transition to the true state if the instantaneous detect is continuously true for a certain time and the previous valid detect is false */
if (holdOffCntr > holdOffThres && validDet == FALSE)
{
    validDet = TRUE;
}
/* Valid detect will transition to the false state if the instantaneous detect is continuously false for a certain time and the previous valid detect is true */
if (hangOverCntr > hangOverThres && validDet == TRUE)
{
    validDet = FALSE;
}
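For illustration, the delay and hysteresis logic may be expressed as a small runnable state machine. This is a sketch under assumptions: the class name `EventValidator` is hypothetical, the counters are taken to advance once per processing frame, and the state assignments follow what the pseudo-code comments describe (the validated flag is set after the instantaneous detect has been continuously true for more than the hold-off threshold and cleared after it has been continuously false for more than the hang-over threshold).

```python
class EventValidator:
    """Applies hold-off (delay) and hang-over (hysteresis) to an instantaneous detect."""

    def __init__(self, hold_off_thres, hang_over_thres):
        self.hold_off_thres = hold_off_thres
        self.hang_over_thres = hang_over_thres
        self.hold_off_cntr = 0
        self.hang_over_cntr = 0
        self.valid_det = False

    def update(self, inst_det):
        """Process one frame and return the validated detection flag."""
        if inst_det:
            self.hold_off_cntr += 1
            self.hang_over_cntr = 0
        else:
            self.hang_over_cntr += 1
            self.hold_off_cntr = 0
        # Assert only after the detect has been continuously true long enough.
        if self.hold_off_cntr > self.hold_off_thres and not self.valid_det:
            self.valid_det = True
        # Release only after the detect has been continuously false long enough.
        if self.hang_over_cntr > self.hang_over_thres and self.valid_det:
            self.valid_det = False
        return self.valid_det
```

With a hold-off threshold of 2 frames, an isolated one-frame detection never reaches the validated output, which is the false-detection suppression the text describes.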

The validated event may be further validated before generating the playback mode switching control. For example, the following pseudo-code may demonstrate the application of delay and hysteresis logic for gracefully switching between a conversational mode (e.g., where audio information reproduced to the output audio transducer 51 may be modified in response to an audio event) and a normal playback mode (e.g., where audio information reproduced to the output audio transducer 51 is unmodified).

/***********************************
 * Conversational Mode Enter Logic *
 ***********************************/
/* Increment the time-to-enter-conversational-mode counter if the event detect is true and the mode is not the conversational mode. If the counter exceeds the threshold, switch to conversational mode and reset the counters. Note that the event detect need not be true contiguously. */
if (convModeEn == FALSE && validDet == TRUE)
{
    timeToEnterConvModeCntr = timeToEnterConvModeCntr + 1;
    if (timeToEnterConvModeCntr > timeToEnterConvModeThres)
    {
        convModeEn = TRUE;
        timeToEnterConvModeCntr = 0;
        timeToExitConvModeCntr = 0;
    }
}
/***********************************
 * Conversational Mode Exit Logic  *
 ***********************************/
/* Increment the time-to-exit-conversational-mode counter if the event detect is false and the mode is the conversational mode. If the counter exceeds the threshold, switch to normal mode and reset the counters. Note that the event detect must be false contiguously. */
if (convModeEn == TRUE && validDet == FALSE)
{
    timeToExitConvModeCntr++;
    if (timeToExitConvModeCntr > timeToExitConvModeThres)
    {
        convModeEn = FALSE;
        timeToEnterConvModeCntr = 0;
        timeToExitConvModeCntr = 0;
    }
}
else
{
    timeToExitConvModeCntr = 0;
}
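The mode-switching logic above may be sketched as a runnable class as follows. The class name `ConversationModeSwitch` and the per-frame update convention are assumptions; the counter behavior mirrors the pseudo-code, in which the enter counter accumulates across non-contiguous validated detections while the exit counter is reset whenever a validated detection reappears.

```python
class ConversationModeSwitch:
    """Switches between normal playback and conversational mode."""

    def __init__(self, enter_thres, exit_thres):
        self.enter_thres = enter_thres
        self.exit_thres = exit_thres
        self.enter_cntr = 0
        self.exit_cntr = 0
        self.conv_mode_en = False

    def update(self, valid_det):
        """Process one frame of the validated detect; return the mode flag."""
        # Enter logic: detections need not be contiguous.
        if not self.conv_mode_en and valid_det:
            self.enter_cntr += 1
            if self.enter_cntr > self.enter_thres:
                self.conv_mode_en = True
                self.enter_cntr = 0
                self.exit_cntr = 0
        # Exit logic: the absence of detections must be contiguous.
        if self.conv_mode_en and not valid_det:
            self.exit_cntr += 1
            if self.exit_cntr > self.exit_thres:
                self.conv_mode_en = False
                self.enter_cntr = 0
                self.exit_cntr = 0
        else:
            self.exit_cntr = 0
        return self.conv_mode_en
```

The asymmetry means that a conversation with natural pauses can still trigger conversational mode, while a single resumed utterance restarts the exit timer so that playback is not restored mid-conversation.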

FIG. 10 illustrates different audio event detectors with delay and hysteresis logic according to embodiments of the disclosure. The delay period and/or the hysteresis period of the respective detectors may be set differently. In addition, in some embodiments, playback management may be controlled differently based on the type of event detected. In these and other embodiments, as shown in fig. 9, the playback gain (and thus the audio information reproduced at the output audio transducer 51) may be attenuated whenever one or more of the audio events are detected. In these and other embodiments, to provide smooth gain transitions, the playback gain may be smoothed using a first order exponential averaging filter represented by the following pseudocode:

if (convModeEn == TRUE)
{
    playBackGain = (1 - alpha) * convModeGain + alpha * playBackGain;
}
else
{
    playBackGain = (1 - beta) * normalModeGain + beta * playBackGain;
}

The smoothing parameters alpha and beta may be set to different values to adjust the gain slope.
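The effect of the smoothing filter may be seen in the following sketch, which applies the update once per frame. The function name `smooth_gain` and the numeric gains and smoothing values used in the usage note are illustrative assumptions, not values from the disclosure.

```python
def smooth_gain(play_back_gain, conv_mode_en, conv_mode_gain,
                normal_mode_gain, alpha, beta):
    """One step of the first-order exponential averaging filter.

    alpha governs the transition toward conv_mode_gain and beta the
    transition toward normal_mode_gain; values nearer 1.0 give slower ramps.
    """
    if conv_mode_en:
        return (1 - alpha) * conv_mode_gain + alpha * play_back_gain
    return (1 - beta) * normal_mode_gain + beta * play_back_gain
```

Starting from a gain of 1.0 with a conversational-mode gain of 0.25 and alpha of 0.5, successive calls yield 0.625 and then 0.4375, approaching 0.25 geometrically. Choosing beta larger than alpha would restore playback more gradually than it was attenuated.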

It should be understood, particularly by those of ordinary skill in the art having the benefit of this disclosure, that the various operations described herein, particularly in conjunction with the figures, may be implemented by other circuits or other hardware components. The order in which the various operations of a given method are performed can be varied, and various elements of the systems illustrated herein can be added, reordered, combined, omitted, modified, etc. The disclosure is intended to embrace all such modifications and changes, and therefore the above description should be taken as illustrative and not restrictive.

Likewise, although the present disclosure makes reference to specific embodiments, certain modifications and changes may be made to these embodiments without departing from the scope of the present disclosure. Furthermore, any benefits, advantages, or solutions to problems that are described herein with regard to specific embodiments are not intended to be construed as a critical, required, or essential feature or element.

Further embodiments will likewise be apparent to those of ordinary skill in the art, given the benefit of this disclosure, and such embodiments should be considered to be encompassed herein.

Claims

1. A method for processing audio information in an audio device, the method comprising:

receiving a first signal representing audio information;

generating, based on the first signal, an audio output signal for delivery to at least one transducer of the audio device;

causing the at least one transducer to generate sound from the audio output signal;

receiving at least one input signal representing ambient sound external to the audio device;

determining a near-field spatial statistic of the ambient sound;

detecting near-field sound and proximity sound in the ambient sound from the at least one input signal;

modifying a characteristic of the audio output signal in response to detecting near-field sound;

causing the at least one transducer to generate a modified sound from the modified audio output signal;

disabling proximity detection by a proximity detector using a music detector and a noise sound pressure level estimate.

2. The method of claim 1, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is sound from a user of the audio device.

3. The method of claim 1, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is speech from a user of the audio device.

4. The method of claim 1, wherein modifying a characteristic of the audio output signal comprises attenuating the audio information.

5. The method of claim 1, further comprising modifying a characteristic of the audio output signal in response to detecting near-field sound for at least a predetermined time.

6. The method of claim 5, further comprising:

detecting an absence of near-field sound in ambient sound from the at least one input signal;

in response to the absence of near-field sound for at least a second predetermined time, ceasing to modify the characteristic of the audio output signal.
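The timing behaviour recited in claims 5 and 6 can be illustrated with a small debounce sketch. It assumes a frame-based detector that emits one boolean near-field flag per frame. The class name `PlaybackGate` and the parameters `attack_frames` and `release_frames` are hypothetical stand-ins for the first and second predetermined times and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass
class PlaybackGate:
    """Debounce a per-frame near-field flag with separate hold times.

    attack_frames  -- hypothetical stand-in for the first predetermined time
    release_frames -- hypothetical stand-in for the second predetermined time
    """
    attack_frames: int
    release_frames: int
    ducked: bool = False   # True while the output characteristic is modified
    _run: int = 0          # consecutive frames that contradict the current state

    def update(self, near_field_detected: bool) -> bool:
        # Count consecutive frames that disagree with the current state.
        if near_field_detected != self.ducked:
            self._run += 1
        else:
            self._run = 0
        # Detection must persist to start modifying; absence must persist to stop.
        needed = self.release_frames if self.ducked else self.attack_frames
        if self._run >= needed:
            self.ducked = not self.ducked
            self._run = 0
        return self.ducked


gate = PlaybackGate(attack_frames=3, release_frames=2)
flags = [True, True, False, True, True, True, False, True, False, False]
states = [gate.update(f) for f in flags]
```

In this sketch a short burst of detections (the first two frames) does not modify playback, and a single missed frame while playback is modified does not restore it.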

7. The method of claim 1, further comprising:

detecting ambient sound other than near-field sound among the ambient sound based on the at least one input signal in addition to the near-field sound;

modifying a characteristic of the audio output signal in response to detecting the ambient sound.

8. The method of claim 7, further comprising determining a direction of ambient sound from the at least one input signal and modifying a characteristic of the audio output signal in response to the direction of ambient sound indicating that the ambient sound is a sound other than near-field sound.

9. The method of claim 7, further comprising:

detecting whether ambient sound includes background noise based on the at least one input signal;

in response to detecting background noise in the ambient sound, modifying a characteristic of the audio output signal.

10. The method of claim 7, further comprising:

detecting whether ambient sound includes a tonal alarm based on the at least one input signal;

modifying a characteristic of the audio output signal in response to detecting a tonal alarm in the ambient sound.

11. The method of claim 10, wherein detecting a tonal alarm in ambient sound comprises:

detecting a direction of an ambient sound from the at least one input signal;

detecting a spectral flatness measure of the ambient sound from the at least one input signal; and

detecting a tonal alarm based on the direction of the ambient sound, the presence or absence of background noise, and the near-field spatial statistics.

12. The method of claim 11, wherein:

the at least one input signal includes a first microphone signal representing ambient sound at a first microphone and a second microphone signal representing ambient sound at a second microphone;

the near-field spatial statistics include a correlation between the first microphone signal and the second microphone signal.

13. The method of claim 11, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.

14. The method of claim 11, wherein detecting near-field spatial statistics comprises detecting whether a normalized cross-correlation statistic is greater than a threshold.
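One way to picture the threshold test on a normalized cross-correlation statistic between two microphone signals is the sketch below. It is an assumption-laden illustration, not the claimed implementation: the function names, the lag search range `max_lag`, and the threshold value 0.7 are all hypothetical.

```python
import math


def normalized_cross_correlation(x1, x2, max_lag=2):
    """Peak magnitude of the cross-correlation of two frames, normalized to [0, 1].

    The lag search range is an assumed parameter; in practice it would be
    bounded by the microphone spacing and the sample rate.
    """
    n = min(len(x1), len(x2))
    e1 = sum(x1[i] * x1[i] for i in range(n))
    e2 = sum(x2[i] * x2[i] for i in range(n))
    if e1 == 0.0 or e2 == 0.0:
        return 0.0  # a silent frame carries no correlation information
    norm = math.sqrt(e1 * e2)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += x1[i] * x2[j]
        best = max(best, abs(acc) / norm)
    return best


def is_spatially_coherent(x1, x2, threshold=0.7, max_lag=2):
    """True when the statistic is greater than the (hypothetical) threshold."""
    return normalized_cross_correlation(x1, x2, max_lag) > threshold
```

A single directional source tends to produce a statistic near 1 at both microphones, whereas diffuse noise tends to produce a lower value, which is why a threshold comparison can serve as a spatial cue.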

15. The method of claim 11, wherein detecting a spectral flatness measure of the ambient sound comprises detecting whether a noise spectrum is flat in most, but not all, of the sub-bands of the ambient sound.
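The "flat in most, but not all, of the sub-bands" condition can be sketched as follows, using the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of a power spectrum. The band count, the flatness threshold, and the meaning given to "most" (`min_fraction`) are assumptions chosen for illustration, and the input is assumed to be an already computed power spectrum.

```python
import math


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of a power spectrum (1.0 = perfectly flat)."""
    vals = [p + eps for p in power]  # eps guards against log(0)
    log_mean = sum(math.log(v) for v in vals) / len(vals)
    return math.exp(log_mean) / (sum(vals) / len(vals))


def split_bands(power, num_bands):
    """Split the spectrum into equal-width sub-bands; leftover bins are dropped."""
    size = len(power) // num_bands
    return [power[k * size:(k + 1) * size] for k in range(num_bands)]


def flat_in_most_but_not_all(power, num_bands=4, flat_threshold=0.5, min_fraction=0.5):
    """True when more than min_fraction of the sub-bands are flat, but not every one."""
    bands = split_bands(power, num_bands)
    flat = sum(1 for b in bands if spectral_flatness(b) >= flat_threshold)
    return flat < len(bands) and flat / len(bands) > min_fraction


# A flat noise floor with one strong narrowband component in a single sub-band.
noise_plus_tone = [1.0] * 16
noise_plus_tone[5] = 1000.0
tone_present = flat_in_most_but_not_all(noise_plus_tone)
```

Under these assumptions a narrowband tone makes only the sub-band that contains it non-flat, so the test passes for the noise-plus-tone spectrum and fails for a spectrum that is flat everywhere.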

16. The method of claim 1, wherein detecting near-field sounds in ambient sounds comprises:

detecting a direction of an ambient sound from the at least one input signal;

detecting a presence of speech in ambient sound from the at least one input signal; and

detecting near-field sound based on the direction, the presence or absence of speech, and the near-field spatial statistics.

17. The method of claim 16, wherein:

18. The method of claim 16, wherein:

the near-field spatial statistics include an interference-to-signal ratio associated with near-field sound.

19. The method of claim 16, wherein:

the near-field spatial statistics include an inter-microphone sound pressure level difference between the first microphone signal and the second microphone signal.

20. The method of claim 16, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.

21. The method of claim 16, wherein detecting near-field spatial statistics comprises: detecting whether the normalized cross-correlation statistic is greater than a first threshold;

detecting whether the interference-to-near-field desired signal ratio is less than a second threshold;

detecting whether the inter-microphone sound pressure level difference is greater than a third threshold.

22. The method of claim 21, wherein the second threshold is adjusted based on an estimate of background noise in ambient sound.

23. The method of claim 21, wherein the third threshold is adjusted based on an estimate of background noise in ambient sound.
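The three threshold comparisons of claim 21, together with the noise-dependent adjustment of the second and third thresholds in claims 22 and 23, might be combined roughly as below. This is only a sketch: the claims do not state how the thresholds move with noise, so the linear tightening used here, along with every name and numeric value, is an assumption. The interference-to-desired-signal ratio is treated as a precomputed input.

```python
import math
from dataclasses import dataclass


def level_difference_db(x1, x2, eps=1e-12):
    """Inter-microphone level difference in dB between two frames."""
    p1 = sum(v * v for v in x1) / len(x1)
    p2 = sum(v * v for v in x2) / len(x2)
    return 10.0 * math.log10((p1 + eps) / (p2 + eps))


@dataclass
class Thresholds:
    corr_min: float    # first threshold: normalized cross-correlation must exceed it
    idr_max_db: float  # second threshold: interference-to-desired ratio must stay below it
    ild_min_db: float  # third threshold: inter-microphone level difference must exceed it


def adapt_thresholds(base, noise_db, quiet_db=40.0, slope=0.1):
    """Assumed rule: tighten the second and third thresholds as noise rises above quiet_db."""
    excess = max(0.0, noise_db - quiet_db)
    return Thresholds(base.corr_min,
                      base.idr_max_db - slope * excess,
                      base.ild_min_db + slope * excess)


def near_field_statistics_pass(corr, idr_db, ild_db, th):
    """All three comparisons must hold for the near-field statistics to pass."""
    return corr > th.corr_min and idr_db < th.idr_max_db and ild_db > th.ild_min_db


base = Thresholds(corr_min=0.7, idr_max_db=-3.0, ild_min_db=3.0)
noisy = adapt_thresholds(base, noise_db=60.0)
```

With these illustrative numbers, a frame with statistics (0.9, -4 dB, 4 dB) passes against the base thresholds but fails against the tightened ones computed for a 60 dB noise estimate.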

24. The method of claim 1, further comprising:

detecting a direction of an ambient sound from the at least one input signal;

detecting the presence of background noise in ambient sound from the at least one input signal;

detecting the presence of near speech in the ambient sound from the at least one input signal;

detecting a volume of ambient sound based on the at least one input signal;

detecting the presence of an audio event comprising a proximity sound event based on the direction, the presence or absence of background noise, the presence or absence of speech, the volume, and the near-field spatial statistics; and

modifying a characteristic of the audio output signal in response to the detection of the presence of the audio event.

25. The method of claim 24, further comprising:

detecting a change in a spectral component of the ambient sound;

detecting the presence of the audio event, including the proximity sound event, based on the direction, the presence or absence of background noise, the presence or absence of speech, the volume, the near-field spatial statistics, and the spectral content of the ambient sound.

26. The method of claim 25, wherein:

27. The method of claim 25, wherein:

28. The method of claim 25, wherein:

29. The method of claim 25, wherein detecting the presence of near speech in ambient sound comprises detecting stationary background noise.

30. The method of claim 25, wherein detecting the presence of near speech in ambient sound comprises detecting speech from a close-talking near talker.

31. The method of claim 25, wherein detecting the presence of near speech in the ambient sound comprises detecting a spectral flatness measure of the ambient sound from the at least one input signal, wherein detecting the spectral flatness measure of the ambient sound comprises detecting a spectral component change of the ambient sound.

32. An integrated circuit for implementing at least a portion of an audio device, the integrated circuit comprising:

an input configured to receive a first signal representing audio information;

an audio output configured to generate an audio output signal for delivery to at least one transducer of the audio device based on the first signal, the audio output operative to cause the at least one transducer to generate sound from the audio output signal;

a microphone input configured to receive at least one input signal representative of ambient sound external to the audio device; and

a processor configured to:

determine a near-field spatial statistic of the ambient sound;

modify a characteristic of the audio output signal in response to detecting near-field sound;

33. The integrated circuit of claim 32, the processor further configured to:

determine a direction of the ambient sound based on the at least one input signal;

modify a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is sound from a user of the audio device.

34. The integrated circuit of claim 32, the processor further configured to:

modify a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is speech from a user of the audio device.

35. The integrated circuit of claim 32, wherein modifying a characteristic of the audio output signal comprises attenuating the audio information.

36. The integrated circuit of claim 32, the processor further configured to modify a characteristic of the audio output signal in response to detecting near-field sound for at least a predetermined time.

37. The integrated circuit of claim 36, the processor further configured to:

38. The integrated circuit of claim 36, the processor further configured to:

39. The integrated circuit of claim 38, the processor further configured to:

modify a characteristic of the audio output signal in response to the direction of the ambient sound indicating that the ambient sound is a sound other than near-field sound.

40. The integrated circuit of claim 38, the processor further configured to:

41. The integrated circuit of claim 38, the processor further configured to:

42. The integrated circuit of claim 41, wherein detecting a tonal alarm in ambient sound comprises:

detecting a direction of an ambient sound from the at least one input signal;

detecting a spectral flatness measure of the ambient sound from the at least one input signal;

43. The integrated circuit of claim 41, wherein:

44. The integrated circuit of claim 42, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.

45. The integrated circuit of claim 42, wherein detecting near-field spatial statistics comprises detecting whether a normalized cross-correlation statistic is greater than a threshold.

46. The integrated circuit of claim 42, wherein detecting a spectral flatness measure of the ambient sound comprises detecting whether a noise spectrum is flat in most, but not all, of the sub-bands of the ambient sound.

47. The integrated circuit of claim 32, wherein detecting near-field sounds in ambient sounds comprises:

detecting a direction of an ambient sound from the at least one input signal;

detecting a presence of speech in ambient sound from the at least one input signal;

48. The integrated circuit of claim 47, wherein:

49. The integrated circuit of claim 47, wherein:

50. The integrated circuit of claim 47, wherein:

51. The integrated circuit of claim 47, wherein detecting the direction of the ambient sound comprises determining whether the direction of the ambient sound is within an acceptance angle of the near-field sound.

52. The integrated circuit of claim 47, wherein detecting near-field spatial statistics comprises:

detecting whether the normalized cross-correlation statistic is greater than a first threshold;

53. The integrated circuit of claim 52, wherein the second threshold is adjusted based on an estimate of background noise in ambient sound.

54. The integrated circuit of claim 52, wherein the third threshold is adjusted based on an estimate of background noise in ambient sound.

55. The integrated circuit of claim 32, the processor further configured to:

detect a direction of the ambient sound from the at least one input signal;

detect a volume of the ambient sound based on the at least one input signal;

56. The integrated circuit of claim 32, the processor further configured to:

detect a change in a spectral component of the ambient sound;

57. The integrated circuit of claim 56, wherein:

58. The integrated circuit of claim 56, wherein:

59. The integrated circuit of claim 56, wherein:

60. The integrated circuit of claim 56, wherein detecting the presence of near speech in ambient sound comprises detecting stationary background noise.

61. The integrated circuit of claim 56, wherein detecting the presence of near speech in ambient sound comprises detecting speech from a close-talking near talker.

62. The integrated circuit of claim 56, wherein detecting the presence of near speech in the ambient sound comprises detecting a spectral flatness measure of the ambient sound from the at least one input signal, wherein detecting a spectral flatness measure of the ambient sound comprises detecting a change in a spectral composition of the ambient sound.

63. A method for processing audio information in an audio device, the method comprising:

receiving a first signal representing audio information;

determining a near-field spatial statistic of the ambient sound;

detecting an audio event comprising a proximity sound from the at least one input signal;

modifying a characteristic of the audio output signal in response to detecting an audio event for at least a predetermined time;

64. The method of claim 63, further comprising ceasing to modify the characteristic of the audio information in response to the absence of an audio event for at least a second predetermined time.

65. The method of claim 63, wherein the audio event comprises at least one of a near-field event, a proximity event, and an alarm event.

66. An integrated circuit for implementing at least a portion of an audio device, the integrated circuit comprising:

an input configured to receive a first signal representing audio information;

a processor configured to:

determine a near-field spatial statistic of the ambient sound;

detect an audio event comprising a proximity sound from the input signal;

67. The integrated circuit of claim 66, the processor further configured to stop modifying the characteristic of the audio output signal in response to the absence of an audio event for at least a second predetermined time.

68. The integrated circuit of claim 66, wherein the audio event comprises at least one of a near-field event, a proximity event, and an alarm event.