US20060100870A1


Info

Publication number: US20060100870A1
Application number: US11/249,073
Authority: US
Inventors: Toshiya Shikano; Tatsuya Kyomitsu
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2004-10-25
Filing date: 2005-10-11
Publication date: 2006-05-11
Also published as: US7684983B2; JP4097219B2; JP2006119520A

Abstract

When a voice input is detected as being applied to a directional microphone, sounds output from selected loudspeakers from among a plurality of loudspeakers, which otherwise would obstruct the speech recognition process, are attenuated, and sound signals supplied to the selected loudspeakers are combined with sound signals supplied to the other loudspeakers, and the combined sound signals are supplied to the other loudspeakers.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a speech recognition apparatus for recognizing the voice of a speaker entered through a voice input means, such as a microphone or the like, and more particularly to a speech recognition apparatus suitable for use in controlling a vehicle-mounted electronic unit based on speech recognition technology. By employing a microphone with high directivity, the speech recognition apparatus keeps the speech recognition rate from being lowered when speech is accompanied by background car audio sounds. The present invention also concerns a vehicle incorporating such a speech recognition apparatus.

2. Description of the Related Art

Modern vehicles are equipped with many electronic units, which are constantly growing to provide higher functionality. Under such circumstances, speech recognition apparatuses have been proposed for generating remote control commands based on speech recognition, in order to easily control various electronic devices such as a navigation system, an audio system, or an air-conditioning system, which are incorporated in the vehicles.

As has been pointed out, when the voice of a speaker is input to a microphone of a speech recognition apparatus in a vehicle that is also equipped with an audio system, sounds from loudspeakers of the audio system are also input to the microphone, thereby lowering the speech recognition rate of the speech recognition apparatus (see, Japanese Laid-Open Patent Publication No. 2000-132200).

According to Japanese Laid-Open Patent Publication No. 2000-132200, the above problem is addressed by an audio/video system having two loudspeakers mounted in a front region of the vehicle and two loudspeakers mounted in a rear region of the vehicle, in which the speech recognition apparatus incorporated in the audio/video system includes a microphone embedded in the steering wheel of the vehicle. When the speech recognition apparatus operates to recognize the voice of the speaker, music sounds output from the two loudspeakers in the front region of the vehicle are attenuated within a speech frequency band, because they produce an adverse effect on the microphone in the steering wheel. During speech recognition, music sounds are therefore radiated primarily from the two loudspeakers located in the rear region of the vehicle.

However, the disclosed speech recognition apparatus has the following disadvantage. When the audio/video system operates such that main vocal sounds are radiated only from the two loudspeakers (e.g., left and right loudspeakers) in the front region of the vehicle, and main melody sounds are radiated only from the two loudspeakers in the rear region, the main vocal sounds become muted in the speech frequency band by the speech recognition apparatus, thereby developing an unnatural music sound environment in the passenger compartment of the vehicle. Conversely, when main melody sounds are radiated only from the two loudspeakers in the front region of the vehicle, and main vocal sounds are radiated only from the two loudspeakers in the rear region, the main melody sounds become muted in the speech frequency band by the speech recognition apparatus, thus also developing an unnatural music sound environment in the passenger compartment of the vehicle.

If only two loudspeakers, e.g., left and right, for stereophonic sound reproduction are installed in the passenger compartment of the vehicle, and the microphone embedded in the steering wheel has a directivity pattern covering the driver's seat, then only sounds radiated from the loudspeaker near the driver's seat may be muted by the speech recognition apparatus. However, this type of sound muting also develops an unnatural music sound environment in the passenger compartment.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a speech recognition apparatus, which is capable of highly accurate speech recognition in a natural and pleasant sound environment, even when the sounds output from at least one of a plurality of loudspeakers used in combination with the speech recognition apparatus are attenuated.

Another object of the present invention is to provide a speech recognition apparatus, which is capable of highly accurate speech recognition in a natural and pleasant sound environment, even if the sounds output from one of left and right loudspeakers are attenuated when the loudspeakers are radiating different types of sounds, e.g., one of the loudspeakers emits melody sounds and the other vocal sounds.

Still another object of the present invention is to provide a speech recognition apparatus, which is capable of highly accurate speech recognition in a natural and pleasant sound environment, even when the sounds are output from loudspeakers used in a 5.1-channel surround system.

Yet another object of the present invention is to provide a vehicle incorporating therein a speech recognition apparatus according to the present invention.

According to the present invention, there is provided a speech recognition apparatus comprising a directional voice input unit for inputting the voice from a speaker, a voice input state detector for detecting a voice input state in which the voice from the speaker is input to the directional voice input unit, a speech recognizer for recognizing the voice input from the directional voice input unit and outputting a command corresponding to the recognized voice, a sound output unit for outputting sound signals in a plurality of channels to corresponding loudspeakers, a sound output attenuator for attenuating sounds output from selected loudspeakers which would otherwise obstruct the speech recognition process performed by the speech recognizer when the voice input state detector detects the voice input state, and a combined sound generator for combining the sound signals output to the selected loudspeakers whose output sounds are attenuated, with the sound signals output to the other loudspeakers whose output sounds are not attenuated, thereby producing a combined sound signal to generate a combined sound.

With the above arrangement, when the voice input state detector detects the voice input state in which the voice from the speaker is input to the directional voice input unit, the sound output attenuator attenuates sounds output from selected loudspeakers which otherwise would obstruct the speech recognition process performed by the speech recognizer, and the combined sound generator combines the sound signals output to the selected loudspeakers whose output sounds are attenuated, with the sound signals output to the other loudspeakers whose output sounds are not attenuated, thereby producing a combined sound signal to generate a combined sound. The speech recognition rate of the speech recognition process is increased simply by suppressing the sounds output from a minimum number of loudspeakers during the speech recognition process, and the sound signals that would have been output by the attenuated loudspeakers are still reproduced as part of the combined sound. The speech recognition process can therefore be performed highly accurately in a natural and pleasant sound environment.

For example, one of two loudspeakers for stereophonic sound reproduction is used to output melody sounds and the other to output vocal sounds. Even if the sound (melody or vocal) output from one of the two loudspeakers is attenuated, the other loudspeaker outputs the combined sounds, i.e., the melody and vocal sounds, together. Consequently, the speech recognition process can be performed highly accurately in a natural and pleasant sound environment.

The directional voice input unit may be either a directional microphone or a microphone array that is capable of changing its directivity or providing directivity in a plurality of directions. The voice input state detector may be a speech switch, for allowing the speaker to input his or her voice to the voice input unit while the speech switch is operated, or during a certain period of time, e.g., several seconds, after the speech switch has been operated. The speech switch is located within a range that can be reached by the speaker. For example, the speech switch may be mounted on the steering wheel of a vehicle. The voice input state detector may be located near the driver's seat or near a passenger seat, such as a front passenger seat, in the passenger compartment of the vehicle.

If the voice input state detector comprises speech switches disposed near the driver's seat and the front passenger seat, respectively, then directional microphones may be disposed near both the driver's seat and the front passenger seat, respectively, or a microphone array may have its directivity oriented toward one of the speech switches which is operated, i.e., the speech switch near the driver's seat or near the front passenger seat. Sounds from loudspeakers that are not included in the directivity pattern may be attenuated, and the sound signals supplied to the loudspeakers whose output sounds are attenuated may be combined with the sound signals supplied to the loudspeakers whose output sounds are not attenuated.

According to the present invention, there is also provided a speech recognition apparatus comprising a microphone array for inputting the voice from a speaker and outputting a delay sum output signal from a speaker on the driver's seat in a vehicle and a delay sum output signal from a speaker on a passenger seat in the vehicle, a voice input state detector for detecting a voice input state in which the voice from the speaker on the driver's seat is input to the microphone array when the delay sum output signal from the speaker on the driver's seat reaches a predetermined level, and detecting a voice input state in which the voice from the speaker on the passenger seat is input to the microphone array when the delay sum output signal from the speaker on the passenger seat reaches a predetermined level, a speech recognizer for recognizing the voice input from the microphone array and outputting a command corresponding to the recognized voice, a sound output unit for outputting sound signals in a plurality of channels to corresponding loudspeakers, a sound output attenuator for attenuating sounds output from selected loudspeakers which otherwise would obstruct a speech recognition process performed by the speech recognizer when the voice input state detector detects the voice input state, and a combined sound generator for combining the sound signals output to the selected loudspeakers whose output sounds are attenuated, with the sound signals output to the other loudspeakers whose output sounds are not attenuated, thereby producing a combined sound signal to generate a combined sound.

With the above arrangement, when the voice input state detector detects the voice input state in which the voice from the speaker on the driver's seat or the speaker on the passenger seat is input to the microphone array, the sound output attenuator attenuates sounds output from selected loudspeakers which otherwise would obstruct a speech recognition process performed by the speech recognizer, and the combined sound generator combines the sound signals output to the selected loudspeakers whose output sounds are attenuated, with the sound signals output to the other loudspeakers whose output sounds are not attenuated, thereby producing a combined sound signal to generate a combined sound. The speech recognition rate of the speech recognition process is increased simply by suppressing the sounds output from a minimum number of loudspeakers during the speech recognition process, and the sound signals that would have been output by the attenuated loudspeakers are still reproduced as part of the combined sound. The speech recognition process can therefore be performed highly accurately in a natural and pleasant sound environment.

The present invention is particularly suitable when applied to a surround system having at least front, rear, left, and right independent channels. For example, a 5.1-channel surround system has a single central loudspeaker, two main loudspeakers, two rear loudspeakers, and a single superwoofer making up the 0.1 channel, each outputting different sounds. If the sound output from one of these loudspeakers is attenuated, e.g., if the main melody sound is attenuated, then an unnatural music sound environment is developed. According to the present invention, even if the sound output from one of the loudspeakers is attenuated, the sound signal supplied to the loudspeaker whose output sound is attenuated is combined with a sound signal supplied to another loudspeaker. Consequently, the speech recognition process can be performed highly accurately in a natural and pleasant sound environment. The surround system having at least front, rear, left, and right independent channels may be a 5.1-channel surround system, a 6.1-channel surround system, or a 7.1-channel surround system.

The sound output attenuator may attenuate only sounds in a frequency range which is used during the speech recognition process performed by the speech recognizer. For example, only sound signals in a middle frequency range, which would tend to obstruct the speech recognition process, may be combined with the sound signal for another loudspeaker, and sounds in lower and higher frequency ranges, outside of the frequency range of the voice input unit, may be output unattenuated from the loudspeaker whose output sound in the middle frequency range is attenuated. Consequently, the speech recognition process can be performed highly accurately in a more natural and pleasant sound environment.
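The frequency-selective variant described above can be illustrated with a minimal Python sketch. It is not the disclosed circuit: the function names `split_band` and `band_limited_mute` are hypothetical, the band is split with a naive DFT mask rather than with filters, and the middle-range component is assumed to be removed completely from the selected loudspeaker and added unchanged to one other loudspeaker.

```python
import cmath


def split_band(signal, sample_rate, low_hz, high_hz):
    """Split signal into (in_band, out_of_band) parts with a naive DFT mask."""
    n = len(signal)
    # Forward DFT, O(n^2); acceptable for a short illustrative frame.
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

    def in_band(k):
        # Bins above n/2 mirror negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        return low_hz <= freq <= high_hz

    masked = [spectrum[k] if in_band(k) else 0j for k in range(n)]
    # Inverse DFT of the masked spectrum gives the in-band component.
    band = [
        sum(masked[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]
    rest = [s - b for s, b in zip(signal, band)]
    return band, rest


def band_limited_mute(selected, other, sample_rate, low_hz, high_hz):
    """Move the in-band part of `selected` into `other` (assumed full muting).

    Returns (signal for the attenuated loudspeaker, combined signal for the
    other loudspeaker). Lower and higher frequencies stay where they were.
    """
    band, rest = split_band(selected, sample_rate, low_hz, high_hz)
    combined = [o + b for o, b in zip(other, band)]
    return rest, combined
```

With this split, the attenuated loudspeaker keeps only the frequencies outside the given band, while the other loudspeaker carries its own signal plus the relocated middle-range component.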

A vehicle according to the present invention incorporates therein either one of the above speech recognition apparatuses.

According to the present invention, when the speech recognition process is performed to recognize the voice of the speaker, while attenuating sounds output from certain ones of the plurality of loudspeakers, sound signals supplied to those loudspeakers are combined with sound signals supplied to other loudspeakers, and combined sound signals are supplied to the other loudspeakers. Therefore, the speech recognition process can be performed highly accurately in a natural and pleasant sound environment.

The above and other objects, features, and advantages of the present invention will become more apparent from the following description when taken in conjunction with the accompanying drawings in which preferred embodiments of the present invention are shown by way of illustrative example.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention;

FIG. 2 is a schematic plan view of the passenger compartment of a vehicle that incorporates therein the speech recognition apparatus shown in FIG. 1;

FIG. 3 is a block diagram of a sound output device together with a basic arrangement of a sound processor in the speech recognition apparatus;

FIG. 4 is a schematic plan view of the passenger compartment of a vehicle in which a directional microphone and a voice input state detecting means are disposed in front of each of a driver's seat and a front passenger seat in the passenger compartment;

FIG. 5 is a schematic plan view of the passenger compartment of a vehicle, showing the directivity pattern of a directional microphone comprising a directional microphone array;

FIG. 6A is a diagram showing the relationship between a voice from the front passenger seat and its delay;

FIG. 6B is a diagram showing the relationship between a voice from the driver's seat and its delay;

FIG. 6C is a diagram showing how to switch between an output signal representative of the voice from the front passenger seat (a delay sum output signal from a speaker on the front passenger seat) and an output signal representative of the voice from the driver's seat (a delay sum output signal from a speaker on the driver's seat);

FIG. 7 is a diagram showing a wiring arrangement for automatically supplying an output signal representative of the voice from the front passenger seat (a delay sum output signal from a speaker on the front passenger seat) and an output signal representative of the voice from the driver's seat (a delay sum output signal from a speaker on the driver's seat);

FIG. 8 is a main flowchart of an operation sequence of the speech recognition apparatus;

FIG. 9 is a flowchart of a muting process in the main flowchart;

FIG. 10 is a block diagram showing connections for the sound output device in a normal sound output process;

FIG. 11 is a block diagram of the sound output device, while illustrating the muting process;

FIG. 12 is a diagram showing frequency characteristics of another muting process;

FIG. 13 is a block diagram of a sound output device combined with a sound output attenuating means incorporating filters having the frequency characteristics shown in FIG. 12; and

FIG. 14 is a schematic plan view of the passenger compartment of a vehicle, showing a loudspeaker layout for a 6.1-channel surround system incorporated therein.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Like or corresponding parts are denoted by like or corresponding reference characters throughout the views.

FIG. 1 shows in block form a speech recognition apparatus 10 according to an embodiment of the present invention. FIG. 2 shows in schematic plan view the passenger compartment of a vehicle 12 that incorporates therein the speech recognition apparatus 10 shown in FIG. 1.

As shown in FIG. 2, the vehicle 12 includes a driver's seat 22, a front passenger seat 24, and rear passenger seats 26, 28, which are disposed in the passenger compartment. The vehicle 12 incorporates a 5.1-channel surround system having loudspeakers 41 through 45 disposed in respective positions, described below, in the passenger compartment. The front passenger seat 24 and the rear passenger seats 26, 28 serve as seats for vehicle passengers rather than the vehicle driver.

As shown in FIG. 1, the speech recognition apparatus 10 basically comprises a directional microphone 50, serving as a directional voice input means for inputting the voice from a speaker, a voice input state detecting means 56 for detecting a voice input state in which the voice from the speaker is input to the directional microphone 50, a sound output device 14 providing a 5.1-channel surround system, a controlled device 20, and an ECU (Electronic Control Unit) 18 for performing speech recognition and sound control. The ECU 18 includes a sound processor 16 for controlling the sound output device 14, and a speech recognition means 52 for controlling the controlled device 20 based on recognized speech.

The sound processor 16 has an upper sound limit setting means 66, a combined sound generating means 64, and a sound output attenuating means 62. The sound processor 16 controls processing operations of such means 66, 64, 62 based on information from a loudspeaker identifying means 54.

FIG. 3 shows in block form the sound output device 14 together with a basic arrangement for the sound processor 16. The sound output device 14 basically includes a sound source 72 such as a tuner, a player, a hard disk, etc., a preamplifier 74 for preamplifying 5.1-channel sound signals output from the sound source 72, a volume adjustment button 76 (see FIG. 1), and a sound output means 38 comprising a plurality of power amplifiers 31 through 35, each of which amplifies a sound signal in one channel, and a plurality of loudspeakers 41 through 45 connected respectively to the power amplifiers 31 through 35.

As shown in FIG. 2, the loudspeaker 41 is a front right loudspeaker (Fr) disposed in a position rightward and forward of the front driver's seat 22, and outputs sounds depending on a front right sound signal Sfr (see FIG. 3) supplied from the sound source 72 through the preamplifier 74. The loudspeaker 42 is a central loudspeaker (C) disposed in the vicinity of the center of the instrument panel, and outputs sounds depending on a central sound signal Sc supplied from the sound source 72 through the preamplifier 74. The loudspeaker 43 is a front left loudspeaker (Fl) disposed in a position leftward and forward of the front passenger seat 24, and outputs sounds depending on a front left sound signal Sfl supplied from the sound source 72 through the preamplifier 74. The loudspeaker 44 is a rear right loudspeaker (Rr) disposed in a position rightward and rearward of the right rear passenger seat 28, and outputs sounds depending on a rear right sound signal Srr supplied from the sound source 72 through the preamplifier 74. The loudspeaker 45 is a rear left loudspeaker (Rl) disposed in a position leftward and rearward of the left rear passenger seat 26, and outputs sounds depending on a rear left sound signal Srl supplied from the sound source 72 through the preamplifier 74. The sounds are radiated from the loudspeakers 41 through 45 into the passenger compartment.

The 5.1-channel surround system also includes a superwoofer, which is a loudspeaker for radiating bass sounds at very low frequencies, in addition to the loudspeakers 41 through 45. The superwoofer is located adjacent to the central loudspeaker 42. The superwoofer covers a low frequency range up to 120 Hz, which is outside of the frequency band (voice frequency band from 150 to 6000 Hz) of the directional microphone 50. Since the frequency range covered by the superwoofer falls outside of the frequency range of the speech recognition apparatus 10, the superwoofer is not controlled for sound attenuation, and hence is not shown in the drawings. In operation, the preamplifier 74 outputs a superwoofer sound signal to a power amplifier, which supplies an amplified sound signal to the superwoofer in order to radiate bass sounds.

The sound processor 16 controls processing operations of the upper sound limit setting means 66, the combined sound generating means 64, and the sound output attenuating means 62, based on information from the loudspeaker identifying means 54 and information from the volume adjustment button 76, in order to set upper limits for, combine, and attenuate the sound signals Sfr, Sc, Sfl, Srr, Srl output from the preamplifier 74.

As shown in FIG. 2, the directional microphone 50 is mounted on the instrument panel and has a directivity range surrounded by a pattern (referred to as "directivity pattern 51"). In the present embodiment, the speaker whose speech is to be recognized by the speech recognition apparatus 10 is the driver seated on the driver's seat 22. The directivity pattern 51 of the directional microphone 50 includes therein the central loudspeaker 42, the front right loudspeaker 41, and the rear right loudspeaker 44, which output sounds that may lower the speech recognition rate of the speech recognition apparatus 10. Stated otherwise, the directional microphone 50 for use in recognizing the voice uttered by the speaker is positioned such that it also picks up sounds radiated by the central loudspeaker 42, the front right loudspeaker 41, and the rear right loudspeaker 44.

The ECU 18 includes an input interface, a CPU, a memory, and an output interface, as are well known in the art. The ECU 18 executes a program stored in the memory, in order to perform various processes including a speech recognition process, a sound control process, etc., for controlling the controlled device 20. The controlled device 20 may be a navigation system, an air-conditioning system, or an audio system. The ECU 18 is disposed in the vicinity of the instrument panel.

According to the present embodiment, the ECU 18 functions as the speech recognition means 52, which recognizes speech entered from the directional microphone 50 and outputs commands corresponding to the recognized speech to the controlled device 20, as the loudspeaker identifying means 54, and as the sound processor 16.

As is well known in the art, the speech recognition means 52 encodes voice signals entered from the directional microphone 50, analyzes the frequencies of the voice signals in order to recognize a speech pattern, compares the speech pattern with a speech dictionary in order to specify the contents of the voice signal, and outputs a command corresponding to the specified contents to the controlled device 20.

As illustrated in FIG. 2, when a voice input state detecting means 56, such as a pushbutton speech switch or the like, is mounted on the steering wheel only, then the speaker is limited to the driver on the driver's seat 22. Therefore, while the voice input state detecting means 56 is operated, e.g., while the pushbutton is pressed, or during a certain period of time after the voice input state detecting means 56 has been operated, the loudspeaker identifying means 54 identifies the central loudspeaker 42, the front right loudspeaker 41, and the rear right loudspeaker 44, which are included in the directivity pattern 51 of the directional microphone 50, as loudspeakers which would obstruct the speech recognition process and whose sounds are to be attenuated from among the loudspeakers 41 through 45. The loudspeaker identifying means 54 then supplies loudspeaker identifying information, which represents the identified loudspeakers, to the sound processor 16.

Loudspeakers whose sounds are to be attenuated may be identified in various ways depending on the number of voice input state detecting means 56 used, and/or the number and specifications of directional microphones 50 used.

For example, FIG. 4 schematically shows the passenger compartment of a vehicle in which a directional microphone 50a and a voice input state detecting means 56a are disposed in front of the driver's seat 22, and another directional microphone 50b and a voice input state detecting means 56b are disposed in front of the front passenger seat 24. According to the layout shown in FIG. 4, when the voice input state detecting means 56a in front of the driver's seat 22 is operated, the loudspeakers 41, 42, 44 included in the directivity pattern 51R of the directional microphone 50a are identified as loudspeakers whose sounds are to be attenuated. At this time, the directional microphone 50b is turned off, and only a voice signal entered from the directional microphone 50a is supplied to the speech recognition means 52.

Conversely, when the voice input state detecting means 56b in front of the front passenger seat 24 is operated, the loudspeakers 42, 43, 45 included in the directivity pattern 51L of the directional microphone 50b are identified as loudspeakers whose sounds are to be attenuated. At this time, the directional microphone 50a is turned off, and only a voice signal entered from the directional microphone 50b is supplied to the speech recognition means 52.
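The selection rule for the FIG. 4 layout can be summarized as a small lookup. The following Python sketch is illustrative only; the names `PATTERN_MEMBERS` and `identify_loudspeakers` and the string keys "driver" and "passenger" are hypothetical, while the loudspeaker numbers are those given in the text.

```python
# Reference numerals of the five loudspeakers in the 5.1-channel layout.
ALL_LOUDSPEAKERS = (41, 42, 43, 44, 45)

# Loudspeakers inside each directivity pattern, as stated in the text.
PATTERN_MEMBERS = {
    "driver": (41, 42, 44),     # directivity pattern 51R of microphone 50a
    "passenger": (42, 43, 45),  # directivity pattern 51L of microphone 50b
}


def identify_loudspeakers(operated_switch):
    """Return (attenuated, unattenuated) loudspeakers for the operated switch.

    `operated_switch` is "driver" (means 56a) or "passenger" (means 56b);
    these labels are assumptions made for this sketch.
    """
    attenuated = PATTERN_MEMBERS[operated_switch]
    unattenuated = tuple(s for s in ALL_LOUDSPEAKERS if s not in attenuated)
    return attenuated, unattenuated
```

The unattenuated loudspeakers returned here are the ones that would receive the combined sound signal.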

In FIG. 5, the directional microphone 50 that is mounted on the instrument panel comprises a directional microphone array 50A, and the voice input state detecting means 56a, 56b are disposed respectively in front of the driver's seat 22 and the front passenger seat 24. As shown in FIG. 6A, the directional microphone array 50A comprises an array of four microphones 50A1 through 50A4. The front passenger seat 24 faces the microphones 50A1 through 50A4 obliquely at an angle θ1 to their direct forward directions, as shown in FIG. 6A. Therefore, the voice of the passenger seated on the front passenger seat 24 arrives at the microphones 50A2 through 50A4 with respective delay times D1, 2D1, 3D1 with respect to the voice arriving at the microphone 50A1. The driver's seat 22 faces the microphones 50A1 through 50A4 obliquely at an angle θ2 to their direct forward directions, opposite to the angle θ1, as shown in FIG. 6B. Therefore, the voice of the driver seated on the driver's seat 22 arrives at the microphones 50A3 through 50A1 with respective delay times D2, 2D2, 3D2 with respect to the voice arriving at the microphone 50A4.

As shown in FIG. 6C, delay units 91, 92, 93 having respective delay times D1, 2D1, 3D1 are connected respectively to the microphones 50A2 through 50A4, and delay units 94, 95, 96 having respective delay times D2, 2D2, 3D2 are connected respectively to the microphones 50A3 through 50A1. The microphone 50A1 and the delay units 91, 92, 93 have respective output terminals connected to a fixed contact 90b of a switch 90, and the microphone 50A4 and the delay units 94, 95, 96 have respective output terminals connected to a fixed contact 90c of the switch 90. A single directional microphone array 50A can thus supply an output signal representative of a voice from the front passenger seat 24 (a delay sum output signal from the speaker on the front passenger seat 24) to the contact 90b, and can also supply an output signal representative of a voice from the driver's seat 22 (a delay sum output signal from the speaker on the driver's seat 22) to the contact 90c.
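The delay sum principle behind the two output signals can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the wiring of FIG. 6C: the functions `delay` and `delay_sum` are hypothetical, delays are whole numbers of samples, and the sketch compensates the arrival lags so that all four channels line up with the latest arrival before averaging, which is the usual delay-and-sum convention and an assumption about how the delay units are meant to act.

```python
def delay(signal, n):
    """Delay a sampled signal by n samples, zero-padding the start and keeping the length."""
    n = min(n, len(signal))
    return [0.0] * n + list(signal[: len(signal) - n])


def delay_sum(channels, arrival_lags):
    """Align the channels for a source with the given arrival lags, then average.

    channels:     one sample list per microphone, ordered 50A1 through 50A4
    arrival_lags: lag in samples with which the source reaches each microphone,
                  e.g. [0, D1, 2*D1, 3*D1] for the front passenger seat
    """
    latest = max(arrival_lags)
    # Delay each channel by the amount it is ahead of the latest arrival.
    aligned = [delay(ch, latest - lag) for ch, lag in zip(channels, arrival_lags)]
    return [sum(samples) / len(channels) for samples in zip(*aligned)]
```

Steering toward the seat from which the voice actually comes makes the four copies add coherently, whereas steering toward the other seat spreads them out in time, so the two outputs differ clearly in level.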

In the arrangement shown in FIGS. 5 and 6, when the voice input state detecting means 56a in front of the driver's seat 22 is operated, the common terminal 90a of the switch 90 is connected to the fixed contact 90c, in order to supply the output signal representative of the voice from the driver's seat 22 to the speech recognition means 52. At this time, the loudspeakers 41, 42, 44 included in the directivity pattern 51R of the directional microphone array 50A are identified by the loudspeaker identifying means 54 as loudspeakers whose sounds are to be attenuated.

When the voice input state detecting means 56b in front of the front passenger seat 24 is operated, the common terminal 90a of the switch 90 is connected to the fixed contact 90b, in order to supply the output signal representative of the voice from the front passenger seat 24 to the speech recognition means 52. At this time, the loudspeakers 42, 43, 45 included in the directivity pattern 51L of the directional microphone array 50A are identified by the loudspeaker identifying means 54 as loudspeakers whose sounds are to be attenuated.

The voice input state detecting means 56a, 56b, such as pushbutton speech switches or the like, may be omitted from the arrangement shown in FIG. 5. When the voice input state detecting means 56a, 56b are omitted, then, as shown in FIG. 7, a level detector 97 for detecting the level of an output signal representative of a voice from the front passenger seat 24 (a delay sum output signal from the speaker on the front passenger seat 24) is connected to the microphone 50A1 and the delay units 91, 92, 93, and a level detector 98 for detecting the level of an output signal representative of a voice from the driver's seat 22 (a delay sum output signal from the speaker on the driver's seat 22) is connected to the microphone 50A4 and the delay units 94, 95, 96. The level detectors 97, 98 function as a voice input state detecting means 56B. When the level of an output signal representative of a voice from the driver's seat 22, as detected by the level detector 98, reaches a predetermined level, the voice input state detecting means 56B detects the voice input from the driver's seat 22, and outputs detected information to the loudspeaker identifying means 54. When the level of an output signal representative of a voice from the front passenger seat 24, as detected by the level detector 97, reaches a predetermined level, the voice input state detecting means 56B detects the voice input from the front passenger seat 24, and outputs detected information to the loudspeaker identifying means 54.
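The level-based detection can be sketched in Python as below. The text does not say how the level is measured or what happens when both outputs reach the predetermined level, so this sketch makes two labeled assumptions: the level is the RMS value of a frame of samples, and the louder output wins a tie-break. The names `rms_level` and `detect_voice_input` are hypothetical.

```python
import math


def rms_level(samples):
    """Root-mean-square level of one frame (assumed level measure)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def detect_voice_input(driver_sum, passenger_sum, threshold):
    """Return "driver", "passenger", or None for one frame of delay sum outputs.

    driver_sum:    frame of the delay sum output for the driver's seat (detector 98)
    passenger_sum: frame of the delay sum output for the passenger seat (detector 97)
    threshold:     the predetermined level, in the same units as the samples
    """
    driver_level = rms_level(driver_sum)
    passenger_level = rms_level(passenger_sum)
    if max(driver_level, passenger_level) < threshold:
        return None  # no voice input state detected
    # Assumption: when both outputs reach the threshold, the louder one is chosen.
    return "driver" if driver_level >= passenger_level else "passenger"
```

The returned label corresponds to the detected information that would be passed on to the loudspeaker identifying means.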

As described above, the voice input state detecting means 56 and the directional microphone 50 may be of any of various arrangements.

For an easier understanding of the present invention, in the following description, it shall be assumed that the directivity pattern 51 of the directional microphone 50 is fixed in the vicinity of the driver's seat 22, as shown in FIG. 2.

The above-mentioned sound processor 16 may comprise the sound output attenuating means 62, the combined sound generating means 64 and the upper sound limit setting means 66. The sound output attenuating means 62 operates to attenuate sounds output from the loudspeakers 41, 42, 44 which otherwise would obstruct the speech recognition process carried out by the speech recognition means 52. The combined sound generating means 64 operates to combine the sounds output from the loudspeakers 41, 42, 44 and which are attenuated by the sound output attenuating means 62 with sounds output from the loudspeakers 43, 45, thereby generating a combined sound. The upper sound limit setting means 66 operates to set an upper volume limit of 70 dB, if the volume setting made through the volume adjustment button 76 is a sound pressure of 70 dB or higher.

The speech recognition apparatus 10 according to the present embodiment, and the vehicle 12 incorporating therein the speech recognition apparatus 10, are basically constructed and operate as described above. Details of an operation sequence of the speech recognition apparatus 10 shall be described below with reference to the flowcharts shown in FIGS. 8 and 9. Unless otherwise noted, the ECU 18 operates as a main control entity for executing the operation sequence. However, the ECU 18 may or may not be specifically referred to in the description of the operation sequence given below.

In step S1 shown in FIG. 8, the ECU 18 determines whether or not there is a voice input to the directional microphone 50 for speech recognition. Specifically, a voice input state for applying a voice input to the directional microphone 50 is detected while the voice input state detecting means 56 is operated, or during a certain period of time, e.g., several seconds, after the voice input state detecting means 56 has been operated. The sound processor 16 determines whether there is actually a voice input based on the loudspeaker identifying information from the loudspeaker identifying means 54.

If a voice input state is not determined, then the sound processor 16 performs a normal sound output process in step S5.

FIG. 10 shows connections of the sound output device 14 during the normal sound output process. Since there is no voice input for speech recognition during the normal sound output process, the sound signals Sfr, Sc, Sfl, Srr, Srl output from the preamplifier 74 are supplied directly to the respective power amplifiers 31 through 35, which energize the respective loudspeakers 41 through 45 in order to output 5.1-channel surround sounds. Specifically, the sound signals Sfr, Sc, Sfl, Srr, Srl pass, unprocessed, through the upper sound limit setting means 66, the combined sound generating means 64, and the sound output attenuating means 62, to the respective power amplifiers 31 through 35.

If the voice input state detecting means 56 is operated to detect a voice input state for applying a voice input to the directional microphone 50, and a voice input is actually detected by the sound processor 16 in step S1, then control goes to step S2 in order to perform the muting process shown in FIG. 9.

The muting process is performed by the sound output device 14, as shown in FIG. 11.

When the muting process is started, the ECU 18 determines whether the volume setting made by the volume adjustment button 76 is a predetermined value of 70 dB or higher in step S2a (FIG. 9). If it is judged that the volume setting is equal to or higher than 70 dB in step S2a, then volume limiters 80 (volume limiting means), for setting a maximum value of 70 dB, are inserted into respective channels in the upper sound limit setting means 66 in step S2b, as shown in FIG. 11. Consequently, the levels of the sounds output from the loudspeakers 41 through 45 are limited to 70 dB at maximum.

If it is judged that the volume setting is lower than 70 dB in step S2a, then the volume limiters 80 are not inserted into the respective channels in the upper sound limit setting means 66.
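The decision made in steps S2a and S2b amounts to clamping the volume setting at 70 dB while the muting process is active. A minimal sketch follows; the function and parameter names are hypothetical and the volume is treated simply as a number in dB.

```python
UPPER_LIMIT_DB = 70.0  # upper volume limit stated in the description

def limited_volume_db(volume_setting_db, muting_active):
    """Return the effective volume, applying the 70 dB limit only during muting."""
    if muting_active and volume_setting_db >= UPPER_LIMIT_DB:
        return UPPER_LIMIT_DB      # step S2b: volume limiters 80 inserted
    return volume_setting_db       # limiters not inserted; setting passes unchanged
```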

In step S2c, a loudspeaker identifying process is performed. In the loudspeaker identifying process, from among the loudspeakers 41 through 45, the central loudspeaker 42, the front right loudspeaker 41, and the rear right loudspeaker 44 included in the directivity pattern 51 shown in FIG. 2 are identified as loudspeakers which otherwise would obstruct the speech recognition process performed by the speech recognition means 52. In order to attenuate the sounds output from the loudspeakers 41, 42, 44, in step S2d, attenuators 78 for attenuating the sound levels by −15 dB are inserted into the respective channels in the sound output attenuating means 62 that correspond to the loudspeakers 41, 42, 44, as shown in FIG. 11.
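For reference, a level change of −15 dB corresponds to an amplitude factor of 10^(−15/20), roughly 0.178. The sketch below applies that factor to the channels identified in step S2c; the dictionary layout keyed by loudspeaker numeral and the function names are illustrative assumptions.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def attenuate_channels(signals, attenuated, attenuation_db=-15.0):
    """Scale only the channels whose loudspeakers were identified for attenuation.

    signals:    dict mapping loudspeaker numeral -> list of samples.
    attenuated: set of loudspeaker numerals, e.g. {41, 42, 44}.
    """
    gain = db_to_gain(attenuation_db)
    return {
        ch: ([s * gain for s in samples] if ch in attenuated else list(samples))
        for ch, samples in signals.items()
    }
```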

Then, in step S2e, the combined sound generating means 64 is wired to provide adders 82, 84, as shown in FIG. 11. Specifically, the sound signals Sfr, Sc for the front right loudspeaker 41 and the central loudspeaker 42, which are attenuated by the sound output attenuating means 62, are combined with (i.e., added to) the sound signal Sfl for the front left loudspeaker 43, which is not attenuated by the sound output attenuating means 62, by the adder 82. The combined signal, which represents the sum (Sfr+Sc+Sfl) of the sound signals Sfr, Sc, Sfl for the front right loudspeaker 41, the central loudspeaker 42, and the front left loudspeaker 43, is amplified by the power amplifier 33, which energizes the front left loudspeaker 43 to emit the combined sound.

At the same time, the sound signals Sc, Srr for the central loudspeaker 42 and the rear right loudspeaker 44, which are attenuated by the sound output attenuating means 62, are combined with (i.e., added to) the sound signal Srl for the rear left loudspeaker 45, which is not attenuated by the sound output attenuating means 62, by the adder 84. The combined signal, which represents the sum (Sc+Srr+Srl) of the sound signals Sc, Srr, Srl for the central loudspeaker 42, the rear right loudspeaker 44, and the rear left loudspeaker 45, is amplified by the power amplifier 35, which energizes the rear left loudspeaker 45 to emit the combined sound.

Therefore, the sound signal Sc for the central loudspeaker 42 is applied to the front left loudspeaker 43 and to the rear left loudspeaker 45, which output the sound based on the sound signal Sc. Accordingly, while the speech recognition apparatus 10 carries out the speech recognition process, a natural and pleasant sound environment is developed inside the passenger compartment.
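The signal routing of step S2e can be sketched as follows. The sketch assumes one reading of the description: the adders 82, 84 form the stated sums (Sfr+Sc+Sfl) and (Sc+Srr+Srl) from the signals before attenuation, while the attenuation acts only on the paths to the loudspeakers 41, 42, 44. The function names and the dictionary keyed by signal name are hypothetical.

```python
def mix(*signals):
    """Sample-by-sample sum of equal-length signals (role of an adder)."""
    return [sum(frame) for frame in zip(*signals)]

def muting_outputs(s, gain):
    """Compute the five loudspeaker feeds during the muting process.

    s:    dict with keys "Sfr", "Sc", "Sfl", "Srr", "Srl" -> sample lists.
    gain: linear factor applied to the attenuated channels (assumed < 1).
    """
    return {
        "Sfr": [gain * x for x in s["Sfr"]],       # front right 41, attenuated
        "Sc":  [gain * x for x in s["Sc"]],        # central 42, attenuated
        "Sfl": mix(s["Sfr"], s["Sc"], s["Sfl"]),   # front left 43: Sfr+Sc+Sfl
        "Srr": [gain * x for x in s["Srr"]],       # rear right 44, attenuated
        "Srl": mix(s["Sc"], s["Srr"], s["Srl"]),   # rear left 45: Sc+Srr+Srl
    }
```

Under this reading, the content of the attenuated right and central channels remains audible from the left loudspeakers, which lie outside the directivity pattern 51.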

Control then returns to the main flowchart shown in FIG. 8. The ECU 18 determines whether the voice input state is finished or not in step S3. The voice input state is determined as being finished when the voice input state detecting means 56 is no longer operated, or when a certain period of time has elapsed after the voice input state detecting means 56 has been operated. If the voice input state is not determined as being finished, then the muting process continues, and the speech recognition process is performed by the speech recognition means 52.

In the speech recognition process performed by the speech recognition means 52, the speech recognition apparatus 10 outputs a command to the controlled device 20, thereby controlling the controlled device 20. Accordingly, the controlled device 20 is controlled highly accurately according to the speech recognition process, wherein the process is performed in a natural and pleasant sound environment developed in the passenger compartment.

If it is determined that the voice input state is finished in step S3, then the muting process is completed in step S4. Specifically, in step S5, the sound output device 14 is connected in the normal sound output process, as shown in FIG. 10, for outputting surround sounds in the 5.1 channels from all of the loudspeakers 41 through 45. Thereafter, the ECU 18 repeats the process from step S1.
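The sequence of steps S1 through S5 can be summarized as a small state transition function. This is only a sketch of the flow as described; the boolean inputs and the returned step labels are illustrative, and the actual flowcharts of FIGS. 8 and 9 may contain details not reflected here.

```python
def control_step(muting_active, voice_input_detected, voice_input_finished):
    """Advance the flow by one decision and return (muting_active, step_label)."""
    if not muting_active:
        if voice_input_detected:      # step S1: voice input present
            return True, "S2"         # start the muting process
        return False, "S5"            # normal sound output process
    if voice_input_finished:          # step S3: voice input state finished
        return False, "S4"            # muting completed, normal output resumes
    return True, "S3"                 # muting continues during recognition
```

Calling the function repeatedly with fresh inputs mirrors the ECU 18 repeating the process from step S1.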

According to the present embodiment, as described above, when a voice input to the directional microphone 50 is detected by the voice input state detecting means 56, the sound output attenuating means 62 attenuates sounds output from the loudspeakers 41, 42, 44, which otherwise would obstruct the speech recognition process carried out by the speech recognition means 52 (i.e., sounds output from the loudspeakers 41, 42, 44, which would otherwise be added to the voice applied from the speaker to the directional microphone 50, are attenuated). Further, the sound signals for the loudspeakers 41, 42, 44 are combined with sound signals for the loudspeakers 43, 45, which do not radiate sounds into the directivity pattern of the directional microphone 50, so that combined sounds are radiated from the loudspeakers 43, 45.

The speech recognition rate of the speech recognition process can thus be increased simply by attenuating sounds output from a minimum number of loudspeakers, which otherwise might radiate sounds that would be added to the speaker's voice recognized by the speech recognition process. Consequently, the voice of the speaker can be recognized highly accurately in a natural and pleasant sound environment, even when different sounds are radiated from the left and right loudspeakers for producing stereophonic sound effects.

Rather than attenuating the sounds output from the loudspeakers 41, 42, 44 by −15 dB as shown in FIG. 11, the sounds output from the loudspeakers 41, 42, 44 may be attenuated by a level which is progressively smaller at higher frequencies, e.g., by a level of −40 dB at frequencies in the vicinity of 150 Hz at the lower end of the voice frequency band, and by a level of −10 dB at frequencies in the vicinity of 6000 Hz at the higher end of the voice frequency band, as indicated by the high-pass characteristic curve 99a shown in FIG. 12.

The sound processor 16 shown in FIG. 13 includes a sound output attenuating means 62a comprising high-pass filters 99, each having a high-pass characteristic curve 99a as shown in FIG. 12, inserted into the respective channels that correspond to the loudspeakers 41, 42, 44. Since the high-pass filters 99 pass a higher level of sound signals at higher frequencies, the sound processor 16 shown in FIG. 13 can provide a more natural sound environment, while still allowing the speech recognition apparatus 10 to recognize speech highly accurately.
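The two example points given for the high-pass characteristic (−40 dB near 150 Hz, −10 dB near 6000 Hz) can be turned into a gain curve for illustration. The exact shape of the curve 99a is not stated in the text, so the sketch below assumes a straight line on a logarithmic frequency axis between the two points and holds the end values outside that range; both choices are assumptions.

```python
import math

F_LOW_HZ, ATT_LOW_DB = 150.0, -40.0     # lower end of the voice frequency band
F_HIGH_HZ, ATT_HIGH_DB = 6000.0, -10.0  # higher end of the voice frequency band

def attenuation_db(freq_hz):
    """Attenuation level at freq_hz, interpolated linearly in log-frequency."""
    f = min(max(freq_hz, F_LOW_HZ), F_HIGH_HZ)  # clamp outside the band (assumed)
    t = math.log(f / F_LOW_HZ) / math.log(F_HIGH_HZ / F_LOW_HZ)
    return ATT_LOW_DB + t * (ATT_HIGH_DB - ATT_LOW_DB)

def gain_at(freq_hz):
    """Linear amplitude factor corresponding to attenuation_db(freq_hz)."""
    return 10.0 ** (attenuation_db(freq_hz) / 20.0)
```

Under these assumptions the attenuation at the geometric midpoint of the band, about 949 Hz, is −25 dB.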

In the above embodiments, a 5.1-channel surround system has been illustrated as a surround system having independent front, rear, left, and right channels. However, the principles of the present invention are also applicable to a 4-channel surround system, or a 6.1-channel surround system which comprises, as shown in FIG. 14, a 5.1-channel surround system including a superwoofer 49 and a rear central loudspeaker 46 mounted on a tonneau cover in the vehicle 12, or a 7.1-channel surround system which comprises a 5.1-channel surround system and two left and right loudspeakers mounted on the tonneau cover instead of the rear central loudspeaker 46.

Although certain preferred embodiments of the present invention have been shown and described in detail, it should be understood that various changes and modifications may be made therein without departing from the scope of the appended claims.

Claims

1. A speech recognition apparatus comprising:

a directional voice input unit for inputting voice from a speaker;

a voice input state detector for detecting a voice input state in which the voice from the speaker is input to said directional voice input unit;

a speech recognizer for recognizing the voice input from said directional voice input unit and outputting a command corresponding to a recognized voice;

a sound output unit for outputting sound signals in a plurality of channels to corresponding loudspeakers;

a sound output attenuator for attenuating sounds output from selected loudspeakers which otherwise would obstruct a speech recognition process performed by said speech recognizer when said voice input state detector detects the voice input state; and

a combined sound generator for combining the sound signals output to said selected loudspeakers whose output sounds are attenuated, with the sound signals output to other loudspeakers whose output sounds are not attenuated, thereby producing a combined sound signal to generate a combined sound.

2. A speech recognition apparatus according to claim 1, wherein said sound output unit comprises a surround system having at least front, rear, left, and right independent channels.

3. A speech recognition apparatus according to claim 1, wherein said sound output attenuator attenuates only sounds in a frequency range that is used in said speech recognition process performed by said speech recognizer.

4. A speech recognition apparatus according to claim 3, wherein said sound output attenuator comprises a high-pass filter for attenuating a higher level of sound signals at frequencies in the vicinity of a lower end of the voice frequency band, and a lower level of sound signals at frequencies in the vicinity of a higher end of the voice frequency band.

5. A vehicle incorporating therein the speech recognition apparatus according to claim 1.

6. A speech recognition apparatus comprising:

a microphone array for inputting voice from a speaker and outputting a delay sum output signal from a speaker on the driver's seat in a vehicle and a delay sum output signal from a speaker on a passenger seat in the vehicle;

a voice input state detector for detecting a voice input state in which the voice from the speaker on the driver's seat is input to said microphone array when the delay sum output signal from the speaker on the driver's seat reaches a predetermined level, and for detecting a voice input state in which the voice from the speaker on the passenger seat is input to said microphone array when the delay sum output signal from the speaker on the passenger seat reaches a predetermined level;

a speech recognizer for recognizing the voice input from said microphone array and outputting a command corresponding to a recognized voice;

7. A speech recognition apparatus according to claim 6, wherein said sound output unit comprises a surround system having at least front, rear, left, and right independent channels.

8. A speech recognition apparatus according to claim 6, wherein said sound output attenuator attenuates only sounds in a frequency range that is used in said speech recognition process performed by said speech recognizer.

9. A speech recognition apparatus according to claim 8, wherein said sound output attenuator comprises a high-pass filter for attenuating a higher level of sound signals at frequencies in the vicinity of a lower end of the voice frequency band, and a lower level of sound signals at frequencies in the vicinity of a higher end of the voice frequency band.

10. A vehicle incorporating therein the speech recognition apparatus according to claim 6.