JP2009104020A


Info

Publication number: JP2009104020A
Application number: JP2007277232A
Authority: JP
Inventors: Akira Baba (馬場 朗); Kiyotaka Takehara (竹原 清隆); Kenji Okuno (奥野 健治); Kenji Nakakita (中北 賢二); Shinpei Hibiya (日比谷 新平)
Original assignee: Panasonic Electric Works Co Ltd
Current assignee: Panasonic Electric Works Co Ltd
Priority date: 2007-10-25
Filing date: 2007-10-25
Publication date: 2009-05-14

Abstract

[Problem] To provide a speech recognition apparatus capable of controlling a control device more appropriately without a significant increase in processing load or cost.
[Solution] A speech recognition apparatus 10 receives an utterance from a user and, on recognizing that the utterance corresponds to a pre-registered vocabulary, controls a control device 20 based on that registered vocabulary. The apparatus comprises: a registered vocabulary storage unit 12a that stores registered vocabularies; a speech model storage unit 12b that stores a plurality of speech models for identifying which speaker uttered a registered vocabulary; a speech recognition unit 13 that, when a user's utterance of a registered vocabulary is input, identifies from the plurality of speech models which speaker uttered it; and a control device control unit 14 that makes the control applied to the control device 20 based on the registered vocabulary differ between the case where the registered vocabulary is uttered by a specific speaker and the case where it is uttered by another speaker.
[Selected drawing] Figure 1

Description

Translated from Japanese

The present invention relates to a speech recognition apparatus.

In recent years, speech recognition apparatuses have become known that receive an utterance from a user and, on recognizing that the utterance corresponds to a pre-registered vocabulary, control a control device according to the recognized registered vocabulary. Also known is a speech recognition apparatus that stores a plurality of speech models for identifying, for example, whether a registered vocabulary was uttered by an adult or by a child (see Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 9-230890

However, conventional speech recognition apparatuses cannot be said to control the control device appropriately. For example, where the control device is a bathroom device with a child lock function that keeps children from changing settings such as the hot water temperature, the child lock is released when a child utters the registered vocabulary for releasing it. In such a case the speech recognition apparatus is not performing appropriate control. This problem is not limited to devices provided with a child lock; it arises equally where no child lock is provided.

Given these circumstances, a speech recognition apparatus that identifies speakers such as children is desirable. However, speaker identification can entail a significant increase in processing load and cost, which is undesirable.

The present invention has been made to solve the above problems, and its object is to provide a speech recognition apparatus capable of controlling a control device more appropriately without a significant increase in processing load or cost.

A speech recognition apparatus according to the present invention receives an utterance from a user and, on recognizing that the utterance corresponds to a pre-registered vocabulary, outputs a control signal that controls a control device based on the recognized registered vocabulary. The apparatus comprises: registered vocabulary storage means storing at least one registered vocabulary; speech model storage means storing a plurality of speech models for identifying which speaker uttered a registered vocabulary stored in the registered vocabulary storage means; speaker identification means that, when a user's utterance of a registered vocabulary is input, identifies from the plurality of speech models stored in the speech model storage means which speaker uttered that registered vocabulary; and control means that makes the control applied to the control device based on the registered vocabulary differ between the case where the speaker identification means identifies the registered vocabulary as uttered by a specific speaker and the case where it identifies the registered vocabulary as uttered by a speaker other than the specific speaker.

According to this speech recognition apparatus, the speaker is identified based on a plurality of speech models, and the control applied to the control device differs between the case where the registered vocabulary is uttered by a specific speaker and the case where it is identified as uttered by another speaker. The control signal therefore varies with the speaker, so the control device can be controlled appropriately for each speaker. Moreover, identifying the speaker by means of a plurality of speech models requires neither a significant increase in processing nor additional hardware relative to a conventional speech recognition apparatus. The control device can thus be controlled more appropriately without a significant increase in processing load or cost.

In the speech recognition apparatus according to the present invention, preferably the speech model storage means stores a child speech model for recognizing that a registered vocabulary was uttered by a child, and the control means makes the control applied to the control device based on the registered vocabulary differ between the case where the speaker identification means identifies the registered vocabulary as uttered by a child and the case where it identifies the registered vocabulary as uttered by a speaker other than a child.

According to this speech recognition apparatus, a child speech model is stored, and the control applied to the control device differs between the case where the registered vocabulary is identified as uttered by a child and the case where it is identified as uttered by another speaker. This enables control tailored to children: for example, where the control device is a television and the registered vocabulary for turning the television on is recognized, a program such as a children's animation can be shown. Convenience can thus be improved, for example, for children who are unfamiliar with operating the control device.
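The television example above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Speaker` enum, the `tv_power_on_signal` function, and the dictionary layout of the control signal are hypothetical names introduced here, not part of the disclosed apparatus.

```python
from enum import Enum


class Speaker(Enum):
    """Speaker categories (hypothetical names for illustration)."""
    CHILD = "child"
    ADULT = "adult"
    ELDERLY = "elderly"


def tv_power_on_signal(speaker):
    """Build a control signal for the 'TV power on' vocabulary.

    The signal format (a plain dict) is an assumption made for this sketch.
    """
    signal = {"device": "television", "action": "power_on"}
    if speaker is Speaker.CHILD:
        # Child-oriented control: also select a children's program.
        signal["program"] = "children's animation"
    return signal
```

With this sketch, a child speaker yields a signal that additionally selects a children's program, while any other speaker yields the plain power-on signal.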

In the speech recognition apparatus according to the present invention, preferably the control means does not output the control signal when the speaker identification means identifies the registered vocabulary as uttered by a child, and outputs the control signal when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than a child.

According to this speech recognition apparatus, no control signal is output when the registered vocabulary is identified as uttered by a child, and a control signal is output when it is identified as uttered by a speaker other than a child. The control device can therefore be controlled more appropriately with respect to children, who tend to operate devices carelessly. For example, setting of the hot water temperature can be prohibited where the control device is a bathroom device, and connection to adult content can be prohibited where the control device is an Internet-capable device such as a personal computer.

In the speech recognition apparatus according to the present invention, preferably the registered vocabulary storage means stores, as a registered vocabulary, an unlock vocabulary that permits control of the control device by a child, and the control means does not permit control of the control device by a child when the speaker identification means identifies the unlock vocabulary as uttered by a child, and permits control of the control device by a child when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than a child.

According to this speech recognition apparatus, the unlock vocabulary is stored as a registered vocabulary, and control of the control device by a child is not permitted when the unlock vocabulary is identified as uttered by a child. This prevents a situation in which a child releases the child lock and control that is inappropriate for the child becomes available.
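The unlock behavior described above can be illustrated with a minimal Python sketch. The `ChildLock` class, its `handle` method, and the string labels for vocabularies and speakers are assumptions made for illustration only. The source does not specify how lock state is held.

```python
class ChildLock:
    """Minimal lock state holder (hypothetical, for illustration)."""

    def __init__(self):
        self.locked = False

    def handle(self, vocabulary, speaker=None):
        """Update the lock state for a recognized vocabulary.

        `speaker` is the identified speaker category, or None when
        no speaker was identified for the vocabulary.
        Returns the lock state after handling.
        """
        if vocabulary == "Child lock":
            self.locked = True
        elif vocabulary == "Unlock" and speaker != "child":
            # Only a speaker other than a child may release the lock.
            self.locked = False
        return self.locked
```

In this sketch, an "Unlock" utterance identified as coming from a child leaves the lock engaged, whereas the same utterance from an adult releases it.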

In the speech recognition apparatus according to the present invention, preferably the speech model storage means stores an elderly speech model for recognizing that a registered vocabulary was uttered by an elderly person, and the control means applies pre-registered elderly-oriented control to the control device when the speaker identification means identifies the registered vocabulary as uttered by an elderly person, and does not apply elderly-oriented control to the control device when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than an elderly person.

According to this speech recognition apparatus, an elderly speech model is stored, and when a registered vocabulary is identified as uttered by an elderly person, pre-registered elderly-oriented control is applied to the control device. Thus, where the control device displays text, such as a personal computer, and is powered on by an utterance from an elderly person, the text can be displayed in a larger size; where the control device is a massage machine, overly strong massage can be avoided. Elderly-oriented control thus becomes possible, and the control device can be controlled more appropriately.
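As a rough illustration of pre-registered elderly-oriented control, the following Python sketch attaches device-specific presets to a power-on signal only when the speaker is identified as elderly. The preset keys and values (`font_scale`, `max_strength`) and the function name are hypothetical; the source gives only the qualitative examples of larger text and gentler massage.

```python
# Pre-registered elderly-oriented settings per device.
# Keys and numeric values are assumptions made for this sketch.
ELDERLY_PRESETS = {
    "personal_computer": {"font_scale": 1.5},
    "massage_machine": {"max_strength": 2},
}


def power_on_signal(device, speaker):
    """Build a power-on control signal, adding elderly presets if applicable."""
    signal = {"device": device, "action": "power_on"}
    if speaker == "elderly":
        # Apply the pre-registered elderly-oriented control, if any exists.
        signal.update(ELDERLY_PRESETS.get(device, {}))
    return signal
```

For any speaker other than an elderly person, the signal carries no preset, which corresponds to not applying elderly-oriented control.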

In the speech recognition apparatus according to the present invention, preferably the speech model storage means stores a plurality of speech models only for specific registered vocabularies.

According to this speech recognition apparatus, because a plurality of speech models are stored only for specific registered vocabularies, speaker identification can be omitted for vocabularies that do not require it, reducing the processing load.

According to the present invention, the control device can be controlled more appropriately without a significant increase in processing load or cost.

Embodiments of the present invention will now be described with reference to the drawings. FIG. 1 is a configuration diagram showing a speech recognition system including a speech recognition apparatus according to an embodiment of the present invention. The speech recognition system 1 controls a control device 20 by means of a user's utterances, and comprises a speech recognition apparatus 10 and the control device 20.

The speech recognition apparatus 10 accepts input from the user by voice and by switch operation, and outputs a control signal for controlling the control device 20 according to the accepted input. The apparatus offers a selectable voice input mode, in which the control device 20 can be controlled by voice, and a button operation input mode, in which it can be controlled by switch operation. In the voice input mode, the speech recognition apparatus 10 receives the user's utterance and, on recognizing that it corresponds to a pre-registered vocabulary, outputs a control signal that controls the control device 20 according to the recognized registered vocabulary. In the button operation input mode, the speech recognition apparatus 10 receives the user's switch operation and outputs a control signal that controls the control device 20 in accordance with that operation.

The control device 20 is an external device that operates according to the content of the control signal from the speech recognition apparatus 10. Specifically, the control device 20 consists of three devices, a bathroom device 21, a ventilation fan 22, and a television 23, which start or stop operating, among other actions, in response to control signals from the speech recognition apparatus 10. For example, the television 23, one of the control devices 20, is powered on or has its channel changed by a control signal from the speech recognition apparatus 10.

FIG. 2 is an external view showing an installation example of the speech recognition apparatus 10 shown in FIG. 1. As shown in FIG. 2, the speech recognition apparatus 10 is installed, for example, in a bathroom. The bathroom is provided with the bathroom device 21 (not shown in FIG. 2), the ventilation fan 22, and the television 23. In addition, a controller 11, described later, which is a component of the speech recognition apparatus 10, is installed near the bathtub 30 in the bathroom.

Although FIGS. 1 and 2 give the bathroom device 21, the ventilation fan 22, and the television 23 as examples of the control device 20, the control device 20 is not limited to these and may be another device such as a floor heating device, a massage machine, a personal computer, or audio equipment. Likewise, the speech recognition apparatus 10 need not be installed in a bathroom and may be installed elsewhere, such as in a bedroom, a living room, near an office desk, or in a conference room.

Refer again to FIG. 1. As shown in FIG. 1, the speech recognition apparatus 10 comprises a controller 11, a storage unit 12, a speech recognition unit (speaker identification means) 13, a control device control unit (control means) 14, and a device control unit 15. The controller 11 accepts input from the user by voice and by switch operation.

FIG. 3 is a front view showing details of the controller 11 shown in FIG. 1. As shown in FIG. 3, the controller 11 comprises a voice input unit 11a, operation buttons 11b, a display unit 11c, and an LED lamp 11d. Because the display unit 11c and the LED lamp 11d have no connection with the other elements 12 to 15 and 20, they are omitted from FIG. 1.

The voice input unit 11a shown in FIG. 3 consists of a microphone or the like and accepts voice input from the user. The operation buttons 11b accept switch operations by the user. The display unit 11c consists of an LCD or the like and displays the operating status of the various control devices 20 and related information (for example, the bath temperature and the current time). The LED lamp 11d indicates to the user whether the apparatus is currently in the voice input mode or the button operation input mode. It consists of three LEDs: for example, one lit LED indicates the voice input mode, another indicates the button operation input mode, and the remaining one indicates that both modes are in use together.

The operation buttons 11b will now be described in detail. They consist of a priority button 11b1, a reheat button 11b2, an automatic bath button 11b3, a call button 11b4, a controller on/off button 11b5, a menu button 11b6, a confirm button 11b7, a back button 11b8, and a cross key 11b9.

The priority button 11b1 is used to set the hot water supply temperature or shower temperature from the bathroom. Water and hot water are generally used not only in the bathroom but also in the kitchen and elsewhere. Consequently, even if the hot water supply temperature or shower temperature of the bathroom device 21 has been set, the actual temperature may deviate from the setting when water or hot water is used elsewhere. Pressing the priority button 11b1 gives the bathroom priority over other locations, making such deviation less likely. When the priority button 11b1 is pressed, a priority mark (not shown) is displayed on the display unit 11c.

The reheat button 11b2 is used to raise the temperature of water in the bathtub 30 that has gone cold. When the reheat button 11b2 is pressed, a reheat mark (not shown) is displayed on the display unit 11c.

The automatic bath button 11b3 is used to fill the bathtub 30 with hot water at the set volume and temperature. When the automatic bath button 11b3 is pressed, an automatic mark (not shown) is displayed on the display unit 11c.

The call button 11b4 is used to talk with a remote controller installed outside the bathroom, for example a kitchen remote controller installed in the kitchen. When the call button 11b4 is pressed, a call mark (not shown) is displayed on the display unit 11c.

The controller on/off button 11b5 turns the power of the controller 11 itself on and off. When the power is turned off with the controller on/off button 11b5, the display on the display unit 11c is cleared.

The menu button 11b6 is used to set the operation of the control devices 20 manually. When this button 11b6 is pressed, a plurality of operation items for the control devices 20 (for example, ventilation fan off, TV power on, and TV channel +1) are displayed on the display unit 11c. The user selects one of these operation items by operating the cross key 11b9.

The confirm button 11b7 is pressed to have the control device 20 execute the operation item selected with the cross key 11b9. The back button 11b8 is used, for example, to return the screen displayed on the display unit 11c to its previous state. For example, if only about three operation items can be displayed on the display unit 11c at once, operating the cross key 11b9 moves the display to the next screen and shows further operation items. Pressing the back button 11b8 in this state returns to the previous screen and displays its operation items on the display unit 11c again.

The cross key 11b9 is used to set the hot water supply temperature, the shower temperature, the hot water volume, and the like. It is also used to select among the operation items displayed on the display unit 11c.

Furthermore, in this embodiment, the voice input mode and the button operation input mode can be selected by operating the operation buttons 11b of the controller 11. Specifically, the user operates the menu button 11b6 and selects an input mode displayed on the display unit 11c, thereby choosing between the voice input mode and the button operation input mode.

Refer again to FIG. 1. The storage unit 12 stores information needed for speech recognition, and comprises a registered vocabulary storage unit (registered vocabulary storage means) 12a and a speech model storage unit (speech model storage means) 12b. The registered vocabulary storage unit 12a stores at least one registered vocabulary, for example "Child lock", "Unlock" (the unlock vocabulary), "Hot water temperature up 1°C", "TV power on", "TV power off", and "Mode switch".

The speech model storage unit 12b stores a plurality of speech models for identifying which speaker uttered a registered vocabulary. Specifically, it stores a child speech model for identifying that a registered vocabulary was uttered by a child, an adult speech model for identifying that it was uttered by an adult, and an elderly speech model for identifying that it was uttered by an elderly person. In this embodiment, a child means a person aged 12 or under, such as an infant or elementary school pupil; an elderly person means a person aged 60 or over; and an adult means a person aged 13 to 59 inclusive. These definitions are not fixed, however, and the age ranges and the like can be changed to suit the use of the control device 20 and the environment in which the speech recognition system 1 is used. Furthermore, the speech models are not limited to ones that distinguish utterances by children, adults, and elderly people; they may instead distinguish men from women, or distinguish specific individuals, such as Mr. A and Mr. B, among adults.

The contents of the storage unit 12 will now be described in more detail. FIG. 4 is a conceptual diagram showing an example of the contents stored in the storage unit 12 shown in FIG. 1. As shown in FIG. 4, the registered vocabulary storage unit 12a stores registered vocabularies such as "Child lock", "Unlock", "Hot water temperature up 1°C", "TV power on", "TV power off", and "Mode switch". "Child lock" is a registered vocabulary for preventing control of the control device 20 that is inappropriate for children. "Unlock" is a registered vocabulary for releasing the locked state set by "Child lock". "Hot water temperature up 1°C" is a registered vocabulary for raising the set temperature of the hot water supply and shower of the bathroom device 21 by one degree. "TV power on" and "TV power off" are registered vocabularies for turning the power of the television 23 on and off, respectively. "Mode switch" is a registered vocabulary for switching from the voice input mode to the button operation input mode.

The speech model storage unit 12b stores one or more speech models for each registered vocabulary. Specifically, it stores the child speech model, the adult speech model, and the elderly speech model in association with "Unlock", "Hot water temperature up 1°C", and "TV power on".

The speech model storage unit 12b also stores a general speech model that does not distinguish between children, adults, and elderly people. This general speech model is stored in association with "Child lock", "TV power off", and "Mode switch". In this way, the speech model storage unit 12b stores a plurality of speech models only for specific registered vocabularies.
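The association between registered vocabularies and speech models described for FIG. 4 can be sketched as a simple lookup table. The Python names below (`SPEECH_MODELS`, `needs_speaker_identification`, `candidate_count`) and the string labels for the models are hypothetical; the actual storage format is not specified in the source.

```python
# Speech model labels (assumed names for this sketch).
SPEAKER_MODELS = ("child", "adult", "elderly")
GENERAL_MODEL = ("general",)

# Registered vocabulary -> associated speech models, following FIG. 4.
SPEECH_MODELS = {
    "Child lock": GENERAL_MODEL,
    "Unlock": SPEAKER_MODELS,
    "Hot water temperature up 1°C": SPEAKER_MODELS,
    "TV power on": SPEAKER_MODELS,
    "TV power off": GENERAL_MODEL,
    "Mode switch": GENERAL_MODEL,
}


def needs_speaker_identification(vocabulary):
    """True if the vocabulary is associated with more than one speech model."""
    return len(SPEECH_MODELS[vocabulary]) > 1


def candidate_count():
    """Number of (vocabulary, model) pairs that must be scored per utterance."""
    return sum(len(models) for models in SPEECH_MODELS.values())
```

With this table, 12 vocabulary-model pairs are scored per utterance, compared with 18 if all six vocabularies carried three speaker models each, which illustrates the reduction in processing load.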

Refer again to FIG. 1. The speech recognition unit 13 determines whether an utterance input from the user via the voice input unit 11a during the voice input mode corresponds to a pre-registered vocabulary. Beyond that determination, the speech recognition unit 13 also identifies which speaker uttered the registered vocabulary, based on the plurality of speech models stored in the speech model storage unit 12b. That is, the speech recognition unit 13 performs two processes: determining whether the user's utterance corresponds to a registered vocabulary, and identifying the speaker.

An example of how the speech recognition unit 13 recognizes a registered vocabulary and identifies the speaker is described below. Suppose that a child utters "Unlock". The speech recognition unit 13 then determines which registered vocabulary was uttered by which speaker, following the stored contents shown in FIG. 4. First, it calculates the degree of match between the user's utterance and the registered vocabulary "Child lock" based on the general speech model. Next, it calculates the degree of match with "Unlock" based on the child speech model, "Unlock" based on the adult speech model, and "Unlock" based on the elderly speech model. It likewise calculates the degree of match for the other registered vocabularies, for each speech model. Having calculated these degrees of match, the speech recognition unit 13 determines the registered vocabulary and speech model with the highest degree of match.

Here, the child has uttered “unlock”. The user's utterance therefore has the highest degree of match with “unlock” based on the child speech model, and the speech recognition unit 13 determines that “unlock” was uttered by a child. The speech recognition unit 13 then outputs to the control device control unit 14 both the information on the recognized registered vocabulary “unlock” and the information on the speaker, namely that the registered vocabulary was uttered by a child.
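The selection of the best-matching vocabulary and speech model described above can be sketched as follows. This is only an illustration: the dictionary layout standing in for the stored contents of FIG. 4 and the `score` callback standing in for the acoustic matching step are assumptions, since the text does not specify how the degree of match is computed.

```python
from typing import Callable, Optional

# Hypothetical layout of the stored contents (FIG. 4): each registered
# vocabulary entry lists the speech models associated with it.
REGISTERED = {
    "child lock": ["general"],
    "unlock": ["child", "adult", "elderly"],
    "TV power on": ["child", "adult", "elderly"],
}


def recognize(score: Callable[[str, str], float]) -> tuple[str, Optional[str]]:
    """Return the (vocabulary, speaker) pair with the highest degree of match.

    `score(vocabulary, model)` is an assumed stand-in for the matching
    step; a larger value means a closer match.
    """
    best_vocab, best_model = max(
        ((v, m) for v, models in REGISTERED.items() for m in models),
        key=lambda pair: score(*pair),
    )
    # Vocabulary matched through the general model carries no speaker information.
    speaker = None if best_model == "general" else best_model
    return best_vocab, speaker
```

A caller would pass in whatever matching function the recognizer provides and forward the returned vocabulary and speaker to the control side. Rejection of utterances that match no registered vocabulary is left out of this sketch.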

Note that, for registered vocabulary that is not associated with a plurality of speech models, such as “child lock”, the speech recognition unit 13 determines whether the utterance was “child lock” based on the general speech model only. Accordingly, when the user's utterance corresponds to “child lock”, the speaker is not identified, and the speech recognition unit 13 does not output speaker information to the control device control unit 14.

The speech recognition unit 13 also holds a predetermined threshold. When the sound pressure of the input sound is below the threshold, as with noise for example, it neither determines whether the sound corresponds to the registered vocabulary nor identifies the speaker.
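The threshold check can be illustrated with a small sketch. The use of root-mean-square amplitude as the measure of sound pressure and the value of `THRESHOLD` are assumptions made for this example; the text says only that the threshold is predetermined.

```python
import math
from typing import Sequence

THRESHOLD = 0.05  # hypothetical value; the text only says "predetermined"


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as a proxy for sound pressure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_utterance(samples: Sequence[float], threshold: float = THRESHOLD) -> bool:
    """True when the input is at or above the threshold and should be recognized."""
    return rms_level(samples) >= threshold
```

Input for which `is_utterance` returns False would simply be skipped, with neither vocabulary matching nor speaker identification performed.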

The control device control unit 14 controls the operation of the control device 20. For example, on receiving a signal from the speech recognition unit 13 indicating that the television 23 is to be turned off, the control device control unit 14 outputs a control signal to the television 23 to turn it off. The television 23 is thereby turned off.

The control device control unit 14 also varies the control content for the control device 20 depending on whether the speech recognition unit 13 has identified registered vocabulary such as “unlock” as uttered by a specific speaker or as uttered by another speaker. Here, varying the control content for the control device 20 means varying the output control of the control signal. The concept covers not only changing the type of control signal that is output but also prohibiting output of the control signal.

An example of varying the control content for the control device 20 is as follows. When the speech recognition unit 13 identifies the registered vocabulary “unlock” as uttered by an adult or an elderly person, the control device control unit 14 outputs a control signal that releases the locked state set by the child lock. On the other hand, when “unlock” is identified as uttered by a child, who is the specific speaker, the control device control unit 14 does not output the control signal that releases the locked state, and the locked state is maintained. In this way, the control device control unit 14 varies the control of the control device 20 based on the registered vocabulary, and prevents a situation in which a child becomes able to perform control of the control device 20 that is inappropriate for children. When the speech recognition unit 13 identifies other registered vocabulary such as “TV power on” as uttered by a child, the control device control unit 14 may, for the benefit of a child unaccustomed to operating a control device such as the television 23, tune the television 23 to a children's program such as an animation at the same time as turning it on.

Furthermore, the control device control unit 14 need not vary the control content only for utterances by a child. It may also vary the control content for utterances by an elderly person. For example, when the speech recognition unit 13 identifies the registered vocabulary “TV power on” as uttered by a child or an adult, the control device control unit 14 simply outputs a control signal that turns on the television 23. On the other hand, when “TV power on” is identified as uttered by an elderly person, who is the specific speaker, the control device control unit 14 outputs a control signal that turns on the television 23 and, if the television 23 is receiving caption or similar data, enlarges the captions as control suited to the elderly. Control suited to the elderly can thus be performed, and the control device 20 is controlled more appropriately.
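The speaker-dependent control described in the preceding paragraphs can be sketched as a small decision function. The signal names and the function signature are hypothetical; the text describes the behaviour but not an encoding. The sketch follows the optional variant in which a child's “TV power on” also selects a children's program.

```python
from typing import Optional


def control_signals(vocab: str, speaker: Optional[str],
                    captions_available: bool = False) -> list[str]:
    """Return the control signals to output (hypothetical signal names)."""
    if vocab == "unlock":
        # A child's "unlock" produces no signal, so the lock state is kept.
        return [] if speaker == "child" else ["release_child_lock"]
    if vocab == "TV power on":
        signals = ["tv_power_on"]
        if speaker == "child":
            signals.append("tv_select_children_program")
        elif speaker == "elderly" and captions_available:
            signals.append("tv_enlarge_captions")
        return signals
    # Other vocabulary: assumed here to map to one ordinary signal.
    return [f"execute:{vocab}"]
```

Returning an empty list for a child's “unlock” reflects the point that varying the control content includes prohibiting output of the control signal altogether.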

The device control unit 15 recognizes the content input with the operation buttons 11b and outputs a corresponding signal to the control device control unit 14. For example, when the automatic bath button 11b3 is operated, the device control unit 15 outputs to the control device control unit 14 a signal indicating that the bath is to be filled automatically. The control device control unit 14 then outputs to the bathroom device 21 a control signal to fill the bath at the set temperature and set water volume, and the bathroom device 21 fills the bath accordingly.

Next, an outline of the operation of the speech recognition device 10 according to the present embodiment is described. First, the user operates the operation buttons 11b to shift the speech recognition device 10 from the button operation input mode to the voice input mode. The user can then control the control device 20 by utterance.

Suppose that an adult or an elderly person utters “unlock”. In this case, the speech recognition unit 13 determines that “unlock” was uttered by an adult or an elderly person, and sends the registered vocabulary information and the speaker information (adult or elderly person) to the control device control unit 14. The control device control unit 14 then releases the locked state of the control device 20.

Suppose instead that a child utters “unlock”. In this case, the speech recognition unit 13 determines that “unlock” was uttered by a child, and sends the registered vocabulary information and the speaker information (child) to the control device control unit 14. The control device control unit 14 then performs control different from that performed when “unlock” is uttered by an adult or an elderly person. That is, the control device control unit 14 does not release the locked state of the control device 20 and maintains the locked state.

Suppose further that an adult utters “TV power on”. In this case, the speech recognition unit 13 determines that “TV power on” was uttered by an adult, and sends the registered vocabulary information and the speaker information (adult) to the control device control unit 14. The control device control unit 14 then turns on the television 23.

Suppose that a child utters “TV power on”. In this case, the speech recognition unit 13 determines that “TV power on” was uttered by a child, and sends the registered vocabulary information and the speaker information (child) to the control device control unit 14. The control device control unit 14 then performs control different from that performed when “TV power on” is uttered by an adult. That is, the control device control unit 14 not only turns on the television 23 but also tunes it to a children's program, as control suited to children.

Suppose that an elderly person utters “TV power on”. In this case, the speech recognition unit 13 determines that “TV power on” was uttered by an elderly person, and sends the registered vocabulary information and the speaker information (elderly person) to the control device control unit 14. The control device control unit 14 then performs control different from that performed when “TV power on” is uttered by a child or an adult. That is, the control device control unit 14 not only turns on the television 23 but also enlarges its captions, as control suited to the elderly.

Next, the detailed operation of the speech recognition device 10 according to the present embodiment is described. FIG. 5 is a flowchart showing the details of the operation of the speech recognition device 10 shown in FIG. 1. The processing shown in FIG. 5 is repeated until the speech recognition device 10 is powered off.

As shown in FIG. 5, the speech recognition unit 13 first determines whether an utterance from the user has been input (S1). It makes this determination based on the sound pressure of the input sound and the predetermined threshold. When it determines that no utterance from the user has been input (S1: NO), that is, when a sound with a sound pressure below the predetermined threshold has been input, step S1 is repeated.

On the other hand, when it determines that an utterance from the user has been input (S1: YES), that is, when a sound with a sound pressure at or above the predetermined threshold has been input, the speech recognition unit 13 executes speech recognition processing (S2). In this processing, it is determined which registered vocabulary the user's utterance corresponds to, and whether the speaker was a child, an adult, or an elderly person.

The speech recognition unit 13 then determines whether the user's utterance corresponded to the registered vocabulary (S3). When it determines that the utterance did not correspond to the registered vocabulary (S3: NO), the processing returns to step S1. When it determines that the utterance corresponded to the registered vocabulary (S3: YES), the speech recognition unit 13 determines whether the speaker was a child (S4). That is, it determines whether the utterance was recognized by the child speech model.

When the utterance was recognized by the child speech model (S4: YES), the control device control unit 14 performs control for children (S6). For example, even if the registered vocabulary “unlock” is recognized, it performs control appropriate for a child without outputting to the control device 20 a control signal to release the lock. The processing shown in FIG. 5 then ends.

On the other hand, when the utterance was not recognized by the child speech model (S4: NO), the speech recognition unit 13 determines whether the speaker was an elderly person (S5). That is, it determines whether the utterance was recognized by the elderly speech model.

When the utterance was recognized by the elderly speech model (S5: YES), the control device control unit 14 performs control for the elderly (S6). For example, when the registered vocabulary “TV power on” is recognized, it not only turns on the television 23 but also changes the content of the control signal output to the television 23, for instance so that captions and other text are displayed in a larger size, and thereby performs control appropriate for an elderly person. The processing shown in FIG. 5 then ends.

On the other hand, when the utterance was not recognized by the elderly speech model (S5: NO), the registered vocabulary was recognized by the adult speech model or the general speech model. The control device control unit 14 therefore performs normal control (S7). The processing shown in FIG. 5 then ends.
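The branching of FIG. 5 can be summarized in a short sketch. The function below is illustrative only: its name, its parameters, and the returned labels are assumptions, and the recognition result of step S2 is passed in as already-computed values rather than derived from audio.

```python
from typing import Optional


def process_input(sound_pressure: float, threshold: float,
                  vocab: Optional[str], speaker: Optional[str]) -> str:
    """One pass through the flow of FIG. 5, returning the branch taken.

    `vocab` and `speaker` stand in for the result of the recognition
    processing (S2); None means nothing was recognized or identified.
    """
    if sound_pressure < threshold:   # S1: NO, keep waiting for input
        return "wait"
    if vocab is None:                # S3: NO, return to S1
        return "wait"
    if speaker == "child":           # S4: YES
        return "child_control"
    if speaker == "elderly":         # S5: YES
        return "elderly_control"
    return "normal_control"          # S5: NO (adult or general model), S7
```

In a running device this function would sit inside a loop that repeats until power-off, as the flowchart description states.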

As described above, the speech recognition device 10 according to the present embodiment identifies the speaker based on a plurality of speech models, and varies the control content for the control device 20 depending on whether the registered vocabulary is identified as uttered by a specific speaker or as uttered by another speaker. The control signal therefore differs according to the speaker, and the control device 20 can be controlled appropriately for each speaker. Moreover, identifying the speaker with a plurality of speech models requires neither a large increase in processing nor additional hardware compared with a conventional speech recognition device. The control device 20 can therefore be controlled more appropriately without a significant increase in processing load or cost.

In addition, a child speech model is stored, and the control content for the control device 20 is varied depending on whether the registered vocabulary is identified as uttered by a child or as uttered by another speaker. Control for children therefore becomes possible. For example, when the control device 20 is the television 23 and the registered vocabulary for turning on the television 23 is recognized, the television can be tuned to a children's program such as an animation. Convenience can thus be improved for a child who is unaccustomed to operating the control device 20.

In addition, no control signal is output when the registered vocabulary is identified as uttered by a child, and a control signal is output when the registered vocabulary is identified as uttered by a speaker other than a child. Accordingly, setting of the water temperature can be prohibited when the control device 20 is the bathroom device 21, and access to adult content can be prohibited when the control device 20 is a device capable of Internet connection such as a personal computer. The control device 20 can thus be controlled more appropriately with respect to children, who tend to operate the control device 20 carelessly.

In addition, an unlock vocabulary is stored as registered vocabulary, and when the unlock vocabulary is identified as uttered by a child, control of the control device 20 by the child is not permitted. This prevents a situation in which the child lock is released by a child and control content inappropriate for children becomes available.

In addition, an elderly speech model is stored, and when the registered vocabulary is identified as uttered by an elderly person, pre-registered control for the elderly is performed on the control device 20. Accordingly, when the control device 20 is a device that displays text, such as a personal computer, and is turned on by an utterance from an elderly person, the text can be displayed in a larger size. When the control device 20 is a massage machine, it can be kept from massaging too strongly. Control for the elderly thus becomes possible, and the control device 20 can be controlled more appropriately.

In addition, because a plurality of speech models are stored only for specific registered vocabulary, the speaker identification processing can be omitted for vocabulary that does not require speaker identification, which reduces the processing load.
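The reduction in processing load can be illustrated by counting how many matching evaluations are needed per utterance. The vocabulary entries and model counts below are hypothetical, and the sketch assumes that each evaluation of one vocabulary entry against one speech model costs about the same.

```python
def match_evaluations(models_per_word: dict[str, int]) -> int:
    """Number of (vocabulary, speech model) comparisons per utterance."""
    return sum(models_per_word.values())


# Speaker-specific models only where the speaker matters (hypothetical counts).
selective = {"child lock": 1, "unlock": 3, "TV power on": 3}

# Three speaker-specific models for every entry, for comparison.
uniform = {word: 3 for word in selective}
```

With these three entries, the selective layout needs 7 comparisons per utterance and the uniform layout needs 9. The gap grows with the number of entries that need only the general speech model.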

The speech recognition device according to the present invention has been described above based on the embodiment. The present invention is not limited to this embodiment, however, and modifications may be made without departing from the spirit of the invention.

For example, in the present embodiment, enlarging the captions of the television 23 was described as one example of control for the elderly, but control for the elderly is not limited to this. For example, when the registered vocabulary “TV power on” is recognized by the elderly speech model, the volume of the television 23 may be raised when the television 23 is turned on. Similarly, when the control device 20 is a personal computer, the font size of text may be increased or the volume may be raised. When the control device 20 is a device that plays voice guidance, such as a personal computer, the guidance may be played relatively slowly. Furthermore, when the control device 20 is a massage machine, it may be kept from massaging too strongly.

In the present embodiment, not releasing the lock when “unlock” is recognized by the child speech model was described as control for children, but control for children is not limited to this. For example, when a child attempts to view an adult program or adult content on the television 23, a personal computer, or the like, control may be performed to prohibit the viewing. When a child performs a search in a web browser or on a video search site, a filter that removes adult content from the search results may be activated.
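The search-result filter mentioned above might look like the following sketch. The result format and the `adult` flag are assumptions made for illustration; the text does not say how adult content is marked.

```python
from typing import Optional


def filter_results(results: list[dict], speaker: Optional[str]) -> list[dict]:
    """Drop results flagged as adult content when the speaker is a child.

    Each result is assumed to be a dict with an optional boolean
    "adult" key (a hypothetical marker).
    """
    if speaker != "child":
        return results
    return [r for r in results if not r.get("adult", False)]
```

For any speaker other than a child, the results are returned unchanged, which mirrors the way normal control is applied to other speakers.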

In the present embodiment, an example of identifying adults, children, and elderly persons was described, but the invention is not limited to this. For example, men and women, or specific individuals such as person A and person B, may be identified. In this case, the control device 20 can be customized for men or for women. Similarly, when the control device 20 is an audio device and is turned on by an utterance from person A, the bass that person A prefers can be boosted.

In the present embodiment, switching between the voice input mode and the button operation input mode is performed by operating the operation buttons 11b, but the invention is not limited to this. During the voice input mode, the transition to the button operation input mode may also be made by utterance.

FIG. 1 is a block diagram showing a speech recognition system including the speech recognition device according to the embodiment of the present invention.
FIG. 2 is an external view showing an installation example of the speech recognition device shown in FIG. 1.
FIG. 3 is a front view showing details of the controller shown in FIG. 1.
FIG. 4 is a conceptual diagram showing an example of the stored contents of the storage unit shown in FIG. 1.
FIG. 5 is a flowchart showing details of the operation of the speech recognition device shown in FIG. 1.

Explanation of symbols

1 Speech recognition system
10 Speech recognition device
11 Controller
11a Voice input unit
11b Operation buttons
11c Display unit
11d LED lamp
12 Storage unit
12a Registered vocabulary storage unit
12b Speech model storage unit
13 Speech recognition unit
14 Control device control unit
15 Device control unit
20 Control device
21 Bathroom device
22 Ventilation fan
23 Television

Claims


The speech recognition device according to claim 1, wherein the speech model storage means stores a child speech model for recognizing that the registered vocabulary has been uttered by a child, and
the control means varies the control content for the control device based on the registered vocabulary between a case where the speaker identification means identifies the registered vocabulary as uttered by a child and a case where the speaker identification means identifies the registered vocabulary as uttered by a speaker other than a child.

The speech recognition device according to claim 2, wherein the control means does not output a control signal when the speaker identification means identifies the registered vocabulary as uttered by a child, and outputs a control signal when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than a child.

The speech recognition device according to claim 3, wherein the registered vocabulary storage means stores, as the registered vocabulary, an unlock vocabulary that permits control of the control device by a child, and
the control means does not permit control of the control device by a child when the speaker identification means identifies the unlock vocabulary as uttered by a child, and permits control of the control device by a child when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than a child.

The speech recognition device according to any one of claims 1 to 4, wherein the speech model storage means stores an elderly speech model for recognizing that the registered vocabulary has been uttered by an elderly person, and
the control means performs pre-registered control for the elderly on the control device when the speaker identification means identifies the registered vocabulary as uttered by an elderly person, and does not perform the control for the elderly on the control device when the speaker identification means identifies the registered vocabulary as uttered by a speaker other than an elderly person.

The speech recognition device according to any one of claims 1 to 5, wherein the speech model storage means stores the plurality of speech models only for specific registered vocabulary.