JP2019197236A


Info

Publication number: JP2019197236A
Application number: JP2019148071A
Authority: JP
Inventors: Tatsuya Kitamura (北村 達也)
Original assignee: Konan University
Current assignee: Konan University
Priority date: 2019-08-09
Filing date: 2019-08-09
Publication date: 2019-11-14
Anticipated expiration: 2039-08-09
Also published as: JP7381054B2

Abstract

Problem to be solved: To provide a speech training system, a speech training method, and a program capable of training a trainee's speech more effectively. Solution: A speech training system is used for speech training of a trainee. The speech training system includes an imaging means and a display means. The imaging means images the trainee's face and generates moving image data. The display means displays the moving image represented by the moving image data. The display means superimposes an image indicating the amount of movement of the trainee's mouth on the moving image. With this speech training system, the trainee can perform speech training while consciously making large movements of the muscles around the mouth, so the trainee's speech training can be performed more effectively. Selected drawing: FIG. 9

Description

Translated from Japanese

The present invention relates to a speech training system, a speech training method, and a program.

Japanese Patent Laid-Open No. 7-319380 (Patent Document 1) discloses a vocalization training apparatus. In this apparatus, an instruction sentence based on the difference between the articulation of the trainee's utterance and the articulation of a model utterance is fed back to the trainee. According to this apparatus, the trainee can effectively correct his or her articulation by proceeding with the training in accordance with the instruction sentence (see Patent Document 1).

Patent Document 1: Japanese Patent Laid-Open No. 7-319380 (JP 7-319380 A)

In the vocalization training apparatus disclosed in Patent Document 1, feedback to the trainee is based on the voice uttered by the trainee. However, the present inventor has found that feedback based only on the uttered voice does not necessarily make the trainee's speech training effective.

The present invention has been made to solve this problem, and its object is to provide a speech training system, a speech training method, and a program capable of training a trainee's speech more effectively.

A speech training system according to an aspect of the present invention is used for speech training of a trainee. The speech training system includes an imaging means and a display means. The imaging means images the trainee's face and generates moving image data. The display means displays the moving image represented by the moving image data. The display means superimposes an image indicating the amount of movement of the trainee's mouth on the moving image.

The present inventor has found that when speech training is performed while consciously making large movements of the muscles around the mouth, the range of motion of the speech organs widens and the clarity of the trainee's speech improves. In this speech training system, an image indicating the amount of movement of the trainee's mouth is superimposed on the moving image, so the trainee can visually recognize whether the mouth movement is insufficient. As a result, the trainee can perform speech training while consciously making large movements of the muscles around the mouth, and the trainee's speech training can be performed more effectively.

In the above speech training system, the display means may further display a sentence to be read aloud by the trainee.

In this speech training system, the sentence to be read aloud is displayed, so the trainee can perform speech training simply by reading the displayed sentence aloud.

In the above speech training system, the display means may further display an evaluation result relating to the trainee's speech.

In this speech training system, the evaluation result relating to the trainee's speech is displayed, so the trainee can perform speech training while checking the evaluation result.

In the above speech training system, the display means may further display a warning message when the trainee's speech does not satisfy a predetermined requirement.

In this speech training system, a warning message is displayed when the trainee's speech does not satisfy the predetermined requirement, so the trainee can visually recognize that his or her speech does not satisfy the requirement.

In the above speech training system, the image indicating the amount of mouth movement may be a line indicating the trajectory along which the mouth has moved.

The above speech training system may further include a calculation means that calculates an optical flow based on the moving image data, and a generation means that generates the image indicating the amount of mouth movement based on the optical flow.

A speech training method according to another aspect of the present invention trains a trainee in speech. The speech training method includes the steps of imaging the trainee's face and generating moving image data, displaying the moving image represented by the moving image data, and displaying an image indicating the amount of movement of the trainee's mouth superimposed on the moving image.

In this speech training method, an image indicating the amount of movement of the trainee's mouth is superimposed on the moving image, so the trainee can visually recognize whether the mouth movement is insufficient. As a result, the trainee's speech training can be performed more effectively.

A program according to another aspect of the present invention is used for speech training of a trainee. The program causes a computer to execute the steps of causing an imaging means to image the trainee's face and generate moving image data, causing a display means to display the moving image represented by the moving image data, and causing the display means to display an image indicating the amount of movement of the trainee's mouth superimposed on the moving image.

When this program is executed by a computer, an image indicating the amount of movement of the trainee's mouth is superimposed on the moving image, so the trainee can visually recognize whether the mouth movement is insufficient. As a result, the trainee's speech training can be performed more effectively.

According to the present invention, it is possible to provide a speech training system, a speech training method, and a program capable of training a trainee's speech more effectively.

FIG. 1 is a diagram showing an example of a speech training scene using a smartphone.
FIG. 2 is a diagram showing an example of the hardware configuration of the smartphone.
FIG. 3 is a diagram showing an example of the relationship between the software modules realized by the control unit.
FIG. 4 is a flowchart showing the execution procedure of the moving image display process.
FIG. 5 is a diagram showing an example of an image displayed on the display.
FIG. 6 is a flowchart showing the execution procedure of the optical flow display process.
FIG. 7 is a flowchart showing the execution procedure of the muscle activity amount display process.
FIG. 8 is a flowchart showing the execution procedure of the voice feature amount display process.
FIG. 9 is a flowchart showing the execution procedure of the warning message display process.
FIG. 10 is a diagram showing an example of an image displayed on the display.
FIG. 11 is a diagram showing the amplitude of speech recorded before and after training.
FIG. 12 is a diagram showing the range of change in the fundamental frequency of speech recorded before and after training.
FIG. 13 is a diagram showing the VAS measured before and after training.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, the same or corresponding parts are denoted by the same reference numerals, and their description will not be repeated.

[1. Overview]
A survey by the present inventor found that about 30% of healthy undergraduate and graduate students are aware of having difficulty speaking. When the present inventor tried various speech training methods, it was found that a high training effect may be obtained when the trainee performs speech training while holding a thin stick in the mouth. In particular, the present inventor has found that a high training effect is obtained when, during speech training, the trainee speaks loudly and moves the facial muscles vigorously.

FIG. 1 is a diagram showing an example of a speech training scene using the smartphone 100 according to the present embodiment. As shown in FIG. 1, in speech training, the trainee 10 speaks while holding a stick 20 in the mouth. The trainee 10 performs speech training while viewing an image displayed on the smartphone 100. As described in detail later, the smartphone 100 displays images that prompt the trainee 10 to speak loudly and to move the facial muscles vigorously. The smartphone 100 is described in detail below.

[2. Hardware configuration]
FIG. 2 is a diagram showing an example of the hardware configuration of the smartphone 100. As shown in FIG. 2, the smartphone 100 includes a camera 130, a display 140, a microphone 150, a speaker 160, a control unit 170, a storage unit 180, and a communication module 190. The components of the smartphone 100 are electrically connected via a bus.

The camera 130 is configured to capture a subject image and generate image data. For example, the camera 130 images the trainee 10 (FIG. 1) and generates moving image data. The camera 130 includes an image sensor such as a CMOS image sensor or a CCD image sensor.

The display 140 is configured to display images. For example, the display 140 displays the moving image represented by the moving image data generated by the camera 130. The display 140 is, for example, a liquid crystal display or an organic EL display.

The microphone 150 is configured to generate audio data based on the sound around the microphone 150. For example, the microphone 150 generates audio data based on the voice uttered by the trainee 10.

The speaker 160 is configured to output the sound represented by audio data. For example, the speaker 160 outputs the sound represented by the audio data generated from the voice of the trainee 10.

The control unit 170 includes a CPU (Central Processing Unit) 172, a RAM (Random Access Memory) 174, a ROM (Read Only Memory) 176, and the like, and is configured to control each component in accordance with information processing.

The storage unit 180 is, for example, a memory such as a flash memory. The storage unit 180 is configured to store, for example, a control program 181. The control program 181 is a control program for the smartphone 100 that is executed by the control unit 170. When the control unit 170 executes the control program 181, the control program 181 is loaded into the RAM 174. The control unit 170 then controls each component by having the CPU 172 interpret and execute the control program 181 loaded into the RAM 174.

The communication module 190 is configured to communicate with external devices. The communication module 190 includes, for example, an LTE (Long Term Evolution) module, a wireless LAN module, and the like.

[3. Software configuration]
FIG. 3 is a diagram showing an example of the relationship between the software modules realized by the control unit 170. As shown in FIG. 3, the face region extraction unit 131, the pixel movement amount calculation unit 132, the face movement amount correction unit 133, the muscle activity amount estimation unit 134, the first determination unit 135, the voice feature extraction unit 151, and the second determination unit 152 are each software modules realized by the control unit 170 executing the control program 181.

The face region extraction unit 131 is configured to extract the region corresponding to the face of the trainee 10 based on the moving image data generated by the camera 130. Any of various known methods may be used to extract the face region.

The pixel movement amount calculation unit 132 is configured to calculate the optical flow of each region based on the moving image data generated by the camera 130. Any of various known methods may be used to calculate the optical flow. Here, each region may consist of a single pixel of the image or of a plurality of pixels of the image. The pixel movement amount calculation unit 132 also generates, for each region, an image indicating the magnitude of the calculated optical flow, and outputs the generated image to the display 140.
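The description leaves the optical flow method open, so the following is only a minimal sketch of one possible choice: exhaustive block matching on two grayscale frames, with each region being a square block of pixels. The function name `block_flow`, the block size, and the search radius are assumptions made for illustration and do not come from the source.

```python
def block_flow(prev, curr, block=4, radius=2):
    """Estimate one (dx, dy) displacement per block by exhaustive block matching.

    prev, curr: grayscale frames as equally sized lists of rows (ints).
    Returns {(bx, by): (dx, dy)} keyed by each block's top-left pixel.
    """
    height, width = len(prev), len(prev[0])
    flow = {}
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            best, best_key = (0, 0), None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    x0, y0 = bx + dx, by + dy
                    # Skip candidate positions that fall outside the frame.
                    if x0 < 0 or y0 < 0 or x0 + block > width or y0 + block > height:
                        continue
                    # Sum of absolute differences between the two blocks.
                    cost = sum(
                        abs(prev[by + j][bx + i] - curr[y0 + j][x0 + i])
                        for j in range(block)
                        for i in range(block)
                    )
                    # On equal cost, prefer the smaller displacement so that
                    # static, featureless areas yield (0, 0).
                    key = (cost, dx * dx + dy * dy)
                    if best_key is None or key < best_key:
                        best, best_key = (dx, dy), key
            flow[(bx, by)] = best
    return flow
```

A production implementation would more likely rely on a dense or pyramidal method from an image processing library; block matching is used here only because it is short and dependency-free.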

The face movement amount correction unit 133 is configured to calculate the amount and direction of movement of the face region extracted by the face region extraction unit 131, and to subtract that movement from the optical flow calculated by the pixel movement amount calculation unit 132. This yields an optical flow that represents the movement of the facial muscles with the movement of the face as a whole removed.
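The correction described above can be sketched as a vector subtraction. The sketch assumes that the face region is reported as a bounding box `(x, y, width, height)` and that the face movement is taken as the displacement of the box center; both the representation and the function names are illustrative assumptions, since the source only states that the movement amount and direction are calculated and subtracted.

```python
def face_motion_from_boxes(prev_box, curr_box):
    """Movement of the face region between two frames.

    Boxes are (x, y, width, height). Using the displacement of the box
    center is an assumption made for this sketch.
    """
    px, py, pw, ph = prev_box
    cx, cy, cw, ch = curr_box
    return (cx + cw / 2 - (px + pw / 2), cy + ch / 2 - (py + ph / 2))


def correct_for_face_motion(flow, face_motion):
    """Subtract the whole-face movement from every regional flow vector."""
    fx, fy = face_motion
    return {region: (dx - fx, dy - fy) for region, (dx, dy) in flow.items()}
```

After correction, a region that moved only because the head moved ends up with a vector close to zero, while regions around the mouth keep the residual movement caused by the muscles.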

The muscle activity amount estimation unit 134 is configured to estimate the amount of movement of the facial muscles of the trainee 10 by calculating the sum of the optical flow magnitudes of the regions. In other words, the muscle activity amount estimation unit 134 is configured to estimate the amount of movement of the mouth of the trainee 10. The estimated amount of facial muscle movement (the sum of the optical flow magnitudes of the regions) is output to the display 140.
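As a small illustration of this estimate, the sketch below sums the Euclidean magnitudes of the per-region flow vectors. The dictionary layout of `flow` (region key mapped to a `(dx, dy)` pair) and the function name are assumptions for the example.

```python
import math


def muscle_activity(flow):
    """Sum of the optical flow magnitudes over all regions."""
    return sum(math.hypot(dx, dy) for dx, dy in flow.values())
```

For instance, three regions with vectors (3, 4), (0, 0), and (-6, 8) have magnitudes 5, 0, and 10, giving an activity value of 15.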

The first determination unit 135 is configured to determine whether the amount of facial muscle movement estimated by the muscle activity amount estimation unit 134 has remained smaller than a first predetermined amount for a predetermined time. The first predetermined amount is a value below which the expected speech training effect is not obtained. When the amount of facial muscle movement has remained smaller than the first predetermined amount for the predetermined time, a first warning image is output to the display 140.

The voice feature extraction unit 151 is configured to extract a feature amount of the voice uttered by the trainee 10 based on the audio data generated by the microphone 150. For example, the voice feature extraction unit 151 extracts the loudness of the voice uttered by the trainee 10. The voice feature extraction unit 151 also generates an image indicating the extracted loudness and outputs the generated image to the display 140.
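The source names loudness as an example feature but does not say how it is measured. One common choice, shown here purely as an assumption, is the root-mean-square amplitude of a short frame of samples, optionally expressed in decibels relative to full scale. The function names and the -80 dB floor are hypothetical.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of one frame of samples (floats in [-1, 1])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples, floor_db=-80.0):
    """RMS level in decibels relative to full scale, clamped at floor_db."""
    rms = rms_level(samples)
    if rms <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms))
```

Either value could serve as the voice feature amount that is compared with the second predetermined amount.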

The second determination unit 152 is configured to determine whether the voice feature amount extracted by the voice feature extraction unit 151 has remained smaller than a second predetermined amount for a predetermined time. The second predetermined amount is a value below which the expected speech training effect is not obtained. When the voice feature amount has remained smaller than the second predetermined amount for the predetermined time, a second warning image is output to the display 140.
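The first and second determination units apply the same kind of test: a value staying below a threshold for a predetermined time. A minimal sketch of that test follows; the class name `SustainedLowDetector`, the use of timestamps in seconds, and the example threshold values are assumptions, since the source does not give concrete values or an implementation.

```python
class SustainedLowDetector:
    """Reports when a value has stayed below a threshold for a given time."""

    def __init__(self, threshold, hold_seconds):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self._low_since = None  # time at which the value first dropped below

    def update(self, value, now):
        """Feed one measurement taken at time `now` (seconds).

        Returns True once the value has been below the threshold
        continuously for at least hold_seconds.
        """
        if value >= self.threshold:
            self._low_since = None
            return False
        if self._low_since is None:
            self._low_since = now
        return now - self._low_since >= self.hold_seconds
```

One instance could be fed the muscle activity amount and another the voice feature amount, each with its own threshold, for example `SustainedLowDetector(threshold=10.0, hold_seconds=2.0)`.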

[4. Operation]
In the smartphone 100 according to the present embodiment, the control unit 170 executes the moving image display process, the optical flow display process, the muscle activity amount display process, the voice feature amount display process, and the warning message display process in parallel. Each process is described in turn below.

(4-1. Moving image display process)
FIG. 4 is a flowchart showing the execution procedure of the moving image display process. The process shown in this flowchart is executed at a predetermined cycle.

Referring to FIG. 4, the control unit 170 controls the camera 130 to capture a moving image including the face of the trainee 10 and generate moving image data, and controls the microphone 150 to generate audio data including the voice of the trainee 10 (step S100). The control unit 170 extracts the face region included in the moving image based on the generated moving image data (step S110). The control unit 170 controls the display 140 to display, superimposed on one another, the sentence to be read by the trainee 10, a frame surrounding the extracted face region, and the moving image represented by the moving image data (step S120). Text data representing the sentence to be read by the trainee 10 is stored in advance in, for example, the storage unit 180 (FIG. 2).

FIG. 5 is a diagram showing an example of an image displayed on the display 140. As shown in FIG. 5, the display 140 shows a moving image including the trainee 10, a face frame 200 surrounding the face region of the trainee 10, and a sentence 210 to be read by the trainee 10. Because the sentence 210 to be read aloud is displayed on the display 140, the trainee 10 can perform speech training simply by reading the displayed sentence aloud.

(4-2. Optical flow display process)
FIG. 6 is a flowchart showing the execution procedure of the optical flow display process. The process shown in this flowchart is executed at a predetermined cycle.

Referring to FIG. 6, the control unit 170 calculates the optical flow of each region based on the moving image data generated in the moving image display process (step S200). The control unit 170 generates, for each region, an image indicating the magnitude and direction of the optical flow (step S210). The control unit 170 controls the display 140 to superimpose the generated image on the moving image (step S220).
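Step S210 can be pictured as turning each region's flow vector into a short line whose length and direction follow the vector. The sketch below only computes the line endpoints; drawing them onto the frame is left to whatever graphics API is available. The function name, the `scale` factor that lengthens the lines for visibility, and the `min_length` cutoff that hides near-zero vectors are all assumptions for illustration.

```python
def flow_to_segments(flow, block=4, scale=3.0, min_length=0.5):
    """Convert per-region flow vectors into line segments for an overlay.

    flow: {(bx, by): (dx, dy)} keyed by each region's top-left pixel.
    Returns a list of ((x_start, y_start), (x_end, y_end)) pairs, one per
    region whose flow magnitude is at least min_length.
    """
    segments = []
    for (bx, by), (dx, dy) in sorted(flow.items()):
        if (dx * dx + dy * dy) ** 0.5 < min_length:
            continue  # Region is effectively static; draw nothing.
        cx, cy = bx + block / 2, by + block / 2  # Start at the region center.
        segments.append(((cx, cy), (cx + scale * dx, cy + scale * dy)))
    return segments
```

Longer lines then correspond to larger mouth movement, which is the visual cue the overlay is meant to give.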

Referring again to FIG. 5, on the display 140, an image of lines 240 (the optical flow) indicating the trajectory along which the mouth of the trainee 10 has moved is superimposed on the moving image. Because an image indicating the amount of movement of the mouth of the trainee 10 is superimposed on the moving image, the smartphone 100 lets the trainee 10 visually recognize whether the mouth movement is insufficient. As a result, the trainee 10 can perform speech training while consciously making large movements of the muscles around the mouth, and the speech training of the trainee 10 can be performed more effectively.

(4-3. Muscle activity amount display process)
FIG. 7 is a flowchart showing the execution procedure of the muscle activity amount display process. The process shown in this flowchart is executed at a predetermined cycle.

Referring to FIG. 7, the control unit 170 extracts the face region of the trainee 10 based on the moving image data generated in the moving image display process, and also extracts the movement (magnitude and direction) of the face region (step S300). The control unit 170 corrects the optical flow by subtracting the movement of the face region extracted in step S300 from the optical flow calculated in the optical flow display process (step S310). The control unit 170 estimates the amount of movement of the facial muscles of the trainee 10 (hereinafter also referred to as the "muscle activity amount") by calculating the sum of the corrected optical flow magnitudes of the regions (step S320). The control unit 170 generates an image indicating the estimated amount of facial muscle movement (the sum of the optical flow magnitudes of the regions) and controls the display 140 to display that image (step S330).

Referring again to FIG. 5, on the display 140, an image indicating the amount of facial muscle movement, such as the level meter 220, is superimposed on the moving image. Because an evaluation result relating to the speech of the trainee 10 (for example, the amount of movement of the facial muscles including the mouth) is displayed on the display 140, the trainee 10 can perform speech training while checking the evaluation result.
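A level meter of this kind amounts to mapping a measured value onto a fixed number of segments. The following sketch shows one way to do that; the segment count, the `full_scale` value at which the meter is full, and the text rendering are assumptions for illustration, as the source does not describe how the meter is scaled or drawn.

```python
def meter_segments(value, full_scale, segments=10):
    """Number of lit segments for a value, clamped to the range 0..segments."""
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    ratio = min(max(value / full_scale, 0.0), 1.0)
    return int(ratio * segments)


def render_meter(value, full_scale, segments=10):
    """Text rendering of the meter, with '#' for lit and '-' for unlit segments."""
    lit = meter_segments(value, full_scale, segments)
    return "#" * lit + "-" * (segments - lit)
```

The same mapping could be reused for the voice loudness meter by passing the voice feature amount instead of the muscle activity amount.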

(4-4. Voice feature amount display process)
FIG. 8 is a flowchart showing the execution procedure of the voice feature amount display process. The process shown in this flowchart is executed at a predetermined cycle.

Referring to FIG. 8, the control unit 170 extracts a feature amount (for example, the loudness) of the voice of the trainee 10 based on the audio data generated in the moving image display process (step S400). The control unit 170 generates an image indicating the extracted voice feature amount and controls the display 140 to display that image (step S410).

Referring again to FIG. 5, on the display 140, an image indicating the voice feature amount, such as the level meter 230, is superimposed on the moving image. Because an evaluation result relating to the speech of the trainee 10 (for example, the loudness of the voice) is displayed on the display 140, the trainee 10 can perform speech training while checking the evaluation result.

(4-5. Warning message display process)
FIG. 9 is a flowchart showing the execution procedure of the warning message display process. The process shown in this flowchart is executed at a predetermined cycle.

Referring to FIG. 9, the control unit 170 determines whether the muscle activity amount estimated in the muscle activity amount display process has remained smaller than the first predetermined amount for a predetermined time (step S500). If the muscle activity amount is determined to be equal to or greater than the first predetermined amount (NO in step S500), the process proceeds to step S520. On the other hand, if it is determined that the muscle activity amount has remained smaller than the first predetermined amount for the predetermined time (YES in step S500), the control unit 170 controls the display 140 to display the first warning image (step S510).

FIG. 10 is a diagram showing an example of an image displayed on the display 140. As shown in FIG. 10, when the muscle activity amount has remained smaller than the first predetermined amount for the predetermined time, the first warning image 250 ("Move your mouth more!") is displayed on the display 140. Because the first warning image 250 is displayed when the speech of the trainee 10 does not satisfy the predetermined requirement, the trainee 10 can visually recognize that his or her speech does not satisfy the requirement.

Referring again to FIG. 9, the control unit 170 next determines whether the voice feature amount extracted in the voice feature amount display process has remained smaller than the second predetermined amount for a predetermined time (step S520). If the voice feature amount is determined to be equal to or greater than the second predetermined amount (NO in step S520), the process returns to step S500. On the other hand, if it is determined that the voice feature amount has remained smaller than the second predetermined amount for the predetermined time (YES in step S520), the control unit 170 controls the display 140 to display the second warning image (step S530).

Referring to FIG. 10 again, when the state in which the voice feature amount is smaller than the second predetermined amount has continued for the predetermined time, second warning image 260 ("Speak louder!") is displayed on display 140. With smartphone 100, second warning image 260 is displayed when the utterance of trainee 10 does not satisfy a predetermined requirement, so trainee 10 can visually recognize that his or her utterance does not satisfy the predetermined requirement.
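
The two checks described above share one pattern: a value must stay below a threshold for a set time before a warning is shown. The following is a minimal Python sketch of that pattern, not the implementation of control unit 170. The class name, function name, threshold values, and hold time are all hypothetical, since the text does not give concrete values; the message strings are English renderings of the two warning images.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SustainedLowDetector:
    """Returns True once a value has stayed below `threshold` for `hold_s` seconds.

    Hypothetical helper: the threshold and hold time stand in for the
    "predetermined amount" and "predetermined time", whose values are not given.
    """
    threshold: float
    hold_s: float
    _low_since: Optional[float] = None  # time at which the current low run began

    def update(self, value: float, now_s: float) -> bool:
        if value >= self.threshold:
            self._low_since = None  # the low run is broken, so timing restarts
            return False
        if self._low_since is None:
            self._low_since = now_s
        return now_s - self._low_since >= self.hold_s


def warnings_for(muscle_low: bool, voice_low: bool) -> List[str]:
    """Selects which warning messages to show for the current frame."""
    messages = []
    if muscle_low:
        messages.append("Move your mouth more!")  # first warning image 250
    if voice_low:
        messages.append("Speak louder!")          # second warning image 260
    return messages
```

One detector instance would be kept per quantity (muscle activity amount and voice feature amount) and updated once per frame with the current estimate and a timestamp.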

[5. Features]
As described above, in smartphone 100 according to the present embodiment, display 140 displays an image indicating the amount of movement of the mouth of trainee 10 superimposed on the moving image. Because this image is superimposed on the moving image, trainee 10 can visually recognize whether the mouth movement is insufficient. As a result, trainee 10 can perform utterance training while consciously moving the muscles around the mouth to a large extent, so that the utterance training of trainee 10 can be performed more effectively.

Note that smartphone 100 is an example of the "utterance training system", camera 130 is an example of the "imaging means", and display 140 is an example of the "display means". Pixel movement amount calculation unit 132 is an example of the "calculation means" and the "generation means".

[6. Experiment]
The inventor conducted the following experiment. The experiment was conducted in a soundproof room. Before the experiment, the procedure was explained to the participants. Next, the significance of utterance training was explained to the participants, and they were asked to take part with motivation. The participants were instructed to use the voice volume and speaking rate they would use when reading aloud in a high school classroom. Voice recording and utterance training were performed in a standing position. During training, a PC (personal computer) display (EIZO EV2450) was placed directly in front of the participant's face so that the participant could practice while facing forward. In this experiment, the application implemented on smartphone 100 according to the above embodiment was installed on the PC.

In the experiment, each participant's speech before training was first recorded, and the participant self-evaluated how well the utterance had gone using a VAS (visual analog scale). Next, the participant practiced for 3 minutes using the PC (utterance training system) while biting disposable chopsticks with the front teeth. After that, speech after training was recorded and the VAS was measured again. Speech was recorded with a condenser microphone (Sony ECM-77B) and a recorder (Marantz PMD671) at a sampling frequency of 16 kHz with 16-bit quantization.

FIG. 11 is a diagram showing the amplitude of speech recorded before and after training. FIG. 12 is a diagram showing the range of variation of the fundamental frequency of speech recorded before and after training. These results show the distribution of each participant's mean value over 14 sentences. As shown in FIGS. 11 and 12, training with the utterance training system tended to increase both the amplitude and the range of variation of the fundamental frequency. Comparing the medians before and after training, the amplitude increased by 4.3 dB and the range of variation of the fundamental frequency increased by 1.19 semitones. Although not shown in the figures, the mean fundamental frequency also tended to increase after training.
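
To give a sense of scale for the reported medians, the short Python sketch below converts a decibel difference and a semitone difference into plain ratios. It assumes the usual amplitude convention (20 log10 of an amplitude ratio) and the equal-tempered definition of a semitone (12 log2 of a frequency ratio); the text does not state which conventions were used, and the function names are illustrative.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude ratio for a level difference in dB (assumes 20*log10 convention)."""
    return 10 ** (db / 20)


def semitones_to_frequency_ratio(semitones: float) -> float:
    """Frequency ratio for an interval in semitones (12 semitones per octave)."""
    return 2 ** (semitones / 12)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Inverse of db_to_amplitude_ratio; ratio must be positive."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    print(db_to_amplitude_ratio(4.3))          # roughly 1.64
    print(semitones_to_frequency_ratio(1.19))  # roughly 1.07
```

Under these assumptions, a 4.3 dB increase corresponds to an amplitude about 1.64 times larger, and a 1.19 semitone increase in range corresponds to the ratio between the highest and lowest fundamental frequency growing by about 7%.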

FIG. 13 is a diagram showing the VAS measured before and after training. As shown in FIG. 13, almost all participants felt that training had enabled them to read better. After the experiment, many positive comments were heard, such as "I can now speak crisply", "My sa-row and ta-row sounds improved", and "The muscles at the sides (of my mouth) move more easily". Feeding back facial movement is considered to have improved mouth movement even with only 3 minutes of practice.

[7. Modifications]
Although an embodiment has been described above, the present invention is not limited to that embodiment, and various modifications can be made without departing from the spirit of the invention. Modifications are described below.

(7-1)
In the above embodiment, line 240 is used as the image indicating the muscle activity amount (the image indicating the amount of mouth movement). However, the image indicating the muscle activity amount is not limited to line 240. It may be, for example, an arrow indicating the direction and magnitude of the movement. The muscle activity amount may also be indicated by using different colors for portions with a large amount of movement and portions with a small amount of movement. For example, a region with a large amount of movement may be shown in red, and a region with a small amount of movement may be shown in blue.
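
The color-coded variant could be sketched as a simple mapping from movement amount to an RGB color. The Python function below is a hypothetical illustration only: the linear blue-to-red blend and the `max_magnitude` normalization are assumptions, since the text only says that large movement may be red and small movement may be blue.

```python
from typing import Tuple


def motion_to_rgb(magnitude: float, max_magnitude: float) -> Tuple[int, int, int]:
    """Maps a movement amount to a color: blue for small, red for large.

    `max_magnitude` is an assumed normalization constant; amounts at or above
    it are drawn fully red, and amounts at or below zero fully blue.
    """
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    t = min(max(magnitude / max_magnitude, 0.0), 1.0)  # clamp to [0, 1]
    return (round(255 * t), 0, round(255 * (1 - t)))
```

The returned color could then be used to tint the corresponding region of the displayed moving image.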

(7-2)
In the above embodiment, the optical flow is calculated over the entire moving image captured by camera 130. However, the range over which the optical flow is calculated is not limited to this. For example, the optical flow may be calculated only for the face region of trainee 10, only for the lower half of the face of trainee 10, or only for the mouth region of trainee 10. Narrowing the region over which the optical flow is calculated reduces the amount of computation performed by control unit 170.
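
As a rough illustration of narrowing the computation region, the Python sketch below derives the lower half of a face bounding box and crops a frame to it. The `(x, y, width, height)` box format and the frame represented as a list of pixel rows are assumptions made for this example; how face area extraction unit 131 actually reports the face region is not specified here.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed format


def lower_half(box: Box) -> Box:
    """Returns the lower half of a bounding box (the larger half if height is odd)."""
    x, y, w, h = box
    top = y + h // 2
    return (x, top, w, y + h - top)


def crop(frame: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Crops a frame, given as rows of pixel values, to the given box."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]
```

For a face box of 100 by 80 pixels, the lower-half crop contains 4,000 pixels instead of 8,000, so any per-pixel motion computation on it handles half as many pixels as the full face region.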

(7-3)
In the above embodiment, utterance training is performed using both the image and the voice of trainee 10. However, the voice of trainee 10 does not necessarily have to be used for utterance training.

(7-4)
In the above embodiment, the optical flow is calculated to obtain the amount of mouth movement of trainee 10. However, the optical flow does not necessarily have to be calculated. For example, the amount of mouth movement of trainee 10 may be obtained simply by calculating the difference between frames of the moving image.
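
A frame-difference measure of this kind could look like the following Python sketch. It computes the mean absolute difference between two grayscale frames of equal size. Using the mean absolute difference as the movement amount, and representing frames as lists of rows of intensity values, are assumptions for illustration; the text does not specify how the difference is turned into a movement amount.

```python
from typing import Sequence


def frame_difference(prev: Sequence[Sequence[float]],
                     curr: Sequence[Sequence[float]]) -> float:
    """Mean absolute per-pixel difference between two equal-sized grayscale frames."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    total = sum(abs(p - c)
                for row_prev, row_curr in zip(prev, curr)
                for p, c in zip(row_prev, row_curr))
    count = sum(len(row) for row in prev)
    return total / count if count else 0.0
```

Unlike optical flow, this measure gives no direction of movement, only an overall amount of change between consecutive frames.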

(7-5)
In the above embodiment, the utterance training system is implemented on a smartphone. However, the utterance training system according to the present invention may be implemented on, for example, a PC or a tablet.

(7-6)
In the above embodiment, a model video of an instructor may also be displayed on display 140 during utterance training.

(7-7)
In the above embodiment, the optical flow is calculated for each region of the face of trainee 10. It is therefore also possible to determine which region of the face of trainee 10 is lacking movement. For example, a warning image indicating which region of the face of trainee 10 is lacking movement may be displayed on display 140.
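
One way to picture the per-region check is the Python sketch below, which lists the face regions whose movement amount falls below a minimum. The region names, the single shared minimum, and the dictionary input are all hypothetical; the text does not say how the face is divided into regions or how "lacking" is decided.

```python
from typing import Dict, List


def lacking_regions(region_motion: Dict[str, float], minimum: float) -> List[str]:
    """Returns, in alphabetical order, the regions whose movement is below `minimum`."""
    return sorted(name for name, amount in region_motion.items() if amount < minimum)


if __name__ == "__main__":
    # Hypothetical region names and movement amounts.
    sample = {"upper_lip": 0.8, "lower_lip": 0.2, "left_cheek": 0.1}
    print(lacking_regions(sample, minimum=0.5))  # ['left_cheek', 'lower_lip']
```

The resulting list could drive a warning image that names or highlights the regions concerned.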

(7-8)
In the above embodiment, the history of the utterance training of trainee 10 may be stored sequentially in storage unit 180. Then, when trainee 10 performs utterance training again, trainee 10 can be informed of, for example, which aspects have improved and which have worsened compared with the previous session.
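
The comparison with the previous session could be sketched as follows in Python. The metric names and the assumption that a higher value is always better are illustrative only; the text does not define what is stored in the history or how improvement is judged.

```python
from typing import Dict


def compare_sessions(previous: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.0) -> Dict[str, str]:
    """Labels each metric present in both sessions as improved, worse, or unchanged.

    Assumes a higher value is better for every metric. `tolerance` is the
    largest absolute change still reported as "unchanged".
    """
    result = {}
    for name in sorted(previous.keys() & current.keys()):
        delta = current[name] - previous[name]
        if delta > tolerance:
            result[name] = "improved"
        elif delta < -tolerance:
            result[name] = "worse"
        else:
            result[name] = "unchanged"
    return result
```

Metrics that appear in only one of the two sessions are skipped, since there is nothing to compare them against.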

10 trainee, 20 stick, 100 smartphone, 130 camera, 131 face area extraction unit, 132 pixel movement amount calculation unit, 133 face movement amount correction unit, 134 muscle activity amount estimation unit, 135 first determination unit, 140 display, 150 microphone, 151 voice feature extraction unit, 152 second determination unit, 160 speaker, 170 control unit, 172 CPU, 174 RAM, 176 ROM, 180 storage unit, 181 control program, 190 communication module, 200 face frame, 210 text, 220, 230 level meter, 240 line, 250 first warning image, 260 second warning image.

Claims

Translated from Japanese

An utterance training system used for utterance training of a trainee, comprising:
imaging means for imaging the face of the trainee and generating moving image data; and
display means for displaying a moving image represented by the moving image data,
wherein the display means displays an image indicating the amount of movement of the trainee's mouth superimposed on the moving image.

The utterance training system according to claim 1, wherein the display means further displays a sentence for the trainee to read aloud.

The utterance training system according to claim 1 or claim 2, wherein the display means further displays an evaluation result regarding the utterance of the trainee.

The utterance training system according to any one of claims 1 to 3, wherein the display means further displays a warning message when the utterance of the trainee does not satisfy a predetermined requirement.

The utterance training system according to any one of claims 1 to 4, wherein the image indicating the amount of movement of the mouth is a line indicating the trajectory along which the mouth has moved.

The utterance training system according to any one of claims 1 to 5, further comprising:
calculation means for calculating an optical flow based on the moving image data; and
generation means for generating the image indicating the amount of movement of the mouth based on the optical flow.

An utterance training method for training a trainee in utterance, comprising the steps of:
imaging the face of the trainee and generating moving image data;
displaying a moving image represented by the moving image data; and
displaying an image indicating the amount of movement of the trainee's mouth superimposed on the moving image.

A program used for utterance training of a trainee, the program causing a computer to execute the steps of:
causing imaging means to image the face of the trainee and generate moving image data;
causing display means to display a moving image represented by the moving image data; and
causing the display means to display an image indicating the amount of movement of the trainee's mouth superimposed on the moving image.