JP2024141284A


Info

Publication number: JP2024141284A
Application number: JP2023052838A
Authority: JP
Inventors: 和真橋本; Kazuma Hashimoto; 君孝村下; Kimitaka Murashita; 徹洋加藤; Tetsuyo Kato; 渉長谷川; Wataru Hasegawa
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2023-03-29
Filing date: 2023-03-29
Publication date: 2024-10-10

Abstract

[Problem] To form an AI model in which the influence of individual and environmental differences in biosignals is reduced, and thereby improve the accuracy of emotion estimation. [Solution] A learning method for emotion estimation AI acquires a subject's biosignals and appearance information at the same timing, calculates an emotion index value from the acquired biosignals, normalizes the calculated emotion index value to obtain a normalized emotion index value, generates supervised learning data in which the acquired appearance information is the input value and the normalized emotion index value is the correct answer value, and trains an AI model with the generated supervised learning data. [Selected Figure] FIG. 6

Description

Translated from Japanese

The present invention relates to a learning method, a learning device, and a learning program for emotion estimation AI, and to an emotion estimation device.

Emotion estimation systems that estimate emotions from biosignals (for example, heart rate and brain waves) are conventionally known (see, for example, Patent Document 1).

Patent Document 1: JP 2020-185138 A

Meanwhile, artificial intelligence (AI) is becoming more capable and more widespread, and its application to emotion estimation devices is conceivable. To create an AI model for an emotion estimation device and improve its accuracy, it is necessary to sample various data related to emotion estimation, such as biosignals, from many subjects, create supervised learning data from the sampled data, and train an untrained AI model with it.

However, the level of a biosignal varies greatly between individuals and between environments, which makes biosignals poorly suited to training. In other words, supervised learning data could be generated in which the biosignals differ greatly because of individual and environmental differences even though the correct answer data is the same.

In view of the above problem, an object of the present invention is to provide a technique that can form an AI model in which the influence of individual and environmental differences in biosignals is reduced, and that can thereby improve the accuracy of emotion estimation.

An exemplary learning method for emotion estimation AI according to the present invention acquires a subject's biosignals and appearance information at the same timing, calculates an emotion index value from the acquired biosignals, normalizes the calculated emotion index value to obtain a normalized emotion index value, generates supervised learning data in which the acquired appearance information is the input value and the normalized emotion index value is the correct answer value, and trains an AI model with the generated supervised learning data.

According to the present invention, an AI model is trained using, as the correct answer value, an emotion index value obtained by normalizing a raw emotion index value that varies greatly between individuals and environments. The AI model is therefore trained in a way that accounts for such differences, and a highly accurate trained AI model is generated in which their adverse effects are suppressed. As a result, the accuracy of emotion estimation using this AI model can be expected to improve.
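The data-generation step summarized above can be sketched as follows. This is only an illustration: the normalization used here is a per-subject min-max rescaling, which is an assumption made for the example and not the normalization method of the embodiment (which the specification describes later). The function names `normalize_per_subject` and `build_training_pairs` are hypothetical.

```python
def normalize_per_subject(values):
    """Rescale one subject's emotion index values to [0, 1].

    Assumption: min-max scaling per subject. It stands in for the
    normalization step only to show where that step sits in the flow.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series carries no range information; map it to mid-scale.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def build_training_pairs(appearance_samples, index_values):
    """Pair each appearance sample with the normalized index value
    calculated from the biosignal acquired at the same timing."""
    if len(appearance_samples) != len(index_values):
        raise ValueError("samples and index values must be time-aligned")
    labels = normalize_per_subject(index_values)
    return list(zip(appearance_samples, labels))
```

With this stand-in, two subjects whose raw index values differ only in scale receive identical labels, which is the effect the normalization step is meant to achieve.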

FIG. 1: A diagram showing the conceptual configuration of an emotion estimation device.
FIG. 2: A diagram showing an example of a multi-dimensional (two-dimensional) model (psychological plane) for emotion estimation.
FIG. 3: A diagram showing an example of a psychological plane including neutral regions.
FIG. 4: A block diagram showing the configuration of a vehicle control system.
FIG. 5: A flowchart showing the emotion estimation process executed by the controller of the emotion estimation device of FIG. 4.
FIG. 6: An explanatory diagram showing an example of a learning method for emotion estimation AI.
FIG. 7: A diagram showing the configuration of an example of a learning device.
FIG. 8: A schematic diagram showing changes over time in the emotion index values (arousal) of two subjects.
FIG. 9: A diagram showing an example of a task table.
FIG. 10: A schematic diagram showing changes over time in arousal during execution of a task for estimating the neutral region of arousal.
FIG. 11: A schematic diagram showing changes over time in an emotion index value (arousal).
FIG. 12: A diagram showing how the estimated upper limit of the neutral region converges over the data accumulation period.
FIG. 13: A diagram showing an example of a neutral region table.
FIG. 14: A schematic diagram showing changes over time in the emotion index values (arousal) of two subjects.
FIG. 15: A flowchart showing the emotion estimation AI learning process executed by the controller of the learning device of FIG. 7.
FIG. 16: A diagram showing the conceptual configuration of an emotion estimation device.
FIG. 17: A diagram showing the conceptual configuration of a learning device that trains the AI model of the emotion estimation device.
FIG. 18: A diagram showing the conceptual configuration of an emotion estimation device.
FIG. 19: A diagram showing the conceptual configuration of a learning device that trains the AI model of the emotion estimation device.

Exemplary embodiments of the present invention are described in detail below with reference to the drawings. The present invention is not limited to the embodiments described below.

<1. Emotion estimation device>
First, the emotion estimation device will be described. FIG. 1 is a diagram showing the conceptual configuration of the emotion estimation device 20.

The emotion estimation model 222m of the emotion estimation device 20 estimates emotion from two emotion index values, which are indices of the mental and physical state related to emotion. One of the emotion index values used in this embodiment is the arousal level of the central nervous system (hereinafter, arousal), which can be calculated as the ratio of β waves to α waves in the brain waves. The other is the activity level of the autonomic nervous system (hereinafter, activity), which can be calculated as the standard deviation of the heartbeat LF (Low Frequency) component, that is, the low-frequency component of the heartbeat waveform signal.
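As a rough illustration of the two index calculations, the sketch below computes a β/α power ratio with a naive discrete Fourier transform and a standard deviation of an LF series. The band edges (α: 8 to 13 Hz, β: 13 to 30 Hz), the use of spectral power for the ratio, the use of the population standard deviation, and the assumption that the LF component has already been extracted are all choices made for this example; the source specifies none of them. The function names are hypothetical.

```python
import cmath
import math
import statistics


def band_power(samples, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes over bins with f_lo <= f < f_hi.

    Naive O(n^2) DFT; adequate for short illustrative windows only.
    """
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(samples))
            power += abs(coeff) ** 2
    return power


def arousal_index(eeg, fs, alpha=(8.0, 13.0), beta=(13.0, 30.0)):
    """Beta/alpha power ratio. Band edges are assumed, not from the source.

    Raises ZeroDivisionError if the alpha band contains no power.
    """
    return band_power(eeg, fs, *beta) / band_power(eeg, fs, *alpha)


def activity_index(lf_component):
    """Standard deviation of an already-extracted heartbeat LF component."""
    return statistics.pstdev(lf_component)
```

A production implementation would more likely use an FFT library and a windowed spectral estimate; the naive transform is used here only to keep the example free of third-party dependencies.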

The emotion estimation model 222m is a model (a calculation formula or a conversion data table) for estimating emotion from arousal and activity. It is a multi-dimensional model that estimates emotion using arousal and activity as parameters; here it is a two-dimensional model with arousal and activity as its two axes. The two-dimensional model is created from medical evidence (papers and the like) on the relationship between each index and emotion, that is, between arousal, activity, and emotion. Alternatively, it is created from questionnaire results collected from many subjects, namely data consisting of the emotions the subjects reported and their arousal and activity at that time, calculated from electroencephalogram and heart rate measurements. The emotion estimation model 222m is not limited to a two-dimensional model and may be a multi-dimensional model with three or more dimensions.

FIG. 2 is a diagram showing an example of a multi-dimensional (two-dimensional) model (psychological plane) for emotion estimation. According to medical evidence in psychology, a psychological state can be estimated from two types of indices that represent the physical state. In the psychological plane shown in FIG. 2, the vertical axis is arousal (aroused to unaroused) and the horizontal axis is the activity of the autonomic nervous system (sympathetic nervous activity, i.e. strong emotion, to parasympathetic nervous activity, i.e. weak emotion).

In this psychological plane, a psychological state is assigned to each of the four quadrants separated by the vertical and horizontal axes, and the distance from each axis indicates the intensity of that state. The first quadrant is assigned the states "fun, joy, anger, sadness"; the second quadrant, "melancholy"; the third quadrant, "relaxation, calm"; and the fourth quadrant, "anxiety, fear, discomfort".

The positions of the axes are set as appropriate, for example from experiments in which the arousal and activity of subjects are measured and processed statistically. Possible methods include taking the center of the neutral region determined by the neutral region determination method described later (see FIGS. 10 and 11) as the axis, or, in the normalization method described later (see FIG. 12), measuring the range of variation of the normalized emotion index value for many subjects and taking the average of the medians of those measured ranges as the axis.

The psychological state can then be estimated from the coordinates obtained by plotting the two index values of the mental and physical state (arousal and activity), obtained from biosignals, on the psychological plane. Specifically, the psychological state and its intensity can be estimated from which quadrant of the psychological plane the plotted coordinates fall in, where they lie within that quadrant, and how far they are from the origin. The emotion estimation model shown in FIG. 2 is a two-dimensional plane, but it becomes a multi-dimensional space of three or more dimensions depending on the number of indices used.
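The plotting step can be sketched as a small lookup. The quadrant labels come from the description of FIG. 2; everything else is an assumption for illustration: that positive activity points toward the sympathetic side, that positive arousal points toward the aroused side, that points lying exactly on an axis fall into the quadrant on the non-negative side, and that intensity is taken as the Euclidean distance from the origin. The names `QUADRANT_STATES` and `classify` are hypothetical.

```python
import math

# Psychological states per quadrant, as listed for the plane of FIG. 2.
QUADRANT_STATES = {
    1: ("fun", "joy", "anger", "sadness"),
    2: ("melancholy",),
    3: ("relaxation", "calm"),
    4: ("anxiety", "fear", "discomfort"),
}


def classify(activity, arousal, origin=(0.0, 0.0)):
    """Return (quadrant, candidate states, intensity) for one plotted point.

    `origin` is the intersection of the two axes; its placement is an
    experimental design choice and defaults to (0, 0) here.
    """
    x = activity - origin[0]   # horizontal axis: activity
    y = arousal - origin[1]    # vertical axis: arousal
    if x >= 0 and y >= 0:
        quadrant = 1
    elif x < 0 and y >= 0:
        quadrant = 2
    elif x < 0 and y < 0:
        quadrant = 3
    else:
        quadrant = 4
    return quadrant, QUADRANT_STATES[quadrant], math.hypot(x, y)
```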

Estimating the intensity of an emotion is difficult, involves relatively large errors, and has limited uses. In practice, therefore, the common approach is to determine only the type of emotion from the quadrant of the psychological plane in which the emotion index values fall, and to use that emotion type.

When the emotion is strong, that is, when the emotion index value swings far toward its maximum or minimum, emotion estimation accuracy is high. When the emotion is weak, that is, when the emotion index value is near its median, accuracy is low. One conceivable approach is therefore to treat the area around the median of the emotion index value as a neutral region and to return a result such as "no estimated emotion" or "emotion cannot be estimated" for values in it.

FIG. 3 is a diagram showing an example of a psychological plane that includes neutral regions. In FIG. 3, the hatched regions NR1 and NR2 are the neutral regions.

For the "arousal" emotion index, the upper and lower limits of the neutral region are YP and YN, and the region between the upper limit YP and the lower limit YN is the arousal neutral region NR1. For the "activity" emotion index, the upper and lower limits of the neutral region are XP and XN, and the region between the upper limit XP and the lower limit XN is the activity neutral region NR2.
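The two neutral regions reduce to a pair of interval checks. The sketch below uses the limit names YP, YN, XP, and XN from FIG. 3; the inclusive boundaries and the rule in `in_any_neutral` (treating a point as neutral when it falls in either band) are assumptions made for this example, since the text here does not state how the two bands are combined. The class name `NeutralRegions` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeutralRegions:
    """Limits of the neutral regions NR1 (arousal) and NR2 (activity)."""
    yn: float  # lower limit of arousal neutral region NR1
    yp: float  # upper limit of arousal neutral region NR1
    xn: float  # lower limit of activity neutral region NR2
    xp: float  # upper limit of activity neutral region NR2

    def arousal_is_neutral(self, arousal):
        return self.yn <= arousal <= self.yp

    def activity_is_neutral(self, activity):
        return self.xn <= activity <= self.xp

    def in_any_neutral(self, activity, arousal):
        # Assumed combination rule: neutral if either index is in its band.
        return (self.arousal_is_neutral(arousal)
                or self.activity_is_neutral(activity))
```

A caller could use `in_any_neutral` to return "no estimated emotion" before attempting a quadrant lookup.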

These neutral regions (upper limit YP, lower limit YN, upper limit XP, and lower limit XN) can be set as appropriate through experiments or the like. Because suitable values are calculated when the first AI model 121mA and the second AI model 121mB are trained, those calculated values can also be used. A specific example of how the neutral regions are determined is described later (see FIGS. 10 and 11).

The emotion estimation device 20 estimates emotion by applying the input emotion indices, arousal and activity, to the emotion estimation model 222m represented by such a psychological plane, that is, by plotting them as coordinates.

Because the emotion estimation model 222m of the emotion estimation device 20 uses arousal and activity, which are emotion indices calculated from electroencephalogram data and heartbeat data, it would ordinarily depend on electroencephalogram and heartbeat data as input. In this embodiment, however, the input is not electroencephalogram and heartbeat data detected by an electroencephalogram sensor and a heartbeat sensor, but the arousal and activity estimated by AI models from gaze, face direction, and voice. This saves the trouble of attaching contact sensors (an electroencephalogram sensor and a heartbeat sensor) to the person whose emotion is estimated: gaze, face direction, and voice data are collected with non-contact sensors such as a camera and a microphone, and AI models estimate from them the index values that would otherwise be calculated from electroencephalogram and heartbeat data. For example, when the emotion of a vehicle driver is estimated and used for driving control of the vehicle, attaching contact sensors to the driver is difficult in practice, so this technique is particularly useful.

In other words, calculating emotion indices normally requires biosignals, but most biosignals are obtained with contact-type sensors, which restricts the applications. It is therefore desirable to use sensors that detect information based on the appearance of the person whose emotion is estimated (the subject), which non-contact sensors can detect. Appearance information here includes not only image data but also other information, such as voice, that can be detected by non-contact sensors (remote detection sensors such as cameras and microphones).

In the usage example of the emotion estimation device 20 shown in FIG. 1, the user U2 is the driver of the vehicle V2. The first AI model 121mA and the second AI model 121mB used in this estimation method are used to estimate the emotion of the driver (user U2). The first AI model 121mA and the second AI model 121mB are trained separately, provided as trained models, and installed in the emotion estimation device 20. Details of the emotion estimation device 20 are described later.

The emotion estimation device 20 can be implemented by a computer device mounted on the vehicle V2 or by a server connected to the vehicle V2 via a network. The server may be a physical server or a virtual server.

The vehicle V2 is equipped with a camera C and a microphone M as on-board sensors. Gaze data and face direction data of the user U2 (driver), derived from the image captured by the camera C, and voice data of the user U2 (driver), acquired by the microphone M, are input to the first AI model 121mA and the second AI model 121mB.

In the illustrated example, the image captured by the camera C is processed by image recognition or the like into gaze data (an image of the eye region, or text or numerical data indicating the gaze direction) and face direction data (an image of the face, or text or numerical data indicating the face orientation), which are then input to the first AI model 121mA and the second AI model 121mB; however, the captured image itself may be input instead. The emotion estimation device 20 is equipped with a first AI model 121mA and a second AI model 121mB that match the input data type, that is, AI models designed and trained for that input data type.

The gaze data and face direction data may be still-image data captured at the emotion estimation timing. Because the movement of the gaze and face matters for emotion estimation, however, video data covering a period tied to the emotion estimation timing is preferable, for example video of a predetermined length immediately before the emotion estimation timing. Likewise, the voice data is preferably a recording of a predetermined length immediately before the emotion estimation timing. For example, when the emotion estimation interval is 1 second, at the emotion estimation timing Ta, the gaze data and face direction data based on video of the face from time Ta - 10 seconds (start point) to time Ta (end point), together with the voice data for the same period, are input to the first AI model 121mA and the second AI model 121mB. The first AI model 121mA and the second AI model 121mB are trained with learning input data in this same form.
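The windowing in this example, a 10-second look-back evaluated once per second, can be sketched as below. The representation of the buffered data as `(timestamp, payload)` tuples, the inclusive window boundaries, and the function name `window_for_estimate` are assumptions for illustration.

```python
def window_for_estimate(frames, t_a, window_s=10.0):
    """Select the payloads whose timestamps lie in [t_a - window_s, t_a].

    `frames` is an iterable of (timestamp_seconds, payload) tuples, for
    example per-frame gaze or face direction data, or audio chunks.
    """
    start = t_a - window_s
    return [payload for ts, payload in frames if start <= ts <= t_a]
```

With an estimation interval of 1 second, consecutive calls at Ta and Ta + 1 share roughly 9 seconds of data, so each estimate sees mostly the same motion history as the previous one.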

The first AI model 121mA is an AI model that takes the gaze data, face direction data, and voice data of the user U2 as input and outputs arousal, one of the emotion indices; its structure is designed to suit this input and output. The first AI model 121mA is trained with a data set of many learning samples in which gaze data, face direction data, and voice data are the input data and the arousal corresponding to that input is the correct answer data.

The second AI model 121mB is an AI model that takes the gaze data, face direction data, and voice data of the user U2 as input and outputs activity, the other emotion index; its structure is designed to suit this input and output. The second AI model 121mB is trained with a data set of many learning samples in which gaze data, face direction data, and voice data are the input data and the activity corresponding to that input is the correct answer data. The training methods for the first AI model 121mA and the second AI model 121mB are described in detail later.

The first AI model 121mA and the second AI model 121mB output the two emotion indices they estimate from the gaze data, face direction data, and voice data of the user U2, namely arousal and activity, to the emotion estimation model 222m. The emotion estimation model 222m estimates the emotion from the input arousal and activity.

<2. Vehicle control system>
Next, a vehicle control system that controls a vehicle using the estimated emotion data will be described with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the vehicle control system 40. FIG. 4 shows the components needed to explain the features of this embodiment and omits general components.

As shown in FIG. 4, the vehicle control system 40 includes the emotion estimation device 20, a vehicle control device 30, an actuator unit 41, and an alarm unit 42. Although not illustrated, the vehicle control system 40 also includes input devices such as a keyboard and a touch panel, and output devices such as a display.

<2-1. Emotion estimation device>
The emotion estimation device 20 includes a communication unit 21, a storage unit 22, and a controller 23. Because control generally requires a quick response, the emotion estimation device 20 is mounted on the vehicle V2 as in this example, but it can also be implemented as a server connected to the vehicle V2 via a network.

The storage unit 22 is provided with a first AI model storage unit 221A and a second AI model storage unit 221B, which store the trained first AI model 121mA and the trained second AI model 121mB shown in FIG. 1, respectively.

The first AI model 121mA and the second AI model 121mB are stored in the first AI model storage unit 221A and the second AI model storage unit 221B in, for example, one of the following ways.

The manufacturer or other builder of the vehicle control system 40 installs, when assembling the system or at a similar time, memory in which the trained first AI model 121mA and second AI model 121mB have been written, as the first AI model storage unit 221A and the second AI model storage unit 221B. Alternatively, the manufacturer connects an external storage device holding the trained first AI model 121mA and second AI model 121mB to the vehicle control system 40 (emotion estimation device 20), reads the models, and writes them to the first AI model storage unit 221A and the second AI model storage unit 221B. Alternatively, the manufacturer receives the trained first AI model 121mA and second AI model 121mB from the learning device 10 via a network and writes them to the first AI model storage unit 221A and the second AI model storage unit 221B.

The storage unit 22 is also provided with an emotion estimation model storage unit 222 that stores the emotion estimation model 222m shown in FIG. 1. The emotion estimation model storage unit 222 stores the various data needed for emotion estimation, for example a psychological plane table 224, which is the data used to estimate emotion from the emotion index values as shown in FIG. 2 or FIG. 3.

The psychological plane table 224 is the data that forms the emotion estimation model (psychological plane) shown in FIG. 2 or FIG. 3: a group of data that associates the two types of emotion index values (arousal and activity) with the corresponding emotion information. Like the first AI model 121mA and the second AI model 121mB, the emotion estimation model (the various data that make it up) is stored in the storage unit 22 (emotion estimation model storage unit 222) by the same kind of method when the emotion estimation device 20 is designed or assembled.

The controller 23 controls the various operations of the emotion estimation device 20. It includes a processor that performs arithmetic processing and is implemented by, for example, a CPU. As its functions, the controller 23 includes an acquisition unit 231, a behavior determination unit 232, an emotion estimation unit 233, and a provision unit 234. These functions are realized by the processor executing arithmetic processing according to a program stored in the storage unit 22.

The acquisition unit 231 acquires, via the communication unit 21, various information (image information and audio information) about the user U2, the person whose emotion is estimated, captured by the camera C and picked up by the microphone M. The acquisition unit 231 stores the acquired information, as needed for subsequent processing, in a data table formed in the storage unit 22. The acquisition unit 231 acquires the pieces of information that belong to substantially the same time and stores them in the data table as one data set. These data are then used to estimate the behavior state and emotion at that time.

The behavior determination unit 232 analyzes the image information and audio information of the user U2 acquired by the acquisition unit 231 to determine predetermined types of behavior of the user U2, such as gaze, face direction, facial expression, and voice. The predetermined types of behavior are those corresponding to the data types that the first AI model 121mA and the second AI model 121mB take as input; in this embodiment they are, specifically, gaze, face direction, and voice (content).

The behavior determination unit 232 therefore performs various processes that match the input specifications of the first AI model 121mA and the second AI model 121mB, which are determined by their design conditions, training conditions, and so on. These processes include, for example, extracting the images (still images or video) and audio data for the period to be processed, cropping the face and eye regions of the user U2 from the image captured by the camera C and detecting their orientation, and frequency analysis of the output signal of the microphone M. In extracting the audio data, for example, the data for the 10 seconds immediately before the emotion estimation timing is extracted. The length of this period may be changed according to the data type.

The first AI model 121mA and the second AI model 121mB output appropriate estimates only for input data of the same type (content and format) as the input data used in training. The type (content and format) of the data output by the behavior determination unit 232 must therefore be the same as that of the input data used when the first AI model 121mA and the second AI model 121mB were trained, and the behavior determination unit 232 processes the data accordingly. The programs and data that implement these operations of the behavior determination unit 232 are stored in the storage unit 22.

The emotion estimation unit 233 estimates the emotion of the user U2 based on the behavior data determined by the behavior determination unit 232. Specifically, the behavior data is input to the first AI model 121mA and the second AI model 121mB, which then output the emotion index values (arousal level and activity level) of the user U2 estimated from the behavior information relating to the line of sight, face orientation, facial expression, voice, and the like of the user U2. The estimated emotion index values output by the first AI model 121mA and the second AI model 121mB are input to the emotion estimation model 222m, which estimates and outputs the emotion of the user U2 based on those values.

Note that the emotion may be estimated using data obtained by applying statistical processing (averaging, low-pass filtering, and the like) to multiple samples, taken over a suitable period, of each kind of data used for the emotion estimation described above (camera images, microphone audio, behavior information, estimated emotion index values, and the like). In that case, the statistical processing is performed by the acquisition unit 231, the behavior determination unit 232, or the emotion estimation unit 233, as appropriate for the type of data being processed.
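As an illustration of the two kinds of statistical processing named above, the following sketch applies a moving average and a first-order low-pass filter to a series of numeric values such as estimated emotion index values. The function names, the window length, and the smoothing factor are assumptions made for this example; the description does not specify a particular filter.

```python
def moving_average(values, window):
    """Average of the most recent `window` values at each step (fewer at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def low_pass(values, alpha):
    """First-order low-pass filter (exponential smoothing), with 0 < alpha <= 1.

    A smaller alpha suppresses rapid fluctuations more strongly.
    """
    out = []
    state = None
    for v in values:
        state = v if state is None else alpha * v + (1 - alpha) * state
        out.append(state)
    return out


averaged = moving_average([1, 2, 3, 4], 2)
smoothed = low_pass([0, 1, 1], 0.5)
```

Either function could be applied at whichever stage holds the data in question (raw sensor features, behavior data, or index values), in line with the statement that the processing unit is chosen according to the data type.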

The providing unit 234 provides the emotion of the user U2 estimated by the emotion estimation unit 233 to the vehicle control device 30. This allows the vehicle control device 30 to control the vehicle V2 based on the emotion of the user U2.

<2-2. Vehicle control device and other components>
The vehicle control device 30 is, for example, an ECU (Electronic Control Unit) for vehicle control and is mounted on the vehicle V2. The vehicle control device 30 includes a controller 31, a storage unit 32, and a communication unit 33.

The controller 31 performs various kinds of vehicle control, such as control of the direction and speed of the vehicle V2, based on driving operations by the user U2, who is the driver of the vehicle V2, and on information from various connected sensors (not shown), such as a vehicle speed sensor, an air-fuel ratio sensor, and a steering angle sensor. The controller 31 also receives information on the emotion of the user U2 (the estimated emotion) from the emotion estimation device 20 via the communication unit 33, and uses the received estimated emotion information in vehicle control, for example in controlling speed, accelerator sensitivity, and brake sensitivity. Specifically, when the driver is excited, the controller 31 performs driving control biased toward safety, that is, control that limits the influence of the driver's excitement: it lowers the accelerator sensitivity (making the vehicle harder to accelerate), raises the brake sensitivity (making the vehicle easier to stop), and lowers the upper limit used in maximum speed control.
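The safety-biased adjustment described above can be sketched as a function from baseline driving parameters and an estimated emotion to adjusted parameters. Everything concrete here is hypothetical: the `DriveParams` fields, the emotion label `"excited"`, and the scaling factors and speed cap are placeholders chosen for illustration, since the description gives only the direction of each adjustment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DriveParams:
    accel_gain: float     # accelerator sensitivity (1.0 = normal)
    brake_gain: float     # brake sensitivity (1.0 = normal)
    max_speed_kmh: float  # upper limit used in maximum speed control


def adjust_for_emotion(base, emotion):
    """Return parameters biased toward safety when the driver is excited."""
    if emotion == "excited":  # hypothetical label
        return replace(
            base,
            accel_gain=base.accel_gain * 0.7,               # harder to accelerate
            brake_gain=base.brake_gain * 1.2,               # easier to stop
            max_speed_kmh=min(base.max_speed_kmh, 80.0),    # lower speed ceiling
        )
    return base


base = DriveParams(accel_gain=1.0, brake_gain=1.0, max_speed_kmh=120.0)
adjusted = adjust_for_emotion(base, "excited")
```

Keeping the baseline immutable and returning a new object means the normal parameters are restored simply by calling the function again once the estimated emotion changes.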

The controller 31 also reports the estimated emotion information itself and issues notifications according to it. Specifically, when the driver is excited, the controller 31 presents a display or voice guidance saying "Let's calm down" and controls the voice used for the various voice guidance so that it has a calm manner of speaking and expression.

Like the storage unit 12, the storage unit 32 is composed of various memories and stores data used by the controller 31 in its processing, such as programs, coefficient data for processing, and data temporarily held during processing. The storage unit 32 is also provided with multiple data tables for the various processes, in which the emotion-related information received from the emotion estimation device 20 is associated with control signals for the vehicle V2 and with notification information for the user U2 in the vehicle cabin.

The communication unit 33 is an interface for data communication with the emotion estimation device 20, the actuator unit 41, and the notification unit 42, and is, for example, a CAN (Controller Area Network) interface.

The actuator unit 41 consists of various drive components, such as motors, that realize the various operations of the vehicle, and its operation is driven and controlled by the vehicle control device 30. Specifically, the actuator unit 41 includes, for example, the engine and motor that generate driving force in the vehicle V2, a steering actuator that drives the steering of the vehicle V2, and a brake actuator that drives the brakes of the vehicle V2.

The notification unit 42 conveys notification information from the vehicle control device 30 to the user U2 and others in the vehicle cabin by visual, auditory, or other means. Specifically, the notification unit 42 is composed of, for example, a liquid crystal display that conveys notification information to the user U2 by text, images, video, and the like, and a speaker that conveys it by voice, warning sounds, and the like.

The vehicle control device 30 generates and outputs control signals for the vehicle V2 based on information from the various sensors and on the emotion of the user U2 provided by the emotion estimation device 20. Specifically, for example, when the user U2 is emotionally agitated or feels fear, the vehicle control device 30 conveys a message such as "Let's calm down" by display and voice via the display unit and the speaker. The vehicle control device 30 also controls the speed of the vehicle V2 based on information from, for example, a vehicle speed sensor and an obstacle sensor (radar or the like); when the user U2 is emotionally agitated, it controls the speed to be lower than when the user is in a normal emotional state.

In other words, the vehicle control system 40 estimates the emotion of the user U2, who is the driver of the vehicle V2, and operates the vehicle V2, for example through driving control, in a way that also takes the estimated emotion into account. With this configuration, the vehicle V2 can be controlled according to the emotion of the user U2. Because the driver's emotion affects safe driving, this embodiment can thereby contribute to the safe driving of the vehicle V2.

<2-3. Processing of the emotion estimation device (controller)>
FIG. 5 is a flowchart showing the emotion estimation processing executed by the controller 23 of the emotion estimation device 20 in FIG. 4. This flowchart shows the technical content of a computer program that causes a computer device to realize the emotion estimation processing. The computer program is stored on various readable non-volatile recording media and provided (sold, distributed, and so on). The computer program may consist of a single program or of multiple cooperating programs.

The processing shown in FIG. 5 is started when the vehicle V2 is started and the various controls based on the estimated emotion begin. It is then executed repeatedly at timings suitable for those controls, that is, whenever emotion estimation is required or throughout the period in which emotion estimation is required.

In step S111, the controller 23 (acquisition unit 231) acquires data indicating the behavior of the user U2, specifically camera images and microphone audio of the user U2, and proceeds to step S112. The acquired information is stored in a data table in the storage unit 22 as necessary.

In step S112, the controller 23 (behavior determination unit 232) determines the behavior of the user U2, specifically line of sight, face orientation, and voice (content), based on the data (camera images and microphone audio) acquired in step S111 by the acquisition unit 231, and proceeds to step S113.

In step S113, the controller 23 (emotion estimation unit 233) inputs the data based on the behavior determination result of step S112 (the result produced by the behavior determination unit 232) to the first AI model 121mA and the second AI model 121mB, causes the first AI model 121mA to estimate the arousal level and the second AI model 121mB to estimate the activity level as emotion index values, and proceeds to step S114.

In step S114, the controller 23 (emotion estimation unit 233) applies (inputs) the two estimated emotion index values (arousal level and activity level) to the emotion estimation model 222m (the data of the psychological plane table 224) to estimate the emotion of the user U2, and proceeds to step S115.

In step S115, the controller 23 (providing unit 234) provides (outputs) the estimated emotion information of the user U2, as estimated by the emotion estimation unit 233, to the vehicle control device 30, and ends the processing.
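Steps S113 and S114 can be sketched as two model calls followed by a table lookup on the two-axis plane. This is only an illustrative sketch under stated assumptions: the quadrant labels, the zero-centered thresholds, and the function names are hypothetical stand-ins and do not reproduce the actual contents of the psychological plane table 224, and the two models are passed in as plain callables.

```python
# Hypothetical stand-in for the psychological plane table 224.
# Key: (arousal >= 0, activity >= 0). Labels are placeholders, not from the source.
PLANE_TABLE = {
    (True, True): "excited",
    (True, False): "tense",
    (False, True): "relaxed",
    (False, False): "drowsy",
}


def lookup_emotion(arousal, activity):
    """Step S114: map the two emotion index values to an emotion label."""
    return PLANE_TABLE[(arousal >= 0.0, activity >= 0.0)]


def estimate_once(behavior, model_a, model_b):
    """Steps S113 and S114 for one set of behavior data.

    model_a : callable standing in for the first AI model (arousal level)
    model_b : callable standing in for the second AI model (activity level)
    """
    arousal = model_a(behavior)
    activity = model_b(behavior)
    return lookup_emotion(arousal, activity)


# Toy models that only make the example executable.
emotion = estimate_once([0.2, 0.1, 0.4], lambda b: sum(b), lambda b: -sum(b))
```

Passing the models as callables keeps the lookup independent of how the trained models are implemented.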

Note that each individual emotion estimate is likely to be adversely affected (misjudged) by noise in the sensor outputs and the like, so it is preferable to use an estimated emotion that has undergone statistical processing, for example by taking the average or the mode over a predetermined period. For this reason, the controller 23 (providing unit 234) preferably adds a timestamp to each estimated emotion, accumulates the results, and provides the statistically processed estimated emotion to the vehicle control device 30 (step S115). Alternatively, the vehicle control device 30 may accumulate the estimated emotion information, apply the statistical processing itself, and use the result for control.
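A minimal sketch of the timestamped accumulation and mode selection described above follows. The class name, the period length, and the emotion labels are assumptions for illustration; the description states only that a statistic such as the average or mode over a predetermined period is preferred.

```python
from collections import Counter, deque


class EmotionSmoother:
    """Keep timestamped estimates and report the most frequent one in a period."""

    def __init__(self, period_s):
        self.period_s = period_s
        self._buf = deque()  # (timestamp_s, emotion), oldest first

    def add(self, timestamp_s, emotion):
        self._buf.append((timestamp_s, emotion))
        # Drop estimates older than the predetermined period.
        while self._buf and self._buf[0][0] < timestamp_s - self.period_s:
            self._buf.popleft()

    def mode(self):
        """Most frequent emotion in the buffer, or None if the buffer is empty."""
        if not self._buf:
            return None
        return Counter(e for _, e in self._buf).most_common(1)[0][0]


smoother = EmotionSmoother(period_s=5.0)
for t, e in [(0.0, "calm"), (1.0, "excited"), (2.0, "calm")]:
    smoother.add(t, e)
recent = smoother.mode()
```

A single outlying estimate ("excited" at 1.0 s) does not change the reported emotion, which is the intended effect of the statistical processing.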

<3. Learning method for emotion estimation AI>
Next, the learning method for the first AI model 121ma and the second AI model 121mb will be described. To make the models before and after learning easy to tell apart, the pre-learning models are denoted as the first AI model 121ma and the second AI model 121mb, and the post-learning models as the first AI model 121mA and the second AI model 121mB, so that the final character of the reference sign indicates whether the model has been trained. FIG. 6 is an explanatory diagram showing an example of the learning method for the emotion estimation AI.

In this embodiment, the user U1 is a subject for generating learning data; members of the development team of the emotion estimation device 20 and the like serve as subjects. For a personal emotion estimation device 20 whose user is limited, the subject is preferably the user; however, since other people still show similar characteristics, the subjects may include both the user and other people. For a general-purpose emotion estimation device 20 whose users are not limited, the subjects are the user and other people, and generating learning data from multiple subjects yields learning data from subjects with varied characteristics, so a highly versatile emotion estimation device 20 can be expected. Since this embodiment assumes an in-vehicle emotion estimation device 20, learning is performed with an in-vehicle device, whose environment is similar, and the subject user U1 is the driver of the vehicle V1.

The information appropriate as learning data for an AI model depends on the intended use of the model. For example, for an AI-applied device intended for a specific driver, information on that driver is appropriate, whereas for an AI-applied device intended for various drivers, information on a wide variety of drivers is appropriate. Likewise, for an AI-applied device intended for drivers, information on drivers is appropriate, and for an AI-applied device intended for vehicle occupants other than the driver, information on such occupants is appropriate. A large amount of learning data is required to raise the accuracy of emotion estimation by the AI model.

In this embodiment, the AI-applied device is assumed to be applicable to drivers in general (ordinary drivers rather than a specific individual). Because the device controls the vehicle based on the emotions of such drivers, the experimental test drives for AI learning are preferably carried out in various patterns by multiple subjects of various types. Since the learning is the same for each subject, the learning for a single subject (driver) is described below.

As further examples, when the AI-applied device is a medical examination device in a medical institution, suitable subjects (users U1 from whom information is collected) are patients, doctors, and the like at the medical institution. When the AI-applied device is an educational guidance device in an educational institution, suitable subjects are students, teachers, and the like. When the AI-applied device is an e-sports-related device or an entertainment-content-related device, suitable subjects are e-sports players, content viewers, and the like.

The learning device 10 may be mounted on the vehicle V1 to perform learning, or it may be installed somewhere other than the vehicle V1, such as a research and development room. In the latter case, the biometric and other data of the subject user U1 riding in the vehicle V1 (measurement data such as brain waves, heart rate, images, and voice) are input to the learning device 10 via a recording medium or a communication line. This embodiment describes an example in which the learning device 10 is mounted on the vehicle V1 to perform learning.

A storage area for AI models is provided in the storage device (memory or the like) of the learning device 10 mounted on the vehicle V1, and the pre-learning first AI model 121ma and second AI model 121mb are stored there. When the learning device 10 executes the learning processing described below, the pre-learning AI models are trained, producing the trained AI models, that is, the first AI model 121mA and the second AI model 121mB to be installed in the AI-applied device.

The trained first AI model 121mA and second AI model 121mB are used in the emotion estimation device 20 shown in FIGS. 1 and 4 described above. Accordingly, in the learning data for the first AI model 121ma, the input data is behavior information of the user U1 such as line of sight, face orientation, facial expression, and voice (data of the same type and format as the input data of the first AI model 121mA in the emotion estimation device 20), and the correct-answer data is the arousal level among the emotion index values. Similarly, in the learning data for the second AI model 121mb, the input data is behavior information of the user U1 such as line of sight, face orientation, facial expression, and voice (data of the same type and format as the input data of the second AI model 121mB in the emotion estimation device 20), and the correct-answer data is the activity level among the emotion index values.

For this reason, the vehicle V1 on which the learning device 10 is mounted is provided with various in-vehicle sensors that output the data used by (input to) the emotion estimation device 20. In this embodiment, the in-vehicle sensors specifically include a camera C and a microphone M. The camera C outputs captured image information showing the face of the driver U1 so that data on the driver's line of sight and face orientation can be obtained. The microphone M detects voice information such as sounds uttered by the driver U1. The camera C and the microphone M are installed, for example, near the windshield or dashboard of the vehicle V1 so that they capture images and collect sound in the direction of the driver U1.

The outputs of the camera C and the microphone M are converted into behavior data by a behavior determination unit 134, which has the same configuration as the behavior determination unit 232 in FIG. 4, yielding behavior data on line of sight, face orientation, and voice. This behavior data is then supplied as input values to the first AI model 121ma and the second AI model 121mb of the learning device.

Biosensors are attached to the driver U1. In this embodiment, the biosensors are an electroencephalogram (EEG) sensor S11 that detects brain waves and a heart rate sensor S12 that detects heartbeats. The EEG sensor S11 is, for example, a headgear-type EEG sensor. The heart rate sensor S12 is, for example, a chest-belt-type electrocardiographic heart rate sensor. Other sensors may be added or substituted depending on the biological information to be acquired, ease of wearing, and the like. Other possible biosensors include, for example, an optical heart rate (pulse) sensor, a blood pressure monitor, and an NIRS (Near Infrared Spectroscopy) device.

The brain wave data output by the EEG sensor S11 and the heartbeat data output by the heart rate sensor S12 are converted into the emotion index values, namely the arousal level and the activity level. Specifically, as in the calculation method described earlier, the arousal level is calculated as the ratio of the β wave to the α wave of the brain waves (β/α), and the activity level is calculated as the standard deviation of the heartbeat LF (Low Frequency) component, that is, the low-frequency component of the heartbeat waveform signal.
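The two index calculations can be sketched as below, assuming the band extraction has already been done upstream. The function names and inputs are assumptions for this example: `beta_power` and `alpha_power` stand for whatever β and α band measures the EEG processing provides, and `lf_component` is the already-filtered LF series. The description does not say whether the population or sample standard deviation is meant; the population form is used here.

```python
import statistics


def arousal_index(beta_power, alpha_power):
    """Arousal level as the ratio of the β-wave measure to the α-wave measure."""
    if alpha_power <= 0:
        raise ValueError("alpha_power must be positive")
    return beta_power / alpha_power


def activity_index(lf_component):
    """Activity level as the standard deviation of the heartbeat LF component.

    Population standard deviation is an assumption of this sketch.
    """
    return statistics.pstdev(lf_component)


arousal = arousal_index(beta_power=6.0, alpha_power=3.0)
activity = activity_index([0.0, 2.0])
```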

Each emotion index value is further normalized in order to reduce the influence of individual differences, environmental differences, and the like. The normalized arousal level is input to the first AI model 121ma as the correct-answer value, and the normalized activity level is input to the second AI model 121mb as the correct-answer value.
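To show why normalization reduces individual differences, the following sketch rescales one subject's index values relative to that subject's own mean and spread. This per-subject z-score is a stand-in chosen for illustration only; it is an assumption and not the normalization procedure of this document, which is not specified in this passage.

```python
import statistics


def normalize_per_subject(values):
    """Z-score one subject's index values against that subject's own statistics.

    Illustrative stand-in only: two subjects with different resting levels are
    mapped onto a common scale, so the same value means the same relative state.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # no variation: treat every sample as baseline
    return [(v - mean) / sd for v in values]


subject_a = normalize_per_subject([1.0, 3.0])    # low absolute values
subject_b = normalize_per_subject([10.0, 30.0])  # high absolute values
```

Although the two subjects' raw values differ by a factor of ten, their normalized values coincide, which is the property a correct-answer value needs if one model is to be trained on data from many subjects.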

In other words, the first AI model 121ma is trained on a supervised learning data set in which the input values (examples) are the behavior data on line of sight, face orientation, and voice, and the correct-answer value is the normalized arousal level. The trained first AI model 121mA is thus generated as an AI model that, as shown in FIG. 1, takes the behavior data on line of sight, face orientation, and voice as input and outputs the arousal level.

Likewise, the second AI model 121mb is trained on a supervised learning data set in which the input values (examples) are the behavior data on line of sight, face orientation, and voice, and the correct-answer value is the normalized activity level. The trained second AI model 121mB is thus generated as an AI model that, as shown in FIG. 1, takes the behavior data on line of sight, face orientation, and voice as input and outputs the activity level.

Next, the flow of the learning method (the AI learning process) for the first AI model 121ma and the second AI model 121mb will be described with reference to FIG. 6. Practical tasks such as attaching the sensors are carried out by the operator in charge of training the AI models, the subject (driver U1), and others.

The storage device (memory or the like) of the learning device 10 stores the first AI model 121ma (before learning), which estimates the arousal level from line of sight, face orientation, and voice information, and the second AI model 121mb (before learning), which estimates the activity level from the same kinds of information. The pre-learning first AI model 121ma and second AI model 121mb are created separately in advance by AI designers and developers, using a computer for AI design, in accordance with the specifications of the AI models (form of use, required performance, and the like).

The driver U1 gets into the vehicle equipped with the learning device 10 and adjusts the position, orientation, sensitivity, and so on of the camera C and the microphone M mounted on the vehicle. The driver U1 also puts on the EEG sensor S11 and the heart rate sensor S12. Once these preparations are complete, the driver U1 or another person starts the learning device 10, and learning begins. If necessary (for example, when vehicle travel is a learning condition), the driver U1 starts driving (operating) the vehicle.

The camera C and the microphone M capture images of the user U1 and the voice the user utters, and output the image information and voice information to the learning device 10.

The EEG sensor S11 and the heart rate sensor S12 detect the brain waves and heartbeats of the user U1 as biosignals and output them to the learning device 10.

The learning device 10 then treats the image and voice information of the user U1 and the brain waves and heartbeats acquired at the same timing as paired data (one item of learning data) in the subsequent processing.
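Pairing data acquired at the same timing can be sketched as nearest-timestamp matching between two sensor streams. The function name, the tuple layout, and the tolerance value are assumptions made for this example; the description does not state how simultaneity is decided.

```python
def pair_by_timestamp(appearance, biosignal, tolerance_s=0.05):
    """Pair each appearance sample with the nearest-in-time biosignal sample.

    appearance, biosignal : lists of (timestamp_s, payload), sorted by timestamp
    tolerance_s           : hypothetical maximum gap still counted as "same timing"
    Samples with no partner within the tolerance are skipped.
    """
    pairs = []
    j = 0
    for t_a, payload_a in appearance:
        # Advance while the next biosignal sample is at least as close to t_a.
        while (j + 1 < len(biosignal)
               and abs(biosignal[j + 1][0] - t_a) <= abs(biosignal[j][0] - t_a)):
            j += 1
        if biosignal and abs(biosignal[j][0] - t_a) <= tolerance_s:
            pairs.append((payload_a, biosignal[j][1]))
    return pairs


appearance = [(0.0, "a0"), (1.0, "a1"), (2.0, "a2")]
biosignal = [(0.01, "b0"), (1.5, "b1"), (2.02, "b2")]
pairs = pair_by_timestamp(appearance, biosignal)
```

Here the sample at 1.0 s has no biosignal within the tolerance and is dropped, so that every item of learning data has both an input and a correct-answer source.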

The learning device 10 converts the input biosignals (brain waves and heartbeats) into the arousal level and the activity level through emotion index conversion. It then normalizes the resulting arousal level and activity level to generate a normalized arousal level and a normalized activity level. The learning device 10 also converts the input image information and voice information of the user U1 into behavior data, generating line of sight, face orientation, and voice information.

The learning device 10 then generates learning data for the first AI model 121ma, with the line of sight, face orientation, and voice information as input values and the normalized arousal level as the correct-answer value. It likewise generates learning data for the second AI model 121mb, with the line of sight, face orientation, and voice information as input values and the normalized activity level as the correct-answer value. The learning device 10 collects the learning data from each timing to form a learning data set for each model, and then trains the first AI model 121ma and the second AI model 121mb with the respective data sets.
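The construction of the two data sets can be sketched as follows: both share the same input values and differ only in the correct-answer value. The record layout and key names are hypothetical and chosen for this example.

```python
def build_datasets(records):
    """Build one supervised data set per model from per-timing records.

    records : iterable of dicts with hypothetical keys
        "behavior"      - feature list (line of sight, face orientation, voice)
        "arousal_norm"  - normalized arousal level
        "activity_norm" - normalized activity level
    Returns (dataset_a, dataset_b) as lists of (input, correct_answer) pairs.
    """
    records = list(records)  # allow generators to be iterated twice
    dataset_a = [(r["behavior"], r["arousal_norm"]) for r in records]
    dataset_b = [(r["behavior"], r["activity_norm"]) for r in records]
    return dataset_a, dataset_b


records = [
    {"behavior": [0.1, 0.2, 0.3], "arousal_norm": 0.5, "activity_norm": -0.2},
    {"behavior": [0.4, 0.0, 0.1], "arousal_norm": -1.0, "activity_norm": 0.7},
]
dataset_a, dataset_b = build_datasets(records)
```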

Specifically, the learning device 10 sequentially inputs the learning data of each generated data set to the first AI model 121ma and the second AI model 121mb. Using a learning algorithm such as error backpropagation, the learning device 10 then performs learning by adjusting parameters such as the weights in the first AI model 121ma and the second AI model 121mb.
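The parameter adjustment can be illustrated with stochastic gradient descent on a single linear unit, swapped in here for a full neural network so the example stays short. In a multi-layer model, error backpropagation would compute the corresponding gradient for every layer by the chain rule; the update rule per parameter has the same form. The function names, learning rate, epoch count, and toy data are assumptions of this sketch.

```python
def train_linear(dataset, epochs=500, lr=0.05):
    """Fit weights w and bias b of pred = w.x + b by stochastic gradient descent.

    dataset : list of (input_vector, correct_answer) pairs
    """
    n_features = len(dataset[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in dataset:
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y  # gradient of 0.5 * err**2 with respect to pred
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def mse(dataset, w, b):
    """Mean squared error of the linear unit on the data set."""
    total = 0.0
    for x, y in dataset:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (pred - y) ** 2
    return total / len(dataset)


toy = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]  # generated by y = 2x + 1
loss_before = mse(toy, [0.0], 0.0)
w, b = train_linear(toy)
loss_after = mse(toy, w, b)
```

The same loop would be run once per model, with the data set whose correct-answer value is the normalized arousal level for the first model and the normalized activity level for the second.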

In this way, the first AI model 121ma and the second AI model 121mb are trained with a large amount of supervised learning data generated from the data detected successively by the sensors, and the trained first AI model 121mA and second AI model 121mB used in the emotion estimation device 20 are generated.

<4. Configuration of the emotion estimation AI learning device>
FIG. 7 is a configuration diagram showing an example of the learning device 10. FIG. 7 shows the components needed to explain the features of this embodiment and omits general components.

As shown in FIG. 7, the learning device 10 includes a communication unit 11, a storage unit 12, and a controller 13. The learning device 10 can be configured as a so-called computer device. Although not shown, the learning device 10 also includes an input device such as a keyboard and an output device such as a display.

The communication unit 11 is an interface for data communication with other devices and the various sensors via a communication network. The communication unit 11 is configured with, for example, a NIC (Network Interface Card).

記憶部１２は、揮発性メモリ及び不揮発性メモリを含んで構成される。揮発性メモリには、例えばＲＡＭ（Random Access Memory）で構成される。不揮発性メモリには、例えばＲＯＭ（Read Only Memory）、フラッシュメモリ、ハードディスクドライブで構成される。不揮発性メモリには、コントローラ１３により読み取り可能なプログラム及びデータが格納される。不揮発性メモリに格納されるプログラム及びデータの少なくとも一部は、有線や無線で接続される他のコンピュータ装置（サーバ装置）、または可搬型記録媒体から取得される構成としても良い。The storage unit 12 is configured to include volatile memory and non-volatile memory. The volatile memory is, for example, RAM (Random Access Memory). The non-volatile memory is, for example, ROM (Read Only Memory), flash memory, and a hard disk drive. The non-volatile memory stores programs and data that can be read by the controller 13. At least a portion of the programs and data stored in the non-volatile memory may be obtained from another computer device (server device) connected by wire or wirelessly, or from a portable recording medium.

The storage unit 12 is provided with a first AI model storage unit 121A and a second AI model storage unit 121B, which store the first AI model 121ma and the second AI model 121mb to be trained, respectively. The storage unit 12 is also provided with a task table 123 and a neutral region table 124 as data tables for various processes. The task table 123 and the neutral region table 124 are described later.

コントローラ１３は、学習装置１０の各種機能を実現するもので、演算処理等を行うプロセッサを含む。プロセッサは、例えばＣＰＵ（Central Processing Unit）を含んで構成される。コントローラ１３は、１つのプロセッサで構成されても良いし、複数のプロセッサで構成されても良い。複数のプロセッサで構成される場合には、それらのプロセッサは互いに通信可能に接続され、協働して処理を実行する。なお、学習装置１０をクラウドサーバで構成することも可能であり、その場合、プロセッサを構成するＣＰＵは仮想ＣＰＵであって良い。The controller 13 realizes various functions of the learning device 10 and includes a processor that performs arithmetic processing, etc. The processor includes, for example, a CPU (Central Processing Unit). The controller 13 may be configured with one processor or multiple processors. When configured with multiple processors, the processors are connected to each other so that they can communicate with each other and work together to execute processing. It is also possible to configure the learning device 10 as a cloud server, in which case the CPU that configures the processor may be a virtual CPU.

コントローラ１３は、その機能として、取得部１３１と、指標値算出部１３２と、正規化部１３３と、挙動判定部１３４と、生成部１３５と、提供部１３６と、を備える。本実施形態においては、コントローラ１３の機能は、記憶部１２に記憶されるプログラムに従った演算処理をプロセッサが実行することによって実現される。The controller 13 includes, as its functions, an acquisition unit 131, an index value calculation unit 132, a normalization unit 133, a behavior determination unit 134, a generation unit 135, and a provision unit 136. In this embodiment, the functions of the controller 13 are realized by a processor executing arithmetic processing according to a program stored in the storage unit 12.

取得部１３１は、通信部１１を介して、カメラＣ、マイクＭ、脳波センサＳ１１、及び心拍センサＳ１２によって検出された各種情報（画像情報、音声情報、脳波情報、心拍情報）を取得する。取得部１３１は、取得した各種情報を、その後の処理のために必要に応じて記憶部１２に形成されたデータテーブルに記憶する。なお、取得部１３１は、実質的に同時刻におけるこれらの情報を取得して、一つのデータセットとしてデータテーブルにおける一つのデータレコードに記憶する。そして、これらデータは一つの教師付き学習データを生成するために使用される。The acquisition unit 131 acquires various pieces of information (image information, audio information, brainwave information, heartbeat information) detected by the camera C, microphone M, brainwave sensor S11, and heartbeat sensor S12 via the communication unit 11. The acquisition unit 131 stores the acquired various pieces of information in a data table formed in the storage unit 12 as necessary for subsequent processing. The acquisition unit 131 acquires these pieces of information at substantially the same time and stores them in one data record in the data table as one data set. These pieces of data are then used to generate one piece of supervised learning data.

指標値算出部１３２は、生体情報の脳波及び心拍のデータに基づき感情指標値である覚醒度及び活性度を算出する。前述したように、覚醒度は「脳波のβ波／α波」で算出することができる。また、活性度は「心拍ＬＦ成分の標準偏差」で算出することができる。なお、これら感情指標値の算出に必要な算出式等のデータは、記憶部１２に記憶される。The index value calculation unit 132 calculates the emotional index values of arousal and activity based on the brainwave and heartbeat data of biological information. As described above, arousal can be calculated using "brainwave beta waves/alpha waves." Activity can be calculated using the "standard deviation of the heartbeat LF component." Note that data such as the calculation formulas required to calculate these emotional index values is stored in the storage unit 12.
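The two index calculations described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the band edges (8 to 13 Hz for alpha, 13 to 30 Hz for beta), the naive DFT, and the function names `band_power`, `arousal`, and `activity` are all assumptions. The heartbeat LF component is assumed to have been extracted already as a series of values, because the text does not specify how it is obtained.

```python
import cmath
import math
import statistics


def band_power(samples, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes over bins with f_lo <= frequency < f_hi."""
    n = len(samples)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)
            )
            power += abs(coeff) ** 2
    return power


def arousal(eeg, fs, alpha=(8.0, 13.0), beta=(13.0, 30.0)):
    """Arousal level as beta-band power divided by alpha-band power.

    The band edges are assumed values, not taken from the text.
    """
    alpha_power = band_power(eeg, fs, *alpha)
    if alpha_power == 0:
        raise ValueError("alpha-band power is zero")
    return band_power(eeg, fs, *beta) / alpha_power


def activity(lf_values):
    """Activity level as the standard deviation of heartbeat LF component values."""
    return statistics.pstdev(lf_values)
```

A production implementation would more likely use an FFT and a windowed spectral estimate; the naive DFT is used here only to keep the sketch dependency-free.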

正規化部１３３は、感情指標値に対して正規化処理を施す。脳波情報、心拍情報といった生体信号、特に数値系の（定量的な）生体信号は個人差があり、また周囲環境の影響を受けるので、それらを低減させるために正規化処理を実行する。次に、正規化処理の具体的方法について説明する。The normalization unit 133 performs normalization processing on the emotion index value. Biosignals such as electroencephalogram information and heart rate information, particularly numerical (quantitative) biosignals, vary from person to person and are also influenced by the surrounding environment, so normalization processing is performed to reduce these differences. Next, a specific method for normalization processing will be described.

図８は、２人の被験者の感情指標値（覚醒度）の時間変化を示す模式図である。図８に示す４つのグラフにおいて、横軸は時間、縦軸は生体信号に基づく感情指標値である。なお、説明を分かり易くするため、ここでは感情指標値は覚醒度として説明する。Figure 8 is a schematic diagram showing the change over time in the emotion index value (arousal level) of two subjects. In the four graphs shown in Figure 8, the horizontal axis is time, and the vertical axis is the emotion index value based on the biosignal. For ease of understanding, the emotion index value will be explained here as the arousal level.

In Figure 8, the left side (a1, a2) shows the data of the first subject and the right side (b1, b2) shows the data of the second subject; the top row (a1, b1) shows the data before normalization and the bottom row (a2, b2) shows the data after normalization. For ease of explanation, the example assumes that each subject's arousal level swings to both its upper and lower limits within the measurement period.

As shown in Figure 8 (a1, b1), the swing of the arousal level differs greatly between the two subjects and is larger for the second subject than for the first. In this example, the first subject has a narrow range of measured arousal levels, in other words a low sensitivity with respect to arousal level, whereas the second subject has a wide range of measured arousal levels, in other words a high sensitivity. Sensitivity with respect to arousal level is affected not only by innate individual differences between subjects but also by the conditions at the time of measurement: environmental differences such as ambient temperature, humidity, and lighting brightness, and the subject's condition such as degree of hunger.

感情指標値においては、その振れ幅の中央付近にニュートラル状態（中立状態：図８においてＮＲで表記）が存在する。覚醒度の場合、覚醒状態と非覚醒状態（睡眠様状態）との間に中立状態、いわゆる通常状態が存在する。このようなニュートラル状態の領域（領域境界）は、特定のタスクの実行により比較的判別（推定）し易い傾向にある。このため、本実施形態では、このニュートラル状態の領域を用いた正規化方法、つまりニュートラル状態の領域を同じにする感情指標値の補正を行う。In the emotion index value, a neutral state (represented by NR in FIG. 8) exists near the center of the range of fluctuation. In the case of alertness, a neutral state, or so-called normal state, exists between an alert state and a non-alert state (sleep-like state). Such neutral state regions (region boundaries) tend to be relatively easy to distinguish (estimate) by performing specific tasks. For this reason, in this embodiment, a normalization method using this neutral state region is used, that is, the emotion index value is corrected to make the neutral state region the same.

図８（ａ１、ｂ１）に示すように、ニュートラル領域ＮＲは、個人差等により各被験者間では、その位置（ニュートラル領域の上下限値：第１被験者ニュートラル上限値Ｍａｘ１、第１被験者ニュートラル下限値Ｍｉｎ１、第２被験者ニュートラル上限値Ｍａｘ２、第２被験者ニュートラル下限値Ｍｉｎ２）、幅（ニュートラル領域の上下限値の差：第１被験者ニュートラル幅Ｗ１（Ｍａｘ１－Ｍｉｎ１）、第２被験者ニュートラル幅Ｗ２（Ｍａｘ２－Ｍｉｎ２））が異なっている。As shown in Figure 8 (a1, b1), the neutral region NR differs between subjects due to individual differences, etc. in its position (upper and lower limits of the neutral region: first subject's neutral upper limit Max1, first subject's neutral lower limit Min1, second subject's neutral upper limit Max2, second subject's neutral lower limit Min2) and width (difference between upper and lower limits of the neutral region: first subject's neutral width W1 (Max1-Min1), second subject's neutral width W2 (Max2-Min2)).

そこで、各被験者のニュートラル領域ＮＲが、予め定めた標準のニュートラル領域ＮＲ０と同じ位置、幅（標準上限値Ｍａｘ０、標準下限値Ｍｉｎ０、標準幅Ｗ０（Ｍａｘ０－Ｍｉｎ０））となるように、感情指標値（覚醒度）の補正を行う。具体的には、以下の算出式を用いて感情指標値（覚醒度）の補正を行う。なお、活性度等の他の感情指標値も同様の方法（覚醒度に対する各パラメータを、補正対象感情指標値の対応するパラメータで置き換える）で補正できる。The emotion index value (arousal level) is then corrected so that the neutral region NR of each subject has the same position and width (standard upper limit Max0, standard lower limit Min0, standard width W0 (Max0-Min0)) as the predetermined standard neutral region NR0. Specifically, the emotion index value (arousal level) is corrected using the following calculation formula. Note that other emotion index values such as activity level can also be corrected in a similar manner (replace each parameter for arousal level with the corresponding parameter of the emotion index value to be corrected).

First subject's arousal level: arousal level AW1B based on the measured value (before correction), corrected arousal level AW1A
Second subject's arousal level: arousal level AW2B based on the measured value (before correction), corrected arousal level AW2A
AW1A=AW1B×A1+B1
AW2A=AW2B×A2+B2
A1=(Max0-Min0)/(Max1-Min1)
A2=(Max0-Min0)/(Max2-Min2)
B1=((Max0+Min0)-(Max1+Min1)×A1)/2
B2=((Max0+Min0)-(Max2+Min2)×A2)/2

なお、第ｎ被験者（ニュートラル上限値Ｍａｘｎ、ニュートラル下限値Ｍｉｎｎ）で一般化すると、次のようになる。If we generalize this to the nth subject (neutral upper limit Maxn, neutral lower limit Minn), we get the following:

nth subject's arousal level: arousal level AWnB based on the measured value (before correction), corrected arousal level AWnA
AWnA=AWnB×An+Bn
An=(Max0-Min0)/(Maxn-Minn)
Bn=((Max0+Min0)-(Maxn+Minn)×An)/2
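The generalized correction can be sketched as a small function that builds the linear map for one subject. The function name `neutral_normalizer` and the numeric values in the usage note are illustrative assumptions; the coefficients An and Bn follow the formulas above.

```python
def neutral_normalizer(max_n, min_n, max_0, min_0):
    """Return a function mapping a measured index value AWnB to AWnA.

    max_n, min_n: neutral upper/lower limits of the nth subject.
    max_0, min_0: upper/lower limits of the standard neutral region NR0.
    """
    if max_n <= min_n:
        raise ValueError("neutral upper limit must exceed neutral lower limit")
    a_n = (max_0 - min_0) / (max_n - min_n)
    b_n = ((max_0 + min_0) - (max_n + min_n) * a_n) / 2

    def normalize(aw_before):
        return aw_before * a_n + b_n

    return normalize
```

For example, with a subject neutral region of 1.0 to 3.0 and a standard neutral region of 0.4 to 0.6, the map sends 3.0 to 0.6 and 1.0 to 0.4, so the subject's neutral region coincides with the standard one in both position and width.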

このような正規化方式（算出式）により、第ｎ被験者の覚醒度を正規化する場合、第ｎ被験者覚醒度のニュートラル上限値Ｍａｘｎ、第ｎ被験者ニュートラル下限値Ｍｉｎｎを推定する必要がある。このニュートラル領域の推定のために、被験者に特定のタスクを実行させ、その際の覚醒度を計測する。そして、その覚醒度の計測値に基づき第ｎ被験者覚醒度のニュートラル上限値Ｍａｘｎ、第ｎ被験者ニュートラル下限値Ｍｉｎｎを推定する。実行されるタスクは、タスクを行った被験者が特定の心身状態（特定状態）となるようなタスクであり、例えば覚醒度がニュートラル上限値を境にして上側で振れる等のタスクであり、医学・心理学等の分野の研究でエビデンスが得られているタスクが何種類か知られている。つまり、これらのタスクを被験者に行わせることで被験者の特定の心身状態とし、当該特定の心身状態における被験者生体信号（値）を用いて正規化処理を実行するという技術思想である。When normalizing the arousal level of the nth subject using such a normalization method (calculation formula), it is necessary to estimate the neutral upper limit value Maxn of the arousal level of the nth subject and the neutral lower limit value Minn of the nth subject. In order to estimate this neutral region, the subject is made to perform a specific task, and the arousal level at that time is measured. Then, the neutral upper limit value Maxn of the arousal level of the nth subject and the neutral lower limit value Minn of the nth subject are estimated based on the measured value of the arousal level. The task to be performed is a task that puts the subject who performed the task into a specific mental and physical state (specific state), for example, a task in which the arousal level fluctuates above the neutral upper limit value, and there are several types of tasks for which evidence has been obtained in research in the fields of medicine, psychology, etc. In other words, the technical idea is to have the subject perform these tasks to put the subject into a specific mental and physical state, and to perform normalization processing using the subject's biosignal (value) in that specific mental and physical state.

図９は、タスクテーブル１２３の一例を示す図である。図９に示すように、タスクテーブル１２３の項目には、「タスクＩＤ」、「対応指標種別」、「タスク内容」、及び「指標値遷移状態」が含まれる。Figure 9 is a diagram showing an example of the task table 123. As shown in Figure 9, the items in the task table 123 include "task ID", "corresponding index type", "task content", and "index value transition state".

「タスクＩＤ」は、タスク情報を識別するための識別情報であるタスクＩＤデータである。タスクＩＤデータは、タスクテーブル１２３におけるデータレコードの主キーでもある。つまりタスクテーブル１２３では、タスクＩＤデータごとにデータレコードが構成され、当該データレコードにタスクＩＤデータに紐づいた各項目のデータが記憶されることになる。"Task ID" is task ID data, which is identification information for identifying task information. Task ID data is also the primary key of the data record in task table 123. In other words, in task table 123, a data record is created for each task ID data, and data for each item linked to the task ID data is stored in that data record.

「対応指標種別」は、正規化対象の感情指標種別である。例えば、覚醒度、活性度が記憶されることになる。The "Corresponding index type" is the type of emotion index to be normalized. For example, arousal level and activity level are stored.

「タスク内容」は、正規化処理のために被験者が実行するタスクの具体的内容である。"Task content" is the specific content of the task that the subject performs for the normalization process.

「指標値遷移状態」は、対応する（同じレコードに記憶された）タスク内容を被験者が実行した場合に対応する感情指標値が遷移すると推定される状態で、例えば「高覚醒度：覚醒度が覚醒側（覚醒度高）側の状態（高指標値状態）になる」といったデータが記憶されることになる。The "index value transition state" is the state to which the corresponding emotional index value is estimated to transition when the subject performs the corresponding task content (stored in the same record). For example, data such as "high arousal: arousal level moves to the arousal side (high arousal level) (high index value state)" will be stored.

According to this task table 123, for example, the task data with task ID ST01 has arousal level as its target emotion index, high arousal as its index value transition state, and "mentally add the displayed numbers" as its specific task content.

The task content to be performed by the subject is then selected according to the target of the normalization process, that is, the target emotion index and the target boundary of the neutral region. For example, when the upper boundary of the neutral region of the arousal level is to be estimated for normalizing the arousal level, the task whose "corresponding index type" is "arousal level" and whose "index value transition state" is "high arousal" is selected, so the subject is asked to perform the "task content" "mentally add the displayed numbers."
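The table lookup described here can be sketched as follows. The record layout mirrors the four items of the task table 123, but the class and function names and the exact strings stored in each field are assumptions made for illustration. The ST02 entry is taken from the task used later for the lower boundary, and its "low arousal" label is an assumed wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    task_id: str           # primary key of the task table
    index_type: str        # corresponding index type
    content: str           # task content shown to the subject
    transition_state: str  # index value transition state


# Illustrative contents only; the field strings are hypothetical.
TASK_TABLE = [
    TaskRecord("ST01", "arousal", "mentally add the displayed numbers", "high arousal"),
    TaskRecord("ST02", "arousal", "rest quietly and clench a fist", "low arousal"),
]


def select_task(index_type, transition_state, table=TASK_TABLE):
    """Return the first task matching the target index and transition state."""
    for record in table:
        if record.index_type == index_type and record.transition_state == transition_state:
            return record
    raise LookupError(f"no task for {index_type!r} / {transition_state!r}")
```

Estimating the upper boundary of the arousal neutral region would then use `select_task("arousal", "high arousal")`, and the lower boundary `select_task("arousal", "low arousal")`.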

ニュートラル領域の推定手法の具体例について、図１０を参照しながら説明する。なお、説明を分かり易くするため、感情指標として覚醒度を例にして、また図９のタスクテーブル１２３のデータを使用して説明する。A specific example of a method for estimating a neutral region will be described with reference to FIG. 10. To make the explanation easier to understand, arousal will be used as an example of an emotional index, and the data in task table 123 in FIG. 9 will be used for the explanation.

Figure 10 is a schematic diagram showing the change in arousal level over time during task execution in an example where task ST01 and task ST02 of the task table 123 are selected to estimate the neutral region NR of the arousal level (upper and lower boundary values NRU, NRD). The horizontal axis of the graph is time and the vertical axis is arousal level. Signal α represents the change in arousal level over time when the subject performs the task content of task ST01, "mentally add the displayed numbers." Signal β represents the change in arousal level over time when the subject performs the task content of task ST02, "rest quietly and clench a fist."

As shown in Figure 10, while the subject is performing task ST01, which induces a high arousal state, the subject's arousal level is in the high state (high index value state), so the measured arousal level is expected to fluctuate within the high-arousal region. Therefore, the minimum arousal level during execution of task ST01 can be estimated to be the upper limit NRU of the neutral region NR of the arousal level. If the time lag of the task's effect is taken into account, the execution period here runs from the time lag after the start of execution to the time lag after the end of execution.

Similarly, while the subject is performing task ST02, which induces a low arousal state, the subject's arousal level is in the low state (non-aroused state, low index value state), so the measured arousal level is expected to fluctuate within the low-arousal (non-aroused) region. Therefore, the maximum arousal level during execution of task ST02 can be estimated to be the lower limit NRD of the neutral region NR of the arousal level. If the time lag of the task's effect is taken into account, the execution period here runs from the time lag after the start of execution to the time lag after the end of execution.

このような、技術思想に基づいて覚醒度等の感情指標値に対するニュートラル領域を推定する。Based on this technical concept, we estimate the neutral region for emotional index values such as arousal level.

正規化部１３３は、上記に方法によりニュートラル領域ＮＲを決定するが、具体的には、第１タスクの実行時に得られる感情指標値（覚醒度：信号α）の時間変化データを統計処理して、ニュートラル領域ＮＲの上限値ＮＲＵを決定する。例えば、正規化部１３３は、第１タスクの実行時に得られる生体信号の時間変化データをスムージング処理し、スムージング処理後の信号αデータの最小値をニュートラル領域ＮＲの上限値ＮＲＵとする。The normalization unit 133 determines the neutral region NR using the method described above, but more specifically, it performs statistical processing on the time-varying data of the emotion index value (arousal level: signal α) obtained when the first task is executed to determine the upper limit value NRU of the neutral region NR. For example, the normalization unit 133 performs smoothing processing on the time-varying data of the biological signal obtained when the first task is executed, and sets the minimum value of the signal α data after the smoothing processing as the upper limit value NRU of the neutral region NR.

The normalization unit 133 also statistically processes the time-series data of the emotion index value (arousal level: signal β) obtained during execution of the second task to determine the lower limit NRD of the neutral region NR. For example, the normalization unit 133 smooths the time-series data of the biosignal obtained during execution of the second task and sets the maximum value of the smoothed signal β data as the lower limit NRD of the neutral region NR. The smoothing is performed, for example, to remove noise components such as fine peak signals.
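The boundary estimation from the two task recordings can be sketched as follows. The text only says that smoothing is applied; the simple moving average, the window length, and the function names here are assumptions chosen for illustration.

```python
def moving_average(values, window=5):
    """Smooth a series with an unweighted moving average (assumed smoothing method)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def neutral_bounds(signal_alpha, signal_beta, window=5):
    """Estimate (NRU, NRD) from index values recorded during the two tasks.

    signal_alpha: values recorded during the first (high-state) task.
    signal_beta:  values recorded during the second (low-state) task.
    """
    nru = min(moving_average(signal_alpha, window))  # lowest smoothed value of signal alpha
    nrd = max(moving_average(signal_beta, window))   # highest smoothed value of signal beta
    return nru, nrd
```

Samples that fall inside the effect time lag at the start of each task would need to be trimmed from the inputs before calling `neutral_bounds`; that step is omitted here.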

なお、正規化部１３３は、感情指標値の種別ごとにニュートラル領域の上限値及び下限値を決定する。このために、本実施形態では、覚醒度及び活性度のそれぞれのデータについて、上述のようなニュートラル領域上限決定用の第１タスク及びニュートラル領域下限決定用の第２タスクの実行が被験者（ユーザ）に対して要求される。そして、覚醒度及び活性度のそれぞれについて、各タスクの実行時における覚醒度及び活性度の計測（算出）結果に応じて覚醒度及び活性度の各ニュートラル領域の決定処理が行われる。The normalization unit 133 determines the upper and lower limits of the neutral region for each type of emotion index value. For this purpose, in this embodiment, the subject (user) is requested to execute the first task for determining the upper limit of the neutral region and the second task for determining the lower limit of the neutral region for each data of the arousal level and activity level, as described above. Then, for each of the arousal level and activity level, a process for determining the neutral region for each of the arousal level and activity level is performed according to the measurement (calculation) results of the arousal level and activity level when each task is executed.

The description above assumes that the first fluctuation range, which is the fluctuation range of the emotion index value (arousal level: signal α) obtained during execution of the first task, and the second fluctuation range, which is the fluctuation range of the emotion index value (arousal level: signal β) obtained during execution of the second task, are separated from each other, and the interval between the two fluctuation ranges is taken as the neutral region NR (see Figure 10).

However, depending on the type of emotion index value and the content of the tasks, the first and second fluctuation ranges may overlap. In that case, the neutral region may be determined based on the overlapping range of the two fluctuation ranges. For example, the overlapping range itself may be used as the neutral region. Alternatively, the neutral region may be the overlapping range expanded by an offset (a value chosen to prevent erroneous determination of emotion, for example a 10% expansion of the range) or reduced by an offset (a value chosen to prevent emotion from becoming undeterminable, for example a 10% reduction of the range).
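The overlap-based variant can be sketched as follows. Applying the offset symmetrically about the center of the overlap is an assumption; the text gives only the 10% expansion and reduction as examples. The function name and the `scale` parameter are hypothetical.

```python
def neutral_from_overlap(first_range, second_range, scale=1.0):
    """Derive a neutral region from two overlapping (low, high) fluctuation ranges.

    scale=1.0 uses the overlap as is, scale=1.1 widens it by 10%,
    and scale=0.9 narrows it by 10% (symmetric scaling is an assumption).
    """
    low = max(first_range[0], second_range[0])
    high = min(first_range[1], second_range[1])
    if low > high:
        raise ValueError("the fluctuation ranges do not overlap")
    center = (low + high) / 2
    half_width = (high - low) / 2 * scale
    return center - half_width, center + half_width
```

With a first fluctuation range of 4.0 to 10.0 and a second of 0.0 to 6.0, the overlap is 4.0 to 6.0; `scale=1.1` widens it to roughly 3.9 to 6.1 and `scale=0.9` narrows it to roughly 4.1 to 5.9.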

Next, a second example of the process for determining the neutral region is described. In the second example, the subject (user U1) is not asked to perform specific tasks such as the first and second tasks. Instead, the neutral region is determined by focusing on the fact that, as time-series data of measured (calculated) emotion index values accumulates, the upper and lower limits of the neutral region estimated by statistical processing each converge to a constant value.

この第２例におけるニュートラル領域を決定する手法について、図１１を参照しながら説明する。図１１は、感情指標値（覚醒度）の時間変化を示す模式図であり、横軸が時間、縦軸が覚醒度である。The method for determining the neutral region in this second example will be described with reference to FIG. 11. FIG. 11 is a schematic diagram showing the change in emotion index value (arousal level) over time, with the horizontal axis representing time and the vertical axis representing arousal level.

As shown in FIG. 11, even when no specific task is given, a person's mental and physical state changes among various states as the environment and other factors change over time. Therefore, the neutral region (its upper and lower limits) can be estimated by accumulating a statistically sufficient amount of time-series data of emotion index values and performing a cluster analysis that divides the accumulated data by magnitude into three states (high arousal, neutral, and low arousal (sleep-like state)). Known methods such as the k-means method, Otsu's multi-level thresholding method, and Gaussian mixture models can be used for the cluster analysis.
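One way to realize the three-state cluster analysis is a one-dimensional k-means, sketched below. The deterministic initialization (minimum, median, maximum) and the choice of taking the smallest and largest members of the middle cluster as the neutral limits are assumptions; the text does not state how the limits are read off from the clusters.

```python
import statistics


def neutral_by_kmeans(values, iterations=100):
    """Split 1-D index values into three clusters and return the
    (lower, upper) limits of the middle (neutral) cluster."""
    centroids = [min(values), statistics.median(values), max(values)]
    clusters = [[], [], []]
    for _ in range(iterations):
        clusters = [[], [], []]
        for v in values:
            # Assign each value to the nearest centroid (ties go to the lower cluster).
            nearest = min(range(3), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        updated = [
            statistics.fmean(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    middle = clusters[1]
    if not middle:
        raise ValueError("no values were assigned to the neutral cluster")
    return min(middle), max(middle)
```

Because the centroids start in ascending order and the data is one-dimensional, cluster index 1 remains the middle cluster throughout the iterations.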

図１２は、図１１に示したような統計的手法によりニュートラル領域を決定する場合において、データ蓄積期間（データ蓄積量に比例）に対するニュートラル領域の上限推定値の収束状況を示す図である。なお、ニュートラル領域の下限推定値の収束状況も同様となる。Figure 12 shows the convergence status of the upper limit estimate of the neutral region versus the data accumulation period (proportional to the amount of accumulated data) when the neutral region is determined using a statistical method such as that shown in Figure 11. The convergence status of the lower limit estimate of the neutral region is also similar.

統計的手法を用いたニュートラル領域の上限値の推定に利用されるデータ量は、時間の経過とともに増大する。時間の経過にともなって統計処理に用いるデータ量が増大することにより、推定される上限値は、時間の経過に伴って一定の値に収束していく。この傾向は、下限値でも同様である。収束した上限値及び下限値で特定されるニュートラル領域は、信頼性が高いと解される。このため、統計的手法で算出される上限値及び下限値が収束したと判定された段階で、ニュートラル領域を決定する。なお、収束の判定は、ニュートラル領域の上限値（下限値）の算出値の変動値（例えば、数回の算出値における平均変化値）が、適当に設定された収束判定閾値以下になった場合に収束と判定する、といった方法が可能である。The amount of data used to estimate the upper limit of the neutral region using statistical methods increases over time. As the amount of data used in statistical processing increases over time, the estimated upper limit converges to a certain value over time. This tendency is the same for the lower limit. A neutral region identified by converged upper and lower limits is considered to be highly reliable. For this reason, the neutral region is determined when it is determined that the upper and lower limits calculated using statistical methods have converged. Convergence can be determined by a method in which the fluctuation value of the calculated value of the upper limit (lower limit) of the neutral region (for example, the average change value over several calculations) falls below an appropriately set convergence determination threshold.
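The convergence judgment described above can be sketched as follows, using the mean absolute change over the last few estimates. The function name, the window of three changes, and the threshold values are assumptions for illustration.

```python
def has_converged(estimates, threshold, window=3):
    """Judge convergence of successive limit estimates (oldest first).

    Returns True when the mean absolute change over the last `window`
    updates is at or below `threshold`.
    """
    if len(estimates) < window + 1:
        return False  # not enough history to judge
    recent = estimates[-(window + 1):]
    mean_change = sum(abs(b - a) for a, b in zip(recent, recent[1:])) / window
    return mean_change <= threshold
```

The same check would be applied separately to the upper-limit and lower-limit series, and the neutral region fixed once both report convergence.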

なお、正規化部１３３は、感情指標値種別（覚醒度及び活性度）ごとにニュートラル領域の上限値及び下限値を決定する。The normalization unit 133 determines the upper and lower limits of the neutral region for each emotion index value type (arousal level and activity level).

正規化部１３３は、決定したニュートラル領域の上限値及び下限値を記憶部１２のニュートラル領域テーブル１２４に記憶する。詳細には、正規化部１３３は、決定したニュートラル領域の上限値及び下限値をデータテーブル化して記憶部１２に記憶する。なお、被験者（ユーザ）ごとのニュートラル領域情報を、ニュートラル領域テーブル１２４に記憶する方法が有効である。その場合、同じ被験者で学習データセットを再作成（追加作成）する際に、ニュートラル領域テーブル１２４に記憶されたニュートラル領域情報を利用することも可能となる。The normalization unit 133 stores the determined upper and lower limit values of the neutral region in the neutral region table 124 of the storage unit 12. In detail, the normalization unit 133 creates a data table of the determined upper and lower limit values of the neutral region and stores it in the storage unit 12. Note that it is effective to store neutral region information for each subject (user) in the neutral region table 124. In that case, it is also possible to use the neutral region information stored in the neutral region table 124 when recreating (adding) a learning dataset for the same subject.

Figure 13 is a diagram showing an example of the neutral region table 124. As shown in Figure 13, the items of the neutral region table 124 include "neutral information ID", "user ID", "emotion index type", "upper limit value", "lower limit value", and "acquisition date and time", and the corresponding data is stored in each item field.

The "neutral information ID" is identification information for identifying neutral information. The neutral information ID is also the primary key of the data records in the neutral region table 124. In other words, in the neutral region table 124, a data record is created for each neutral information ID, and the data of each item linked to that neutral information ID is stored in the record.

"User ID" is identification information for identifying user information. In other words, it identifies the subject whose emotion index data was measured to determine the neutral region.

「上限値」及び「下限値」は、正規化部１３３により決定されたニュートラル領域の上限値及び下限値の情報である。The "upper limit" and "lower limit" are information on the upper and lower limits of the neutral region determined by the normalization unit 133.

「取得日時」は、ニュートラル領域の上限値及び下限値の情報が取得（決定・記憶）された日時情報を記憶する。なお、同じユーザＩＤ、同じ感情指標種別の「上限値」及び「下限値」と、「取得日時」は、最新の情報が取得されるごとに更新される。"Acquisition date and time" stores the date and time when the information on the upper and lower limits of the neutral region was acquired (determined and stored). Note that the "upper limit" and "lower limit" and "acquisition date and time" for the same user ID and the same emotion index type are updated each time the latest information is acquired.
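The update behavior of the neutral region table can be sketched as follows. The in-memory dictionary keyed by user ID and emotion index type, the class names, and the "NR0001"-style ID format are assumptions; in the table described above the neutral information ID is the primary key.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NeutralRecord:
    neutral_id: str
    user_id: str
    index_type: str
    upper: float
    lower: float
    acquired_at: datetime


class NeutralRegionTable:
    """In-memory stand-in for the neutral region table (illustrative only)."""

    def __init__(self):
        self._records = {}  # keyed by (user_id, index_type)

    def upsert(self, user_id, index_type, upper, lower, acquired_at):
        """Insert a record, or overwrite the limits and timestamp of an existing one."""
        key = (user_id, index_type)
        record = self._records.get(key)
        if record is None:
            neutral_id = f"NR{len(self._records) + 1:04d}"  # hypothetical ID format
            record = NeutralRecord(neutral_id, user_id, index_type, upper, lower, acquired_at)
            self._records[key] = record
        else:
            record.upper, record.lower, record.acquired_at = upper, lower, acquired_at
        return record

    def lookup(self, user_id, index_type):
        return self._records.get((user_id, index_type))
```

Looking up a stored record by user ID and emotion index type is what allows the neutral region to be reused when further learning data is created for the same subject.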

他、被験者（ユーザＵ１）から感情申告に基づき、ニュートラル領域を推定する方法も可能である。Another possible method is to estimate the neutral region based on the emotion declared by the subject (user U1).

具体的には、正規化部１３３（コントローラ１３）は、被験者（ユーザ）に被験者が自覚している感情指標状態に関する質問（例えば、覚醒状態を３段階（覚醒、通常、非覚醒（眠い））のいずれかであるかの質問）を行い、その回答とその際に計測（算出）された感情指標値（覚醒度）を蓄積する。そして、正規化部１３３は、その蓄積データに統計的処理を施し、ニュートラル領域を推定する。例えば、被験者の回答が「通常」の際に算出された覚醒度の高値１０％の平均値と低値１０％の平均値をニュートラル領域の上限値、下限値とする。或いは、例えば、被験者の回答が「覚醒」の際に算出された覚醒度の低値１０％の平均値から適当なオフセット値を減じた値をニュートラル領域の上限値とし、被験者の回答が「非覚醒」の際に算出された覚醒度の高値１０％の平均値から適当なオフセット値を加えた値をニュートラル領域の下限値とする、といった方法が適用可能である。Specifically, the normalization unit 133 (controller 13) asks the subject (user) questions about the emotional index state that the subject is aware of (for example, a question about whether the subject's arousal state is one of three levels (awake, normal, not awake (sleepy))), and accumulates the answer and the emotional index value (arousal level) measured (calculated) at that time. The normalization unit 133 then performs statistical processing on the accumulated data to estimate the neutral region. For example, the average value of the highest 10% of the arousal level calculated when the subject's answer is "normal" and the average value of the lowest 10% of the arousal level calculated when the subject's answer is "normal" are set as the upper limit and lower limit of the neutral region. Alternatively, for example, a method in which a value obtained by subtracting an appropriate offset value from the average value of the lowest 10% of the arousal level calculated when the subject's answer is "awake" is set as the upper limit of the neutral region, and a value obtained by adding an appropriate offset value to the average value of the highest 10% of the arousal level calculated when the subject's answer is "not awake" is set as the lower limit of the neutral region can be applied.
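The first self-report variant (limits taken from the "normal" answers) can be sketched as follows. The answer labels, the function names, and the rounding rule used to pick 10% of the samples are assumptions.

```python
import statistics


def tail_mean(values, fraction, highest):
    """Mean of the highest (or lowest) `fraction` of the values, at least one value."""
    ordered = sorted(values, reverse=highest)
    count = max(1, round(len(ordered) * fraction))
    return statistics.fmean(ordered[:count])


def neutral_from_reports(samples, fraction=0.10):
    """Estimate (upper, lower) neutral limits from (answer, index value) pairs.

    Only samples whose self-reported state is "normal" are used.
    """
    normal = [value for answer, value in samples if answer == "normal"]
    if not normal:
        raise ValueError('no samples with the answer "normal"')
    upper = tail_mean(normal, fraction, highest=True)
    lower = tail_mean(normal, fraction, highest=False)
    return upper, lower
```

The second variant in the text would instead take the lowest 10% of the "awake" samples minus an offset as the upper limit and the highest 10% of the "not awake" samples plus an offset as the lower limit, which can reuse `tail_mean` in the same way.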

以上の正規化方法は、感情指標値のニュートラル領域に基づき行う方法であるが、別の方法で行うこともできる。次に示す方法は、適当な計測期間内に算出される感情指標値の範囲（上限値及び下限値）に基づき正規化を行う方法である。The normalization method described above is based on the neutral region of the emotion index value, but it can also be done in a different way. The method shown below is a method of normalizing based on the range (upper and lower limits) of the emotion index value calculated within an appropriate measurement period.

図１４は、２人の被験者の感情指標値（覚醒度）の時間変化を示す模式図である。図１４に示す４つのグラフにおいて、横軸は時間、縦軸は生体信号に基づく感情指標値である。なお、説明を分かり易くするため、ここでは感情指標値は覚醒度として説明する。Figure 14 is a schematic diagram showing the change over time in the emotion index value (arousal level) of two subjects. In the four graphs shown in Figure 14, the horizontal axis is time, and the vertical axis is the emotion index value based on the biosignal. For ease of understanding, the emotion index value will be explained here as the arousal level.

図１４において、左側（ａ１、ａ２）は第１被験者のデータであり、右側（ｂ１、ｂ２）は第２被験者のデータであり、上段（ａ１、ｂ１）は正規化前のデータであり、下段（ａ２、ｂ２）は正規化後のデータである。なお、説明を分かり易くするため、各被験者において計測期間内で覚醒度が上限・下限に振れた状態例で説明する。In FIG. 14, the left side (a1, a2) shows the data of the first subject, the right side (b1, b2) shows the data of the second subject, the top row (a1, b1) shows the data before normalization, and the bottom row (a2, b2) shows the data after normalization. For ease of explanation, the example assumes that each subject's arousal level swings to both its upper and lower limits within the measurement period.

図１４（ａ１、ｂ１）に示すように、第１被験者と第２被験者とのそれぞれの覚醒度の振れ幅は大きく異なり、第１被験者よりも第２被験者のほうが大きくなっている。すなわち、この例では、第１被験者は計測される覚醒度の幅が狭い、換言すれば覚醒度に対する感度が低いと言える。これに対して、第２被験者は計測される覚醒度の幅が広い、換言すれば覚醒度に対する感度が高いと言える。なお、覚醒度に対する感度等は、被験者生来の個人差だけでなく、環境差、例えば周囲の温湿度や照明等の明るさ、被験者の状態、例えば空腹度などの計測時における各種条件の影響も受ける。As shown in Figure 14 (a1, b1), the swing of the arousal level differs greatly between the first and second subjects, and is larger for the second subject. That is, in this example, the first subject has a narrow range of measured arousal levels, in other words a low sensitivity with respect to the arousal level. In contrast, the second subject has a wide range of measured arousal levels, in other words a high sensitivity with respect to the arousal level. Note that this sensitivity is affected not only by innate individual differences between subjects, but also by various conditions at the time of measurement, such as environmental differences (for example, ambient temperature and humidity or the brightness of lighting) and the subject's condition (for example, degree of hunger).

本正規化方法は、適当な計測期間における感情指標値（覚醒度）の変動範囲が、被験者の感情指標値に関する感度及び計測環境に応じて変化するという考えに基づく正規化方法であり、感情指標値（覚醒度）の変動範囲を同じにする感情指標値の補正を行う。This normalization method is based on the idea that the range of variation of the emotional index value (arousal) during an appropriate measurement period changes depending on the subject's sensitivity to the emotional index value and the measurement environment, and corrects the emotional index value to make the range of variation of the emotional index value (arousal) the same.

図１４（ａ１、ｂ１）に示すように、覚醒度の変動範囲ＣＨは、個人差等により各被験者間では、その位置（変動範囲の上下限値：第１被験者変動範囲上限値Ｍａｘ１、第１被験者変動範囲下限値Ｍｉｎ１、第２被験者変動範囲上限値Ｍａｘ２、第２被験者変動範囲下限値Ｍｉｎ２）、幅（変動範囲の上下限値の差：第１被験者変動範囲幅Ｗ１（Ｍａｘ１－Ｍｉｎ１）、第２被験者変動範囲幅Ｗ２（Ｍａｘ２－Ｍｉｎ２））が異なっている。As shown in Figure 14 (a1, b1), the variation range CH of the arousal level differs between subjects, owing to individual differences and the like, in both its position (the upper and lower limits of the variation range: first-subject upper limit Max1, first-subject lower limit Min1, second-subject upper limit Max2, second-subject lower limit Min2) and its width (the difference between the upper and lower limits: first-subject width W1 (Max1-Min1), second-subject width W2 (Max2-Min2)).

そこで、各被験者の変動範囲ＣＨが、予め定めた標準の変動範囲ＣＨ０と同じ位置、幅（標準上限値Ｍａｘ０、標準下限値Ｍｉｎ０、標準幅Ｗ０（Ｍａｘ０－Ｍｉｎ０））となるように、感情指標値（覚醒度）の補正を行う。具体的には、以下の算出式を用いて感情指標値（覚醒度）の補正を行う。なお、活性度等の他の感情指標値も同様の方法（覚醒度に対する各パラメータを、補正対象感情指標値の対応するパラメータで置き換える）で補正できる。The emotion index value (arousal level) is then corrected so that the variation range CH for each subject has the same position and width (standard upper limit Max0, standard lower limit Min0, standard width W0 (Max0-Min0)) as the predetermined standard variation range CH0. Specifically, the emotion index value (arousal level) is corrected using the following calculation formula. Note that other emotion index values such as activity level can also be corrected in a similar manner (replace each parameter for arousal level with the corresponding parameter of the emotion index value to be corrected).

なお、第ｎ被験者（変動範囲上限値Ｍａｘｎ、変動範囲下限値Ｍｉｎｎ）で一般化すると、次のようになる。If we generalize this to the nth subject (upper limit of the variation range Maxn, lower limit of the variation range Minn), we get the following:
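
The calculation formulas themselves are not reproduced in this text. The stated requirement, that each subject's variation range be mapped onto the standard range CH0 with the same position and width, is met by a linear rescaling of the following form. This is a reconstruction under that assumption, not a quotation of the original formula; the symbols A_n (arousal level of the nth subject before correction) and A'_n (after correction) are introduced here for illustration.

```latex
\[
  A'_n \;=\; \frac{W_0}{W_n}\,\bigl(A_n - \mathrm{Min}_n\bigr) + \mathrm{Min}_0,
  \qquad
  W_n = \mathrm{Max}_n - \mathrm{Min}_n,
  \quad
  W_0 = \mathrm{Max}_0 - \mathrm{Min}_0 .
\]
```

Setting n = 1 or n = 2 gives the corresponding expressions for the first and second subjects. As a check, A_n = Min_n maps to Min0 and A_n = Max_n maps to Min0 + W0 = Max0.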

上記の正規化方法によれば、被験者による特別なタスクの実行やそのタスク実行の際のデータ収集、収集したデータの演算等の必要がない。このため、被験者の負担を軽減することができ、また学習装置の処理負荷、学習の時間削減を図ることができる。The normalization method described above does not require the subject to perform a special task, collect data when performing the task, or perform calculations on the collected data. This reduces the burden on the subject and also reduces the processing load on the learning device and the learning time.
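
The range-based normalization can be sketched in a few lines. The sketch assumes a linear rescaling, a standard range of 0 to 100, and upper and lower limits taken directly as the maximum and minimum observed in the measurement period; none of these specifics are given in the text, and the function name is hypothetical.

```python
def normalize_to_standard_range(values, std_min=0.0, std_max=100.0):
    """Linearly map the observed range [min(values), max(values)] onto [std_min, std_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # No variation observed: place every sample at the centre of the standard range.
        return [(std_min + std_max) / 2 for _ in values]
    scale = (std_max - std_min) / (hi - lo)   # W0 / Wn
    return [(v - lo) * scale + std_min for v in values]
```

For example, a subject whose arousal level stays between 40 and 60 and a subject whose level spans 10 to 90 both end up with values spanning 0 to 100, which is the effect illustrated in the lower row of FIG. 14.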

図７に戻って、説明を続ける。挙動判定部１３４は、取得部１３１が取得した、ユーザＵ１（図６参照）の顔を含む画像情報、発生音声に対して解析処理を行うことで、ユーザＵ１の視線や顔の向き、表情、音声等の予め定められた種類の挙動を判定する。なお、挙動判定部１３４の行う処理と、図４の挙動判定部２３２の行う処理は同等であり、それぞれ同形式の視線、顔の向き、表情、音声のデータを出力する。Returning to FIG. 7, the explanation continues. The behavior determination unit 134 analyzes the image information including the face of the user U1 (see FIG. 6) and the user's uttered speech, both acquired by the acquisition unit 131, and thereby determines predetermined types of behavior of the user U1, such as gaze, face direction, facial expression, and voice. Note that the processing performed by the behavior determination unit 134 is equivalent to that performed by the behavior determination unit 232 in FIG. 4, and both units output gaze, face direction, facial expression, and voice data in the same format.

例えば、ユーザＵ１の視線に関して、挙動判定部１３４は、ユーザＵ１の顔を含む画像からユーザＵ１の左右の眼球を検知対象物とした特徴量算出、形状判別等の認識処理を行う。挙動判定部１３４は、当該認識処理の結果に基づき、例えば目頭の位置、眼の虹彩及び瞳孔の中心位置、近赤外照明による角膜反射像（プルキニエ像）の中心位置、眼球の中心位置等を用いた所定の視線検出処理によりユーザＵ１の視線及び注視点の挙動を判定する。ユーザＵ１の視線は、例えばユーザＵ１の前方にユーザＵ１と正対する仮想平面に設け、ユーザＵ１の視線ベクトルが仮想平面を貫く位置の二次元座標で表すことができる。For example, regarding the gaze of user U1, the behavior determination unit 134 performs recognition processing, such as feature calculation and shape discrimination, on an image including the face of user U1, with the left and right eyeballs of user U1 as the detection targets. Based on the results of the recognition processing, the behavior determination unit 134 determines the behavior of user U1's gaze and gaze point by a predetermined gaze detection process that uses, for example, the position of the inner corner of the eye, the center positions of the iris and pupil, the center position of the corneal reflection image (Purkinje image) produced by near-infrared illumination, and the center position of the eyeball. The gaze of user U1 can be expressed, for example, as the two-dimensional coordinates of the point at which user U1's gaze vector passes through a virtual plane set in front of user U1 and directly facing user U1.
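
The two-dimensional gaze representation amounts to a ray-plane intersection. The following sketch assumes a coordinate system in which the user looks along the +z axis and the virtual plane is the plane z = `plane_z`; the coordinate system, the function name, and the parameters are illustrative assumptions.

```python
def gaze_point_on_plane(eye, direction, plane_z=1.0):
    """Return the (x, y) point where the gaze ray crosses the plane z = plane_z.

    eye: (x, y, z) position of the eye; direction: (dx, dy, dz) gaze vector.
    Returns None when the gaze does not reach the plane.
    """
    ex, ey, ez = eye
    dx, dy, dz = direction
    if dz <= 0:
        return None                 # looking parallel to or away from the plane
    t = (plane_z - ez) / dz         # ray parameter at the intersection
    if t < 0:
        return None                 # plane lies behind the eye
    return (ex + t * dx, ey + t * dy)
```

The resulting coordinate pair is the kind of fixed-format gaze value that could be passed on as one of the behavior inputs.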

また、例えば、ユーザＵ１の顔の向きに関して、挙動判定部１３４は、ユーザＵ１の顔を含む画像からユーザＵ１の顔を検知対象物とした特徴量算出、形状判別等の認識処理を行う。挙動判定部１３４は、当該認識処理の結果に基づき、例えば目、鼻、口等のそれぞれの位置、鼻頂部の位置、顔の輪郭、顔の輪郭の幅方向の中心位置等を用いた所定の顔向き検出処理によりユーザＵ１の顔の向きの挙動を判定する。For example, regarding the direction of the face of user U1, the behavior determination unit 134 performs recognition processing such as feature calculation and shape determination using the face of user U1 as the detection target from an image including the face of user U1. Based on the results of the recognition processing, the behavior determination unit 134 determines the behavior of the direction of the face of user U1 by a predetermined face direction detection processing using, for example, the positions of the eyes, nose, mouth, etc., the position of the tip of the nose, the facial contour, the center position of the facial contour in the width direction, etc.

また、例えば、ユーザＵ１の表情に関して、挙動判定部１３４は、ユーザＵ１の顔を含む画像からユーザＵ１の顔を検知対象物とした特徴量算出、形状判別等の認識処理を行う。挙動判定部１３４は、当該認識処理の結果に基づき、例えば口角の角度、眉の角度、目の開き具合等を用いた所定の表情検出処理によりユーザＵ１の表情の挙動を判定する。For example, with regard to the facial expression of user U1, the behavior determination unit 134 performs recognition processing such as feature calculation and shape discrimination using the face of user U1 as the detection target from an image including the face of user U1. Based on the results of the recognition processing, the behavior determination unit 134 determines the behavior of the facial expression of user U1 through a predetermined facial expression detection processing using, for example, the angle of the corners of the mouth, the angle of the eyebrows, the degree of opening of the eyes, etc.

また、例えば、ユーザＵ１の声等の発音を含む音声情報に対して、ユーザＵ１の、音声音量や発声速度、発言頻度、音声認識結果に基づく発声内容の言語解析結果（例えば喜びや怒り、不安等に関係する単語や文章の出現）に基づき、ユーザＵ１の音声の挙動を判定する。In addition, for example, for audio information including pronunciation of user U1's voice, the voice behavior of user U1 is determined based on the voice volume, speaking speed, and frequency of speech of user U1, as well as the results of language analysis of the content of the speech based on the results of voice recognition (e.g., the appearance of words and sentences related to joy, anger, anxiety, etc.).

生成部１３５は、第１ＡＩモデル１２１ｍａに、挙動判定部１３４によって判定されたユーザＵ１の視線や顔の向き、表情、音声の情報を入力値とし、正規化部１３３によって正規化された感情指標値の覚醒度を正解値としてそれぞれ入力し、第１ＡＩモデル１２１ｍａの学習を行う。また、生成部１３５は、第２ＡＩモデル１２１ｍｂに、挙動判定部１３４によって判定されたユーザＵ１の視線や顔の向き、表情、音声の情報を入力値とし、正規化部１３３によって正規化された感情指標値の活性度を正解値としてそれぞれ入力し、第２ＡＩモデル１２１ｍｂの学習を行う。The generation unit 135 inputs the gaze, face direction, facial expression, and voice information of the user U1 determined by the behavior determination unit 134 as input values to the first AI model 121ma, and the arousal level of the emotion index value normalized by the normalization unit 133 as a correct answer value to train the first AI model 121ma. The generation unit 135 also inputs the gaze, face direction, facial expression, and voice information of the user U1 determined by the behavior determination unit 134 as input values to the second AI model 121mb, and the activity level of the emotion index value normalized by the normalization unit 133 as a correct answer value to train the second AI model 121mb.

つまり、生成部１３５は、遠隔系センサ（非接触センサ：カメラ、マイク）の検出した情報に基づくユーザＵ１の挙動情報を入力データとし、接触系の生体信号センサ（接触センサ：脳波センサ、心拍センサ）によって検出された生体信号に基づき算出され、さらに正規化処理が施された感情指標値（覚醒度及び活性度）を正解データとした教師付き学習データセットを生成する。そして、生成部１３５は、生成した教師付き学習データセットを用いて第１ＡＩモデル１２１ｍａ及び第２ＡＩモデル１２１ｍｂの学習を行う。この学習により、学習済みの第１ＡＩモデル１２１ｍＡ及び第２ＡＩモデル１２１ｍＢは、挙動情報を入力として覚醒度及び活性度を推定するＡＩモデルとなる。In other words, the generation unit 135 generates a supervised learning dataset using behavior information of the user U1 based on information detected by remote sensors (non-contact sensors: camera, microphone) as input data, and emotion index values (arousal and activity) calculated based on biosignals detected by contact biosignal sensors (contact sensors: brainwave sensor, heart rate sensor) and normalized as correct answer data. The generation unit 135 then uses the generated supervised learning dataset to train the first AI model 121ma and the second AI model 121mb. Through this training, the trained first AI model 121mA and second AI model 121mB become AI models that estimate arousal and activity using behavior information as input.
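
The pairing of non-contact behavior data with contact-sensor labels can be sketched as below. The `Behavior` fields and their encodings (gaze as plane coordinates, face direction as two angles, expression and voice as single scores) are assumptions made for illustration; the text does not specify the data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    gaze: tuple            # (x, y) on the virtual plane (assumed encoding)
    face_direction: tuple  # (yaw, pitch) in degrees (assumed encoding)
    expression: float      # expression score (assumed encoding)
    voice: float           # voice score (assumed encoding)


def make_training_pairs(behavior, norm_arousal, norm_activity):
    """Build one (input, label) pair per model from samples taken at the same time."""
    features = [*behavior.gaze, *behavior.face_direction,
                behavior.expression, behavior.voice]
    arousal_pair = (list(features), norm_arousal)    # for the first AI model
    activity_pair = (list(features), norm_activity)  # for the second AI model
    return arousal_pair, activity_pair
```

Both pairs share the same input features and differ only in the label, which mirrors the use of one behavior determination result to train two separate models.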

なお、生成部１３５は、誤差逆伝播学習法等の学習アルゴリズムを用いて、第１ＡＩモデル１２１ｍａ及び第２ＡＩモデル１２１ｍｂにおける重み等のパラメータを調整する等して、第１ＡＩモデル１２１ｍａ及び第２ＡＩモデル１２１ｍｂの学習を行う。Note that the generation unit 135 trains the first AI model 121ma and the second AI model 121mb by, for example, adjusting parameters such as their weights using a learning algorithm such as error backpropagation.
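
As an illustration of weight adjustment by error backpropagation, the sketch below trains a very small network with one hidden layer on (features, label) pairs. The architecture, activation function, loss, and learning rate are assumptions chosen for brevity; the text does not describe the structure of the AI models.

```python
import math
import random


class TinyMLP:
    """One tanh hidden layer and a scalar linear output, trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.05):
        """One backpropagation update; returns the loss before the update."""
        h, y = self.forward(x)
        err = y - target                                  # dL/dy for L = 0.5 * (y - target)^2
        for j, hj in enumerate(h):
            grad_h = err * self.w2[j] * (1.0 - hj * hj)   # propagate the error through tanh
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err * err
```

Calling `train_step` once per (behavior features, normalized index value) pair corresponds to sample-by-sample learning; one such network per index value would play the role of the first and second AI models.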

提供部１３６は、生成部１３５によって生成された学習済みの第１ＡＩモデル１２１ｍＡ（のデータ）及び第２ＡＩモデル１２１ｍＢ（のデータ）を、ネットワークを介して後述する車両Ｖ２で用いられる感情推定装置２０（図１及び図４参照）に提供する。これにより、感情推定装置２０は、提供部１３６によって提供された第１ＡＩモデル１２１ｍＡ（のデータ）及び第２ＡＩモデル１２１ｍＢ（のデータ）を取り込んで第１ＡＩモデル記憶部２２１Ａ及び第１ＡＩモデル記憶部２２１Ａに記憶する。これにより、感情推定装置２０は、感情推定機能がカメラＣ及びマイクＭが取得した画像、音声情報に基づく感情推定が可能となり、車両Ｖ２の各種機能において推定した感情情報を利用することができる。The providing unit 136 provides the trained first AI model 121mA (its data) and the trained second AI model 121mB (its data) generated by the generation unit 135, via a network, to the emotion estimation device 20 (see FIGS. 1 and 4) used in the vehicle V2 described later. The emotion estimation device 20 imports the first AI model 121mA (its data) and the second AI model 121mB (its data) provided by the providing unit 136 and stores them in the first AI model storage unit 221A and in the corresponding storage unit for the second AI model. As a result, the emotion estimation device 20 can estimate emotions based on the image and audio information acquired by the camera C and the microphone M, and the estimated emotion information can be used by various functions of the vehicle V2.

なお、提供部１３６は、生成部１３５によって生成された学習済みの第１ＡＩモデル１２１ｍＡ（のデータ）及び第２ＡＩモデル１２１ｍＢ（のデータ）を、ネットワークを介して感情推定装置２０に提供するようにしたが、別の方法で提供することも可能である。例えば、第１ＡＩモデル１２１ｍＡのデータ及び第２ＡＩモデル１２１ｍＢのデータをＡＩプラットホームを形成するＬＳＩに書き込んでＡＩ実行ＬＳＩを生成し、当該ＡＩ実行ＬＳＩを感情推定装置２０に組み込むことで、第１ＡＩモデル１２１ｍＡ（のデータ）及び第２ＡＩモデル１２１ｍＢ（のデータ）を提供することもできる。また、通信以外のデータ伝達媒体、例えばメモリカード、光ディスク記録媒体等を用いて感情推定装置２０に提供することも可能である。The providing unit 136 provides the trained first AI model 121mA (data) and second AI model 121mB (data) generated by the generating unit 135 to the emotion estimation device 20 via a network, but it is also possible to provide them by another method. For example, the data of the first AI model 121mA and the data of the second AI model 121mB can be written into an LSI forming an AI platform to generate an AI execution LSI, and the AI execution LSI can be incorporated into the emotion estimation device 20 to provide the first AI model 121mA (data) and the second AI model 121mB (data). It is also possible to provide them to the emotion estimation device 20 using a data transmission medium other than communication, such as a memory card or an optical disk recording medium.

＜５．感情推定ＡＩの学習装置の動作例＞<5. Example of operation of emotion estimation AI learning device>
図１５は、図７の学習装置１０のコントローラ１３が実行する感情推定ＡＩの学習処理を示すフローチャートである。このフローチャートは、コンピュータ装置に感情推定用のＡＩモデルの学習処理を実現させるコンピュータプログラムの技術的内容を示す。また、当該コンピュータプログラムは、読み取り可能な各種不揮発性記録媒体に記憶され、提供（販売、流通等）される。当該コンピュータプログラムは、１つのプログラムのみで構成されても良いが、協働する複数のプログラムによって構成されても良い。Fig. 15 is a flowchart showing the learning process of the emotion estimation AI executed by the controller 13 of the learning device 10 in Fig. 7. This flowchart shows the technical contents of a computer program that causes a computer device to realize the learning process of an AI model for emotion estimation. In addition, the computer program is stored in various readable non-volatile recording media and provided (sold, distributed, etc.). The computer program may be composed of only one program, or may be composed of multiple programs that work together.

図１５に示す処理は、学習装置１０の設計者等が感情推定用のＡＩモデルの学習処理を実行する際、例えばキーボード等の操作部により学習処理の開始操作が行われたときに実行される。なお、車両運転時の学習データセットを収集、生成するのであれば、作業者或いは被験者等による学習データセットの収集開始操作に基づき処理が実行される。The process shown in FIG. 15 is executed when a designer of the learning device 10 executes a learning process of an AI model for emotion estimation, for example, when an operation to start the learning process is performed by an operation unit such as a keyboard. Note that if a learning dataset is to be collected and generated during vehicle driving, the process is executed based on an operation to start collection of the learning dataset by an operator or subject.

ステップＳ１０１において、コントローラ１３（取得部１３１）は、カメラＣ及びマイクＭからカメラ画像及びマイク音声のデータを取得し、また脳波センサＳ１１及び心拍センサＳ１２から生体信号（データ）を取得して記憶部１２に記憶し、ステップＳ１０２に移る。In step S101, the controller 13 (acquisition unit 131) acquires camera image and microphone audio data from the camera C and microphone M, and also acquires biosignals (data) from the brain wave sensor S11 and heart rate sensor S12, stores them in the memory unit 12, and then proceeds to step S102.

なお、車載装置が収集した各データを用いて研究開発室等に設置された学習装置１０にて学習を行う場合は、無線通信や記録媒体で学習装置１０に提供された車載装置が収集、蓄積した各データを用いて、ステップＳ１０１及び以降の処理が行われることになる。When learning is performed by a learning device 10 installed in a research and development laboratory or the like using the various data collected by the in-vehicle device, step S101 and subsequent processes are performed using the various data collected and accumulated by the in-vehicle device and provided to the learning device 10 via wireless communication or recording medium.

ステップＳ１０２において、コントローラ１３（挙動判定部１３４）は、カメラ画像及びマイク音声のデータの各データに基づき挙動（視線、顔向き、音声）を判定し、ステップＳ１０３に移る。In step S102, the controller 13 (behavior determination unit 134) determines behavior (gaze, facial direction, and voice) based on the camera image and microphone audio data, and proceeds to step S103.

ステップＳ１０３において、コントローラ１３（指標値算出部１３２）は、生体信号の脳波及び心拍データに基づき、感情指標値の覚醒度及び活性度を算出し、ステップＳ１０４に移る。In step S103, the controller 13 (index value calculation unit 132) calculates the arousal level and activity level of the emotion index value based on the brainwave and heart rate data of the biological signal, and proceeds to step S104.

ステップＳ１０４において、コントローラ１３（正規化部１３３）は、ステップＳ１０３で算出した感情指標値の覚醒度及び活性度の各データに対して正規化処理を施し、ステップＳ１０５に移る。In step S104, the controller 13 (normalization unit 133) performs normalization processing on each of the data of the arousal and activity of the emotion index value calculated in step S103, and then proceeds to step S105.

ステップＳ１０５において、コントローラ１３（生成部１３５）は、判定した挙動の視線、顔向き、音声データを入力値とし、正規化された感情指標値の覚醒度を正解値とする覚醒度学習用教師付き学習データを生成して、ステップＳ１０６に移る。In step S105, the controller 13 (generation unit 135) generates supervised learning data for learning arousal level, in which the gaze, facial direction, and voice data of the determined behavior are used as input values, and the arousal level of the normalized emotion index value is used as the correct answer value, and then the process proceeds to step S106.

ステップＳ１０６において、コントローラ１３（生成部１３５）は、判定した挙動の視線、顔向き、音声データを入力値とし、正規化された感情指標値の活性度を正解値とする活性度学習用教師付き学習データを生成して、ステップＳ１０７に移る。In step S106, the controller 13 (generation unit 135) generates supervised learning data for activity learning in which the gaze, facial direction, and voice data of the determined behavior are used as input values, and the activity of the normalized emotion index value is used as the correct answer value, and then the process proceeds to step S107.

ステップＳ１０７において、コントローラ１３（生成部１３５）は、ステップＳ１０５で生成した覚醒度学習用教師付き学習データを学習完了前の第１ＡＩモデル１２１ｍａに提供して第１ＡＩモデル１２１ｍａの学習を行い、ステップＳ１０８に移る。In step S107, the controller 13 (generation unit 135) provides the supervised learning data for learning arousal level generated in step S105 to the first AI model 121ma, whose learning is not yet complete, trains the first AI model 121ma, and then proceeds to step S108.

ステップＳ１０８において、コントローラ１３（生成部１３５）は、ステップＳ１０６で生成した活性度学習用教師付き学習データを学習完了前の第２ＡＩモデル１２１ｍｂに提供して第２ＡＩモデル１２１ｍｂの学習を行い、ステップＳ１０９に移る。In step S108, the controller 13 (generation unit 135) provides the supervised learning data for activity learning generated in step S106 to the second AI model 121mb before completing learning, thereby training the second AI model 121mb, and then proceeds to step S109.

ステップＳ１０９において、各モデルの学習が完了したか、ここでは車両Ｖ１の学習用走行が終了したか否かを、例えばイグニッションスィッチ状態や車両運転者（学習作業者、被験者）の操作等に基づいて判断し、終了していれば処理を終え、終了していなければステップＳ１０１に戻り、学習を継続する。In step S109, it is determined whether the learning of each model is complete, in this case whether the learning run of vehicle V1 is complete, for example based on the state of the ignition switch or the operation of the vehicle driver (learning operator, test subject), and if it is complete, the process ends; if it is not complete, the process returns to step S101 and learning continues.
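
The control flow of steps S101 to S109 can be summarized as a loop. In this sketch every processing stage is passed in as a callable; all of the parameter names are hypothetical stand-ins for the units of the controller 13, and the tuple layouts are assumptions.

```python
def run_learning_loop(acquire, judge_behavior, compute_indices, normalize,
                      train_arousal, train_activity, is_finished):
    """Repeat acquisition, labeling, and training until the end condition holds.

    Returns the number of iterations performed.
    """
    steps = 0
    while True:
        frame, audio, eeg, heartbeat = acquire()                # S101: camera, mic, biosignals
        behavior = judge_behavior(frame, audio)                 # S102: gaze, face direction, voice
        arousal, activity = compute_indices(eeg, heartbeat)     # S103: emotion index values
        arousal_n, activity_n = normalize(arousal, activity)    # S104: normalization
        arousal_sample = (behavior, arousal_n)                  # S105: arousal training sample
        activity_sample = (behavior, activity_n)                # S106: activity training sample
        train_arousal(arousal_sample)                           # S107: train first AI model
        train_activity(activity_sample)                         # S108: train second AI model
        steps += 1
        if is_finished():                                       # S109: e.g. learning run ended
            return steps
```

Passing the stages in as functions keeps the loop independent of how each unit is realized, so the same skeleton serves both in-vehicle collection and replay of recorded data.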

この後、学習が完了した第１ＡＩモデル１２１ｍＡ及び第２ＡＩモデル１２１ｍＢは、学習装置１０を操作するオペレータの指示等に基づき、図１に示す車両Ｖ２で用いられる感情推定装置２０等のＡＩモデル利用装置に提供される。After this, the first AI model 121mA and the second AI model 121mB, for which learning has been completed, are provided to an AI model utilization device such as the emotion estimation device 20 used in the vehicle V2 shown in FIG. 1, based on instructions from an operator operating the learning device 10, etc.

なお、図１５で示した学習処理においては、各センサからデータを収集する都度、教師付き学習データを生成して各ＡＩモデルの学習を行っている。しかしながら、各センサからデータを収集する都度、教師付き学習データを生成して蓄積し、蓄積した学習データで構成される学習データセットを用いて学習完了前の第１ＡＩモデル１２１ｍａ及び第２ＡＩモデル１２１ｍｂの学習を行っても良い。In the learning process shown in FIG. 15, each time data is collected from each sensor, supervised learning data is generated and each AI model is trained. However, each time data is collected from each sensor, supervised learning data may be generated and accumulated, and the first AI model 121ma and the second AI model 121mb before learning is completed may be trained using a learning data set consisting of the accumulated learning data.

その場合、ステップＳ１０７は、ステップＳ１０５で生成した覚醒度学習用教師付き学習データを蓄積する。ステップＳ１０８は、ステップＳ１０６で生成した活性度学習用教師付き学習データを蓄積する。そして、ステップＳ１０９は、学習データの作成が完了したかを判断し（例えば、車両Ｖ１の学習用走行が終了したか否かを、例えばイグニッションスィッチ状態や車両運転者（学習作業者、被験者）の操作等に基づいて判断し）、終了していれば処理を終え、終了していなければステップＳ１０１に戻り、学習データの生成処理を継続することになる。In this case, step S107 accumulates the supervised learning data for learning arousal level generated in step S105, and step S108 accumulates the supervised learning data for learning activity level generated in step S106. Step S109 then determines whether creation of the learning data is complete (for example, whether the learning run of vehicle V1 has ended, judged from the state of the ignition switch or an operation by the vehicle driver (learning operator, subject)). If it is complete, the process ends; if not, the process returns to step S101 and generation of the learning data continues.

また、上記のＡＩモデルの学習処理においては、各センサから１計測データ単位で学習データを生成している。しかしながら、適当な期間における複数の各データを統計処理したデータ（センサ出力データ、挙動データ或いは感情指標値を統計処理）を用いて学習データを生成し、学習データとすることも可能である。In addition, in the learning process of the above AI model, learning data is generated in units of one measurement data from each sensor. However, it is also possible to generate learning data using data obtained by statistically processing a plurality of data over an appropriate period of time (sensor output data, behavior data, or emotion index values) and use this as the learning data.
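
One simple form of such statistical processing is averaging over fixed-size windows of consecutive samples. The choice of the mean as the statistic, the non-overlapping windows, and the function name are assumptions; the text only states that statistically processed data may be used.

```python
from statistics import mean


def window_means(samples, window):
    """Average consecutive groups of `window` samples; a trailing partial group is dropped."""
    return [mean(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]
```

The same helper could be applied to sensor outputs, behavior values, or emotion index values, provided that input and label series are windowed identically so that the pairs stay aligned in time.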

＜６－１．変形例１＞<6-1. Modification 1>
以上の実施形態では、カメラ撮影画像及びマイク集音音声を入力として感情指標値（覚醒度及び活性度）を推定するＡＩモデルを生成する学習装置１０について説明したが、カメラ撮影画像及びマイク集音音声を入力として感情を推定するＡＩモデルを生成する学習装置も実現可能である。In the above embodiment, a learning device 10 has been described that generates an AI model that estimates emotion index values (arousal level and activity level) using an image captured by a camera and audio picked up by a microphone as input. However, it is also possible to realize a learning device that generates an AI model that estimates emotions directly from an image captured by a camera and audio picked up by a microphone.

図１６は、感情推定装置５０の概念的構成を示す構成図である。また、図１７は、感情推定装置５０のＡＩモデルを学習する学習装置６０の概念的構成を示す構成図である。なお、図１、図６等における各構成と同様の構成については、同じ符号を付し、詳細な説明を省略する。Figure 16 is a configuration diagram showing the conceptual configuration of the emotion estimation device 50. Also, Figure 17 is a configuration diagram showing the conceptual configuration of a learning device 60 that learns the AI model of the emotion estimation device 50. Note that the same components as those in Figures 1, 6, etc. are given the same reference numerals and detailed descriptions are omitted.

感情推定装置５０は、カメラ撮影画像及びマイク集音音声を入力し、挙動判定部２３２はこれらカメラ撮影画像及びマイク集音音声に基づき、視線、顔向き、音声のデータを生成し、これら視線、顔向き、音声のデータを学習済みの第３ＡＩモデル１２１ｍＣに入力データとして出力する。第３ＡＩモデル１２１ｍＣは、視線、顔向き、音声データの入力に対して感情を推定する学習済みＡＩモデルであり、被験者であるユーザＵ２の感情情報を出力する。The emotion estimation device 50 inputs camera-captured images and microphone-captured audio, and the behavior determination unit 232 generates gaze, facial direction, and audio data based on the camera-captured images and microphone-captured audio, and outputs the gaze, facial direction, and audio data as input data to the trained third AI model 121mC. The third AI model 121mC is a trained AI model that estimates emotions based on input gaze, facial direction, and audio data, and outputs emotional information of the test subject, user U2.

学習装置６０は、脳波センサＳ１１及び心拍センサＳ１２が検出した脳波情報及び心拍情報を入力する。指標値算出部１３２は、これら脳波情報及び心拍情報に基づき、感情指標である覚醒度及び活性度を算出し、正規化部１３３に出力する。そして、正規化部１３３は、これら覚醒度及び活性度を正規化し、正規化覚醒度及び正規化活性度を感情推定モデル６２２ｍ（心理平面モデル部分）に出力する。The learning device 60 inputs the brainwave information and heartbeat information detected by the brainwave sensor S11 and heartbeat sensor S12. The index value calculation unit 132 calculates the arousal level and activity level, which are emotion indices, based on the brainwave information and heartbeat information, and outputs these to the normalization unit 133. The normalization unit 133 then normalizes these arousal levels and activity levels, and outputs the normalized arousal level and normalized activity level to the emotion estimation model 622m (psychological plane model portion).

感情推定モデル６２２ｍは、正規化覚醒度及び正規化活性度を心理平面モデルに適用して感情を推定し、当該推定した感情情報を正解値として第３ＡＩモデル１２１ｍｃ（学習前・学習中）に出力する。なお、感情推定モデル６２２ｍは、図１に示した感情推定モデル２２２ｍと同様の構成であり、図１に示した心理平面テーブル２２４と同様構成の心理平面テーブル６２４を用いて感情を推定する。The emotion estimation model 622m estimates emotions by applying the normalized arousal level and normalized activity level to the psychological plane model, and outputs the estimated emotion information as a correct answer value to the third AI model 121mc (before learning and during learning). Note that the emotion estimation model 622m has the same configuration as the emotion estimation model 222m shown in FIG. 1, and estimates emotions using a psychological plane table 624 that has the same configuration as the psychological plane table 224 shown in FIG. 1.
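
Applying two normalized index values to a psychological plane can be pictured as a quadrant lookup. In the sketch below, the center value of 50 and the four emotion labels are placeholders chosen for illustration; they are not the contents of the psychological plane table 624.

```python
# Placeholder labels keyed by (arousal is high, activity is high); assumed for illustration.
EMOTION_TABLE = {
    (True, True): "excited",
    (True, False): "tense",
    (False, True): "relaxed",
    (False, False): "depressed",
}


def estimate_emotion(arousal, activity, center=50.0, table=EMOTION_TABLE):
    """Return the label of the quadrant in which (arousal, activity) falls."""
    return table[(arousal >= center, activity >= center)]
```

Because both index values are normalized before the lookup, a single center value can serve all subjects, which is the point of using normalized values to produce the correct answer labels.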

また、学習装置６０は、カメラ撮影画像及びマイク集音音声を入力する。挙動判定部１３４は、これらカメラ撮影画像及びマイク集音音声に基づき、視線、顔向き、音声のデータを生成し、これら視線、顔向き、音声のデータを学習前（学習中）の第３ＡＩモデル１２１ｍｃに入力データとして出力する。The learning device 60 also inputs images captured by a camera and audio picked up by a microphone. The behavior determination unit 134 generates data on the gaze, facial direction, and audio based on the images captured by the camera and the audio picked up by the microphone, and outputs the gaze, facial direction, and audio data to the third AI model 121mc before learning (during learning) as input data.

そして、学習装置６０は、誤差逆伝播学習法等の学習アルゴリズムを用いて、第３ＡＩモデル１２１ｍｃにおける重み等のパラメータを調整する。そして、学習装置６０は、学習期間に順次入力（計測）された上記各データを用いて第３ＡＩモデル１２１ｍｃに対して上記学習処理を実行し、学習済みの第３ＡＩモデル１２１ｍＣを生成する。Then, the learning device 60 adjusts parameters such as weights in the third AI model 121mc using a learning algorithm such as the backpropagation learning method. The learning device 60 then executes the above learning process on the third AI model 121mc using the above data sequentially input (measured) during the learning period, and generates a learned third AI model 121mC.

以上のように、本変形例の学習装置６０によれば、個人差等がある感情指標値（覚醒度及び活性度）を正規化した感情指標値を用いて感情を推定し、当該推定した感情をＡＩ学習用の正解値として使用する。したがって、ＡＩモデルは、個人差等が抑制された学習が施されることになり、精度の高い学習済みＡＩモデルが生成されることになる。As described above, according to the learning device 60 of this modified example, emotions are estimated using emotion index values that are normalized from emotion index values (arousal and activity levels) that vary among individuals, and the estimated emotions are used as correct values for AI learning. Therefore, the AI model undergoes learning that suppresses individual differences, and a highly accurate trained AI model is generated.

＜６－２．変形例２＞<6-2. Modification 2>
次に、別の実施形態として、量子化した感情指標値（覚醒度及び活性度）を推定するＡＩモデルを生成する学習装置８０について説明する。Next, as another embodiment, a learning device 80 that generates an AI model that estimates quantized emotion index values (arousal level and activity level) will be described.

図１８は、感情推定装置７０の概念的構成を示す構成図である。また、図１９は、感情推定装置７０のＡＩモデルを学習する学習装置８０の概念的構成を示す構成図である。なお、図１、図６、図１６、図１７等における各構成と同様の構成については、同じ符号を付し、詳細な説明を省略する。Figure 18 is a configuration diagram showing the conceptual configuration of the emotion estimation device 70. Also, Figure 19 is a configuration diagram showing the conceptual configuration of a learning device 80 that learns the AI model of the emotion estimation device 70. Note that the same reference numerals are used for configurations similar to those in Figures 1, 6, 16, 17, etc., and detailed descriptions are omitted.

感情推定装置７０は、カメラ撮影画像及びマイク集音音声を入力し、挙動判定部２３２はこれらカメラ撮影画像及びマイク集音音声に基づき、視線、顔向き、音声のデータを生成し、これら視線、顔向き、音声のデータを学習済みの第４ＡＩモデル１２１ｍＤ及び第５ＡＩモデル１２１ｍＥに入力データとして出力する。The emotion estimation device 70 inputs the images captured by the camera and the audio picked up by the microphone, and the behavior determination unit 232 generates data on the gaze, face direction, and audio based on the images captured by the camera and the audio picked up by the microphone, and outputs the gaze, face direction, and audio data as input data to the trained fourth AI model 121mD and fifth AI model 121mE.

第４ＡＩモデル１２１ｍＤは、視線、顔向き、音声データの入力に対して量子化された覚醒度を推定する学習済みＡＩモデルであり、被験者であるユーザＵ２の量子化感情指標値情報の量子化覚醒度を、感情推定モデル７２２ｍ（心理平面モデル部分）に出力する。第５ＡＩモデル１２１ｍＥは、視線、顔向き、音声データの入力に対して量子化された活性度を推定する学習済みＡＩモデルであり、被験者であるユーザＵ２の量子化感情指標値情報の量子化活性度を、感情推定モデル７２２ｍ（心理平面モデル部分）に出力する。The fourth AI model 121mD is a trained AI model that estimates quantized arousal levels for inputs of gaze, facial direction, and voice data, and outputs the quantized arousal levels of the quantized emotion index value information of the subject user U2 to the emotion estimation model 722m (psychological plane model portion). The fifth AI model 121mE is a trained AI model that estimates quantized activity levels for inputs of gaze, facial direction, and voice data, and outputs the quantized activity levels of the quantized emotion index value information of the subject user U2 to the emotion estimation model 722m (psychological plane model portion).

なお、量子化は、感情指標値（覚醒度及び活性度）を心理平面モデルの境界値を閾値として量子化したものであり、覚醒度は覚醒／非覚醒に量子化（符号化）され、活性度は活性／非活性に量子化（符号化）される。また、ニュートラル領域境界を閾値として量子化しても良く、その場合、覚醒度は覚醒／ニュートラル／非覚醒に量子化（符号化）され、活性度は活性／ニュートラル／非活性に量子化（符号化）される。The quantization is performed by quantizing the emotional index values (arousal and activity) using the boundary values of the psychological plane model as thresholds, with arousal being quantized (encoded) into arousal/non-arousal and activity being quantized (encoded) into activity/non-activity. Quantization may also be performed using the neutral region boundary as a threshold, in which case arousal is quantized (encoded) into arousal/neutral/non-arousal and activity is quantized (encoded) into activity/neutral/non-activity.

感情推定モデル７２２ｍは、量子化覚醒度及び量子化活性度を心理平面モデルに適用（該当象限領域の決定、或いはニュートラル領域の決定）して感情を推定し、当該推定した感情情報を出力する。なお、感情推定モデル７２２ｍは、図１に示した感情推定モデル２２２ｍと同様の構成であり、図１に示した心理平面テーブル２２４と同様構成の心理平面テーブル７２４を用いて感情を推定する。The emotion estimation model 722m estimates emotion by applying the quantized arousal level and the quantized activity level to the psychological plane model (determining the corresponding quadrant area or the neutral area) and outputs the estimated emotion information. Note that the emotion estimation model 722m has the same configuration as the emotion estimation model 222m shown in FIG. 1, and estimates emotion using a psychological plane table 724 that has the same configuration as the psychological plane table 224 shown in FIG. 1.

学習装置８０は、脳波センサＳ１１及び心拍センサＳ１２が検出した脳波情報及び心拍情報を入力する。指標値算出部１３２は、これら脳波情報及び心拍情報に基づき、感情指標である覚醒度及び活性度を算出し、正規化部１３３に出力する。そして、正規化部１３３は、これら覚醒度及び活性度を正規化し、正規化覚醒度及び正規化活性度を量子化部８３７に出力する。The learning device 80 inputs the brain wave information and heart rate information detected by the brain wave sensor S11 and the heart rate sensor S12. The index value calculation unit 132 calculates the arousal level and activity level, which are emotion indices, based on the brain wave information and heart rate information, and outputs them to the normalization unit 133. The normalization unit 133 then normalizes the arousal level and activity level, and outputs the normalized arousal level and normalized activity level to the quantization unit 837.

量子化部８３７は、正規化覚醒度及び正規化活性度を量子化し、量子化覚醒度及び量子化活性度を学習前の第４ＡＩモデル１２１ｍｄ及び第５ＡＩモデル１２１ｍｅに正解値として、各々出力する。量子化は、感情指標値（覚醒度及び活性度）を心理平面モデルの境界値を閾値として量子化したものであり、覚醒度は覚醒／非覚醒に量子化（符号化）され、活性度は活性／非活性に量子化（符号化）される。また、ニュートラル領域境界を閾値として量子化しても良く、その場合、覚醒度は覚醒／ニュートラル／非覚醒に量子化（符号化）され、活性度は活性／ニュートラル／非活性に量子化（符号化）される。また、量子化閾値は、設計者等が実験等に基づき設定することも可能であるが、正規化部１３３が正規化処理時に算出したニュートラル領域境界値を利用することもできる。The quantization unit 837 quantizes the normalized arousal level and normalized activity level, and outputs the quantized arousal level and the quantized activity level to the fourth AI model 121md and the fifth AI model 121me before learning as correct values, respectively. Quantization is performed by quantizing the emotional index values (arousal level and activity level) using the boundary values of the psychological plane model as thresholds, and the arousal level is quantized (encoded) into arousal/non-arousal, and the activity level is quantized (encoded) into activity/non-activity. Quantization may also be performed using the neutral region boundary as a threshold, in which case the arousal level is quantized (encoded) into arousal/neutral/non-arousal, and the activity level is quantized (encoded) into activity/neutral/non-activity. The quantization threshold can be set by a designer or the like based on experiments, or the neutral region boundary value calculated by the normalization unit 133 during the normalization process can be used.
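
The two quantization schemes can be sketched as simple threshold comparisons. The label strings, the function names, and the treatment of values lying exactly on a threshold are assumptions; the text specifies only the thresholds to be used.

```python
def quantize_two_level(value, boundary, labels=("non-aroused", "aroused")):
    """Binary quantization at a psychological-plane boundary value."""
    return labels[1] if value >= boundary else labels[0]


def quantize_three_level(value, neutral_lower, neutral_upper,
                         labels=("non-aroused", "neutral", "aroused")):
    """Ternary quantization using the neutral-region boundaries as thresholds."""
    if value > neutral_upper:
        return labels[2]
    if value < neutral_lower:
        return labels[0]
    return labels[1]   # inside the neutral region, boundaries included
```

For the activity level, the same functions can be called with labels such as `("inactive", "active")` or `("inactive", "neutral", "active")`. The quantized outputs then serve as class labels for training the fourth and fifth AI models.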

The learning device 80 also receives images captured by the camera and audio picked up by the microphone. Based on the captured images and the collected audio, the behavior determination unit 134 generates gaze, face orientation, and voice data, and outputs this data to the fourth AI model 121md and the fifth AI model 121me as input data.

The learning device 80 then adjusts parameters such as the weights of the fourth AI model 121md and the fifth AI model 121me using a learning algorithm such as error backpropagation. The learning device 80 executes this learning process on the fourth AI model 121md and the fifth AI model 121me using the above data sequentially input (measured) during the learning period, and thereby generates the trained fourth AI model 121mD and the trained fifth AI model 121mE.
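The parameter adjustment described above might be sketched as follows. This is not the architecture of the fourth or fifth AI model, which the specification does not detail. It is a single-layer logistic model trained by stochastic gradient descent, which is the simplest special case of error backpropagation. The feature layout (gaze offset, face yaw, voice level), the numeric values, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(inputs: Sequence[Sequence[float]],
          targets: Sequence[int],
          epochs: int = 500,
          lr: float = 0.5,
          seed: int = 0) -> Tuple[List[float], float]:
    """Fit weights and bias so that the model output approaches the
    correct answer values (quantized index: 1 = high, 0 = low)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in inputs[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return the estimated quantized index value (1 or 0)."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical appearance features: [gaze offset, face yaw, voice level]
appearance = [[0.1, 0.0, 0.9], [0.2, 0.1, 0.8], [0.9, 0.8, 0.1], [0.8, 0.9, 0.2]]
quantized_arousal = [1, 1, 0, 0]  # correct answer values from the biosignal side

weights, bias = train(appearance, quantized_arousal)
```

In this sketch, one such model would be trained with quantized arousal as the correct answer value and a second, independent model with quantized activity, mirroring the separate fourth and fifth AI models.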

As described above, according to the learning device 80 of this modified example, the emotion index values (arousal level and activity level), which vary between individuals, are normalized, a quantized emotion index value is estimated from the normalized emotion index value, and the estimated quantized emotion index value is used as the correct answer value for AI training. The AI model is therefore trained with the influence of individual differences and the like suppressed, and a highly accurate trained AI model (for quantized emotion index value estimation) is generated. As a result, the accuracy of emotion estimation using the quantized emotion index value can be expected to improve.

<7. Notes>
The various technical features disclosed as embodiments in this specification can be modified in various ways without departing from the spirit of the technical creation. That is, the above embodiments are illustrative in all respects and are not restrictive. The technical scope of the present invention is indicated by the claims rather than by the description of the above embodiments, and includes all modifications that fall within the meaning and scope equivalent to the claims. In addition, the multiple embodiments shown in this specification may be combined as appropriate to the extent possible.

In the above embodiments, various functions are described as being realized in software through arithmetic processing by a CPU according to a program, but at least some of these functions may be realized by electrical hardware resources. The hardware resources may be, for example, an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). Conversely, at least some of the functions described as being realized by hardware resources may be realized in software.

A computer program that causes a processor (computer) to implement at least some of the functions of each of the learning device 10 and the emotion estimation device 20 may also be included. Such a computer program can be stored on a computer-readable non-volatile recording medium (for example, in addition to the non-volatile memory described above, an optical recording medium such as an optical disk, a magneto-optical recording medium such as a magneto-optical disk, a USB memory, or an SD card) and provided (sold, etc.) in that form. It can also be provided from a server device via a communication line such as the Internet, that is, by download.

10, 60, 80 Learning device
13 Controller
20, 50, 70 Emotion estimation device
22 Memory unit
23 Controller
30 Vehicle control device
31 Controller
40 Vehicle control system
41 Actuator unit
42 Notification unit
121mA, 121ma First AI model
121mB, 121mb Second AI model
131 Acquisition unit
132 Index value calculation unit
133 Normalization unit
134 Behavior determination unit
135 Generation unit
136 Provision unit
222m Emotion estimation model
231 Acquisition unit
232 Behavior determination unit
233 Emotion estimation unit
234 Provision unit
NR Neutral region
NR1 Arousal neutral region
NR2 Activity neutral region
U1, U2 User

Claims

A learning method comprising:
acquiring a biosignal and appearance information of a subject at the same timing;
calculating an emotion index value based on the acquired biosignal;
normalizing the calculated emotion index value to calculate a normalized emotion index value;
generating supervised learning data in which the acquired appearance information is an input value and the normalized emotion index value is a correct answer value; and
training an AI model with the generated supervised learning data.

The learning method according to claim 1, wherein
the normalized emotion index value is calculated based on a normalization method that uses the emotion index value calculated when the subject performs a task in which the emotion index value is estimated to enter a specific state.

The learning method according to claim 2, wherein
the normalization method comprises:
in the specific states, which are a high index value state in which the emotion index value of the subject swings toward the high index value side and a low index value state in which the emotion index value swings toward the low index value side,
calculating a lower boundary of the emotion index value in the high index value state and an upper boundary of the emotion index value in the low index value state; and
normalizing such that the region between the lower boundary and the upper boundary becomes a predetermined standard boundary.
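By way of illustration only, the boundary-based normalization recited above might be sketched as follows. The claim does not state how the boundaries are computed or what form the mapping takes. This sketch assumes that the boundaries are the minimum value observed in the high index value state and the maximum value observed in the low index value state, that the mapping is affine, and that the standard boundary is (-0.2, 0.2). All names and values are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def measured_boundaries(high_state_values: Sequence[float],
                        low_state_values: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) edges of the region between the two states.

    upper: lower boundary of the index values in the high index value state
    lower: upper boundary of the index values in the low index value state
    """
    upper = min(high_state_values)
    lower = max(low_state_values)
    if lower >= upper:
        raise ValueError("high and low states overlap; boundaries are undefined")
    return lower, upper


def make_normalizer(lower: float,
                    upper: float,
                    std_lower: float = -0.2,
                    std_upper: float = 0.2) -> Callable[[float], float]:
    """Build an affine map sending [lower, upper] onto [std_lower, std_upper]."""
    scale = (std_upper - std_lower) / (upper - lower)

    def normalize(value: float) -> float:
        return std_lower + (value - lower) * scale

    return normalize


# Hypothetical index values measured while the subject performs
# tasks expected to raise or lower the index.
high_task = [1.8, 2.0, 2.4]
low_task = [0.6, 0.9, 1.0]

lower, upper = measured_boundaries(high_task, low_task)
normalize = make_normalizer(lower, upper)
```

With these example values the subject-specific region from 1.0 to 1.8 is mapped onto the common region from -0.2 to 0.2, so that index values from different subjects share the same boundaries after normalization.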

The learning method according to claim 2, wherein
the normalization method comprises:
in the specific state in which the emotion index value of the subject falls within a neutral region,
calculating a lower boundary and an upper boundary of the emotion index value in the specific state; and
normalizing such that the region between the lower boundary and the upper boundary becomes a predetermined standard boundary.

The learning method according to claim 1, comprising:
storing data of a plurality of the emotion index values; and
performing the normalization based on a result of statistical processing of the stored plurality of emotion index values.
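As one possible reading of the statistics-based normalization recited above, the stored index values could be summarized by their mean and standard deviation and each new value converted to a standard score. The claim only requires "statistical processing", so the choice of a z-score here, as well as the class and method names, is an assumption made for illustration.

```python
import statistics
from typing import List


class StatisticalNormalizer:
    """Stores emotion index values and normalizes new values against them."""

    def __init__(self) -> None:
        self._values: List[float] = []

    def store(self, value: float) -> None:
        """Accumulate one emotion index value."""
        self._values.append(float(value))

    def normalize(self, value: float) -> float:
        """Return the z-score of `value` relative to the stored values."""
        if len(self._values) < 2:
            raise ValueError("at least two stored values are required")
        mean = statistics.fmean(self._values)
        spread = statistics.pstdev(self._values)
        if spread == 0:
            raise ValueError("stored values have zero spread")
        return (value - mean) / spread


normalizer = StatisticalNormalizer()
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:  # hypothetical stored index values
    normalizer.store(v)
```

Other statistics, such as percentiles of the stored values, would fit the same interface.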

A learning method for an AI model that estimates, from appearance information of a subject, an emotion index value for emotion estimation that is calculated using an index value calculation method based on a biosignal of the subject, the learning method comprising:
acquiring the biosignal and the appearance information of the subject at the same timing;
calculating an emotion index value using the index value calculation method based on the acquired biosignal;
normalizing the calculated emotion index value to calculate a normalized emotion index value;
generating supervised learning data in which the acquired appearance information is an input value and the normalized emotion index value is a correct answer value; and
training the AI model with the generated supervised learning data.

A learning method comprising:
acquiring a biosignal and appearance information of a subject at the same timing;
calculating a plurality of emotion index values based on the acquired biosignal;
normalizing the calculated plurality of emotion index values to calculate a plurality of normalized emotion index values;
estimating an emotion based on the calculated plurality of normalized emotion index values;
generating supervised learning data in which the acquired appearance information is an input value and the estimated emotion is a correct answer value; and
training an AI model with the generated supervised learning data.

A learning device that:
acquires a biosignal and appearance information of a subject at the same timing;
calculates an emotion index value based on the acquired biosignal;
normalizes the calculated emotion index value to calculate a normalized emotion index value;
generates supervised learning data in which the acquired appearance information is an input value and the normalized emotion index value is a correct answer value; and
trains an AI model with the generated supervised learning data.

A learning program that:
acquires a biosignal and appearance information of a subject at the same timing;
calculates an emotion index value based on the acquired biosignal;
normalizes the calculated emotion index value to calculate a normalized emotion index value;
generates supervised learning data in which the acquired appearance information is an input value and the normalized emotion index value is a correct answer value; and
trains an AI model with the generated supervised learning data.

A learning method for an AI model that estimates, from appearance information of a subject, an arousal level calculated based on brain waves of the subject and an activity level calculated based on a heartbeat of the subject, the learning method comprising:
acquiring the brain waves, the heartbeat, and the appearance information of the subject at the same timing;
calculating the arousal level based on a ratio of β waves to α waves in the acquired brain waves;
calculating the activity level based on a standard deviation of a low-frequency component of a waveform signal of the acquired heartbeat;
normalizing the calculated arousal level and activity level to calculate a normalized arousal level and a normalized activity level;
generating supervised arousal level learning data in which the acquired appearance information is an input value and the normalized arousal level is a correct answer value;
generating supervised activity level learning data in which the acquired appearance information is an input value and the normalized activity level is a correct answer value;
training the AI model for estimating the arousal level with the generated supervised arousal level learning data; and
training the AI model for estimating the activity level with the generated supervised activity level learning data.
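The two index value calculations recited above might be illustrated as follows. The claim does not specify frequency bands, windowing, or filtering. This sketch assumes the conventional α band (8 to 13 Hz) and β band (13 to 30 Hz), a direct discrete Fourier transform for band power, and a simple moving average as a stand-in for extracting the low-frequency component of the heartbeat signal. All names, band edges, and the window length are hypothetical.

```python
import cmath
import math
import statistics
from typing import List, Sequence


def band_power(signal: Sequence[float], fs: float, f_lo: float, f_hi: float) -> float:
    """Sum of squared DFT magnitudes for bins with f_lo <= frequency < f_hi."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            power += abs(coeff) ** 2
    return power


def arousal_level(eeg: Sequence[float], fs: float) -> float:
    """Arousal index as the ratio of beta-band power to alpha-band power."""
    alpha = band_power(eeg, fs, 8.0, 13.0)
    beta = band_power(eeg, fs, 13.0, 30.0)
    if alpha == 0:
        raise ValueError("alpha-band power is zero")
    return beta / alpha


def activity_level(heartbeat: Sequence[float], window: int = 5) -> float:
    """Activity index as the standard deviation of a low-pass filtered signal.

    The moving average is only a crude low-pass filter used for illustration.
    """
    if len(heartbeat) < window:
        raise ValueError("signal shorter than smoothing window")
    low_freq: List[float] = [
        statistics.fmean(heartbeat[i:i + window])
        for i in range(len(heartbeat) - window + 1)
    ]
    return statistics.pstdev(low_freq)


# Synthetic one-second EEG at 128 Hz: a 10 Hz (alpha) component of
# amplitude 1 plus a 20 Hz (beta) component of amplitude 2.
FS = 128
eeg = [math.sin(2 * math.pi * 10 * t / FS) + 2 * math.sin(2 * math.pi * 20 * t / FS)
       for t in range(FS)]
```

For the synthetic signal, the β component has twice the amplitude of the α component, so the power ratio returned by `arousal_level(eeg, FS)` is approximately 4.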

An emotion estimation device that acquires appearance information of a subject and inputs the acquired appearance information to an AI model to estimate an emotion of the subject, wherein
the AI model is generated by:
acquiring a biosignal and appearance information of the subject at the same timing;
calculating a plurality of emotion index values based on the acquired biosignal;
normalizing the calculated plurality of emotion index values to calculate a plurality of normalized emotion index values;
estimating an emotion based on the calculated plurality of normalized emotion index values; and
training with supervised learning data in which the acquired appearance information is an input value and the estimated emotion is a correct answer value.

An emotion estimation device that has an AI model that estimates, from appearance information of a subject, an arousal level calculated based on brain waves of the subject and an activity level calculated based on a heartbeat of the subject,
and that acquires the appearance information of the subject and inputs the acquired appearance information to the AI model to estimate an emotion of the subject, wherein
the AI model is generated by:
acquiring the brain waves, the heartbeat, and the appearance information of the subject at the same timing;
calculating the arousal level based on a ratio of β waves to α waves in the acquired brain waves;
calculating the activity level based on a standard deviation of a low-frequency component of a waveform signal of the acquired heartbeat;
normalizing the calculated arousal level and activity level to calculate a normalized arousal level and a normalized activity level;
generating supervised arousal level learning data in which the acquired appearance information is an input value and the normalized arousal level is a correct answer value;
generating supervised activity level learning data in which the acquired appearance information is an input value and the normalized activity level is a correct answer value;
training the AI model for estimating the arousal level with the generated supervised arousal level learning data; and
training the AI model for estimating the activity level with the generated supervised activity level learning data.