JP2019202131A


Info

Publication number: JP2019202131A
Application number: JP2019090107A
Authority: JP
Inventors: Kazuhito Horiuchi (堀内 一仁); Nobuyuki Watanabe (渡辺 伸之); Yoshioki Kaneko (金子 善興); Hidetoshi Nishimura (西村 英敏)
Original assignee: Olympus Corp
Current assignee: Olympus Corp
Priority date: 2018-05-17
Filing date: 2019-05-10
Publication date: 2019-11-28

Abstract

To provide an information processing apparatus, an information processing method, and a program capable of correcting the time difference between the gaze point of a user's line of sight and the time at which a voice is uttered.

SOLUTION: An information processing apparatus 1 comprises: an analysis unit 17 that analyzes, on the basis of line-of-sight data, a gaze period in which the gaze degree of a user's line of sight for each of a plurality of observation locations in a correction image displayed by a display unit 12 is equal to or greater than a predetermined value; a setting unit 18 that sets, on the basis of correction audio data recorded in a correction audio data storage unit 165, the period during which a voice is uttered in the audio data as an important voice period; and a calibration generation unit 19 that generates the time difference between the gaze period and the important voice period as calibration data for the user and records it in a recording unit 16.

SELECTED DRAWING: Figure 1

Description

Translated from Japanese

The present disclosure relates to an information processing apparatus, an information processing method, and a program for processing audio data and line-of-sight data.

In recent years, the following technique has become known for information processing apparatuses that process information such as image data. In a period going back a predetermined time from when the user's voice is detected, the display area on which the user's line of sight stayed the longest, among a plurality of display areas on the image displayed by the display unit, is detected as attention information, and this attention information and the voice are recorded in association with each other (see Patent Document 1).

A technique is also known in which a gaze annotation system displays an annotation anchor on an image displayed by the display device of a computing device, near the gaze point at which the user is looking as detected by a gaze tracking device, and information is input to this annotation anchor by voice (see Patent Document 2).

Patent Document 1: Japanese Patent No. 4282343
Patent Document 2: Japanese Patent Laid-Open No. 2016-181245

Depending on the user, however, the timing of the gaze point of the line of sight and the timing at which the voice is uttered may differ. Patent Documents 1 and 2 described above give no consideration to the time difference between the gaze point of the line of sight and the time at which the voice is uttered.

The present invention has been made in view of the above, and an object thereof is to provide an information processing apparatus, an information processing method, and a program capable of correcting, in accordance with the user, the time difference between the gaze point of the line of sight and the time at which the voice is uttered.

To solve the problems described above and achieve the object, an information processing apparatus according to the present disclosure includes: a display unit that displays a correction image in which the coordinate position of each of a plurality of observation locations is set; a correction audio data recording unit that records correction audio data in which the voice to be uttered at each of the plurality of observation locations of the correction image is set; a line-of-sight detection unit that generates line-of-sight data by continuously detecting the user's line of sight; a voice input unit that generates audio data associated with the same time axis as the line-of-sight data by receiving input of the user's voice; an analysis unit that analyzes, based on the line-of-sight data generated by the line-of-sight detection unit, a gaze period in which the gaze degree of the user's line of sight with respect to each of the plurality of observation locations is equal to or greater than a predetermined value; a setting unit that sets, based on the correction audio data recorded by the correction audio data recording unit, the period during which the voice was uttered in the audio data as an important voice period; and a calibration generation unit that generates calibration data for the user based on the time difference between the gaze period and the important voice period and records the calibration data in a recording unit.

In the information processing apparatus according to the present disclosure, in the above disclosure, the calibration generation unit calculates the time difference between the gaze period and the important voice period a plurality of times and generates the calibration data based on a statistical feature of the plurality of calculation results.
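As an illustration of the repeated measurement described above, the following minimal sketch aggregates several measured time differences into a single per-user offset. The disclosure only says "a statistical feature"; the choice of the median, the function name `calibration_offset`, and the sample values are assumptions made for this example.

```python
from statistics import median


def calibration_offset(time_differences):
    """Return one per-user offset (seconds) from repeated measurements.

    The median is an assumed choice of "statistical feature"; it is less
    sensitive to a single outlying measurement than the mean.
    """
    if not time_differences:
        raise ValueError("at least one measured time difference is required")
    return median(time_differences)


# Hypothetical differences (speech start minus gaze start), one per
# observation location of the correction image.
samples = [0.42, 0.38, 0.45, 1.90, 0.40]
offset = calibration_offset(samples)
print(offset)
```

A mean, a trimmed mean, or a mode over binned values would fit the same interface if a different statistical feature is preferred.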

In the information processing apparatus according to the present disclosure, in the above disclosure, the calibration data relates to a time difference obtained when the start time or the end time of either the line-of-sight data or the audio data is used as a reference.

In the information processing apparatus according to the present disclosure, in the above disclosure, the calibration data relates to the length of a period obtained when either the line-of-sight data or the audio data is used as a reference.

In the information processing apparatus according to the present disclosure, in the above disclosure, the calibration generation unit generates, as the calibration data, the time difference between the start time of the gaze period and the start time of the important voice period, or the time difference between the end time of the gaze period and the end time of the important voice period.
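The candidate calibration quantities named above (start-time difference, end-time difference, and a difference in period length) can be sketched for one gaze period and one important voice period as follows. The `Period` type, the sign convention (voice minus gaze), and the dictionary keys are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    """A time interval in seconds on the shared time axis."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def time_differences(gaze: Period, voice: Period) -> dict:
    """Compare one gaze period with one important voice period.

    Positive values mean the voice lags (or outlasts) the gaze.
    """
    return {
        "start_diff": voice.start - gaze.start,
        "end_diff": voice.end - gaze.end,
        "duration_diff": voice.duration - gaze.duration,
    }


# Hypothetical periods: the user looks at a location at 1.0 s and starts
# speaking about it 0.5 s later.
print(time_differences(Period(1.0, 3.0), Period(1.5, 4.0)))
```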

In the information processing apparatus according to the present disclosure, in the above disclosure, the setting unit further sets, as the important voice period, a period in which a keyword designated in advance in response to an externally input operation signal is uttered.

In the information processing apparatus according to the present disclosure, in the above disclosure, after the calibration generation unit has generated the calibration data, the line-of-sight detection unit acquires new line-of-sight data by continuously detecting the user's line of sight; after the calibration generation unit has generated the calibration data, the voice input unit generates new audio data that is obtained by receiving input of the user's voice and is associated with the same time axis as the new line-of-sight data; the analysis unit analyzes, based on the new line-of-sight data, a new gaze period in which a new gaze degree of the user's line of sight is equal to or greater than a predetermined value; and the setting unit corrects, based on the calibration data, the time difference between the new gaze period and a new important voice period included in the new audio data.
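One way to picture the correction step is to shift a newly detected important voice period by the stored offset and then check how well it lines up with a gaze period. This is a sketch under the assumption that the calibration data is a single scalar offset in seconds (voice lagging gaze); the function names are hypothetical.

```python
def correct_voice_period(voice_start, voice_end, offset):
    """Shift a voice period earlier by the calibrated lag (seconds)."""
    return voice_start - offset, voice_end - offset


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, or 0.0 if disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


# Hypothetical values: the user typically speaks 0.5 s after gazing.
corrected = correct_voice_period(5.5, 7.0, 0.5)
print(corrected)
print(overlap(*corrected, 4.0, 6.0))
```

After the shift, the voice period can be matched to the gaze period it overlaps most, which is one plausible way to associate speech with the location being observed.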

In the information processing apparatus according to the present disclosure, in the above disclosure, the setting unit assigns to the new audio data an importance corresponding to the new gaze degree, using the new gaze degree and the calibration data, and records the result in the recording unit.

In the information processing apparatus according to the present disclosure, in the above disclosure, the setting unit sets, for the new audio data, an utterance period in which speech occurs in the new audio data, assigns to the new audio data an importance corresponding to the new gaze degree, using the utterance period, the new gaze degree, and the calibration data, and records the result in the recording unit.

In the information processing apparatus according to the present disclosure, in the above disclosure, the setting unit sets a period in which a predetermined keyword is uttered in the new audio data as a new important utterance period, assigns to the new line-of-sight data an importance corresponding to the new important utterance period, using the new important utterance period and the calibration data, and records the result in the recording unit.

In the information processing apparatus according to the present disclosure, in the above disclosure, the setting unit sets a period in which a predetermined keyword is uttered in the new audio data as a new important utterance period, assigns to the new line-of-sight data an importance corresponding to the important utterance period, using the new gaze period, the new important utterance period, and the calibration data, and records the result in the recording unit.

In the information processing apparatus according to the present disclosure, in the above disclosure, the analysis unit analyzes the gaze degree by detecting any one of the movement speed of the line of sight, the movement distance of the line of sight within a fixed time, and the dwell time of the line of sight within a fixed area.
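Of the three cues listed above, movement speed is the simplest to sketch. The example below treats low point-to-point speed as a high gaze degree and reports intervals where that condition holds for a minimum time. The sample format `(t, x, y)`, the thresholds, and the function name `gaze_periods` are assumptions for illustration, not the analysis method of the disclosure.

```python
import math


def gaze_periods(samples, max_speed, min_duration):
    """Find intervals where the gaze moves slowly enough to count as gazing.

    samples: list of (t, x, y) tuples sorted by time t (seconds, pixels).
    max_speed: speed threshold in pixels per second (assumed criterion).
    min_duration: shortest interval, in seconds, reported as a gaze period.
    """
    periods = []
    start = end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed <= max_speed:
            if start is None:
                start = t0
            end = t1
        else:
            if start is not None and end - start >= min_duration:
                periods.append((start, end))
            start = None
    if start is not None and end - start >= min_duration:
        periods.append((start, end))
    return periods


# Hypothetical trace: two slow stretches separated by one fast jump.
trace = [(0.0, 0, 0), (0.1, 1, 0), (0.2, 1, 1), (0.3, 100, 100),
         (0.4, 100, 101), (0.5, 101, 101), (0.6, 101, 102)]
print(gaze_periods(trace, max_speed=50, min_duration=0.15))
```

The distance-within-a-fixed-time and dwell-time cues could be handled with the same loop structure by changing the per-step criterion.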

The information processing apparatus according to the present disclosure, in the above disclosure, further includes a generation unit that generates line-of-sight mapping data in which the new gaze degree analyzed by the analysis unit and coordinate information of the new gaze degree are associated with an image corresponding to externally input image data.

In the information processing apparatus according to the present disclosure, in the above disclosure, the analysis unit further analyzes the trajectory of the user's line of sight based on the new line-of-sight data, and the generation unit generates the line-of-sight mapping data by further associating the trajectory analyzed by the analysis unit with the image.

The information processing apparatus according to the present disclosure, in the above disclosure, further includes a conversion unit that converts the new audio data into text information, and the generation unit generates the line-of-sight mapping data by further associating the text information with the coordinate information.

The information processing apparatus according to the present disclosure, in the above disclosure, further includes: a microscope that can change the observation magnification at which a specimen is observed and that has an eyepiece through which the user can observe an observation image of the specimen; and an imaging unit that is connected to the microscope and generates image data by capturing the observation image of the specimen formed by the microscope. The line-of-sight detection unit is provided in the eyepiece of the microscope, and the setting unit weights the importance according to the observation magnification.

The information processing apparatus according to the present disclosure, in the above disclosure, further includes an endoscope having: an imaging unit that is provided at the distal end of an insertion portion insertable into a subject and generates image data by imaging the inside of the subject; and an operation unit that receives input of various operations for changing the field of view.

The information processing apparatus according to the present disclosure, in the above disclosure, further includes a generation unit that generates data associating the new line-of-sight data having the new gaze degree corresponding to the new gaze period with the new audio data having the importance corresponding to the new important voice period.

An information processing method according to the present disclosure includes: a display step of displaying a correction image in which the coordinate position of each of a plurality of observation locations is set; a line-of-sight detection step of generating line-of-sight data by continuously detecting the user's line of sight; a voice input step of generating audio data associated with the same time axis as the line-of-sight data by receiving input of the user's voice; an analysis step of analyzing, based on the line-of-sight data generated in the line-of-sight detection step, a gaze period in which the gaze degree of the user's line of sight with respect to each of the plurality of observation locations is equal to or greater than a predetermined value; a setting step of setting, based on correction audio data recorded by a correction audio data recording unit that records correction audio data in which the voice to be uttered at each of the plurality of observation locations of the correction image is set, the time during which the voice was uttered in the audio data as an important voice period; and a calibration generation step of generating the time difference between the gaze period and the important voice period as calibration data for the user and recording the calibration data in a recording unit.

A program according to the present disclosure causes an information processing apparatus to execute: a display step of displaying a correction image in which the coordinate position of each of a plurality of observation locations is set; a line-of-sight detection step of generating line-of-sight data by continuously detecting the user's line of sight; a voice input step of generating audio data associated with the same time axis as the line-of-sight data by receiving input of the user's voice; an analysis step of analyzing, based on the line-of-sight data generated in the line-of-sight detection step, a gaze period in which the gaze degree of the user's line of sight with respect to each of the plurality of observation locations is equal to or greater than a predetermined value; a setting step of setting, based on correction audio data recorded by a correction audio data recording unit that records correction audio data in which the voice to be uttered at each of the plurality of observation locations of the correction image is set, the time during which the voice was uttered in the audio data as an important voice period; and a calibration generation step of generating the time difference between the gaze period and the important voice period as calibration data for the user and recording the calibration data in a recording unit.

According to the present disclosure, the time difference between the gaze point of the line of sight and the time at which the voice is uttered can be corrected.

FIG. 1 is a block diagram illustrating a functional configuration of the information processing apparatus according to Embodiment 1 of the present disclosure.
FIG. 2 is a flowchart illustrating an overview of processing executed by the information processing apparatus according to Embodiment 1 of the present disclosure.
FIG. 3 is a diagram schematically illustrating how the analysis unit according to Embodiment 1 of the present disclosure analyzes a gaze period in which the gaze degree of the line of sight is equal to or greater than a predetermined value.
FIG. 4 is a diagram schematically illustrating how the setting unit according to Embodiment 1 of the present disclosure sets an important voice period of audio data.
FIG. 5A is a diagram schematically illustrating a method by which the calibration generation unit according to Embodiment 1 of the present disclosure generates calibration data.
FIG. 5B is a diagram schematically illustrating another method by which the calibration generation unit according to Embodiment 1 of the present disclosure generates calibration data.
FIG. 6A is a block diagram illustrating a functional configuration of the information processing apparatus according to Embodiment 2 of the present disclosure.
FIG. 6B is a flowchart illustrating an overview of processing executed by the information processing apparatus according to Embodiment 2 of the present disclosure.
FIG. 7A is a block diagram illustrating a functional configuration of an information processing apparatus according to a modification of Embodiment 2 of the present disclosure.
FIG. 7B is a flowchart illustrating an overview of processing executed by the information processing apparatus according to the modification of Embodiment 2 of the present disclosure.
FIG. 8 is a block diagram illustrating a functional configuration of the information processing system according to Embodiment 3 of the present disclosure.
FIG. 9 is a flowchart illustrating an overview of processing executed by the information processing apparatus according to Embodiment 3 of the present disclosure.
FIG. 10 is a diagram schematically illustrating an example of an image displayed by the display unit according to Embodiment 3 of the present disclosure.
FIG. 11 is a diagram schematically illustrating another example of an image displayed by the display unit according to Embodiment 3 of the present disclosure.
FIG. 12 is a block diagram illustrating a functional configuration of an information processing system according to a modification of Embodiment 3 of the present disclosure.
FIG. 13 is a flowchart illustrating an overview of processing executed by the information processing apparatus according to the modification of Embodiment 3 of the present disclosure.
FIG. 14 is a schematic diagram illustrating a configuration of the information processing apparatus according to Embodiment 4 of the present disclosure.
FIG. 15 is a schematic diagram illustrating a configuration of the information processing apparatus according to Embodiment 4 of the present disclosure.
FIG. 16 is a block diagram illustrating a functional configuration of the information processing apparatus according to Embodiment 4 of the present disclosure.
FIG. 17 is a flowchart illustrating an overview of processing executed by the information processing apparatus according to Embodiment 4 of the present disclosure.
FIG. 18 is a diagram illustrating an example of a line-of-sight mapping image displayed by the display unit according to Embodiment 4 of the present disclosure.
FIG. 19A is a diagram illustrating another example of the line-of-sight mapping image displayed by the display unit according to Embodiment 4 of the present disclosure.
FIG. 19B is a diagram illustrating an example of data in which line-of-sight data and audio data are associated, according to a modification of Embodiment 4 of the present disclosure.
FIG. 20 is a schematic diagram illustrating a configuration of the microscope system according to Embodiment 5 of the present disclosure.
FIG. 21 is a block diagram illustrating a functional configuration of the microscope system according to Embodiment 5 of the present disclosure.
FIG. 22 is a flowchart illustrating an overview of processing executed by the microscope system according to Embodiment 5 of the present disclosure.
FIG. 23 is a schematic diagram illustrating a configuration of the endoscope system according to Embodiment 6 of the present disclosure.
FIG. 24 is a block diagram illustrating a functional configuration of the endoscope system according to Embodiment 6 of the present disclosure.
FIG. 25 is a flowchart illustrating an overview of processing executed by the endoscope system according to Embodiment 6 of the present disclosure.
FIG. 26 is a diagram schematically illustrating an example of a plurality of images corresponding to a plurality of image data recorded by the image data recording unit according to Embodiment 6 of the present disclosure.
FIG. 27 is a diagram illustrating an example of an integrated image corresponding to the integrated image data generated by the image processing unit according to Embodiment 6 of the present disclosure.
FIG. 28 is a block diagram illustrating a functional configuration of the information processing system according to Embodiment 7 of the present disclosure.
FIG. 29 is a flowchart illustrating an overview of processing executed by the information processing system according to Embodiment 7 of the present disclosure.

Hereinafter, embodiments for carrying out the present disclosure will be described in detail with reference to the drawings. The present disclosure is not limited by the following embodiments. The drawings referred to in the following description only schematically show shapes, sizes, and positional relationships to the extent that the contents of the present disclosure can be understood. That is, the present disclosure is not limited to the shapes, sizes, and positional relationships illustrated in the drawings.

(Embodiment 1)
[Configuration of information processing apparatus]
FIG. 1 is a block diagram illustrating a functional configuration of the information processing apparatus according to Embodiment 1. The information processing apparatus 1 illustrated in FIG. 1 includes a line-of-sight detection unit 10, a voice input unit 11, a display unit 12, an operation unit 13, a control unit 14, a time measurement unit 15, a recording unit 16, an analysis unit 17, a setting unit 18, a calibration generation unit 19, a program storage unit 164, a correction audio data storage unit 165, and a calibration data storage unit 166.

The line-of-sight detection unit 10 is configured using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS or CCD sensor) that images the pupil point and the reflection point on the cornea. Under the control of the control unit 14, the line-of-sight detection unit 10 generates line-of-sight data by detecting the user's line of sight with respect to the correction image displayed by the display unit 12, and outputs this line-of-sight data to the control unit 14. Specifically, under the control of the control unit 14, the line-of-sight detection unit 10 irradiates the user's cornea with near-infrared light from the LED light source or the like, and the optical sensor images the pupil point and the reflection point on the user's cornea. Then, under the control of the control unit 14, the line-of-sight detection unit 10 analyzes the data generated by the optical sensor by image processing or the like, continuously calculates the user's line of sight from the pattern of the pupil point and the reflection point based on the analysis result, thereby generates line-of-sight data for a predetermined time, and outputs this line-of-sight data to the line-of-sight detection control unit 141 described later. Note that the line-of-sight detection unit 10 may instead generate the line-of-sight data by detecting the user's pupil with the optical sensor alone using well-known pattern matching, or by detecting the user's line of sight using another sensor or another well-known technique.

The voice input unit 11 is configured using a microphone to which voice is input and an audio codec that converts the voice received by the microphone into digital audio data, amplifies this audio data, and outputs it to the control unit 14. Under the control of the control unit 14, the voice input unit 11 generates audio data by receiving input of the user's voice and outputs this audio data to the control unit 14. In addition to voice input, the voice input unit 11 may be given an audio output function by providing a speaker or the like capable of outputting sound.

Under the control of the control unit 14, the display unit 12 displays a correction image in which the coordinate position of each of a plurality of observation locations is set. The display unit 12 is configured using a display panel such as a liquid crystal or organic EL (Electro Luminescence) panel.

The operation unit 13 receives input of various operations related to the information processing apparatus 1. The operation unit 13 is configured using, for example, a switch, a touch panel, a keyboard, and a mouse.

The control unit 14 is configured using a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a GPU (Graphics Processing Unit), and the like, and controls the line-of-sight detection unit 10, the voice input unit 11, and the display unit 12. The control unit 14 includes a line-of-sight detection control unit 141, a voice input control unit 142, and a display control unit 143.

The line-of-sight detection control unit 141 controls the line-of-sight detection unit 10. Specifically, the line-of-sight detection control unit 141 causes the line-of-sight detection unit 10 to irradiate the user U1 with near-infrared light at predetermined timings and to image the pupil of the user U1, thereby generating line-of-sight data. The line-of-sight detection control unit 141 also performs various types of image processing on the line-of-sight data input from the line-of-sight detection unit 10 and outputs the result to the recording unit 16.

The voice input control unit 142 controls the voice input unit 11, performs various kinds of processing such as gain increase and noise reduction on the voice data input from the voice input unit 11, and outputs the result to the recording unit 16.

The display control unit 143 controls the display mode of the display unit 12. The display control unit 143 causes the display unit 12 to display an image corresponding to the image data recorded in the recording unit 16, or a line-of-sight mapping image corresponding to the line-of-sight mapping data generated by the generation unit 113.

The time measurement unit 15 is configured using a timer, a clock generator, or the like, and attaches time information to the line-of-sight data generated by the line-of-sight detection unit 10, the voice data generated by the voice input unit 11, and the like.

The recording unit 16 is configured using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various types of information related to the information processing apparatus 1. The recording unit 16 includes a line-of-sight data recording unit 161, a voice data recording unit 162, and an image data recording unit 163.

The line-of-sight data recording unit 161 records the line-of-sight data input from the line-of-sight detection control unit 141 and outputs the line-of-sight data to the analysis unit 17.

The voice data recording unit 162 records the voice data input from the voice input control unit 142 and outputs the voice data to the conversion unit 135.

The image data recording unit 163 records a plurality of pieces of image data. Specifically, the image data recording unit 163 records correction image data in which the coordinate position of each of the plurality of observation locations is set.

The program storage unit 164 stores the various programs executed by the information processing apparatus 1, data used while those programs run (for example, dictionary information and text conversion dictionary information), and data being processed while those programs run.

The correction voice data storage unit 165 stores correction voice data in which the voice to be uttered at each of the plurality of observation locations of the correction image displayed by the display unit 12 is set.

The calibration data storage unit 166 stores the calibration data generated by the calibration generation unit 19, which is described later.

Based on the line-of-sight data recorded by the line-of-sight data recording unit 161, the analysis unit 17 analyzes gaze periods in which the gaze degree of the user's line of sight with respect to each of the plurality of observation locations on the correction image displayed by the display unit 12 is equal to or greater than a predetermined value. Specifically, the analysis unit 17 detects, from the line-of-sight data, any one of the movement speed of the line of sight, the movement distance of the line of sight within a fixed time, and the dwell time of the line of sight within a fixed region, and thereby analyzes the gaze periods in which the gaze degree of the line of sight (gaze point) is equal to or greater than the predetermined value. The analysis unit 17 is configured using, for example, a CPU, an FPGA, and a GPU.

Based on the correction voice data recorded by the correction voice data storage unit 165, the setting unit 18 sets, for the voice data recorded by the voice data recording unit 162, the period (time) in which the voice assigned to each of the plurality of observation locations in the correction image was uttered as an important voice period. The setting unit 18 is configured using a CPU, an FPGA, a GPU, and the like.

The calibration generation unit 19 generates the time difference between the gaze period analyzed by the analysis unit 17 and the important voice period set by the setting unit 18 as calibration data for the user, and records it in the calibration data storage unit 166 of the recording unit 16. The calibration data relates either to the time difference obtained when the start time or end time of one of the line-of-sight data and the voice data is taken as the reference, or to the length of the period obtained when one of the line-of-sight data and the voice data is taken as the reference. The calibration generation unit 19 calculates the time difference between the gaze period and the important voice period a plurality of times and generates the calibration data based on statistical features of these results. The calibration generation unit 19 may also generate, as the calibration data, the time difference between the start time of the gaze period and the start time of the important voice period, or the time difference between the end time of the gaze period and the end time of the important voice period. For example, although not illustrated, a calibrator or the user prepares both a correction image to be displayed and important words. Specifically, in the case of a microscope image, the calibrator or the user prepares an image that is known in advance to contain a given lesion and prepares terms (keywords) related to that lesion as the important words. The calibration generation unit 19 may obtain the calibration data by measuring the time difference from when the user fixes attention on the lesion until the user utters a given important word. To improve the accuracy of the calibration data, it is desirable that the calibration generation unit 19 perform the measurement a plurality of times.

[Processing of the information processing apparatus]
Next, the processing executed by the information processing apparatus 1 will be described. FIG. 2 is a flowchart showing an outline of the processing executed by the information processing apparatus 1.

As shown in FIG. 2, the display control unit 143 first causes the display unit 12 to display the correction image corresponding to the correction image data recorded by the image data recording unit 163 (step S101).

Subsequently, the control unit 14 associates the line-of-sight data generated by the line-of-sight detection unit 10 and the voice data generated by the voice input unit 11 each with the time measured by the time measurement unit 15, and records them in the line-of-sight data recording unit 161 and the voice data recording unit 162, respectively (step S102).

Thereafter, when an instruction signal for ending observation of the correction image displayed by the display unit 12 is input from the operation unit 13 (step S103: Yes), the information processing apparatus 1 proceeds to step S104, described later. When no such instruction signal has been input from the operation unit 13 (step S103: No), the information processing apparatus 1 returns to step S102 described above.

In step S104, based on the line-of-sight data recorded by the line-of-sight data recording unit 161, the analysis unit 17 analyzes the gaze periods in which the gaze degree of the user's line of sight with respect to each of the plurality of observation locations on the correction image displayed by the display unit 12 is equal to or greater than the predetermined value. After step S104, the information processing apparatus 1 proceeds to step S105, described later.

FIG. 3 is a diagram schematically illustrating how the analysis unit 17 analyzes a gaze period in which the gaze degree of the line of sight is equal to or greater than the predetermined value. In (a) and (b) of FIG. 3, the horizontal axis indicates time; the vertical axis of (a) indicates movement speed, and the vertical axis of (b) indicates gaze degree. Curve L1 in (a) of FIG. 3 shows the change over time of the movement speed of the line of sight, and curve L2 in (b) of FIG. 3 shows the change over time of the gaze degree.

In general, the higher the movement speed of the line of sight, the lower the user's gaze degree can be taken to be, and the lower the movement speed, the higher the gaze degree. That is, as curves L1 and L2 in FIG. 3 show, the analysis unit 17 judges the gaze degree of the user's line of sight to be lower as the movement speed of the line of sight increases, and higher as the movement speed decreases (see gaze period D1, in which the movement speed is low). In this way, the analysis unit 17 analyzes the gaze degree of the user's line of sight from the line-of-sight data for the time during which the user is observing or interpreting the correction image, and identifies each period in which the gaze degree is equal to or greater than the predetermined value as a gaze period (for example, gaze period D1). In FIG. 3, the analysis unit 17 analyzes the gaze degree from the movement speed of the user's line of sight, but the analysis is not limited to this: the gaze degree may instead be analyzed by detecting either the movement distance of the user's line of sight within a fixed time or the dwell time of the user's line of sight within a fixed region.
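The speed-based analysis described above can be sketched roughly as follows. This is a minimal illustration, not the implementation used by the analysis unit 17: the function name `gaze_periods`, the `(time, x, y)` sample format, and the use of a speed threshold as a stand-in for "gaze degree at or above a predetermined value" are all assumptions.

```python
from math import hypot


def gaze_periods(samples, speed_threshold):
    """Return (start, end) spans where gaze speed stays at or below the threshold.

    samples: list of (time_s, x, y) tuples sorted by time (hypothetical format).
    A low movement speed is treated here as a high gaze degree.
    """
    periods = []
    start = end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = hypot(x1 - x0, y1 - y0) / dt
        if speed <= speed_threshold:
            if start is None:
                start = t0  # a new slow-moving span begins
            end = t1
        elif start is not None:
            periods.append((start, end))  # fast movement closes the span
            start = None
    if start is not None:
        periods.append((start, end))
    return periods
```

The threshold and its units depend on the eye tracker's coordinate system, so a real system would need to tune it per device.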

Returning to FIG. 2, the description continues from step S105.
In step S105, based on the correction voice data recorded by the correction voice data storage unit 165, the setting unit 18 sets, for the voice data recorded by the voice data recording unit 162, the time at which the voice assigned to each of the plurality of observation locations in the correction image was uttered as an important voice period. After step S105, the information processing apparatus 1 proceeds to step S106, described later.

FIG. 4 is a diagram schematically illustrating how the setting unit 18 sets the important voice period of the voice data. In FIG. 4, the horizontal axis indicates time; the vertical axis of (a) indicates the voice data (utterance), and the vertical axis of (b) indicates voice importance. Curve L3 in (a) of FIG. 4 shows the change over time of the voice data, and curve L4 in (b) of FIG. 4 shows the change over time of the voice importance of the voice data.

As curves L3 and L4 in FIG. 4 show, based on the correction voice data recorded by the correction voice data storage unit 165, the setting unit 18 sets, for the voice data recorded by the voice data recording unit 162, the period (time) in which the voice assigned to each of the plurality of observation locations in the correction image was uttered as an important voice period. For example, when the correction voice data recorded by the correction voice data storage unit 165 contains "cancer", the setting unit 18 sets the time during which the user uttered "cancer" (important voice section D2) as the important voice period.

Returning to FIG. 2, the description continues from step S106.
In step S106, the calibration generation unit 19 generates calibration data. After step S106, the information processing apparatus 1 proceeds to step S107, described later.

FIG. 5A is a diagram schematically illustrating how the calibration generation unit 19 generates the calibration data. In FIG. 5A, the horizontal axis indicates time; the vertical axis of (a) indicates gaze degree, and the vertical axis of (b) indicates voice importance. Curve L2 in (a) of FIG. 5A shows the change over time of the gaze degree, and curve L4 in (b) of FIG. 5A shows the change over time of the voice importance.

As curves L2 and L4 in FIG. 5A show, the calibration generation unit 19 generates, as the user's calibration data, the time difference D3 between the start time t1 of the gaze period D1 analyzed by the analysis unit 17 and the start time t2 of the important voice period D2 set by the setting unit 18.
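As a rough sketch of this step, the time difference D3 can be computed directly from the two periods. The function name `time_difference`, the `(start, end)` tuple format, and the sign convention (voice time minus gaze time, so a positive value means the utterance follows the gaze) are assumptions made for illustration; the source does not fix a sign.

```python
def time_difference(gaze_period, voice_period, reference="start"):
    """Return the voice time minus the gaze time at the chosen reference point.

    gaze_period, voice_period: (start_s, end_s) tuples (hypothetical format).
    reference: "start" compares t1 and t2 as in FIG. 5A; "end" compares end times.
    """
    index = {"start": 0, "end": 1}[reference]
    return voice_period[index] - gaze_period[index]
```

With a gaze period of (1.0, 2.5) and an important voice period of (1.75, 3.0), the start-based difference is 0.75 seconds and the end-based difference is 0.5 seconds.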

As a modification, when the timing between the period in which the user's gaze degree is high and the period in which the user utters the important word varies from measurement to measurement, that variation may be taken into account. That is, as shown in FIG. 5B, the calibration generation unit 19 adjusts the margins DD2 and DD3 placed before and after the time frame according to the magnitude of the variation, producing calibration data in which the margins are larger the larger the variation is.

Returning to FIG. 2, the description continues from step S107.
In step S107, the calibration generation unit 19 records the calibration data generated in step S106 in the calibration data storage unit 166. After step S107, the information processing apparatus 1 ends this processing.

According to Embodiment 1 described above, the calibration generation unit 19 generates, as the user's calibration data, the time difference D3 between the gaze period D1 analyzed by the analysis unit 17 and the important voice period D2 set by the setting unit 18, so the time difference between the gaze point of the line of sight and the time at which the voice is uttered can be corrected.

Further, according to Embodiment 1, the calibration generation unit 19 generates, as the user's calibration data, the time difference D3 between the start time t1 of the gaze period D1 analyzed by the analysis unit 17 and the start time t2 of the important voice period D2 set by the setting unit 18, so the time difference between the gaze point of the line of sight and the time at which the voice is uttered can be corrected more accurately.

In Embodiment 1, the calibration generation unit 19 may calculate the time difference for each of a plurality of gaze periods and a plurality of important voice periods, and generate the calibration data based on statistical features of these results. This allows the time difference between the gaze point of the line of sight and the time at which the voice is uttered to be corrected more accurately.

(Embodiment 2)
Next, Embodiment 2 of the present disclosure will be described. In Embodiment 1 described above, the setting unit 18 sets the important voice period of the voice data based on the correction voice data. In Embodiment 2, the period in which a keyword designated in advance according to an operation signal input from the operation unit 13 is uttered is set as the important voice period of the voice data. In the following, the configuration of the information processing apparatus according to Embodiment 2 is described first, followed by the processing it executes. Components identical to those of the information processing apparatus 1 according to Embodiment 1 are given the same reference numerals, and their detailed description is omitted.

[Configuration of the information processing apparatus]
FIG. 6A is a block diagram showing the functional configuration of the information processing apparatus according to Embodiment 2. The information processing apparatus 1a shown in FIG. 6A includes a recording unit 16a in place of the recording unit 16 of the information processing apparatus 1 according to Embodiment 1. The information processing apparatus 1a further includes a conversion unit 20, an extraction unit 21, and an important word storage unit 168.

In addition to the configuration of the recording unit 16 according to Embodiment 1, the recording unit 16a includes a keyword history recording unit 167 that records a plurality of keywords input in the past from the operation unit 13. Here, the keywords input in the past are, for example, keywords extracted by analyzing the frequency of words input in the past via the operation unit 13.

The conversion unit 20 converts the voice data into character information (text data) by applying a well-known text conversion process to the voice data, and outputs this character information to the extraction unit 21.
A configuration in which the voice is not converted to text at this point is also possible. In that case, the importance may be set on the voice information as it is, with conversion to character information performed afterwards.

The extraction unit 21 extracts, from the character information converted by the conversion unit 20, the keyword corresponding to the instruction signal input from the operation unit 13, the keywords recorded by the keyword history recording unit 167, and the keywords recorded by the important word storage unit 168, and outputs the extraction result to the setting unit 18.

The important word storage unit 168 stores a plurality of keywords that are preset important words. Here, the preset important words are words (keywords) set by the manufacturer or the like, either before the information processing apparatus 1a is shipped or via a network (not shown).

[Processing of the information processing apparatus]
Next, the processing executed by the information processing apparatus 1a will be described. FIG. 6B is a flowchart showing an outline of the processing executed by the information processing apparatus 1a. In FIG. 6B, step S201 and step S202 correspond to step S101 and step S102 of FIG. 2 described above, respectively.

In step S203, the conversion unit 20 converts the voice data recorded by the voice data recording unit 162 into character information. After step S203, the information processing apparatus 1a proceeds to step S204, described later. Step S204 and step S205 correspond to step S103 and step S104 of FIG. 2 described above, respectively. After step S205, the information processing apparatus 1a proceeds to step S206.

Subsequently, the extraction unit 21 extracts, from the character information converted by the conversion unit 20, the utterance times at which a keyword recorded by the keyword history recording unit 167 or a keyword recorded by the important word storage unit 168 appears (step S206).

Thereafter, based on the utterance times extracted by the extraction unit 21, the setting unit 18 sets, for the voice data recorded by the voice data recording unit 162, the time at which the voice assigned to each of the plurality of observation locations in the correction image was uttered as an important voice period (step S207).
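Steps S206 and S207 can be pictured with the following sketch. It assumes that the text conversion yields a word-level transcript with start and end times, which the source does not specify; the function name `important_voice_periods` and the `(word, start, end)` format are hypothetical.

```python
def important_voice_periods(words, keywords):
    """Return (start, end) spans of transcript words that match a keyword.

    words: list of (word, start_s, end_s) tuples from a speech-to-text step
           (hypothetical format; the source only says "well-known" conversion).
    keywords: iterable of keyword strings, for example the union of the
              keyword history and the preset important words.
    """
    keyword_set = set(keywords)
    return [(start, end) for word, start, end in words if word in keyword_set]
```

Exact string matching is the simplest possible rule. A practical system would probably need normalization (case, inflection, multi-word terms), which is outside what the source describes.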

Step S208 and step S209 correspond to step S106 and step S107 of FIG. 2 described above, respectively. After step S209, the information processing apparatus 1a ends this processing.

According to Embodiment 2 described above, the calibration generation unit 19 generates, as the user's calibration data, the time difference between the start time of the gaze period analyzed by the analysis unit 17 and the start time of the important voice period set by the setting unit 18, so the time difference between the gaze point of the line of sight and the time at which the voice is uttered can be corrected. As in Embodiment 1, when the results vary across a plurality of measurements, the calibration data takes that variation into account.

(Modification of Embodiment 2)
Next, a modification of Embodiment 2 of the present disclosure will be described. In Embodiment 1 described above, the setting unit 18 sets the important voice period of the voice data based on the correction voice data. In the modification of Embodiment 2, a period in which voice is uttered is set as the important voice period without converting the voice data into character information. In the following, the configuration of the information processing apparatus according to the modification of Embodiment 2 is described first, followed by the processing it executes. Components identical to those of the information processing apparatus 1 according to Embodiment 1 are given the same reference numerals, and their detailed description is omitted.

[Configuration of the information processing apparatus]
FIG. 7A is a block diagram showing the functional configuration of the information processing apparatus according to the modification of Embodiment 2. The information processing apparatus 1ab shown in FIG. 7A has the configuration of the information processing apparatus 1 according to Embodiment 1 with the correction voice data storage unit 165 removed. Unlike the setting unit 18 of the information processing apparatus 1 according to Embodiment 1, the setting unit 18ab sets as the important voice period a period of the voice data output from the voice data recording unit 162 in which speech is present, regardless of its content: for example, a period in which the level representing the volume is recognized as exceeding a fixed value over a predetermined duration.
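One simple way to picture the level-based rule is a frame-wise RMS threshold, sketched below. The choice of RMS as the "level representing the volume", the fixed frame length, and the function name `voiced_periods` are assumptions; the source only states that the level exceeds a fixed value over a predetermined duration.

```python
from math import sqrt


def voiced_periods(samples, sample_rate, frame_len, level_threshold):
    """Return (start_s, end_s) spans whose frame RMS level exceeds the threshold.

    samples: sequence of PCM amplitudes (hypothetical input format).
    frame_len: number of samples per analysis frame.
    """
    periods = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = sqrt(sum(s * s for s in frame) / frame_len)
        frame_start = i * frame_len / sample_rate
        if rms > level_threshold:
            if start is None:
                start = frame_start  # speech begins in this frame
        elif start is not None:
            periods.append((start, frame_start))  # speech ended before this frame
            start = None
    if start is not None:
        periods.append((start, n_frames * frame_len / sample_rate))
    return periods
```

Trailing samples that do not fill a whole frame are ignored in this sketch.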

[Processing of the information processing apparatus]
Next, the processing executed by the information processing apparatus 1ab will be described. FIG. 7B is a flowchart showing an outline of the processing executed by the information processing apparatus 1ab. In FIG. 7B, step S211, step S212, step S213, and step S214 correspond to step S101, step S102, step S103, and step S104 of FIG. 2 described above, respectively.

In step S215, the setting unit 18ab sets, for the voice data recorded by the voice data recording unit 162 and regardless of its content, the time at which voice was uttered at each of the plurality of observation locations in the correction image as an important voice period. After step S215, the information processing apparatus 1ab proceeds to step S216, described later.

Step S216 and step S217 correspond to step S106 and step S107 of FIG. 2 described above, respectively. After step S217, the information processing apparatus 1ab ends this processing.

Here, the important voice period is defined, regardless of the content of the words spoken, as a period in which voice was uttered within the neighborhood of the gaze period (the gaze period plus predetermined periods before and after it). Specifically, when a plurality of voice periods exist within the neighborhood of the gaze period, the setting unit 18ab sets one or more of the following as the important voice period: for example, the period in which the loudest voice (highest level) was uttered, and the period in which the voice with the smallest time difference from the center of the gaze period was uttered.
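The selection among several candidate voice periods might look like the following sketch. The `(start, end, level)` tuple format, the overlap test used to decide whether a voice period lies in the neighborhood, and the use of the voice period's midpoint when measuring distance to the gaze center are all assumptions for illustration.

```python
def select_important_period(gaze_period, voice_periods, margin, criterion="loudest"):
    """Pick one voice period near the gaze period, or None if there is none.

    gaze_period: (start_s, end_s).
    voice_periods: list of (start_s, end_s, level) tuples (hypothetical format).
    margin: seconds added before and after the gaze period to form the neighborhood.
    criterion: "loudest" picks the highest level; "nearest" picks the period whose
               midpoint is closest to the center of the gaze period.
    """
    g_start, g_end = gaze_period
    lo, hi = g_start - margin, g_end + margin
    # Keep only voice periods that overlap the neighborhood [lo, hi].
    candidates = [p for p in voice_periods if p[1] >= lo and p[0] <= hi]
    if not candidates:
        return None
    if criterion == "loudest":
        return max(candidates, key=lambda p: p[2])
    g_center = (g_start + g_end) / 2
    return min(candidates, key=lambda p: abs((p[0] + p[1]) / 2 - g_center))
```

The two criteria can disagree, which is consistent with the source allowing one or more periods to be set.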

The time difference between the gaze period and the important voice period on which the calibration data is based is not limited to the difference between the start times, or the difference between the end times, of the gaze period and the important voice period shown in FIG. 5A and FIG. 5B described above. Specifically, the time difference may be the difference between the time at the midpoint of the gaze period and the time at the midpoint of the important voice period, or the difference between the times corresponding to the centroids of the areas of the gaze period and the important voice period, where each area is obtained by integrating the gaze degree of the gaze period or the volume (level) of the important voice period over time.
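The midpoint and centroid variants can be sketched as below. The centroid is approximated by a value-weighted average of uniformly spaced sample times, which is an assumption standing in for the integral the text describes; all function names are hypothetical.

```python
def midpoint(period):
    """Return the time halfway through a (start_s, end_s) period."""
    start, end = period
    return (start + end) / 2


def centroid_time(times, values):
    """Return the value-weighted mean time of a sampled curve.

    times: uniformly spaced sample times within the period.
    values: gaze degree or volume level at each sample time.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("curve has zero area, so the centroid is undefined")
    return sum(t * v for t, v in zip(times, values)) / total


def centroid_difference(gaze_times, gaze_degrees, voice_times, voice_levels):
    """Return the voice centroid time minus the gaze centroid time."""
    return centroid_time(voice_times, voice_levels) - centroid_time(gaze_times, gaze_degrees)
```

Compared with start or end times, a centroid is less sensitive to exactly where a threshold happens to cut the curve, at the cost of needing the full gaze-degree and level curves.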

Further, the calibration generation unit 19 may generate calibration data by integrating time differences calculated from a plurality of gaze periods and important voice periods. For example, the calibration generation unit 19 may generate a single value of calibration data from the average of the plurality of time differences, or, based on statistical information of the plurality of time differences, may generate calibration data as a "time range" that spreads from the average by a predetermined multiple of the standard deviation. Furthermore, the calibration generation unit 19 may generate the calibration data by calculating the statistical information after multiplying each time difference by a weight corresponding to the size of the areas of the corresponding gaze period and important voice period.
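A minimal sketch of this aggregation is given below. It assumes that one time difference has already been computed per pair of gaze period and important voice period, that the optional weights are the area sizes of those pairs, and that the weighted population standard deviation is an acceptable measure of spread. The function name and the parameter `k` (the predetermined multiple) are hypothetical.

```python
import math


def calibration_from_differences(diffs, weights=None, k=1.0):
    """Combine several time differences into one calibration offset and a time range.

    diffs   : time differences in seconds
    weights : optional per-difference weights (e.g. area sizes); equal weights if None
    k       : multiple of the standard deviation used for the time range
    """
    if not diffs:
        raise ValueError("at least one time difference is required")
    if weights is None:
        weights = [1.0] * len(diffs)
    if len(weights) != len(diffs):
        raise ValueError("weights and diffs must have the same length")
    w_sum = sum(weights)
    mean = sum(w * d for w, d in zip(weights, diffs)) / w_sum
    var = sum(w * (d - mean) ** 2 for w, d in zip(weights, diffs)) / w_sum
    std = math.sqrt(var)
    return {"offset": mean, "range": (mean - k * std, mean + k * std)}
```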

According to the modification of the second embodiment described above, the calibration generation unit 19 generates the calibration data of the user based on the time difference between the gaze period analyzed by the analysis unit 17 and the important voice period set by the setting unit 18ab, so that the time difference between the gaze point of the line of sight and the time at which the voice is uttered can be corrected.

(Embodiment 3)
Next, a third embodiment of the present disclosure will be described. In the first and second embodiments described above, calibration data for the user is generated. In the third embodiment, the calibration data generated in the first and second embodiments is used to correct the time difference between the user's line of sight and utterance. The same components as those of the information processing apparatus 1 according to the first embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted.

[Configuration of information processing system]
FIG. 8 is a block diagram illustrating a functional configuration of the information processing system according to the third embodiment. The information processing system 1c illustrated in FIG. 8 includes an information processing apparatus 100 that performs various kinds of processing on line-of-sight data, audio data, and image data input from the outside, and a display unit 120 that displays various data output from the information processing apparatus 100. The information processing apparatus 100 and the display unit 120 are connected bidirectionally, either wirelessly or by wire.

[Configuration of information processing apparatus]
First, the configuration of the information processing apparatus 100 will be described.
The information processing apparatus 100 illustrated in FIG. 8 is realized using, for example, a program installed in a server, a personal computer, or the like, and receives various data via a network or various data acquired by an external apparatus. As illustrated in FIG. 8, the information processing apparatus 100 includes an analysis unit 111, a setting unit 112, a generation unit 113, a recording unit 114, and a display control unit 115.

The analysis unit 111 analyzes the gaze degree of the user's line of sight based on line-of-sight data of a predetermined duration (new line-of-sight data) that is input from the outside and obtained by detecting the user's line of sight. Here, the line-of-sight data is based on the corneal reflection method. Specifically, the line-of-sight data is data generated when near-infrared light is emitted onto the user's cornea from an LED light source or the like provided in the line-of-sight detection unit 10 (eye tracking) of the first embodiment described above, and an optical sensor serving as the line-of-sight detection unit 10 images the pupil point and the reflection point on the cornea. The user's line of sight is then calculated from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor through image processing or the like.

Although not illustrated, when a device including the line-of-sight detection unit 10 measures line-of-sight data, it does so after presenting the corresponding image data to the user. In this case, when the image displayed to the user is fixed, that is, when the absolute coordinates of the display area do not change with time, the device only needs to be given, as fixed values, the relative positional relationship between the line-of-sight measurement area and the absolute coordinates of the image. Here, absolute coordinates refer to coordinates expressed with reference to one predetermined point of the image.

When the usage form is an endoscope system or an optical microscope, the field of view presented for detecting the line of sight is the field of view of the image data, so the relative positional relationship of the observation field of view with respect to the absolute coordinates of the image does not change. In addition, when the usage form is an endoscope system or an optical microscope and the observation is recorded as a moving image, the line-of-sight detection data and the image recorded or presented simultaneously with the line-of-sight detection are used to generate the mapping data of the field of view.

On the other hand, when the usage form is WSI (Whole Slide Imaging), the user observes a part of a microscope slide sample as the field of view, and the observation field of view changes with time. In this case, time information indicating which part of the entire image data is presented as the field of view, that is, time information on the switching of the absolute coordinates of the display area with respect to the entire image data, is also recorded in synchronization with the line-of-sight and audio information.
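The role of the recorded switching times can be illustrated with a short sketch that converts a gaze position on the display into absolute coordinates of the whole slide image. The log format (time, upper-left corner in absolute coordinates, scale in image pixels per display pixel) and the function names are assumptions made for illustration only.

```python
from bisect import bisect_right


def viewport_at(t, viewport_log):
    """Return the viewport entry active at time t.

    viewport_log: list of (time, x0, y0, scale) sorted by time, where (x0, y0) is the
    absolute position of the upper-left corner of the display area and scale is the
    number of image pixels per display pixel (hypothetical format).
    """
    times = [entry[0] for entry in viewport_log]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no viewport recorded at or before t")
    return viewport_log[i]


def to_absolute(t, sx, sy, viewport_log):
    """Convert a gaze position (sx, sy) on the display at time t to absolute coordinates."""
    _, x0, y0, scale = viewport_at(t, viewport_log)
    return (x0 + sx * scale, y0 + sy * scale)
```

With a fixed image, the log contains a single entry and the conversion reduces to the fixed relative positional relationship.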

Based on line-of-sight data of a predetermined duration that is input from the outside and obtained by detecting the user's line of sight, the analysis unit 111 detects any one of the movement speed of the line of sight, the movement distance of the line of sight within a certain time, and the dwell time of the line of sight within a certain area, and thereby analyzes a new gaze period in which the gaze degree of the line of sight (gaze point) is equal to or greater than a predetermined value. The line-of-sight detection unit 10 (not illustrated) may detect the line of sight by imaging the user while placed at a predetermined location, or by imaging the user while worn by the user. Alternatively, the line-of-sight data may be generated by well-known pattern matching. The analysis unit 111 is configured using, for example, a CPU, an FPGA, a GPU, and the like.
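As one possible reading of the speed-based criterion, the following sketch treats the line of sight as gazing while its movement speed stays at or below a threshold and keeps only runs that last long enough. The thresholds `max_speed` and `min_duration` and the sample format are hypothetical; the disclosure does not specify how the gaze degree is derived from the speed.

```python
import math


def gaze_periods(samples, max_speed, min_duration):
    """Detect gaze periods from line-of-sight samples.

    samples      : list of (t, x, y) sorted by time
    max_speed    : speed threshold (distance units per second) below which the eye is gazing
    min_duration : minimum length in seconds of a run to count as a gaze period
    Returns a list of (start, end) tuples.
    """
    periods = []
    start = end = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        speed = math.hypot(x1 - x0, y1 - y0) / dt
        if speed <= max_speed:
            if start is None:
                start = t0
            end = t1
        else:
            if start is not None and end - start >= min_duration:
                periods.append((start, end))
            start = end = None
    if start is not None and end - start >= min_duration:
        periods.append((start, end))
    return periods
```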

The setting unit 112 assigns, to the user's audio data input from the outside (new audio data) and associated with the same time axis as the line-of-sight data, an importance level corresponding to the gaze degree analyzed by the analysis unit 111 and the calibration data recorded in the calibration data storage unit 116, and records the result in the recording unit 114. Specifically, for each frame of the audio data, the setting unit 112 corrects the time difference between the utterance time of the user's voice and the gaze time of the line of sight based on the calibration data, assigns to the corrected audio data an importance level (for example, a numerical value) corresponding to the gaze degree analyzed by the analysis unit 111, and records the result in the recording unit 114. The user's audio data input from the outside is generated by an audio input unit (not illustrated) such as a microphone at the same timing as the line-of-sight data. The setting unit 112 is configured using a CPU, an FPGA, a GPU, and the like.
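The per-frame correction can be sketched as follows, assuming that the calibration data is a single offset defined as "utterance time minus gaze time" and that the importance level is simply the gaze degree of the nearest line-of-sight sample after the shift. Both are assumptions for illustration; the function name is hypothetical.

```python
def assign_importance(frame_times, gaze_times, gaze_degrees, offset):
    """Assign an importance level to each audio frame.

    frame_times  : time of each audio frame in seconds
    gaze_times   : time of each line-of-sight sample in seconds
    gaze_degrees : gaze degree of each line-of-sight sample
    offset       : calibration offset in seconds (utterance time minus gaze time)
    """
    if not gaze_times:
        raise ValueError("no line-of-sight samples")
    importance = []
    for t in frame_times:
        t_gaze = t - offset  # shift the audio frame back onto the gaze time axis
        nearest = min(range(len(gaze_times)), key=lambda j: abs(gaze_times[j] - t_gaze))
        importance.append(gaze_degrees[nearest])
    return importance
```

The nearest-sample search is linear per frame, which keeps the sketch short; a sorted lookup would be preferable for long recordings.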

The generation unit 113 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 is associated with an image corresponding to image data input from the outside, and outputs the generated line-of-sight mapping data to the recording unit 114 and the display control unit 115. Specifically, for each predetermined region on the image corresponding to the image data input from the outside, the generation unit 113 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 is associated with coordinate information on the image. Furthermore, in addition to the gaze degree, the generation unit 113 generates the line-of-sight mapping data by associating the trajectory of the user's line of sight analyzed by the analysis unit 111 with the image. The generation unit 113 is configured using a CPU, an FPGA, a GPU, and the like. When used with the WSI described above, the generation unit 113 uses the relative positional relationship between the display at the time the line of sight was measured and the absolute coordinates of the image in order to obtain the line-of-sight mapping data in absolute coordinates of the image, as described above. Further, as described above, when the observation field of view changes from moment to moment, the generation unit 113 receives as input the change over time of the absolute coordinates of the display area, that is, of the field of view (for example, where the upper left corner of the displayed image is located in the original image data in absolute coordinates).
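One simple form of such line-of-sight mapping data is a grid over the image in which each cell holds the gaze degree associated with that region. The sketch below accumulates the gaze degree per cell; the accumulation rule, the cell size, and the function name are assumptions, since the disclosure only states that the gaze degree is associated with coordinate information for each predetermined region.

```python
def build_gaze_map(points, width, height, cell):
    """Accumulate gaze degrees into a grid of square cells over the image.

    points : iterable of (x, y, gaze_degree) in absolute image coordinates
    width, height : image size in pixels
    cell   : side length of one region in pixels
    Returns grid[row][col] holding the summed gaze degree of each region.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, degree in points:
        if 0 <= x < width and 0 <= y < height:  # ignore gaze points outside the image
            grid[int(y) // cell][int(x) // cell] += degree
    return grid
```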

The recording unit 114 records the audio data input from the setting unit 112, the importance level assigned at each predetermined time interval, and the gaze degree analyzed by the analysis unit 111 in association with one another. The recording unit 114 also records the line-of-sight mapping data input from the generation unit 113, as well as various programs executed by the information processing apparatus 100 and data being processed. The recording unit 114 is configured using a volatile memory, a nonvolatile memory, a recording medium, and the like.

The display control unit 115 superimposes the line-of-sight mapping data generated by the generation unit 113 on the image corresponding to the image data input from the outside and outputs the result to the external display unit 120 for display. The display control unit 115 is configured using a CPU, an FPGA, a GPU, and the like. The analysis unit 111, the setting unit 112, the line-of-sight mapping, and the display control unit 115 described above may be configured so that their functions are realized using any one of a CPU, an FPGA, and a GPU, or, of course, using a combination of a CPU, an FPGA, and a GPU.

[Configuration of display unit]
Next, the configuration of the display unit 120 will be described.
The display unit 120 displays an image corresponding to the image data input from the display control unit 115 and line-of-sight mapping information corresponding to the line-of-sight mapping data. The display unit 120 is configured using a display monitor such as an organic EL (Electro Luminescence) display or a liquid crystal display.

[Processing of information processing apparatus]
Next, processing of the information processing apparatus 100 will be described. FIG. 9 is a flowchart illustrating an outline of the processing executed by the information processing apparatus 100.

図９に示すように、まず、情報処理装置１００は、外部から入力される視線データ、音声データおよび画像データを取得する（ステップＳ３０１）。 As shown in FIG. 9, first, the information processing apparatus 100 acquires line-of-sight data, audio data, and image data input from the outside (step S301).

Subsequently, the analysis unit 111 analyzes the gaze degree of the user's line of sight based on the line-of-sight data (step S302). Specifically, the analysis unit 111 analyzes the gaze degree of the user's line of sight by the method described with reference to FIG. 3 of the first embodiment described above.

Thereafter, the setting unit 112 assigns, to the audio data synchronized with the line-of-sight data, an importance level corresponding to the calibration data recorded in the calibration data storage unit 116 and the gaze degree analyzed by the analysis unit 111, and records the result in the recording unit 114 (step S303). Specifically, by the method described with reference to FIGS. 4 and 5 of the first embodiment described above, the setting unit 112 assigns to the audio data, at each predetermined time interval, an importance level corresponding to the gaze degree analyzed by the analysis unit 111, and records the result in the recording unit 114.

続いて、生成部１１３は、画像データに対応する画像上に解析部１１１が解析した注視度を関連付けた視線マッピングデータを生成する（ステップＳ３０４）。 Subsequently, the generation unit 113 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 is associated on the image corresponding to the image data (step S304).

続いて、表示制御部１１５は、画像データに対応する画像上に、生成部１１３が生成した視線マッピングデータを重畳して外部の表示部１２０に出力する（ステップＳ３０５）。ステップＳ３０５の後、情報処理装置１００は、本処理を終了する。 Subsequently, the display control unit 115 superimposes the line-of-sight mapping data generated by the generation unit 113 on the image corresponding to the image data, and outputs it to the external display unit 120 (step S305). After step S305, the information processing apparatus 100 ends this process.

FIG. 10 is a diagram schematically illustrating an example of an image displayed on the display unit 120. As illustrated in FIG. 10, the display control unit 115 causes the display unit 120 to display a line-of-sight mapping image P1 in which the line-of-sight mapping data generated by the generation unit 113 is superimposed on the image corresponding to the image data. In FIG. 10, the display unit 120 displays the line-of-sight mapping image P1 with heat maps M1 to M5 in which the number of contour lines increases as the gaze degree of the line of sight increases.

FIG. 11 is a diagram schematically illustrating another example of an image displayed on the display unit 120. As illustrated in FIG. 11, the display control unit 115 causes the display unit 120 to display a line-of-sight mapping image P2 in which the line-of-sight mapping data generated by the generation unit 113 is superimposed on the image corresponding to the image data. In FIG. 11, the display unit 120 displays the line-of-sight mapping image P2 on which gaze-degree marks M11 to M15 are superimposed, each mark being a circle whose area increases as the gaze degree of the line of sight increases. Further, the display control unit 115 causes the display unit 120 to display the trajectory K1 of the user's line of sight and the order of the gaze degrees as numbers. In FIG. 11, the display control unit 115 may also cause the display unit 120 to display character information, obtained by converting the audio data uttered by the user in the period (time) of each gaze degree using a well-known character conversion technique, near or superimposed on the marks M11 to M15.

According to the third embodiment described above, the setting unit 112 assigns, to the audio data input from the outside (new audio data), an importance level corresponding to the user's calibration data recorded in the calibration data storage unit 116 and the new gaze degree newly analyzed by the analysis unit 111, and records the result in the recording unit 114, so that it is possible to grasp which period of the audio data is important.

Furthermore, in the third embodiment, the generation unit 113 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 and the coordinate information of the gaze degree are associated with the image corresponding to the image data input from the outside, so that the user can intuitively grasp important positions on the image.

In the third embodiment, the setting unit 112 may set, for the audio data input from the outside (new audio data), an utterance period in which speech is uttered in the audio data, assign to the audio data an importance level corresponding to the new gaze degree analyzed by the analysis unit 111 by using the utterance period, the new gaze degree newly analyzed by the analysis unit 111, and the calibration data stored in the calibration data storage unit 116, and record the result in the recording unit 114. This makes it possible to grasp which period of the audio data is important.

In the third embodiment, the setting unit 112 may also set, as a new important utterance period, a period in which a predetermined keyword is uttered in the audio data input from the outside (new audio data), assign to the line-of-sight data input from the outside (new line-of-sight data) an importance level corresponding to the new important utterance period by using the new important utterance period and the calibration data stored in the calibration data storage unit 116, and record the result in the recording unit 114. This makes it possible to grasp which period of the line-of-sight data is important.

In the third embodiment, the setting unit 112 may also set, as a new important utterance period, a period in which a predetermined keyword is uttered in the audio data input from the outside (new audio data), assign to the line-of-sight data input from the outside (new line-of-sight data) an importance level corresponding to the important utterance period by using the new gaze period, the new important utterance period, and the calibration data stored in the calibration data storage unit 116, and record the result in the recording unit 114. This makes it possible to grasp which period of the audio data is important.

In the third embodiment, the recording unit 114 records the audio data to which the importance level has been assigned by the setting unit 112, so that learning data for learning the correspondence between audio and image data based on line-of-sight mapping, for use in machine learning such as deep learning, can be easily acquired.

(Modification of Embodiment 3)
Next, a modification of the third embodiment of the present disclosure will be described. In the third embodiment described above, the setting unit 112 assigns to the audio data an importance level corresponding to the gaze degree analyzed by the analysis unit 111 and records it in the recording unit 114. In the modification of the third embodiment, the setting unit 112 assigns an importance level to the audio data based on the line-of-sight mapping data generated by the generation unit 113 and records it in the recording unit 114. In the following, the configuration of the information processing system according to the modification of the third embodiment is described first, followed by the processing executed by the information processing apparatus according to the modification of the third embodiment. The same components as those of the information processing system 1c according to the third embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted.

[Configuration of information processing system]
FIG. 12 is a block diagram illustrating a functional configuration of an information processing system according to the modification of the third embodiment. The information processing system 1aa illustrated in FIG. 12 includes an information processing apparatus 100a instead of the information processing apparatus 100 according to the third embodiment described above. The information processing apparatus 100a includes a setting unit 112a instead of the setting unit 112 according to the third embodiment described above.

設定部１１２ａは、生成部１１３によって生成された視線マッピングデータに基づいて、所定の時間間隔毎に重要度を音声データに割り当てて記録部１１４へ記録する。 Based on the line-of-sight mapping data generated by the generation unit 113, the setting unit 112a assigns importance to audio data at a predetermined time interval and records it in the recording unit 114.

[Processing of information processing apparatus]
Next, processing executed by the information processing apparatus 100a will be described. FIG. 13 is a flowchart illustrating an outline of the processing executed by the information processing apparatus 100a. In FIG. 13, steps S401 and S402 correspond to steps S301 and S302 of FIG. 9 described above, respectively. Step S403 corresponds to step S304 of FIG. 9 described above.

In step S404, the setting unit 112a assigns to the audio data an importance level corresponding to the calibration data and the gaze degree, based on the calibration data recorded in the calibration data storage unit 116 and the line-of-sight mapping data, generated by the generation unit 113, in which the user's gaze degree is associated with the image, and records the result in the recording unit 114. After step S404, the information processing apparatus 100a proceeds to step S405. Step S405 corresponds to step S305 of FIG. 9 described above.

According to the modification of the third embodiment described above, the setting unit 112a assigns to the audio data, at each predetermined time interval, an importance level corresponding to the gaze degree based on the line-of-sight mapping data, generated by the generation unit 113, in which the user's gaze degree is associated with the image, and records the result in the recording unit 114, so that it is possible to grasp which period of the audio data is important.

In the third embodiment and the modification of the third embodiment, embodiments in which the importance level of audio or utterances is assigned using the gaze degree and the calibration data have been described. As a further modification of these, the important voice period, which is a period in which a predetermined word (keyword) is uttered, and calibration data acquired in advance may be used to extract the period of the line-of-sight data (or, after mapping to the image, the region or position in the image) corresponding to the important voice period.
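The extraction described in this modification can be sketched as a shift of the important voice period back onto the gaze time axis. The sketch assumes that the calibration data is an offset defined as "utterance time minus gaze time", optionally widened by a `spread` that stands in for a calibration "time range". All names and the sample format are hypothetical.

```python
def gaze_window_for_keyword(voice_start, voice_end, offset, spread=0.0):
    """Period of the line-of-sight data corresponding to an important voice period.

    offset : calibration offset in seconds (utterance time minus gaze time)
    spread : optional widening in seconds on both sides of the window
    """
    return (voice_start - offset - spread, voice_end - offset + spread)


def gaze_points_in(window, samples):
    """Positions (x, y) of the line-of-sight samples (t, x, y) that fall inside the window."""
    start, end = window
    return [(x, y) for t, x, y in samples if start <= t <= end]
```

The returned positions indicate the image region the user was looking at when the keyword was about to be uttered.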

(Embodiment 4)
Next, a fourth embodiment of the present disclosure will be described. In the third embodiment, the line-of-sight data and the audio data are each input from the outside, whereas in the fourth embodiment, the information processing apparatus generates the line-of-sight data and the audio data. In the following, the configuration of the information processing apparatus according to the fourth embodiment is described first, followed by the processing executed by the information processing apparatus according to the fourth embodiment. The same components as those of the information processing system 1c according to the third embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

[Configuration of information processing apparatus]
FIG. 14 is a schematic diagram illustrating a configuration of an information processing apparatus according to the fourth embodiment. FIG. 15 is a schematic diagram illustrating a configuration of the information processing apparatus according to the fourth embodiment. FIG. 16 is a block diagram illustrating a functional configuration of the information processing apparatus according to the fourth embodiment.

The information processing apparatus 1b illustrated in FIGS. 14 to 16 includes an analysis unit 111, a display unit 120, a line-of-sight detection unit 130, an audio input unit 131, a control unit 132, a time measurement unit 133, a recording unit 34, a conversion unit 135, an extraction unit 136, an operation unit 137, a setting unit 138, a generation unit 139, a program storage unit, and a calibration data storage unit 345.

The line-of-sight detection unit 130 is configured using an LED light source that emits near-infrared light and an optical sensor (for example, a CMOS or CCD sensor) that images the pupil point and the reflection point on the cornea. The line-of-sight detection unit 130 is provided on a side surface of the housing of the information processing apparatus 1b, at a position from which the user U1 can view the display unit 120 (see FIGS. 14 and 15). Under the control of the control unit 132, the line-of-sight detection unit 130 generates line-of-sight data by detecting the line of sight of the user U1 with respect to the image displayed on the display unit 120, and outputs the line-of-sight data to the control unit 132. Specifically, under the control of the control unit 132, the line-of-sight detection unit 130 emits near-infrared light from the LED light source or the like onto the cornea of the user U1, and the optical sensor images the pupil point and the reflection point on the cornea of the user U1, thereby generating line-of-sight data. Then, under the control of the control unit 132, the line-of-sight detection unit 130 continuously calculates the line of sight of the user U1 from the pattern of the pupil point and the reflection point, based on the result of analyzing the data generated by the optical sensor through image processing or the like, thereby generates line-of-sight data of a predetermined duration, and outputs the line-of-sight data to the line-of-sight detection control unit 321 described later. The line-of-sight detection unit 130 may instead generate line-of-sight data by detecting the pupil of the user U1 using only the optical sensor and well-known pattern matching, or by detecting the line of sight of the user U1 using another sensor or another well-known technique.

The voice input unit 131 is configured using a microphone to which voice is input and an audio codec that converts the voice received by the microphone into digital voice data, amplifies the voice data, and outputs it to the control unit 132. Under the control of the control unit 132, the voice input unit 131 generates voice data by receiving input of the voice of the user U1, and outputs the voice data to the control unit 132. In addition to voice input, the voice input unit 131 may be provided with a speaker or the like capable of outputting voice, so as to have a voice output function.

The control unit 132 is configured using a CPU, an FPGA, a GPU, and the like, and controls the line-of-sight detection unit 130, the voice input unit 131, and the display unit 120. The control unit 132 includes a line-of-sight detection control unit 321, a voice input control unit 322, and a display control unit 323.

The line-of-sight detection control unit 321 controls the line-of-sight detection unit 130. Specifically, the line-of-sight detection control unit 321 causes the line-of-sight detection unit 130 to irradiate the user U1 with near-infrared light at predetermined timings and to image the pupil of the user U1, thereby generating line-of-sight data. The line-of-sight detection control unit 321 also performs various types of image processing on the line-of-sight data input from the line-of-sight detection unit 130 and outputs the result to the recording unit 34.

The voice input control unit 322 controls the voice input unit 131, performs various types of processing, for example gain adjustment and noise reduction, on the voice data input from the voice input unit 131, and outputs the result to the recording unit 34.

The display control unit 323 controls the display mode of the display unit 120. The display control unit 323 causes the display unit 120 to display an image corresponding to the image data recorded in the recording unit 34, or a line-of-sight mapping image corresponding to the line-of-sight mapping data generated by the generation unit 139.

The time measurement unit 133 is configured using a timer, a clock generator, and the like, and attaches time information to the line-of-sight data generated by the line-of-sight detection unit 130, the voice data generated by the voice input unit 131, and the like.

The recording unit 34 is configured using a volatile memory, a non-volatile memory, a recording medium, and the like, and records various types of information regarding the information processing apparatus 1b. The recording unit 34 includes a line-of-sight data recording unit 341, a voice data recording unit 342, an image data recording unit 343, a program storage unit, and a calibration data storage unit 345.

The line-of-sight data recording unit 341 records the line-of-sight data input from the line-of-sight detection control unit 321 and outputs the line-of-sight data to the analysis unit 111.

The voice data recording unit 342 records the voice data input from the voice input control unit 322 and outputs the voice data to the conversion unit 135.

The image data recording unit 343 records a plurality of pieces of image data. These pieces of image data are data input from outside the information processing apparatus 1b, or data captured by an external imaging apparatus and supplied via a recording medium.

The program storage unit records the various programs executed by the information processing apparatus 1b, data used during execution of those programs (for example, dictionary information and text conversion dictionary information), and data being processed during execution of those programs.

The calibration data storage unit 345 records calibration data for each user.

The conversion unit 135 converts the voice data into character information (text data) by performing well-known text conversion processing on the voice data, and outputs the character information to the extraction unit 136. A configuration in which the voice is not converted to text at this point is also possible; in that case, the importance may be set on the voice information as it is, and the voice information may be converted into character information afterwards.

The extraction unit 136 extracts, from the character information converted by the conversion unit 135, the characters or words (keywords) corresponding to an instruction signal input from the operation unit 137 described later, and outputs the extraction result to the setting unit 138. When no instruction signal has been input from the operation unit 137, the extraction unit 136 outputs the character information to the setting unit 138 exactly as it was input from the conversion unit 135.
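The behavior described for the extraction unit 136 can be sketched as a simple keyword filter. The following Python is an illustrative sketch only: the function name `extract_keywords` and the substring-matching rule are assumptions, since the text does not specify how designated keywords are matched against the converted text.

```python
def extract_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the designated keywords that occur in the transcribed text.

    When no keywords have been designated (no instruction signal), the
    text is passed through unchanged, as described for extraction unit 136.
    Substring matching is an assumption made for this sketch.
    """
    if not keywords:
        return [text]
    return [kw for kw in keywords if kw in text]
```

For example, `extract_keywords("There is cancer here.", ["cancer", "tumor"])` keeps only the keyword that actually occurs in the utterance.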

The operation unit 137 is configured using a mouse, a keyboard, a touch panel, various switches, and the like. The operation unit 137 receives input of operations by the user U1 and outputs the content of the received operations to the control unit 132.

The setting unit 138 assigns an importance and the character information converted by the conversion unit 135 to the voice data, which is associated with the same time axis as the line-of-sight data, and records them in the recording unit 34. It does so based on the calibration data recorded in the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111 at predetermined time intervals, and the character information extracted by the extraction unit 136.

The generation unit 139 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 and the character information converted by the conversion unit 135 are associated with the image corresponding to the image data displayed by the display unit 120, and outputs the line-of-sight mapping data to the image data recording unit 343 or the display control unit 323.

[Processing of the information processing apparatus]
Next, processing executed by the information processing apparatus 1b will be described. FIG. 17 is a flowchart illustrating an outline of the processing executed by the information processing apparatus 1b.

As shown in FIG. 17, the display control unit 323 first causes the display unit 120 to display an image corresponding to the image data recorded in the image data recording unit 343 (step S501). In this case, the display control unit 323 causes the display unit 120 to display the image corresponding to the image data selected by an operation on the operation unit 137.

Subsequently, the control unit 132 associates each of the line-of-sight data generated by the line-of-sight detection unit 130 and the voice data generated by the voice input unit 131 with the time measured by the time measurement unit 133, and records them in the line-of-sight data recording unit 341 and the voice data recording unit 342, respectively (step S502).
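Step S502 amounts to stamping both data streams with readings from one shared clock so that they lie on the same time axis. A minimal sketch in Python follows; `make_recorder`, the two in-memory logs, and the injected `clock` callable are all hypothetical names introduced for illustration and do not appear in the source.

```python
from typing import Any, Callable


def make_recorder(clock: Callable[[], float]):
    """Build a recorder that stamps gaze and voice samples with one clock.

    `clock` stands in for the time measurement unit 133; the two lists
    stand in for the recording units 341 and 342 (assumed structure).
    """
    gaze_log: list[tuple[float, Any]] = []
    voice_log: list[tuple[float, Any]] = []

    def record(gaze_sample: Any, voice_chunk: Any) -> None:
        t = clock()  # a single reading shared by both streams
        gaze_log.append((t, gaze_sample))
        voice_log.append((t, voice_chunk))

    return record, gaze_log, voice_log
```

In practice `clock` could be something like `time.monotonic`; injecting it keeps the sketch testable with a fake clock.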

Thereafter, the conversion unit 135 converts the voice data recorded in the voice data recording unit 342 into character information (step S503). This step may instead be performed after step S506 described later.

Subsequently, when an instruction signal for ending observation of the image displayed by the display unit 120 is input from the operation unit 137 (step S504: Yes), the information processing apparatus 1b proceeds to step S505 described later. When no such instruction signal has been input from the operation unit 137 (step S504: No), the information processing apparatus 1b returns to step S502.

Step S505 corresponds to step S302 of FIG. 9 described above. After step S505, the information processing apparatus 1b proceeds to step S506 described later.

In step S506, the setting unit 138 assigns an importance and the character information converted by the conversion unit 135 to the voice data, which is associated with the same time axis as the line-of-sight data, and records them in the recording unit 34. It does so based on the calibration data recorded in the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111, and the character information extracted by the extraction unit 136. In this case, the setting unit 138 first uses the calibration data corresponding to the user U1 to correct the time difference between the gaze period, in which the line of sight is gazing, and the utterance time, at which the voice was uttered. It then weights the importance of the voice data corresponding to the character information extracted by the extraction unit 136 and records the result in the recording unit 34. For example, the setting unit 138 multiplies the gaze degree by a coefficient based on the character information extracted by the extraction unit 136, assigns the resulting value to the voice data as the importance, and records it in the recording unit 34.
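The two operations in step S506 (correcting the gaze-to-voice time difference, then weighting the gaze degree by a text-based coefficient) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the sign convention (voice lags gaze by `time_lag` seconds), the keyword-to-coefficient table, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    start: float  # seconds on the shared time axis
    end: float


def correct_voice_period(voice: Period, time_lag: float) -> Period:
    """Shift a voice period earlier by the user's calibrated lag.

    Assumes the calibration data is a single lag value and that speech
    trails the gaze; the source does not fix either detail.
    """
    return Period(voice.start - time_lag, voice.end - time_lag)


def importance(gaze_degree: float, text: str,
               keyword_coeffs: dict[str, float]) -> float:
    """Importance = gaze degree x coefficient based on the extracted text.

    The largest coefficient among matching keywords is used; text with
    no matching keyword gets a neutral coefficient of 1.0 (assumption).
    """
    coeff = max((c for kw, c in keyword_coeffs.items() if kw in text),
                default=1.0)
    return gaze_degree * coeff
```

With a lag of 0.5 s, a voice period of 2.5 s to 4.0 s is moved to 2.0 s to 3.5 s before the importance is assigned.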

Subsequently, the generation unit 139 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 and the character information converted by the conversion unit 135 are associated with the image corresponding to the image data displayed by the display unit 120 (step S507).

Subsequently, the display control unit 323 causes the display unit 120 to display a line-of-sight mapping image corresponding to the line-of-sight mapping data generated by the generation unit 139 (step S508).

FIG. 18 is a diagram illustrating an example of a line-of-sight mapping image displayed by the display unit 120. As illustrated in FIG. 18, the display control unit 323 causes the display unit 120 to display a line-of-sight mapping image P3 corresponding to the line-of-sight mapping data generated by the generation unit 139. In the line-of-sight mapping image P3, marks M11 to M15 corresponding to the gaze regions of the line of sight and a line-of-sight trajectory K1 are superimposed, and the character information of the voice data uttered at the timing of each gaze is associated with them. The number of each of the marks M11 to M15 indicates the order of the line of sight of the user U1, and the size (area) of each mark indicates the magnitude of the gaze degree. Further, when the user U1 operates the operation unit 137 to move the cursor A1 to a desired position, for example the mark M14, the character information Q1 associated with the mark M14, for example "There is cancer here.", is displayed. Although the display control unit 323 displays the character information on the display unit 120 in FIG. 18, the character information may instead be converted into voice and output as voice data. This allows the user U1 to intuitively grasp the important voice content together with the region that was being gazed at, and also to intuitively grasp the trajectory of the line of sight during observation.
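The mark convention of FIG. 18 (a number giving the gaze order, a size giving the gaze degree) can be sketched in Python as below. The tuple layout of a fixation, the linear radius scaling, and the `base_radius` parameter are assumptions introduced for illustration; the source only states that order and size encode these two quantities.

```python
def build_marks(fixations, base_radius: float = 20.0):
    """Turn fixations into drawable marks.

    fixations: iterable of (time, x, y, gaze_degree) tuples (assumed layout).
    Each mark carries its gaze order (1, 2, ...) and a radius that grows
    linearly with the gaze degree (assumed scaling).
    """
    ordered = sorted(fixations, key=lambda f: f[0])  # earliest gaze first
    return [
        {"order": i, "x": x, "y": y, "radius": base_radius * degree}
        for i, (_, x, y, degree) in enumerate(ordered, start=1)
    ]
```

A renderer would then draw each mark at `(x, y)` with the given radius and label it with `order`, connecting consecutive marks to show the trajectory.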

FIG. 19A is a diagram illustrating another example of the line-of-sight mapping image displayed by the display unit 120. As illustrated in FIG. 19A, the display control unit 323 causes the display unit 120 to display a line-of-sight mapping image P4 corresponding to the line-of-sight mapping data generated by the generation unit 139. The display control unit 323 also causes the display unit 120 to display icons B1 to B5, each of which associates character information with the time at which that character information was uttered. Further, when the user U1 operates the operation unit 137 to select one of the marks M11 to M15, for example the mark M14, the display control unit 323 highlights the mark M14 on the display unit 120 and also highlights the character information corresponding to the time of the mark M14, for example the icon B4 (for example, by highlighting its frame or drawing it with a bold line). This allows the user U1 to intuitively grasp the important voice content together with the region that was being gazed at, and also to intuitively grasp what was said at the time of utterance.

In addition, because the line-of-sight data underlying the gaze degree and the voice data underlying the character information are corrected using the calibration data, the time difference between them is smaller than before the correction. Therefore, when the user speaks while gazing at a certain part of the image, the period to which a gaze degree is assigned in the line-of-sight data and the period to which an importance is assigned in the voice data overlap in time to a greater extent than before the correction. Based on this, the generation unit 139 generates, within the line-of-sight mapping data, data in a format that links the line-of-sight data and the voice data by time.
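The claim that the corrected periods overlap more can be checked with a small overlap measure. The sketch below is an assumption-laden illustration: the source does not define how the overlap ratio is computed, so here it is taken as the fraction of the voice period that falls inside the gaze period, and the function name is hypothetical.

```python
def overlap_ratio(gaze: tuple[float, float],
                  voice: tuple[float, float]) -> float:
    """Fraction of the voice period that lies inside the gaze period.

    Both arguments are (start, end) pairs in seconds on the same time axis.
    """
    overlap = min(gaze[1], voice[1]) - max(gaze[0], voice[0])
    length = voice[1] - voice[0]
    if length <= 0:
        return 0.0
    return max(overlap, 0.0) / length
```

For a gaze period of 1.0 s to 3.0 s and an uncorrected voice period of 2.0 s to 4.0 s the ratio is 0.5; shifting the voice period earlier by a 1.0 s lag makes the two periods coincide and the ratio becomes 1.0.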

FIG. 19B shows an example of data in which line-of-sight data and voice data are associated by time, and represents the content of the data format. In Table T1 of FIG. 19B, the start time and end time of the line-of-sight data and the start time and end time of the voice data are associated with each other by an Index number.

As shown in Table T1 of FIG. 19B, this allows the generation unit 139 to, for example, retrieve line-of-sight data by Index number and then retrieve the corresponding voice data. The generation unit 139 can also retrieve, by Index number, the line-of-sight information indicating where in the image the user was gazing during a period in which voice was uttered (important voice period). Conversely, starting from a gaze position (region) of the line of sight in the observation image, the generation unit 139 can retrieve by Index number the voice (character information) uttered while the user was looking there, and can visualize and present the character information corresponding to that gaze position. As a method other than the above, the generation unit 139 may associate the change over time of line-of-sight feature values (movement distance, dwell time) with the voice data.
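The Index-based association of Table T1 can be modeled as rows holding both pairs of times, with lookups in either direction. The following Python sketch uses made-up example rows, not values from FIG. 19B, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    index: int
    gaze_start: float
    gaze_end: float
    voice_start: float
    voice_end: float


# Made-up example rows for illustration; not taken from FIG. 19B.
TABLE = [
    Row(1, 0.0, 2.0, 0.5, 1.8),
    Row(2, 3.0, 6.0, 3.4, 5.5),
]


def row_by_index(table: list[Row], index: int) -> Optional[Row]:
    """Retrieve the row (gaze and voice times together) for an Index number."""
    return next((r for r in table if r.index == index), None)


def voice_period_for_gaze_time(table: list[Row], t: float):
    """From a moment of gazing, find the associated voice period."""
    for r in table:
        if r.gaze_start <= t <= r.gaze_end:
            return (r.voice_start, r.voice_end)
    return None


def gaze_period_for_voice_time(table: list[Row], t: float):
    """From a moment of speech, find the associated gaze period."""
    for r in table:
        if r.voice_start <= t <= r.voice_end:
            return (r.gaze_start, r.gaze_end)
    return None
```

A linear scan is enough for a handful of gaze periods per observation; a longer recording might call for a sorted structure, which is outside what the source describes.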

Returning to FIG. 17, the description continues from step S509.
In step S509, when any one of the marks corresponding to the plurality of gaze regions is operated via the operation unit 137 (step S509: Yes), the control unit 132 executes operation processing according to the operation (step S510). Specifically, the display control unit 323 causes the display unit 120 to highlight the mark corresponding to the gaze region selected via the operation unit 137 (see, for example, FIG. 18). In addition, the voice input control unit 322 causes the voice input unit 131 to play back the voice data associated with a region having a high gaze degree. After step S510, the information processing apparatus 1b proceeds to step S511 described later.

In step S509, when none of the marks corresponding to the plurality of gaze regions is operated via the operation unit 137 (step S509: No), the information processing apparatus 1b proceeds to step S511 described later.

In step S511, when an instruction signal instructing the end of observation is input from the operation unit 137 (step S511: Yes), the information processing apparatus 1b ends this processing. When no such instruction signal has been input from the operation unit 137 (step S511: No), the information processing apparatus 1b returns to step S508 described above.

According to the fourth embodiment described above, the generation unit 139 generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 and the character information converted by the conversion unit 135 are associated with the image corresponding to the image data displayed by the display unit 120. The user U1 can therefore intuitively grasp the important voice content together with the region that was being gazed at, and can also intuitively grasp what was said at the time of utterance.

Further, according to the fourth embodiment, the display control unit 323 causes the display unit 120 to display the line-of-sight mapping image corresponding to the line-of-sight mapping data generated by the generation unit 139. The image can therefore be used to check that the user has not overlooked anything when observing the image, to assess the user's technical skills such as image interpretation, to train other users in interpretation and observation, and for conferences and the like.

(Embodiment 5)
Next, a fifth embodiment of the present disclosure will be described. In the fourth embodiment described above, the configuration consisted of the information processing apparatus 1b alone, whereas in the fifth embodiment the information processing apparatus is incorporated as part of a microscope system. In the following, the configuration of the microscope system according to the fifth embodiment is described first, followed by the processing executed by the microscope system according to the fifth embodiment. Components identical to those of the information processing apparatus 1b according to the fourth embodiment described above are given the same reference numerals, and detailed description of them is omitted as appropriate.

[Configuration of the microscope system]
FIG. 20 is a schematic diagram illustrating the configuration of the microscope system according to the fifth embodiment. FIG. 21 is a block diagram illustrating the functional configuration of the microscope system according to the fifth embodiment.

As shown in FIGS. 20 and 21, the microscope system 110 includes an information processing apparatus 110c, a display unit 120, a voice input unit 131, an operation unit 137, a microscope 200, an imaging unit 210, and a line-of-sight detection unit 220.

[Configuration of the microscope]
First, the configuration of the microscope 200 will be described.
The microscope 200 includes a main body unit 201, a rotating unit 202, an elevating unit 203, a revolver 204, an objective lens 205, a magnification detection unit 206, a lens barrel unit 207, a connection unit 208, and an eyepiece unit 209.

A specimen SP is placed on the main body unit 201. The main body unit 201 is substantially U-shaped, and the elevating unit 203 is connected to it via the rotating unit 202.

The rotating unit 202 rotates in response to an operation by the user U2, thereby moving the elevating unit 203 in the vertical direction.

The elevating unit 203 is provided so as to be movable in the vertical direction with respect to the main body unit 201. The revolver 204 is connected to the surface on one end side of the elevating unit 203, and the lens barrel unit 207 is connected to the surface on the other end side.

A plurality of objective lenses 205 having mutually different magnifications are attached to the revolver 204, which is connected to the elevating unit 203 so as to be rotatable with respect to the optical axis L1. The revolver 204 places the desired objective lens 205 on the optical axis L1 in response to an operation by the user U2. Information indicating the magnification, for example an IC chip or a label, is attached to each of the objective lenses 205. Instead of an IC chip or a label, a shape indicating the magnification may be provided on the objective lens 205.

The magnification detection unit 206 detects the magnification of the objective lens 205 placed on the optical axis L1 and outputs the detection result to the information processing apparatus 110c. The magnification detection unit 206 is configured using, for example, means for detecting the position of the revolver 204 used for switching objectives.

The lens barrel unit 207 transmits part of the subject image of the specimen SP formed by the objective lens 205 toward the connection unit 208 and reflects the subject image toward the eyepiece unit 209. The lens barrel unit 207 contains a prism, a half mirror, a collimating lens, and the like.

The connection unit 208 has one end connected to the lens barrel unit 207 and the other end connected to the imaging unit 210. The connection unit 208 guides the subject image of the specimen SP transmitted through the lens barrel unit 207 to the imaging unit 210. The connection unit 208 is configured using a plurality of collimating lenses, an imaging lens, and the like.

The eyepiece unit 209 guides the subject image reflected by the lens barrel unit 207 and forms an image of it. The eyepiece unit 209 is configured using a plurality of collimating lenses, an imaging lens, and the like.

[Configuration of the imaging unit]
Next, the configuration of the imaging unit 210 will be described.
The imaging unit 210 generates image data by receiving the subject image of the specimen SP formed by the connection unit 208, and outputs the image data to the information processing apparatus 110c. The imaging unit 210 is configured using an image sensor such as a CMOS or CCD sensor, an image processing engine that performs various types of image processing on the image data, and the like.

[Configuration of the line-of-sight detection unit]
Next, the configuration of the line-of-sight detection unit 220 will be described.
The line-of-sight detection unit 220 is provided inside or outside the eyepiece unit 209, generates line-of-sight data by detecting the line of sight of the user U2, and outputs the line-of-sight data to the information processing apparatus 110c. The line-of-sight detection unit 220 is configured using an LED light source that is provided inside the eyepiece unit 209 and emits near-infrared light, and an optical sensor (for example, a CMOS or CCD sensor) that is provided inside the eyepiece unit 209 and images the pupil point and the reflection point on the cornea. Under the control of the information processing apparatus 110c, the line-of-sight detection unit 220 irradiates the cornea of the user U2 with near-infrared light from the LED light source or the like, and the optical sensor generates data by imaging the pupil point and the reflection point on the cornea of the user U2. Then, under the control of the information processing apparatus 110c, the line-of-sight detection unit 220 analyzes the data generated by the optical sensor by image processing or the like, detects the line of sight of the user U2 from the pattern of the pupil point and the reflection point based on the analysis result, thereby generating line-of-sight data, and outputs the line-of-sight data to the information processing apparatus 110c.

[Configuration of the information processing apparatus]
Next, the configuration of the information processing apparatus 110c will be described.
The information processing apparatus 110c includes a control unit 132c, a recording unit 34c, and a setting unit 138c in place of the control unit 132, the recording unit 34, and the setting unit 138 of the information processing apparatus 1b according to the fourth embodiment described above.

制御部１３２ｃは、ＣＰＵ、ＦＰＧＡおよびＧＰＵ等を用いて構成され、表示部１２０、音声入力部１３１、撮像部２１０および視線検出部２２０を制御する。制御部１３２ｃは、上述した実施の形態４の制御部１３２の視線検出制御部３２１、音声入力制御部３２２、表示制御部３２３に加えて、撮影制御部３２４および倍率算出部３２５をさらに備える。 The control unit 132c is configured using a CPU, FPGA, GPU, and the like, and controls the display unit 120, the audio input unit 131, the imaging unit 210, and the line-of-sight detection unit 220. The control unit 132c further includes an imaging control unit 324 and a magnification calculation unit 325 in addition to the line-of-sight detection control unit 321, the voice input control unit 322, and the display control unit 323 of the control unit 132 of the fourth embodiment described above.

撮影制御部１３２４は、撮像部２１０の動作を制御する。撮影制御部１３２４は、撮像部２１０を所定のフレームレートに従って順次撮像させることによって画像データを生成させる。撮影制御部１３２４は、撮像部２１０から入力された画像データに対して処理の画像処理（例えば現像処理等）を施して記録部３４ｃへ出力する。 The imaging control unit 1324 controls the operation of the imaging unit 210. The imaging control unit 1324 generates image data by causing the imaging unit 210 to sequentially capture images according to a predetermined frame rate. The imaging control unit 1324 performs image processing (for example, development processing) on the image data input from the imaging unit 210 and outputs the processed image data to the recording unit 34c.

倍率算出部３２５は、倍率検出部２０６から入力された検出結果に基づいて、現在の顕微鏡２００の観察倍率を算出し、この算出結果を設定部１３８ｃへ出力する。例えば、倍率算出部３２５は、倍率検出部２０６から入力された対物レンズ２０５の倍率と接眼部２０９の倍率とに基づいて、現在の顕微鏡２００の観察倍率を算出する。 The magnification calculation unit 325 calculates the current observation magnification of the microscope 200 based on the detection result input from the magnification detection unit 206, and outputs the calculation result to the setting unit 138c. For example, the magnification calculation unit 325 calculates the current observation magnification of the microscope 200 based on the magnification of the objective lens 205 and the magnification of the eyepiece unit 209 input from the magnification detection unit 206.
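The calculation described above can be sketched as follows. This is only an illustration: the text says the result is "based on" the two magnifications, and the sketch assumes the usual optical convention that the total magnification of a compound microscope is the product of the objective and eyepiece magnifications. The function name `observation_magnification` is hypothetical.

```python
def observation_magnification(objective: float, eyepiece: float) -> float:
    """Return the total observation magnification.

    Assumption: total magnification = objective magnification x eyepiece
    magnification (the conventional rule for a compound microscope).
    """
    if objective <= 0 or eyepiece <= 0:
        raise ValueError("magnifications must be positive")
    return objective * eyepiece


if __name__ == "__main__":
    # A 40x objective lens combined with a 10x eyepiece.
    print(observation_magnification(40, 10))
```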

The recording unit 34c is configured using a volatile memory, a nonvolatile memory, a recording medium, and the like. The recording unit 34c includes an image data recording unit 346 in place of the image data recording unit 343 according to the second embodiment described above. The image data recording unit 346 records the image data input from the imaging control unit 324 and outputs this image data to the generation unit 113.

The setting unit 138c assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, and records the result in the recording unit 34c. It does so based on the calibration data recorded by the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111, and the calculation result of the magnification calculation unit 325. Specifically, the setting unit 138c multiplies the gaze degree analyzed by the analysis unit 111 by a coefficient based on the calculation result of the magnification calculation unit 325, assigns the product to each frame of the audio data as its importance (for example, a numerical value), and records it in the recording unit 34c. In other words, the setting unit 138c performs processing such that the larger the display magnification, the higher the importance. The setting unit 138c is configured using a CPU, an FPGA, a GPU, and the like.
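The per-frame multiplication of the gaze degree by a magnification-based coefficient can be sketched as follows. The text does not say how the coefficient is derived from the magnification; the linear mapping in `magnification_coefficient`, the `reference` value, and both function names are assumptions made purely for illustration.

```python
from typing import List


def magnification_coefficient(magnification: float, reference: float = 100.0) -> float:
    """Hypothetical coefficient that grows linearly with the magnification."""
    return magnification / reference


def frame_importance(gaze_degrees: List[float], magnifications: List[float]) -> List[float]:
    """Importance of each audio frame: gaze degree x magnification coefficient."""
    if len(gaze_degrees) != len(magnifications):
        raise ValueError("one gaze degree and one magnification are needed per frame")
    return [
        gaze * magnification_coefficient(mag)
        for gaze, mag in zip(gaze_degrees, magnifications)
    ]
```

With this mapping, two frames with the same gaze degree receive different importance values when they were observed at different magnifications, which matches the stated intent that a larger magnification yields a higher importance.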

[Processing of microscope system]
Next, processing executed by the microscope system 110 will be described. FIG. 22 is a flowchart illustrating an outline of processing executed by the microscope system 110.

As shown in FIG. 22, the control unit 132c first associates each of the line-of-sight data generated by the line-of-sight detection unit 130, the audio data generated by the audio input unit 131, and the observation magnification calculated by the magnification calculation unit 325 with the time measured by the time measurement unit 133, and records them in the line-of-sight data recording unit 341 and the audio data recording unit 342 (step S601). After step S601, the microscope system 110 proceeds to step S602 described later.
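One way to picture the recording in step S601 is a store that keeps every sample together with the time at which it was measured, so that the streams can later be compared on one time axis. The sketch below is an assumption about the data layout, not the recording units themselves; `TimedRecorder`, its methods, and the stream names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class TimedRecorder:
    """Stores samples from several sources against one shared time axis."""

    streams: Dict[str, List[Tuple[float, Any]]] = field(default_factory=dict)

    def record(self, stream: str, timestamp: float, sample: Any) -> None:
        # Samples are assumed to arrive in increasing time order per stream.
        self.streams.setdefault(stream, []).append((timestamp, sample))

    def at(self, stream: str, timestamp: float) -> Any:
        """Return the latest sample of `stream` recorded at or before `timestamp`."""
        candidates = [s for t, s in self.streams.get(stream, []) if t <= timestamp]
        if not candidates:
            raise LookupError(f"no {stream} sample at or before t={timestamp}")
        return candidates[-1]
```

Looking up each stream at the same timestamp then gives the gaze sample, audio frame, and magnification that belong together.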

Steps S602 to S604 correspond to steps S503 to S505 of FIG. 17 described above, respectively. After step S604, the microscope system 110 proceeds to step S605.

In step S605, the setting unit 138c assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, based on the calibration data recorded by the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111 at predetermined time intervals, and the calculation result of the magnification calculation unit 325, and records the result in the recording unit 34c. After step S605, the microscope system 110 proceeds to step S606.
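The role of the calibration data in this step can be illustrated with a small sketch. According to the abstract, the calibration data is the time difference between a gazing period and the corresponding utterance. The sketch assumes that this difference has already been converted to a whole number of audio frames and that speech follows the gaze; the function name `align_gaze_to_audio` and the zero fill value are hypothetical.

```python
from typing import List


def align_gaze_to_audio(gaze_degrees: List[float], lag_frames: int) -> List[float]:
    """Delay the gaze-degree series by `lag_frames` so it lines up with the speech.

    Frames that have no gaze sample after shifting are filled with 0.0.
    """
    if lag_frames < 0:
        raise ValueError("this sketch only handles speech that follows the gaze")
    n = len(gaze_degrees)
    padding = [0.0] * min(lag_frames, n)
    return padding + gaze_degrees[: max(n - lag_frames, 0)]
```

The shifted series has the same length as the input, so it can be combined frame by frame with the audio data when the importance is assigned.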

Steps S606 to S610 correspond to steps S507 to S511 of FIG. 17 described above, respectively.

According to the fifth embodiment described above, the setting unit 138c assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, based on the calibration data recorded by the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111, and the calculation result of the magnification calculation unit 325, and records the result in the recording unit 34c. Because an importance based on the observation magnification and the gaze degree is thereby assigned to the audio data, the important periods of the audio data can be identified with both the observation content and the gaze degree taken into account.

In the fifth embodiment, the observation magnification calculated by the magnification calculation unit 325 is recorded in the recording unit 114. However, the operation history of the user U2 may also be recorded, and the importance of the audio data may be assigned with this operation history additionally taken into account.

(Embodiment 6)
Next, a sixth embodiment of the present disclosure will be described. In the sixth embodiment, the information processing apparatus is incorporated into part of an endoscope system. In the following, the configuration of the endoscope system according to the sixth embodiment is described first, followed by the processing executed by that endoscope system. Components identical to those of the information processing apparatus 1b according to the third embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

[Configuration of endoscope system]
FIG. 23 is a schematic diagram illustrating a configuration of an endoscope system according to the sixth embodiment. FIG. 24 is a block diagram illustrating a functional configuration of the endoscope system according to the sixth embodiment.

図２３および図２４に示す内視鏡システム３００は、表示部１２０と、内視鏡４００と、ウェアラブルデバイス５００と、入力部６００と、情報処理装置１ｄと、を備える。 The endoscope system 300 shown in FIGS. 23 and 24 includes a display unit 120, an endoscope 400, a wearable device 500, an input unit 600, and an information processing apparatus 1d.

[Configuration of endoscope]
First, the configuration of the endoscope 400 will be described.
The endoscope 400 is inserted into a subject U4 by a user U3 such as a doctor or an operator, generates image data by imaging the inside of the subject U4, and outputs this image data to the information processing apparatus 1d. The endoscope 400 includes an imaging unit 401 and an operation unit 402.

撮像部４０１は、内視鏡４００の挿入部の先端部に設けられる。撮像部４０１は、情報処理装置１ｄの制御のもと、被検体Ｕ４の内部を撮像することによって画像データを生成し、この画像データを情報処理装置１ｄへ出力する。撮像部４０１は、観察倍率を変更することができる光学系と、光学系が結像した被写体像を受光することによって画像データを生成するＣＭＯＳやＣＣＤ等のイメージセンサ等を用いて構成される。 The imaging unit 401 is provided at the distal end of the insertion unit of the endoscope 400. The imaging unit 401 generates image data by imaging the inside of the subject U4 under the control of the information processing apparatus 1d, and outputs the image data to the information processing apparatus 1d. The imaging unit 401 is configured using an optical system that can change the observation magnification, and an image sensor such as a CMOS or CCD that generates image data by receiving a subject image formed by the optical system.

操作部４０２は、利用者Ｕ３の各種の操作の入力を受け付け、受け付けた各種操作に応じた操作信号を情報処理装置１ｄへ出力する。 The operation unit 402 receives input of various operations of the user U3 and outputs operation signals corresponding to the received various operations to the information processing apparatus 1d.

[Configuration of wearable device]
Next, the configuration of the wearable device 500 will be described.
The wearable device 500 is worn by the user U3, detects the line of sight of the user U3, and accepts voice input from the user U3. The wearable device 500 includes a line-of-sight detection unit 510 and an audio input unit 520.

視線検出部５１０は、ウェアラブルデバイス５００に設けられ、利用者Ｕ３の視線の注視度を検出することによって視線データを生成し、この視線データを情報処理装置１ｄへ出力する。視線検出部５１０は、上述した実施の形態３に係る視線検出部２２０と同様の構成を有するため、詳細な構成は省略する。 The line-of-sight detection unit 510 is provided in the wearable device 500, generates line-of-sight data by detecting the gaze degree of the line of sight of the user U3, and outputs this line-of-sight data to the information processing apparatus 1d. Since the line-of-sight detection unit 510 has the same configuration as the line-of-sight detection unit 220 according to Embodiment 3 described above, a detailed configuration is omitted.

音声入力部５２０は、ウェアラブルデバイス５００に設けられ、利用者Ｕ３の音声の入力を受け付けることによって音声データを生成し、この音声データを情報処理装置１ｄへ出力する。音声入力部５２０は、マイク等を用いて構成される。 The voice input unit 520 is provided in the wearable device 500, generates voice data by receiving the voice input of the user U3, and outputs the voice data to the information processing apparatus 1d. The voice input unit 520 is configured using a microphone or the like.

[Configuration of input unit]
The configuration of the input unit 600 will be described.
The input unit 600 is configured using a mouse, a keyboard, a touch panel, and various switches. The input unit 600 receives input of various operations by the user U3 and outputs operation signals corresponding to the received operations to the information processing apparatus 1d.

[Configuration of information processing apparatus]
Next, the configuration of the information processing apparatus 1d will be described.
The information processing apparatus 1d includes a control unit 132d, a recording unit 34d, a setting unit 138d, and a generation unit 139d in place of the control unit 132c, the recording unit 34c, the setting unit 138c, and the generation unit 139 of the information processing apparatus 110c according to the fifth embodiment described above. The information processing apparatus 1d further includes an image processing unit 140.

制御部１３２ｄは、ＣＰＵ、ＦＰＧＡおよびＧＰＵ等を用いて構成され、内視鏡４００、ウェアラブルデバイス５００および表示部１２０を制御する。制御部１３２ｄは、視線検出制御部３２１、音声入力制御部３２２、表示制御部３２３、撮影制御部３２４に加えて、操作履歴検出部３２６を備える。 The control unit 132d is configured using a CPU, FPGA, GPU, and the like, and controls the endoscope 400, the wearable device 500, and the display unit 120. The control unit 132d includes an operation history detection unit 326 in addition to the line-of-sight detection control unit 321, the voice input control unit 322, the display control unit 323, and the imaging control unit 324.

操作履歴検出部３２６は、内視鏡４００の操作部４０２が入力を受け付けた操作の内容を検出し、この検出結果を記録部３４ｄに出力する。具体的には、操作履歴検出部３２６は、内視鏡４００の操作部４０２から拡大スイッチが操作された場合、この操作内容を検出し、この検出結果を記録部３４ｄに出力する。なお、操作履歴検出部３２６は、内視鏡４００を経由して被検体Ｕ４の内部に挿入される処置具の操作内容を検出し、この検出結果を記録部３４ｄに出力してもよい。 The operation history detection unit 326 detects the content of the operation accepted by the operation unit 402 of the endoscope 400 and outputs the detection result to the recording unit 34d. Specifically, when the enlargement switch is operated from the operation unit 402 of the endoscope 400, the operation history detection unit 326 detects the operation content and outputs the detection result to the recording unit 34d. The operation history detection unit 326 may detect the operation content of the treatment instrument inserted into the subject U4 via the endoscope 400 and output the detection result to the recording unit 34d.

記録部３４ｄは、揮発性メモリ、不揮発性メモリおよび記録媒体等を用いて構成される。記録部３４ｄは、上述した実施の形態５に係る記録部３４ｃの構成に加えて、操作履歴記録部３４７をさらに備える。 The recording unit 34d is configured using a volatile memory, a nonvolatile memory, a recording medium, and the like. The recording unit 34d further includes an operation history recording unit 347 in addition to the configuration of the recording unit 34c according to the fifth embodiment described above.

操作履歴記録部３４７は、操作履歴検出部３２６から入力された内視鏡４００の操作部４０２に対する操作の履歴を記録する。 The operation history recording unit 347 records an operation history for the operation unit 402 of the endoscope 400 input from the operation history detection unit 326.

The setting unit 138d assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, based on the calibration data recorded by the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111, and the operation history recorded by the operation history recording unit 347, and records the result in the recording unit 34d. Specifically, based on the same three inputs, the setting unit 138d assigns an importance (for example, a numerical value) to each frame of the audio data and records it in the recording unit 34d. In other words, the setting unit 138d performs processing such that the larger the coefficient set according to the content of the operation history, the higher the importance. The setting unit 138d is configured using a CPU, an FPGA, a GPU, and the like.
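A minimal sketch of an operation-dependent coefficient is shown below. The text states only that a larger coefficient, set according to the content of the operation history, yields a higher importance; the operation names, the coefficient values, and the choice to multiply the gaze degree by the coefficient (by analogy with the fifth embodiment) are all assumptions.

```python
from typing import Dict, List

# Hypothetical coefficients per operation; the text does not list any values.
OPERATION_COEFFICIENTS: Dict[str, float] = {
    "none": 1.0,
    "enlarge": 2.0,
    "treatment_tool": 3.0,
}


def importance_with_operations(gaze_degrees: List[float], operations: List[str]) -> List[float]:
    """Importance per audio frame: gaze degree x coefficient of the logged operation.

    Operations that are not listed fall back to a neutral coefficient of 1.0.
    """
    if len(gaze_degrees) != len(operations):
        raise ValueError("one gaze degree and one operation are needed per frame")
    return [
        gaze * OPERATION_COEFFICIENTS.get(op, 1.0)
        for gaze, op in zip(gaze_degrees, operations)
    ]
```

Under these assumed values, a frame recorded while the enlargement switch is pressed counts twice as much as a frame with the same gaze degree and no operation.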

The generation unit 139d generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111 and the character information are associated with the integrated image corresponding to the integrated image data generated by the image processing unit 140, and outputs the generated line-of-sight mapping data to the recording unit 34d and the display control unit 323.

画像処理部１４０は、画像データ記録部３４６が記録する複数の画像データを合成することによって３次元画像の統合画像データを生成し、この統合画像データを生成部１３９ｄへ出力する。 The image processing unit 140 generates integrated image data of a three-dimensional image by combining a plurality of image data recorded by the image data recording unit 346, and outputs the integrated image data to the generating unit 139d.

[Processing of endoscope system]
Next, processing executed by the endoscope system 300 will be described. FIG. 25 is a flowchart illustrating an outline of processing executed by the endoscope system 300.

As shown in FIG. 25, the control unit 132d first associates each of the line-of-sight data generated by the line-of-sight detection unit 130, the audio data generated by the audio input unit 131, and the operation history detected by the operation history detection unit 326 with the time measured by the time measurement unit 133, and records them in the line-of-sight data recording unit 341, the audio data recording unit 342, and the operation history recording unit 347 (step S701). After step S701, the endoscope system 300 proceeds to step S702 described later.

Steps S702 to S704 correspond to steps S503 to S505 of FIG. 17 described above, respectively. After step S704, the endoscope system 300 proceeds to step S705.

In step S705, the setting unit 138d assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, based on the calibration data recorded by the calibration data storage unit 345, the gaze degree analyzed by the analysis unit 111, and the operation history recorded by the operation history recording unit 347, and records the result in the recording unit 34d.

Subsequently, the image processing unit 140 generates integrated image data of a three-dimensional image by combining a plurality of image data recorded by the image data recording unit 346, and outputs this integrated image data to the generation unit 139d (step S706). FIG. 26 is a diagram schematically illustrating an example of a plurality of images corresponding to the plurality of image data recorded by the image data recording unit 346. FIG. 27 is a diagram illustrating an example of an integrated image corresponding to the integrated image data generated by the image processing unit 140. As shown in FIGS. 26 and 27, the image processing unit 140 generates an integrated image P100 corresponding to the integrated image data by combining a plurality of temporally consecutive image data P11 to P_N (N being an integer).

Thereafter, the generation unit 139d generates line-of-sight mapping data in which the gaze degree analyzed by the analysis unit 111, the line of sight, and the character information are associated with the integrated image P100 corresponding to the integrated image data generated by the image processing unit 140, and outputs the generated line-of-sight mapping data to the recording unit 34d and the display control unit 323 (step S707). In this case, the generation unit 139d may associate the operation history with the integrated image P100 in addition to the gaze degree analyzed by the analysis unit 111, the line of sight K2, and the character information. After step S707, the endoscope system 300 proceeds to step S708 described later.

Steps S708 to S711 correspond to steps S508 to S511 of FIG. 17 described above, respectively.

According to the sixth embodiment described above, the setting unit 138d assigns an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, based on the gaze degree analyzed by the analysis unit 111 and the operation history recorded by the operation history recording unit 347, and records the result in the recording unit 34d. Because an importance based on the operation history and the gaze degree is thereby assigned to the audio data, the important periods of the audio data can be identified with both the operation content and the gaze degree taken into account.

Although the sixth embodiment has been described as an endoscope system, it can also be applied to, for example, a capsule endoscope, a video microscope that images a subject, a mobile phone having an imaging function, and a tablet terminal having an imaging function.

Although the sixth embodiment has been described as an endoscope system including a flexible endoscope, it can also be applied to an endoscope system including a rigid endoscope and to an endoscope system including an industrial endoscope.

Although the sixth embodiment has been described as an endoscope system including an endoscope that is inserted into a subject, it can also be applied to endoscope systems such as a sinus endoscope, an electric scalpel, or an inspection probe.

(Embodiment 7)
Next, a seventh embodiment of the present disclosure will be described. Embodiments 3 to 6 described above assume a single user, whereas the seventh embodiment assumes two or more users. Furthermore, in the seventh embodiment, the information processing apparatus is incorporated into an information processing system in which a plurality of users view images. In the following, the configuration of the browsing system according to the seventh embodiment is described first, followed by the processing executed by the information processing system according to the seventh embodiment. Components identical to those of the information processing apparatus 1b according to the third embodiment described above are denoted by the same reference numerals, and detailed description thereof is omitted as appropriate.

[Configuration of information processing system]
FIG. 28 is a block diagram illustrating a functional configuration of the information processing system according to the seventh embodiment. The information processing system 700 illustrated in FIG. 28 includes a display unit 120, a first wearable device 710, a second wearable device 720, a detection unit 730, and an information processing apparatus 1e.

[Configuration of first wearable device]
First, the configuration of the first wearable device 710 will be described.
The first wearable device 710 is worn by a user, detects the user's line of sight, and accepts the user's voice input. The first wearable device 710 includes a first line-of-sight detection unit 711 and a first audio input unit 712. Since the first line-of-sight detection unit 711 and the first audio input unit 712 have the same configurations as the line-of-sight detection unit 510 and the audio input unit 520 according to the fourth embodiment described above, detailed description is omitted.

[Configuration of second wearable device]
Next, the configuration of the second wearable device 720 will be described.
The second wearable device 720 has the same configuration as the first wearable device 710 described above, is worn by a user, detects the user's line of sight, and accepts the user's voice input. The second wearable device 720 includes a second line-of-sight detection unit 721 and a second audio input unit 722. Since the second line-of-sight detection unit 721 and the second audio input unit 722 have the same configurations as the line-of-sight detection unit 510 and the audio input unit 520 according to the fourth embodiment described above, detailed description is omitted.

[Configuration of detection unit]
Next, the configuration of the detection unit 730 will be described.
The detection unit 730 detects identification information that identifies each of a plurality of users, and outputs the detection result to the information processing apparatus 1e. The detection unit 730 detects the identification information of a user from an IC card on which identification information (for example, an ID or a name) identifying each of the plurality of users is recorded, and outputs the detection result to the information processing apparatus 1e. The detection unit 730 is configured using, for example, a card reader that reads the IC card. Alternatively, the detection unit 730 may identify the users by applying preset feature points of the users' faces and well-known pattern matching to an image corresponding to image data generated by imaging the faces of the plurality of users, and output the identification result to the information processing apparatus 1e. The detection unit 730 may also identify a user based on a signal input in response to an operation on the operation unit 137 and output the identification result to the information processing apparatus 1e.

[Configuration of information processing apparatus]
Next, the configuration of the information processing apparatus 1e will be described.
The information processing apparatus 1e includes a control unit 132e, a recording unit 34e, and a setting unit 138e in place of the control unit 132d, the recording unit 34d, and the setting unit 138d of the information processing apparatus 1d according to the fourth embodiment described above.

制御部１３２ｅは、ＣＰＵ、ＦＰＧＡおよびＧＰＵ等を用いて構成され、第１ウェアラブルデバイス７１０、第２ウェアラブルデバイス７２０、検出部７３０および表示部１２０を制御する。制御部１３２ｅは、視線検出制御部３２１、音声入力制御部３２２、表示制御部３２３に加えて、識別検出制御部３２７を備える。 The control unit 132e is configured using a CPU, FPGA, GPU, and the like, and controls the first wearable device 710, the second wearable device 720, the detection unit 730, and the display unit 120. The control unit 132e includes an identification detection control unit 327 in addition to the line-of-sight detection control unit 321, the voice input control unit 322, and the display control unit 323.

識別検出制御部３２７は、検出部７３０を制御し、検出部７３０が取得した取得結果に基づいて、複数の利用者の各々を識別し、この識別結果を記録部３４ｅへ出力する。 The identification detection control unit 327 controls the detection unit 730, identifies each of the plurality of users based on the acquisition result acquired by the detection unit 730, and outputs the identification result to the recording unit 34e.

記録部３４ｅは、揮発性メモリ、不揮発性メモリおよび記録媒体等を用いて構成される。記録部３４ｅは、上述した実施の形態３に係る記録部３４ｃの構成に加えて、識別情報記録部３４８をさらに備える。 The recording unit 34e is configured using a volatile memory, a nonvolatile memory, a recording medium, and the like. The recording unit 34e further includes an identification information recording unit 348 in addition to the configuration of the recording unit 34c according to the third embodiment described above.

識別情報記録部３４８は、識別検出制御部３２７から入力された複数の利用者の各々の識別情報を記録する。 The identification information recording unit 348 records the identification information of each of the plurality of users input from the identification detection control unit 327.

The setting unit 138e assigns, at predetermined time intervals, an importance and the character information converted by the conversion unit 135 to the audio data associated with the same time axis as the line-of-sight data, and records the result in the recording unit 34e. It does so based on the per-user calibration data recorded by the calibration data storage unit 345, the analysis results of the analysis unit 111, the character information extracted by the extraction unit 136, and the identification information recorded by the identification information recording unit 348. Furthermore, the setting unit 138e weights the importance according to the identification information of each user recorded by the identification information recording unit 348. In other words, the setting unit 138e performs processing such that the more important the user (for example, according to a rank set by job title), the higher the importance.
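The user-dependent weighting can be sketched as follows. The text says only that the importance is weighted according to each user's identification information, for example a rank set by job title; the user IDs, the weight values, the neutral fallback, and the function name `weighted_importance` are hypothetical.

```python
from typing import Dict, List

# Hypothetical weights keyed by user ID; a higher rank gets a larger weight.
USER_WEIGHTS: Dict[str, float] = {
    "user_a": 1.5,
    "user_b": 1.0,
}


def weighted_importance(
    user_id: str,
    gaze_degrees: List[float],
    weights: Dict[str, float] = USER_WEIGHTS,
) -> List[float]:
    """Scale one user's per-frame gaze degrees by that user's weight.

    Users without an entry are given a neutral weight of 1.0.
    """
    weight = weights.get(user_id, 1.0)
    return [gaze * weight for gaze in gaze_degrees]
```

Applying the function separately to the gaze degrees of each user gives per-user importance series for the first and second audio data.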

[Processing of information processing system]
Next, processing executed by the information processing system 700 will be described. FIG. 29 is a flowchart illustrating an outline of processing executed by the information processing system 700.

図２９に示すように、表示制御部３２３は、画像データ記録部３４３が記録する画像データに対応する画像を表示部１２０に表示させる（ステップＳ８０１）。 As shown in FIG. 29, the display control unit 323 causes the display unit 120 to display an image corresponding to the image data recorded by the image data recording unit 343 (step S801).

Subsequently, the control unit 132e associates each of the line-of-sight data and audio data generated by the first wearable device 710 and the second wearable device 720, and the identification information acquired by the detection unit 730, with the time measured by the time measurement unit 133, and records them in the line-of-sight data recording unit 341, the audio data recording unit 342, and the identification information recording unit 348 (step S802). After step S802, the information processing system 700 proceeds to step S803.

ステップＳ８０３およびステップＳ８０４は、上述した図１７のステップＳ５０３およびステップＳ５０４それぞれに対応する。ステップＳ８０４の後、情報処理システム７００は、後述するステップＳ８０５へ移行する。 Step S803 and step S804 correspond to step S503 and step S504 in FIG. 17 described above, respectively. After step S804, the information processing system 700 proceeds to step S805 described later.

In step S805, the analysis unit 111 analyzes the gaze degree of each user's line of sight based on the first line-of-sight data generated by the first wearable device 710 and the second line-of-sight data generated by the second wearable device 720.

Subsequently, based on each gaze degree analyzed by the analysis unit 111 at predetermined time intervals and the identification information recorded in the identification information recording unit 348, the setting unit 138e assigns an importance and the character information converted by the conversion unit 135 to each of the first audio data and the second audio data, which are associated with the same time axis as the line-of-sight data, and records them in the recording unit 34e (step S806).
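One way to picture step S806 is to split each user's audio data into fixed-length intervals on the shared time axis and attach an importance and the recognized text to every interval. The sketch below is illustrative only: the `Segment` type, the `assign_importance` function, the use of the mean gaze degree per interval, and the per-user `weight` parameter are assumptions rather than the method the source defines.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float       # seconds on the shared time axis
    end: float
    importance: float
    text: str          # character information falling in this interval


def assign_importance(
    gaze: List[Tuple[float, float]],  # (time, gaze degree) samples
    words: List[Tuple[float, str]],   # (time, recognized word)
    duration: float,
    interval: float,
    weight: float = 1.0,              # hypothetical per-user weight
) -> List[Segment]:
    """Label each fixed-length interval of one user's audio data."""
    segments = []
    for i in range(math.ceil(duration / interval)):
        start, end = i * interval, min((i + 1) * interval, duration)
        degrees = [g for t, g in gaze if start <= t < end]
        # Mean gaze degree per interval is an assumed aggregation rule.
        mean = sum(degrees) / len(degrees) if degrees else 0.0
        text = " ".join(w for t, w in words if start <= t < end)
        segments.append(Segment(start, end, mean * weight, text))
    return segments
```

Calling the function once per user, with that user's gaze samples, recognized words, and weight, yields one labeled segment list for the first audio data and one for the second.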

Steps S807 to S811 correspond to steps S507 to S511 of FIG. 17 described above, respectively.

According to Embodiment 7 described above, the setting unit 138e assigns an importance and the character information converted by the conversion unit 135 to each of the first audio data and the second audio data, which are associated with the same time axis as the line-of-sight data, based on each user's gaze degree analyzed by the analysis unit 111 and the identification information recorded in the identification information recording unit 348, and records them in the recording unit 34e. Because an importance based on both the identification information and the gaze degree is thereby assigned to the first audio data or the second audio data, the important periods of the audio data can be identified with each user's gaze degree taken into account.

In Embodiment 7, the setting unit 138e assigns an importance and the character information converted by the conversion unit 135 to each of the first audio data and the second audio data, which are associated with the same time axis as the line-of-sight data, based on each user's gaze degree analyzed by the analysis unit 111 and each user's identification information recorded in the identification information recording unit 348, and records them in the recording unit 34e. However, the present disclosure is not limited to this. For example, the position of each of the plurality of users may be detected, and the importance and the character information converted by the conversion unit 135 may be assigned to each of the first audio data and the second audio data based on the detection result and each user's gaze degree, and recorded in the recording unit 34e.

(Other embodiments)
Various inventions can be formed by appropriately combining the plurality of constituent elements disclosed in Embodiments 1 to 7 described above. For example, some constituent elements may be deleted from all the constituent elements described in Embodiments 1 to 5 above. Furthermore, the constituent elements described in Embodiments 1 to 5 above may be combined as appropriate.

In Embodiments 1 to 7, each "unit" described above can be read as "means", "circuit", or the like. For example, the control unit can be read as a control means or a control circuit.

A program to be executed by the information processing apparatus according to Embodiments 1 to 7 is provided as file data in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disk), a USB medium, or a flash memory.

A program to be executed by the information processing apparatus according to Embodiments 1 to 7 may also be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. Furthermore, a program to be executed by the information processing apparatus according to Embodiments 1 to 5 may be provided or distributed via a network such as the Internet.

In Embodiments 1 to 7, signals are transmitted from the various devices via transmission cables; however, the connection need not be wired and may be wireless. In that case, each device may transmit signals in accordance with a predetermined wireless communication standard (for example, Wi-Fi (registered trademark) or Bluetooth (registered trademark)). Wireless communication may of course be performed in accordance with other wireless communication standards.

In the descriptions of the flowcharts in this specification, expressions such as "first", "thereafter", and "subsequently" are used to indicate the order of processing between steps, but the order of processing required to implement the present invention is not uniquely determined by those expressions. That is, the order of processing in the flowcharts described in this specification can be changed as long as no inconsistency arises.

Some embodiments of the present application have been described in detail above with reference to the drawings, but these are examples. The present invention can be implemented in other forms incorporating various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure section of the present invention.

1, 1a, 1b, 1d, 1e, 100, 100a, 110c Information processing apparatus
1aa, 1c, 700 Information processing system
10, 130, 220, 222, 510 Line-of-sight detection unit
11, 131, 520 Audio input unit
12, 120 Display unit
13, 137, 402 Operation unit
14, 132, 132c, 132d, 132e Control unit
15 Time measurement unit
16, 16a, 34, 34c, 34d, 34e, 114 Recording unit
17, 111 Analysis unit
18, 112, 112a, 138, 138c, 138d, 138e Setting unit
19 Calibration generation unit
20, 135 Conversion unit
21, 136 Extraction unit
110 Microscope system
113, 139, 139d Generation unit
115, 143, 323 Display control unit
116, 166, 345 Calibration data storage unit
133 Time measurement unit
140 Image processing unit
141, 321 Line-of-sight detection control unit
142, 322 Audio input control unit
161, 341 Line-of-sight data recording unit
162, 342 Audio data recording unit
163, 343, 346 Image data recording unit
164, 344 Program storage unit
165 Audio data storage unit for correction
167 Keyword history recording unit
168 Important word storage unit
200 Microscope
201 Main body unit
202 Rotating unit
203 Elevating unit
204 Revolver
205 Objective lens
206 Magnification detection unit
207 Lens barrel unit
208 Connection unit
209 Eyepiece unit
210, 401 Imaging unit
300 Endoscope system
325 Magnification calculation unit
326 Operation history detection unit
347 Operation history recording unit
348 Identification information recording unit
400 Endoscope
500 Wearable device
600 Input unit
710 First wearable device
711 First line-of-sight detection unit
712 First audio input unit
720 Second wearable device
721 Second line-of-sight detection unit
722 Second audio input unit
730 Detection unit
324 Imaging control unit
327 Identification detection control unit