JP7716917B2

Info

Publication number: JP7716917B2
Application number: JP2021119941A
Authority: JP
Inventors: 聡小柴; 進若林; 幸典岸本
Original assignee: NTT DOCOMO BUSINESS, Inc.; NTT Communications Corp
Current assignee: NTT DOCOMO BUSINESS, Inc.; NTT Communications Corp
Priority date: 2021-07-20
Filing date: 2021-07-20
Publication date: 2025-08-01
Anticipated expiration: 2041-07-20
Also published as: JP2023015877A

Description

The present invention relates to technology that supports online conversations.

Systems for holding conferences over a network have long been proposed. Recently, in particular, arrangements have become widespread in which each participant uses a camera to share an image including their own face with the others while the conference is held (see, for example, Patent Document 1).

Patent Document 1: JP 2015-154315 A

However, various factors, such as the circumstances of the conference and the environment in which each participant uses a camera, can make it difficult to check the status of other participants.
In view of the above circumstances, an object of the present invention is to provide a technique that makes it easier to check the status of other participants in a conference held over a network.

One aspect of the present invention is a conference control device that includes: a display information generation unit that performs image processing on an image of a user participating in a conference held over a network so that the image approaches a predetermined standard, generates display information including the image-processed image of the user, and transmits the display information to a user terminal used by another user; and an audio control unit that controls audio in the conference.

One aspect of the present invention is the above-mentioned conference control device, wherein the predetermined standard is a standard related to the size of the image of the user's face.

One aspect of the present invention is the above-mentioned conference control device, wherein the predetermined standard is a standard related to the white balance or contrast of the image of the user's face.

One aspect of the present invention is the above-mentioned conference control device, further comprising an estimation unit that estimates the emotional state of the user based on the image of the user, wherein the display information generation unit performs image processing on the image in accordance with the emotional state estimated by the estimation unit.

One aspect of the present invention is a conference control method that includes: a display information generation step of performing image processing on an image of a user participating in a conference held over a network so that the image approaches a predetermined standard, generating display information including the image-processed image of the user, and transmitting the display information to a user terminal used by another user; and an audio control step of controlling audio in the conference.

One aspect of the present invention is a computer program for causing a computer to function as the above-mentioned conference control device.

The present invention makes it easier to check the status of other participants in a conference held over a network.

FIG. 1 is a schematic block diagram showing the system configuration of the conference system 100 according to the present invention.
FIG. 2 is a schematic block diagram showing a specific example of the functional configuration of the user terminal 10.
FIG. 3 is a schematic block diagram showing a specific example of the functional configuration of the conference control device 20.
FIG. 4 is a diagram showing a specific example of an image displayed on a user terminal using conventional technology.
FIG. 5 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 6 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 7 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 8 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 9 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10.
FIG. 10 is a sequence chart showing a specific example of the processing flow of the conference system 100.

Specific configuration examples of the present invention are described below with reference to the drawings. In the following description, the concept of a virtual connection through which two or more users speak to others is called a conference room. The conference room in the following description therefore does not have to carry that name: any virtual place in which two or more users speak to others corresponds to a conference room here, even if it is simply called a conversation or a session. For example, a virtual connection used for a seminar or presentation in which a specific user (for example, a lecturer) speaks one-way to multiple others (including large groups such as 50 or 100 people) is also included in the conference room in the following description.

FIG. 1 is a schematic block diagram showing the system configuration of the conference system 100 of the present invention. The conference system 100 is a system with which users operating user terminals 10 hold a conference with one another via a network 30. The conference system 100 includes a plurality of user terminals 10 and a conference control device 20. The user terminals 10 and the conference control device 20 are communicably connected via the network 30. The network 30 may be a network using wireless communication or a network using wired communication. The network 30 may also be configured by combining multiple networks.

FIG. 2 is a schematic block diagram showing a specific example of the functional configuration of the user terminal 10. The user terminal 10 is configured using an information device such as a smartphone, tablet, personal computer, portable game console, stationary game console, or dedicated device. The user terminal 10 includes a communication unit 11, an operation unit 12, a display unit 13, an audio input unit 14, an audio output unit 15, a storage unit 16, and a control unit 17.

The communication unit 11 is a communication device. The communication unit 11 may be configured as, for example, a network interface. Under the control of the control unit 17, the communication unit 11 performs data communication with other devices via the network 30. The communication unit 11 may be a device that performs wireless communication or a device that performs wired communication.

The operation unit 12 is configured using an existing input device such as a keyboard, a pointing device (mouse, tablet, etc.), buttons, or a touch panel. The user operates the operation unit 12 to input instructions to the user terminal 10. The operation unit 12 may be an interface for connecting an input device to the user terminal 10. In this case, the operation unit 12 inputs to the user terminal 10 the input signal that the input device generates in response to the user's input. The operation unit 12 may be configured using a microphone and a speech recognition device. In this case, the operation unit 12 performs speech recognition on the words spoken by the user and inputs the resulting character string information to the user terminal 10. In this case, the operation unit 12 may be configured integrally with the audio input unit 14. The operation unit 12 may be configured in any way as long as it can input the user's instructions to the user terminal 10.

The display unit 13 is an image display device such as a liquid crystal display or an organic EL (Electro Luminescence) display. The display unit 13 displays image data used when conducting a conference. The display unit 13 may also be an interface for connecting an image display device to the user terminal 10. In this case, the display unit 13 generates a video signal for displaying the image data and outputs the video signal to the image display device connected to it.

The audio input unit 14 is configured using a microphone. The audio input unit 14 may be the microphone itself, or an interface for connecting a microphone to the user terminal 10 as an external device. The microphone captures the speech of the user taking part in the conference. The audio input unit 14 outputs the audio data captured by the microphone to the control unit 17.

The audio output unit 15 is configured using an audio output device such as a speaker, headphones, or earphones. The audio output unit 15 may be the audio output device itself, or an interface for connecting an audio output device to the user terminal 10 as an external device. The audio output device desirably outputs audio in such a way that the user taking part in the conference can hear it. The audio output unit 15 outputs audio corresponding to the audio signal output by the control unit 17.

The storage unit 16 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. The storage unit 16 stores data used by the control unit 17. The storage unit 16 may function, for example, as a user information storage unit 161 and an utterance information storage unit 162.

The user information storage unit 161 stores information about the user operating the user terminal 10 (hereinafter referred to as "user information"). The user information may include, for example, the user's identification information and attribute information. The attribute information may include, for example, information about the user's age, gender, and the like.

The control unit 17 is configured using a processor such as a CPU (Central Processing Unit) and memory. The control unit 17 functions as a display control unit 171, a conference control unit 172, and an audio control unit 173 when the processor executes a program. All or part of the functions of the control unit 17 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (e.g., SSDs: Solid State Drives), as well as storage devices such as hard disks and semiconductor storage devices built into computer systems. The program may also be transmitted via a telecommunications line.

The display control unit 171 receives display information from the conference control device 20 via the communication unit 11. The display control unit 171 generates an image signal based on the received display information and causes the display unit 13 to display it. The display information may be, for example, image data representing the displayed image itself. In this case, the entity that generates the image data (the image data generation unit) is the conference control device 20. Alternatively, the display information may be data representing the information needed to generate the displayed image (for example, information about the participating users). In this case, the display control unit 171 generates the image data to be displayed on the display unit 13 based on the display information, and the entity that generates the image data (the image data generation unit) is the display control unit 171.

The conference control unit 172 performs control related to the conferences set up virtually in the conference control device 20. For example, when the user operates the operation unit 12 to instruct a login to the conference service provided by the conference control device 20, the conference control unit 172 performs the login processing. When the user operates the operation unit 12 to instruct that a new conference room be set up, the conference control unit 172 performs the processing for setting up the new conference room. When the user operates the operation unit 12 to instruct entry into a conference room, the conference control unit 172 performs the processing for entering the specified conference room. Entry into a conference room may be performed in any manner. For example, on a screen displaying one or more characters, buttons, or icons that represent conference rooms, the user may enter a conference room by operating the corresponding character, button, or icon. The user may also enter a conference room by accessing the address assigned to that conference room (for example, a specific identification number or a URL (Uniform Resource Locator)).

The audio control unit 173 performs control related to the audio exchanged with users of other user terminals 10. When a user enters a conference room, audio is sent to and received from the other users in that conference room. The audio control unit 173 transmits, for example, audio data input from the audio input unit 14 to the conference control device 20 via the communication unit 11. When the audio control unit 173 receives audio data from the conference control device 20, it outputs the received audio data from the audio output unit 15.

FIG. 3 is a schematic block diagram showing a specific example of the functional configuration of the conference control device 20. The conference control device 20 is configured using an information processing device such as a personal computer or a server device. The conference control device 20 includes a communication unit 21, a storage unit 22, and a control unit 23.

The communication unit 21 is a communication device. The communication unit 21 may be configured as, for example, a network interface. Under the control of the control unit 23, the communication unit 21 performs data communication with other devices via the network 30. The communication unit 21 may be a device that performs wireless communication or a device that performs wired communication.

The storage unit 22 is configured using a storage device such as a magnetic hard disk drive or a semiconductor storage device. The storage unit 22 stores data used by the control unit 23. The storage unit 22 may function as, for example, a user information storage unit 221, a conference room information storage unit 222, and an emotional state information storage unit 223. The user information storage unit 221 stores information (user information) about the multiple users who operate the user terminals 10.

The conference room information storage unit 222 stores information about conference rooms (hereinafter referred to as "conference room information"). A conference room is a virtual room that users set up in the conference system 100 in order to hold a conference. The conference room information may include, for example, the ID of the conference room, information indicating the name set for the conference room, information indicating the reserved date and time at which the conference room is to be set up, and information about the attributes of the conference room. The information about the attributes of the conference room may include, for example, the number of people who can enter the conference room and information indicating which users can enter it.

The emotional state information storage unit 223 stores information indicating the emotional state of each user (each participant) as estimated by the estimation unit 235 of the control unit 23 (hereinafter referred to as "emotional state information"). For example, the emotional state information storage unit 223 may store the emotional state information per conference. In this case, the emotional state information may represent, for each participant in that conference, the estimated emotional state at predetermined intervals (e.g., every second, every five seconds, or every minute).
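
The per-conference, per-participant record described above can be pictured with a small sketch. This is only an illustration under assumptions: the function names `new_log`, `record`, and `timeline`, the string emotion labels, and the use of elapsed seconds as the time key are all hypothetical and do not come from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: conference_id -> user_id -> [(elapsed_seconds, emotion_label)]
EmotionLog = Dict[str, Dict[str, List[Tuple[float, str]]]]


def new_log() -> EmotionLog:
    """Create an empty log that grows on first write."""
    return defaultdict(lambda: defaultdict(list))


def record(log: EmotionLog, conference_id: str, user_id: str,
           elapsed_seconds: float, emotion: str) -> None:
    """Append one estimation result for one participant."""
    log[conference_id][user_id].append((elapsed_seconds, emotion))


def timeline(log: EmotionLog, conference_id: str, user_id: str) -> List[Tuple[float, str]]:
    """Read back a participant's samples without creating empty entries."""
    return list(log.get(conference_id, {}).get(user_id, []))
```

Keeping the samples in arrival order means the list is already chronological, which is what the later evaluation steps read.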

The control unit 23 is configured using a processor such as a CPU and memory. When the processor executes a program, the control unit 23 functions as a user control unit 231, a conference room control unit 232, a display information generation unit 233, an audio control unit 234, an estimation unit 235, and an evaluation unit 236. All or part of the functions of the control unit 23 may be realized using hardware such as an ASIC, a PLD, or an FPGA. The program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (e.g., SSDs), as well as storage devices such as hard disks and semiconductor storage devices built into computer systems. The program may also be transmitted via a telecommunications line.

The user control unit 231 performs control processing related to users. For example, the user control unit 231 may perform login processing (e.g., authentication processing) for a user terminal 10 that accesses the conference control device 20. The user control unit 231 may also register user information received from the user terminal 10 in the user information storage unit 221.

The conference room control unit 232 performs control processing related to conference rooms. For example, when the conference room control unit 232 receives an instruction from a user terminal 10 to set up a new conference room, it may generate conference room information based on the received information and register it in the conference room information storage unit 222. When the time to set up a conference room arrives, the conference room control unit 232 virtually sets up that conference room. The time to set up a conference room is, for example, the moment of the instruction when the user terminal 10 instructs that a conference room be set up immediately, or the reserved date and time when a reservation for setting up the conference room was registered in advance. When a user performs a predetermined operation to join a conference room and predetermined conditions are met, the conference room control unit 232 performs the processing for letting the user join that conference room. For example, the conference room control unit 232 updates the conference room information storage unit 222 to register that a new user has joined the conference room.
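
The registration, set-up timing, and join checks described above might be sketched as follows. The class and field names (`Room`, `RoomController`, `capacity`, `allowed_users`) are hypothetical, and treating capacity and an allow-list as the "predetermined conditions" is an assumption based on the room attributes mentioned earlier, not something the source specifies.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional, Set


@dataclass
class Room:
    room_id: str
    name: str
    reserved_at: Optional[datetime] = None    # None: set up immediately
    capacity: Optional[int] = None            # None: no limit on participants
    allowed_users: Optional[Set[str]] = None  # None: open to any user
    is_open: bool = False
    participants: Set[str] = field(default_factory=set)


class RoomController:
    def __init__(self) -> None:
        self.rooms: Dict[str, Room] = {}

    def register(self, room: Room, now: datetime) -> None:
        """Store the room information and open the room if it is already due."""
        self.rooms[room.room_id] = room
        self.open_due_rooms(now)

    def open_due_rooms(self, now: datetime) -> None:
        """Open every room whose set-up time has arrived."""
        for room in self.rooms.values():
            if room.reserved_at is None or room.reserved_at <= now:
                room.is_open = True

    def join(self, room_id: str, user_id: str) -> bool:
        """Add the user to the room if the assumed conditions are met."""
        room = self.rooms.get(room_id)
        if room is None or not room.is_open:
            return False
        if room.allowed_users is not None and user_id not in room.allowed_users:
            return False
        if user_id not in room.participants:
            if room.capacity is not None and len(room.participants) >= room.capacity:
                return False
            room.participants.add(user_id)
        return True
```

In this sketch a periodic call to `open_due_rooms` stands in for "the reserved date and time arrives"; a real server would more likely use a scheduler.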

The display information generation unit 233 generates the information (display information) needed to generate the images displayed on the user terminals 10. The display information may include, for example, information about the conference rooms currently set up and information about the users of the user terminals 10 who are in each conference room. The display information may also include information indicating the result of the estimation unit 235's estimation of the emotional state of the user of each user terminal 10.

The display information may include data of each user's facial image. The display information generation unit 233 determines, for each user's facial image, whether the image satisfies a predetermined appropriate standard for image quality. The predetermined standard for image quality may be, for example, a standard regarding the size of the facial portion (e.g., a standard indicating that the size of the facial portion is within an appropriate predetermined range). It may be a standard regarding the appropriate white balance (brightness) of the facial portion (e.g., a standard indicating the brightness or edge strength of the facial portion). It may also be a standard regarding the appropriate contrast of the facial portion (e.g., a standard indicating the difference between the light and dark areas of the facial portion). For an image that does not satisfy the predetermined standard for image quality, the display information generation unit 233 executes predetermined image processing so that its image quality approaches the appropriate standard. The display information generation unit 233 then generates display information using the processed image and transmits the generated display information to the user terminals 10.
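
As a rough illustration of "processing an image so that it approaches a standard", the sketch below handles face size and tone on a tiny grayscale face region. Everything specific in it is an assumption: the target values `TARGET_MEAN`, `TARGET_STD`, and `SIZE_RANGE`, the use of a linear gain and offset, and the representation of an image as nested lists. The source only states that the standards and the image processing are predetermined.

```python
from statistics import mean, pstdev
from typing import List, Tuple

Gray = List[List[float]]  # grayscale face region, pixel values in 0..255

# Hypothetical targets; the source does not give concrete values.
TARGET_MEAN = 128.0      # desired average brightness of the face
TARGET_STD = 50.0        # desired spread between light and dark areas
SIZE_RANGE = (0.3, 0.6)  # acceptable face height / tile height


def scale_for_face(face_h: float, tile_h: float,
                   size_range: Tuple[float, float] = SIZE_RANGE) -> float:
    """Return the zoom factor that brings the face height into the range."""
    lo, hi = size_range
    ratio = face_h / tile_h
    if ratio < lo:
        return lo / ratio   # face too small: zoom in
    if ratio > hi:
        return hi / ratio   # face too large: zoom out
    return 1.0              # already acceptable: leave unchanged


def normalize_tone(face: Gray, target_mean: float = TARGET_MEAN,
                   target_std: float = TARGET_STD) -> Gray:
    """Shift and stretch pixel values toward the target brightness and contrast."""
    pixels = [p for row in face for p in row]
    m, s = mean(pixels), pstdev(pixels)
    gain = target_std / s if s > 0 else 1.0
    return [[min(255.0, max(0.0, (p - m) * gain + target_mean)) for p in row]
            for row in face]
```

A face that fills 90% of its tile would get a factor of about 0.67 from `scale_for_face`, and a washed-out region with values clustered near 240 would be pulled back toward mid-gray by `normalize_tone`. Detail that the camera has already clipped cannot be recovered this way.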

The audio control unit 234 receives audio data from the user terminals 10. The audio control unit 234 generates the audio data to be output at each user terminal 10 (hereinafter referred to as "conference audio data") and transmits the conference audio data to each user terminal 10. For example, the audio control unit 234 may transmit to each user terminal 10 the conference audio data of the conference room that its user has entered.
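
One possible way to produce conference audio data for a single room is sketched below. The source does not say how the audio is combined, so the approach here, in which each participant receives the sum of every other participant's samples, is an assumption, as are the function name `mix_for_room` and the 16-bit PCM sample format.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_for_room(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Mix one time slice of audio for the users in one conference room.

    frames maps user_id to that user's PCM samples; all lists must have the
    same length. Each user gets the sum of the other users' samples, clamped
    to the 16-bit range, so nobody hears their own voice echoed back.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    return {
        user: [max(INT16_MIN, min(INT16_MAX, t - s)) for t, s in zip(total, samples)]
        for user, samples in frames.items()
    }
```

Calling this once per room and per time slice corresponds to sending each terminal only the audio of the room its user has entered.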

The estimation unit 235 estimates the emotional state of each user participating in each conference room. The estimation unit 235 estimates the emotional state, for example, based on each user's facial image. The estimated emotional state may be, for example, one of joy, anger, sorrow, and pleasure, or one of joy, anger, surprise, sadness, and neutral. The estimated emotional state may also be, for example, the degree of interest in the content of the audio being played in the conference room. The emotional state may be estimated by image recognition based on predetermined features that represent the facial expression in the facial image. Such image recognition may be executed using, for example, a trained model obtained in advance by machine learning with training images or the like. The estimation unit 235 records the estimation results in chronological order in the emotional state information storage unit 223.

The emotional state may be estimated based on, for example, the size of the facial image. In that case, a larger facial image suggests that the user is viewing the screen from closer up, so the user may be estimated to have greater interest. Conversely, a smaller facial image suggests that the user is viewing the screen from farther away, so the user may be estimated to have less interest.
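
The size-based rule above could be expressed as a simple monotonic mapping. This is a minimal sketch: the linear mapping, the constant `FULL_INTEREST_RATIO`, and the 0 to 1 score range are assumptions chosen for illustration, since the source only says that a larger face may indicate greater interest.

```python
# Hypothetical ratio of face area to frame area at which interest saturates.
FULL_INTEREST_RATIO = 0.5


def interest_from_face_size(face_area: float, frame_area: float) -> float:
    """Map the face-to-frame area ratio to an interest score in [0, 1]."""
    if frame_area <= 0:
        raise ValueError("frame_area must be positive")
    ratio = face_area / frame_area
    return max(0.0, min(1.0, ratio / FULL_INTEREST_RATIO))
```

Any non-decreasing function of the ratio would match the description equally well; the linear form is only the simplest choice.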

The emotional state may be estimated based on each user's movements. For example, a trained model obtained in advance by machine learning with training images or the like may be used to detect each user's nodding, and the emotional state may be estimated based on the number of nods per unit time, the magnitude of the nods, and so on. For example, the more nods per unit time, the greater the interest the user may be estimated to have. Likewise, the larger the nods, the greater the interest the user may be estimated to have. The estimation may also be based on both the number of nods per unit time and their magnitude.
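
The sketch below shows one way nod frequency and magnitude could be turned into a score. Note the substitution: the source detects nods with a trained model, whereas this example uses a plain threshold on a hypothetical series of head pitch angles (degrees, positive meaning the head is tilted down). The threshold value and the way rate and magnitude are multiplied together are likewise assumptions.

```python
from typing import List


def detect_nods(pitch_deg: List[float], threshold: float = 10.0) -> List[float]:
    """Return the peak angle of each nod in the series.

    A nod is counted when the pitch reaches `threshold` or more and then
    drops back below it. A nod still in progress at the end is ignored.
    """
    nods: List[float] = []
    peak = None
    for angle in pitch_deg:
        if angle >= threshold:
            peak = angle if peak is None else max(peak, angle)
        elif peak is not None:
            nods.append(peak)
            peak = None
    return nods


def interest_score(pitch_deg: List[float], duration_s: float,
                   threshold: float = 10.0) -> float:
    """Combine nods per second with the average nod magnitude."""
    nods = detect_nods(pitch_deg, threshold)
    if not nods or duration_s <= 0:
        return 0.0
    rate = len(nods) / duration_s
    average_magnitude = sum(nods) / len(nods)
    return rate * average_magnitude
```

Both more frequent nods and larger nods raise the score, which matches the two tendencies described above.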

Several specific examples of the emotional state estimation processing have been described, but the emotional state may also be estimated by processing other than that described above.

The evaluation unit 236 evaluates the content of the conference. For example, the evaluation unit 236 may evaluate the content of the conference based on the emotional state information stored in the emotional state information storage unit 223. For example, the evaluation unit 236 may rate the conference as better the higher the statistical value of the values indicating each user's degree of interest. The evaluation unit 236 may also rate the conference as better the more people showed joy or surprise and the fewer people showed anger or sadness.
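
A conference score along these lines might be computed as in the following sketch. The choice of the mean as the statistic, the emotion label strings, and the way the two criteria are added into one number are all assumptions made for illustration; the source only states the direction of each criterion.

```python
from statistics import mean
from typing import Dict

POSITIVE = {"joy", "surprise"}   # emotions that raise the rating
NEGATIVE = {"anger", "sadness"}  # emotions that lower the rating


def evaluate_conference(interest: Dict[str, float],
                        emotions: Dict[str, str]) -> float:
    """Score a conference from per-user interest and per-user emotion.

    interest maps user_id to a degree of interest; emotions maps user_id to
    a representative emotion label. Higher return values mean a better rating.
    """
    mean_interest = mean(interest.values()) if interest else 0.0
    if emotions:
        positive = sum(1 for e in emotions.values() if e in POSITIVE)
        negative = sum(1 for e in emotions.values() if e in NEGATIVE)
        balance = (positive - negative) / len(emotions)
    else:
        balance = 0.0
    return mean_interest + balance
```

With interest values of 0.8 and 0.4 and one joyful and one angry participant, the two emotions cancel and the score is simply the mean interest, about 0.6.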

評価部２３６は、感情状態情報記憶部２２３に記憶されている感情状態情報に基づいて、個々の参加者の出席態度について評価しても良い。例えば、感情情報が取得されている時間が相対的に長いほど、良い出席態度であったことを示す評価が行われても良い。感情情報が取得されているということは、モニターの前にモニターに向かって顔が位置していたことを示しており、きちんと会議に出席していたと推定できるためである。The evaluation unit 236 may evaluate the attendance attitude of each participant based on the emotional state information stored in the emotional state information storage unit 223. For example, an evaluation may be made indicating that the attendance attitude was better the longer the time that emotional information was acquired. This is because the acquisition of emotional information indicates that the participant's face was positioned in front of the monitor, facing the monitor, and it can be assumed that the participant was properly attending the meeting.
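The attendance evaluation can be illustrated as the fraction of sampling instants at which emotional information was obtained. The sketch assumes evenly spaced samples, each flagged `True` when a face was detected and an emotional state acquired; the function name and input format are hypothetical.

```python
from typing import Sequence


def attendance_ratio(emotion_acquired: Sequence[bool]) -> float:
    """Fraction of samples for which emotional information was acquired.

    A higher ratio means the face was in front of the monitor for a larger
    share of the meeting, which the text treats as better attendance.
    """
    if not emotion_acquired:
        return 0.0
    return sum(1 for acquired in emotion_acquired if acquired) / len(emotion_acquired)
```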

図４は、従来の技術でユーザー端末に表示される画像の具体例を示す図である。各ユーザー領域９１には、その会議室に入室しているユーザーの画像が表示される。ユーザー領域９１に表示される各ユーザーの画像は、各ユーザーのユーザー端末に接続されたカメラで撮影されている動画像である。左上のユーザーの顔画像は、適度な大きさ、適度なホワイトバランスで表示されている。Figure 4 shows a specific example of an image displayed on a user terminal using conventional technology. Each user area 91 displays an image of the user currently in the conference room. The image of each user displayed in the user area 91 is a moving image captured by a camera connected to the user terminal of each user. The user's facial image in the upper left is displayed at an appropriate size and with appropriate white balance.

右上のユーザーの顔画像は、大きすぎる。そのため、ユーザー領域９１から顔画像の一部がはみ出ている。また、右上のユーザーの顔画像は、撮影された環境が明るすぎることやカメラのパラメータの設定が不適切であることなどが起因して、ホワイトバランスが不適切な状態で撮影されている。そのため、顔画像が白くなりすぎている（いわゆる『白飛び』の状態である）。その結果、右上のユーザーの顔画像は見づらい状態になっている。The user's facial image in the upper right is too large. As a result, part of the facial image extends beyond the user area 91. Furthermore, the user's facial image in the upper right was captured with an inappropriate white balance due to factors such as the environment being too bright or the camera parameters being inappropriately set. As a result, the facial image appears too white (a condition known as "blown-out highlights"). As a result, the user's facial image in the upper right is difficult to see.

左下のユーザーの顔画像は、大きさが少し小さめである。また、左下のユーザーの顔画像は、撮影された環境が暗すぎることやカメラのパラメータの設定が不適切であることなどが起因して、ホワイトバランスが不適切な状態で撮影されている。そのため、顔画像が黒くなりすぎている（いわゆる『黒つぶれ』の状態である）。その結果、左下のユーザーの顔画像は見づらい状態になっている。The user's face image in the bottom left is a little small. Additionally, the image was taken with an inappropriate white balance due to factors such as the environment being too dark or the camera parameters being inappropriately set. This makes the face image appear too dark (a condition known as "crushed blacks"). As a result, the user's face image in the bottom left is difficult to see.

右下のユーザーの顔画像は、大きさがかなり小さめである。その結果、右下のユーザーの顔画像は見づらい状態になっている。The user's face image in the bottom right is quite small. As a result, it is difficult to see.

図５は、ユーザー端末１０の表示部１３に表示される画像の具体例を示す図である。図５は、図４と同じ状況において、本発明のユーザー端末１０の表示部１３に表示される画像の具体例を示す。図５において、表示部１３には、会議室内画面が表示されている。会議室内画面とは、ユーザーが会議室に入室している最中に表示される画像である。会議室内画面では、その会議室に入室している一部又は全部のユーザーの画像が表示される。表示される画面は１又は複数のユーザー領域５１で形成される。各ユーザー領域５１には、入室しているユーザーの画像が表示される。ユーザー領域５１に表示される各ユーザーの画像は、各ユーザーのユーザー端末１０に接続されたカメラで撮影されている動画像であってもよいし、静止画像（例えばアイコン画像）であってもよい。各ユーザー領域５１に表示される画像は、表示情報生成部２３３によって生成される。Figure 5 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. Figure 5 shows a specific example of an image displayed on the display unit 13 of the user terminal 10 of the present invention in the same situation as Figure 4. In Figure 5, the display unit 13 displays an in-conference room screen. The in-conference room screen is an image displayed while a user is in the conference room. The in-conference room screen displays images of some or all of the users who are in the conference room. The displayed screen is made up of one or more user areas 51. Each user area 51 displays an image of the user who has entered the room. The image of each user displayed in the user area 51 may be a moving image captured by a camera connected to the user terminal 10 of each user, or may be a still image (e.g., an icon image). The image displayed in each user area 51 is generated by the display information generation unit 233.

左上のユーザーの顔画像は、適度な大きさ、適度なホワイトバランス及びコントラストで撮影された画像である。すなわち、左上のユーザーの顔画像の画像データにおいて、ユーザーの顔画像の大きさは、所定の範囲内の大きさである。また、左上のユーザーの顔画像の画像データにおいて、ユーザーの顔画像の画質は、ホワイトバランス及びコントラストに関する所定の条件を満たしている。そのため、左上のユーザーの顔画像については画像処理が実行されてなくてもよい。The user's face image in the upper left is an image captured at an appropriate size, with appropriate white balance and contrast. That is, in the image data of the user's face image in the upper left, the size of the user's face image is within a specified range. Furthermore, in the image data of the user's face image in the upper left, the image quality of the user's face image satisfies specified conditions regarding white balance and contrast. Therefore, no image processing needs to be performed on the user's face image in the upper left.

右上のユーザーの顔画像は、撮影されて会議制御装置２０に受信された時点では、その顔の大きさが大きすぎていた。そのため、表示情報生成部２３３は、右上のユーザーの顔画像を、顔の大きさが所定の大きさになるように縮小している。また、右上のユーザーの顔画像は白とびしていた。そのため、表示情報生成部２３３は、右上のユーザーの顔画像に対して輝度を下げるなどの画像処理を実行することによって、白とびを軽減させている。このような画像処理が行われた後の顔画像のデータが右上のユーザーの顔画像のデータとして各ユーザー端末１０に送信されている。When the facial image of the user in the upper right was captured and received by the conference control device 20, the face was too large. Therefore, the display information generation unit 233 has reduced the facial image of the user in the upper right so that the face is a predetermined size. The facial image of the user in the upper right was also blown out. Therefore, the display information generation unit 233 has reduced the blown-out highlights by performing image processing on the facial image of the user in the upper right, such as lowering the brightness. The facial image data after such image processing is transmitted to each user terminal 10 as the facial image data of the user in the upper right.

左下のユーザーの顔画像は、撮影されて会議制御装置２０に受信された時点では、大きさが少し小さめであった。そのため、表示情報生成部２３３は、左下のユーザーの顔画像を、顔の大きさが所定の大きさになるように拡大している。また、左下のユーザーの顔画像は黒つぶれしていた。そのため、表示情報生成部２３３は、左下のユーザーの顔画像に対して輝度を上げるなどの画像処理を実行することによって、黒つぶれを軽減させている。このような画像処理が行われた後の顔画像のデータが左下のユーザーの顔画像のデータとして各ユーザー端末１０に送信されている。The facial image of the user in the lower left was slightly small when it was captured and received by the conference control device 20. Therefore, the display information generation unit 233 has enlarged the facial image of the user in the lower left so that the face is a predetermined size. The facial image of the user in the lower left also had crushed blacks. Therefore, the display information generation unit 233 has reduced the crushed blacks by performing image processing on the facial image of the user in the lower left, such as increasing the brightness. The facial image data after such image processing is transmitted to each user terminal 10 as the facial image data of the user in the lower left.

右下のユーザーの顔画像は、撮影されて会議制御装置２０に受信された時点では、大きさがかなり小さめであった。そのため、表示情報生成部２３３は、右下のユーザーの顔画像を、顔の大きさが所定の大きさになるように拡大している。このような画像処理が行われた後の顔画像のデータが右下のユーザーの顔画像のデータとして各ユーザー端末１０に送信されている。The facial image of the user at the bottom right was quite small when it was captured and received by the conference control device 20. Therefore, the display information generation unit 233 enlarges the facial image of the user at the bottom right so that the face size becomes a specified size. The facial image data after this image processing is transmitted to each user terminal 10 as the facial image data of the user at the bottom right.

このように、表示情報生成部２３３は、各ユーザー領域５１に表示される顔の大きさが所定の大きさ（略同一の大きさ）になるように各ユーザーの画像に画像処理を行う。また、表示情報生成部２３３は、各ユーザー領域５１に表示される顔の明るさが所定の明るさ（例えば顔領域内の輝度の平均値が略同一）になるように各ユーザーの画像に画像処理を行う。In this way, the display information generation unit 233 performs image processing on the image of each user so that the size of the face displayed in each user area 51 is a predetermined size (approximately the same size). The display information generation unit 233 also performs image processing on the image of each user so that the brightness of the face displayed in each user area 51 is a predetermined brightness (for example, the average brightness within the face area is approximately the same).
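The size and brightness normalization summarized above can be sketched as computing a scale factor and a brightness gain per user. The target values, the function names, and the simple multiplicative gain are assumptions for illustration; the source only states that the face size and the average luminance in the face area are brought to approximately the same values.

```python
from typing import List, Sequence, Tuple


def normalization_params(
    face_height_px: float,
    mean_face_luma: float,
    target_face_height_px: float = 200.0,  # assumed target face size
    target_luma: float = 128.0,            # assumed target mean luminance (0-255)
) -> Tuple[float, float]:
    """Return (scale, gain) that bring a face to the target size and brightness."""
    if face_height_px <= 0:
        raise ValueError("face_height_px must be positive")
    scale = target_face_height_px / face_height_px
    # Leave brightness untouched if the face area is completely black.
    gain = target_luma / mean_face_luma if mean_face_luma > 0 else 1.0
    return scale, gain


def apply_gain(luma_values: Sequence[float], gain: float) -> List[int]:
    """Apply the brightness gain to 8-bit luminance values, clipping at 255."""
    return [min(255, round(value * gain)) for value in luma_values]
```

For an oversized, dark face (height 400 px, mean luminance 64), this yields a scale of 0.5 and a gain of 2.0, that is, shrink and brighten.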

図６は、ユーザー端末１０の表示部１３に表示される画像の具体例を示す図である。図６に示す画像では、図５に示された各ユーザー領域５１の顔画像に対して、推定部２３５が推定した結果に基づいた画像処理が行われている。具体的には、推定部２３５によって推定された各ユーザーの感情状態に応じた画像処理が行われている。例えば、喜びと推定されたユーザーの顔画像（左上の顔画像）には、顔画像の周囲を囲む破線の円が重畳されている。例えば、怒りと推定されたユーザーの顔画像（右上の顔画像）には、顔画像の周囲を囲む一点鎖線の円が重畳されている。例えば、悲しいと推定されたユーザーの顔画像（左下の顔画像）には、顔画像の周囲を囲む点線の円が重畳されている。例えば、驚きと推定されたユーザーの顔画像（右下の顔画像）には、顔画像の周囲を囲む二点鎖線の円が重畳されている。なお、上述した各円の表示態様は、一具体例にすぎない。このように各円の線種が異なってもよいし、線の太さや色が異なってもよい。また、各顔画像を囲む幾何学図形は、円形に限られる必要は無い。例えば、矩形や楕円形や多角形であってもよいし、それぞれ異なる形であってもよい。また、各顔画像に対して重畳される画像は、上述したような円形等の幾何学図形である必要は無く、それぞれの感情状態を示すピクトグラムや文字が重畳されてもよい。Figure 6 shows a specific example of an image displayed on the display unit 13 of the user terminal 10. In the image shown in Figure 6, image processing based on the results of estimation by the estimation unit 235 has been performed on the facial images in the user areas 51 shown in Figure 5. Specifically, image processing is performed according to the emotional state of each user estimated by the estimation unit 235. For example, a dashed circle is superimposed around the facial image of a user estimated to be happy (the facial image in the upper left). A dash-dot (one-dot chain) circle is superimposed around the facial image of a user estimated to be angry (the facial image in the upper right). A dotted circle is superimposed around the facial image of a user estimated to be sad (the facial image in the lower left). A dash-dot-dot (two-dot chain) circle is superimposed around the facial image of a user estimated to be surprised (the facial image in the lower right). Note that the display mode of each circle described above is merely one specific example. The circles may differ in line type as in this example, or in line thickness or color. Furthermore, the geometric shape surrounding each facial image need not be a circle. For example, it may be a rectangle, an ellipse, or a polygon, and the shapes may differ from one another. Furthermore, the image superimposed on each facial image need not be a geometric shape such as a circle as described above; a pictogram or text indicating each emotional state may be superimposed instead.
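The correspondence between emotional states and overlay line types in Figure 6 can be written as a small lookup table. The dictionary and function names are hypothetical, and the English labels for the line types are one possible rendering of 破線, 一点鎖線, 点線, and 二点鎖線.

```python
from typing import Optional

# Line type of the circle superimposed around the face, per estimated emotion.
# The pairing follows the Figure 6 example; it is only one possible mapping.
OVERLAY_LINE_STYLE = {
    "joy": "dashed",
    "anger": "dash-dot",
    "sadness": "dotted",
    "surprise": "dash-dot-dot",
}


def overlay_for(emotion: str) -> Optional[str]:
    """Return the overlay line type for an emotion, or None if no overlay is defined."""
    return OVERLAY_LINE_STYLE.get(emotion)
```

Swapping the values for colors, line widths, or pictogram identifiers would cover the other display modes the text allows.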

また、感情状態に応じた画像処理では、背景の色を変化させるような画像処理が行われてもよいし、ユーザー領域５１の全体の色や模様を変更するような画像処理が行われても良い。このとき、ユーザーの顔画像が視認可能な状態で表示されないほどに画像処理が行われても良い。例えば、各感情状態に応じた色又は模様でユーザー領域５１が塗りつぶされていても良い。図７は、ユーザー端末１０の表示部１３に表示される画像の具体例を示す図である。図７では、４８名分のユーザーのユーザー領域５１が表示されている。各ユーザー領域５１は、ユーザーの画像に基づいて喜び・怒り・驚き・悲しい・平常のいずれかの感情に分類され、分類された感情状態に応じた模様（パターン）で塗りつぶされている。In addition, image processing according to the emotional state may involve changing the background color, or changing the overall color or pattern of the user area 51. In this case, image processing may be performed to the extent that the user's facial image is not displayed visibly. For example, the user area 51 may be filled with a color or pattern corresponding to each emotional state. Figure 7 is a diagram showing a specific example of an image displayed on the display unit 13 of the user terminal 10. Figure 7 shows the user areas 51 of 48 users. Each user area 51 is classified into one of the following emotions based on the user's image: joy, anger, surprise, sadness, or neutral, and is filled with a pattern corresponding to the classified emotional state.

通常の表示装置を用いている場合には、そもそも４８名もの大勢の顔画像が表示されたところで、各ユーザー領域５１の大きさは小さくなってしまうため、各ユーザーの表情を読み取ることは困難である。そのような状況において、このように感情状態に応じた色や模様で各ユーザー領域５１が示されると、ユーザーはより容易に各ユーザーの感情状態を認識することが可能となる。また、顔画像が通信される場合に比べて、色や模様で塗りつぶされている画像が通信される方が、通信に要するデータ量を抑えることが可能となる。そのため、ネットワークを介した会議をより安定して実現することが可能となる。また、データ量を抑えることにより、安価なスマートフォン等のように描画能力が高くない装置がユーザー端末１０として使用された場合であっても、処理落ちなどの問題の発生を低減することが可能となる。When using a standard display device, even if the facial images of as many as 48 people are displayed, the size of each user area 51 becomes small, making it difficult to read each user's facial expression. In such a situation, if each user area 51 is displayed in a color or pattern that corresponds to the emotional state, the user can more easily recognize the emotional state of each user. Furthermore, compared to when facial images are transmitted, transmitting images filled with color or patterns reduces the amount of data required for communication. This makes it possible to hold more stable conferences over the network. Furthermore, by reducing the amount of data, it is possible to reduce the occurrence of problems such as processing slowdowns, even when devices with low rendering capabilities, such as inexpensive smartphones, are used as user terminals 10.

図８は、ユーザー端末１０の表示部１３に表示される画像の具体例を示す図である。図８では、同じ会議室に参加しているユーザーの感情状態の数が棒グラフで表示されている。表示情報生成部２３３は、図８に示されるような、各感情状態に分類されたユーザーの数を示す情報を表示するように表示情報を生成してもよい。このような表示情報は、図５～図７に示される画像とともに表示されるように生成されてもよい。Figure 8 shows a specific example of an image displayed on the display unit 13 of the user terminal 10. In Figure 8, the number of users in each emotional state among the users participating in the same conference room is displayed as a bar graph. The display information generation unit 233 may generate display information that shows the number of users classified into each emotional state, as shown in Figure 8. Such display information may be generated so as to be displayed together with the images shown in Figures 5 to 7.
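The data behind a bar graph like Figure 8 is a count of users per emotional state. The sketch below assumes the five categories named for Figure 7 and a mapping from user ID to estimated emotion; the names `EMOTIONS` and `count_by_emotion` are hypothetical.

```python
from collections import Counter
from typing import Dict

# Categories assumed from the Figure 7 description.
EMOTIONS = ("joy", "anger", "surprise", "sadness", "neutral")


def count_by_emotion(user_emotions: Dict[str, str]) -> Dict[str, int]:
    """Count users per emotional state, keeping a fixed category order for plotting."""
    counts = Counter(user_emotions.values())
    return {emotion: counts.get(emotion, 0) for emotion in EMOTIONS}
```

Returning zero for absent categories keeps the bars in a stable position from one update to the next.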

図９は、ユーザー端末１０の表示部１３に表示される画像の具体例を示す図である。図９では、同じ会議室に参加しているユーザーの感情状態の数が、時系列の変化を示す折れ線グラフで表示されている。表示情報生成部２３３は、図９に示されるような、各感情状態に分類されたユーザーの数の時系列変化を示す情報を表示するように表示情報を生成してもよい。このような表示情報は、図５～図７に示される画像とともに表示されるように生成されてもよい。図９の表示情報では、時間の変化を示す横軸において、会議の進行状況（例えば、挨拶、第１部、第２部及び質疑応答）を示す情報が示されてもよい。図９の表示情報では、時間の変化を示す横軸において、所定のイベント（例えば、新商品発表）を示す情報が、そのイベントが発生された時刻にあった位置に示されてもよい。Figure 9 shows a specific example of an image displayed on the display unit 13 of the user terminal 10. In Figure 9, the number of users in each emotional state among the users participating in the same conference room is displayed as a line graph showing changes over time. The display information generation unit 233 may generate display information that shows the change over time in the number of users classified into each emotional state, as shown in Figure 9. Such display information may be generated so as to be displayed together with the images shown in Figures 5 to 7. In the display information of Figure 9, information indicating the progress of the conference (e.g., greetings, part 1, part 2, and Q&A) may be shown along the horizontal axis, which represents time. Information indicating a predetermined event (e.g., a new product announcement) may also be shown on the horizontal axis at the position corresponding to the time at which the event occurred.
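The line graph of Figure 9 can be built from periodic snapshots of the estimated emotions. In this sketch each snapshot is a pair of a timestamp in seconds and the list of emotions estimated at that time; the snapshot format, the category list, and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Categories assumed from the Figure 7 description.
EMOTIONS = ("joy", "anger", "surprise", "sadness", "neutral")


def emotion_time_series(
    snapshots: Sequence[Tuple[float, Sequence[str]]],
) -> Dict[str, List[Tuple[float, int]]]:
    """Return, per emotion, a time-ordered list of (timestamp, user count) points."""
    series: Dict[str, List[Tuple[float, int]]] = {emotion: [] for emotion in EMOTIONS}
    # Sort by timestamp so each line is drawn left to right along the time axis.
    for timestamp, emotions in sorted(snapshots, key=lambda snapshot: snapshot[0]):
        counts = Counter(emotions)
        for emotion in EMOTIONS:
            series[emotion].append((timestamp, counts.get(emotion, 0)))
    return series
```

Agenda items or events such as a product announcement could be kept as a separate list of (timestamp, label) pairs and drawn on the same horizontal axis.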

図１０は、会議システム１００の処理の流れの具体例を示すシーケンスチャートである。まず、ユーザー端末１０は、所定のタイミングでユーザー（出席者）の画像を撮像して会議制御装置２０に送信する（ステップＳ１０１）。Figure 10 is a sequence chart showing a specific example of the processing flow of the conference system 100. First, the user terminal 10 captures an image of the user (attendee) at a predetermined timing and transmits it to the conference control device 20 (step S101).

会議制御装置２０は、各ユーザーの画像を受信すると、画像修正処理を実行する（ステップＳ１０２）。画像修正処理において、会議制御装置２０は、顔の大きさを略同一にする画像処理や、顔の明るさを略同一にする画像処理等を実行する。会議制御装置２０は、ユーザーの画像に基づいて感情状態を推定する処理を実行する（ステップＳ１０３）。会議制御装置２０は、推定された感情状態に応じて表示情報を生成する（ステップＳ１０４）。そして、会議制御装置２０は、生成された表示情報をユーザー端末１０に送信する（ステップＳ１０５）。ユーザー端末１０は、受信された表示情報に基づいて表示部１３に画像や文字を表示する（ステップＳ１０６）。When the conference control device 20 receives the images of each user, it performs image correction processing (step S102). In the image correction processing, the conference control device 20 performs image processing to make the size of the faces approximately the same, image processing to make the brightness of the faces approximately the same, etc. The conference control device 20 performs processing to estimate the emotional state based on the user's image (step S103). The conference control device 20 generates display information according to the estimated emotional state (step S104). The conference control device 20 then transmits the generated display information to the user terminal 10 (step S105). The user terminal 10 displays images and text on the display unit 13 based on the received display information (step S106).
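Steps S102 to S104 on the conference control device side can be sketched as a loop over the received frames. The correction and estimation functions are passed in as parameters because the source does not fix their implementation; the `DisplayInfo` type, the function name, and the choice to run estimation on the corrected image are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class DisplayInfo:
    """Hypothetical container for what is sent to user terminals in step S105."""
    user_id: str
    image: Any
    emotion: str


def process_frames(
    frames: Dict[str, Any],
    correct: Callable[[Any], Any],
    estimate: Callable[[Any], str],
) -> List[DisplayInfo]:
    """Run image correction, emotion estimation, and display-info generation."""
    display_infos = []
    for user_id, image in frames.items():
        corrected = correct(image)     # S102: normalize face size and brightness
        emotion = estimate(corrected)  # S103: estimate the emotional state
        # S104: bundle the corrected image and the emotion as display information
        display_infos.append(DisplayInfo(user_id, corrected, emotion))
    return display_infos
```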

このように構成された会議システム１００によれば、ネットワークを介して行われる会議において、他の参加者の状況をより容易に確認することが可能となる。より具体的には以下の通りである。会議システム１００では、各ユーザーの顔画像が適切な基準に近づくように画像処理が行われる。例えば、顔の大きさやホワイトバランスやコントラストが適切な基準に近づくように画像処理が行われる。そのため、各参加者の顔の表情や状態をより容易に確認することが可能となる。また、会議システム１００では、各ユーザーの感情状態が推定され、その推定結果に応じた画像処理が行われる。このような処理によっても、各参加者の状態をより容易に確認することが可能となる。The conference system 100 configured in this manner makes it easier to check the status of other participants in a conference held over a network. More specifically, it is as follows: In the conference system 100, image processing is performed so that the facial image of each user approaches appropriate standards. For example, image processing is performed so that the face size, white balance, and contrast approach appropriate standards. This makes it easier to check the facial expressions and state of each participant. In addition, the conference system 100 estimates the emotional state of each user and performs image processing according to the estimated results. This processing also makes it easier to check the state of each participant.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。The above describes in detail an embodiment of the present invention with reference to the drawings, but the specific configuration is not limited to this embodiment and includes designs that do not deviate from the gist of the present invention.

１００…会議システム，１０…ユーザー端末，２０…会議制御装置，１１…通信部，１２…操作部，１３…表示部，１４…音声入力部，１５…音声出力部，１６…記憶部，１６１…ユーザー情報記憶部，１７…制御部，１７１…表示制御部，１７２…会議制御部，１７３…音声制御部，２１…通信部，２２…記憶部，２２１…ユーザー情報記憶部，２２２…会議室情報記憶部，２２３…感情状態情報記憶部，２３…制御部，２３１…ユーザー制御部，２３２…会議室制御部，２３３…表示情報生成部，２３４…音声制御部，２３５…推定部，２３６…評価部，５１…ユーザー領域
100...Conference system, 10...User terminal, 20...Conference control device, 11...Communication unit, 12...Operation unit, 13...Display unit, 14...Audio input unit, 15...Audio output unit, 16...Storage unit, 161...User information storage unit, 17...Control unit, 171...Display control unit, 172...Conference control unit, 173...Audio control unit, 21...Communication unit, 22...Storage unit, 221...User information storage unit, 222...Conference room information storage unit, 223...Emotional state information storage unit, 23...Control unit, 231...User control unit, 232...Conference room control unit, 233...Display information generation unit, 234...Audio control unit, 235...Estimation unit, 236...Evaluation unit, 51...User area

Claims

ネットワークを介して行われる会議に参加しているユーザーの画像について、所定の基準に近づくように画像処理を行い、画像処理が行われた前記ユーザーの画像を含む表示情報を生成し、前記表示情報を他のユーザーによって使用されるユーザー端末に送信する表示情報生成部と、
前記会議における音声を制御する音声制御部と、
前記ユーザーの画像に基づいて前記ユーザーの感情状態を推定する推定部と、
を備え、
前記表示情報生成部は、前記会議に参加しているユーザーについて、各感情状態のユーザーの数の時系列の変化を示す画像を含む表示情報を生成する、会議制御装置。
A conference control device comprising:
a display information generation unit that performs image processing on images of users participating in a conference held over a network so that the images conform to a predetermined standard, generates display information including the image-processed images of the users, and transmits the display information to user terminals used by other users;
an audio control unit that controls audio in the conference; and
an estimation unit that estimates an emotional state of the user based on the image of the user,
wherein the display information generation unit generates display information including an image showing a time-series change in the number of users in each emotional state among the users participating in the conference.

前記所定の基準は、前記ユーザーの顔の画像の大きさに関する基準である、請求項１に記載の会議制御装置。The conference control device according to claim 1, wherein the predetermined standard is a standard relating to the size of the image of the user's face.

前記所定の基準は、前記ユーザーの顔の画像のホワイトバランス又はコントラストに関する基準である、請求項１に記載の会議制御装置。The conference control device according to claim 1, wherein the predetermined standard is a standard relating to the white balance or contrast of the image of the user's face.

前記表示情報生成部は、前記推定部における推定結果の感情状態に応じて前記画像に画像処理を行う、請求項１から３のいずれか一項に記載の会議制御装置。The conference control device according to any one of claims 1 to 3, wherein the display information generation unit performs image processing on the image in accordance with the emotional state estimated by the estimation unit.

ネットワークを介して行われる会議に参加しているユーザーの画像について、所定の基準に近づくように画像処理を行い、画像処理が行われた前記ユーザーの画像を含む表示情報を生成し、前記表示情報を他のユーザーによって使用されるユーザー端末に送信する表示情報生成ステップと、
前記会議における音声を制御する音声制御ステップと、
前記ユーザーの画像に基づいて前記ユーザーの感情状態を推定する推定ステップと、
を有し、
前記表示情報生成ステップにおいて、前記会議に参加しているユーザーについて、各感情状態のユーザーの数の時系列の変化を示す画像を含む表示情報を生成する、会議制御方法。
A conference control method comprising:
a display information generation step of performing image processing on images of users participating in a conference held over a network so that the images conform to a predetermined standard, generating display information including the image-processed images of the users, and transmitting the display information to user terminals used by other users;
an audio control step of controlling audio in the conference; and
an estimation step of estimating an emotional state of the user based on the image of the user,
wherein, in the display information generation step, display information is generated that includes an image showing a time-series change in the number of users in each emotional state among the users participating in the conference.

請求項１から４のいずれか一項に記載の会議制御装置としてコンピューターを機能させるためのコンピュータープログラム。A computer program for causing a computer to function as the conference control device described in any one of claims 1 to 4.