JP2005284726A

Info

Publication number: JP2005284726A
Application number: JP2004097926A
Authority: JP
Inventors: Tomio Watanabe; 富夫渡辺; Michiya Yamamoto; 倫也山本; Koji Ono; 紘司小野
Original assignee: Japan Science and Technology Agency; Okayama Prefectural Government
Current assignee: Japan Science and Technology Agency; Okayama Prefectural Government
Priority date: 2004-03-30
Filing date: 2004-03-30
Publication date: 2005-10-13

Abstract

[Problem] To provide a video system that induces a physical entrainment phenomenon toward a video in unspecified or numerous viewers who watch video consisting of moving images and audio, such as television broadcasts.
[Solution] The video system comprises a video input unit 11 that receives the moving image and audio of an original video from outside; a character generation unit 12 that generates a moving image of a listener character that behaves as a viewer on behalf of the viewers watching the display video; a video composition unit 13 that combines the moving image of the listener character with the moving image of the original video to generate the moving image of a display video in which the listener character is displayed around the original video; and a video output unit 14 that outputs the moving image of the display video and the audio of the original video to a display device. By showing viewers of the display video on the display device the listener character displayed at its periphery, the system induces a physical entrainment phenomenon in the viewers and draws them into the display video.
[Selected drawing] Figure 2

Description

The present invention relates to a video system that induces a physical entrainment phenomenon toward a video in unspecified or numerous viewers who watch video consisting of moving images and audio, such as television broadcasts.

How far unspecified or numerous viewers of video consisting of moving images and audio, such as television broadcasts, are drawn into the video depends on their degree of interest in it, and the more they are drawn in, the better they grasp the information the video conveys. This means that the recognition of information transmitted one-way from a video basically depends on how far the viewer is drawn into the video. Video providers such as television broadcasters therefore devise ways to arouse viewers' interest and draw them into the video.

Viewers can be drawn into a video if its content matches their interests, but then the degree of engagement depends on each viewer's preferences, and it is difficult for the provider to predict how far viewers will be drawn into the video. This approach is particularly ineffective for video aimed at unspecified or numerous viewers rather than at viewers with specific preferences. It is therefore conceivable to adopt a different means of creating a sense of unity between the video and its viewers: that is, to draw viewers into the video by making them feel a sense of unity with it, independent of each viewer's preferences.

For example, Patent Document 1 proposes a system that, in a lecture held over a video conferencing system, creates a sense of unity between teacher and students and draws the students into the teacher's lecture. Specifically, the video seen by the teacher or each student simultaneously displays listener characters that stand in for the teacher and each student (a self personality model and other-person personality models), with the teacher's listener character and the students' listener characters facing one another. Each listener character performs body motions in response to the voice of the teacher or of a student, and showing these body motions to the teacher or students draws them into the video and thereby into the lecture.

By showing the teacher or students listener characters that, on their behalf, always respond to their own or others' voices, the above system provides the body motions of the listener characters as visual sensory information (nonverbal information) in addition to the voice of the teacher or students (verbal information), and induces in the teacher or students a physical entrainment phenomenon toward the video. Patent Document 1 regards the head-nodding motion of each listener character and the timing of that nodding as important for inducing the physical entrainment phenomenon. Specifically, a head-nodding motion is executed at the nodding timing at which a predicted nodding value estimated from the teacher's voice exceeds a nodding threshold.

Patent Document 1: JP 2001-307138 A (pages 3 to 7, FIGS. 2 to 7)

Based on Patent Document 1, it is conceivable to use the physical entrainment phenomenon to draw viewers into a video. However, because the system of Patent Document 1 uses video to assist a lecture, which is an audio-centered form of information transmission, it arranges and animates the listener characters using the entire picture. Video such as television broadcasts, by contrast, conveys information mainly through the picture itself rather than through voice or sound, so the configuration of Patent Document 1 cannot be used as is. The listener character that induces the physical entrainment phenomenon must therefore be used in a way that does not obstruct the picture, which carries the main information.

Moreover, video consisting of moving images and audio, such as television broadcasts, is not shot or produced with a nodding listener character built in beforehand. The configuration of Patent Document 1, which incorporates nodding listener characters in advance and distributes them to students, therefore cannot be used as is. Thus, although combining a video with a means of providing visual sensory information is expected to draw viewers into the video, no concrete means has yet been proposed. To develop a video system that induces a physical entrainment phenomenon toward a video in unspecified or numerous viewers who watch video consisting of moving images and audio, such as television broadcasts, the inventors therefore studied means of compositing a video with a listener character that induces the physical entrainment phenomenon.

The result of this study is a system that draws viewers into a video consisting of moving images and audio when the video is shown on a display device. The system comprises a video input unit that receives the moving image and audio of an original video from outside; a character generation unit that generates a moving image of a listener character that behaves as a viewer on behalf of the viewers watching the display video; a video composition unit that combines the moving image of the listener character with the moving image of the original video to generate the moving image of a display video in which the listener character is displayed around the original video; and a video output unit that outputs the moving image of the display video and the audio of the original video to the display device. The character generation unit generates a moving image of a listener character that performs body motions behaving as a viewer at listener motion timings calculated by treating the audio of the original video as an ON/OFF signal. By showing viewers of the display video on the display device the listener character displayed at its periphery, this video system, which exploits the physical entrainment phenomenon, induces that phenomenon in the viewers and draws them into the display video.

Here, "audio" in the present invention stands for the sound distributed together with the moving image and includes various sounds as well as human voices. "Listener character" basically means a moving image depicting human movement, such as a person or an anthropomorphized animal, but it may be a moving image of a plant or an inanimate object as long as it moves so as to behave as a viewer at listener motion timings, that is, at specific points in the time series at which viewer reactions are estimated to appear most prominently. The following description uses as an example a listener character consisting of a moving image depicting human movement.

The video input unit is the part that captures the original video received from a television broadcast or the like, and the video output unit is the part that outputs the display video to an image display device such as a television; a single video input/output interface can serve as both, for example. The character generation unit generates a moving image of a listener character that performs body motions behaving as a viewer at listener motion timings calculated by treating the audio of the original video as an ON/OFF signal; because these timings must be calculated, a computer is preferably used. In that case, the video input unit or video output unit may be built from an internal or external interface of the computer. A converter is also required to convert the computer-generated moving image of the listener character into a television-format moving image. The video composition unit combines the moving image of the listener character with that of the original video to generate the moving image of the display video. It may be implemented on the computer, or, if the listener character and the original video are to be chroma-key composited, a video mixer (also called an AV mixer) may be used. When a video mixer is used as the video composition unit, the character generation unit generates the moving image of the listener character including a chroma-key background.
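As a rough illustration of the chroma-key option, the sketch below composites a character frame onto an original frame pixel by pixel. The key color, the tolerance, and the function names are assumptions made for illustration; the source does not specify them, and a hardware video mixer would perform the equivalent operation on the video signal.

```python
# Hypothetical sketch: a frame is a list of rows, each row a list of (R, G, B) tuples.
KEY_COLOR = (0, 255, 0)  # assumed chroma-key background color
TOLERANCE = 60           # assumed per-channel tolerance


def is_key(pixel, key=KEY_COLOR, tol=TOLERANCE):
    """Return True when every channel of the pixel is within tol of the key color."""
    return all(abs(p - k) <= tol for p, k in zip(pixel, key))


def chroma_key(character_frame, original_frame):
    """Show the original video wherever the character frame has the key color."""
    return [
        [orig if is_key(char) else char for char, orig in zip(char_row, orig_row)]
        for char_row, orig_row in zip(character_frame, original_frame)
    ]
```

Both frames are assumed to have the same dimensions; pixels of the character frame that are not close to the key color are kept, so the character appears on top of the original video.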

The video system of the present invention displays a listener character that behaves as a viewer at the periphery of the display video shown on the display device, and by giving this listener character to the viewers as visual sensory information (nonverbal information) induces the physical entrainment phenomenon and draws viewers into the display video. Existing techniques do display characters performing body motions within a video. The listener character of the present invention differs, however, in that its body motions are not proportional to the loudness of the audio; instead it performs body motions behaving as a viewer at listener motion timings calculated by treating the audio as an ON/OFF signal. Displaying the listener character at the periphery of the picture lets it exist naturally as a stand-in for the viewers without obstructing the moving image, so that viewers see the moving image and the listener character as one.

A feature of the present invention is that the listener character performs body motions behaving as a viewer at listener motion timings. A listener motion timing is a specific point in the time series at which viewer reactions are estimated to appear most prominently, and estimating the predicted listener motion value is important for calculating it. This predicted value is preferably calculated by summing the degrees of influence that audio acquired over a fixed time range in the past has on the present. The degree of influence can be derived by relating the present to the past through a linear combination, a nonlinear combination, a neural network, or the like. For example, when the degree of influence is derived by linear combination, the character generation unit preferably calculates as the listener motion timing the point at which the predicted listener motion value, estimated by a moving average (MA) of the audio of the original video, exceeds a predetermined listener motion threshold. The moving average uses, for example, Equation 1, in which y(i) is the predicted listener motion value, a(j) is the listener prediction coefficient, and x(i−j) is the audio of the original video.
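The moving-average rule described here can be sketched roughly as follows. The equation itself is not reproduced in this text, so the form y(i) = a(1)x(i−1) + ... + a(J)x(i−J), the binarization level, the coefficient values, and the reading of "exceeds the threshold" as an upward crossing are all assumptions made for illustration, not parameters stated in the source.

```python
def binarize(samples, level):
    """Treat audio as an ON/OFF signal: 1 when the magnitude exceeds level, else 0."""
    return [1 if abs(s) > level else 0 for s in samples]


def predicted_values(x, a):
    """Assumed MA form: y(i) = sum over j = 1..len(a) of a(j) * x(i - j).

    Samples before the start of the signal are treated as 0.
    """
    y = []
    for i in range(len(x)):
        total = 0.0
        for j, coeff in enumerate(a, start=1):
            if i - j >= 0:
                total += coeff * x[i - j]
        y.append(total)
    return y


def motion_timings(y, threshold):
    """Return the indices at which y rises from at-or-below to above the threshold."""
    timings = []
    above = False
    for i, value in enumerate(y):
        if value > threshold and not above:
            timings.append(i)
        above = value > threshold
    return timings
```

A caller would binarize the incoming audio, pass the result to `predicted_values` with a chosen coefficient list, and trigger a nod at each index returned by `motion_timings`.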

According to Equation 1, the predicted listener motion value y(i) becomes 0 when there is no audio x(i−j) for a certain time, so if y(i) stays at 0 for long, the listener character stops moving and may look unnatural. In such cases the predicted listener motion value is preferably calculated by Equation 2, which adds noise to Equation 1. In Equation 2, y(i) is the predicted listener motion value, a(j) is the listener prediction coefficient, x(i−j) is the audio of the original video, and w(i) is noise. The noise w(i) is a random number that takes a different value each time y(i) is calculated. This adds natural fluctuation to y(i), so that, for example, listener motion timings are still calculated even when the audio of the original video is interrupted for a long time, and the listener character performs body motions as appropriate. When there are multiple listener characters, each uses its own, different listener motion threshold and so performs body motions independently.
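A minimal sketch of the noise term and of per-character thresholds follows. It assumes the form y(i) = a(1)x(i−1) + ... + a(J)x(i−J) + w(i) with w(i) drawn uniformly from [0, noise_scale]; the source says only that w(i) is a random number, so the distribution, the scale, and the character names used in the usage note are hypothetical.

```python
import random


def noisy_predicted_values(x, a, noise_scale, rng):
    """Assumed form: y(i) = sum_j a(j) * x(i - j) + w(i), with a fresh w(i) per sample."""
    y = []
    for i in range(len(x)):
        total = sum(c * x[i - j] for j, c in enumerate(a, start=1) if i - j >= 0)
        y.append(total + rng.uniform(0.0, noise_scale))  # w(i): assumed uniform noise
    return y


def timings_per_character(y, thresholds):
    """Map each character name to the indices where y exceeds that character's threshold."""
    return {
        name: [i for i, value in enumerate(y) if value > threshold]
        for name, threshold in thresholds.items()
    }
```

For example, calling `timings_per_character(y, {"left": 0.6, "right": 0.8})` gives the two characters different nod timings from the same predicted values, and with a nonzero `noise_scale` timings can still occur during silence.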

It is important that the listener character perform body motions behaving as a viewer at listener motion timings, the specific points in the time series at which viewer reactions are estimated to appear most prominently. The character generation unit therefore preferably generates a moving image of a listener character whose body motions include a head-nodding motion as the viewer behavior, nodding being a body motion that readily induces the physical entrainment phenomenon. For viewers, nodding is the body motion most easily understood as a response to audio. What matters here is that the listener character nods; its facial expression is of no particular concern. Accordingly, to make the listener character a stand-in for the viewers, the character generation unit preferably generates a moving image of a listener character that performs its body motions with its back to the viewers. More specifically, the listener character should be given a line of sight directed toward the center of the display video. For example, when displayed toward the left or right of the display video, the listener character desirably takes a slightly oblique posture.

The listener character behaves as a viewer in place of the viewers, for example by performing body motions that include the nodding motion described above. Even when there are unspecified or numerous viewers, then, a single listener character standing in for every viewer suffices. However, if each viewer senses that viewers other than themselves are present, each viewer can be made to share a bodily rhythm with the listener characters, and the physical entrainment phenomenon can be induced more actively. The character generation unit therefore preferably generates a moving image of multiple listener characters that perform the same or different body motions.

In this way the video system of the present invention induces the physical entrainment phenomenon in viewers using a listener character that performs body motions behaving as a viewer, but this is pointless if the listener character obstructs the moving image. The purpose of showing the listener character's body motions is, after all, to draw viewers into the moving image, which is the main body of the video. The character generation unit therefore generates the moving image so that the listener characters, whether one or several, are placed at the periphery of the display video. The size of a listener character relative to the display video is preferably 20 to 30% in height and 10 to 20% in width, for example on a 20-inch display screen.

The display position of the listener character is also important. Specifically, the character generation unit preferably generates a moving image of a listener character displayed near the lower center of the display video, or toward its lower left or lower right. The listener character is displayed at the bottom because body motions including nodding do not require the lower body, and placing the character at the bottom lets this unneeded lower body go undisplayed without looking unnatural. A listener character displayed near the lower center is easily noticed by viewers. A listener character displayed toward the lower left or right gives viewers a sense of depth through its positional relationship to the moving image and lets each viewer perceive it as a viewer other than themselves.
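The size and placement guidance can be turned into screen rectangles along the following lines. The default fractions are picked from within the 10 to 20% width and 20 to 30% height ranges mentioned for a 20-inch screen, while the horizontal anchor points for "left" and "right" (one quarter and three quarters of the screen width) and the function name are assumptions for illustration.

```python
def character_rect(screen_w, screen_h, position, width_frac=0.15, height_frac=0.25):
    """Return (x, y, w, h) of a listener character aligned to the bottom edge.

    position is "left", "center", or "right"; the anchor fractions are assumed values.
    """
    centers = {"left": 0.25, "center": 0.5, "right": 0.75}  # assumed horizontal anchors
    w = round(screen_w * width_frac)
    h = round(screen_h * height_frac)
    x = round(screen_w * centers[position] - w / 2)
    y = screen_h - h  # bottom-aligned, so the lower body is simply off-screen
    return x, y, w, h
```

Coordinates use the common convention of the origin at the top-left corner with y increasing downward.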

Even when superimposed on the moving image of the display video, a listener character of the above size located at the periphery is not much of an obstruction. Depending on the type of moving image, however, the listener character may blend into it and be perceived as part of it, so that the physical entrainment phenomenon induced by the listener character does not work well. The video composition unit therefore preferably combines the moving image of the listener character with that of a reduced video obtained by shrinking the original video, producing a display video in which the listener character is displayed in the margin left by the reduced video relative to the original video. The appropriate ratio of the reduced video to the original is roughly 80 to 90%, depending on the size of the image display device. The listener character need not fit entirely within the margin; it may be located in the margin and be large enough to partly overlap the reduced video. The listener character size given above, for example, is one that partly overlaps the reduced video.

The position of the reduced video relative to the original video is arbitrary. If the centers of the original and reduced videos coincide, however, margins form on all four sides of the reduced video and no single margin is large enough. The video composition unit therefore preferably combines the moving image of the listener character with that of a reduced video obtained by shrinking the original video about the center of its upper edge, producing a display video in which the listener character is displayed in the margin below the reduced video. Thus, when the listener character is displayed near the lower center or toward the lower left or right as described above, a necessary and sufficient margin is secured while the reduction of the video is kept small, and the presence of the listener character is shown more clearly.
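The benefit of shrinking about the upper-edge center can be checked with a little geometry. The sketch below is illustrative only: the 85% scale is one value from the 80 to 90% range, and the function names are hypothetical.

```python
def reduced_video_rect(screen_w, screen_h, scale=0.85):
    """Return (x, y, w, h) of the reduced video anchored at the top-edge center."""
    w = round(screen_w * scale)
    h = round(screen_h * scale)
    x = (screen_w - w) // 2  # centered horizontally
    y = 0                    # flush with the upper edge
    return x, y, w, h


def bottom_margin(screen_h, rect):
    """Height of the margin left below the reduced video."""
    _, y, _, h = rect
    return screen_h - (y + h)


def centered_bottom_margin(screen_h, scale=0.85):
    """Bottom margin if the reduced video were centered vertically instead."""
    return (screen_h - round(screen_h * scale)) // 2
```

At the same scale, anchoring at the top leaves the whole vertical margin below the video, which is twice the bottom margin of a centered layout.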

To show the presence of the listener character in the margin clearly, the video composition unit also preferably combines the moving image of the listener character with that of a reduced video obtained by shrinking the original video, renders the margin left by the reduced video relative to the original video in a single color (monotone), and displays the listener character in this margin. The margin may be any color, but to make the reduced video and the listener character stand out it is preferable to use a color that appears to recede behind them (a contracting color), such as black, dark navy, or dark green.

The present invention provides a video system that can draw viewers into a video by exploiting the physical entrainment phenomenon. The physical entrainment phenomenon that the listener character induces in viewers owes much to the calculation of the listener motion timing and to the nodding motion executed at that timing. Having the listener character turn its back to the viewers, or using several listener characters, also makes it easier to share a bodily rhythm with the viewers and thus to induce the physical entrainment phenomenon in them.

The present invention also considers the placement of the listener character and can thereby induce the physical entrainment phenomenon without impairing the visual information transmission of the video. Specifically, displaying the listener character near the lower center or toward the lower left or right of the display video, or in the margin left by the reduced video relative to the original video, makes viewers aware of the listener character that induces the physical entrainment phenomenon without impairing the visual information transmission of the video, and makes it easier for the listener character to induce that phenomenon.

The video system of the present invention can be configured, for example, as a device placed between a television tuner and a television. In this case the listener character is easily superimposed and composited without impairing the television broadcast video, and a display video including the listener character is shown on the television screen. This means the physical entrainment phenomenon can be induced in unspecified viewers. The listener character may be simple, needing only to make its body motions, particularly nodding, recognizable, and the calculation that determines the nodding timing, such as a moving average, is simple. Even with the video system placed between the tuner and the television, the display video including the listener character can therefore be generated and displayed without delay from reception of the broadcast.

As is clear from the above, the video system of the present invention is a device that superimposes a listener character in real time on video that was not shot or produced with one built in, so it also has a wide range of application. In the above example, besides television broadcasts, the system can be placed between a television and a video player connected to it so as to composite the listener character onto the played-back video. This means the physical entrainment phenomenon can be induced in numerous viewers; for example, the invention can be used when showing a video at a lecture meeting.

Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a perspective view showing an example configuration of the video system of the present invention; FIG. 2 is a block diagram of that video system; FIG. 3 is a block diagram of another example video system; FIG. 4 shows a television screen displaying the original video 21; FIG. 5 shows a television screen displaying the two generated listener characters 22, 22; FIG. 6 is a schematic diagram explaining the body motion of the listener character 22; FIG. 7 shows a television screen displaying the reduced video 23; and FIG. 8 shows a television screen displaying the display video 24 in which the reduced video 23 and the listener characters 22, 22 are superimposed. In this example, a video system built around a computer 10 (see FIGS. 1 and 2) superimposes two listener characters 22, 22 on a program introducing nature such as forests (see FIG. 4) (see FIG. 7).

The video input unit 11, character generation unit 12, video composition unit 13, and video output unit 14 required for the video system can all be implemented on a computer. Accordingly, as shown for example in FIG. 1, a tuner 31 that receives television broadcasts is connected to the computer 10 constituting the video system as the video input source, and a television 32, which is an image display device, is connected as the video output destination. Since a television usually has a built-in tuner, the television broadcast video received by the television may be input to the computer constituting the video system and then output from the computer back to the same television or to another television. Alternatively, if the computer has a built-in tuner, the video system can be built with the computer alone.

As shown in FIG. 2, the computer 10 constitutes: a video input unit 11 that takes in the moving image and audio of the original video from the tuner 31 receiving the television broadcast; a character generation unit 12 that receives only the audio of the original video from the video input unit 11 and generates the moving image of the listener character 22 (see FIG. 5); a video composition unit 13 that receives the moving image and audio of the original video from the video input unit 11 and the moving image of the listener character 22 from the character generation unit 12, and composites the moving image of the reduced video 23 of the original video (see FIG. 7) with the moving image of the listener character 22 to obtain the moving image of the display video 24 (see FIG. 8); and a video output unit 14 that outputs the moving image of the display video 24 from the video composition unit 13, together with the audio of the original video, to the television 32, which is an image display device. The video input unit 11 and the video output unit 14 can be implemented with a video input/output interface added to the computer 10.

The character generation unit 12 is implemented by executing a generation program that uses the processing capability of the computer 10. This generation program can be divided into a listener motion timing calculation procedure as the first stage and a body motion generation procedure as the second stage. The listener motion timing calculation procedure can be further divided into a procedure for estimating a listener motion prediction value, a procedure for comparing the listener motion prediction value with a listener motion threshold, and a procedure that takes the point at which the listener motion prediction value exceeds the listener motion threshold as the listener motion timing. The body motion generation procedure generates the moving image of the listener characters 22, 22 (see FIGS. 5 and 6), which perform specific body motions at specific positions around the periphery of the display video 24 (see FIG. 8).

When the listener motion prediction value is estimated by a moving average (MA) of the audio of the original video, the estimation procedure calculates the listener motion prediction value as the degree to which the audio of the original video acquired over a fixed time range in the past influences the present. When Equation 1 or Equation 2 described earlier is used, this calculation is the sum of the products of the listener prediction coefficients a(j) and the original video audio x(i−j) over minute time units, and the amount of computation is proportional to the number of divisions of the fixed time range. Lengthening the time range or increasing the number of divisions allows a more appropriate listener motion prediction value to be estimated. In practice, however, it is realistic to divide roughly the past several seconds of the original video audio x(i−j) into units of about 1/30 sec, matching the frame rate of the television broadcast, which also keeps the load on the computer 10 low.
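The estimation and threshold comparison described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the invention: Equations 1 and 2 are not reproduced in this section, so the code assumes the plain sum of products of a(j) and x(i−j) that the text describes, with the audio already reduced to an ON/OFF value (1 or 0) per 1/30 sec frame. The function names, the coefficient values, and the threshold are hypothetical.

```python
from typing import List, Sequence


def listener_motion_prediction(x: Sequence[float], a: Sequence[float], i: int) -> float:
    """Return y(i) = sum over j = 1..J of a(j) * x(i - j).

    x is the per-frame audio (assumed here to be 1 for ON, 0 for OFF) and
    a holds the listener prediction coefficients a(1)..a(J). Frames before
    the start of the recording are treated as silent.
    """
    total = 0.0
    for j in range(1, len(a) + 1):
        if i - j >= 0:
            total += a[j - 1] * x[i - j]
    return total


def listener_motion_timings(
    x: Sequence[float], a: Sequence[float], threshold: float
) -> List[int]:
    """Return the frame indices at which the prediction rises above the threshold."""
    timings: List[int] = []
    above = False
    for i in range(len(x)):
        y = listener_motion_prediction(x, a, i)
        if y > threshold and not above:
            timings.append(i)  # upward crossing: one listener motion timing
        above = y > threshold
    return timings
```

With the hypothetical values a = [0.5, 0.3, 0.2] and a threshold of 0.9, the prediction value exceeds the threshold only after three consecutive ON frames, so a short burst of sound does not trigger a motion. The work per frame is proportional to len(a), which corresponds to the number of divisions of the time range mentioned above.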

As shown in FIG. 6, described later, it is advisable to prepare a plurality of different motion patterns for the body motions of the listener character 22, such as a nodding motion, a head swinging motion, an arm swinging motion, a forward-and-backward motion of the torso, and a turning motion, and to execute them in combination at the listener motion timing. Although two listener characters 22, 22 are used in this example, the body motions of the listener characters 22 may be the same or different. A plurality of listener characters can be made to perform different body motions by changing the types of motion patterns that are combined. In addition, by changing the listener motion timing of each character, the change in body motion over time can be made to differ, so that the characters perform different body motions as a result.
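One way to picture how several listener characters can end up with different body motions is the following Python sketch. It is an assumption-laden illustration: the text does not state how motion patterns are chosen at each timing, so the sketch simply cycles through each character's own list of patterns, and it models the per-character timing difference as a fixed frame delay. The class name, field names, and pattern labels are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Sequence, Tuple


@dataclass
class ListenerCharacter:
    name: str
    patterns: Sequence[str]  # motion patterns this character combines
    delay_frames: int = 0    # hypothetical per-character shift of the timing


def schedule(character: ListenerCharacter, timings: Sequence[int]) -> List[Tuple[int, str]]:
    """Pair each listener motion timing with the character's next motion pattern."""
    pattern_cycle = cycle(character.patterns)
    return [(t + character.delay_frames, next(pattern_cycle)) for t in timings]


if __name__ == "__main__":
    left = ListenerCharacter("left", ["nod", "arm_swing"])
    right = ListenerCharacter("right", ["nod"], delay_frames=3)
    for ch in (left, right):
        print(ch.name, schedule(ch, [4, 10, 20]))
```

Giving the two characters different pattern lists changes which motions they combine, and giving them different delays changes when they move, which are the two sources of variation described above.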

In addition, the character generation unit 12 generates the moving image of each listener character 22 so that, in this example, the listener character 22 is positioned toward the right as one faces the display video 24, directs its gaze toward the center of the display video 24, that is, diagonally up and to the left, and is displayed with its back to the viewer. The display position of the listener character relative to the original video or the reduced video can also be determined by, for example, the video composition unit, but it is easier to fix the display position in advance in the character generation unit and have the video composition unit simply composite the moving image of the listener character with the moving image of the original video or the reduced video.

The video composition unit 13 is implemented by executing a composition program that uses the processing capability of the computer 10. Alternatively, as shown in FIG. 3, an existing video mixer 40 can be used. When the video mixer 40 is used, the computer 10 having the character generation unit 12 uses a video input unit 15 that takes in only the audio of the original video 21. The video mixer 40 receives the moving image and audio of the original video 21 directly through its built-in video input unit 41. Since the video mixer 40 handles television broadcast image signals, the moving image of the listener character 22 output from the character generation unit 12 is converted by a converter 50 from the digital signal handled by the computer 10 into a television broadcast video signal and then input to the video mixer 40. The video mixer 40 composites the moving image of the original video 21 or the reduced video 23 with the moving image of the listener character 22 in its built-in video composition unit 42 to generate the moving image of the display video 24, and outputs the display video 24 to the television 32 from its built-in video output unit 43.

The procedure for obtaining the display video 24 from the original video 21 will be described with reference to FIGS. 4 to 7. For example, when the original video 21 is a program that introduces nature, such as forests, the moving image of the original video 21 naturally contains no listener character, as shown in FIG. 4. The moving image and audio of the original video 21 are taken into the computer 10 through the video input unit 11 (see FIG. 1). The moving image of the original video 21 is used in the video composition unit 13 to generate the moving image of the reduced video 23, which is composited with the moving image of the listener character 22. The audio of the original video 21 is used to generate the listener character 22, which performs body motions at the listener motion timing.

As shown in FIG. 5, the character generation unit 12 generates a moving image in which the two listener characters 22, 22 are displayed toward the right of the display video 24. In FIG. 5, the broken line indicates the position and size of the reduced video 23. On receiving the audio of the original video 21 from the video input unit 11, the character generation unit 12 calculates the listener motion timing of each of the two listener characters 22, 22 from the audio, and makes each listener character 22 perform a body motion at that timing. As shown in FIG. 6, the body motions include a head swinging motion including nodding, an arm swinging motion, a forward-and-backward motion of the torso, a turning motion, and the like. When the listener character 22 is to be chroma-key composited onto the reduced video 23, the character generation unit 12 generates a moving image in which the background other than the listener character 22 is blue.
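The chroma-key step mentioned above can be illustrated with a small Python sketch that works on one frame at a time. It is a simplified illustration under stated assumptions: frames are plain lists of RGB tuples of equal size, the key colour is pure blue (0, 0, 255), and a pixel counts as background when every channel is within a tolerance of the key colour. A hardware video mixer operates on video signals rather than pixel lists, and the names used here are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]

KEY_BLUE: Pixel = (0, 0, 255)  # assumed key colour for the character background


def is_key_color(pixel: Pixel, key: Pixel = KEY_BLUE, tolerance: int = 0) -> bool:
    """True when every channel of the pixel is within the tolerance of the key colour."""
    return all(abs(c - k) <= tolerance for c, k in zip(pixel, key))


def chroma_key(
    character: Frame, reduced: Frame, key: Pixel = KEY_BLUE, tolerance: int = 0
) -> Frame:
    """Show the reduced-video pixel wherever the character frame is key-coloured."""
    if len(character) != len(reduced) or any(
        len(c_row) != len(r_row) for c_row, r_row in zip(character, reduced)
    ):
        raise ValueError("frames must have identical dimensions")
    return [
        [r_px if is_key_color(c_px, key, tolerance) else c_px
         for c_px, r_px in zip(c_row, r_row)]
        for c_row, r_row in zip(character, reduced)
    ]
```

Because only the blue background is replaced, the character pixels stay on top of both the reduced video and its margin, which is why the character frame must avoid the key colour inside the character itself.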

The video composition unit 13 receives the moving image and audio of the original video 21 from the video input unit 11 and converts the original video 21 into the reduced video 23, as shown in FIG. 7. In this example, the original video 21 is reduced with the center of its upper edge as the reference point, and the reduced video 23 is generated with the margin 25 relative to the original video 21 filled with a black monotone. The moving image of the listener character 22 received from the character generation unit 12 is then composited with the moving image of the reduced video 23 to obtain the moving image of the display video 24, as shown in FIG. 8. All that remains is to output the moving image of the display video 24 together with the audio of the original video 21 to the television 32, which is an image display device.
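The geometry of reducing the original video about the center of its upper edge can be worked through with a short Python sketch. The screen size of 640 x 480 pixels and the function names are assumptions made for illustration; the text specifies only the reference point and the black margin.

```python
from typing import Dict, NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def reduced_video_rect(screen_w: int, screen_h: int, scale: float) -> Rect:
    """Place the reduced video so that its upper edge stays centred on the screen's upper edge."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    width = round(screen_w * scale)
    height = round(screen_h * scale)
    left = (screen_w - width) // 2  # equal margins on the left and right
    return Rect(left, 0, width, height)  # top = 0: no margin above the video


def margins(screen_w: int, screen_h: int, rect: Rect) -> Dict[str, int]:
    """Widths of the monotone margin on each side of the reduced video."""
    return {
        "left": rect.left,
        "right": screen_w - rect.left - rect.width,
        "bottom": screen_h - rect.height,
    }
```

For an assumed 640 x 480 screen and a scale of 0.8, the reduced video occupies 512 x 384 pixels starting at (64, 0), leaving 64 pixels at each side and 96 pixels at the bottom for the listener characters.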

The viewer sees the reduced video 23 surrounded by the monotone margin 25 and the two listener characters 22, 22 displayed in the margin 25 as a single display video 24. Although the viewer's gaze is directed at the moving image of the reduced video 23, the listener characters 22, 22 displayed within the screen of the television 32 naturally fall within the field of view as well. Moreover, because the body motions of the listener characters 22, 22 displayed in the monotone margin 25 are relatively conspicuous, the viewer perceives the body motions of each listener character 22 unconsciously. As a result, the physical entrainment phenomenon is brought about in the viewer who sees the listener character 22 nodding at the listener motion timing, which is the specific point in the time series at which the viewer's reaction to the audio of the original video 21 is estimated to appear most prominently, and the viewer is consequently drawn into the display video 24.

To confirm the effectiveness of the present invention, a video system was configured using existing video equipment, and sensory tests were conducted on a number of subjects. Specifically, to examine the sensory evaluation of compositing listener characters with the original video, a first sensory evaluation test with the size of the reduced video as the parameter and a second sensory evaluation test with the display position of the listener characters as the parameter were each carried out for the cases of two and three listener characters.

FIG. 9 shows the configuration of the video system used in each sensory evaluation test. These sensory evaluation tests did not use a dedicated system configuration based on the present invention; existing video equipment was used for everything except the computer constituting the character generation unit. Specifically, the video system used for the sensory evaluation tests consisted of a television for the original video, a computer, a monitor, a converter, a video mixer, and a television for the display video. The computer was a desktop personal computer made by Silicon Graphics with two 1 GHz Pentium (registered trademark) III CPUs and 768 MB of memory, running Windows (registered trademark) 2000 as the OS. The converter was a Digital Arts DSCφ6d-HR, connected to the video mixer via an S-video terminal. The video mixer was a Matsushita Electric Industrial Digital AV Mixer WJ-AVE55. The display television was a 20-inch liquid crystal television made by Sharp.

The television for the original video displayed the original video played back from video (Open University of Japan video teaching material, Quantum Mechanics 4), and the computer took in the audio from that television and generated the moving image of two or three listener characters performing body motions at the listener motion timing. The monitor displayed the generated listener characters so that their body motions could be checked. The moving image of the listener characters was converted by the converter from the digital signal handled by the computer into a television broadcast video signal and input to the video mixer. The video mixer received the moving image and audio of the original video from the television for the original video, and chroma-key composited the moving image of the listener characters input from the converter with the moving image of the reduced video generated by reducing the original video, to generate the moving image of the display video. The moving image of the display video and the audio of the original video were then output from the video mixer to the display television and displayed.

Each sensory evaluation test was conducted with 50 subjects, each of whom watched the display television in a private room. Each subject was given a keyboard, with which the subject could select and report the display position and size of the listener characters and the size of the reduced video that the subject rated highly. In each sensory test, so that the size of the reduced video could be selected, the original video input to the video mixer was converted into the reduced video in advance by a computer, and the moving image of the listener characters, taken into the video mixer separately, was composited with the moving image of the reduced video. If the reduction ratio of the reduced video is fixed, the video mixer itself can generate the reduced video and composite the listener characters with it.

The reduced video was obtained by reducing the original video with the center of its upper edge as the reference point. In the sensory evaluation tests, in addition to 100% of the original video (no margin) and 95% (a 5% horizontal margin at the left and right and a 5% vertical margin at the bottom), a size that the subject considered easy to watch was selected between 30% and 100% of the original video and presented, and for each reduced video the subject selected the display position and size of the listener characters that the subject considered easy to watch. Taking the display video with the listener character position, listener character size, and reduced video size selected in this way as the reference, the subject watched while switching between the case without listener characters and the case with listener characters composited, and rated the composited case relative to the case without listener characters on a seven-point scale (neutral = 0). This sensory evaluation test was carried out for both two and three listener characters.

First, Table 1 shows the results of the sensory evaluation test with the size of the reduced video as the parameter. These results show that a reduced video of 80% to 90% of the size of the original video is preferred. Examples of display videos in which such a reduced video is composited with listener characters are shown in FIGS. 10 and 11. With a reduced video of 80% to 90% of the original video, the presence of the margin appears to make the listener characters stand out and become easier to recognize. The preferred size of the listener characters was on average 23.7% in the height direction and 15.7% in the horizontal direction, and subjects responded that in either case it is preferable for part of the listener character to overlap the reduced video. From this, it is considered that if there is a margin that makes the listener characters stand out, the listener characters can readily bring about the physical entrainment phenomenon in the viewer.

Next, FIG. 12 shows the results of the sensory evaluation test with the display position of the listener characters as the parameter. FIG. 12 is a distribution diagram showing the results when subjects were asked to place two listener characters at any positions in the lower part of the display video, for a reduced video 80% of the size of the original video, and to choose which display positions they preferred. In FIG. 12, ○ indicates the position of the first listener character, ● the position of the second listener character, □ the average display position of the first listener character, and ■ the average display position of the second listener character. When three listener characters were examined in the same way, the subjects' preferences fell into three groups: a group that placed the two or three listener characters evenly from left to right (at the left and right ends for two characters; at the left and right ends and near the center for three), a group that placed them together near the center, and a group that placed them together toward the right.

Table 2 summarizes the results of the two types of sensory evaluation test. When two listener characters were used, a display video in which the listener characters are placed toward the lower right on a reduced video 90% of the size of the original video tended to be preferred. When three listener characters were used, a display video in which the listener characters are placed evenly on a reduced video 80% of the size of the original video tended to be preferred. From this it is concluded that a display video in which the listener characters are placed near the lower center or toward the lower right on a reduced video of 80% to 90% of the original video is suitable as a composition of listener characters that brings about the physical entrainment phenomenon.

Finally, to confirm how much physical entrainment was brought about, that is, the effectiveness of the video system of the present invention, FIG. 13 shows the results of a sensory evaluation made relative to the original video without listener characters. In FIG. 13, ● indicates the case of two listener characters and ○ the case of three listener characters. Although there are some differences between two and three listener characters, positive responses were obtained for all of the evaluation items: "enjoyment", "like or dislike", "ease of listening", "desire to use", and "sense of security". From this, it can be said that the effectiveness of the video system of the present invention has been confirmed.

FIG. 1 is a perspective view showing a construction example of the video system of the present invention.
FIG. 2 is a block diagram of the video system.
FIG. 3 is a block diagram of another example of the video system.
FIG. 4 shows a television screen on which the original video is displayed.
FIG. 5 shows a television screen on which the two generated listener characters are displayed.
FIG. 6 is a schematic diagram explaining the body motions of a listener character.
FIG. 7 shows a television screen on which the reduced video is displayed.
FIG. 8 shows a television screen on which the display video, in which the reduced video and the listener characters are superimposed, is displayed.
FIG. 9 is a configuration diagram of the video system used in the sensory evaluation tests.
FIG. 10 shows a television screen displaying a display video in which a 90% reduced video is composited with listener characters.
FIG. 11 shows a television screen displaying a display video in which an 80% reduced video is composited with listener characters.
FIG. 12 is a distribution diagram showing the results of having subjects select the preferred display positions of two listener characters for an 80% reduced video.
FIG. 13 shows the results of the sensory evaluation made relative to the original video.

Explanation of symbols

10 Computer
11 Video input unit
12 Character generation unit
13 Video composition unit
14 Video output unit
15 Video input unit that takes in only the audio of the original video
21 Original video
22 Listener character
23 Reduced video
24 Display video
25 Margin
31 Tuner
32 Television
40 Video mixer
41 Video input unit
42 Video composition unit
43 Video output unit
50 Converter

Claims

A video system using a physical entrainment phenomenon, which draws a viewer into a video consisting of a moving image and audio when the video is displayed on a display device, the system comprising: a video input unit through which the moving image and audio of an original video are input from outside; a character generation unit that generates a moving image of a listener character that behaves as a viewer in place of the viewer watching a display video; a video composition unit that composites the moving image of the listener character with the moving image of the original video to generate a moving image of the display video in which the listener character is displayed around the moving image of the original video; and a video output unit that outputs the moving image of the display video and the audio of the original video to the display device, wherein the character generation unit generates the moving image of the listener character performing body motions in which it behaves as a viewer at a listener motion timing calculated by regarding the audio of the original video as an ON/OFF signal, and wherein, by showing the viewer watching the display video displayed on the display device the listener character displayed around the display video, the physical entrainment phenomenon is brought about in the viewer and the viewer watching the display video is drawn into the display video.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit calculates, as the listener motion timing, the point at which a listener motion prediction value estimated by a moving average of the audio of the original video exceeds a predetermined listener motion threshold.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit generates a moving image of a listener character that performs body motions including a nodding motion of the head as the behavior of a viewer.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit generates a moving image of a listener character that performs body motions with its back to the viewer.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit generates a moving image of a plurality of listener characters that perform the same or different body motions.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit generates a moving image of a listener character displayed near the lower center as one faces the display video.

The video system using a physical entrainment phenomenon according to claim 1, wherein the character generation unit generates a moving image of a listener character displayed toward the lower left or the lower right as one faces the display video.

The video system using a physical entrainment phenomenon according to claim 1, wherein the video composition unit composites the moving image of the listener character with the moving image of a reduced video obtained by reducing the original video, to generate a moving image of the display video in which the listener character is displayed in the margin of the reduced video relative to the original video.

The video system using a physical entrainment phenomenon according to claim 1, wherein the video composition unit composites the moving image of the listener character with the moving image of a reduced video obtained by reducing the original video with the center of its upper edge as the reference point, to generate a moving image of the display video in which the listener character is displayed in the lower margin of the reduced video relative to the original video.

The video system using a physical entrainment phenomenon according to claim 1, wherein the video composition unit composites the moving image of the listener character with the moving image of a reduced video obtained by reducing the original video, makes the margin of the reduced video relative to the original video a monotone, and generates a moving image of the display video in which the listener character is displayed in the margin.