JP2010039607A


Info

Publication number: JP2010039607A
Application number: JP2008199337A
Authority: JP
Inventors: Shuhei Sasakura; 州平笹倉
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2008-08-01
Filing date: 2008-08-01
Publication date: 2010-02-18

Abstract

[Problem] In a gesture input device provided with a display unit, the video output from the display unit illuminates the person performing the gesture, so that person can no longer be extracted from the signal captured while the gesture is performed.
[Solution] The captured video is corrected based on the hue information of the video output from the display unit, which removes the reflected display light and allows normal gesture recognition. Alternatively, the color information of the person performing the gesture is converted into color information that accounts for the reflected display light, which likewise allows normal gesture recognition.
[Selected drawing] FIG. 1

Description

Translated from Japanese

The present invention relates to an operation input device having a display device.

With the recent spread of digital broadcasting and the thinning of display units, large television receivers have become common in ordinary households. A large television receiver placed in a large room such as a living room has become the center of family gatherings. As a result, television receivers have come to take on roles previously performed by personal computers, such as browsing photos and viewing web pages over the Internet. However, unlike personal computer users, television viewers cannot easily perform complicated operations that must be learned, particularly with input devices such as a mouse or keyboard. On the other hand, the remote controls used so far offer only a narrow range of operations and do not allow free operation. Moreover, because a remote control is meant to be operated at a distance from the television, it is a separate device from the television, which leads to problems such as loss.

Patent Document 1 therefore proposes input by hand gestures. The operator is photographed with a CCD camera, and the cursor is moved according to body and hand movements.

FIG. 9 shows an overview of the interface device 900 described in Patent Document 1. The interface device 900 includes a host computer 901, a display 902, and a CCD camera 903. When the user makes a hand gesture toward the display 902, the CCD camera 903 captures the hand movement. The movement and position are recognized, and the cursor 904 on the display 902 is moved.

FIG. 10 shows a detailed block diagram of the interface device 1000, from capturing the user's hand gesture to moving the cursor 904. A light transmission filter 1011 that passes wavelengths in the skin color range is placed on the CCD camera 1012, so that only skin-colored regions such as a human face or hand are captured. Images from the CCD camera 1012 are stored in the frame memory 1013. The motion recognition unit 1014 detects skin-colored edges in the frame memory 1013 and extracts contour shapes. It then selects the contours that have the shape of a hand, and detects the position of the hand and the hand shape the user is presenting (a closed fist, the number of raised fingers, and so on). The display control unit 1015 controls the display unit 1016 so that the cursor on the display device moves according to the position and shape of the hand.
Patent Document 1: JP 2004-78977 A

A gesture is a movement of the body or hands. When a person recognizes a gesture by eye, it cannot be judged in an instant; it is judged from the series of movements of the person making the gesture.

The same applies when an imaging device such as a camera is used. The content of a gesture cannot be determined from a single still image. The judgment is made by estimating how the input person is moving from a sequence of still images, that is, from a moving image. This requires computing the difference in motion between the images of the moving image and, from that difference, analyzing how the same object performing the gesture (such as the input person's hand) is moving. A known way to analyze gesture movement more accurately is to increase the number of images captured by the imaging device. Because human movement is continuous, capturing at high speed makes it possible to follow fine movements.
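The frame-to-frame difference described above can be illustrated with a minimal sketch. The text does not specify how the difference is computed, so everything here is an assumption made for illustration: frames are small grayscale grids, a pixel counts as "moving" when its brightness changes by more than a hypothetical `threshold`, and the moving object is summarized by the centroid of the changed pixels.

```python
def changed_pixels(prev_frame, curr_frame, threshold=30):
    """Mark pixels whose brightness changed by more than `threshold` between two frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def centroid(mask):
    """Return the (x, y) centre of the marked pixels, or None if nothing changed."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Two tiny 3x2 grayscale frames (hypothetical data): a bright object appears in the middle column.
prev_frame = [[0, 0, 0], [0, 0, 0]]
curr_frame = [[0, 200, 0], [0, 200, 0]]
mask = changed_pixels(prev_frame, curr_frame)
```

Tracking the centroid over successive frame pairs gives a rough trajectory; a higher capture rate yields more centroids per movement, which is the accuracy benefit the paragraph refers to.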

When such a gesture is performed in front of a display device such as a television, the picture shown on the display device is also changing continuously. Television video runs at 60 frames per second, and the picture keeps changing at that rate. When a gesture is performed in front of this video, the color of the image captured by the imaging device changes along with the displayed video. Because of this change, the position and shape of the hand needed for gesture recognition are lost.

To solve the above problem, the input device of the present invention includes, in the means for photographing a gesture, means for correcting the color information of the captured video based on the video being shown on the display unit at the moment of capture, and then analyzing the gesture.

With these means, the input device of the present invention uses the video shown on the display screen to correct the video in which the gesture is captured. This removes the disturbance caused by changes in the displayed video and allows gestures to be recognized accurately.

(First embodiment)
The best mode for carrying out the present invention is described below with reference to the drawings. First, the input device of this embodiment is shown in FIG. 1.

FIG. 1 shows an input device 100 that has a video output device such as a display and an input function for accepting input from the user. The input device 100 consists of an antenna unit 101, a reception/modulation unit 102, a decoding unit 103, an audio output unit 104, a video generation unit 105, a video display unit 106, an imaging unit 151, a video correction unit 152, a gesture recognition unit 153, and a color information holding unit 154.

The antenna unit 101 receives the broadcast wave output from a broadcast station. The received broadcast wave is passed to the reception/modulation unit 102 and converted into TS data. In this embodiment the receiving unit and the demodulating unit are treated as a single function, but in practice they are often split into a tuner unit that receives the signal from the antenna 101 and an OFDM demodulator that demodulates the received wave. This embodiment assumes a broadcast receiver with a tuner, but stored content may instead be played back from a recording medium such as a DVD or HDD.

The TS data produced by the reception/modulation unit 102 is sent to the decoding unit 103. The decoding unit 103 separates it into video TS data and audio TS data and decodes them to generate a video signal and an audio signal. The audio signal is converted into sound by the audio output unit 104, which is a speaker or headphones. The resulting sound reaches the operator 140 as auditory information 130, and the operator 140 hears it.

The video signal decoded by the decoding unit 103 is sent to the video generation unit 105. There, the decoded video signal is combined with video information belonging to the input device 100, such as menus and help screens. This added video information may be something operated by gesture, such as an icon or a button, or it may be a frame or visual effect not directly related to gestures. The video generated by the video generation unit 105 is the video signal that the operator actually sees.

The video signal generated by the video generation unit 105 is sent to the video display unit 106. The video display unit 106 is an imaging device such as a liquid crystal display or a plasma display; its many pixels convert the signal into light, which reaches the operator 140 as visual information 131. The operator 140 sees and understands it.

The operator 140 decides on an operation for the input device 100 based on the auditory information 130 and the visual information 131, and performs it using the whole body or a part of the body such as a hand or foot. This is captured as gesture input 132 by the imaging unit 151 of the input device 100. The subject captured by the imaging unit 151 is converted into data as an imaging signal and sent to the gesture recognition unit 153, which analyzes what operation of the input device 100 is intended; the device is then operated accordingly. In this configuration, the signal passes through the video correction unit 152 between the imaging unit 151 and the gesture recognition unit 153. The video correction unit 152 corrects the color of the imaging signal obtained by the imaging unit 151, based on the hand color information obtained from the color information holding unit 154 and the video signal generated by the video generation unit 105. Because the video generated by the video generation unit 105 changes constantly over time, the correction uses the same video that was shown on the video display unit 106 at the moment the imaging unit 151 captured the image. As a result, the video output from the video correction unit 152 has the influence of the video display unit 106 removed.

FIG. 2 shows hue using a hand as the model. The most common gesture operation is input by hand movement. The upper left of FIG. 2 is an image of the room without a hand. The upper right is an image of the same environment with a hand in view. The only difference between the two images is whether the hand appears. The color components of each image were analyzed, and the result is shown below each image as an HSV distribution. HSV is a color space made up of three components: hue, saturation, and value (brightness). This method uses hue. Hue expresses the kind of color as an angle from 0 to 360 degrees: red is 0 degrees, yellow is 60 degrees, green is 120 degrees, light blue (cyan) is 180 degrees, blue is 240 degrees, and purple (magenta) is 300 degrees.
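The hue angles listed above can be checked with Python's standard `colorsys` module. This is only an illustrative sketch: the helper name `hue_degrees` and the 8-bit RGB triples chosen for each color name are assumptions, not part of the described device.

```python
import colorsys


def hue_degrees(r, g, b):
    """Convert an 8-bit RGB color to its HSV hue angle in degrees (0 to 360)."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0


# Pure primary and secondary colors (assumed RGB values for the names used in the text).
NAMED_COLORS = {
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
    "light blue (cyan)": (0, 255, 255),
    "blue": (0, 0, 255),
    "purple (magenta)": (255, 0, 255),
}

for name, rgb in NAMED_COLORS.items():
    print(f"{name}: {round(hue_degrees(*rgb))} degrees")
```

`colorsys.rgb_to_hsv` returns hue as a fraction of a full turn, so it is multiplied by 360 to match the degree scale used in this description.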

Comparing the two hue distributions shows a large difference at hues close to red, between 0 and 30 degrees. This indicates that skin color, the color of the hand, contains a large red component. This hue information for the hand is stored in advance in the color information holding unit 154. There is also some change in the blue region around 240 degrees, but this is the hue of cables that are hidden by the hand.
Therefore, when looking for regions that may be a hand in a hand gesture, the candidates can be narrowed down by finding colors whose hue is near 0 degrees.
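A minimal sketch of narrowing down hand candidates by hue follows. The 0 to 30 degree range comes from the description above; the function name, the per-pixel list representation, and the sample hues are assumptions for illustration.

```python
def is_hand_candidate(hue, low=0.0, high=30.0):
    """Return True when a hue angle (degrees) falls inside the stored hand hue range."""
    return low <= hue % 360.0 <= high


# Hypothetical per-pixel hue angles: two skin-like pixels, one green, one blue (cable).
pixel_hues = [10.0, 25.0, 120.0, 240.0]
candidates = [is_hand_candidate(h) for h in pixel_hues]
print(candidates)
```

A real detector would also look at saturation, brightness, and shape, as the background section notes for contour extraction; hue filtering only reduces the candidate set.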

FIG. 3 shows how the video output by the video display unit 106 affects the image captured by the imaging unit 151. Starting from the right column, the figure shows the cases where the video on the video display unit 106 is black, red, green, and blue. From the top row down, it shows the video on the video display unit 106, the image captured by the imaging unit 151, and the hue of that captured image in HSV form. The case where the video on the video display unit 106 is black is identical to the image with the hand in FIG. 2.

Consider first the case where the output video of the video display unit 106 is black. The image captured by the imaging unit 151 shows the hand prominently. The hue of this image is distributed over colors close to red, from 0 to 30 degrees. The image itself is skin-colored, so the hue of the hand lies in this range.

When the image on the video display unit 106 is red, the video is reflected on the hand and the whole hand is tinted red. In this case the color distribution of the hand region does not change much. When the image on the video display unit 106 is green, the hand in the image captured by the imaging unit 151 is tinted green, and the hue distribution of the hand region moves toward green and yellow. When the image on the video display unit 106 is blue, the hand captured by the imaging unit 151 is tinted blue, and the hue distribution of the hand region moves toward blue and purple.

Thus a change in the video on the video display unit 106 causes a color change in the image captured by the imaging unit 151. Because the video output from the video display unit 106 changes rapidly, the same kind of color change occurs between frames of the moving image that the imaging unit 151 captures for gesture recognition. When recognition is based on color, the object (the hand, in this embodiment) can then no longer be recognized as the same object from frame to frame.

FIG. 4 shows the processing of the video correction unit 152, which performs the correction. The video correction unit 152 receives the color information Z of the hand to be tracked for gesture recognition from the color information holding unit 154 (step 400).

The video correction unit 152 then receives the video A, in which the operator 140 is captured, from the imaging unit 151 (step 401).

It obtains from the video generation unit 105 the video B that was shown on the video display unit 106 at the same time the imaging unit 151 captured the image (step 402).

The received video B is analyzed to obtain its dominant hue, X degrees (step 403). The dominant hue X may be obtained, for example, as the average hue of all pixels of the video B, or as the median of the hue distribution.
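The two options named for step 403, the average and the median, can be sketched with the standard `statistics` module. The function names and sample hue lists are hypothetical. The third function, a circular mean, is not mentioned in the source; it is included only as an assumed alternative, because a plain average of angles misbehaves when hues straddle the 0/360 degree boundary.

```python
import math
import statistics


def mean_hue(hues):
    """Dominant hue as the arithmetic mean of per-pixel hue angles (degrees)."""
    return statistics.mean(hues)


def median_hue(hues):
    """Dominant hue as the median of per-pixel hue angles (degrees)."""
    return statistics.median(hues)


def circular_mean_hue(hues):
    """Assumed alternative: mean direction of the hue angles on the color wheel."""
    s = sum(math.sin(math.radians(h)) for h in hues)
    c = sum(math.cos(math.radians(h)) for h in hues)
    return math.degrees(math.atan2(s, c)) % 360.0


mostly_green = [110, 120, 130]   # hypothetical hues of a greenish frame
reddish = [350, 10]              # two reds on either side of 0 degrees

print(mean_hue(mostly_green), median_hue(mostly_green))
print(mean_hue(reddish), circular_mean_hue(reddish))
```

For the reddish sample the arithmetic mean lands on 180 degrees (cyan), while the circular mean stays at red, which is why the wrap-around is worth keeping in mind when choosing how to compute X.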

The difference Y between the hue X obtained by analyzing the video B and the hue Z of the hand is calculated (step 404).

Y = Z - X

The ratio at which the colors mix when the hand is lit by the light of the video B is obtained as a coefficient K (step 405). The coefficient may be derived using information such as other light sources, the luminance of the display device of the video display unit 106, and the distance to the object.

The angular difference between the discolored hue and the original hue of the hand is calculated, and the hue of the video A is shifted by that angular difference (step 406).

Correction amount a = K × Y

The shifted video A is sent to the gesture recognition unit 153 (step 407).
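Step 406 shifts the hue of the captured video A by the correction amount. A minimal per-pixel sketch using the standard `colorsys` module is shown below. The function name `shift_hue`, the floating-point RGB representation in the 0 to 1 range, and the sample pixel are assumptions; the source does not say how the shift is implemented.

```python
import colorsys


def shift_hue(rgb, degrees):
    """Rotate the hue of one RGB pixel (components in 0..1) by `degrees`, keeping S and V."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + degrees / 360.0) % 1.0  # hue wraps around the color wheel
    return colorsys.hsv_to_rgb(h, s, v)


def hue_of(rgb):
    """Hue angle of an RGB pixel in degrees."""
    return colorsys.rgb_to_hsv(*rgb)[0] * 360.0


# Hypothetical hand pixel tinted by green display light: hue 70 degrees.
tinted = colorsys.hsv_to_rgb(70 / 360.0, 0.5, 0.8)
# Apply a correction amount a = -50 degrees to bring it back toward skin hue.
corrected = shift_hue(tinted, -50)
print(round(hue_of(corrected)))
```

Applying `shift_hue` to every pixel of the frame gives the shifted video that step 407 passes on; saturation and brightness are left untouched in this sketch.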

Following the procedure of FIG. 4, the correction is now applied to the three cases shown in FIG. 3 in which the video B is red, green, and blue. The hue of the hand to be corrected is known to lie roughly between 0 and 30 degrees; in this example it is taken to be 20 degrees.

First, assume the red case (step 402). The hue of red is 0 degrees (step 403). The difference between the hue of the video B (red) and the hue of the hand is 20 degrees (step 404). In this embodiment the color of the hand and the color of the video B are assumed to mix evenly, one to one, so the ratio K is 0.5 (step 405). The effect of the discoloration on the hand is then determined under this assumption that the hue of the video B (red) and the hue of the hand mix in equal proportion. The discolored hue is then 10 degrees. Since the original hue of the hand is 20 degrees and the discolored hue is 10 degrees, the hue of the video A should be shifted by the difference of 10 degrees.

When the video B is green, the hue of green is 120 degrees (step 403) and the hue of the hand is 20 degrees as in the red case, so the difference is -100 degrees (step 404). The ratio K is 0.5 (step 405). Therefore, to return the hand to its original hue, the video should be shifted by -50 degrees (step 406).

When the video B is blue, the hue of blue is -120 degrees, which is the same as 240 degrees (step 403), so the difference from the hue of the hand is Y = 20 - (-120) = 140 degrees (step 404). The ratio K is 0.5 (step 405). Therefore, to return the hand to its original hue, the video should be shifted by 70 degrees (step 406).
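The three worked cases can be reproduced with a few lines of arithmetic. The sketch below uses the formulas Y = Z - X and a = K × Y from the procedure, with Z = 20 and K = 0.5 as in the text. The function names are hypothetical, and the expression for the discolored hue (Z moved toward X by the ratio K) is an assumption consistent with the even one-to-one mixing stated for the red case.

```python
def correction_amount(hand_hue, display_hue, k=0.5):
    """First embodiment: a = K * Y with Y = Z - X (all angles in degrees)."""
    return k * (hand_hue - display_hue)


def discolored_hue(hand_hue, display_hue, k=0.5):
    """Assumed mixing model: the hand hue moves toward the display hue by the ratio K."""
    return hand_hue - k * (hand_hue - display_hue)


Z = 20  # hue of the hand used in the worked example
for name, X in [("red", 0), ("green", 120), ("blue", -120)]:
    a = correction_amount(Z, X)
    seen = discolored_hue(Z, X)
    print(f"{name}: discolored hue {seen}, shift by {a}, corrected hue {seen + a}")
```

In each case adding the correction amount to the discolored hue returns the original 20 degrees, which is the property step 406 relies on.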

FIG. 5 shows the result of correcting, by the method of FIG. 4, the three cases of FIG. 3 in which the video B is red, green, and blue. The corrected images are shown in the upper row, and the hue of each image is shown below it. Compared with the images before correction in FIG. 3, the colors now lie on the hue that represents the color of the hand. Sending this image to the gesture recognition unit 153 (step 407) allows the gesture recognition unit 153 to track the hand.

(Second embodiment)
The second embodiment is described with reference to the drawings. First, the input device of this embodiment is shown in FIG. 6.

FIG. 6 shows an input device 600 that has a video output device such as a display and an input function for accepting input from the user. The input device 600 consists of an antenna unit 601, a reception/modulation unit 602, a decoding unit 603, an audio output unit 604, a video generation unit 605, a video display unit 606, an imaging unit 651, a gesture recognition unit 653, and a color information calculation unit 654.

The antenna unit 601 receives the broadcast wave output from a broadcast station. The received broadcast wave is passed to the reception/modulation unit 602 and converted into TS data. In this embodiment the receiving unit and the demodulating unit are treated as a single function, but in practice they are often split into a tuner unit that receives the signal from the antenna 601 and an OFDM demodulator that demodulates the received wave. This embodiment assumes a broadcast receiver with a tuner, but stored content may instead be played back from a recording medium such as a DVD or HDD.

The TS data produced by the reception/modulation unit 602 is sent to the decoding unit 603. The decoding unit 603 separates it into video TS data and audio TS data and decodes them to generate a video signal and an audio signal. The audio signal is converted into sound by the audio output unit 604, which is a speaker or headphones. The resulting sound reaches the operator 640 as auditory information 630, and the operator 640 hears it.

The video signal decoded by the decoding unit 603 is sent to the video generation unit 605. There, the decoded video signal is combined with video information belonging to the input device 600, such as menus and help screens. This added video information may be something operated by gesture, such as an icon or a button, or it may be a frame or visual effect not directly related to gestures. The video generated by the video generation unit 605 is the video signal that the operator actually sees.

The video signal generated by the video generation unit 605 is sent to the video display unit 606. The video display unit 606 is an imaging device such as a liquid crystal display or a plasma display; its many pixels convert the signal into light, which reaches the operator 640 as visual information 631. The operator 640 sees and understands it.

The operator 640 decides on an operation for the input device 600 based on the auditory information 630 and the visual information 631, and performs it using the whole body or a part of the body such as a hand or foot. This is captured as gesture input 632 by the imaging unit 651 of the input device 600. The subject captured by the imaging unit 651 is converted into data as an imaging signal and sent to the gesture recognition unit 653, which analyzes what operation of the input device 600 is intended; the device is then operated accordingly. In this configuration, the video signal is sent from the video generation unit 605 to the color information calculation unit 654. The color information calculation unit 654 calculates the amount of discoloration (as a hue angle) of the color of the object performing the gesture (the hand, in this embodiment) and outputs it as color information. Because the video generated by the video generation unit 605 changes constantly over time, the calculation uses the same video that was shown on the video display unit 606 at the moment the imaging unit 651 captured the image. As a result, the color information output from the color information calculation unit 654 accounts for the disturbance caused by the video display unit 606. This color information is sent to the gesture recognition unit 653, which analyzes the gesture.

In this embodiment, when the video shown on the video display unit 606 is the same as the video of the video display unit 106 in FIG. 3, the image captured by the imaging unit 651 is the same as the image of the imaging unit 151 in FIG. 3.

FIG. 7 shows the processing of the color information calculation unit 654, which performs the correction. The color information calculation unit 654 obtains the video B shown on the video display unit 606 from the video generation unit 605 (step 701).

The received video B is analyzed to obtain its dominant hue X (step 702).

The difference Y between the hue X obtained by analyzing the video B and the hue Z of the hand is calculated (step 703).

Y = X - Z

The ratio at which the colors mix when the hand is lit by the light of the video B is obtained as a coefficient K (step 704). The coefficient may be derived using information such as other light sources, the luminance of the display device of the video display unit 606, and the distance to the object.

From the difference Y, the ratio K, and the hand color Z, the changed hue of the hand Z' is calculated (step 705).

Z' = Z + K × Y

The discolored hue value is sent to the gesture recognition unit 653 (step 706).

Following the procedure of FIG. 7, the calculation is now applied to the three cases shown in FIG. 3 in which the video B is red, green, and blue. The hue of the hand is known to lie roughly between 0 and 30 degrees; in this example it is taken to be 20 degrees. First, assume the red case (step 701).

The hue of red is 0 degrees (step 702). The difference Y between the hue of the video B (red) and the hue of the hand is -20 degrees (step 703). Next, the effect of the discoloration on the hand is determined, assuming that the hue of the video B (red) and the hue of the hand mix in equal proportion (step 704). The discolored hue of the hand is then 10 degrees. Equal proportion is used in this embodiment, but the ratio may be varied using information such as the distance to the object and the difference in brightness between the room lighting and the video display unit 606. This value of 10 degrees is taken as the hue Z' of the hand under the video B (step 705) and sent to the gesture recognition unit 653 (step 706).

When the video B is green, the hue of green is 120 degrees (step 702) and the hue of the hand is 20 degrees as in the red case, so the difference Y is 100 degrees (step 703). The ratio K is 0.5 (step 704). The changed hue of the hand Z' is therefore 70 degrees (step 705). This value of 70 degrees is sent to the gesture recognition unit 653 as the hue of the hand (step 706).

When image B is blue, the hue of blue is −120 degrees (step 702), so the difference from the hue of the hand is −140 degrees (step 703). The ratio K is set to 0.5 (step 704). The shifted hand hue Z′ is therefore −50 degrees (step 705). This −50 degrees is sent to the gesture recognition unit 653 as the hand hue (step 706).
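
The three worked cases can be reproduced with a short sketch. It assumes that the shifted hue is obtained as Z′ = Z + K × Y, where Y is the signed difference between the image hue and the hand hue; this signed form is an assumption that matches the values stated for the green and blue cases and gives 10 degrees for the red case. The function and variable names are hypothetical and do not appear in the specification.

```python
def shifted_hand_hue(image_hue_deg, hand_hue_deg=20.0, ratio_k=0.5):
    """Return the hand hue Z' after it mixes with the displayed image hue.

    Assumption: Y is taken as a signed difference (image hue minus hand hue),
    and Z' = Z + K * Y. Angles are in degrees, blue expressed as -120.
    """
    diff_y = image_hue_deg - hand_hue_deg    # difference Y (step 703)
    return hand_hue_deg + ratio_k * diff_y   # shifted hue Z' (steps 704-705)


# Hues of image B for the three cases described in the text (step 702).
IMAGE_HUES = {"red": 0.0, "green": 120.0, "blue": -120.0}

shifted = {name: shifted_hand_hue(hue) for name, hue in IMAGE_HUES.items()}
print(shifted)  # {'red': 10.0, 'green': 70.0, 'blue': -50.0}
```

A ratio other than 0.5 can be passed as `ratio_k` to model the case where the mixing proportion is varied, for example according to distance or brightness.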

FIG. 8 shows the regions obtained when the hand hue is shifted for the three cases in which image B of FIG. 2 is red, green, or blue. Because the shifted hues are 10 degrees, 70 degrees, and −50 degrees respectively, the corresponding hue regions are drawn as red circles. As the figure shows, these circles coincide with the hand region. By changing the candidate hue for the hand itself, the hand region can be treated as a hand candidate.
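
One way the gesture recognition unit might use the shifted candidate hue is sketched below. The specification does not state how a pixel is matched against the candidate hue, so the circular distance test and the 15-degree tolerance are assumptions made only for illustration, and the function names are hypothetical.

```python
def hue_distance(a_deg, b_deg):
    """Smallest absolute angle between two hues on the 360-degree hue circle."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_hand_candidate(pixel_hue_deg, candidate_hue_deg, tolerance_deg=15.0):
    """True if the pixel hue lies within the tolerance of the candidate hue.

    Assumption: tolerance_deg is a hypothetical parameter, not a value
    taken from the specification.
    """
    return hue_distance(pixel_hue_deg, candidate_hue_deg) <= tolerance_deg


# Under a green image the hand appears near 70 degrees, so the original
# 20-degree candidate no longer matches while the shifted candidate does.
print(is_hand_candidate(70.0, 20.0))    # False
print(is_hand_candidate(70.0, 70.0))    # True
# Hues wrap around: 305 degrees is 5 degrees away from -50 degrees.
print(is_hand_candidate(305.0, -50.0))  # True
```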

The video generation units 105 and 605, the video correction unit 152, the color information holding unit 154, the color information calculation unit 654, and the gesture recognition units 153 and 653 described above may be implemented as software running on a CPU in the input device, or as hardware with equivalent functions. The video display units 106 and 606 need not be built into the input devices 100 and 600; they may be external and display the video output from the input devices 100 and 600 over a cable or a wireless network. Likewise, the imaging units 151 and 651 need not be built into the input devices 100 and 600; they may be external and feed their signals to the input devices 100 and 600 over a cable or a wireless network.

The input device according to the present invention is useful in that, when a gesture operation is performed in front of a light-emitting display device such as a display, it corrects an image whose colors have been shifted from the actual colors by the influence of the display device, so that objects, and therefore gestures, can be recognized correctly.

FIG. 1 Block diagram showing the configuration of the input device
FIG. 2 Diagram showing the hand color phase
FIG. 3 Diagram showing the color change due to a change in the video
FIG. 4 Flowchart showing the processing of the video correction unit
FIG. 5 Diagram showing the corrected video result
FIG. 6 Block diagram showing the configuration of the input device
FIG. 7 Flowchart showing the processing of the color information calculation unit
FIG. 8 Diagram showing the corrected hand hue
FIG. 9 Overview of a prior-art interface device
FIG. 10 Detailed block diagram of the prior-art interface device

Explanation of symbols

100, 600 Input device
101, 601 Antenna unit
102, 602 Reception / modulation unit
103, 603 Decoding unit
104, 604 Audio output unit
105, 605 Video generation unit
106, 606 Video display unit
130, 630 Auditory information
131, 631 Visual information
132, 632 Gesture input
140, 640 Operator
151, 651 Imaging unit
152 Video correction unit
153, 653 Gesture recognition unit
154 Color information holding unit
654 Color information calculation unit

Claims

An input device comprising:
a video display unit that displays a video signal;
an imaging unit that converts captured light input into an imaging signal;
a video correction unit that corrects the imaging signal based on the video signal output to the video display unit; and
a gesture recognition unit that recognizes a photographed object based on the imaging signal corrected by the video correction unit.

The input device according to claim 1, further comprising
a color information holding unit that holds color information,
wherein the video correction unit corrects the color information of the imaging signal based on the video signal and the color information.

An input device comprising:
a video display unit that displays a video signal;
an imaging unit that converts captured light input into an imaging signal; and
a gesture recognition unit that recognizes a photographed object based on the imaging signal.

The input device according to claim 3, further comprising
a color information calculation unit that calculates a color change amount of the imaging signal from the video signal displayed by the video display unit,
wherein the gesture recognition unit recognizes the photographed object based on the imaging signal and the color change amount.