JP2004153382A


Info

Publication number: JP2004153382A
Application number: JP2002313942A
Authority: JP
Inventors: Yutaka Kuramochi; 裕倉持
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2002-10-29
Filing date: 2002-10-29
Publication date: 2004-05-27

Abstract

PROBLEM TO BE SOLVED: To provide a motion vector detecting apparatus capable of enhancing the detection accuracy of a motion vector without increasing the amount of computation for block matching.

SOLUTION: The operation of this embodiment includes: a step S04 of selecting a band having a characteristic frequency component from macroblocks of a plurality of processed image data, each containing different frequency components, held in first hierarchical image data; a step S05 of searching for a motion vector through block matching between the macroblock of the processed image data containing the selected frequency region and reference image data containing the same frequency region; a step S06 of correcting the motion vector obtained in S05 through block matching using second hierarchical image data in a search area around the motion vector detected in S05; and a step S07 of correcting the motion vector obtained in S06 through block matching using uppermost hierarchical image data in a search area around the motion vector detected in S06.

COPYRIGHT: (C)2004,JPO

Description

Translated from Japanese

【０００１】
【発明の属する技術分野】
本発明は、画像の高能率符号化において、サブサンプリングにより解像度の異なる画像データを階層的に利用することで画像の動きベクトル検出に費やされる演算量を削減する技術に関し、画像予測符号化における画像圧縮エンコード処理に利用されるものである。そして、本発明は、ブロックマッチングの演算量を増やさずに動きベクトルの検出精度を上げることのできる装置を提供するものである。
【０００２】
【従来技術】
従来技術による動きベクトルの階層探索を図３を用いて説明する。本説明では、便宜的に最上位階層も含めて３階層で説明するが、階層数は任意に設定できる。図３において画像メモリ３０１に参照画像データ、画像メモリ３１１に処理画像データが記憶され最上位階層画像と呼ぶ。
【０００３】
画像メモリ３０１，３１１の画像データから２次元ローパスフィルタ３０２，３１２により、水平・垂直の低域周波数成分を抽出する。例えば、タップ係数［１／２，１／２］の１次元水平ローパスフィルタと１次元垂直ローパスフィルタを従属接続することで、２次元ローパスフィルタは実現される。２次元ローパスフィルタ３０２，３１２により抽出された低域周波数成分の画像に対して、水平・垂直方向に画素を１つ置きに間引くサブサンプリング処理をサブサンプリング処理回路３０３，３１３によって施す。サブサンプリングにより面積比１／４に画像サイズを縮小した画像を画像メモリ３０４，３１４に記憶し、この記憶した画像を第２階層画像と呼ぶ。
【０００４】
第２階層画像に対して、さらに２次元ローパスフィルタ３０５，３１５により水平・垂直の低域周波数成分を抽出し、サブサンプリング処理をサブサンプリング処理回路３０６，３１６によって施す。サブサンプリングにより更に面積比１／４に画像サイズを縮小した画像を画像メモリ３０７，３１７に記憶し、この記憶した画像を第１階層画像と呼ぶ。
【０００５】
このようにして解像度の異なる画像を階層的に作成する。これら階層画像に対して、最も解像度の低い画像で動きベクトルの検出を行い、順次解像度の高い画像を遡りながら検出した動きベクトルの修正を繰り返していく。
【０００６】
図３では、先ず、最も画素数の少ない参照画像データ３０７（画像メモリに記憶された画像データをその画像メモリの符号を用いて呼ぶことととする。以下も同様）の探索範囲３２１に対して、最も画素数の少ない処理画像データ３１７のマクロブロック３３１とのブロックマッチングによりベクトルを広範囲に探索する（ブロックマッチング回路３０８により実行する）。
【０００７】
次に、１つ上の解像度の参照画像データ３０４で、ブロックマッチング回路３０８によるブロックマッチングで求めた動きベクトル値を補正することで決定された探索の中心位置とその近傍を含む探索範囲３２２に対して、処理画像データ３１４のマクロブロック３３２とのブロックマッチングにより、ブロックマッチング回路３０８で求めた動きベクトル値を修正する。（ブロックマッチング回路３０９により実行する。）
この過程を繰り返して、最終的には最上位階層の参照画像データ３０１の探索範囲３２３に対して、最上位階層の処理画像データ３１１のマクロブロック３３３とのブロックマッチングが行われ（ブロックマッチング回路３１０により実行され）、動きベクトル値が決定され出力される。
【０００８】
この技術は、画素数の少ない第１階層でのみ広範囲な探索が行われ他の階層では近傍探索となるため、画素数の多い最上位階層画像のみで広範囲にブロックマッチングを実施するよりも少ない演算量でベクトル検出が実現される。（詳しくは、特開平１１−２８９５４５号公報を参照。）
【０００９】
【特許文献１】
特開平１１−２８９５４５号公報
【００１０】
【発明が解決しようとする課題】
画像の動きベクトル階層型探索では、階層数を増やすほど演算量が減り広域的な動きベクトル検出が可能となる。この際、各階層画像はローパスフィルタにより低域周波数成分が抽出され、画素をサブサンプリングすることにより縮小画像を生成する。この結果、階層数が増えるほど高域周波数成分が除去されること、また、折り返し成分により信号波形の区別がつき難くなることなどで正確なブロックマッチングができなという問題が発生していた。
【００１１】
この問題を解決するために、上述の特開平１１−２８９５４５号公報では、探索開始階層を最下位階層だけでなく中間階層からも探索を行い、探索開始階層の異なった動きベクトルを選択的に利用する方法を提案している。
【００１２】
しかし、提案されている方法では演算量の大部分をブロックマッチング演算が占めていることから、複数の動きベクトル検出は演算量の増加を招くこととなり、さらなる演算量の削減が求められていた。
【００１３】
本発明は、前記問題を解決するために、ブロックマッチングの演算量を増やさずに動きベクトルの検出精度を上げることのできる動きベクトル検出装置を提供することを目的としている。
【００１４】
【課題を解決するための手段】
そこで、上記課題を解決するために本発明は、
時系列的に前後する処理画像データと参照画像データとを記憶する最上位画像メモリと、
それぞれの前記画像データから特定の周波数領域を抽出する第１のプリフィルタと、
前記第１のプリフィルタから出力された画像の画素を均等に間引く第１のサブサンプリング器と、
前記第１のサブサンプリング器から出力される画素数の削減された画像を記憶する第１の画像メモリと、
前記画素数の削減された画像から２次元周波数空間を複数の領域に分割しそれぞれの成分を出力する第２のプリフィルタと、
前記第２のプリフィルタから出力される周波数領域の異なる複数の画像の画素を均等に間引く第２のサブサンプリング器と、
前記第２のサブサンプリング器から出力される画素数の削減された複数の画像を記憶する第２の画像メモリと、
前記第２の画像メモリに記憶された周波数領域の異なる処理画像データ毎に、各処理画像データの画素数に比例するマクロブロックサイズに含まれる画素値の絶対値の総和を求めるヒストグラム演算器と、
前記ヒストグラム演算器から出力される各ヒストグラム値に所定の定数を掛ける掛け算器と、
複数の周波数領域の異なる処理画像データの同じ位置にあるマクロブロックの、前記掛け算器により前記所定の定数が掛けられたヒストグラム値の中から最大の値を求めて、その最大値となるヒストグラム値が得られた処理画像データの周波数領域を選択する機能を有し、前記選択された周波数領域から抽出された処理画像データのマクロブロックと、前記選択された周波数領域と同じ領域から抽出された前記第２の画像メモリに記憶されている参照画像データとの間で、ブロックマッチングにより動きベクトル値を計算する第２の動きベクトル検出器と、
前記第２の動きベクトル検出器から得られた動きベクトル値を補正して、前記第１の画像メモリに記憶された処理画像データのマクロブロックと、前記第１の画像メモリに記憶された参照画像データとから動きベクトル値を計算する第１のベクトル検出器と、
前記第１のベクトル検出器から得られた動きベクトル値を補正して、前記最上位画像メモリに記憶された処理画像データのマクロブロックと、前記最上位画像メモリに記憶された参照画像データとから動きベクトル値を計算する最上位ベクトル検出器と、
を備えたことを特徴とする動きベクトル検出装置、
を提供するものである。
【００１５】
【発明の実施の形態】
従来の方法においては、サブサンプリング時のプリフィルタ特性を緩やかに減衰する特性に選ぶことで、高域成分を折り返させ高域情報がある程度保存されるようになっている。しかし、基本的には、画像データ中に含まれる周波数成分に係わりなく、単一のフィルタ出力のみでブロックマッチングを行う点に問題がある。
【００１６】
そこで、ブロックマッチングを行うマクロブロック内に含まれる周波数成分に応じて適応的にプリフィルタの特性を選択してサブサンプリングを行えばこの問題は解決する。
【００１７】
例えば、図１１に示すフローチャートの処理を実現すればよい。まず、処理画像データと参照画像データの原画像を最上位階層画像として画像メモリに記憶する（ステップ０１、以下Ｓ０１とする。他も同様）。原画像から水平・垂直低域成分を抽出後、サブサンプリングによりサイズを縮小した画像を第２階層画像データとして画像メモリに記憶する（Ｓ０２）。
【００１８】
２次元周波数空間を複数の領域に分割するフィルタにより各周波数領域の成分を第２階層画像データから抽出し、それぞれの周波数成分を含む画像を第１階層画像データとして画像メモリに記憶する（Ｓ０３）。
【００１９】
第１階層画像データに保存されている異なる周波数成分を保持する複数の処理画像データのマクロブロックから特徴的な周波数成分を持つ帯域を選択する（Ｓ０４）。
【００２０】
特徴的な周波数成分とは、例えば後述する輝度のヒストグラムなどがある。第１階層画像データにおいて選択された周波数領域を含む処理画像データのマクロブロックと選択された周波数領域を含む参照画像データとの間でブロックマッチングにより動きベクトルを探索する（Ｓ０５）。
【００２１】
Ｓ０５で検出された動きベクトルを中心とした探索領域で第２階層画像データを用いたブロックマッチングによりＳ０５の動きベクトルを修正する（Ｓ０６）。さらに、Ｓ０６で検出された動きベクトルを中心としたサーチ領域で最上位階層画像データを用いたブロックマッチングによりＳ０６の動きベクトルを修正する（Ｓ０７）。Ｓ０７で検出された動きベクトルを探索結果として出力する（Ｓ０８）。
【００２２】
本実施手段では、１回目のサブサンプリングでは低域成分を使用し、２回目のサブサンプリングでは適応選択された周波数領域を使用する３層構成を示した。これは、１回目のサブサンプリングに比べて２回目のサブサンプリングの方が画像の特徴を多く含む周波数成分に大きな影響を及ぼすためである。
【００２３】
本発明の実施例の詳細を、図１のシステム構成図と図１１の処理の手順を示したフローチャートを用いて説明する。
【００２４】
図１１のフローチャートのＳ０１の処理に対応して、（水平画素数）７２０×（垂直画素数）４８０画素を有する参照画像データを図１の画像メモリ１０１に、処理画像データを画像メモリ１２１に記憶し最上位階層画像と呼ぶ。
【００２５】
次に図１１のＳ０２に対応して以下の処理により第２階層の画像を得る。図１の画像メモリ１０１と画像メモリ１２１とに記憶されたそれぞれの画像に対して、２次元ローパスフィルタ１０２，１２２により水平・垂直低域周波数成分が抽出される。例えば、２次元ローパスフィルタはタップ係数［１／４，１／２，１／４］の１次元水平ローパスフィルタと１次元垂直ローパスフィルタを従属接続することで実現される。
【００２６】
２次元ローパスフィルタ１０２，１２２の出力に、水平・垂直ともに１画素おきにサブサンプリング処理をサブサンプリング処理回路１０３、１２３によって施す。サブサンプリングの方法として、例えば図１０の黒丸の画素を削除して白丸のみ残す方法を用いて、面積比１／４の画像サイズを作成する。このように、３６０×２４０画素にサイズ縮小された参照画像データと処理画像データとを作成し、それぞれを画像メモリ１０４，１２４に記憶する。この画像を第２階層画像と呼ぶ。
【００２７】
図１１のＳ０３に対応して以下の処理により第１階層の画像を得る。前記第２階層の３６０×２４０画素の画像を図２に示す水平低域・垂直低域ＬＬ、水平低域・垂直高域ＬＨ、水平高域・垂直低域ＨＬ、水平高域・垂直高域ＨＨの４つの領域にプリフィルタ１０５、１２５により分割する。例えば、タップ係数［１／４，１／２，１／４］の１次元水平ローパスフィルタと１次元垂直ローパスフィルタを従属接続することでＬＬ領域が抽出され、前記１次元水平ローパスフィルタの出力からＬＬ領域を引き算することでＬＨ領域が抽出できる。
【００２８】
また、タップ係数［ −１／４，１／２， −１／４］の１次元水平ハイパスフィルタとタップ係数［１／４，１／２，１／４］の１次元垂直ローパスフィルタを従属接続することでＨＬ領域が抽出され、前記１次元水平ハイパスフィルタの出力からＨＬ領域を引き算することでＨＨ領域が抽出できる。
【００２９】
生成された４枚の３６０×２４０画素を有する画像データに水平・垂直とも１画素おきにサブサンプリング処理をサブサンプリング処理回路１０６，１２６によって施し、サイズ縮小した１８０×１２０画素を有する各４枚の画像データを得る。ＬＬ領域の各画像データは画像メモリ１０７，１２７に、ＬＨ領域の各画像データは画像メモリ１０８，１２８に、ＨＬ領域の各画像データは画像メモリ１０９，１２９に、ＨＨ領域の各画像データは画像メモリ１１０，１３０に記憶される。この画像を第１階層画像と呼ぶ。この結果、第１階層には、最上位階層の画像から面積比１／１６にサイズ縮小された参照画像データと処理画像データがそれぞれ４枚記憶されている。
【００３０】
図１１のＳ０４に対応して以下の方法で周波数帯域の選択を行う。本実施例では、７２０×４８０画素の最上位階層画像で１６×１６画素を単位とするマクロブロック（以下、ＭＢと記す）で動きベクトルを検出する場合を考える。第１階層画像１２７〜１３０は、最上位階層画像１２１の面積比１／１６にサイズ縮小されているので、ＭＢも面積比１／１６にサイズ縮小され４×４画素を単位としてブロックマッチングを行い、動きベクトルを検出する。第１階層画像には周波数領域の異なる４枚の画像１２７〜１３０が存在するので、画像中の同じ座標に位置するＭＢも４つ存在する。このＭＢ内の座標（ｘ，ｙ）における輝度レベル（Ｙｉ：ｉ＝ＬＬ，ＬＨ，ＨＬ，ＨＨ）の絶対値の総和（ＭＢｉ：ｉ＝ＬＬ，ＬＨ，ＨＬ，ＨＨ）
【００３１】
【数１】

を計算し、それぞれのＭＢのヒストグラムＭＢＬＬ，ＭＢＬＨ，ＭＢＨＬ，ＭＢＨＨを求める（周波数領域選択器１３１内のヒストグラム演算器で求める）。この際、ＬＬ領域は直流成分を含むため単純に他の領域と比較すると不具合が生じる可能性がある。そこで、ＭＢＬＬのみ
【００３２】
【数２】

【００３３】
【数３】

として直流成分を除去する方法を用いている。
【００３４】
動きベクトル探索を行う周波数領域選択器１３１では、内蔵する掛け算器によりＭＢＬＬ，ＭＢＬＨ，ＭＢＨＬ，ＭＢＨＨにそれぞれ重み係数ＫＬＬ，ＫＬＨ，ＫＨＬ，ＫＨＨを掛け、
【００３５】
【数４】

により最大値をもつＭＢｉをＭｃとして選択後、画像メモリ１３２に記憶する。本実施例では、Ｋｉ＝１とした。Ｍｃが属する画像データと同じ周波数領域の参照画像データを画像メモリ１０７〜１１０から選択し画像メモリ１１１に記憶する。
【００３６】
図１１のＳ０５の処理に対応して、図１の第１階層のＭＢであるＭｃ（１３２）と参照画像データ１１１の探索範囲１１２との間でブロックマッチングにより動きベクトルを検出する。（ブロックマッチング回路１４０により実行する。）
以下で、ブロックマッチングについて説明する。また、説明で用いている図において同一のものを示すものには同一の番号が付けられている。図４で第１階層の選択された処理画像データ４０１のＭｃ（４０２）において左上の点４０３を原点とするとき、Ｍｃ内の画素位置４０４を座標（ｘ，ｙ）で表現しその輝度値をＹｔとする。また、図５で第１階層の選択された参照画像データ５０１において左上の点５０３を原点とするとき、Ｍｃ（４０２）の位置を中心とする探索範囲５０２内の画素位置５０４を座標（ｈ，ｖ）で表現しその輝度値をＹｒとする。
【００３７】
この条件の下で、ブロックマッチングはＹｔとＹｒとの差分の絶対値の総和（Ｈ）
【００３８】
【数５】

を、例えば第１階層のサーチレンジで与えられる水平・垂直ともにＭｃの位置を中心に±４画素の範囲内で、座標（ｈ，ｖ）を１画素ずつ水平・垂直にずらしながら求める。この結果、Ｈを最小にする座標（ｈ，ｖ）と処理画像データのＭｃの座標から第１階層の動きベクトル値が求められる。
【００３９】
この様子を図６で説明する。以下の説明では、原点を参照画像データ５０１の左上の点５０３とする。第１階層の参照画像データ５０１の内部にベクトル探索領域５０２が与えられる。処理画像データのＭｃ（４０２）の左上の点６０２を座標（ｈｍ，ｖｍ）とする。このときＨを最小にする領域６０３の左上の点６０４が座標（ｈｐ，ｖｐ）であったとするとベクトル（ｍｘ，ｍｙ）は、
【００４０】
【数６】

として求められる。
【００４１】
図１１のＳ０６の処理に対応して以下のように第２階層で動きベクトルの修正を行う。第１階層で求めたベクトル値を水平・垂直方向共に２倍して得られる位置を、第２階層でのブロックマッチングを行う中心位置とし、例えば第２階層のサーチレンジとして水平・垂直ともに±２画素で与えられる図１の参照画像データ１０４の探索範囲１１３に対して、処理画像データ１２４のＭＢ（１３３）との間でブロックマッチングによりベクトル値を修正する（ブロックマッチング回路１４１により実行する）。この様子を図７、８、９で説明する。
【００４２】
図７のＨＮ×ＶＮ画素の第Ｎ階層の参照画像データ７０１に対して、第Ｎ階層の処理画像データのＭＢ（７０２）の左上座標が（ａＮ，ｂＮ）であり、動きベクトル７０３が（ｍｘ，ｍｙ）であったとする。このとき、上位階層である図８の第Ｎ＋１階層の参照画像データ８０１の画像サイズは水平・垂直ともに２倍の２ＨＮ×２ＶＮ画素となるので、第Ｎ＋１階層の処理画像データのＭＢ（８０２）のサイズも水平・垂直ともに２倍となり、左上座標も２倍の（２ａＮ，２ｂＮ）、動きベクトル８０３も２倍の（２ｍｘ，２ｍｙ）となる。
【００４３】
この結果、図９で動きベクトル８０３により与えられるＭＢと同じサイズの領域９０１を中心に第Ｎ＋１階層での動きベクトル探索範囲９０２が決定され、第Ｎ＋１階層の動きベクトルが検出される。
【００４４】
さらに、図１１のＳ０７の処理に対応して以下のように最上位階層で動きベクトルの修正を行う。第２階層で求めたベクトル値を水平・垂直方向共に２倍して得られる位置を最上位階層でのブロックマッチングの中心位置として、例えば最上位階層画像のサーチレンジで与えられる水平・垂直ともに±２画素の範囲内において、図１の参照画像データ１０１の探索範囲１１４に対して、処理画像データ１２１のＭＢ（１３４）との間でブロックマッチングによりベクトルが修正される。（ブロックマッチング回路１４２により実行する。）
最後に、図１１のＳ０８の処理として、最上位階層で修正された動きベクトルが最終的な値として出力される。
【００４５】
以上、実施例を説明したが、本発明は階層数を限定するものではなく、また何階層目に本発明の特徴である周波数領域分割によるブロックマッチングを適用するかも限定されるものではない。さらに、周波数空間を分割する領域数や分割するフィルタの特性も限定するものではない。また、サブサンプリングの位相も限定されない。
【００４６】
即ち、時系列的に前後する処理画像データと参照画像データとの間の相関を求める際に、サブサンプリングにより画素を一定間隔で間引かれた画像データを用いる階層的相関検出において、、画像データの２次元周波数空間を複数の領域に分割するフィルタを備え、前記フィルタをサブサンプリングのプリフィルタとして用いて周波数領域を選択した後、サブサンプリングした画像データから相関を求めることを特徴とする動きベクトル検出装置とすればよい。
【００４７】
さらには、この動きベクトル検出装置において、プリフィルタで抽出する２次元周波数領域を選択する際に、相関を求めるマクロブロックに含まれる周波数領域毎の輝度値の絶対値の総和に特定の係数を掛けた値の中から最大値を示す周波数領域を選択するようにすればよい。
【００４８】
なお、上記した動きベクトル検出装置の機能を動きベクトル検出用プログラム（例えば図１１のフローチャートに基づくプログラム）によりコンピュータに実現させるようにしいもよい。この動きベクトル検出用プログラムは、記録媒体から読みとられてコンピュータに取り込まれてもよいし、通信ネットワークを介して伝送されてコンピュータに取り込まれてもよい。
【００４９】
【発明の効果】
以上説明したように、本発明の動きベクトル検出装置は、ブロックマッチングの演算量を増加させることなく、従来技術では除去されていた高域周波数成分を適応的に利用して動きベクトルを検出できるため、検出精度が向上する。（本発明においては、従来技術に比べてフィルタ処理とサブサンプリング処理の演算量が増加するものの、フィルタ処理とサブサンプリング処理の演算量は、ブロックマッチング演算の演算量に比べて非常に小さいものである。よって、ブロックマッチング演算の演算量増加に比べて、フィルタ処理とサブサンプリング処理の演算量増加は、動きベクトル検出の全体的な演算量増加に対する影響が非常に少なく、ブロックマッチングの演算量を増加させないことの効果は非常に大きいものである。）
【図面の簡単な説明】
【図１】本発明の実施形態に係わる動きベクトル検出装置のシステム構成図である。
【図２】２次元画像周波数空間の領域分割を表す図である。
【図３】階層動きベクトル検出装置の従来例のシステム構成図である。
【図４】第１階層における処理画像データの様子を示す図である。
【図５】第１階層における参照画像データの様子を示す図である。
【図６】第１階層における動きベクトル検出の様子を示す図である。
【図７】第Ｎ階層における動きベクトルを示す図である。
【図８】第Ｎ＋１階層における動きベクトルを示す図である。
【図９】第Ｎ＋１階層における動きベクトル検出の探索範囲を示す図である。
【図１０】画素をサブサンプリングする方法の例を示した図である。
【図１１】本発明の実施形態における動作説明用のフローチャートである。
【符号の説明】
１０２，１２２２次元ローパスフィルタ
１０３，１２３サブサンプリング処理回路
１０４，１２４画像メモリ
１０５，１２５プリフィルタ
１０６，１２６サブサンプリング処理回路
１０７〜１１０、１２７〜１３０画像メモリ
１３１周波数領域選択器
１４０〜１４２ブロックマッチング回路

[0001]
[Technical Field of the Invention]
The present invention relates to a technique that, in high-efficiency image coding, reduces the amount of computation spent on detecting image motion vectors by hierarchically using image data of different resolutions produced by sub-sampling. It is used for image compression encoding in predictive image coding. The present invention provides an apparatus that can raise motion vector detection accuracy without increasing the amount of computation for block matching.
[0002]
[Prior art]
A hierarchical search for a motion vector according to the related art will be described with reference to FIG. 3. In this description, three layers including the highest layer are used for convenience, but the number of layers can be set arbitrarily. In FIG. 3, reference image data is stored in the image memory 301 and processed image data is stored in the image memory 311; these are referred to as the highest-layer images.
[0003]
Horizontal and vertical low-frequency components are extracted from the image data in the image memories 301 and 311 by the two-dimensional low-pass filters 302 and 312. For example, a two-dimensional low-pass filter is realized by cascading a one-dimensional horizontal low-pass filter and a one-dimensional vertical low-pass filter, each with tap coefficients [1/2, 1/2]. The sub-sampling processing circuits 303 and 313 then sub-sample the low-frequency images extracted by the two-dimensional low-pass filters 302 and 312, thinning out every other pixel in the horizontal and vertical directions. The images reduced to an area ratio of 1/4 by sub-sampling are stored in the image memories 304 and 314 and are referred to as second-layer images.
[0004]
From the second-layer images, horizontal and vertical low-frequency components are further extracted by the two-dimensional low-pass filters 305 and 315, and sub-sampling is performed by the sub-sampling processing circuits 306 and 316. The images further reduced to an area ratio of 1/4 by sub-sampling are stored in the image memories 307 and 317 and are referred to as first-layer images.
[0005]
In this way, images of different resolutions are created hierarchically. Using these hierarchical images, a motion vector is first detected in the lowest-resolution image, and the detected motion vector is then corrected repeatedly while stepping back up through the higher-resolution images.
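The layer construction described in [0003] to [0005] can be sketched as follows. This is a minimal illustration in plain Python, not the circuit of FIG. 3: the names `lowpass_half`, `subsample`, and `build_pyramid` are hypothetical, and repeating the last pixel at the image border is an assumption, since the text does not say how borders are handled.

```python
from typing import List

Image = List[List[float]]


def lowpass_half(img: Image) -> Image:
    """Separable [1/2, 1/2] low-pass: horizontal pass, then vertical pass.

    Border handling (repeat the last pixel) is an assumption.
    """
    h, w = len(img), len(img[0])
    hor = [[(row[x] + row[min(x + 1, w - 1)]) / 2 for x in range(w)] for row in img]
    return [[(hor[y][x] + hor[min(y + 1, h - 1)][x]) / 2 for x in range(w)]
            for y in range(h)]


def subsample(img: Image) -> Image:
    """Keep every other pixel horizontally and vertically (area ratio 1/4)."""
    return [row[::2] for row in img[::2]]


def build_pyramid(top: Image, levels: int = 3) -> List[Image]:
    """Return [highest layer, second layer, first layer] when levels is 3."""
    pyramid = [top]
    for _ in range(levels - 1):
        pyramid.append(subsample(lowpass_half(pyramid[-1])))
    return pyramid
```

With `levels=3`, the last element of the returned list corresponds to the first-layer image, the one searched over a wide range.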
[0006]
In FIG. 3, first, for the search range 321 of the reference image data 307 having the smallest number of pixels (image data stored in an image memory is referred to by the reference numeral of that image memory; the same applies below), a vector is searched over a wide range by block matching with the macroblock 331 of the processed image data 317 having the smallest number of pixels (executed by the block matching circuit 308).
[0007]
Next, in the reference image data 304 at the next higher resolution, the center position of the search is determined by correcting the motion vector value obtained by the block matching circuit 308. For the search range 322, which covers that center position and its neighborhood, the motion vector value obtained by the block matching circuit 308 is corrected by block matching with the macroblock 332 of the processed image data 314 (executed by the block matching circuit 309).
By repeating this process, block matching is finally performed between the search range 323 of the highest-layer reference image data 301 and the macroblock 333 of the highest-layer processed image data 311 (executed by the block matching circuit 310), and the motion vector value is determined and output.
[0008]
With this technique, a wide-range search is performed only in the first layer, which has few pixels, and only neighborhood searches are performed in the other layers. Vector detection is therefore achieved with less computation than performing wide-range block matching on the highest-layer image alone, which has many pixels. (For details, refer to JP-A-11-289545.)
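To give a rough sense of the saving, the following sketch counts the pixel differences needed per macroblock. The figures are illustrative assumptions, not values stated for the prior art: a 16 × 16 macroblock at the highest layer and search ranges of ±4, ±2, and ±2 pixels from the lowest to the highest layer. The helper `sad_ops` is hypothetical.

```python
def sad_ops(block: int, search: int) -> int:
    """Pixel differences for a full search of +/-search pixels with a block x block MB."""
    candidates = (2 * search + 1) ** 2
    return candidates * block * block


# (block size, search range) per layer, lowest resolution first (assumed values).
layers = [(4, 4), (8, 2), (16, 2)]
hierarchical_ops = sum(sad_ops(block, search) for block, search in layers)

# Equivalent reach at full resolution: each lower-layer pixel doubles per layer.
reach = 4 * 4 + 2 * 2 + 2          # 22 pixels
full_search_ops = sad_ops(16, reach)

print(hierarchical_ops, full_search_ops)   # 9296 518400
```

Under these assumptions the hierarchical search needs roughly 1/56 of the pixel differences of a single-layer search with the same reach.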
[0009]
[Patent Document 1]
JP-A-11-289545
[0010]
[Problems to be solved by the invention]
In hierarchical motion vector search, increasing the number of layers reduces the amount of computation and enables motion vector detection over a wide area. Each layer image is generated by extracting low-frequency components with a low-pass filter and sub-sampling the pixels to produce a reduced image. As a result, the more layers there are, the more high-frequency components are removed and the harder it becomes to distinguish signal waveforms because of aliasing components, so accurate block matching cannot be performed.
[0011]
To solve this problem, JP-A-11-289545 mentioned above proposes a method in which the search is started not only from the lowest layer but also from an intermediate layer, and the motion vectors obtained from the different starting layers are used selectively.
[0012]
However, in the proposed method, since most of the calculation amount is occupied by the block matching calculation, the detection of a plurality of motion vectors causes an increase in the calculation amount, and a further reduction in the calculation amount has been required.
[0013]
To solve the above problem, an object of the present invention is to provide a motion vector detection device that can improve motion vector detection accuracy without increasing the amount of computation for block matching.
[0014]
[Means for Solving the Problems]
Therefore, in order to solve the above problems, the present invention provides a motion vector detection device comprising:
a top-level image memory that stores processed image data and reference image data that are adjacent in time sequence;
a first pre-filter that extracts a specific frequency region from each of the image data;
a first sub-sampler that evenly thins out the pixels of the image output from the first pre-filter;
a first image memory that stores the reduced-pixel-count image output from the first sub-sampler;
a second pre-filter that divides the two-dimensional frequency space of the reduced-pixel-count image into a plurality of regions and outputs the component of each region;
a second sub-sampler that evenly thins out the pixels of the plurality of images of different frequency regions output from the second pre-filter;
a second image memory that stores the plurality of reduced-pixel-count images output from the second sub-sampler;
a histogram calculator that, for each of the processed image data of different frequency regions stored in the second image memory, calculates the sum of the absolute values of the pixel values contained in a macroblock whose size is proportional to the number of pixels of that processed image data;
a multiplier that multiplies each histogram value output from the histogram calculator by a predetermined constant;
a second motion vector detector that has a function of finding, for the macroblocks at the same position in the plurality of processed image data of different frequency regions, the maximum among the histogram values multiplied by the predetermined constant by the multiplier and selecting the frequency region of the processed image data that yielded that maximum, and that calculates a motion vector value by block matching between the macroblock of the processed image data extracted from the selected frequency region and the reference image data, stored in the second image memory, extracted from the same frequency region;
a first vector detector that corrects the motion vector value obtained from the second motion vector detector and calculates a motion vector value from the macroblock of the processed image data stored in the first image memory and the reference image data stored in the first image memory; and
a top-level vector detector that corrects the motion vector value obtained from the first vector detector and calculates a motion vector value from the macroblock of the processed image data stored in the top-level image memory and the reference image data stored in the top-level image memory.
[0015]
[Embodiments of the Invention]
In the conventional method, the pre-filter used for sub-sampling is chosen to have a gently attenuating characteristic, so that high-frequency components alias and high-frequency information is preserved to some extent. Fundamentally, however, the problem is that block matching is performed using only a single filter output, regardless of the frequency components contained in the image data.
[0016]
Thus, this problem can be solved by adaptively selecting the characteristics of the prefilter and performing subsampling according to the frequency components included in the macroblock to be subjected to block matching.
[0017]
For example, the processing of the flowchart shown in FIG. 11 may be realized. First, the original images of the processed image data and the reference image data are stored in the image memory as the highest hierarchical image (Step 01, hereinafter referred to as S01, and so on). After extracting the horizontal and vertical low-frequency components from the original image, the image reduced in size by sub-sampling is stored in the image memory as second-layer image data (S02).
[0018]
The components of each frequency region are extracted from the second-layer image data by a filter that divides the two-dimensional frequency space into a plurality of regions, and the images containing the respective frequency components are stored in the image memory as first-layer image data (S03).
[0019]
A band having a characteristic frequency component is selected from a plurality of macroblocks of processed image data holding different frequency components stored in the first hierarchical image data (S04).
[0020]
The characteristic frequency component includes, for example, a luminance histogram described later. A motion vector is searched by block matching between the macroblock of the processed image data including the selected frequency domain in the first hierarchical image data and the reference image data including the selected frequency domain (S05).
[0021]
The motion vector of S05 is corrected by block matching using the second layer image data in the search area centered on the motion vector detected in S05 (S06). Further, the motion vector of S06 is corrected by block matching using the highest hierarchical image data in the search area centered on the motion vector detected in S06 (S07). The motion vector detected in S07 is output as a search result (S08).
[0022]
In the present embodiment, a three-layer configuration using a low-frequency component in the first sub-sampling and using an adaptively selected frequency domain in the second sub-sampling is shown. This is because the second sub-sampling has a greater effect on frequency components including many image features than the first sub-sampling.
[0023]
An embodiment of the present invention will be described in detail with reference to the system configuration diagram of FIG. 1 and the flowchart of FIG. 11, which shows the processing procedure.
[0024]
Corresponding to the processing of S01 in the flowchart of FIG. 11, reference image data of 720 (horizontal pixels) × 480 (vertical pixels) is stored in the image memory 101 of FIG. 1 and processed image data is stored in the image memory 121; these are referred to as the highest-layer images.
[0025]
Next, corresponding to S02 in FIG. 11, the second-layer images are obtained by the following processing. Horizontal and vertical low-frequency components are extracted from the images stored in the image memory 101 and the image memory 121 in FIG. 1 by the two-dimensional low-pass filters 102 and 122. For example, the two-dimensional low-pass filter is realized by cascading a one-dimensional horizontal low-pass filter and a one-dimensional vertical low-pass filter with tap coefficients [1/4, 1/2, 1/4].
[0026]
The sub-sampling processing circuits 103 and 123 sub-sample the outputs of the two-dimensional low-pass filters 102 and 122 at every other pixel both horizontally and vertically. As the sub-sampling method, for example, the pixels shown as black circles in FIG. 10 are deleted and only those shown as white circles are kept, producing an image with an area ratio of 1/4. Reference image data and processed image data reduced to 360 × 240 pixels are created in this way and stored in the image memories 104 and 124, respectively. These images are referred to as second-layer images.
[0027]
Corresponding to S03 in FIG. 11, the first-layer images are obtained by the following processing. The pre-filters 105 and 125 divide the 360 × 240 pixel second-layer image into the four regions shown in FIG. 2: horizontal low/vertical low (LL), horizontal low/vertical high (LH), horizontal high/vertical low (HL), and horizontal high/vertical high (HH). For example, the LL region is extracted by cascading a one-dimensional horizontal low-pass filter and a one-dimensional vertical low-pass filter with tap coefficients [1/4, 1/2, 1/4], and the LH region can be extracted by subtracting the LL region from the output of the one-dimensional horizontal low-pass filter.
[0028]
Similarly, the HL region is extracted by cascading a one-dimensional horizontal high-pass filter with tap coefficients [−1/4, 1/2, −1/4] and a one-dimensional vertical low-pass filter with tap coefficients [1/4, 1/2, 1/4], and the HH region can be extracted by subtracting the HL region from the output of the one-dimensional horizontal high-pass filter.
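The four-band split of [0027] and [0028] can be sketched as follows, using the tap coefficients given in the text. The function names are hypothetical, and repeating the edge pixel at the image border is an assumption, since the text does not specify border handling.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]
Taps = Tuple[float, float, float]

LOW: Taps = (0.25, 0.5, 0.25)      # [1/4, 1/2, 1/4]
HIGH: Taps = (-0.25, 0.5, -0.25)   # [-1/4, 1/2, -1/4]


def filter_rows(img: Image, taps: Taps) -> Image:
    """Apply a 3-tap filter along each row; edge pixels are repeated (assumption)."""
    w = len(img[0])
    return [[taps[0] * row[max(x - 1, 0)] + taps[1] * row[x]
             + taps[2] * row[min(x + 1, w - 1)] for x in range(w)] for row in img]


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def filter_cols(img: Image, taps: Taps) -> Image:
    """Apply the same 3-tap filter along each column."""
    return transpose(filter_rows(transpose(img), taps))


def subtract(a: Image, b: Image) -> Image:
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_bands(img: Image) -> Dict[str, Image]:
    """Return the LL, LH, HL, and HH images before sub-sampling."""
    h_low = filter_rows(img, LOW)      # horizontal low-pass output
    h_high = filter_rows(img, HIGH)    # horizontal high-pass output
    ll = filter_cols(h_low, LOW)
    lh = subtract(h_low, ll)           # horizontal low, vertical high
    hl = filter_cols(h_high, LOW)
    hh = subtract(h_high, hl)          # horizontal high, vertical high
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}
```

Because the two horizontal filters sum to the identity, the four bands add back up to the input image, so no information is lost before sub-sampling.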
[0029]
The sub-sampling processing circuits 106 and 126 sub-sample the four generated 360 × 240 pixel images at every other pixel both horizontally and vertically, yielding four reduced images of 180 × 120 pixels each. The LL-region image data is stored in the image memories 107 and 127, the LH-region image data in the image memories 108 and 128, the HL-region image data in the image memories 109 and 129, and the HH-region image data in the image memories 110 and 130. These images are referred to as first-layer images. As a result, the first layer holds four pieces of reference image data and four pieces of processed image data, each reduced to an area ratio of 1/16 of the highest-layer image.
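The layer dimensions follow directly from halving the width and height at each sub-sampling step. The sketch below tabulates them together with the macroblock size per layer, taking the 16 × 16 macroblock that the embodiment uses at the highest layer; the helper name `layer_sizes` is hypothetical.

```python
from typing import List, Tuple


def layer_sizes(width: int = 720, height: int = 480,
                mb: int = 16, layers: int = 3) -> List[Tuple[int, int, int]]:
    """Return (width, height, macroblock size) from the highest layer down."""
    sizes = []
    for _ in range(layers):
        sizes.append((width, height, mb))
        width, height, mb = width // 2, height // 2, mb // 2
    return sizes


for w, h, mb in layer_sizes():
    print(f"{w} x {h} image, {mb} x {mb} MB, {(w // mb) * (h // mb)} MBs")
```

The number of macroblocks per image is the same in every layer, because the image and the macroblock shrink by the same factor.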
[0030]
Corresponding to S04 in FIG. 11, the frequency band is selected by the following method. This embodiment considers the case where motion vectors are detected in units of 16 × 16 pixel macroblocks (hereinafter, MB) in the 720 × 480 pixel highest-layer image. Because the first-layer images 127 to 130 are reduced to an area ratio of 1/16 of the highest-layer image 121, the MB is also reduced to an area ratio of 1/16, and block matching is performed in units of 4 × 4 pixels to detect the motion vector. The first layer contains four images 127 to 130 of different frequency regions, so there are also four MBs located at the same coordinates in the image. The sum MBi (i = LL, LH, HL, HH) of the absolute values of the luminance levels Yi (i = LL, LH, HL, HH) at coordinates (x, y) within these MBs,
[0031]
(Equation 1)
$$MB_i = \sum_{x}\sum_{y} \left| Y_i(x, y) \right|$$
is calculated to obtain the histograms MBLL, MBLH, MBHL, and MBHH of the respective MBs (this is done by the histogram calculator in the frequency domain selector 131). Because the LL region includes a DC component, simply comparing it with the other regions may cause problems. Therefore, for MBLL only,
[0032]
(Equation 2)

[0033]
(Equation 3)

a method that removes the DC component is used.
[0034]
In the frequency domain selector 131, which selects the frequency region for the motion vector search, a built-in multiplier multiplies MBLL, MBLH, MBHL, and MBHH by the weighting factors KLL, KLH, KHL, and KHH, respectively, and the MBi giving the maximum value according to
[0035]
(Equation 4)
$$\max\left(K_{LL} MB_{LL},\; K_{LH} MB_{LH},\; K_{HL} MB_{HL},\; K_{HH} MB_{HH}\right)$$
is selected as Mc and stored in the image memory 132. In this embodiment, Ki = 1. The reference image data of the same frequency region as the image data to which Mc belongs is selected from the image memories 107 to 110 and stored in the image memory 111.
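The band selection of S04 can be sketched as follows. The text states only that the DC component is removed from the LL block; subtracting the block mean before summing is an assumption made here for illustration, and the function names are hypothetical. Weights default to 1, matching Ki = 1 in the embodiment.

```python
from typing import Dict, List, Optional, Tuple

Block = List[List[float]]


def abs_sum(block: Block) -> float:
    """Sum of absolute pixel values in a block (the histogram value MBi)."""
    return sum(abs(p) for row in block for p in row)


def abs_sum_without_dc(block: Block) -> float:
    """Assumed DC removal for the LL block: subtract the block mean first."""
    count = sum(len(row) for row in block)
    mean = sum(p for row in block for p in row) / count
    return sum(abs(p - mean) for row in block for p in row)


def select_band(blocks: Dict[str, Block],
                weights: Optional[Dict[str, float]] = None) -> Tuple[str, float]:
    """Return the band name with the largest weighted histogram value, and that value."""
    weights = weights or {}
    scores = {}
    for name, block in blocks.items():
        value = abs_sum_without_dc(block) if name == "LL" else abs_sum(block)
        scores[name] = weights.get(name, 1.0) * value
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The returned band name identifies both the processed-image macroblock Mc and the reference image of the same frequency region to be used for block matching.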
[0036]
Corresponding to the processing of S05 in FIG. 11, a motion vector is detected by block matching between Mc (132), which is the first-layer MB in FIG. 1, and the search range 112 of the reference image data 111 (executed by the block matching circuit 140).
Block matching is described below. In the drawings used in the description, identical items are given identical reference numerals. In FIG. 4, taking the upper-left point 403 of Mc (402) in the selected first-layer processed image data 401 as the origin, a pixel position 404 within Mc is expressed by coordinates (x, y), and its luminance value is denoted Yt. In FIG. 5, taking the upper-left point 503 of the selected first-layer reference image data 501 as the origin, a pixel position 504 within the search range 502 centered on the position of Mc (402) is expressed by coordinates (h, v), and its luminance value is denoted Yr.
[0037]
Under these conditions, block matching computes the sum (H) of the absolute values of the differences between Yt and Yr,
[0038]
(Equation 5)
$$H = \sum_{x}\sum_{y} \left| Y_t(x, y) - Y_r(h + x, v + y) \right|$$
while shifting the coordinates (h, v) one pixel at a time horizontally and vertically within, for example, a range of ±4 pixels both horizontally and vertically around the position of Mc, as given by the first-layer search range. The first-layer motion vector value is then obtained from the coordinates (h, v) that minimize H and the coordinates of Mc in the processed image data.
[0039]
This is illustrated in FIG. 6. In the following description, the origin is the upper-left point 503 of the reference image data 501. A vector search area 502 is set inside the first-layer reference image data 501. Let the upper-left point 602 of Mc (402) of the processed image data have coordinates (hm, vm). If the upper-left point 604 of the area 603 that minimizes H has coordinates (hp, vp), the vector (mx, my) is obtained as
[0040]
(Equation 6)
$$(mx, my) = (hp - hm,\; vp - vm)$$
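The first-layer search of S05 can be sketched as an exhaustive search that minimizes the sum of absolute differences H. The names `sad` and `full_search` are hypothetical. Two points are assumptions: candidates that would fall outside the reference image are skipped, and the vector is taken as pointing from the macroblock position (hm, vm) to the best match (hp, vp).

```python
from typing import List, Tuple

Image = List[List[float]]


def sad(target: Image, ref: Image, tx: int, ty: int,
        rx: int, ry: int, size: int) -> float:
    """Sum of absolute differences between the block at (tx, ty) in target
    and the block at (rx, ry) in ref."""
    return sum(abs(target[ty + y][tx + x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def full_search(target: Image, ref: Image, hm: int, vm: int,
                size: int = 4, search: int = 4) -> Tuple[Tuple[int, int], float]:
    """Return ((mx, my), minimum H) for the block whose upper-left corner is (hm, vm)."""
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = (0, 0), float("inf")
    for dv in range(-search, search + 1):
        for dh in range(-search, search + 1):
            hp, vp = hm + dh, vm + dv
            # Skip candidates outside the reference image (an assumption).
            if hp < 0 or vp < 0 or hp + size > width or vp + size > height:
                continue
            cost = sad(target, ref, hm, vm, hp, vp, size)
            if cost < best_cost:
                best_vec, best_cost = (hp - hm, vp - vm), cost
    return best_vec, best_cost
```

With the default `size=4` and `search=4`, this matches the 4 × 4 macroblock and the ±4 pixel range given for the first layer, evaluating at most 81 candidate positions.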
[0041]
Corresponding to the processing of S06 in FIG. 11, the motion vector is corrected in the second layer as follows. The position obtained by doubling the vector value found in the first layer in both the horizontal and vertical directions is taken as the center position for block matching in the second layer. For the search range 113 of the reference image data 104 in FIG. 1, given for example as a second-layer search range of ±2 pixels both horizontally and vertically, the vector value is corrected by block matching with the MB (133) of the processed image data 124 (executed by the block matching circuit 141). This is illustrated in FIGS. 7, 8, and 9.
[0042]
For the Nth-layer reference image data 701 of HN × VN pixels in FIG. 7, suppose the upper-left coordinates of the MB (702) of the Nth-layer processed image data are (aN, bN) and the motion vector 703 is (mx, my). The image size of the (N+1)th-layer reference image data 801 in FIG. 8, the next higher layer, is doubled both horizontally and vertically to 2HN × 2VN pixels, so the size of the MB (802) of the (N+1)th-layer processed image data is also doubled horizontally and vertically, its upper-left coordinates are doubled to (2aN, 2bN), and the motion vector 803 is doubled to (2mx, 2my).
[0043]
As a result, in FIG. 9, the motion vector search range 902 in the (N+1)th layer is determined around the area 901, which has the same size as the MB and is located by the motion vector 803, and the (N+1)th-layer motion vector is detected.
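The layer-to-layer correction of FIGS. 7 to 9 can be sketched as follows: the Nth-layer macroblock position and vector are doubled, and a small search around the doubled vector refines it in the (N+1)th layer. The name `refine_vector` is hypothetical, and skipping candidates outside the image is an assumption.

```python
from typing import List, Tuple

Image = List[List[float]]


def sad(target: Image, ref: Image, tx: int, ty: int,
        rx: int, ry: int, size: int) -> float:
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(target[ty + y][tx + x] - ref[ry + y][rx + x])
               for y in range(size) for x in range(size))


def refine_vector(target: Image, ref: Image, a_n: int, b_n: int,
                  vec_n: Tuple[int, int], size: int,
                  search: int = 2) -> Tuple[Tuple[int, int], float]:
    """Refine an Nth-layer vector in the (N+1)th layer.

    target, ref: (N+1)th-layer images; (a_n, b_n): MB upper-left corner in layer N;
    vec_n: vector found in layer N; size: MB size in layer N+1.
    """
    a, b = 2 * a_n, 2 * b_n                 # MB position doubles
    cx, cy = 2 * vec_n[0], 2 * vec_n[1]     # doubled vector is the search center
    height, width = len(ref), len(ref[0])
    best_vec, best_cost = (cx, cy), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            mx, my = cx + dx, cy + dy
            rx, ry = a + mx, b + my
            # Skip candidates outside the reference image (an assumption).
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(target, ref, a, b, rx, ry, size)
            if cost < best_cost:
                best_vec, best_cost = (mx, my), cost
    return best_vec, best_cost
```

Calling this once with the second-layer images and once more with the highest-layer images mirrors steps S06 and S07, each with the ±2 pixel range the embodiment gives.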
[0044]
Further, corresponding to the processing of S07 in FIG. 11, the motion vector is corrected in the highest layer as follows. The position obtained by doubling the vector value found in the second layer in both the horizontal and vertical directions is taken as the center position for block matching in the highest layer. Within the search range of the highest-layer image, for example two pixels in both the horizontal and vertical directions, the vector is corrected by block matching between the search range 114 of the reference image data 101 in FIG. 1 and the MB (134) of the processed image data 121 (executed by the block matching circuit 142).
Finally, as the processing of S08 in FIG. 11, the motion vector corrected in the highest layer is output as the final value.
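The way the vector propagates through the three layers can be written out as a short worked relation. The symbols are introduced here for illustration only and do not appear in the specification: v1 is the vector found in the first layer, v2 and v3 are the vectors after correction in the second and highest layers, and δ2, δ3 are the corrections found by block matching, each component assumed to lie within ±2 pixels as in the example search range.

```latex
\begin{aligned}
\mathbf{v}_2 &= 2\,\mathbf{v}_1 + \boldsymbol{\delta}_2, \\
\mathbf{v}_3 &= 2\,\mathbf{v}_2 + \boldsymbol{\delta}_3
             = 4\,\mathbf{v}_1 + 2\,\boldsymbol{\delta}_2 + \boldsymbol{\delta}_3, \\[4pt]
\text{e.g. } \mathbf{v}_1 &= (3,-1),\quad \boldsymbol{\delta}_2 = (1,0),\quad \boldsymbol{\delta}_3 = (-2,1)
\;\Rightarrow\; \mathbf{v}_2 = (7,-2),\quad \mathbf{v}_3 = (12,-3).
\end{aligned}
```

Under these assumptions the final vector can differ from 4·v1 by at most 2·2 + 2 = 6 pixels per component, so an error made in the first layer that exceeds this margin cannot be recovered by the later corrections.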
[0045]
Although an embodiment has been described above, the present invention limits neither the number of layers nor the number of layers to which the block matching based on frequency-region division, which is the feature of the present invention, is applied. Likewise, neither the number of regions into which the frequency space is divided nor the characteristics of the dividing filters are limited, and the phase of the sub-sampling is not limited either.
[0046]
That is, the invention covers any motion vector detecting device of the following kind. The device calculates the correlation between processed image data and reference image data that are successive in time series by hierarchical correlation detection, using image data whose pixels are thinned out at regular intervals by sub-sampling. It has a filter that divides the two-dimensional frequency space of the image data into a plurality of regions, uses that filter as a pre-filter for the sub-sampling to select a frequency region, and obtains the correlation from the sub-sampled image data.
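As an illustration of a pre-filter that divides the two-dimensional frequency space before sub-sampling, the following sketch splits an image into four bands and halves each dimension. The filter choice is an assumption: it combines the two-tap low-pass [1/2, 1/2] mentioned for the conventional example with a two-tap high-pass [1/2, -1/2] that is not taken from the specification, and the band names and the function name `split_and_subsample` are hypothetical.

```python
def split_and_subsample(image):
    """Split an image into four frequency bands and subsample each by 2.

    Returns a dict with keys 'LL', 'HL', 'LH', 'HH' (horizontal band first),
    each a list of rows with half the width and half the height of `image`.
    Odd trailing rows or columns are ignored.
    """
    rows, cols = len(image) // 2, len(image[0]) // 2
    bands = {name: [[0.0] * cols for _ in range(rows)]
             for name in ("LL", "HL", "LH", "HH")}
    for y in range(rows):
        for x in range(cols):
            a = image[2 * y][2 * x]          # top-left
            b = image[2 * y][2 * x + 1]      # top-right
            c = image[2 * y + 1][2 * x]      # bottom-left
            d = image[2 * y + 1][2 * x + 1]  # bottom-right
            # Separable 2-tap filtering followed by keeping every second sample.
            bands["LL"][y][x] = (a + b + c + d) / 4  # low horizontal, low vertical
            bands["HL"][y][x] = (a - b + c - d) / 4  # high horizontal, low vertical
            bands["LH"][y][x] = (a + b - c - d) / 4  # low horizontal, high vertical
            bands["HH"][y][x] = (a - b - c + d) / 4  # high horizontal, high vertical
    return bands
```

For an image of vertical stripes such as `[[0, 8, 0, 8], [0, 8, 0, 8]]`, the energy lands in the LL and HL bands while LH and HH are zero, which is the kind of high-frequency detail a low-pass-only hierarchy would discard.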
[0047]
Further, in this motion vector detecting device, the two-dimensional frequency region to be extracted by the pre-filter may be selected as follows: for each frequency region, the sum of the absolute values of the luminance values contained in the macroblock for which the correlation is to be obtained is multiplied by a specific coefficient, and the frequency region giving the maximum of the resulting values is selected.
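The selection rule in the preceding paragraph can be sketched as below. The function name `select_frequency_region`, the dictionary representation of the band images, and the use of one coefficient per band are assumptions for illustration; the specification only states that a specific coefficient is applied to each sum before the maximum is taken.

```python
def select_frequency_region(bands, mb_x, mb_y, size, weights):
    """Return the name of the band whose weighted sum of absolute pixel
    values inside the macroblock at (mb_x, mb_y) is largest.

    `bands` maps a band name to its image (list of rows); `weights` maps
    the same names to the coefficient applied to that band's sum.
    On a tie, the band that appears first in `bands` is kept.
    """
    best_name, best_score = None, None
    for name, image in bands.items():
        total = 0.0
        for y in range(mb_y, mb_y + size):
            for x in range(mb_x, mb_x + size):
                total += abs(image[y][x])
        score = weights[name] * total
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name
```

The coefficients let the low-frequency band, which usually carries most of the energy, be weighted down so that a band with distinctive high-frequency content can win the comparison; how they should be tuned is not specified in the text.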
[0048]
Note that the functions of the motion vector detection device described above may be realized by a computer using a motion vector detection program (for example, a program based on the flowchart in FIG. 11). This program for detecting a motion vector may be read from a recording medium and taken into a computer, or may be transmitted via a communication network and taken into a computer.
[0049]
[Effect of the Invention]
As described above, the motion vector detecting device of the present invention detects motion vectors by adaptively using the high-frequency components that were discarded in the related art, and thereby improves detection accuracy without increasing the amount of calculation for block matching. (Compared with the conventional technology, the present invention does increase the amount of calculation for the filtering and sub-sampling processes. That amount is, however, very small compared with the amount of calculation for block matching, so its effect on the overall amount of calculation for motion vector detection is very small, whereas the benefit of not increasing the block matching calculation is very large.)
[Brief description of the drawings]
FIG. 1 is a system configuration diagram of a motion vector detection device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating area division in a two-dimensional image frequency space.
FIG. 3 is a system configuration diagram of a conventional example of a hierarchical motion vector detection device.
FIG. 4 is a diagram showing a state of processed image data in a first hierarchy.
FIG. 5 is a diagram illustrating a state of reference image data in a first hierarchy.
FIG. 6 is a diagram showing a state of motion vector detection in a first hierarchy.
FIG. 7 is a diagram illustrating a motion vector in an N-th layer.
FIG. 8 is a diagram showing a motion vector in the (N+1)-th layer.
FIG. 9 is a diagram illustrating a search range of motion vector detection in the (N+1)-th layer.
FIG. 10 is a diagram illustrating an example of a method of sub-sampling pixels.
FIG. 11 is a flowchart for explaining an operation in the embodiment of the present invention.
[Explanation of symbols]
102, 122 Two-dimensional low-pass filter
103, 123 Sub-sampling processing circuit
104, 124 Image memory
105, 125 Pre-filter
106, 126 Sub-sampling processing circuit
107-110, 127-130 Image memory
131 Frequency domain selector
140-142 Block matching circuit

Claims

A motion vector detecting device comprising:
a top-level image memory that stores processed image data and reference image data that are adjacent in time series;
a first pre-filter that extracts a specific frequency region from each of the image data;
a first sub-sampler that uniformly thins out the pixels of the image output from the first pre-filter;
a first image memory that stores the image with a reduced number of pixels output from the first sub-sampler;
a second pre-filter that divides the two-dimensional frequency space of the image with the reduced number of pixels into a plurality of regions and outputs the respective components;
a second sub-sampler that uniformly thins out the pixels of the plurality of images of different frequency regions output from the second pre-filter;
a second image memory that stores the plurality of images with a reduced number of pixels output from the second sub-sampler;
a histogram calculator that, for each of the processed image data of different frequency regions stored in the second image memory, obtains the sum of the absolute values of the pixel values contained in a macroblock whose size is proportional to the number of pixels of that processed image data;
a multiplier that multiplies each histogram value output from the histogram calculator by a predetermined constant;
a second motion vector detector that has a function of finding, for the macroblocks at the same position in the plurality of processed image data of different frequency regions, the maximum of the histogram values multiplied by the predetermined constant by the multiplier and selecting the frequency region of the processed image data that yielded that maximum histogram value, and that calculates a motion vector value by block matching between the macroblock of the processed image data extracted from the selected frequency region and the reference image data stored in the second image memory that was extracted from the same frequency region;
a first vector detector that corrects the motion vector value obtained from the second motion vector detector and calculates a motion vector value from the macroblock of the processed image data stored in the first image memory and the reference image data stored in the first image memory; and
a top-level vector detector that corrects the motion vector value obtained from the first vector detector and calculates a motion vector value from the macroblock of the processed image data stored in the top-level image memory and the reference image data stored in the top-level image memory.