JP5227439B2


Info

Publication number: JP5227439B2
Application number: JP2011091222A
Authority: JP
Inventors: Manuel Rafael Gutiérrez Novelo
Original assignee: TDVision Corporation S.A. de C.V.
Priority date: 2011-04-15
Filing date: 2011-04-15
Publication date: 2013-07-03
Anticipated expiration: 2024-02-27
Also published as: JP2011223588A

Description

The present invention relates to stereoscopic video image display on the 3DVisor™ device and, in particular, to a video image coding method based on a digital data compression system that allows three-dimensional information to be stored using standardized compression techniques.

Data compression techniques are currently used to reduce the number of bits needed to represent an image or a series of images. The standardization work was carried out by a group of experts of the International Organization for Standardization. These methods are now commonly referred to as JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group).

A common feature of these techniques is that image blocks are processed by applying a suitable transform to each block, usually the discrete cosine transform (DCT). The resulting blocks are submitted to a quantization process and then coded with a variable-length code.

Variable-length coding is a reversible process: it allows exact reconstruction of whatever was coded with the variable-length code.

A digital video display comprises a number of image frames (30 to 96 fps) that are displayed or presented in succession at a frequency of 30 to 75 Hz. Each image frame is a still image formed by a pixel array according to the display resolution of the particular system. For example, the VHS system has a display resolution of 320 columns by 480 rows, the NTSC system has 720 columns by 480 rows, and the high-definition television (HDTV) system has 1360 columns by 1020 rows. For the 320-column by 480-row VHS format, a low-resolution digitized form, a two-hour movie can amount to 100 gigabytes of digital video information. By comparison, a conventional compact optical disc has a capacity of about 0.6 gigabytes, a magnetic hard disk has a capacity of 1 to 2 gigabytes, and current compact optical discs have a capacity of 8 gigabytes or more.
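
The 100-gigabyte figure for a two-hour VHS-resolution movie can be checked with a short calculation. The sketch below assumes 3 bytes per pixel and 30 frames per second, neither of which the text states for this particular example; the function name is illustrative only.

```python
def raw_video_bytes(columns, rows, bytes_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video with the given geometry and duration."""
    return columns * rows * bytes_per_pixel * fps * seconds


# Assumed parameters: 3 bytes per pixel (RGB) and 30 frames per second.
two_hours = 2 * 60 * 60
vhs_bytes = raw_video_bytes(320, 480, 3, 30, two_hours)
print(f"{vhs_bytes / 1e9:.1f} GB")
```

Under these assumptions the result is a little under 100 gigabytes, which is consistent with the rounded figure quoted for the VHS format.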

In response to the storage and transmission limits that such large amounts of information impose, several standard compression processes have been established. These video compression techniques exploit similar characteristics between successive image frames, known as spatio-temporal correlation, to provide frame-by-frame compression based on the representation of pixels from frame to frame.

All the images we see on a cinema or TV screen rely on the principle of presenting complete images (still pictures, similar to photographs) at high speed. When they are presented rapidly and sequentially at a rate of 30 frames per second (30 fps), we perceive them as a moving picture because of the retention of the human eye.

To code the images presented sequentially and form a video signal, each image must be divided into rows, where each row is divided into picture elements, or pixels, and each pixel has two associated values, luma and chroma. Luma represents the light intensity at each point; chroma represents the color as a function of a defined color space (RGB), which can be represented by 3 bytes.

Images are displayed on the screen periodically in a horizontal-vertical raster, from top to bottom and from left to right. The number of scan lines and the display frequency vary with the format, such as NTSC, PAL, or SECAM.

In theory, a value could be assigned to every pixel for luma, chroma U, and chroma V, but that amounts to 4 bytes per pixel (1 byte for luma and 3 bytes for color). For the NTSC format of 480 rows by 720 columns at about 30 frames per second, this gives 4 × 480 × 720 × 30, or about 40 megabytes per second, which is difficult to store and transmit given the available bandwidth. Chroma data can currently be reduced at a ratio of 1:4, that is, a color sample is taken every four pixels and the same information is replicated for the three missing pixels, and humans do not perceive the difference. The formats are as follows:
4:4:4 (four luma samples and four chroma samples in a 4 × 4 = 16 pixel group).
4:2:2 (four luma samples and two chroma samples in a 4 × 2 = 8 pixel group).
4:1:1 (four luma samples and one chroma sample in a 4 × 1 = 4 pixel group).
4:2:0 for MPEG1 (eight luma samples in a 4 × 2 = 8 pixel group, with two chroma samples between horizontal pixels).
4:2:0 for MPEG1 (eight luma samples in a 4 × 2 = 8 pixel group, with two chroma samples between vertical pixels).

Even when the information is reduced in this way, the amount of digital information needed to store one second of NTSC video at 4:2:0 quality is 15 megabytes, or 108 gigabytes for a two-hour file.
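
The NTSC data rates quoted in this passage can be reproduced with a few lines of arithmetic. The sketch below assumes 8-bit samples, so that 4:2:0 sampling averages 1.5 bytes per pixel; that per-pixel figure is an assumption about how the 15-megabyte estimate was reached, not something the text states.

```python
def bytes_per_second(columns, rows, bytes_per_pixel, fps):
    """Uncompressed data rate in bytes per second."""
    return columns * rows * bytes_per_pixel * fps


# 4 bytes per pixel, as in the 4 x 480 x 720 x 30 estimate in the text.
full_rate = bytes_per_second(720, 480, 4, 30)

# Assumption: with 8-bit samples, 4:2:0 carries one luma byte per pixel plus
# two chroma bytes shared by every four pixels, i.e. 1.5 bytes per pixel.
rate_420 = bytes_per_second(720, 480, 1.5, 30)

# The text rounds the 4:2:0 rate to 15 megabytes per second.
two_hour_file = 15_000_000 * 2 * 60 * 60

print(full_rate, rate_420, two_hour_file)
```

The first value is roughly 41 million bytes per second (the "about 40 megabytes" in the text), and two hours at the rounded 15 megabytes per second gives the 108 gigabytes quoted.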

Several methods exist for reconstructing a three-dimensional scene from a two-dimensional video sequence. In light of recent technological developments and with a view to the future, the MPEG4 standard attempts to provide a medium for coding spatio-temporally related graphics, which will be an important tool for stereoscopic imaging and for the design and manufacture of engineering applications. A virtual space is created in which a geometric model of the scene is reconstructed. One example is US Pat. No. 6,661,914, granted to Cecile Dufour on December 9, 2003, which describes a new three-dimensional reconstruction method: the scene sequence is shot with a simple camera, the image contours are reconstructed, and the depth of the hidden parts in each view is later projected and submitted to a refining process.

In the race for image processing, many have made valuable contributions. For example, US Pat. No. 6,636,644, granted to Itokawa on October 21, 2003, refers to an imaging process using MPEG4 in which the chroma values of the image that extend across the image boundaries are extracted; this gives higher coding efficiency and allows natural color reduction at the image contours.

Several methods and arrangements exist for coding video signals, such as US Pat. No. 6,633,676, granted to Kleihorst et al. on October 14, 2003. Its method is applied to the coder detector in a camera system: the video signal is coded with motion compensation (I, B, P) and a high-resolution image is generated, which is an interpolation of the previous images. In summary, the regions of greater interest are determined in the video signal, and together they occupy less memory.

Image compression coding is used essentially to store or transmit digital images efficiently, and compressed digital image coding methods use the DCT because it is the dominant technology in common standards such as JPEG and MPEG. US Pat. No. 6,345,123, granted to Boon on February 5, 2002, describes a digital image coding method that transforms coefficients with the usual DCT method, applies a quantization process to the coefficients so as to convert them on a previously written quantization scale, and finally applies a variable-length coding process to the quantized and transformed coefficients by comparing them against a variable-length code table.

The image is divided into a plurality of small regions for coding. These small regions are adjacent to one another; a sample is taken from one region and the environment of the next image region is predicted. This predictive coding method is used in US Pat. No. 6,148,109, granted to Boon et al. on November 14, 2000, in which the image data generated from the differences between the small regions are coded and extracted.

US Pat. No. 6,097,759, granted to Murakami et al. on August 1, 2000, describes a by-block coding system for field-coded images. The block patterns include one individual block field and one non-interlaced block, and the coding system examines the motion of the odd and even fields to produce a motion-compensated prediction signal, which results in high-efficiency coding.

US Pat. Nos. 5,978,515, 5,963,257, and 5,815,601, granted to Katata et al., refer to an image coder that codes image data in a way that raises the image quality of a selected area relative to other areas, without increasing the amount of data used to describe the image.

US Pat. No. 5,579,413, granted to Gisle on November 26, 1996, describes a method for transforming a data signal in a quantized image block and converting it into a variable-length coded data signal, in which each event is represented as a three-dimensional quantity.

The need arose for a data compression system that allows the same content to be stored in less space, and a group of experts devoted itself to creating a way of compressing information and displaying images, without specifying implementation details, so that any software or hardware developer could create new ways of carrying out the process as long as they remained MPEG-compatible. MPEG2 is currently a worldwide standard, widely used by companies related to television, video, and audio.

Audio and video are packaged in elementary packets (PES), and these audio and video packets are interleaved to create an MPEG2 data stream. Each packet carries a time identification (time stamp) used to synchronize audio and video at playback; for example, one audio frame is associated with every three video frames.

MPEG has two different methods for interleaving video and audio in the system stream.
The transport stream is used in systems with a higher probability of error, such as satellite systems, which are susceptible to interference. Each packet is 188 bytes long and starts with an identification header, which makes it possible to recognize gaps and repair errors. Various audio and video programs can be transmitted simultaneously over a single transport stream; thanks to the header, they can be decoded independently and individually and integrated into many programs.
The program stream is used in systems with a lower probability of error, such as DVD playback. In this case the packets have variable length and are substantially larger than those used in the transport stream. As its main characteristic, the program stream allows only a single program content.
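
As an illustration of how the 188-byte transport packet header supports gap recognition, the following sketch reads two header fields. The text gives only the packet length; the field layout used here (sync byte 0x47, 13-bit packet identifier, 4-bit continuity counter) follows the MPEG-2 Systems transport packet header, and the simplified gap test ignores the cases in which the counter is allowed not to advance.

```python
TS_PACKET_SIZE = 188  # packet length stated in the text
SYNC_BYTE = 0x47      # first byte of every transport packet


def parse_ts_header(packet):
    """Return (pid, continuity_counter) from one 188-byte transport packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]   # 13-bit packet identifier
    continuity_counter = packet[3] & 0x0F          # 4-bit counter per PID
    return pid, continuity_counter


def has_gap(previous_counter, current_counter):
    """True if the 4-bit counter did not advance by exactly one (mod 16)."""
    return current_counter != (previous_counter + 1) % 16


# Hypothetical packet: PID 0x0100, continuity counter 5, zero-filled payload.
example = bytes([0x47, 0x01, 0x00, 0x15]) + bytes(184)
print(parse_ts_header(example))
```

Because each program's packets carry their own identifier and counter, a receiver can separate the programs and notice a missing packet without decoding the payload.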

The MPEG2 video system allows both interlaced and progressive video images to be coded.

That is, progressive video is stored as full frames (frame pictures, fp), while interlaced video can be stored in two ways: as full-frame images (frame pictures) or as field images (field pictures).

The compression scheme uses three MPEG2 image types:
Intra-coded (I): the information is coded as a function of the image's own internal data.
Predictive-coded (P): the information depends only on data located at another point in time, namely an earlier reference image.
Bidirectionally predictive-coded (B): the information depends on data located in the past and in the future.

In turn, three types of compression are applied to the above packets, for example temporal prediction, compression, and spatial compression.

Temporal predictive compression refers to two frames that are different in time but related by motion, and exploits the fact that the image changes little from frame to frame.

Spatial compression compacts the information located within a single frame (intra-coding). For example, storing a 100 × 100 pixel image with 3 bytes for color and 1 byte for luma requires 40 kilobytes per frame. If the image is completely white, however, it can be represented as color: 255R, 255G, 255B, Xstart = 0, Ystart = 0, Xend = 99, Yend = 99, which states that the whole area is white, and only 7 or 8 bytes are used instead of 40 kilobytes. MPEG compression is achieved in this way, but the processing steps are complex and outside the scope of the present invention.
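
The all-white example can be made concrete with a small sketch. It is only an illustration of the idea of describing a uniform region by its color and corner coordinates; the seven-byte layout and the function names are hypothetical, and this is not how MPEG itself codes a frame.

```python
import struct


def raw_size(width, height, bytes_per_pixel=4):
    """Bytes needed to store every pixel (3 color bytes + 1 luma byte)."""
    return width * height * bytes_per_pixel


def encode_uniform_region(r, g, b, x_start, y_start, x_end, y_end):
    """Pack a single-color rectangle as seven unsigned bytes (hypothetical layout)."""
    return struct.pack("7B", r, g, b, x_start, y_start, x_end, y_end)


def decode_uniform_region(data):
    """Inverse of encode_uniform_region: returns the seven stored values."""
    return struct.unpack("7B", data)


white = encode_uniform_region(255, 255, 255, 0, 0, 99, 99)
print(raw_size(100, 100), len(white))
```

The pixel-by-pixel form takes 40,000 bytes, while the region description takes seven, matching the "7 or 8 bytes" mentioned for the white image.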

Type (I) images are self-contained: they do not refer to any previous or subsequent frame, so temporal prediction compression is not used and the image is compressed only as a function of its own space.

Type (P) images rely on a reference image in order to be coded, so they use temporal prediction compression as well as spatial compression. They may refer to a type (I) image or to another type (P) image, but they use only one reference image.

Type (B) images require two references, one before and one after, in order to be reconstructed, and this type of image has the best compression ratio. The references used to obtain a type (B) image can only be of type (P) or type (I), never of type (B).

The coding and decoding sequences are different.

To reduce the amount of information, the complete image is divided, over the full frame, into units called macroblocks. Each macroblock is made up of a 16 pixel × 16 pixel area; macroblocks are ordered and named from top to bottom and from left to right, forming a macroblock matrix array on the screen, and they are sent in the information stream in ordered sequential form, that is, 0, 1, 2, 3, ..., n.
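
The macroblock numbering can be sketched as follows. The code assumes row-major ordering (left to right within a row, rows from top to bottom), which is one reading of the ordering described above; the function names are illustrative.

```python
MACROBLOCK_SIZE = 16  # pixels per side, as stated in the text


def macroblock_grid(width, height):
    """Number of macroblock columns and rows covering a frame (rounded up)."""
    cols = -(-width // MACROBLOCK_SIZE)   # ceiling division
    rows = -(-height // MACROBLOCK_SIZE)
    return cols, rows


def macroblock_index(x, y, width):
    """Sequential index (0, 1, 2, ...) of the macroblock containing pixel (x, y)."""
    cols = -(-width // MACROBLOCK_SIZE)
    return (y // MACROBLOCK_SIZE) * cols + (x // MACROBLOCK_SIZE)


cols, rows = macroblock_grid(720, 480)
print(cols * rows, macroblock_index(719, 479, 720))
```

For a 720 × 480 frame this gives a 45 × 30 grid, so the macroblocks are numbered 0 through 1349.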

Macroblocks in type (I) images are self-contained and use spatial compression only. Type (P) images can contain type (P) macroblocks, which refer to previous images, and may also contain intra-coded macroblocks without restriction.

Type (B) images can likewise contain intra-coded macroblocks, alongside macroblocks that refer to a previous image, a subsequent image, or both.

Macroblocks are in turn divided into blocks; a block is an 8 × 8 matrix of data or samples. Because of the way the chroma formats are defined, the 4:4:4 format requires one luma sample Y, one chroma sample Cr, and one chroma sample Cb per pixel, so a 4:4:4 macroblock requires 12 blocks, whereas a 4:2:0 macroblock requires 6 blocks.
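
The block counts follow directly from the chroma subsampling factors, as the sketch below shows. The table of horizontal and vertical factors and the function name are introduced here for illustration; the 4:2:2 entry is included only for comparison.

```python
# Chroma subsampling factors (horizontal, vertical) for each format.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def blocks_per_macroblock(chroma_format, mb_size=16, block_size=8):
    """Count the 8 x 8 blocks (Y + Cb + Cr) in one 16 x 16 macroblock."""
    h_div, v_div = CHROMA_SUBSAMPLING[chroma_format]
    luma_blocks = (mb_size // block_size) ** 2
    chroma_blocks = (mb_size // h_div // block_size) * (mb_size // v_div // block_size)
    return luma_blocks + 2 * chroma_blocks


for name in CHROMA_SUBSAMPLING:
    print(name, blocks_per_macroblock(name))
```

Each macroblock always has four luma blocks; the chroma planes contribute eight more blocks in 4:4:4 and two more in 4:2:0.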

A set of consecutive macroblocks forms a slice. A slice may contain any number of macroblocks, but they must all belong to a single row; like macroblocks, slices are named from left to right and from top to bottom. Slices do not have to cover the whole image, because a coded image does not need samples for every pixel.

Some MPEG profiles require a rigid slice structure in which the whole image must be covered. The use of a suitable combination of hardware and software algorithms makes MPEG image compression possible.

The coded data are bytes carrying block-specific information, macroblocks, fields, frames, images, and MPEG2-format video.

The information must be grouped by blocks, and the result of coding the information, for example with VLC, is a linear bit-byte stream.

VLC (variable-length coding) is a compression algorithm in which the most frequent patterns are replaced by shorter codes and the less frequent ones by longer codes. The compressed version of the information takes up less space and can be transmitted faster over networks. However, it is not an easily editable format, and it requires decompression using a look-up table.
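
A toy example may help show the principle of variable-length coding and table-driven decoding. The four-symbol code table below is hypothetical and is not one of the MPEG2 tables; it only illustrates shorter codes for frequent symbols and decoding through a look-up table.

```python
# Hypothetical prefix-free code table: frequent symbols get shorter codes.
CODE_TABLE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE_TABLE = {code: symbol for symbol, code in CODE_TABLE.items()}


def vlc_encode(symbols):
    """Concatenate the variable-length code of each symbol into a bit string."""
    return "".join(CODE_TABLE[s] for s in symbols)


def vlc_decode(bits):
    """Decode by accumulating bits until they match a look-up table entry."""
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE_TABLE:
            decoded.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(decoded)


message = "aaabacad"
encoded = vlc_encode(message)
print(encoded, len(encoded), "bits instead of", 2 * len(message))
```

Because no code is a prefix of another, the decoder can split the bit string unambiguously, and the message is recovered exactly.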

Inverse scan: the information must be grouped by blocks, but what is obtained when coding the information by VLC is a linear stream. Blocks are 8 × 8 data matrices, so the linear information must be converted into a square 8 × 8 matrix. This is done in a descending zigzag pattern, from top to bottom and from left to right, in both sequence types, according to whether the image is progressive or interlaced.
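
The conversion between a linear stream and an 8 × 8 matrix can be sketched as below. This shows only the classic zigzag pattern; any different scan pattern used for interlaced pictures is not covered, and the function names are illustrative.

```python
def zigzag_order(n=8):
    """Raster indices of an n x n block listed in zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; odd diagonals run downwards, even ones upwards.
    positions.sort(key=lambda rc: (rc[0] + rc[1],
                                   rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))
    return [r * n + c for r, c in positions]


def inverse_scan(linear, n=8):
    """Place a linear list of n*n coefficients back into an n x n matrix."""
    block = [[0] * n for _ in range(n)]
    for value, raster_index in zip(linear, zigzag_order(n)):
        block[raster_index // n][raster_index % n] = value
    return block


def forward_scan(block, n=8):
    """Read an n x n matrix out in zigzag order (the coder-side operation)."""
    return [block[i // n][i % n] for i in zigzag_order(n)]


print(zigzag_order()[:10])
```

The zigzag order visits low-frequency coefficients first, so the long runs of zeros left by quantization end up together at the end of the linear stream.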

Inverse quantization consists simply of multiplying each data value by a factor. When coded, most of the data in a block are quantized to remove information that the human eye cannot perceive; quantization allows greater compression of the MPEG2 stream, and it requires the inverse process (inverse quantization) to be carried out during decoding.
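
A minimal sketch of quantization and its inverse is given below. It assumes a single uniform step for every coefficient, which is a simplification: an actual MPEG2 coder applies per-coefficient weighting and a scale factor, neither of which is modeled here.

```python
def quantize(block, step):
    """Coder side: divide each coefficient by the step and round to an integer."""
    return [[int(round(value / step)) for value in row] for row in block]


def dequantize(levels, step):
    """Decoder side: multiply each quantized level by the same step."""
    return [[level * step for level in row] for row in levels]


coefficients = [[800, 33], [-18, 3]]
levels = quantize(coefficients, step=16)
restored = dequantize(levels, step=16)
print(levels, restored)
```

The restored values are close to the originals but not identical; the small differences are the information that quantization discards.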

MPEG video sequence structure: this is the highest-level structure used in the MPEG2 format, and it has the following form:
Video sequence (Video_Sequence)
Sequence header (Sequence_Header)
Sequence extension (Sequence_Extension)
User data (0) and extension (Extension_and_User_Data (0))
Image group header (Group_of_Picture_Header)
User data (1) and extension (Extension_and_User_Data (1))
Image header (Picture_Header)
Coded image extension (Picture_Coding_Extension)
User data (2) and extension (Extension_and_User_Data (2))
Image data (Picture_Data)
Slice (Slice)
Macroblock (Macroblock)
Motion vectors (Motion_Vectors)
Coded block pattern (Coded_Block_Pattern)
Block (Block)
Final sequence code (Sequence_end_Code)

A video sequence is made up of these structures. The video sequence applies to both the MPEG1 and MPEG2 formats; to tell the versions apart, it must be verified that a sequence extension is present immediately after the sequence header. If the sequence extension does not follow the header, the stream is an MPEG1-format video stream.
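
The version check can be sketched as a search for start codes. The code assumes the usual MPEG video start-code values (prefix 00 00 01, followed by B3 for the sequence header and B5 for an extension); the function name and the test payloads are illustrative and do not represent real header contents.

```python
START_CODE_PREFIX = b"\x00\x00\x01"
SEQUENCE_HEADER_CODE = 0xB3   # sequence header start code value
EXTENSION_START_CODE = 0xB5   # extension start code value


def detect_mpeg_version(stream):
    """Return 2 if an extension follows the sequence header, 1 if not, None if no header."""
    header_pos = stream.find(START_CODE_PREFIX + bytes([SEQUENCE_HEADER_CODE]))
    if header_pos < 0:
        return None
    # Look for the next start code after the sequence header.
    next_pos = stream.find(START_CODE_PREFIX, header_pos + 4)
    if next_pos < 0 or next_pos + 3 >= len(stream):
        return 1
    return 2 if stream[next_pos + 3] == EXTENSION_START_CODE else 1


# Placeholder header payload, not a real sequence header.
payload = b"\x11\x22\x33\x44\x55\x66\x77\x88"
mpeg2_like = b"\x00\x00\x01\xb3" + payload + b"\x00\x00\x01\xb5" + payload
mpeg1_like = b"\x00\x00\x01\xb3" + payload + b"\x00\x00\x01\xb8" + payload
print(detect_mpeg_version(mpeg2_like), detect_mpeg_version(mpeg1_like))
```

The check looks only at the first start code after the sequence header, which mirrors the rule that the extension must follow the header immediately.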

An object of the present invention is to provide a method and system for the digital coding of stereoscopic 3D images that delivers coded data for transmission, reception, and display on 3DVisors®.

Another object of the present invention is to provide a coding scheme in which the video_sequence structure of the video data stream is modified and identification flags are included at the bit level.

Another object of the present invention is to provide a software process for 3D image digital coding in which the video_sequence, the identification flags, the data fields, and the image fields are modified.

Another object of the present invention is to provide a hardware process for 3D image digital coding in which an electronic comparison is made between the left and right channels, the differences between the images are error-corrected, and the processed image is stored in the video_sequence together with the TDVision® technology identifier.

Another object of the present invention is to provide a hardware process for 3D image digital coding in which the input buffer memory of the DSP is doubled, two independent video signals can be input simultaneously, and the DSP is enabled to compare the input buffers of both video inputs.

FIG. 1 shows the hardware and software changes for stereoscopic 3D video image coding.
FIG. 2 shows the compilation process for MPEG2-4 compatible stereoscopic 3D video images.
FIG. 3 shows the software format for compiling MPEG2-4 compatible stereoscopic 3D video images.
FIG. 4 shows the hardware format for compiling MPEG2-4 compatible stereoscopic 3D video images.
FIG. 5 shows a map of the branch of technology to which the coder of the present invention belongs: stereoscopic 3D image processing, its coding, decoding, transmission via cable, satellite, and DVD, HDTV, and 3DVisors® displays.

With the aim of obtaining three-dimensional images from a digital video stream, modifications have been made to current MPEG2 coders through hardware and software changes in different parts of the coding process. As shown in FIG. 1, the MPEG2-4 compatible TDVision® coder (1) has its own coding process (2), achieved through software changes (3) and hardware changes (4).

FIG. 2 shows the compilation process of the coder that is the object of the present invention. The image (10) is captured and submitted to a motion compensation and error detection process (11); the discrete cosine transform function is applied to change the frequency parameters (12); the quantization matrix (13) is then applied to carry out a normalization process; the matrix-to-row conversion process (14) is applied; at this point variable-length coding (15) may be carried out; and finally the video sequence with coded data (16) is obtained. To carry out this compilation process, a format (30 in FIG. 3), or MPEG2-compatible 3D image compilation method, must be followed. As shown in FIG. 3, the video_sequence (31) must be modified in the sequence_header (32), user_data (33), sequence_scalable_extension (34), picture_header (35), picture_coding_extension (36), and picture_temporal_scalable_extension (37) structures, so as to obtain a compilation format suited to stereoscopic 3D digital images captured with a TDVision® stereoscopic camera.
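
The discrete cosine transform step (12) of this process can be illustrated with a direct, unoptimized sketch. It assumes the orthonormal scaling of the two-dimensional DCT-II; a practical coder would use a fast implementation, and nothing here is specific to the TDVision® coder.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block (list of lists)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    result = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            result[u][v] = scale(u) * scale(v) * total
    return result


# A flat block has all of its energy in the first (DC) coefficient.
flat = [[100] * 8 for _ in range(8)]
coefficients = dct_2d(flat)
print(round(coefficients[0][0], 6))
```

For a uniform block only the first coefficient is non-zero, which is why smooth image areas compress well once the remaining coefficients are quantized and scanned.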

The structure of the video data stream and the video_sequence must be modified to include the flags needed to identify, at the bit level, the image type coded with TDVision® technology.
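
One way to picture an identification flag carried in the stream is a marker placed in a user data section. In the sketch below, only the user data start code (00 00 01 B2) is taken from the MPEG video syntax; the tag bytes, the one-byte flag, and the function names are hypothetical placeholders, since the text does not give the actual identifier layout.

```python
USER_DATA_START_CODE = b"\x00\x00\x01\xb2"  # MPEG video user data start code
HYPOTHETICAL_3D_TAG = b"3DTAG"              # placeholder, not the real identifier


def build_user_data(stereo_flag):
    """Build a user data section carrying a hypothetical one-byte 3D flag."""
    return USER_DATA_START_CODE + HYPOTHETICAL_3D_TAG + bytes([1 if stereo_flag else 0])


def read_stereo_flag(stream):
    """Return True/False if the hypothetical tag is found, None otherwise."""
    marker = USER_DATA_START_CODE + HYPOTHETICAL_3D_TAG
    pos = stream.find(marker)
    if pos < 0 or pos + len(marker) >= len(stream):
        return None
    return stream[pos + len(marker)] == 1


print(read_stereo_flag(build_user_data(True)))
```

A decoder that does not know the tag simply skips the user data section, which is what would keep such a stream readable by ordinary MPEG2 decoders.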

This modification is made in the following coding stages: when the dual image is coded in MPEG2 (software), and when the image is coded by hardware.

Software:
The video_sequence headers are modified.
The identification flags are identified.
The data fields are modified.
The image fields are modified.

Hardware:
An electronic comparison is made between the left and right channels.
The difference is processed as a B-type image (error correction).
It is then stored with the TDVision® identifier.
The change is applied to the complementary buffer.
The results are saved and stored in the secondary buffer.

In practice, the input memory of the DSP buffer is doubled, allowing the simultaneous input of two independent video signals, which correspond to the existing left and right stereoscopic signals from a stereoscopic TDVision® camera, and the DSP is enabled to compare the input buffers of both video signals.

The hardware coding process is carried out in the normal MPEG2 manner as a function of a single video input channel. Both signals (left and right) are taken and compared electronically; the difference between the left and right signals is obtained and stored in a temporary buffer; the error correction is calculated in luma and chroma with respect to the left signal; the DCT (discrete cosine transform) function is applied; and the information is stored in a B-type block:
a) within the USER_DATA() (SW) identifier structure
b) within the PICTURE_DATA3D() structure
The process then continues with the next frame.
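The comparison step described above can be pictured with a minimal sketch. It assumes that each channel is a small grid of luma samples and that the "difference" is a plain per-sample subtraction relative to the left signal; the function names are hypothetical, and the DCT, quantization and B-type packaging stages are left out.

```python
def stereo_difference(left, right):
    """Return the per-sample difference (right - left) of two equally sized frames."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("left and right frames must have the same dimensions")
    return [[r - l for l, r in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def reconstruct_right(left, diff):
    """Rebuild the right frame from the left frame plus the stored difference."""
    return [[l + d for l, d in zip(lrow, drow)] for lrow, drow in zip(left, diff)]


# Hypothetical 2x3 luma frames for the left and right channels.
left = [[100, 102, 104], [98, 99, 101]]
right = [[101, 102, 103], [98, 100, 101]]

diff = stereo_difference(left, right)        # stored alongside the left image
restored = reconstruct_right(left, diff)     # what a 3D-aware decoder would recover
```

Because the two views of a stereo pair are usually very similar, most difference samples are small, which is what makes the difference cheap to code.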

The hardware is represented in the block diagram of FIG. 4. The left signal (41) and the right signal (42) are taken, both signals are stored in a temporary buffer (43), the difference between the left and right signals is compared, the error differential is calculated and the information is stored (45), the correct image is coded (46), the coding is carried out as an "I", "B" or "P" type image (47), and the result is finally stored in the video_sequence (48).

It is essential to duplicate the memory handled by the DSP and to be able to allocate up to eight output buffers, which allows the previous and simultaneous representation of a stereoscopic image on a device such as the 3DVisor® from TDVision®.
In practice, two channels must be initialized when calling the programming API of the Texas Instruments TMS320C62X DSP:
MPEG2VDEC_create(const IMPEG2VDEC_fxns *fxns, const MPEG2VDEC_Params *params)
where IMPEG2VDEC_fxns and MPEG2VDEC_Params are pointer structures that define the operating parameters of each video channel, for example:
3DLhandle = MPEG2VDEC_create(fxns3DLEFT, Params3DLEFT)
3DRhandle = MPEG2VDEC_create(fxns3DRIGHT, Params3DRIGHT)
This makes it possible to decode two video channels, one for each of the left and right stereoscopic channels, and yields two video handlers.

A dual display output buffer is required, and the software defines which of the two buffers must display the output by calling the AP function:
MPEG2VDEC_APPLY(3DRhandle, inputR1, inputR2, inputR3, 3doutright_pb, 3doutright_fb)
MPEG2VDEC_APPLY(3DLhandle, inputL1, inputL2, inputL3, 3doutleft_pb, 3doutleft_fb)
Here 3DLhandle is the pointer to the handle returned by the DSP create function, the input1 parameter is the address of FUNC_DECODE_FRAME or FUNC_START_PARA, input2 is the pointer to the external input buffer address, and input3 is the size of the external input buffer.

3doutleft_pb is the address of the parameter buffer, and 3doutleft_fb is the start of the output buffer in which the decoded image is stored.

The timecode and timestamp are used to output the frames to the final device in a sequential, synchronized manner.
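One way to picture the synchronized output is to pair the decoded left and right frames by their timestamp before handing them to the display. The sketch below is only an illustration under that assumption: the function name, the `(timestamp, frame)` tuples and the policy of dropping unpaired frames are hypothetical and are not taken from the DSP API.

```python
def pair_by_timestamp(left_frames, right_frames):
    """Return (timestamp, left, right) tuples for timestamps present in both channels.

    Each input is a list of (timestamp, frame) tuples; the output is in
    ascending timestamp order. Frames without a partner are dropped.
    """
    right_by_ts = {ts: frame for ts, frame in right_frames}
    pairs = []
    for ts, left_frame in sorted(left_frames, key=lambda item: item[0]):
        if ts in right_by_ts:
            pairs.append((ts, left_frame, right_by_ts[ts]))
    return pairs


# Hypothetical decoder outputs; the right channel arrives out of order.
left_frames = [(0, "L0"), (1, "L1"), (2, "L2")]
right_frames = [(1, "R1"), (0, "R0"), (3, "R3")]

pairs = pair_by_timestamp(left_frames, right_frames)
```

A real player would more likely wait for, or conceal, a missing frame rather than drop the pair; dropping simply keeps the example short.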

The software and hardware processes are integrated by devices known as DSPs, which execute most of the hardware process. These DSPs are programmed in a hybrid of C and assembly language provided by the manufacturer. Each DSP has its own API, consisting of a list of functions or procedure calls that reside in the DSP and are called by software.

With this reference information, the present application for MPEG2-format-compatible 3D image coding is made.

In practice, a sequence header and a sequence extension always appear at the beginning of a video sequence. Repetitions of the sequence extension must be identical to the first one. In contrast, repetitions of the sequence header vary little from the first occurrence; only the portion that defines the quantization matrices should change. Repeating the sequence allows random access to the video stream: if the decoder must start playback in the middle of the video stream, it only has to search for the preceding sequence header and sequence extension to be able to decode the following images in the stream. The same applies to video streams that cannot be started from the beginning, such as a satellite decoder that is tuned in after the program has already started.
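The random-access search can be sketched as a backward scan for the sequence header start code. The value 0x000001B3 for sequence_header_code comes from the MPEG-2 video standard and not from this text; the function name and the toy byte stream are hypothetical.

```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"   # sequence_header_code (MPEG-2 video)
EXTENSION_START_CODE = b"\x00\x00\x01\xb5"   # extension_start_code
PICTURE_START_CODE = b"\x00\x00\x01\x00"     # picture_start_code


def find_random_access_point(stream: bytes, position: int) -> int:
    """Return the offset of the last sequence header starting at or before
    `position`, or -1 if the stream contains none up to that point."""
    return stream.rfind(SEQUENCE_HEADER_CODE, 0, position + len(SEQUENCE_HEADER_CODE))


# Toy stream: header + extension + picture, then a repeated header + extension.
stream = (
    SEQUENCE_HEADER_CODE + b"\x11" * 8
    + EXTENSION_START_CODE + b"\x22" * 6
    + PICTURE_START_CODE + b"\x33" * 10
    + SEQUENCE_HEADER_CODE + b"\x44" * 8
    + EXTENSION_START_CODE + b"\x55" * 6
)

early = find_random_access_point(stream, 30)   # inside the first picture
late = find_random_access_point(stream, 40)    # just after the repeated header
```

A decoder joining the stream at byte 40 would restart parsing from the repeated header at offset 36 instead of rewinding to the very beginning.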

That is, the sequence header provides a higher level of information about the video stream. For clarity, the number of bits corresponding to each field is also indicated; the most significant bits are located in the sequence extension (Sequence_Extension) structure. It is formed by the following structures.

Extension_and_user_data(i)
This is a container for other structures and has no data of its own. It is basically a series of extension_data(i) and user_data() structures, and in some cases it may be completely empty.

Extension_data(i)
This structure contains a simple structure extension. The type of extension structure it contains depends on the value of (i), which can be 1 or 2. If the value is "0", the extension_data follows a sequence_extension, and extension_data(i) can contain either one sequence_display_extension or one sequence_scalable_extension.

If i = 2, the structure follows a picture_coding_extension and can contain a quant_matrix_extension(), copyright_extension(), picture_display_extension(), picture_spatial_scalable_extension(), or one picture_temporal_scalable_extension. This structure always starts with 0x000001B5.
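Since every extension starts with 0x000001B5, a parser tells them apart by the 4-bit extension_start_code_identifier that follows the start code. The sketch below assumes the identifier values of the MPEG-2 video standard (they are not listed in this text); the function name and the sample bytes are hypothetical.

```python
EXTENSION_START_CODE = b"\x00\x00\x01\xb5"

# extension_start_code_identifier values as assigned by the MPEG-2 video standard.
EXTENSION_IDS = {
    1: "sequence_extension",
    2: "sequence_display_extension",
    3: "quant_matrix_extension",
    4: "copyright_extension",
    5: "sequence_scalable_extension",
    7: "picture_display_extension",
    8: "picture_coding_extension",
    9: "picture_spatial_scalable_extension",
    10: "picture_temporal_scalable_extension",
}


def list_extensions(stream: bytes):
    """Return the names of all extensions found in `stream`, in stream order."""
    names = []
    pos = stream.find(EXTENSION_START_CODE)
    while pos != -1 and pos + 4 < len(stream):
        identifier = stream[pos + 4] >> 4          # upper 4 bits of the next byte
        names.append(EXTENSION_IDS.get(identifier, "reserved"))
        pos = stream.find(EXTENSION_START_CODE, pos + 4)
    return names


# Hypothetical payload bytes; only the upper nibble after each start code matters here.
sample = (
    EXTENSION_START_CODE + bytes([0x14, 0x4A])
    + EXTENSION_START_CODE + bytes([0x23, 0x10])
    + EXTENSION_START_CODE + bytes([0x8F, 0xFF])
)
found = list_extensions(sample)
```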

User_data
The user_data structure allows application-specific data to be stored within the video sequence (video_sequence). The MPEG2 specification defines neither the format of this function nor the format of the user data. The structure starts with user_data_start_code = 0x000001B2 and contains an arbitrary amount of data (user_data) that continues until the next start code in the data stream. The only condition is that it must not contain more than 23 consecutive zeros, because such a run could be interpreted as a start code.
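A reader of the stream can recover application data by collecting the bytes between a user_data start code and the next start code prefix. This is a minimal sketch, assuming the MPEG-2 video standard's value 0x000001B2 for user_data_start_code; the function name and the `EXAMPLE` payload are placeholders and not the actual TDVision® identifier.

```python
USER_DATA_START_CODE = b"\x00\x00\x01\xb2"   # user_data_start_code (MPEG-2 video)
START_CODE_PREFIX = b"\x00\x00\x01"          # common prefix of every start code


def extract_user_data(stream: bytes):
    """Return the payload of every user_data structure found in `stream`."""
    payloads = []
    pos = stream.find(USER_DATA_START_CODE)
    while pos != -1:
        begin = pos + len(USER_DATA_START_CODE)
        end = stream.find(START_CODE_PREFIX, begin)   # next start code ends the payload
        if end == -1:
            end = len(stream)
        payloads.append(stream[begin:end])
        pos = stream.find(USER_DATA_START_CODE, end)
    return payloads


# Toy stream: an extension, one user_data block, then a picture start code.
stream = (
    b"\x00\x00\x01\xb5\x14"
    + USER_DATA_START_CODE + b"EXAMPLE"
    + b"\x00\x00\x01\x00" + b"\x55\x55"
)
payloads = extract_user_data(stream)
```

The sketch does not strip zero stuffing that may precede the next start code, so a production parser would need to trim trailing zero bytes from each payload.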

Sequence_display_extension()
This structure provides information that is not used in the decoding process: information about the coded content that is useful for displaying the decoded video correctly.

Sequence_scalable_extension
This structure must be present in every scalable video stream, that is, a stream containing a base layer and one or more enhancement layers. MPEG2 has different types of scalability; in one example, the main layer contains the standard definition of the video content while the extension layer carries additional data that increases the definition.

Picture_temporal_scalable_extension()
When temporal scalability is used, two spatial resolution streams exist: the lower layer provides a lower frame-rate version of the video frames, and the upper layer can be used to derive a higher frame-rate version of the same video. Temporal scalability can be used by low-quality, low-cost or free decoders, while the higher frame rate is available for a fee.

Picture_spatial_scalable_extension()
In the case of spatial image scalability, the enhancement layer contains data that allows a better resolution than that of the base layer to be reconstructed. When the enhancement layer uses the base layer as a reference for motion compensation, the lower layer must be scaled up and offset to reach the higher resolution of the enhancement layer.

EXTENSION_AND_USER_DATA(2)
This MPEG2-compatible coding process is currently used to code 3D digital images taken with the stereoscopic camera (52) of FIG. 5. The code then passes through the compiler (51), after which a signal can be obtained to display it on a PC (50) or a DVD (53); when the signal is coded in the decoder (54), it can be sent to a decoder (55) so that the image is displayed via cable (56), satellite (57), high-definition television (59) (HDTV), or a 3DVisor® device (59), among others. The image can therefore be displayed on:
DVD (digital versatile disc)
DTV (digital television)
HDTV (high-definition television)
Cable (DVB, digital video broadcasting)
Satellite (DSS, digital satellite system)
This is the integration of the software and hardware processes.

As regards the hardware, most of the process is executed by devices known as DSPs (digital signal processors), namely a Motorola model and a Texas Instruments model (TMS320C62X).

These DSPs are programmed in a hybrid of C and assembly language provided by the manufacturer in question. Each DSP has its own API, consisting of a list of functions or procedure calls that reside in the DSP and are called by software. From this reference information the 3D images are coded, and the resulting code is compatible with the MPEG2 format and with its own coding algorithm. When the information is coded, the DSP is responsible for running the prediction, comparison and quantization processes and for applying the DCT function in order to form the MPEG2-compressed video stream.
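Of the stages named above, the DCT and quantization steps are easy to illustrate in isolation. The sketch below computes a textbook orthonormal 8x8 DCT-II and applies a single uniform quantizer step; this is a simplification chosen for brevity, since MPEG2 uses quantization matrices and a quantiser scale, and none of the names come from the DSP API.

```python
import math

N = 8  # MPEG2 transforms 8x8 blocks


def dct_2d(block):
    """Orthonormal 8x8 DCT-II of `block`, given as 8 rows of 8 samples."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization with one step size (a simplification of MPEG2)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A flat block has all of its energy in the DC coefficient.
flat_block = [[10] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
levels = quantize(coeffs, step=16)
```

For the flat block only the top-left (DC) level is non-zero, which is the property the subsequent variable-length coding exploits.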

While particular embodiments of the present invention have been illustrated and described, it will be apparent to those skilled in the art that various modifications or changes can be made without departing from the scope of the invention. All such modifications and changes are intended to be covered by the appended claims, so that they fall within the scope of the present invention.