BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION
This invention relates to a method and apparatus for inputting and encoding a moving image and to an apparatus for decoding the encoded moving image. This invention particularly relates to a technique for encoding an image frame by first partitioning it into multiple regions and to a technique for decoding the encoded image frame.

2. DESCRIPTION OF THE RELATED ART
FIG. 1 is a block diagram of a first prior art showing the configuration of a moving image encoder based on ITU-T Recommendation H.263, wherein numeral 1 indicates an input digital image signal (hereinafter referred to simply as an input image), numeral 101 indicates a differentiator, numeral 102 indicates a prediction signal, numeral 103 indicates a prediction error signal, numeral 104 indicates an encoder, numeral 105 indicates encoded data, numeral 106 indicates a decoder, numeral 107 indicates a decoded prediction error signal, numeral 108 indicates an adder, numeral 109 indicates a local decoded image signal, numeral 110 indicates a memory, numeral 111 indicates a prediction section, and numeral 112 indicates a motion vector.
The input image 1 to be encoded is first input to differentiator 101. Differentiator 101 takes the difference between input image 1 and prediction signal 102 and outputs it as prediction error signal 103. Encoder 104 encodes either input image 1, which is the original signal, or prediction error signal 103, and outputs encoded data 105. The encoding method in encoder 104 employs a technique of the above-mentioned recommendation in which prediction error signal 103 is transformed from the spatial domain to the frequency domain using the Discrete Cosine Transform (DCT), a type of orthogonal transformation, and the obtained transform coefficients are linearly quantized.
Encoded data 105 is branched in two directions: one branch is transmitted to a receiver, i.e., an image decoding apparatus (not shown), and the other is input to decoder 106 within the present apparatus. Decoder 106 performs the operation opposite to that of encoder 104, and generates and outputs decoded prediction error signal 107 from encoded data 105. Adder 108 adds prediction signal 102 to decoded prediction error signal 107 and outputs the result as decoded image signal 109. Prediction section 111 performs motion-compensated prediction using input image 1 and decoded image signal 109 of the previous frame stored in memory 110, and outputs prediction signal 102 and motion vector 112. At this time, motion compensation is performed in block units of a fixed size, called a macro block, of 16×16 pixels. As an optional function for a block within a region having large movements, motion-compensated prediction can be performed with the macro block partitioned into four sub-block units of 8×8 pixels. The obtained motion vector 112 is transmitted toward the image decoding apparatus, and prediction signal 102 is sent to differentiator 101 and adder 108. According to this apparatus, the amount of data of the moving image can be compressed while maintaining image quality through the use of motion-compensated prediction.
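For illustration only, the prediction/transform loop described above can be sketched as follows in Python. This is a hypothetical simplification, not part of the recommendation; the reference-numeral mapping in the comments follows FIG. 1, while the function name and parameters are assumptions.

```python
import numpy as np
from scipy.fftpack import dctn, idctn

def encode_block(block, prediction, qstep=16):
    """Sketch of the hybrid loop: difference, DCT, linear quantization,
    local decoding, and reconstruction of the local decoded image."""
    error = block.astype(float) - prediction                 # differentiator 101 -> 103
    coeffs = dctn(error, norm='ortho')                       # DCT in encoder 104
    quantized = np.round(coeffs / qstep)                     # linear quantization -> 105
    decoded_error = idctn(quantized * qstep, norm='ortho')   # decoder 106 -> 107
    reconstruction = prediction + decoded_error              # adder 108 -> 109
    return quantized, reconstruction
```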
In this prior art, the shape of the encoding unit region is limited to two types, and both shapes are rectangular. Therefore, there is naturally a limit to how well the encoding can be adapted to the scene structure or features of an image. For example, if it is desired to increase the amount of code only for an object having large movements, it would be preferable, although difficult in this prior art, to define a region having a shape identical to that of the object.
FIG. 2 is a block diagram of an image encoding apparatus concerning a second prior art. This apparatus is based on an encoding method that was proposed in "A Very Low Bit Rate Video Coder Based on Vector Quantization" by L. C. Real et al. (IEEE Transactions on Image Processing, Vol. 5, No. 2, February 1996). In the same figure, numeral 113 indicates a region partitioning section, numeral 114 indicates a prediction section, numeral 115 indicates a region determination section, numeral 116 indicates encoding mode information including inter-frame encoding and intra-frame encoding information, numeral 117 indicates a motion vector, numeral 118 indicates an encoder, and numeral 119 indicates encoded data.
In this apparatus, input image 1 is first partitioned into multiple regions by region partitioning section 113. Region partitioning section 113 determines the sizes of regions in accordance with the motion-compensated prediction error. It performs a threshold judgment on the dispersion of the inter-frame signal and assigns small blocks to regions having large movement and large blocks to regions having small movement, such as backgrounds, from among ten block sizes prepared in advance: 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, 16×16, 16×32, 32×16, and 32×32. In concrete terms, a dispersion value is calculated by region determination section 115 for the prediction error signal obtained by prediction section 114, and the block size is determined based on it. Attribute information 116, such as region shape information and encoding mode information, as well as motion vector 117, are determined at this time, and the prediction error signal or the original signal is encoded by encoder 118 in accordance with the encoding mode information to yield encoded data 119. Subsequent processes are the same as those of the first prior art.
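As a purely illustrative sketch (the threshold value and the reduction to two candidate sizes below are hypothetical; the cited method selects among the ten sizes listed above), the dispersion-based block-size decision can be pictured as:

```python
import numpy as np

def choose_block_size(inter_frame_signal, threshold=50.0):
    """Assign a small block where the inter-frame signal has large dispersion
    (large movement) and a large block where it is nearly static."""
    dispersion = float(np.var(inter_frame_signal))
    return (4, 4) if dispersion > threshold else (32, 32)
```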
This prior art is richer in processing flexibility than the first prior art from the viewpoint of preparing blocks of multiple sizes. However, this apparatus also limits each region to a rectangular shape. Therefore, even with rectangular shapes in ten sizes, there is room for improvement in adaptability with respect to arbitrarily shaped image regions.

SUMMARY OF THE INVENTION

The present invention takes these problems into consideration with the object of providing a moving image encoding technique for performing more flexible processing according to the conditions of the image to be processed. In more concrete terms, the object of this invention is to provide a moving image encoding technique using region partitioning techniques that can accurately handle various image structures. Another object of this invention is to provide a partitioning criterion based on various points of view when partitioning regions for encoding. Still another object of this invention is to provide a technique for correctly decoding the encoded data of regions that have been partitioned into various shapes.
The moving image encoding method of this invention includes three steps. A first step partitions an input image into multiple regions based on a predetermined partitioning judgment criterion. The processing up to this point is the same as general conventional region-based encoding. In a second step, however, this invention integrates each of the partitioned regions with adjacent regions based on a predetermined integration judgment criterion. Thereafter, in a third step, the image signal is encoded for each of the regions remaining after integration. According to this method, the integration process allows regions to take on various shapes. Thus, a region having a shape closely matching the structure of an image or the outline of an object can be generated.
The moving image encoding apparatus of this invention includes a region partitioning section and an encoder. The region partitioning section includes a partitioning processing section for partitioning the input image into multiple regions based on a predetermined partitioning judgment criterion, and an integration processing section for integrating each of the multiple regions partitioned by the partitioning processing section with adjacent regions based on a predetermined integration judgment criterion. The encoder encodes the image signal for each of the regions remaining after integration by the integration processing section. According to this apparatus, comparatively high image quality can be achieved at comparatively high data compression ratios while flexibly supporting the structures of images.
The above-mentioned integration processing section performs preliminary encoding and decoding of images for each region, and may examine the amount of code and the encoding distortion. In such a case, the encoding distortion can be minimized under the constraint of a predetermined amount of code.
The above-mentioned partitioning processing section includes a class identifying section for classifying the importance of regions into classes, and may judge whether or not to partition each region based on an activity, to be described later, and the class. If the class identifying section references feature parameters in images, the recognition of objects becomes possible, thus facilitating more accurate region partitioning.
On the other hand, the moving image decoding apparatus of this invention inputs and decodes the encoded data of an image that was encoded after being partitioned into multiple regions. This apparatus includes a region shape restoring section and an image data decoder. The region shape restoring section restores, based on region shape information included in the encoded data, the shape of each region that was partitioned during encoding. The image data decoder, after specifying the sequence in which regions were encoded based on the shapes of the restored regions, decodes the image for each region from the encoded data. According to this apparatus, accurate decoding is achieved even if regions having various shapes are generated in the encoding stage.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a moving image encoding apparatus relating to a first prior art.
FIG. 2 shows a moving image encoding apparatus relating to a second prior art.
FIG. 3 is a block diagram common to the general moving image encoding apparatus relating to an embodiment.
FIG. 4 is a flowchart showing an operation of the encoding apparatus of FIG. 3.
FIG. 5 is an internal block diagram of the region partitioning section of FIG. 3.
FIG. 6 is an internal block diagram of the partitioning processing section of FIG. 5.
FIG. 7 is a flowchart showing an operation of the partitioning processing section of FIG. 6.
FIG. 8 shows an example of a uniform partitioning result in the partitioning processing section of FIG. 6.
FIG. 9 shows a result of a first initial partitioning in the partitioning processing section of FIG. 6.
FIG. 10 shows a final result of initial partitioning in the partitioning processing section of FIG. 6.
FIG. 11 is an internal block diagram of the integration processing section of FIG. 5.
FIG. 12 is a flowchart showing an operation of the integration processing section of FIG. 11.
FIG. 13 shows an example of labeling a region in the integration processing section of FIG. 11.
FIG. 14 shows an example of setting adjacent regions in the integration processing section of FIG. 11.
FIG. 15 is a flowchart showing the procedure of S19 of FIG. 12.
FIG. 16 is an internal block diagram of another embodiment of the partitioning processing section of FIG. 5.
FIG. 17 shows a final result of initial partitioning in the partitioning processing section of FIG. 16.
FIG. 18 is an internal block diagram of another embodiment of the partitioning processing section of FIG. 5.
FIG. 19 is a flowchart showing an operation of the partitioning processing section of FIG. 18.
FIG. 20 shows another embodiment of the class identifying section of FIG. 18.
FIG. 21 shows motion-compensated prediction based on block matching.
FIG. 22 is an internal block diagram of another embodiment of the partitioning processing section of FIG. 5.
FIG. 23 is a flowchart showing an operation of the partitioning processing section of FIG. 22.
FIG. 24 is an internal block diagram of another embodiment of the integration processing section of FIG. 5.
FIG. 25 is a flowchart showing an operation of the integration processing section of FIG. 24.
FIG. 26 is an internal block diagram of another embodiment of the integration processing section of FIG. 5.
FIG. 27 is an internal block diagram of a moving image decoding apparatus relating to the embodiment.
FIG. 28 is a flowchart showing an operation of the decoding apparatus of FIG. 27.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

First Embodiment
FIG. 3 is a block diagram showing a configuration of a moving image encoding apparatus related to this embodiment. This apparatus can be used in portable or stationary equipment for image communications, such as TV telephones and TV conferencing. It can also be used as a moving image encoding apparatus in image storage and recording apparatus such as digital VCRs and video servers. Furthermore, the processes in this apparatus can also be used as a moving image encoding program to be installed in the form of software or DSP firmware.
In FIG. 3, numeral 1 indicates the input image, numeral 2 indicates a region partitioning section, numeral 3 indicates region shape information, numeral 4 indicates a region image signal, numeral 5 indicates region motion information, numeral 6 indicates region attribute information, numeral 7 indicates an encoder, numeral 8 indicates a local decoded image, numeral 9 indicates a memory, numeral 10 indicates a reference image, and numeral 11 indicates an encoded bit stream. FIG. 4 is a flowchart showing an operation of the apparatus. The overall operation of the apparatus is first described with reference to FIGS. 3 and 4.
Input image 1 is input to region partitioning section 2 (S1), where it is partitioned into multiple regions. Region partitioning section 2 performs initial partitioning (S2) and adjacent region integration (S3), as will be described later. Region partitioning section 2 passes shape information 3, image signal 4, attribute information 6, such as the encoding modes of the regions, and motion information 5 for each region obtained as a result of partitioning to encoder 7. Encoder 7 transforms and multiplexes these information items into a bit pattern based on a predetermined encoding method and outputs them as encoded bit stream 11 (S4, S5). In order to perform region partitioning and encoding based on motion-compensated prediction, encoder 7 generates local decoded image 8 for each region and stores it in memory 9. Region partitioning section 2 and encoder 7 fetch the local decoded image stored in memory 9 as reference image 10 to perform motion-compensated prediction.
FIG. 5 is a detailed block diagram of region partitioning section 2, wherein numeral 12 indicates a partitioning processing section, numeral 13 indicates initial partition shape information, and numeral 14 indicates an integration processing section.
(1) Initial Partitioning
The initial partitioning corresponding to S2 of FIG. 4 is performed at partitioning processing section 12. Initial partitioning refers to the partitioning which is performed before proceeding to integration, and the total partitioning count is dependent on the state of the image, namely, the features or characteristics of the image.
FIG. 6 shows an internal configuration of partitioning processing section 12, wherein numeral 15 indicates a uniform partitioning section, numeral 16 indicates an activity calculating section, numeral 17 indicates an activity, numeral 18 indicates a partitioning judgment section, and numeral 19 indicates a partition state instruction signal. The activity refers to an evaluated value for judging the features or characteristics of the image regarding a predetermined property. A prediction error power accompanying motion-compensated prediction for a region is employed as the activity in this embodiment.
FIG. 21 shows a method of motion-compensated prediction based on a block matching method. In the block matching method, the vector v that minimizes the prediction error power Dmin given by the following formula is found as the motion vector of the region S to be predicted:

Dmin = min(v∈R) Σ(x, y)∈S {fs(x, y, t) − fs(x+vx, y+vy, t−1)}²

The term fs(x, y, t) is the pixel value at (x, y) at time t of the predicted region S, fs(x, y, t−1) is the pixel value at (x, y) at time t−1, and fs(x+vx, y+vy, t−1) is the pixel value of the position that is displaced from position (x, y, t−1) by the amount of vector v. R represents the motion vector search range.
From the obtained vector v, the prediction image is obtained as fs(x+vx, y+vy, t−1), and the prediction error power, or activity, becomes Dmin. Defining the activity with this method enables region partitioning to be performed according to the complexity of the local motion of the image. Control becomes possible, such as detailed encoding for portions having large movements and rough encoding for portions having small movements. Affine motion compensation for obtaining affine motion parameters and perspective motion compensation for detecting three-dimensional motion may also be used.
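For illustration, the block matching search just described can be sketched as follows in Python; frames are assumed to be greyscale NumPy arrays, and the block size, search range, and function name are hypothetical choices rather than values fixed by this embodiment:

```python
import numpy as np

def block_matching(cur, prev, top, left, size=16, search=7):
    """Find the motion vector v minimizing the prediction error power Dmin
    for the block of `cur` at (top, left), by exhaustive search over `prev`."""
    block = cur[top:top + size, left:left + size].astype(float)
    best_v, d_min = (0, 0), float('inf')
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            y, x = top + vy, left + vx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue
            candidate = prev[y:y + size, x:x + size].astype(float)
            d = np.sum((block - candidate) ** 2)   # prediction error power
            if d < d_min:
                d_min, best_v = d, (vx, vy)
    return best_v, d_min   # d_min serves as the activity of the region
```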
FIG. 7 is a flowchart showing an operation of partitioning processing section 12, wherein unconditional uniform block partitioning is first performed (S8) by uniform partitioning section 15. At this time, one frame is partitioned, for example, into blocks of 32×32 pixels as shown in FIG. 8. This partitioning process is called the 0th partitioning stage. The number of blocks generated in the 0th partitioning stage is denoted by N0 and each block by B0n (1≦n≦N0).
Next, a judgment is made individually as to whether or not to perform further block partitioning for each B0n (S9). For this purpose, activity 17 for each B0n is calculated in activity calculating section 16. Partitioning judgment section 18 compares threshold TH0, which was set in advance, with the activity of each block, and if activity 17 is larger than TH0, the corresponding B0n is further partitioned into four blocks (S10). This is called the 1st partitioning stage.
FIG. 9 illustrates the partitioned image at the 1st partitioning stage. The number of newly generated 16×16 pixel blocks is denoted by N1 and each block by B1n (1≦n≦N1). Hereafter, the activity of each B1n is calculated and a 2nd partitioning stage is performed using threshold TH1. Thereafter, threshold THj is applied to the blocks Bjn generated in the jth partitioning stage and the (j+1)th partitioning stage is executed (S13 to S16). The initial partitioning is terminated when j reaches a predetermined upper limit value. It is assumed here for the purpose of description that the process is terminated at the end of the 2nd partitioning stage. In this case, blocks as shown in FIG. 10 are generated. Block sizes range from 8×8 pixels to 32×32 pixels. The number of blocks at the end of initial partitioning is denoted by M0 and the initial region of each block by S0n. The shape information for S0n is passed to integration processing section 14 as initial partition shape information 13.
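For illustration, the stage-by-stage initial partitioning can be sketched as a recursive split driven by the activity thresholds; the function below is a hypothetical simplification, with the activity function, threshold values, and starting block size supplied by the caller:

```python
def initial_partition(frame, activity, thresholds, top=0, left=0, size=32, stage=0):
    """Split a block into four sub-blocks while its activity exceeds the
    threshold THj of the current partitioning stage; stop at the last stage."""
    if stage >= len(thresholds) or activity(frame, top, left, size) <= thresholds[stage]:
        return [(top, left, size)]            # keep this block as an initial region S0n
    half = size // 2
    regions = []
    for dy in (0, half):
        for dx in (0, half):
            regions += initial_partition(frame, activity, thresholds,
                                         top + dy, left + dx, half, stage + 1)
    return regions

# 0th stage: tile the frame into 32x32 blocks, then refine each one, e.g.:
# initial_regions = [r for t in range(0, height, 32) for l in range(0, width, 32)
#                    for r in initial_partition(frame, activity, (TH0, TH1), t, l)]
```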
(2) Integrating Adjacent Regions
Integration processing section 14 performs integration with adjacent regions for each S0n. The internal configuration of integration processing section 14 is shown in FIG. 11, wherein numeral 20 indicates a labeling section, numeral 21 indicates an adjacent region setting section, numeral 22 indicates a provisional encoder, numeral 23 indicates a decoder, numeral 24 indicates an encoding distortion calculating section, numeral 25 indicates an evaluation value calculating section, numeral 26 indicates a constant for evaluation value calculation, numeral 27 indicates an integration judgment section, and numeral 28 indicates an integration process iteration instruction signal.
FIG. 12 is a flowchart showing an operation of integration processing section 14. As shown in the flowchart, numbers, or labels, are first assigned to the initial regions S0n by labeling section 20 in accordance with a predetermined rule (S17). For example, numbers are assigned in sequence to regions while the image frame is scanned horizontally in pixel units from the top left corner to the bottom right corner. A simple example of labeling is shown in FIG. 13, wherein labels "1", "2", and so forth are assigned to the regions in their sequence of appearance on the scanning line. At this time, region size is ignored. Hereinafter, the label value of region Skn is denoted by l(Skn). The k here corresponds to the kth integration stage to be described later, where the initial state is k=0.
Next, the "adjacent regions" of each region are defined by adjacent region setting section 21 (S18) using the labels. FIG. 14 is an example of adjacent regions, wherein the adjacent regions of region S0n are based on the labels of FIG. 13. Regions B, C, and D, which are adjacent to the edges of region A and have label values larger than that of region A, are defined as the adjacent regions of region A.
Next, a judgment is made for each region as to whether or not the region can be integrated with its adjacent regions. To this end, an evaluation value for integration is calculated (S19) by provisional encoder 22, decoder 23, encoding distortion calculating section 24, and evaluation value calculating section 25. The evaluation value is the amount of code-distortion cost L(Skn) expressed by the following formula.
L(Skn) = D(Skn) + λR(Skn)   Formula 1
Here, D(Skn) is the encoding distortion of Skn, namely, the square error summation, R(Skn) is the amount of code of Skn, and λ is constant 26. The integration proceeds in the direction of decreasing L(Skn). Decreasing L(Skn) is equivalent to decreasing the encoding distortion within the range of the predetermined amount of code based on the given constant λ. Decreasing the summation of L(Skn) enables the encoding distortion to be reduced when the same amount of code is used.
FIG. 15 is a detailed flowchart of S19. First, Skn is preliminarily encoded (S22) at provisional encoder 22. The purpose of this encoding is to prepare for the calculation of the amount of code R(Skn) and the derivation of the encoding distortion D(Skn). In this embodiment, provisional encoder 22 performs motion compensation using reference image 10. The data to be encoded includes the image data, namely the prediction error signal or original signal, the motion information used to specify the prediction image, and attribute information such as the encoding mode; the summation of the amounts of code of these items is R(Skn). The prediction error signal is obtained as the difference between the original signal of Skn and the prediction image.
Decoder 23 generates the local decoded image for Skn (S23) using the encoded data obtained by provisional encoder 22. Next, the distortion D(Skn) between the local decoded image and the original image is calculated (S24) by encoding distortion calculating section 24. Evaluation value calculating section 25 calculates (S25) the amount of code-distortion cost L(Skn) from R(Skn) and D(Skn).
Step S19 performs the preceding evaluation value calculation for all regions for the following three cases:
1. Each region Skn itself: L(Skn)
2. The adjacent regions Ni[Skn] of Skn: L(Ni[Skn])
3. The region obtained by temporarily integrating Skn and Ni[Skn]: L(Skn+Ni[Skn])
Here, Ni[Skn] denotes an adjacent region of Skn, and i is a number for distinguishing the multiple adjacent regions.
Next, integration judgment section 27 searches within the image frame for the location where
DL = L(Skn) + L(Ni[Skn]) − L(Skn+Ni[Skn])
is a maximum, and the corresponding Skn and Ni[Skn] are integrated (S20). This is the kth integration stage. Thereafter, integration judgment section 27 instructs labeling section 20 to update the labels through integration process iteration instruction signal 28. Labeling section 20 replaces label l(Ni[Skn]) with l(Skn), and the adjacent regions are set again by adjacent region setting section 21. This yields new regions Sk+1n and adjacent regions Ni[Sk+1n], from which L(Sk+1n), L(Ni[Sk+1n]), and L(Sk+1n+Ni[Sk+1n]) are determined. Integration judgment section 27 halts the instructions to labeling section 20 when there is no further combination yielding a positive value of DL, and terminates the integration process (S21).
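For illustration, the greedy merge loop described above can be sketched as follows; regions are represented here as sets of pixel blocks, and neighbors() and cost() stand for the adjacency test and for the provisional encode/decode pass that yields L(·) = D(·) + λR(·) (all names are hypothetical):

```python
def integrate_regions(regions, neighbors, cost):
    """Repeatedly merge the adjacent pair whose integration gives the largest
    positive DL = L(a) + L(b) - L(a merged with b); stop when no pair helps."""
    regions = [frozenset(r) for r in regions]
    while True:
        best = None
        for a in regions:
            for b in regions:
                if a is b or not neighbors(a, b):
                    continue
                merged = a | b                        # union of the two block sets
                dl = cost(a) + cost(b) - cost(merged)
                if dl > 0 and (best is None or dl > best[0]):
                    best = (dl, a, b, merged)
        if best is None:
            return regions                            # no remaining merge reduces the cost
        _, a, b, merged = best
        regions = [r for r in regions if r is not a and r is not b] + [merged]
```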
This completes the partitioning and integration processing, and information 3 expressing the region partitioned state of input image 1, image data 4 for each region, motion information 5, and attribute information 6 are output to encoder 7. Thereafter, encoding is performed according to a predetermined encoding method.
In this embodiment, integration is performed in addition to partitioning, so each region can be expressed as a set of rectangular blocks of various sizes. For example, an object within an image having large movements can be integrated into a single region having a shape similar to the outline of the object. As a result, the amount of code can be controlled by changing the quantization parameter for each object, enabling flexible handling of images based on their actual structures. Furthermore, optimum region partitioning which minimizes the encoding distortion is achieved under a fixed amount of code. Thus, compared to the conventional moving image encoding apparatus, higher image quality can be achieved with a smaller amount of code.
Although the initial partitioning in this embodiment was terminated at the end of the 2nd partitioning stage, it may of course be terminated at another stage. For example, if the overall movement of the image is small, the initial partitioning may be terminated at the 1st stage and, if not, the number of stages may be increased. Furthermore, although image frames were encoded in this embodiment, this encoding may also be applied in a similar manner to a rectangular image area including an object of arbitrary shape in the image frame.
In the described encoder 7 and provisional encoder 22, the encoding of Skn was performed through a combination of DCT and linear quantization. However, other encoding methods, such as vector quantization, sub-band encoding, or wavelet encoding, may be used. Multiple encoding methods may be prepared, and a configuration that selectively uses the method having the best encoding efficiency may be employed.
Although the prediction error power was adopted as the activity in this embodiment, the other examples given below may also be considered.
A first example is the dispersion value within the region. The dispersion value expresses the complexity of the pixel distribution of the region, and it becomes larger for a region that includes portions where pixel values, such as at edges, vary suddenly. The dispersion value σs is given by the following formula, where the pixel value within region S is fs(x, y, t), the mean of the pixel values within region S is μs, and N is the number of pixels in region S:

σs = (1/N) Σ(x, y)∈S {fs(x, y, t) − μs}²

By this activity, regions can be partitioned according to the complexity of the local structure of the image, and control is possible for detailed encoding of portions where pixel values change drastically and rough encoding of portions where pixel values change minimally.
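A brief sketch of this activity, assuming the region's pixels are available as a NumPy array and following the per-pixel normalization written above:

```python
import numpy as np

def dispersion_activity(region_pixels):
    """Dispersion of the pixel values in a region: large where the local
    structure is complex (e.g. around edges), small where it is flat."""
    f = np.asarray(region_pixels, dtype=float)
    return float(np.mean((f - f.mean()) ** 2))
```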
A second example is the edge intensity within the region. The edge intensity can be computed using a Sobel operator, as described in "Edge detection by compass gradient masks" by G. Robinson (Journal of Computer Graphics and Image Processing, Vol. 6, No. 5, October 1977), as the number of pixels distributed on edges or the edge distribution area. With this method, regions can be partitioned according to the edge structure of the image, and control is possible for detailed encoding of portions where edges are located and rough encoding of portions where edges do not exist.
As a third example, the magnitude of the motion parameter based on motion-compensated prediction of the region can be used. The motion parameter is obtained as a result of motion-compensated prediction; it corresponds to the vector v in the block matching method. According to this method, regions can be partitioned according to the degree of motion of the image, and control is possible for detailed encoding of portions where localized large movements occur, such as object regions, and rough encoding of portions where movements rarely occur, such as background regions.
A fourth example is the linear sum of the amount of code of the motion parameter based on motion-compensated prediction of the region and the prediction error power. The evaluation value in this case may be defined by the following formula:
Lmc = Dmc + λRmc   Formula 2
Here, Dmc is the prediction error power determined in the course of motion parameter detection, λ is a constant, and Rmc is the amount of code of the motion parameter. The motion parameter minimizing Lmc is determined, and the evaluation value at that time is set as the activity. According to this method, regions are partitioned so as to lower the total encoding cost, including the amount of information of the motion parameter and the amount of information based on the complexity of motion of the image, enabling the partitions to be encoded with a small amount of information.
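For illustration, this activity can be computed by scoring each candidate motion parameter with Formula 2 and keeping the minimum; the candidate interface below (vector, prediction error power, amount of code) is a hypothetical simplification of the motion parameter search:

```python
def motion_cost_activity(candidates, lam):
    """Return the motion parameter minimizing Lmc = Dmc + lam * Rmc and the
    minimum cost itself, which serves as the activity of the region."""
    best_v, l_min = None, float('inf')
    for v, d_mc, r_mc in candidates:
        l_mc = d_mc + lam * r_mc
        if l_mc < l_min:
            best_v, l_min = v, l_mc
    return best_v, l_min
```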
A fifth example is the linear sum of the activity values. By performing appropriate weighting for each activity, it becomes possible to handle a variety of images.
Although the initial partitioning is performed in partitioning processing section 12 in this embodiment, this section or its equivalent can also be provided outside region partitioning section 2. With that arrangement, the initial partitioning is done outside the moving image encoding apparatus shown in FIG. 3, and a pre-partitioned image is directly input to region partitioning section 2.
Second Embodiment
This embodiment relates to an apparatus wherein region partitioning section 2 of the first embodiment has been partially modified. FIG. 16 is an internal block diagram of region partitioning section 2 in this embodiment. As shown in this diagram, region partitioning section 2 of the second embodiment has a configuration wherein partitioning processing section 12 of FIG. 5 has been replaced by uniform partitioning section 15. As shown in FIG. 17, a threshold judgment of the activity is not performed in the initial partitioning process in this configuration, and uniform partitioning into square blocks of the minimum region area is performed unconditionally. This minimum region area may be made selectable.
Setting of a threshold is unnecessary in this embodiment, and region partitioning is performed using only the amount of code-distortion cost as the evaluation value. Therefore, the procedure associated with threshold setting becomes unnecessary, as do the activity calculation and comparison judgment processing. Thus, this embodiment can be used in place of the first embodiment in order to lighten the computational load relating to these processes.
Third Embodiment
In the partitioning process of this embodiment, the judgment as to whether or not to partition is made based not only on the activity but also on an index (hereinafter called a class) indicating the importance of the region. It is preferable to perform detailed encoding for regions having high importance and to reduce their region areas, while regions having low importance are made as large as possible so as to reduce the amount of code per pixel.
The activity is, for example, a closed, local statistical value within a region. The classes in this embodiment, on the other hand, are based on features of the image spanning multiple regions. Here, the classes are defined on the basis of the degree to which a person observes the region, namely, the person's degree of observation, due to the object structure traversing the region. For example, when the edge distribution of a given region spans a wide range and the connection with adjacent regions is strong, it is highly probable that the region is located at the boundary of an object.
FIG. 18 is an internal block diagram of partitioning processing section 12 in this embodiment. Apart from what is shown, the configuration is identical to that of the first embodiment, and the following description centers on the differences from the first embodiment. In the same diagram, numeral 29 indicates a class identifying section, numeral 30 indicates a class identifier, and numeral 31 indicates a partitioning judgment section. FIG. 19 is a flowchart showing an operation of partitioning processing section 12 shown in FIG. 18.
As shown in FIG. 19, uniform partitioning (S26) is first performed. Thereafter, class 30 of each region is determined (S27) by class identifying section 29. Class identifying section 29 determines the class by evaluating the magnitude α of the dispersion within the region, the state β of the edge distribution within the region (including edge direction and distribution area), and the connectivity γ of the edges with adjacent regions. For example, a region having a dispersion α that is less than a predetermined value is set as the lowest class (class A), while the edge distribution β within the region is further examined for regions having a dispersion α that is larger than the predetermined value. The determination of β can be accomplished, for example, by the previously mentioned Sobel operator. If β is less than a predetermined value, the region is considered to be a small area having an independent edge rather than an object boundary, and is set as an intermediate class (class B). When β is large to a certain extent, the connectivity γ is evaluated, and if γ is large, the region is classified into the most important class (class C).
After classification into classes, activity 17 is calculated in activity calculating section 16, and a threshold judgment relating to the activity is first performed (S28) by partitioning judgment section 31. For a region judged here to require partitioning, a judgment is then made as to whether partitioning is permitted based on class 30 (S29). For this purpose, partitioning judgment section 31 holds a criterion, defined in advance, of the size down to which a region of each class may be partitioned. If partitioning is permitted for the class, the region is partitioned (S30). This is performed for all regions, and the same partitioning process is also performed for the newly created partitioned regions (S33 to S38).
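For illustration, the class identification and the class-dependent partitioning permission can be sketched as follows; the threshold values and the minimum block size allowed per class are hypothetical, not values prescribed by this embodiment:

```python
MIN_SIZE_PER_CLASS = {'A': 32, 'B': 16, 'C': 8}   # hypothetical per-class size limits

def identify_class(alpha, beta, gamma, alpha_th=100.0, beta_th=10.0, gamma_th=0.5):
    """Classes: A = flat region, B = isolated edge, C = likely object boundary."""
    if alpha < alpha_th:
        return 'A'
    if beta < beta_th:
        return 'B'
    return 'C' if gamma > gamma_th else 'B'

def may_partition(size, region_class, activity, activity_th):
    """Partition only when the activity demands it (S28) and the class
    criterion still permits blocks smaller than the current size (S29)."""
    return activity > activity_th and size > MIN_SIZE_PER_CLASS[region_class]
```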
According to this embodiment, the encoding of images can be performed while taking into consideration the features of images spanning multiple regions, particularly the outlines of objects. Control is possible so that regions with a low degree of observation are roughly encoded to reduce the amount of information, and the amount of information reduced is applied to regions having a high degree of observation.
Fourth Embodiment
The degree of observation of the person was employed in class determination in the third embodiment. In this embodiment, features of a known image are stored, and classes are determined according to the degree of coincidence between the stored features and the features calculated from each region.
For example, for images of faces, considerable research has been conducted, and many techniques have been proposed for digitizing face structures. Once these features are stored, a person's face (generally having high importance) can be detected from within the image. For other objects, there are also many instances where they can be described by features based on luminance and texture information. In order to clearly express a person's face, the region having features coinciding with features of the person's face is set as the most important class A, while other regions are set as class B of normal importance.
FIG. 20 is a block diagram of class identifying section 29 in this embodiment. The other blocks are equivalent to those in the third embodiment. In FIG. 20, numeral 32 indicates a features memory, numeral 33 indicates a degree of feature coincidence calculating section, and numeral 34 indicates a class determination section.
Features memory 32 holds the features relating to objects for each object classified into classes. Degree of feature coincidence calculating section 33 calculates the degree of coincidence of input image 1 with the features of the objects classified into classes. The degree of coincidence is determined, for example, as an error between the features of input image 1 and the features within features memory 32. Next, the object having the highest degree of coincidence is detected by class determination section 34, and the concerned regions are classified into that object class.
According to this embodiment, the identification or detection of objects becomes possible depending on features of the image. Image quality can be further improved where necessary. The classification of objects into classes may be performed according to the features associated with the person's degree of observation, in which case encoding can be performed while taking into consideration human visual characteristics with respect to the image.
Fifth Embodiment
Encoding distortion during the integration process was taken into consideration in the first embodiment. In this embodiment, encoding distortion in the partitioning process stage is taken into consideration.
FIG. 22 is an internal block diagram of partitioning processing section 12 in this embodiment, wherein numeral 35 indicates a partitioning judgment section and numeral 36 indicates a partitioning process iteration instruction signal. FIG. 23 is a flowchart showing an operation of partitioning processing section 12 of FIG. 22.
Partitioning processing section 12 of this embodiment employs Formula 1, which was introduced in the first embodiment. Through the use of this formula, the initial partitioning process is performed in a direction of reducing the summation of L(Skn) within the frame, so that the encoding distortion can be reduced when the same amount of code is used.
As shown in FIG. 23, uniform block partitioning is first performed (S39) in uniform partitioning section 15 so that, for example, the state of FIG. 8 is obtained. This corresponds to the 0th partitioning stage. The number of blocks obtained at this time is denoted by N0 and each block by B0n (1≦n≦N0). A judgment is made for each B0n as to whether or not to perform further block partitioning. A comparison is made between L(B0n) relating to B0n and the summation of L(SB0n(i)) relating to each sub-block SB0n(i) (1≦i≦4) obtained after B0n is partitioned into four parts. Partitioning is permitted if the latter is smaller.
In calculating the amount of code-distortion cost, encoding of B0n and SB0n(i) is first performed in provisional encoder 22. Next, in decoder 23, the local decoded images of B0n and SB0n(i) are generated from the encoded data obtained from provisional encoder 22. Next, the distortions between the local decoded images and the original image, D(B0n) and D(SB0n(i)), are calculated by encoding distortion calculating section 24. Evaluation value calculating section 25 calculates L(B0n) and L(SB0n(i)) from the amounts of code R(B0n) and R(SB0n(i)) and the encoding distortions D(B0n) and D(SB0n(i)) (S40, S41).
Partitioning judgment section 35 compares L(B0n) with the summation of L(SB0n(i)) over the four sub-blocks (i=1, 2, 3, 4) (S42), and partitions B0n into the four parts SB0n(i) if the latter is smaller (S43). This corresponds to the 1st partitioning stage. The blocks partitioned into SB0n(i) are newly denoted by B1n (1≦n≦N1), and the same partitioning judgment is performed with respect to B1n (S46 to S51). Subsequently, the same partitioning process is performed a predetermined number of times. The partitioned state shown in FIG. 10, for example, is achieved as a result.
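For illustration, the cost-based split test of this embodiment can be sketched as a recursion over sub-blocks; cost() stands for the provisional encode/decode pass yielding L(·) = D(·) + λR(·), and the helper names and stage limit are hypothetical:

```python
def cost_based_partition(block, cost, split_into_four, stages_left=2):
    """Split a block into four sub-blocks only when the summed cost of the
    sub-blocks is smaller than the cost of the block itself (S40 to S43)."""
    if stages_left == 0:
        return [block]
    sub_blocks = split_into_four(block)
    if sum(cost(sb) for sb in sub_blocks) >= cost(block):
        return [block]                    # keeping the block is at least as cheap
    result = []
    for sb in sub_blocks:
        result += cost_based_partition(sb, cost, split_into_four, stages_left - 1)
    return result
```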
Since activity-related operations are not performed in this embodiment, this embodiment is particularly advantageous if importance is placed on reducing the amount of operations.
Sixth Embodiment
Another example of integration processing section 14 shown in FIG. 11 of the first embodiment is described. FIG. 24 is an internal block diagram of integration processing section 14 of this embodiment, wherein numeral 37 indicates a quantization parameter setting section, numeral 38 indicates a quantization parameter, and numeral 39 indicates a provisional encoder. The operation of integration processing section 14 is basically the same as shown in FIG. 12, with the exception of S19.
FIG. 25 is a flowchart showing the process of evaluation value calculation corresponding to S19. The evaluation value calculation is performed by provisional encoder 39, decoder 23, encoding distortion calculating section 24, and evaluation value calculating section 25.
First, an initial parameter value is set in quantization parameter setting section 37 and output (S52) to provisional encoder 39. Next, encoding of region Skn is performed (S53) in provisional encoder 39. During encoding, quantization is performed using the set quantization parameter.
Decoder 23 generates the local decoded image of Skn from the encoded data obtained in this manner (S54). Next, the distortion D(Skn) between the local decoded image and the original image is calculated (S55) at encoding distortion calculating section 24. Evaluation value calculating section 25 calculates L(Skn) from the amount of code R(Skn) and the encoding distortion D(Skn) (S56). The value of the cost obtained from the initial calculation is held as Lmin, after which the quantization parameter is varied and the same cost calculation is performed. Because varying the quantization parameter changes the balance between the amount of code and the distortion, the parameter for which the amount of code-distortion cost is at a minimum is employed, and the resulting minimum becomes the amount of code-distortion cost L(Skn) of region Skn (S57 to S60). The remainder is the same as in the first embodiment.
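For illustration, the quantization parameter sweep can be sketched as follows; encode(), decode(), and distortion() stand for provisional encoder 39, decoder 23, and encoding distortion calculating section 24, and the candidate parameter values are hypothetical:

```python
def region_cost_with_qp(region, encode, decode, distortion, lam,
                        q_params=(4, 8, 12, 16, 20, 24, 28, 31)):
    """Evaluate L(Skn) = D + lam * R for each quantization parameter and keep
    the parameter giving the minimum cost (S52 to S60)."""
    l_min, best_q = float('inf'), None
    for q in q_params:
        bits, encoded = encode(region, q)          # provisional encoding with parameter q
        d = distortion(region, decode(encoded))    # distortion of the local decoded image
        l = d + lam * bits
        if l < l_min:
            l_min, best_q = l, q
    return l_min, best_q
```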
According to this embodiment, an optimum integration process is achieved while taking into consideration the quantization parameter. This method of including the quantization parameter is also applicable to the partitioning process based on the amount of code-distortion cost described in the fifth embodiment.
Seventh Embodiment
Yet another example of the sixth embodiment is described in this embodiment. FIG. 26 is an internal block diagram of integration processing section 14 of this embodiment, wherein numeral 40 indicates a motion-compensated prediction cost calculating section, numeral 41 indicates a motion-compensated prediction cost, and numeral 42 indicates a provisional encoder.
Provisional encoder 42 uses encoding based on motion-compensated prediction to determine the motion parameter. At this time, the motion-compensated prediction cost (Formula 2) described in the first embodiment is used. In other words, the motion parameter during provisional encoding is determined so that the cost is minimized by striking a balance between the motion-compensation matching distortion and the amount of code of the motion parameter. In concrete terms, in the encoding by provisional encoder 42, the motion parameter is determined from the value of the cost calculated by motion-compensated prediction cost calculating section 40. The remainder of the process is similar to that of the sixth embodiment.
According to this embodiment, from a given constant λ, the region shape can be determined while minimizing the overall amount of code-distortion cost from motion compensation to encoding. As a result, the encoding distortion based on a predetermined amount of code can be reduced.
Eighth Embodiment
In this embodiment, a moving image decoding apparatus is described for decoding encoded bit streams that are generated by the various moving image encoding apparatuses. FIG. 27 shows a configuration of the decoding apparatus, wherein numeral 43 indicates a bit stream analyzer, numeral 44 indicates a region shape decoder, numeral 45 indicates an attribute information decoder, numeral 46 indicates an image data decoder, numeral 47 indicates a motion information decoder, numeral 48 indicates a motion parameter, numeral 49 indicates a motion compensation section, numeral 50 indicates a prediction image, numeral 51 indicates an image decoder, numeral 52 indicates an external memory, and numeral 53 indicates a reproduced image.
This decoding apparatus decodes encoded bit streams consisting of region shape information representing the region partitioned state of an image frame or of a partial image within an image frame (hereinafter referred to as "image frames and the like"), image data of regions encoded by a predetermined method, attribute information of regions, and motion information of regions; restores the region images; and reproduces the image frames and the like.
For this embodiment, the description method for region shape information differs from general conventional methods in that non-rectangular regions are generated in the process of encoding. The description method employed in this embodiment is based on
i) explicitly describing the coordinates of the vertices of each region, or
ii) explicitly describing the partitioning and integration process performed during encoding,
or the like. In the method of ii), for example, the numbers of the regions partitioned in the ith partitioning stage and the numbers of the regions integrated in the jth integration stage are noted for arbitrary i and j. As in the encoding apparatus, the 0th partitioning stage is first performed according to FIG. 8 at the decoding apparatus, after which the final partitioned state can be restored by following the same procedure as the encoding apparatus. With the method of ii), the amount of data is generally small compared to a method of directly noting the coordinate data.
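For illustration, restoring the region shapes by method ii) can be sketched as replaying the partition/integration history; the syntax assumed below (a list of split labels per partitioning stage and a list of (surviving, absorbed) label pairs for the integration stages) is hypothetical, and sub-blocks simply receive fresh labels as they are created, a simplification of the encoder's scan-order relabeling:

```python
def restore_region_shapes(splits_per_stage, merges, frame_w, frame_h):
    """Replay a hypothetical partition/integration history to recover region
    shapes. Regions are kept as sets of (top, left, size) blocks keyed by
    integer labels."""
    regions, label = {}, 1
    for top in range(0, frame_h, 32):            # 0th stage: uniform 32x32 blocks
        for left in range(0, frame_w, 32):
            regions[label] = {(top, left, 32)}
            label += 1
    for split_labels in splits_per_stage:        # replay each partitioning stage
        for lab in split_labels:
            (t, l, s), = regions.pop(lab)        # a split block is still a single square
            half = s // 2
            for dy in (0, half):
                for dx in (0, half):
                    regions[label] = {(t + dy, l + dx, half)}
                    label += 1
    for keep, absorb in merges:                  # replay each integration stage
        regions[keep] |= regions.pop(absorb)
    return regions
```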
FIG. 28 is a flowchart showing an operation of this decoding apparatus. Encoded bit stream 11 is first input to bit stream analyzer 43, wherein the bit stream is converted to encoded data (S61). Among the encoded data, the region shape information is decoded in region shape decoder 44, and the region partitioned state of the image frames and the like is restored (S62) using the above-mentioned method. Restoring the regions identifies the sequence in which the region information encoded in the subsequent bit stream appears. The regions are designated Sn.
Next, the data of the regions is decoded in sequence from the bit stream according to the encoded sequence. First, the attribute information for region Sn is decoded by attribute information decoder 45, and the encoding mode information for the region is decoded (S63). If the current mode is inter-mode (inter-frame encoding mode), namely, a mode in which the prediction error signal is encoded (S64), motion parameter 48 is decoded in motion information decoder 47 (S65). Motion parameter 48 is sent to motion compensation section 49 and, based on it, motion compensation section 49 calculates the memory address corresponding to the prediction image among the reference images stored in external memory 52 and retrieves prediction image 50 from external memory 52 (S66). Next, the image data for region Sn is decoded in image data decoder 46 (S67). In the case of inter-mode, the decoded image data and prediction image 50 are added to obtain the final reproduced image for region Sn.
On the other hand, in the case of intra-mode (intra-frame encoding mode), the decoded image data directly becomes the final reproduced image 53 for region Sn. The reproduced image is used as the reference image for subsequent prediction image generation and is therefore written to external memory 52. This judgment and the restoration of the reproduced image are performed in image decoder 51 (S68).
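For illustration, the per-region decoding branch (S63 to S68) can be sketched as follows; the bitstream and memory interfaces are hypothetical stand-ins for attribute information decoder 45, image data decoder 46, motion information decoder 47, motion compensation section 49, and external memory 52:

```python
def decode_region(region, bitstream, memory):
    """Decode one region: inter regions add a motion-compensated prediction to
    the decoded prediction error, intra regions use the decoded data directly;
    the reproduced image is written back as a reference for later frames."""
    mode = bitstream.read_encoding_mode(region)                   # S63
    if mode == 'inter':                                           # S64
        motion = bitstream.read_motion_parameter(region)          # S65
        prediction = memory.fetch_prediction(region, motion)      # S66
        reproduced = bitstream.read_image_data(region) + prediction  # S67 plus addition
    else:                                                         # intra mode
        reproduced = bitstream.read_image_data(region)
    memory.write(region, reproduced)                              # S68
    return reproduced
```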
The series of processes terminates when it has been performed for all regions included in the image frames and the like. Similar processes may also be performed for other subsequent image frames and the like.
While there have been described what are at present considered to be preferred embodiments of the invention, it will be understood that various modifications may be made thereto, and it is intended that the appended claims cover all such modifications as fall within the true spirit and scope of the invention.