Details of the technology are described below.
The structural compression of this technology for webcasting compresses each packet rather than the whole file. Generally, compressing an image calls for a compression algorithm. The reason why the structural compression causes less image distortion is that it uses a Fourier transformation algorithm. A Fourier transformation disassembles a function into a number of continuous spectra of its frequency components and assembles the continuous spectra back into the original function, which may be expressed as a numeric integration from minus infinity to infinity.
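For reference, the Fourier transform pair referred to above can be written as follows (a standard textbook form; the sign and scaling conventions shown here are one common choice and are not taken from the present process):

F(w) = integral from -infinity to +infinity of f(t) * exp( -i * w * t ) dt
f(t) = ( 1 / ( 2 * pi ) ) * integral from -infinity to +infinity of F(w) * exp( i * w * t ) dw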
Although conceptually 100% transformation efficiency should be achieved when integrating from minus infinity to infinity, in practice the integration can only be carried out up to a finite value. "Infinity" is not a finite or real value but a hypothetical concept; therefore the error between the finite value and infinity causes distortion of the original image.
Therefore reproducing the original image becomes more difficult as the compression level increases. The further a video file is compressed, the greater the image distortion becomes. This is an inherent characteristic of compression processing. Suppose, for instance, the distortion rate is 5%. When compressing an object containing 100,000 bits at 90% compression efficiency, 5,000 bits constitute the distortion information. On the other hand, compressing an object containing 1,000 bits at 90% compression efficiency would result in distortion information of only 50 bits.
The cases above indicate that the smaller the object being compressed, the fewer total bits of distortion information result. Therefore, instead of compressing an object containing a total of 100,000 bits at once, compressing each of the object's segments yields a reduced distortion level. Figure 1 illustrates this phenomenon. In Figure 1, 10 is the image before compression, 21 and 22 are the compressed images, 31 denotes compression of the whole file, and 32 denotes compression of each packet.
The structural compression also works effectively when restoring (decompressing) the data sent by webcasting. In the video streaming process, the video data webcast from the server is decompressed and then played by a media player or shown in a web browser along with the HTML files of the website. Parallel processing, in which multiple videos need to be processed, becomes highly complicated.
As illustrated in Figure 2, the parallel processing of a total of 4 tasks, Task A to Task D, progresses sequentially from Step 1. When Task A is video data processing and it takes a long time to process Steps 1, 5, and 9, the other tasks are kept on hold. A waiting time is usually assigned a specific time-out duration. When the waiting time becomes very long because Tasks A, B and C are all video data processing, the ongoing computer processing may be aborted due to the time-out. This causes a PC to freeze up during video data processing.
Generally, it takes a substantial length of time to restore (decompress) a compressed video file, and a common PC cannot simultaneously process more than 2 or 3 files. However, it takes only a short period of time for a PC to decompress a packet, and the PC can proceed to the next process without freezing up, enabling multiple decompressions at the same time.
For example, this technology allows a PC to process multiple movie windows on a single webpage simultaneously. In other words, structural compression and decompression achieves "lighter" webcasting compared with file compression and decompression. This "lightness" enables quick webpage downloading and viewing even if the top page of a website has movie segments attached. Although creating a top page with a video file has been avoided due to the "weight" of a file compressed with conventional file compression technology, a file compressed with the new technology does not burden such a page.
In the present process a flag parameter in digital image decoding is calculated. For a macroblock consisting of MxN blocks, a first operation is performed on M blocks along a first edge to obtain M first parameters, and a second operation is performed on N blocks along a second edge to obtain N second parameters. The first and second parameters are stored into corresponding locations in a first and a second buffer array. Then a flag parameter corresponding to a given block is calculated according to the corresponding values stored in the first and second buffer arrays. Calculation for all of the MxN blocks is performed in an order such that the neighboring left and upper blocks next to a given block are processed prior to the given block.
This process relates to a method for calculating a flag parameter, and more particularly to a method and a system for calculating a flag parameter in a decoding process of a digital image.
The idea of video compression is to omit certain data of an image that are imperceptible to the human eye, i.e. so-called visual redundancy. The data to be omitted are generally similar to other data in the spatial or temporal dimension, and thus can be removed according to a compression algorithm.
In brief, an image frame is divided into a plurality of rectangular areas called macroblocks (MB). The macroblocks are then encoded. First, intra-frame prediction and inter-frame prediction techniques are used to remove the similarities between images so as to obtain the residual differences. Then the residual differences are spatially transformed and quantized to remove the visual redundancy.
An important factor of the high compression efficiency in this process is the utilization of context adaptive binary arithmetic coding (CABAC). For CABAC decoding, the decoder executes a decoding operation for each block to generate a corresponding coded_block_flag parameter. The coded_block_flag parameter indicates whether a given block contains any non-zero residual difference or not. If there is no residual difference, the coded_block_flag parameter is set to "0" for the given block. Otherwise, the coded_block_flag parameter is set to "1" if there exists non-zero residual difference in the given block.
The decoding operation of the decoder is based on a context ID, which includes a base portion and an offset portion. The base portion can be obtained from a lookup table, but the offset portion is calculated from the coded_block_flag parameters. Therefore, a key point in calculating the coded_block_flag parameter is to obtain the corresponding offset portion. For calculating the offset portion, the coded_block_flag parameters of the neighboring left and top blocks next to the target block have to be determined in advance. For example, an image frame is divided into a plurality of macroblocks, and each of the macroblocks is further divided into a plurality of blocks, i.e. MxN blocks. For example, M=N=4 means the macroblock is divided into 16 blocks and each of the blocks includes 4x4 pixels. The offset value of each block is calculated according to the coded_block_flag parameters of the neighboring left and top blocks of the current block.
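By way of illustration only, the relation between the neighboring flags, the offset portion and the context ID can be sketched as below. The 1x/2x weighting of the left and top flags follows the common H.264/AVC CABAC convention and is an assumption here, not a statement of the exact decoder code of the present process.

/* Illustrative sketch: offset portion of a block's context ID derived from the
 * coded_block_flag values of its neighboring left and top blocks (each 0 or 1).
 * The 1x/2x weighting is the common H.264/AVC convention (assumed here). */
static int block_offset(int flag_left, int flag_top)
{
    return flag_left + 2 * flag_top;
}

/* Context ID = base portion (from a lookup table) + offset portion. */
static int context_id(int base, int flag_left, int flag_top)
{
    return base + block_offset(flag_left, flag_top);
}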
Because the characteristics of the neighboring left or top blocks of each block may differ (for example in block size, or inter versus intra macroblock), the present process utilizes a universal algorithm for calculating the offset value of all blocks. The present process provides a simplified method and device for calculating a flag parameter.
The present process provides a method for calculating a flag parameter in digital image decoding, wherein the method comprises: receiving a macroblock comprising MxN blocks; performing a first operation on M blocks along a first edge in a first direction to obtain corresponding M first parameters, and storing the first parameters into corresponding locations in a first buffer array; performing a second operation on N blocks along a second edge in a second direction to obtain corresponding N second parameters, and storing the second parameters into corresponding locations in a second buffer array; calculating, by a third operation, a flag parameter corresponding to a given block according to the corresponding values stored in the first and second buffer arrays; storing the flag parameter into the location corresponding to the neighboring right block next to the given block in the first buffer array and the location corresponding to the neighboring lower block next to the given block in the second buffer array; and repeating these steps for each of the MxN blocks in the order from blocks along the left most edge, to blocks along the top most edge, blocks along the second left edge, blocks along the second top edge, and so on.
The present process also provides a system for calculating a flag parameter in a digital image decoding, wherein the system comprises: a buffer configured with a plurality of arrays for storing flag parameters of a plurality of blocks within a macroblock into corresponding locations; and a computation unit coupled to the buffer and configured to perform a plurality of operations to obtain offset values corresponding to the plurality of blocks based on the flag parameters stored in the buffer.
The present process also provides a method for updating flag parameters in CABAC processing, wherein the method comprises: providing two buffer arrays having locations corresponding to a plurality of blocks respectively; storing first parameters corresponding to a first plurality of blocks along a left most edge of the macroblock in the first buffer array; storing second parameters corresponding to a second plurality of blocks along a top most edge of the macroblock in the second buffer array; and updating the first and second buffer arrays with flag parameters corresponding to the plurality of blocks obtained by an operation performed on each of the plurality of blocks within the macroblock in a specific order, wherein the specific order is arranged such that the neighboring left and upper blocks next to a given block are processed prior to the given block.
In the present process, for calculating the offset value of a block, the coded_block_flag parameters of the two neighboring left and top blocks have to be determined first. For example, the offset value of block 3 is determined according to the coded_block_flag parameters of block 1 and block 2; the offset value of block 6 is determined according to the coded_block_flag parameters of block 3 and block 4; the offset value of block 7 is determined according to the coded_block_flag parameters of block 5 and block 6; and so on. However, it can be observed that the determination of the coded_block_flag parameter for blocks 3, 6, 7, 9, 11, 12, 13, 14 and 15 does not require complex processing, and the necessary information is easier to obtain. The present process makes use of this observation to simplify the processing of coded_block_flag parameters.
In the present process, when the block processing unit receives a digital image, the digital image may comprise a plurality of macroblocks, and each of the macroblocks further comprises MxN blocks.
The buffer includes two buffer arrays corresponding to the blocks, for storing the coded_block_flag parameters of the neighboring left and upper blocks next to a given block, respectively. In the present process, each of the buffer arrays includes 16 units, and each of the units may store 1 bit of data; that is, a total of 32 bits can be stored. The pair of coded_block_flag parameters corresponding to the given block together forms the offset value of the given block. The offset value is then combined with the corresponding base value of the given block so as to determine the coded_block_flag parameter.
The computation unit is coupled to the block processing unit and the buffer, and is configured to implement various operations. In the present process, the computation unit may perform a first operation, a second operation and a third operation on the current macroblock being processed to obtain the related coded_block_flag parameters. The macroblock comprises MxN blocks, such as 4x4 blocks. The first operation is performed on blocks along a first edge in a first direction, the second operation is performed on blocks along a second edge in a second direction, and the third operation is performed on every block of the current macroblock in a specific order. The resulting parameters are written and/or updated to the corresponding locations in the buffer arrays.
As explained, for blocks lying within a macroblock, the coded_block_flag parameters of the neighboring left and upper blocks need not be obtained by the conventional method; previously determined parameters can be used instead. But for blocks lying on the left most and top most edges of the macroblock, reference must be made to the macroblocks to the left or above (i.e. macroblock A and/or B) of the current macroblock. As a result, once the coded_block_flag parameters of the blocks on the left most and top most edges are determined, the calculation for the remaining blocks may be simplified by using these determined parameters without repeating the same processing for every block. Efficient computation and reduced processing time can thus be achieved.
In the first operation of the present process, M blocks on the first edge in the first direction are processed to obtain respective first parameters. The first parameters represent the coded_block_flag parameters of the neighboring left blocks next to the M blocks respectively. These M blocks are blocks 0, 2, 8 and 10 on the left most edge. For each of blocks 0, 2, 8 and 10, the coded_block_flag parameter of the neighboring left block (lying within macroblock A) is calculated. The resulting first parameters are then written to corresponding locations in the buffer respectively. Here the conventional (universal) algorithm can be adopted for the first operation.
Next, the second operation is performed on N blocks on the second edge in the second direction to obtain respective second parameters. The second parameters represent the coded_block_flag parameters of the neighboring upper blocks (lying within macroblock B) next to the N blocks. In the present process the second edge is taken as the top most edge, having blocks 0, 1, 4 and 5. The resulting second parameters are then written to corresponding locations in the buffer. Determination of the second operation for a given block is described as follows (a code sketch follows the list).
1. In the case that the neighboring upper macroblock B does not exist, the coded_block_flag parameter of the neighboring upper block next to blocks 0, 1, 4, 5 is set to 0.
2. In the case that macroblock B is an inter-MB, the current macroblock is an intra-MB transmitted by data partition, and the constrained intra prediction flag (constrained_intra_pred_flag) corresponding to the given block is set to "1", the coded_block_flag parameter of the neighboring upper block next to blocks 0, 1, 4, 5 is set to 0.
3. In the case that macroblock B is encoded in intra-frame pulse code modulation (I-PCM), the coded_block_flag parameter of the neighboring upper block next to blocks 0, 1, 4, 5 is set to 1.
4. In the case that macroblock B consists of four blocks B0, B1, B2 and B3, each having 8x8 pixels:
For blocks 0 and 1, the coded_block_flag parameter of block B2 is used and written when bit 2 of the coded block pattern (CBP) of macroblock B is set to 1; otherwise, the coded_block_flag parameters are set to 0.
For blocks 4 and 5, the coded_block_flag parameter of block B3 is used and written when bit 3 of the CBP of macroblock B is set to 1; otherwise, the coded_block_flag parameters are set to 0.
5. In the case that none of the above conditions is true and macroblock B consists of sixteen blocks, each having 4x4 pixels:
For blocks 0 and 1, the coded_block_flag parameters of blocks B10 and B11 are used respectively when bit 2 of the CBP of macroblock B is set to 1; otherwise, the coded_block_flag parameters are set to 0.
For blocks 4 and 5, the coded_block_flag parameters of blocks B14 and B15 are used and written to nB[4] and nB[5] respectively when bit 3 of the CBP of macroblock B is set to 1; otherwise, the coded_block_flag parameters are set to 0.
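The case analysis above can be summarized with the following sketch. The struct layout, its field names and the second_op() helper are illustrative assumptions introduced here for clarity; the CBP bit positions and block indices follow the list above, and the returned value is what would be stored in the second buffer array (nB) for the given top-edge block.

/* Sketch of the second operation for one top-edge block (block 0, 1, 4 or 5).
 * mbB describes the neighboring upper macroblock B; the type and its fields
 * are illustrative only, not actual decoder data structures. */
typedef struct {
    int exists;         /* macroblock B is present                          */
    int is_inter;       /* macroblock B is an inter-MB                      */
    int is_ipcm;        /* macroblock B is encoded in I-PCM                 */
    int has_8x8_blocks; /* macroblock B consists of four 8x8 blocks         */
    int cbp;            /* coded block pattern of macroblock B              */
    int cbf[16];        /* coded_block_flag values of macroblock B's blocks */
} MBInfoB;

static int second_op(const MBInfoB *mbB,
                     int blk,                          /* one of 0, 1, 4, 5 */
                     int cur_is_intra_partitioned,     /* current MB is intra and sent by data partition */
                     int constrained_intra_pred_flag)
{
    int cbp_bit = (blk < 4) ? 2 : 3;                   /* bit 2 for blocks 0,1; bit 3 for blocks 4,5 */

    if (!mbB->exists)                                  /* case 1 */
        return 0;
    if (mbB->is_inter && cur_is_intra_partitioned
        && constrained_intra_pred_flag)                /* case 2 */
        return 0;
    if (mbB->is_ipcm)                                  /* case 3 */
        return 1;
    if (!(mbB->cbp & (1 << cbp_bit)))                  /* CBP bit not set: flag is 0 (cases 4 and 5) */
        return 0;
    if (mbB->has_8x8_blocks)                           /* case 4: use B2 for blocks 0,1 and B3 for 4,5 */
        return mbB->cbf[(blk < 4) ? 2 : 3];
    /* case 5: use the bottom-row 4x4 blocks of B (B10/B11 for blocks 0,1 and B14/B15 for 4,5) */
    return mbB->cbf[(blk == 0) ? 10 : (blk == 1) ? 11 : (blk == 4) ? 14 : 15];
}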
In the present process, the second operation is much simpler than the conventional first operation because macroblock-adaptive frame/field (MBAFF) encoding allows more variety in the format of the neighboring left macroblock. In order to cope with this variety, the first operation adopts the conventional method. However, when the present process only needs to meet the baseline requirement without using full MBAFF encoding, the first operation can be simplified in the same way as the second operation so as to achieve fast computation.
After the first and second operations are performed, coded_block_flag parameters that would be referenced by blocks on the first and second edges are obtained and stored in corresponding position within the buffer. The third operation can proceed on each of the blocks by use of the stored parameters. The order of the third operation is arranged so that a given block will not be processed unless the neighboring left and upper blocks next to the given block have been processed and corresponding coded_block_flag parameters have been written or updated in the buffer arrays. Then the offset value is calculated for the given block x. Thus, the context ID of the given block can be determined as context ID(x)=base+offset(x).
In the present process the CABAC decoder may decode the context ID and thus generate the corresponding coded_block_flag parameter of the given block. Furthermore, the coded_block_flag parameter of the given block is written or updated into the corresponding locations of the buffer arrays for the use of other blocks that need to reference the value, namely the blocks to the right of and below the given block. In this way, computation can be largely reduced and performance is improved, without repeating complex processing for every block.
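The ordering constraint and the buffer updates described above can be illustrated with the sketch below. The nA[] and nB[] arrays play the role of the first and second buffer arrays (left and upper neighbor flags), the 4x4 block numbering is the one used in this description, and the decode_cbf() callback stands in for the CABAC decoding step; the neighbor tables and the plain ascending processing order are one valid choice (any order in which left and upper neighbors come first would do), matching the walkthrough that follows.

/* Block numbering within the macroblock, as used in this description:
 *      0  1  4  5
 *      2  3  6  7
 *      8  9 12 13
 *     10 11 14 15
 * right_nb[i] / lower_nb[i] give the block to the right of / below block i
 * inside the same macroblock, or -1 if there is none. */
static const int right_nb[16] = {  1,  4,  3,  6,  5, -1,  7, -1,
                                   9, 12, 11, 14, 13, -1, 15, -1 };
static const int lower_nb[16] = {  2,  3,  8,  9,  6,  7, 12, 13,
                                  10, 11, -1, -1, 14, 15, -1, -1 };

/* Third operation (sketch). On entry, nA[0], nA[2], nA[8], nA[10] hold the
 * results of the first operation and nB[0], nB[1], nB[4], nB[5] those of the
 * second operation. decode_cbf() performs the CABAC decode for a context ID. */
static void third_operation(int base, int nA[16], int nB[16],
                            int (*decode_cbf)(int ctx_id), int cbf[16])
{
    for (int b = 0; b < 16; b++) {        /* ascending index: neighbors are always done first */
        int offset = nA[b] + 2 * nB[b];   /* same weighting as in the earlier sketch (assumed) */
        int ctx_id = base + offset;
        cbf[b] = decode_cbf(ctx_id);      /* decoded coded_block_flag of block b */
        if (right_nb[b] >= 0)
            nA[right_nb[b]] = cbf[b];     /* block b is the left neighbor of that block  */
        if (lower_nb[b] >= 0)
            nB[lower_nb[b]] = cbf[b];     /* block b is the upper neighbor of that block */
    }
}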
Following the first and second operations, the coded_block_flag parameters of the neighboring left or upper blocks next to blocks 0, 1, 2, 4, 5, 8 and 10 are determined and stored in the buffer. Starting from block 0, the pair of coded_block_flag parameters corresponding to the upper-left block 0 is obtained; the computation unit then performs the third operation to obtain the offset value and further processes the offset value into the coded_block_flag parameter of block 0. The procedure is executed as follows:
1. Set the offset value of block 0 according to the parameters stored for block 0 in the two buffer arrays.
2. Calculate context ID of block 0 = base value + offset value of block 0.
3. The CABAC decoder decodes context ID of block 0 into the coded_block_flag parameter of block 0.
After the coded_block_flag parameter of block 0 is decoded, it is stored in the locations corresponding to block 1 and block 2, since block 0 is the neighboring left block next to block 1 and the neighboring upper block next to block 2. Both blocks 1 and 2 will need to reference the coded_block_flag parameter of block 0, so the value is stored for later use.
The coded_block_flag parameter of block 1 can be obtained according to the stored value of block 0 and the value determined in the second operation:
the offset value of block 1 (from the stored coded_block_flag parameter of block 0 and the second parameter stored for block 1), and
context ID of block 1 = base value + offset value of block 1.
Then the CABAC decoder decodes context ID of block 1 into the coded_block_flag parameter of block 1 and stores the value. Block 1 is the neighboring left block next to block 4 and the neighboring upper block next to block 3.
Similarly, the coded_block_flag parameter of block 2 can be obtained according to the value stored during the first operation and the stored coded_block_flag parameter of block 0:
the offset value of block 2, and
context ID of block 2 = base value + offset value of block 2.
The CABAC decoder decodes context ID of block 2 into the coded_block_flag parameter of block 2 and stores the value. Block 2 is the neighboring left block next to block 3 and the neighboring upper block next to block 8.
Accordingly, after the coded_block_flag parameters of the blocks on the left most and top most edges of a current macroblock are calculated through the first and second operations, the coded_block_flag parameters of all the blocks in the current macroblock can be obtained based on the coded_block_flag parameters of their neighboring left and upper blocks. Since the second and third operations are simplified, the efficiency of calculating the coded_block_flag parameters is improved in this process.
The present invention is explained in detail in the following by referring to Embodiments, which are not to be construed as limitative. In the present invention, a packet may be digital data corresponding to a macroblock, that is, a rectangular area obtained by dividing the image frame.
Embodiment (1): Usage of said video compression in the CAVLC/CABAC mode is described.
Definitions:
CABAC: Context-adaptive binary arithmetic coding (CABAC) is a form of entropy coding used in the present process.
CAVLC: Context-adaptive variable-length coding (CAVLC) is a form of entropy coding used in the present process.
Figure 3 is a diagram showing the placement of the CABAC and CAVLC components. In the present process, entropy coding via CABAC or CAVLC is done as a last step, after motion estimation (ME) and quantization have taken place. It is a lossless process that "writes down to the bitstream" the already-computed data (transformed and quantized coefficients, macroblock types, motion vectors, plus additional side information). In this process we decode the CAVLC/CABAC stream and re-encode it to the other format, while signalling in the slice headers that the entropy coder type has been switched. No data is lost in this process, since detail removal (which comes from quantization) need not be performed again.
Embodiment (2): Usage of said video compression in the Multi-References mode is described.
In the present process, in the Main Profile code listing, Multi-References are associated with data header files and help limit the data blocks and macroblock transactions from scene change to scene change in a given file encode. This step helps by using one reference frame as a macroblock data flag file, thus decreasing the overall file size and the final encoded output.
The header files will yield the relevant path files, as shown in the example in Figure 4.
The header provision step on those divided blocks includes the data points shown in Figure 5. The compression step for a block with header provision completed can be outlined in the recovery point of the information structure and the subsequent data fields.
Embodiment (3): Usage of said video compression in the Intra Mode macroblock types (16x16, 8x8, and 4x4) with prediction modes is described.
In the present process, when a block or macroblock is encoded in intra mode, a prediction block is formed based on previously encoded and reconstructed (but un-filtered) blocks. This prediction block P is subtracted from the current block prior to encoding. For the luminance (luma) samples, P may be formed for each 4x4 subblock or for a 16x16 macroblock. There are a total of 9 optional prediction modes for each 4x4 luma block, 4 optional modes for a 16x16 luma block, and one mode that is always applied to each 4x4 chroma block.
Figure 6 shows a luminance macroblock in a QCIF frame and a 4x4 luma block that is required to be predicted. The samples above and to the left have previously been encoded and reconstructed and are therefore available in the encoder and decoder to form a prediction reference. The prediction block P is calculated based on the samples labeled A-M in Figure 7, as follows. Note that in some cases not all of the samples A-M are available within the current slice: in order to preserve independent decoding of slices, only samples within the current slice are available for prediction. DC prediction (mode 0) is modified depending on which samples A-M are available; the other modes (1-8) may only be used if all of the required prediction samples are available (except that, if E, F, G and H are not available, their value is copied from sample D). The arrows in Figure 8 indicate the direction of prediction in each mode. For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-M. The encoder may select the prediction mode for each block that minimizes the residual between P and the block to be encoded.
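For illustration, the two simplest prediction modes can be sketched as follows. The sketch uses only the four samples above the block (A-D) and the four samples to its left (I-L); the rounding used in the DC mode is the usual convention and is assumed here.

/* Intra 4x4 prediction sketch: vertical prediction and DC prediction.
 * above[0..3] are samples A-D, left[0..3] are samples I-L, pred is block P. */
static void intra4x4_vertical(const unsigned char above[4], unsigned char pred[4][4])
{
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            pred[y][x] = above[x];                 /* extrapolate each upper sample downwards */
}

static void intra4x4_dc(const unsigned char above[4], const unsigned char left[4],
                        unsigned char pred[4][4])
{
    int sum = 4;                                   /* +4 for rounding (assumed convention) */
    for (int i = 0; i < 4; i++)
        sum += above[i] + left[i];
    unsigned char dc = (unsigned char)(sum >> 3);  /* mean of the eight neighboring samples */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            pred[y][x] = dc;
}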
The 9 prediction modes (0-8) are calculated for the 4x4 block shown in Figure 6. Figure 9 shows the prediction block P created by each of the predictions. The Sum of Absolute Errors (SAE) for each prediction indicates the magnitude of the prediction error. In this case, the best match to the actual current block is given by mode 7 (vertical-right) because this mode gives the smallest SAE; a visual comparison shows that the P block appears quite similar to the original 4x4 block.
16x16 Luma prediction modes
As an alternative to the 4x4 luma modes described above, the entire 16x16 luma component of a macroblock may be predicted. Four modes are available, shown in diagram form in Figure 10:
Mode 0 (vertical): extrapolation from upper samples (H).
Mode 1 (horizontal): extrapolation from left samples (V).
Mode 2 (DC): mean of upper and left-hand samples (H+V).
Mode 3 (Plane): a linear "plane" function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance.
8x8 Chroma prediction mode
Each 8x8 chroma component of a macroblock is predicted from chroma samples above and/or to the left that have previously been encoded and reconstructed. The 4 prediction modes are very similar to the 16x16 luma prediction modes described above and illustrated in Figure 10, except that the order of mode numbers is different: DC (mode 0), horizontal (mode 1), vertical (mode 2) and plane (mode 3). The same prediction mode is always applied to both chroma blocks. If any of the 8x8 blocks in the luma component are coded in Intra mode, both chroma blocks are Intra coded.
Encoding intra prediction modes, Data Flag Points
The choice of intra prediction mode for each 4x4 block must be signalled to the decoder and this could potentially require a large number of bits. However, intra modes for neighbouring 4x4 blocks are highly correlated. For example, if previously-encoded 4x4 blocks A and B in Figure 11 were predicted using mode 2, it is likely that the best mode for block C (current block) is also mode 2. For each current block C, the encoder and decoder calculate the most_probable_mode. If A and B are both coded in 4x4 intra mode and are both within the current slice, most_probable_mode is the minimum of the prediction modes of A and B; otherwise most_probable_mode is set to 2 (DC prediction).
The encoder sends a flag for each 4x4 block, use_most_probable_mode. If the flag is "1", the parameter most_probable_mode is used. If the flag is "0", another parameter remaining_mode_selector is sent to indicate a change of mode. If remaining_mode_selector is smaller than the current most_probable_mode then the prediction mode is set to remaining_mode_selector; otherwise the prediction mode is set to remaining_mode_selector+1. In this way, only 8 values of remaining_mode_selector are required (0 to 7) to signal the current intra mode (0 to 8).
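The signalling described above can be expressed directly in code; the sketch below shows the decoder-side reconstruction of the intra mode (the function and parameter names are illustrative).

/* most_probable_mode: minimum of the modes of blocks A and B when both are
 * available (coded in 4x4 intra mode within the current slice), else 2 (DC). */
static int most_probable_mode(int modeA, int modeB, int a_available, int b_available)
{
    if (!a_available || !b_available)
        return 2;
    return (modeA < modeB) ? modeA : modeB;
}

/* Reconstruct the intra prediction mode of the current 4x4 block from the
 * use_most_probable_mode flag and, when needed, remaining_mode_selector (0..7). */
static int decode_intra4x4_mode(int use_most_probable_mode,
                                int remaining_mode_selector, int mpm)
{
    if (use_most_probable_mode)
        return mpm;
    return (remaining_mode_selector < mpm) ? remaining_mode_selector
                                           : remaining_mode_selector + 1;
}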
Embodiment (3) Continued: Prediction steps
Figure 12 is a diagram showing the present process. In the process, we make use of a "hybrid" video encoder model in which a prediction (formed from previously-transmitted frames) is subtracted from the current frame to form a residual (e.g. motion-compensated difference frame) which is then transformed (using a block transform such as the DCT), quantized and coded for transmission. In parallel, the coded frame is reconstructed and stored for future predictions.
The data division step includes the directed encoding of the video source, which can be divided into subsets for output to the header provisions below.
Embodiment (3): Continued: Data Division Steps
Figure 13 is a sample diagram of the data division steps. The below is a sample code of the data division steps.
/*! Invisible class because of truncation */
class Invisible { };
/*! Truncated class, inheritance relation is hidden */
class Truncated : public Invisible { };
/* Class not documented with doxygen comments */
class Undocumented { };
/*! Class that is inherited using public inheritance */
class PublicBase : public Truncated { };
/*! A template class */
template<class T> class Templ { };
/*! Class that is inherited using protected inheritance */
class ProtectedBase { };
/*! Class that is inherited using private inheritance */
class PrivateBase { };
/*! Class that is used by the Inherited class */
class Used { };
/*! Super class that inherits a number of other classes */
class Inherited : public PublicBase,
protected ProtectedBase,
private PrivateBase,
public Undocumented,
public Templ<int>
{
private:
Used *m_usedClass;
};
Embodiment (3) Continued: Header Provision Steps
Figure 5 is an example of the header provisioning steps in the diagram.
Embodiment (4): Weighted prediction support module
In the weighted prediction support module, a header structure is utilized in compression, and it can be programmed to express any other call functions after the compression as needed. This structure is used in the sampling examples above.
Embodiment (5): Usage of compression in a mobile hardware device
In this example, video compression is utilized in a hardware device: a cell phone or mobile device can act as a receiver or producer of video via the built-in hardware camera or built-in modem.
In Figure 14, the video compression outlined in this application is added, in module form, to the video processing of a mobile device.
The video compression module is represented in this diagram as "A".
In this example, the code interacts with the call functions of the hardware device in the video processing module. Thus, Module "A" will take its input and instructions from the data stream assigned by the embedded instructions. If parameters such as frames per second, bitrate and output resolution are fixed in the framing code, then the Module "A" compression will act according to those instructions.
This produces a clear video stream, and acts as the gatekeeper to the video in this mobile device.
The data stream that is input to "A" represents a video stream consisting of the raw (incoming) video to be encoded. The encode sequence in this instance can have a dynamic structure.
The steps in "A" in this example, are plural in nature, and are shown below:
In the following sequence, the DCT (Discrete Cosine Transform) is used as a transformation technique to compress original input images. Pixels are broken up into small blocks, and then the compression process is applied to each block.
DCT thus transforms the way an image is expressed from pixel blocks to frequency components, preserving all information contained in the original input.
For a block to be coded, the prediction images are searched for the most similar block, and the motion between these blocks is represented by a motion vector together with prediction error information. Prediction images are produced by using motion vectors, and then each prediction error is computed.
In the present process, intra-prediction based on similarities among adjacent pixels is used to compress an image. More specifically, from an original input image, the pixel value in an 8x8 block to be coded is predicted and produced by using pre-coded adjacent pixel values.
A prediction mode is set for the luminance and chroma components, and an appropriate mode is selected by optimization. The present process intra-compresses and records this intra-prediction mode information together with the DCT of the residual image. Accurate intra-prediction can reduce the amount of data in the residual image, and thus can achieve high-efficiency compression.
In addition, because the intra-prediction process predicts within the limits of a single frame, it has an advantage over inter-frame prediction in preventing the deterioration of prediction accuracy even for volatile movement.
In the individual step, the data is handled in the following manner:
Individual steps in compression processes
To compress the image, the codec will usually remove "spatial" and "temporal" redundancies.
To remove spatial redundancy, the process will:
1. use intra-picture prediction of pixel values to build a "residual" or "difference" matrix.
2. transform the residuals, using a Discrete Cosine Transform (DCT), or equivalent, which does not compress (and may be done losslessly with adequate precision), but sets the stage for the remaining steps.
3. quantize the transformed residuals, by breaking the value range into a set of categories and transmitting only category numbers,
4. select the order in which values are transmitted (e.g., zig-zag traversal) so as to maximize "runs" of identical category numbers.
5. run-length encode the resulting series of values (that is, send the number of identical values in a "run" along with that value, instead of sending every value separately), and finally,
6. variable-length (entropy) encode the resulting stream.
This activity is (usually) performed on "blocks" of 8x8 pixels, and such blocks are sometimes aggregated into "macroblocks" of 4 blocks and/or "slices" for other purposes.
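Steps 4 and 5 of the list above (zig-zag traversal and run-length encoding) can be sketched as follows. The scan table is the standard 8x8 zig-zag order; the (value, run) output format is an illustrative choice rather than the exact representation used in the present process.

/* Standard 8x8 zig-zag scan order (index into a row-major 8x8 block). */
static const int zigzag8x8[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Traverse the quantized block in zig-zag order and emit (value, run) pairs,
 * i.e. each distinct value together with the number of times it repeats.
 * Returns the number of pairs written to out_vals/out_runs (max 64 each). */
static int run_length_encode(const int quant[64], int out_vals[64], int out_runs[64])
{
    int npairs = 0;
    int i = 0;
    while (i < 64) {
        int v = quant[zigzag8x8[i]];
        int run = 1;
        while (i + run < 64 && quant[zigzag8x8[i + run]] == v)
            run++;
        out_vals[npairs] = v;
        out_runs[npairs] = run;
        npairs++;
        i += run;
    }
    return npairs;
}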
A spatial redundancy elimination example using DCT
Given a matrix f(x,y) of dimension N by N, the Discrete Cosine Transform will generate another N by N matrix, F(u,v), as follows:
F(u,v) = ( 2 / N ) * C(u) * C(v) * sum over x = 0..N-1 and y = 0..N-1 of f(x,y) * cos( a ) * cos( b )
where
a = ( ( 2x + 1 ) * u * pi ) / 2N
b = ( ( 2y + 1 ) * v * pi ) / 2N
C(u) = 1 / sqrt( 2 ) if u = 0 else C(u) = 1
C(v) = 1 / sqrt( 2 ) if v = 0 else C(v) = 1
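A direct (unoptimized) implementation of this transform for N = 8 could look like the following sketch:

#include <math.h>

/* Direct 2-D DCT of an 8x8 block, following the formula above.
 * f is the input block of sample values, F receives the transform coefficients. */
static void dct8x8(const double f[8][8], double F[8][8])
{
    const int N = 8;
    const double pi = 3.14159265358979323846;

    for (int u = 0; u < N; u++) {
        for (int v = 0; v < N; v++) {
            double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;   /* C(u) */
            double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;   /* C(v) */
            double sum = 0.0;
            for (int x = 0; x < N; x++) {
                for (int y = 0; y < N; y++) {
                    double a = ((2 * x + 1) * u * pi) / (2.0 * N);
                    double b = ((2 * y + 1) * v * pi) / (2.0 * N);
                    sum += f[x][y] * cos(a) * cos(b);
                }
            }
            F[u][v] = (2.0 / N) * cu * cv * sum;
        }
    }
}

With this normalization, F(0,0) works out to 8 times the mean value of an 8x8 block, which matches the description of the DC coefficient below.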
Consider the history of the changes experienced by an example block of luminance values in the process. Transforming such a block via DCT produces a block in which information from the entire matrix has been concentrated into the upper left-hand corner.
In particular, the upper-left-most DCT value represents an average block value (actually 8 times the mean value).
It is called the "DC coefficient"; the other values are "AC coefficients". The DC coefficient must be sent with higher precision, so it is not included in the stream of values to be quantized, run-length encoded, and then variable-length encoded. Instead, the DC coefficients from all blocks may be combined and treated as a separate block to which the DCT transformation and quantizing and encoding steps are applied prior to transmission.
Quantizing and traversing the DCT block
The transformed example block can then be quantized using quantization tables. Note that these tables treat each location within the block separately, so as to improve subsequent encoding efficiency.
Using a zig-zag pattern starting from the upper-left corner (and going right and then down-left) will usually give the longest run(s) of terminal zeros. The entropies of the original, transformed, and quantized blocks are 5.04, 1.42, and 0.30 (so these matrices could be encoded losslessly in 6, 2, and 1 bits per pixel or less). The example can be encoded at 0.69 bits per pixel (rather than the original 8 bits per pixel) using a modified Huffman scheme, resulting in a compression ratio of 11.6 to 1.
In this process:
- subsampling at 4:2:0 would reduce the original pixel matrices to one full matrix plus two partial (chroma) matrices of one-half their original dimensions, which is one-fourth of their original sizes, or 50% of the original data,
- predicting values and encoding only differences would keep 50% of that result,
- transformation and quantizing would retain around 9% of that result,
leaving only about 2.25% of the original 109 Mbps, around 2.5 Mbps. This represents a cumulative compression ratio of somewhat more than 40:1.
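Putting the three factors above together:

0.50 * 0.50 * 0.09 = 0.0225 (about 2.25% of the original data)
109 Mbps * 0.0225 = about 2.5 Mbps
109 / 2.5 = about 44, i.e. somewhat more than 40:1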
Eliminating temporal redundancy
In addition to this intra-picture encoding, inter-picture prediction can be used to remove "temporal redundancy". A frame in this instance is a matrix of 8x8 pixel blocks, encoded using the DCT process. However, it often occurs that a block in a frame being encoded is quite similar to a block in another frame, either before the current frame or after it. As an example, consider the following sequence of 3 frames. If you want to encode the center, or "Current" frame, you can find all of the visual content in either the "Previous" frame or the "Next" frame. The front portion of the ball can be found in the "Previous" frame, and the back portion can be found in the subsequent ("Next") frame.
In the present process, we:
- point directly to a portion of a previous or subsequent frame while encoding a specific frame, and
- predict a block by using a similar block in another frame.
To implement this approach there exist several frame types:
- I (Intra) frames are built only from intra-frame data.
- P (Predicted) frames can contain pointers to blocks within the most recent previous I or P frames.
- B (Bidirectional) frames can point to blocks within the most recent previous I or P frames or the closest subsequent I or P frames.
Thus a Group of Pictures (GOP) will be an I frame followed by several sets, each composed of one P frame followed by several B frames.
For example, frames are thought of as a matrix of 16x16 pixel blocks ("macroblocks"), each of which contains the DCT-encoded data for four 8x8 blocks (along with the smaller chroma blocks) used to create that part of the image, pointers to similar blocks elsewhere in the data stream, DCT-encoded difference data relative to similar blocks elsewhere in the data stream, or some combination of these.
How similar blocks are identified
The process of searching frames for similar portions is described below. The first problem is "How can you tell how similar two visual regions are?" The approach is to develop a formula that compares pixel values at the same relative positions within each block. The approaches are to compute the:
- "mean square error" (MSE) or
- "minimum absolute difference" (MAD)
Mean Square Error is calculated as the mean of the values generated by squaring the difference between corresponding values in each block. The two most similar regions should have the smallest MSE or MAD. The next problem is "Which blocks do you compare?" The idea is to start with the block in the same relative location as the block being encoded, and then examine block-size regions within some number of pixels of that position, say 75 pixels in each direction, horizontally and vertically.
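A minimal sketch of the two measures for 16x16 regions is given below; the block size and data types are illustrative.

#define BLK 16

/* Compare two 16x16 blocks of pixel values at the same relative positions. */
static double block_mse(const unsigned char a[BLK][BLK], const unsigned char b[BLK][BLK])
{
    long sum = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++) {
            int d = (int)a[y][x] - (int)b[y][x];
            sum += d * d;                 /* squared difference */
        }
    return (double)sum / (BLK * BLK);     /* mean of the squared differences */
}

static double block_mad(const unsigned char a[BLK][BLK], const unsigned char b[BLK][BLK])
{
    long sum = 0;
    for (int y = 0; y < BLK; y++)
        for (int x = 0; x < BLK; x++) {
            int d = (int)a[y][x] - (int)b[y][x];
            sum += (d < 0) ? -d : d;      /* absolute difference */
        }
    return (double)sum / (BLK * BLK);     /* mean absolute difference */
}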
Figure 15 is a diagram showing how the encoder searches for a segment of a subsequent ("Next") frame that is similar to a block to be encoded within the "Current" frame, outlined in heavy-line (after Li).
The search begins with the block in the same position as the reference block, as described above, and proceeds in this example by moving the comparison region (shown as a square) to the right by some number of pixels at each step. The step size may be a fraction of a pixel; since pixels are discrete, this requires interpolation of the pixel values between two known pixels.
Here is a single data point presenting a clear-cut comparison in the present process:
Thus the individual steps as shown above contribute to the end compression results in "A".
Embodiment (6): Encoding setting examples, and decompression
The following settings are examples of different encoding option combinations that affect the speed vs quality tradeoff at the same target bitrate. All the encoding settings were tested on a 720x448 @30000/1001 fps video sample, the target bitrate was 900kbps, and the machine was an AMD-64 3400+ at 2400 MHz in 64 bits mode. Each encoding setting features the measured encoding speed (in frames per second) and the PSNR loss (in dB) compared to the "very high quality" setting.
In decompression, the video decoder receives the compressed bitstream, decodes each of the syntax elements, and extracts the information described above (quantized transform coefficients, prediction information, etc.). This information is then used to reverse the coding process and recreate the sequence of video images. The header information in this sequence is the main item needed to extract the compression at the destinations (see Figure 16).
Data Structures
struct RTPpacket_t
Defines
#define MAXRTPPAYLOADLEN (65536 - 40)
#define MAXRTPPACKETSIZE (65536 - 28)
#define H264PAYLOADTYPE 105
#define H264SSRC 0x12345678
#define RTP_TR_TIMESTAMP_MULT 1000
Functions
void RTPUpdateTimestamp (int tr)
void OpenRTPFile (char *Filename)
void CloseRTPFile (void)
int WriteRTPNALU (NALU_t *n)
Embodiment (7): Usage of scene cut technology
To efficiently access a large quantity of videos, scene cut detection, which divides a video into units that are easy to handle, is necessary as a base technology. Scene changes are judged comprehensively by combining changes in image information with changes in the audio signal. No data for machine learning is required. This is utilized to improve the efficiency of basic functionality and the performance of a video access system, such as cueing video by clicking a thumbnail image.
The first step for video-content analysis, content-based video browsing and retrieval is the partitioning of a video sequence into shots. A shot is the fundamental unit of a video; it captures a continuous action from a single camera and represents a spatio-temporally coherent sequence of frames. Thus, shots are considered the primitives for higher level content analysis, indexing and classification.
Figure 17 is a diagram that represents the learning-based approach for video cut detection. Feature vectors Ft, Zt, ..., Ct represent Fourier-Mellin moments, Zernike moments, and the color histogram computed from frame ft. dt = D(ft, ft+1) is the similarity distance for each feature, where D is one of the similarity measures.
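As a minimal illustration of one such feature and similarity distance, the sketch below compares the gray-level histograms of two consecutive frames and declares a cut when their normalized difference exceeds a threshold. The bin count, normalization and threshold are illustrative assumptions, not the values used in the present process.

#define BINS 64

/* Build a gray-level histogram of a frame given as an array of npix 8-bit samples. */
static void histogram(const unsigned char *pix, int npix, int hist[BINS])
{
    for (int b = 0; b < BINS; b++)
        hist[b] = 0;
    for (int i = 0; i < npix; i++)
        hist[pix[i] * BINS / 256]++;
}

/* Similarity distance dt between frames t and t+1: normalized sum of
 * absolute histogram differences (0 = identical, 1 = completely different). */
static double hist_distance(const int h0[BINS], const int h1[BINS], int npix)
{
    long diff = 0;
    for (int b = 0; b < BINS; b++) {
        int d = h0[b] - h1[b];
        diff += (d < 0) ? -d : d;
    }
    return (double)diff / (2.0 * npix);
}

/* A frame boundary is declared a scene cut when the distance exceeds a threshold. */
static int is_scene_cut(double distance, double threshold)
{
    return distance > threshold;
}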
Embodiment (8): Usage of Adaptive B-frame placement. B-frames as references / arbitrary frame order
B-frames are bi-directionally predicted frames. As shown in Figure 18, when producing B-frames the method used in our encoder can look both forwards and backwards for redundant picture information. This makes B-frames the most efficient frame type to use.
In the present process, a combined skip and inter prediction method is used in the encoder when B pictures are analyzed. The rate-distortion costs of coding or skipping a macroblock are estimated prior to processing, and a decision whether to code the macroblock or stop further processing is made based on a Lagrangian cost function. A selective intra fast mode decision in encoders is also proposed, reducing the number of candidate modes by using directional information. Coding time is reduced by 35-42% through early identification of macroblocks that are likely to be skipped during the coding process and through reducing the number of candidate modes, with no significant loss of rate-distortion performance. Thus, in the present method coding time is substantially reduced because a significant number of macroblocks are not processed by the encoder. The computational saving depends on the activity of the video sequences.
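The skip/code decision based on a Lagrangian cost function can be sketched as follows, with J = D + lambda * R; the distortion, rate and lambda values are whatever the encoder estimates and are not specified here.

/* Lagrangian cost J = D + lambda * R for one macroblock coding choice. */
static double rd_cost(double distortion, double rate_bits, double lambda)
{
    return distortion + lambda * rate_bits;
}

/* Decide whether to code the macroblock or skip it (stop further processing)
 * by comparing the estimated costs of the two choices. Returns 1 to skip. */
static int skip_macroblock(double d_skip, double r_skip,
                           double d_code, double r_code, double lambda)
{
    return rd_cost(d_skip, r_skip, lambda) <= rd_cost(d_code, r_code, lambda);
}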
Embodiment (9): Usage of 8x8 and 4x4 adaptive spatial transform
In the present process, an exact-match integer 4x4 spatial block transform is used, allowing precise placement of residual signals with little of the ringing often found with prior codec designs. This is conceptually similar to the discrete cosine transform design, but simplified and made to provide exactly-specified decoding. Also in the present process, an exact-match integer 8x8 spatial block transform can be used, allowing highly correlated regions to be compressed more efficiently than with the 4x4 transform alone. The block diagram in Figure 19 shows the relevant placement of parameters in this sequence.
Figure 20 represents the low-pass filtering of the predictor to improve prediction performance, and Table 3 represents the 8 x 8 Integer transform prediction performance.
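For illustration, a forward 4x4 exact-match integer transform of the kind referred to above can be written as Y = C * X * C^T with an integer matrix C. The specific matrix shown below is the well-known H.264-style core transform and is used here as an assumed example.

/* Forward 4x4 integer core transform Y = C * X * C^T, computed with integer
 * arithmetic only, so encoder and decoder can match results exactly. */
static const int C[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 }
};

static void transform4x4(const int X[4][4], int Y[4][4])
{
    int T[4][4];

    /* T = C * X */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            int s = 0;
            for (int k = 0; k < 4; k++)
                s += C[i][k] * X[k][j];
            T[i][j] = s;
        }

    /* Y = T * C^T */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            int s = 0;
            for (int k = 0; k < 4; k++)
                s += T[i][k] * C[j][k];     /* C^T[k][j] == C[j][k] */
            Y[i][j] = s;
        }
}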
Embodiment (10): Usage of Custom quantization matrices
In the present process, transform coefficients are quantized using a control parameter. With 8-bit source samples there are 52 quantization steps; for greater bit depths, 6 more quantization steps are used for each additional bit.
Quantization step-sizes are not linearly related to the quantization parameter in this scenario, and perceptual-based quantization scaling matrices are separate for each block size. We also utilize separate quantization scaling matrices for intra and inter prediction. The default values of matrices are specified in the code.
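For reference, the non-linear relation mentioned above is commonly expressed as a step size that roughly doubles for every increase of 6 in the quantization parameter QP (stated here as the common convention, without reproducing the exact table values):

Qstep( QP ) = Qstep( QP mod 6 ) * 2^( floor( QP / 6 ) ), with QP = 0 .. 51 for 8-bit source samples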
Figure 21 is a diagram showing the placement of the quantization matrices components.
Functions
void store_adaptive_rounding_4x4 (ImageParameters *img, int ****ARCofAdj, int mode, int block_y, int block_x)
void update_adaptive_rounding_4x4 (ImageParameters *img, int ****ARCofAdj, int mode, int block_y, int block_x)
void store_adaptive_rounding_16x16 (ImageParameters *img, int ****ARCofAdj, int mode)
void update_adaptive_rounding_16x16 (ImageParameters *img, int ****ARCofAdj, int mode)
void update_offset_params (Macroblock *currMB, int mode, int luma_transform_size_8x8_flag)
void reset_adaptive_rounding ()
void reset_adaptive_rounding_direct ()
void update_adaptive_rounding_8x8 (RD_8x8DATA *dataTr, int ****ARCofAdj)
Variables
static const int AdaptRndCrPos [2][5]
static const int AdaptRndPos [4][5]
int ** fadjust8x8 = NULL
int ** fadjust4x4 = NULL
int *** fadjust4x4Cr = NULL
int *** fadjust8x8Cr = NULL
Embodiment (11): Usage of Lossless Mode
In the present process, entropy-coded transform-bypass is utilized, to enhance the lossless representation of specific regions. In the present process we utilize the specific characteristics of images such as the common phenomenon of contiguous 2-D areas of similar tones. Every pixel but the first is replaced by the difference to its left neighbor. This leads to small values having a much higher probability than large values.
We utilize a form of Huffyuv coding. This process is similar to that of lossless JPEG-LS in that it predicts each sample and then encodes the error.
Huffyuv coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing any particular symbol is never a prefix of the bit string representing any other symbol, and the most common symbols are expressed using shorter strings of bits than less common ones. No other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code.
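A minimal sketch of the left-neighbor differencing described above is shown below; the entropy (prefix) coding of the resulting residuals would follow as a separate step.

/* Replace every pixel but the first in a row by the difference to its left
 * neighbor; small residual values become much more probable than large ones. */
static void predict_left_row(const unsigned char *row, int width, int *residual)
{
    residual[0] = row[0];                     /* first pixel is kept as-is */
    for (int x = 1; x < width; x++)
        residual[x] = (int)row[x] - (int)row[x - 1];
}

/* Exact reconstruction: add the residuals back up, so the mode stays lossless. */
static void reconstruct_left_row(const int *residual, int width, unsigned char *row)
{
    int value = residual[0];
    row[0] = (unsigned char)value;
    for (int x = 1; x < width; x++) {
        value += residual[x];
        row[x] = (unsigned char)value;
    }
}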
Figure 22 is a diagram and code block, representing the lossless mode utilized in the present system.
#include <math.h>
#include <limits.h>
#include <float.h>
#include "global.h"
#include "rdopt_coding_state.h"
#include "mb_access.h"
#include "intrarefresh.h"
#include "image.h"
#include "transform8x8.h"
#include "ratectl.h"
#include "mode_decision.h"
#include "fmo.h"
#include "me_umhex.h"
#include "me_umhexsmp.h"
#include "macroblock.h"
#include "conformance.h"
#include "vlc.h"
#include "rdopt.h"
Embodiment (12): Usage of Interlacing
Interlaced scan refers to one of two common methods for "painting" a video image on an electronic display screen (the second is progressive scan) by scanning or displaying each line or row of pixels. This technique uses two fields to create a frame. One field contains all the odd lines in the image, the other contains all the even lines of the image. A PAL based television display, for example, scans 50 fields every second (25 odd and 25 even). The two sets of 25 fields work together to create a full frame every 1/25th of a second, resulting in a display of 25 frames per second.
In the present process, interlacing is utilized in the multiple ref frames and in the weighted prediction support. Figure 23 is a diagram, and code block referencing this process.
Functions
void InitWP (ImageParameters *img, InputParameters *params)
void EstimateWPBSliceAlg0 (ImageParameters *img, InputParameters *params)
void EstimateWPPSliceAlg0 (ImageParameters *img, InputParameters *params, int offset)
int TestWPPSliceAlg0 (ImageParameters *img, InputParameters *params, int offset)
int TestWPBSliceAlg0 (ImageParameters *img, InputParameters *params, int method)
double ComputeImgSum (imgpel **CurrentImage, int height, int width)
Variables
void(* EstimateWPBSlice )(ImageParameters *img, InputParameters *params)
void(* EstimateWPPSlice )(ImageParameters *img, InputParameters *params, int offset)
int(* TestWPPSlice )(ImageParameters *img, InputParameters *params, int offset)
int(* TestWPBSlice )(ImageParameters *img, InputParameters *params, int method)