HK1215835A1 - Decoding of inter-layer reference picture set and reference picture list construction - Google Patents

Decoding of inter-layer reference picture set and reference picture list construction

Info

Publication number
HK1215835A1
Authority
HK
Hong Kong
Prior art keywords
layer
inter
picture
maximum number
reference picture
Prior art date
Application number
HK16103766.8A
Other languages
Chinese (zh)
Inventor
.德希潘德 薩琴.
萨琴.G.德希潘德
Original Assignee
夏普株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/857,990 (US9532067B2)
Application filed by 夏普株式会社
Publication of HK1215835A1

Abstract

A method for video coding is described. Signaling of a maximum number of sub-layers for inter-layer prediction is obtained. A sub-layer non-reference picture is also obtained. It is determined whether a value of a temporal identifier of the sub-layer non-reference picture is greater than the maximum number of sub-layers for inter-layer prediction minus 1. The sub-layer non-reference picture is marked as "unused for reference" if the value of the temporal identifier of the sub-layer non-reference picture is greater than the maximum number of sub-layers for inter-layer prediction minus 1. In some cases a sub-layer non-reference picture is also obtained. It is determined whether a value of a temporal identifier of the sub-layer non-reference picture is greater than the maximum number of sub-layers for inter-layer prediction. The sub-layer non-reference picture is marked as "unused for reference" if the value of the temporal identifier of the sub-layer non-reference picture is greater than the maximum number of sub-layers for inter-layer prediction.

Description

Decoding of inter-layer reference picture set and reference picture list construction
Technical Field
The present disclosure relates generally to electronic devices. More particularly, the present disclosure relates to a decoding system and method of an inter-layer reference picture set and reference picture list construction.
Background
Electronic devices are becoming smaller and more powerful to meet consumer needs and to provide portability and convenience. Consumers become dependent on electronic devices and demand more functionality. Some examples of electronic devices include: desktop computers, laptop computers, cellular phones, smart phones, media players, integrated circuits, and the like.
Some electronic devices are used to process and display digital media. For example, portable electronic devices today allow digital media to be consumed at almost any location that a consumer can go. In addition, some electronic devices may provide for the downloading or streaming of digital media content for consumer use and entertainment.
The wide popularity of digital media presents a number of problems. For example, efficiently representing high quality digital media for storage, transmission, and playback presents a number of challenges. It can be seen from this discussion that systems and methods for representing digital media more efficiently can be beneficial.
Disclosure of Invention
The above and other objects, features, and advantages of the present invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.
One embodiment of the present invention discloses a video coding method, including: obtaining signaling of a maximum number of inter-layer prediction sub-layers; starting a decoding process for an inter-layer Reference Picture Set (RPS); obtaining a value of a temporal identifier of an inter-layer picture; determining whether the value of the temporal identifier is greater than the maximum number of inter-layer prediction sub-layers minus 1; and adding the inter-layer picture to the inter-layer Reference Picture Set (RPS) if the value of the temporal identifier of the inter-layer picture is not greater than the maximum number of inter-layer prediction sub-layers minus 1.
Another embodiment of the present invention discloses an electronic device configured for video coding, comprising: a processor; a memory in electrical communication with the processor; and instructions stored in the memory, the instructions being executable to: obtain signaling of a maximum number of inter-layer prediction sub-layers; start a decoding process for an inter-layer Reference Picture Set (RPS); obtain a value of a temporal identifier of an inter-layer picture; determine whether the value of the temporal identifier is greater than the maximum number of inter-layer prediction sub-layers minus 1; and add the inter-layer picture to the inter-layer Reference Picture Set (RPS) if the value of the temporal identifier of the inter-layer picture is not greater than the maximum number of inter-layer prediction sub-layers minus 1.
Drawings
Fig. 1 is a block diagram illustrating video encoding between multiple electronic devices.
FIG. 2 is a block diagram of an image tagging module used in the systems and methods of the present invention.
Fig. 3 is a flow chart of a method for marking sub-layer non-reference pictures.
FIG. 4 is a block diagram illustrating additional images labeled "unused for reference" using the systems and methods of the present invention.
Fig. 5 is a block diagram illustrating an inter-layer Reference Picture Set (RPS) update module.
Fig. 6 is a flow chart of a method for updating an inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer).
FIG. 7 is a block diagram illustrating one configuration of an encoder on an electronic device.
Fig. 8 is a block diagram illustrating one configuration of a decoder on an electronic device.
FIG. 9 illustrates various components that may be used in a transmitting electronic device.
FIG. 10 is a block diagram illustrating various components that may be used in a receiving electronic device.
Detailed Description
A video coding method is disclosed. Signaling of the maximum number of inter-layer prediction sub-layers is obtained. A decoding process for an inter-layer Reference Picture Set (RPS) is started. A value of a temporal identifier of an inter-layer picture is obtained. It is determined whether the value of the temporal identifier is greater than the maximum number of inter-layer prediction sub-layers minus 1. The inter-layer picture is added to the inter-layer Reference Picture Set (RPS) if the value of the temporal identifier of the inter-layer picture is not greater than the maximum number of inter-layer prediction sub-layers minus 1.
An inter-layer Reference Picture Set (RPS) may be used for reference picture list construction. The inter-layer picture may have a layer identifier value corresponding to a layer that is a direct reference layer of the current layer. The inter-layer picture may have a picture order count equal to that of the current picture. If the maximum number of inter-layer prediction sub-layers has a value of 0 and the inter-layer picture is a non-RAP (random access point) picture, the inter-layer picture may not be added to the inter-layer Reference Picture Set (RPS). If the maximum number of inter-layer prediction sub-layers has a value of 0 and the inter-layer picture is a RAP (random access point) picture, the inter-layer picture may be added to the inter-layer Reference Picture Set (RPS).
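The selection rule described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the normative decoding process: the `InterLayerPicture` record, the `build_inter_layer_rps` function, and the use of a plain dictionary keyed by reference layer identifier are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InterLayerPicture:
    """Hypothetical record for a candidate inter-layer picture."""
    nuh_layer_id: int  # layer identifier of the direct reference layer
    temporal_id: int   # temporal identifier of the picture
    is_rap: bool       # True if the picture is a random access point picture


def build_inter_layer_rps(
    candidates: List[InterLayerPicture],
    max_sublayer_for_ilp_plus1: Dict[int, int],
) -> List[InterLayerPicture]:
    """Keep the candidates that the signaled limit allows as references.

    Assumption: the signaled value is looked up by reference layer id.
    """
    rps = []
    for pic in candidates:
        limit = max_sublayer_for_ilp_plus1[pic.nuh_layer_id]
        if limit == 0:
            # A value of 0 admits only RAP pictures.
            if pic.is_rap:
                rps.append(pic)
        elif pic.temporal_id <= limit - 1:
            # Temporal identifier is not greater than the limit minus 1.
            rps.append(pic)
    return rps
```

With a signaled value of 2 for a reference layer, pictures of that layer with temporal identifier 0 or 1 are kept and pictures with temporal identifier 2 or higher are left out.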
The maximum number of inter-layer prediction sub-layers may have an index [LayerIdInVps[RefLayerId[LayerIdInVps[nuh_layer_id]][i]]]. Alternatively, the maximum number of inter-layer prediction sub-layers may have an index [RefLayerId[LayerIdInVps[nuh_layer_id]][i]]. The maximum number of inter-layer prediction sub-layers may also have an index [layer_id_in_nuh[RefLayerId[LayerIdInVps[nuh_layer_id]][i]]]. The maximum number of inter-layer prediction sub-layers may also have an index [layer_id_in_nuh[i]].
The maximum number of inter-layer prediction sub-layers may also have an index i. The inter-layer reference picture list RefPicSetInterLayer may have an index [NumInterLayerRPSpics[LayerIdInVps[nuh_layer_id]]]. NumInterLayerRPSpics[LayerIdInVps[nuh_layer_id]] may be derived to be different from NumDirectRefLayers[LayerIdInVps[nuh_layer_id]].
An electronic device configured for video decoding is also disclosed. The electronic device includes a processor and a memory in electrical communication with the processor. Instructions stored in the memory may be executed to obtain signaling of a maximum number of inter-layer prediction sub-layers. The instructions stored in the memory may also be executable to begin a decoding process for an inter-layer Reference Picture Set (RPS). The instructions stored in the memory may also be executable to obtain a value of a temporal identifier of an inter-layer picture. The instructions stored in the memory may also be executable to determine whether the value of the temporal identifier is greater than the maximum number of inter-layer prediction sub-layers minus 1. The instructions stored in the memory may also be executable to add the inter-layer picture to the inter-layer Reference Picture Set (RPS) if the value of the temporal identifier of the inter-layer picture is not greater than the maximum number of inter-layer prediction sub-layers minus 1.
Various configurations are now described with reference to the figures, where like reference numbers may indicate functionally similar elements. The systems and methods generally described and illustrated in the figures herein can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the various configurations illustrated in the accompanying figures is not intended to limit the scope of the claims, but is merely illustrative of the systems and methods.
FIG. 1 is a block diagram illustrating video coding among multiple electronic devices 102a-b. A first electronic device 102a and a second electronic device 102b are shown. It should be noted, however, that in some configurations, one or more features and functions described in connection with the first electronic device 102a and the second electronic device 102b may be combined into a single electronic device 102. Each electronic device 102 may be configured to encode and/or decode video. In one configuration, each electronic device may conform to the High Efficiency Video Coding (HEVC) standard. HEVC is the successor video compression standard to H.264/MPEG-4 AVC (Advanced Video Coding) and provides better video quality and higher data compression rates. An electronic device 102 that conforms to the HEVC standard may include additional picture marking functionality, inter-layer Reference Picture Set (RPS) 120 updating functionality, and reference picture list construction functionality. Herein, an image is an array of luma samples in monochrome format, or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, or 4:4:4 color format.
The first electronic device 102a can include a video encoder 182, the video encoder 182 including an enhancement layer encoder 106 and a base layer encoder 109. The enhancement layer encoder 106 and the base layer encoder 109 will be discussed in detail below in conjunction with FIG. 7. Each of the elements included in the first electronic device 102a (i.e., the enhancement layer encoder 106 and the base layer encoder 109) may be implemented as hardware, software, or a combination of both. The first electronic device 102a may obtain an input image 104. In some configurations, the input image 104 may be captured on the first electronic device 102a using an image sensor, retrieved from memory, or received from another electronic device 102. In one configuration, the video encoder 182 may conform to the scalable high efficiency video coding (SHVC) standard or the multi-view high efficiency video coding (MV-HEVC) standard.
The enhancement layer encoder 106 may encode the input image 104 to produce encoded data. For example, the enhancement layer encoder 106 may encode a sequence of input images 104 (e.g., video). In one configuration, the enhancement layer encoder 106 may be a High Efficiency Video Coding (HEVC) encoder. In another configuration, the enhancement layer encoder 106 may be a scalable high efficiency video coding (SHVC) encoder or a multi-view high efficiency video coding (MV-HEVC) encoder. The encoded data may be included in the encoded enhancement layer video bitstream 110. The enhancement layer encoder 106 may generate overhead signaling based on the input image 104.
The base layer encoder 109 may also encode the input image 104. In one configuration, the base layer encoder 109 may use the same input image 104 as used by the enhancement layer encoder 106. In another configuration, the base layer encoder 109 may use a different (but similar) input image than the input image 104 used by the enhancement layer encoder 106. For example, for signal-to-noise ratio (SNR) scalability (also referred to as quality scalability), the enhancement layer encoder 106 and the base layer encoder 109 may use the same input image 104. As another example, for spatial scalability, the base layer encoder 109 may use a downsampled image. In yet another example, the base layer encoder 109 may use images of a different view for multi-view scalability. The base layer encoder 109 can generate encoded data for inclusion in the encoded base layer video bitstream 107. The base layer encoder 109 may also be a scalable high efficiency video coding (SHVC) encoder or a multi-view high efficiency video coding (MV-HEVC) encoder.
The encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107 may each include encoded data based on the input image 104. In one example, the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107 can include encoded image data. In some configurations, the encoded enhancement layer video bitstream 110 and/or the encoded base layer video bitstream 107 may also include overhead data, such as Sequence Parameter Set (SPS) information, Picture Parameter Set (PPS) information, Video Parameter Set (VPS) information, slice header information, and so forth.
The first electronic device 102a may provide the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 to the second electronic device 102b. The maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 may be signaled in the VPS extension syntax structure (i.e., using the video parameter set raw byte sequence payload (RBSP) semantics defined in section F.7.4.3.1 of JCTVC-L1008).
The maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 may be signaled in the encoded base layer video bitstream 107 or the encoded enhancement layer video bitstream 110. In one configuration, the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 may be provided to the second electronic device 102b in overhead data (e.g., Sequence Parameter Set (SPS) information, Picture Parameter Set (PPS) information, Video Parameter Set (VPS) information, slice header information, etc.). In another configuration, the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 may be provided to the second electronic device 102b in a separate "metadata" bitstream or file.
The second electronic device 102b may use the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 to determine whether to mark an image as "unused for reference". The second electronic device 102b may also use the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 to add a picture to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. RefPicSetInterLayer may refer to an inter-layer reference picture list. The second electronic device 102b may also construct reference picture lists (RefPicList0, RefPicList1) using the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108.
The encoded enhancement layer video bitstream 110 can be provided to the second electronic device 102b. Similarly, the encoded base layer video bitstream 107 may be provided to the second electronic device 102b. The second electronic device 102b may include a video decoder 112 and a base layer decoder 113. The video decoder 112 may include an enhancement layer decoder 115. In one configuration, the base layer decoder 113 decodes the encoded base layer video bitstream 107 while the enhancement layer decoder 115 decodes the encoded enhancement layer video bitstream 110. The base layer decoder 113 and the enhancement layer decoder 115 are discussed in further detail below in conjunction with FIG. 8. In one configuration, the video decoder 112 may conform to the scalable high efficiency video coding (SHVC) standard. In another configuration, the video decoder 112 may conform to the multi-view high efficiency video coding (MV-HEVC) standard. The base layer decoder 113 and the enhancement layer decoder 115 may each be a High Efficiency Video Coding (HEVC) decoder. The base layer decoder 113 and the enhancement layer decoder 115 may also be scalable high efficiency video coding (SHVC) decoders or multi-view high efficiency video coding (MV-HEVC) decoders.
In one example, the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107 can be transmitted to the second electronic device 102b using a wired or wireless link. In some cases, this may be done over a network, such as the Internet, a Local Area Network (LAN), or other type of network used for communication between devices. It should be noted that in some configurations, the encoders (i.e., enhancement layer encoder 106 and base layer encoder 109) and decoders (i.e., video decoder 112, base layer decoder 113, and enhancement layer decoder 115) may be implemented on the same electronic device 102 (i.e., first electronic device 102a and second electronic device 102b may be part of a single electronic device 102). For example, in the case where the encoder and decoder are implemented on the same electronic device 102, the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107 may be made available to the video decoder 112 in various ways. For example, the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107 may be provided to the video decoder 112 over a bus or stored in memory for retrieval by the video decoder 112.
The video decoder 112 (e.g., the base layer decoder 113 and the enhancement layer decoder 115) may be implemented in hardware, software, or a combination of both. In one configuration, video decoder 112 may be an HEVC decoder. The video decoder 112 may obtain (e.g., receive) the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107. The video decoder 112 may then generate one or more decoded images 116 based on the encoded enhancement layer video bitstream 110 and the encoded base layer video bitstream 107. Decoded image 116 may be displayed, played back, stored in memory, and/or transmitted to another device, etc.
The video decoder 112 may include an image tagging module 114. Image tagging module 114 may tag some images as "unused for reference". Pictures marked as "unused for reference" will not be used as reference pictures for inter-picture or inter-layer prediction. One advantage of marking additional pictures as "unused for reference" is that the Decoded Picture Buffer (DPB) size/memory can be reduced. Image tagging module 114 is discussed in further detail below in conjunction with fig. 2-4.
The video decoder 112 may also include an inter-layer Reference Picture Set (RPS) update module 118. The video decoder 112 may update an inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 using the inter-layer Reference Picture Set (RPS) update module 118. For example, the inter-layer Reference Picture Set (RPS) update module 118 may use signaling of the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 108 to determine whether to add an inter-layer picture to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. The inter-layer Reference Picture Set (RPS) update module 118 is discussed in further detail below in conjunction with FIGS. 5-6.
In some configurations, the second electronic device 102b may output the decoded image 116. In one example, the decoded image 116 may be sent to another device or sent back to the first electronic device 102 a. The decoded image 116 may be stored or otherwise maintained on the second electronic device 102 b. In another example, the second electronic device 102b can display the decoded image 116. In other configurations, the decoded image 116 may include elements of the input image 104 having different properties based on encoding and other operations performed on the bitstream 110. In some configurations, the decoded image 116 may be included in an image stream having a different resolution, format, specification, or other property than the input image 104.
The bitstream 110 may be relayed from the first electronic device 102a to the second electronic device 102b through an intermediary device (not shown). For example, the intermediary device may receive the bitstream 110 from the first electronic device 102a and relay the bitstream 110 to the second electronic device 102 b.
It should be noted that one or more elements or components included in the electronic device 102 may be implemented as hardware. For example, one or more of these elements or components may be implemented as a chip, a circuit, or a hardware component, among others. The functions or methods described herein may be implemented and/or performed using hardware. For example, one or more of the methods described herein may be implemented using and/or within a chipset, an Application Specific Integrated Circuit (ASIC), a large scale integrated circuit (LSI), or an integrated circuit, among others.
FIG. 2 is a block diagram of an image tagging module 214 used in the systems and methods of the present invention. The image tagging module 214 of fig. 2 may be one configuration of the image tagging module 114 of fig. 1. The image tagging module 214 may be part of the video decoder 112 on the electronic device 102.
The image marking module 214 may include sub-layer non-reference pictures 222. Herein, a temporal subset of a scalable layer is not referred to as a layer, but as a sub-layer or temporal sub-layer. A sub-layer is a temporal scalable layer of a temporal scalable bitstream, comprising Video Coding Layer (VCL) Network Abstraction Layer (NAL) units with a particular value of the temporal identifier and the associated non-VCL NAL units. A sub-layer non-reference picture 222 is a picture containing samples that cannot be used for inter prediction in the decoding process of subsequent pictures of the same sub-layer in decoding order. The samples of a sub-layer non-reference picture 222 may be used for inter prediction in the decoding process of subsequent pictures of higher sub-layers in decoding order.
The sub-layer non-reference picture 222 may be received from the first electronic device 102a via the bitstream 110. Each sub-layer non-reference picture 222 may include a temporal identifier (TemporalId) 224. The picture marking module 214 may also include a maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 208. In some cases, the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 208 is equal to max_sublayer_for_ilp_plus1 - 1 rather than max_sublayer_for_ilp_plus1. Furthermore, the syntax and semantics described herein may be varied by adding plus1 or plus2, or by subtracting minus1 or minus2. One value of the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 208 may be sent for each layer. Thus, max_sublayer_for_ilp_plus1[i] may be sent for i in the range of 0 to vps_max_layers_minus1. JCTVC-L0449 defines the syntax and semantics for signaling the usage of sub-layers and random access point pictures for inter-layer prediction, as in Table 1 below:
TABLE 1
Herein, random access refers to the act of starting the decoding process for a bitstream at a point other than the beginning of the stream. Such decoding may start at a Random Access Point (RAP) picture. A non-RAP picture is a picture that is not a Random Access Point (RAP) picture. In some cases, a RAP picture may alternatively be referred to as an intra random access point (IRAP) picture. Similarly, non-RAP pictures may then be referred to as non-IRAP pictures. max_sublayer_for_ilp_plus1[i] equal to 0 indicates that non-RAP pictures with a layer identifier (nuh_layer_id) 236 equal to the layer identifier syntax element value layer_id_in_nuh[i] 226 are not used as reference for inter-layer prediction. max_sublayer_for_ilp_plus1[i] greater than 0 indicates that pictures with a layer identifier (nuh_layer_id) 236 equal to the layer identifier syntax element value layer_id_in_nuh[i] 226 and a temporal identifier (TemporalId) 224 greater than max_sublayer_for_ilp_plus1[i] - 1 are not used as reference for inter-layer prediction. When max_sublayer_for_ilp_plus1[i] is not present, the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1[i]) 208 is unspecified.
In another embodiment, the loop signaling max_sublayer_for_ilp_plus1[i] may terminate at i < vps_max_layers_minus1, as follows:
for (i = 0; i < vps_max_layers_minus1; i++) {
    max_sublayer_for_ilp_plus1[i]    u(3)
}
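As a rough illustration of how a decoder might read this loop, the following Python sketch parses a run of 3-bit unsigned values, matching the u(3) descriptor. It is a simplified sketch under stated assumptions: the `BitReader` class and the `parse_max_sublayer_for_ilp_plus1` function are hypothetical names, the payload is assumed to start exactly at the first syntax element, and the surrounding VPS extension syntax and emulation prevention bytes are ignored.

```python
from typing import List


class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative only)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_max_sublayer_for_ilp_plus1(
    reader: BitReader, vps_max_layers_minus1: int
) -> List[int]:
    """Read one u(3) value for each i in 0 .. vps_max_layers_minus1 - 1."""
    return [reader.u(3) for _ in range(vps_max_layers_minus1)]
```

For example, with the two payload bytes 0b00101110 and 0b00000000 and vps_max_layers_minus1 equal to 3, the bit groups 001, 011, and 100 are read in turn.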
The marking process for the sub-layer non-reference picture 222 is described in JCTVC-L1008, JCTVC-L0452, and JCTVC-L0453. However, this marking process does not use signaling of the maximum number of inter-layer prediction sub-layers (max_sublayer_for_ilp_plus1) 208. Benefits may be realized by using a new method of marking pictures for inter-layer prediction. When the new method of marking pictures for inter-layer prediction is used, the sub-layer non-reference picture 222 of the target layer may be marked as "unused for reference" based on the signaled sequence-level usage of sub-layers and RAP pictures for inter-layer prediction for each layer.
The decoding process defined in section F.8 of JCTVC-L1008 is given below. Similar processes are also specified in JCTVC-L0452 and JCTVC-L0453.
"F.8 Decoding process"
"F.8.1 General decoding process"
The specifications in subclause 8.1 apply with the following additions.
When the current picture has nuh_layer_id greater than 0, the following applies.
- The decoding process is structured as follows depending on the value of separate_colour_plane_flag:
  - If separate_colour_plane_flag is equal to 0, the following decoding process is invoked once, with the current picture as output.
  - Otherwise (separate_colour_plane_flag is equal to 1), the following decoding process is invoked three times. The inputs to the decoding process are all NAL units of the coded picture with the same colour_plane_id value. The decoding process for NAL units with a particular colour_plane_id value is specified as if only a CVS in monochrome format with that particular colour_plane_id value were present in the bitstream. The output of each of the three decoding processes is assigned to one of the three sample arrays of the current picture, with the NAL units with colour_plane_id equal to 0, 1, and 2 being assigned to SL, SCb, and SCr, respectively.
  Note: the variable ChromaArrayType is derived as 0 when separate_colour_plane_flag is equal to 1 and chroma_format_idc is equal to 3. In the decoding process, the value of this variable is evaluated, resulting in operations identical to those for monochrome pictures (when chroma_format_idc is equal to 0).
- The following decoding operations are performed for the current picture CurrPic.
  - For the decoding of the slice segment header of the first slice, in decoding order, of the current picture, the decoding process for starting the decoding of a coded picture with nuh_layer_id greater than 0 specified in subclause F.8.1.1 is invoked.
  - If ViewId[nuh_layer_id] is greater than 0, the decoding process for a coded picture with nuh_layer_id greater than 0 specified in subclause G.8.1 is invoked.
  - Otherwise, when DependencyId[nuh_layer_id] is greater than 0, the decoding process for a coded picture with nuh_layer_id greater than 0 specified in subclause X.X.X is invoked.
  - After all slices of the current picture have been decoded, the decoding process for ending the decoding of a coded picture with nuh_layer_id greater than 0 specified in subclause F.8.1.2 is invoked.
"F.8.1.1 Decoding process for starting the decoding of a coded picture with nuh_layer_id greater than 0"
Each picture referred to in this subclause is a complete coded picture.
For the current picture CurrPic, the following decoding operations are performed:
1. The decoding of NAL units is specified in subclause 8.2.
2. The processes in subclause 8.3 specify the following decoding processes using syntax elements in the slice segment layer and above:
- Variables and functions relating to the picture order count are derived in subclause 8.3.1. This needs to be invoked only for the first slice segment of a picture. It is a requirement of bitstream conformance that PicOrderCntVal shall remain unchanged within an access unit.
- The decoding process for the RPS in subclause 8.3.2 is invoked only for pictures with nuh_layer_id equal to that of CurrPic, wherein reference pictures may be marked as "unused for reference" or "used for long-term reference". This needs to be invoked only for the first slice segment of a picture.
- When CurrPic is a BLA picture or a CRA picture with NoRaslOutputFlag equal to 1, the decoding process for generating unavailable reference pictures specified in subclause 8.3.3 is invoked. This needs to be invoked only for the first slice segment of a picture.
"F.8.1.2 Decoding process for ending the decoding of a coded picture with nuh_layer_id greater than 0"
PicOutputFlag is set as follows:
- If the current picture is a RASL picture and NoRaslOutputFlag of the associated IRAP picture is equal to 1, PicOutputFlag is set to 0.
- Otherwise, PicOutputFlag is set to pic_output_flag.
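The two-branch rule for PicOutputFlag can be written as a small Python function. This is a minimal sketch for illustration; the function name and its parameter names are hypothetical and simply mirror the quantities named in the rule.

```python
def derive_pic_output_flag(
    is_rasl_picture: bool,
    irap_no_rasl_output_flag: int,
    pic_output_flag: int,
) -> int:
    """Return PicOutputFlag for the current picture.

    is_rasl_picture: True if the current picture is a RASL picture.
    irap_no_rasl_output_flag: NoRaslOutputFlag of the associated IRAP picture.
    pic_output_flag: value of the pic_output_flag syntax element.
    """
    if is_rasl_picture and irap_no_rasl_output_flag == 1:
        # RASL picture whose associated IRAP picture has NoRaslOutputFlag 1.
        return 0
    return pic_output_flag
```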
The following applies:
- The decoded picture is marked as "used for short-term reference".
- When TemporalId is equal to HighestTid, the marking process for sub-layer non-reference pictures not needed for inter-layer prediction specified in subclause F.8.1.2.1 is invoked with latestDecLayerId equal to nuh_layer_id as input.
"F.8.1.2.1 Marking process for sub-layer non-reference pictures not needed for inter-layer prediction"
The input to this process is:
- a nuh_layer_id value latestDecLayerId
The output of this process is:
- a potentially updated marking as "unused for reference" for some decoded pictures
Note: this process marks pictures that are not needed for inter-picture or inter-layer prediction as "unused for reference". When TemporalId is less than HighestTid, the current picture may be used for reference in inter-picture prediction, and this process is not invoked.
The variables targetDecLayerIdList, numTargetDecDescLayers and latestDecIdx are derived as follows:
specifying a layer identifier list, TargetDecLayerIdList, that specifies a list of nuh _ layer _ id values of the NAL units being decoded in ascending order of the nuh _ layer _ id values by:
-if a certain external device is available for setting targetdeclayerldlist, setting targetdeclayerldlist by the external device.
Otherwise, if the decoding process is invoked in the bitstream conformance test, the targetdeclayerldlist is set accordingly.
Otherwise, targetdeclayerldlist contains only nuh _ layer _ id equal to 0.
-setting numTargetDecLayers equal to the number of entries in targetdeclayerldlist.
-setting latestDecIdx to the value of i for i where TargetDecLayerIdList [ i ] equals latestDecLayerId.
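The derivation of the three variables above can be sketched as below. The function and parameter names are hypothetical; in particular, passing the externally supplied list and the conformance-test list as optional arguments is an assumption made for illustration, as is sorting the list to enforce ascending order.

```python
def derive_target_layers(latest_dec_layer_id, external_list=None,
                         conformance_list=None):
    """Return (TargetDecLayerIdList, numTargetDecLayers, latestDecIdx)."""
    if external_list is not None:
        # Set by external means.
        target = sorted(external_list)
    elif conformance_list is not None:
        # Set by the bitstream conformance test.
        target = sorted(conformance_list)
    else:
        # Default: only the base layer (nuh_layer_id equal to 0).
        target = [0]
    num_target_dec_layers = len(target)
    # Index i for which TargetDecLayerIdList[i] == latestDecLayerId.
    # list.index raises ValueError if the layer is not in the list.
    latest_dec_idx = target.index(latest_dec_layer_id)
    return target, num_target_dec_layers, latest_dec_idx
```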
As described above, separate_color_plane_flag is a flag indicating whether separate color planes are used to code a picture. The term colour_plane_id refers to an identifier of a colour component. The term ChromaArrayType refers to the type of chroma array. The terms S_L, S_Cb and S_Cr refer to sample arrays. The term NAL refers to the Network Abstraction Layer. The term PicOrderCntVal refers to the picture order count of the current picture. CurrPic refers to the current picture. The term NoRaslOutputFlag refers to a flag indicating whether random access skipped leading (RASL) pictures are output (and whether they can be decoded correctly). The term pic_output_flag refers to a syntax element that may be present in the associated slice header. The term TargetDecLayerIdList is a layer identifier list that lists the nuh_layer_id values of the NAL units to be decoded in ascending order. The term NumNegativePics specifies the number of entries in the stRpsIdx-th candidate short-term Reference Picture Set (RPS) that have a picture order count value less than that of the current picture.
The term UsedByCurrPicS0 specifies whether the i-th entry of the stRpsIdx-th candidate short-term RPS that has a picture order count value less than that of the current picture is used for reference by the current picture. The term UsedByCurrPicS1 specifies whether the i-th entry of the current candidate short-term RPS that has a picture order count value greater than that of the current picture is used for reference by the current picture.
The term num_long_term_sps specifies the number of entries in the long-term RPS of the current picture that are derived from the candidate long-term reference pictures specified in the active SPS. The term num_long_term_pics specifies the number of entries in the long-term RPS of the current picture that are directly signaled in the slice header. The term UsedByCurrPicLt specifies whether the i-th entry in the long-term RPS of the current picture is used for reference by the current picture.
layer_id_in_nuh[i] specifies the value of the nuh_layer_id syntax element in VCL NAL units of the i-th layer. When not present, the value of layer_id_in_nuh[i] is inferred to be equal to i. The variable LayerIdInVps[layer_id_in_nuh[i]] is set equal to i. direct_dependency_flag[i][j] equal to 0 specifies that the layer with index j is not a direct reference layer for the layer with index i. direct_dependency_flag[i][j] equal to 1 specifies that the layer with index j may be a direct reference layer for the layer with index i. When direct_dependency_flag[i][j] is not present for i and j in the range of 0 to vps_max_layers_minus1, it is inferred to be equal to 0. The variables NumDirectRefLayers[i] and RefLayerId[i][j] are derived as follows:
for(i=1;i<=vps_max_layers_minus1;i++)
    for(j=0,NumDirectRefLayers[i]=0;j<i;j++)
        if(direct_dependency_flag[i][j]==1)
            RefLayerId[i][NumDirectRefLayers[i]++]=layer_id_in_nuh[j]
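As a runnable counterpart to the pseudocode above, the following sketch derives both variables from a dependency matrix. The function name is hypothetical, and initializing NumDirectRefLayers[0] to 0 is an assumption, since the pseudocode starts at i = 1 and leaves the base layer entry unset.

```python
def derive_ref_layers(direct_dependency_flag, layer_id_in_nuh):
    """Return (NumDirectRefLayers, RefLayerId) for all layers."""
    num_layers = len(layer_id_in_nuh)  # vps_max_layers_minus1 + 1
    num_direct_ref_layers = [0] * num_layers
    ref_layer_id = [[] for _ in range(num_layers)]
    for i in range(1, num_layers):
        # Only lower-indexed layers (j < i) can be direct reference layers.
        for j in range(i):
            if direct_dependency_flag[i][j] == 1:
                ref_layer_id[i].append(layer_id_in_nuh[j])
        num_direct_ref_layers[i] = len(ref_layer_id[i])
    return num_direct_ref_layers, ref_layer_id
```

For example, with three layers where layer 1 depends on layer 0 and layer 2 depends on layers 0 and 1, the result is NumDirectRefLayers = [0, 1, 2].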
scalability_mask[i] equal to 1 indicates that dimension_id syntax elements corresponding to the i-th scalability dimension in Table F-1 are present. scalability_mask[i] equal to 0 indicates that dimension_id syntax elements corresponding to the i-th scalability dimension are not present. Table F-1 gives the mapping of ScalabilityId to scalability dimensions:
TABLE F-1: Mapping of ScalabilityId to scalability dimensions
scalability_mask index    Scalability dimension      ScalabilityId mapping
0                         Multiview                  ViewId
1                         Spatial/SNR scalability    DependencyId
2-15                      Reserved
dimension_id[i][j] specifies the identifier of the j-th present scalability dimension type of the i-th layer. When not present, the value of dimension_id[i][j] is inferred to be equal to 0. The number of bits used for the representation of dimension_id[i][j] is dimension_id_len_minus1[j] + 1. When splitting_flag is equal to 1, it is a requirement of bitstream conformance that dimension_id[i][j] shall be equal to ((layer_id_in_nuh[i] & ((1 << dimBitOffset[j + 1]) - 1)) >> dimBitOffset[j]).
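The bit-field constraint for splitting_flag equal to 1 can be illustrated with a short sketch. The text here does not define dimBitOffset, so the sketch assumes it is the running sum of the field widths dimension_id_len_minus1[k] + 1, starting from 0; both function names are hypothetical.

```python
def dim_bit_offsets(dimension_id_len_minus1):
    """ASSUMPTION: dimBitOffset[j] is the sum of the widths of fields 0..j-1."""
    offsets = [0]
    for len_minus1 in dimension_id_len_minus1:
        offsets.append(offsets[-1] + len_minus1 + 1)
    return offsets


def dimension_id_from_layer_id(layer_id, j, offsets):
    """Extract the j-th dimension identifier from a nuh_layer_id value."""
    # Keep the bits below dimBitOffset[j + 1], then drop those below dimBitOffset[j].
    return (layer_id & ((1 << offsets[j + 1]) - 1)) >> offsets[j]


# Two dimensions of 2 and 3 bits; layer id 0b10110 splits into 0b101 and 0b10.
offsets = dim_bit_offsets([1, 2])
dims = [dimension_id_from_layer_id(0b10110, j, offsets) for j in range(2)]
```

In this example `offsets` is [0, 2, 5] and `dims` is [2, 5].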
The variable ScalabilityId[i][smIdx], which specifies the identifier of the smIdx-th scalability dimension type of the i-th layer, the variable ViewId[layer_id_in_nuh[i]], which specifies the view identifier of the i-th layer, and the variable DependencyId[layer_id_in_nuh[i]], which specifies the spatial/SNR scalability identifier of the i-th layer, are derived as follows:
HighestTid is the highest temporal identifier (TemporalId) present in the bitstream. PicOutputFlag is a variable that is set based on the picture type (e.g., whether the picture is a random access skipped leading (RASL) picture) and on the signaled syntax element pic_output_flag.
In one configuration, section F.8.1.2.1 may include the language of Table 2 for marking pictures 222 as "unused for reference".
TABLE 2
In Table 2, each sub-layer non-reference picture 222 has a defined temporal identifier (TemporalId) 224. The temporal identifier (TemporalId) 224 of the picture 222 is compared with the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208 for the specified layer, i.e., the layer with index LayerIdInVps[TargetDecLayerIdList[i]]. The index into max_sublayer_for_ilp_plus1 is therefore LayerIdInVps[TargetDecLayerIdList[i]]. A picture 222 is marked as "unused for reference" if its temporal identifier (TemporalId) 224 is greater than max_sublayer_for_ilp_plus1 208 minus 1 for the specified layer. In Table 2, TargetDecLayerIdList refers to the target layer identifier list. Thus, in the marking stage, a picture is marked as "unused for reference" even if it belongs to a layer that is used as a reference layer by a layer in the target layer identifier list.
In another configuration, section F.8.1.2.1 may include the language of Table 3 for marking pictures 222 as "unused for reference".
TABLE 3
Similar to Table 2, each picture 222 in Table 3 has a defined temporal identifier (TemporalId) 224. However, in Table 3, the temporal identifier (TemporalId) 224 of the picture 222 is compared with max_sublayer_for_ilp_plus1[i]. If the temporal identifier (TemporalId) 224 of the picture 222 is greater than max_sublayer_for_ilp_plus1[i] minus 1, the picture 222 is marked as "unused for reference". In Table 3, TargetDecLayerIdList refers to the target layer identifier list. Thus, in the marking stage, a picture is marked as "unused for reference" even if it belongs to a layer that is used as a reference layer by a layer in the target layer identifier list.
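The difference between the two indexing schemes can be made concrete with a small sketch. The table language itself is not reproduced here, so this assumes that i is the position in the target layer identifier list in both tables; the function names and example values are hypothetical.

```python
def ilp_limit_table2(max_sublayer_for_ilp_plus1, layer_id_in_vps,
                     target_dec_layer_id_list, i):
    """Table 2 style: index via LayerIdInVps[TargetDecLayerIdList[i]]."""
    return max_sublayer_for_ilp_plus1[layer_id_in_vps[target_dec_layer_id_list[i]]]


def ilp_limit_table3(max_sublayer_for_ilp_plus1, i):
    """Table 3 style: index directly with i."""
    return max_sublayer_for_ilp_plus1[i]


def exceeds_limit(temporal_id, limit):
    """True when the picture would be marked as "unused for reference"."""
    return temporal_id > limit - 1


# The two schemes differ when the target list skips a layer (here layer 1).
limits = [3, 1, 2]
vps_index = {0: 0, 1: 1, 2: 2}
targets = [0, 2]
limit2 = ilp_limit_table2(limits, vps_index, targets, 1)  # uses limits[2]
limit3 = ilp_limit_table3(limits, 1)                      # uses limits[1]
```

With these example values, a picture with TemporalId equal to 1 stays available under the first scheme and is marked under the second.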
In yet another configuration, section F.8.1.2.1 may include the language of Table 4 for marking pictures 222 as "unused for reference".
TABLE 4
The language in Table 4 is similar to that of Table 2, except that Table 4 does not include specific language for the temporal identifier (TemporalId) 224 in the marking language. In this case, no additional check is made on the temporal identifier (TemporalId) 224 of the picture 222 at this stage when the picture 222 is marked.
In another configuration, section F.8.1.2.1 may include the language of Table 5 for marking pictures 222 as "unused for reference".
TABLE 5
The language in Table 5 is similar to that of Table 3, except that Table 5 does not include specific language for the temporal identifier (TemporalId) 224 in the marking language. In this case, no additional check is made on the temporal identifier (TemporalId) 224 of the picture 222 at this stage when the picture 222 is marked.
FIG. 3 is a flow chart of a method 300 for marking sub-layer non-reference pictures 222. The method 300 may be performed by the electronic device 102. In one configuration, the method 300 may be performed by the video decoder 112 on the electronic device 102. The electronic device 102 may obtain 302 signaling of the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208. As described above, the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208 may be provided to the electronic device 102 via the bitstream 110.
The electronic device 102 may also obtain 304 a sub-layer non-reference picture 222. The sub-layer non-reference picture 222 may be provided to the electronic device 102 via the bitstream 110. The electronic device 102 may determine 306 whether the value of the temporal identifier (TemporalId) 224 of the sub-layer non-reference picture 222 is greater than the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208 minus 1. In one configuration, the electronic device 102 may make this comparison using the language of one of Table 2, Table 3, Table 4, and Table 5 above.
If the value of the temporal identifier (TemporalId) 224 of the sub-layer non-reference picture 222 is greater than the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208 minus 1, the electronic device 102 may mark 308 the sub-layer non-reference picture 222 as "unused for reference", even if the picture belongs to a layer that is used as a reference layer by a layer in the target layer identifier list. The sub-layer non-reference picture 222 will therefore not be used for inter-layer prediction. Otherwise, the method 300 may end. In other words, if the value of the temporal identifier (TemporalId) 224 is less than or equal to the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208 minus 1, the sub-layer non-reference picture 222 is not marked as "unused for reference" and, if it belongs to a layer used as a reference layer by a layer in the target layer identifier list, it can be used for inter-layer prediction.
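A minimal sketch of the decision in steps 306 and 308 follows. The DecodedPicture class and the function name are hypothetical and exist only to make the example self-contained; the comparison itself follows the text above.

```python
from dataclasses import dataclass


@dataclass
class DecodedPicture:
    """Hypothetical stand-in for a decoded picture in the DPB."""
    temporal_id: int
    is_sublayer_non_reference: bool
    marking: str = "used for short-term reference"


def mark_sublayer_non_reference(pic, max_sublayer_for_ilp_plus1):
    """Mark the picture and return True if it exceeds the signaled limit."""
    if (pic.is_sublayer_non_reference
            and pic.temporal_id > max_sublayer_for_ilp_plus1 - 1):
        pic.marking = "unused for reference"
        return True
    # Otherwise the marking is left unchanged and the method ends.
    return False
```

With max_sublayer_for_ilp_plus1 equal to 2, a sub-layer non-reference picture with TemporalId 2 is marked, while one with TemporalId 1 keeps its current marking.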
In another scenario (not shown), sub-layer non-reference pictures 222 belonging to layers that are not used as reference layers by any layer in the target layer identifier list may also be marked as "unused for reference". In some cases, the steps described in FIG. 3 for marking sub-layer non-reference pictures 222 are performed only when the temporal identifier (TemporalId) 224 of the sub-layer non-reference picture 222 is equal to the highest temporal identifier present in the bitstream.
FIG. 4 is a block diagram illustrating additional pictures 432 marked as "unused for reference" using the systems and methods of the present invention. The example shown uses three layers (one base layer and two enhancement layers), each including temporal sub-layers. In the second enhancement layer EL2, a plurality of pictures 430 are marked as "unused for reference" according to the rules defined in JCTVC-L1008, JCTVC-L0452, and JCTVC-L0453. In the first enhancement layer EL1, the additional pictures 432 are marked as "unused for reference" based on the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 208. Marking the additional pictures 432 as "unused for reference" may reduce the decoded picture buffer (DPB) size and memory requirements.
FIG. 5 is a block diagram illustrating an inter-layer Reference Picture Set (RPS) update module 518. The inter-layer Reference Picture Set (RPS) update module 518 of FIG. 5 may be one configuration of the inter-layer Reference Picture Set (RPS) update module 118 of FIG. 1. The inter-layer Reference Picture Set (RPS) update module 518 may be part of the video decoder 112 on the electronic device 102. The video decoder 112 may use the inter-layer Reference Picture Set (RPS) update module 518 to update the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
The inter-layer Reference Picture Set (RPS) update module 518 may include an inter-layer picture 534. In one configuration, the inter-layer picture 534 may be a non-RAP (random access point) picture or a random access point (RAP) picture. The inter-layer picture 534 may be a picture received from another electronic device 102 via the bitstreams 110 and 107. The inter-layer Reference Picture Set (RPS) update module 518 may determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) 120.
The inter-layer picture 534 may include a layer identifier (nuh_layer_id) 536, a temporal identifier (TemporalId) 538, and a picture order count (POC) 553. If the layer identifier (nuh_layer_id) 536 of the inter-layer picture 534 corresponds to a layer that is a direct reference layer for the current picture and the picture order count (POC) 553 of the inter-layer picture 534 is equal to the picture order count (POC) 561 of the current picture (nuh_layer_id) 559, an additional check is performed to determine whether the inter-layer picture 534 should be added to the inter-layer Reference Picture Set (RPS) 120.
The additional check compares the temporal identifier (TemporalId) 538 of the inter-layer picture 534 with the signaled maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508. If the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is greater than the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1, the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. Conversely, if the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is less than or equal to the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1, the inter-layer picture 534 is added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. Additionally, if the value of the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 is 0, the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 unless the inter-layer picture is a random access point (RAP) picture.
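The checks described for the update module 518 can be sketched as a single predicate. This is an illustration only: the function and parameter names are hypothetical, and the sketch assumes that the special value of max_sublayer_for_ilp_plus1 that restricts inter-layer prediction to RAP pictures is 0.

```python
def add_to_inter_layer_rps(temporal_id, max_sublayer_for_ilp_plus1, is_rap,
                           is_direct_ref_layer, poc, curr_poc):
    """Decide whether an inter-layer picture joins RefPicSetInterLayer."""
    # Precondition: direct reference layer of the current picture, same POC.
    if not is_direct_ref_layer or poc != curr_poc:
        return False
    # ASSUMPTION: a signaled value of 0 admits only RAP pictures.
    if max_sublayer_for_ilp_plus1 == 0:
        return is_rap
    # Additional check: TemporalId <= max_sublayer_for_ilp_plus1 - 1.
    return temporal_id <= max_sublayer_for_ilp_plus1 - 1
```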
The semantics as defined in JCTVC-L1008 are given below in section G7.4.7.2 (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G7.4.7.2 Semantics"
The following modifications apply to the specifications in subclause F.7.4.7.2 and all of its subclauses.
The variable NumPocTotalCurr is derived as follows:
In the sample code provided above, NumInterLayerRPSPics replaces NumDirectRefLayers.
An alternative configuration for determining the variable NumPocTotalCurr 120 as defined in JCTVC-L1008 is given below in section G7.4.7.3 (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G7.4.7.3 Semantics"
The following modifications apply to the specifications in subclause F.7.4.7.2 and all of its subclauses.
The variable NumInterLayerRPSPics is derived as follows:
The variable NumPocTotalCurr is derived as follows:
In the sample code provided above, the derivation of NumInterLayerRPSPics is new, and NumInterLayerRPSPics replaces NumDirectRefLayers when calculating NumPocTotalCurr.
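The sample code itself is not reproduced in this text. As a rough sketch of the substitution it describes, and assuming that NumPocTotalCurr counts the short-term and long-term entries used by the current picture and then adds the inter-layer count, the calculation could look like this. The function name is hypothetical, and NumInterLayerRPSPics is taken as an input because its derivation is not shown here.

```python
def num_poc_total_curr(used_by_curr_pic_s0, used_by_curr_pic_s1,
                       used_by_curr_pic_lt, num_inter_layer_rps_pics):
    """Count reference pictures available to the current picture (sketch)."""
    # Short-term entries preceding and following the current picture.
    total = sum(1 for used in used_by_curr_pic_s0 if used)
    total += sum(1 for used in used_by_curr_pic_s1 if used)
    # Long-term entries used by the current picture.
    total += sum(1 for used in used_by_curr_pic_lt if used)
    # NumInterLayerRPSPics takes the place of NumDirectRefLayers here.
    return total + num_inter_layer_rps_pics
```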
The decoding process as defined in JCTVC-L1008 is given below in section G.2 (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2 Decoding process"
"G.2.1 Decoding process for a coded picture with nuh_layer_id greater than 0"
For the current picture CurrPic, the decoding process operates as follows:
1. The decoding of NAL units is specified in subclause 8.2.
2. The processes in subclauses G.8.1.1 and G.8.3.4 specify the following decoding processes using syntax elements in the slice segment layer and above:
-Before the first slice of the current picture is decoded, subclause G.8.1.1 is invoked.
-At the start of the decoding process for each P or B slice, the decoding process for reference picture list construction specified in subclause G.8.3.4 is invoked to derive reference picture list 0 (RefPicList0) and, when decoding a B slice, reference picture list 1 (RefPicList1).
3. The processes in subclauses 8.4, 8.5, 8.6, and 8.7 specify decoding processes using syntax elements in all syntax structure layers. It is a requirement of bitstream conformance that the coded slices of a picture shall contain slice segment data for every coding tree unit of the picture, such that the division of the picture into slices, the division of the slices into slice segments, and the division of the slice segments into coding tree units each form a partitioning of the picture.
4. After all slices of the current picture have been decoded, the marking process for ending the decoding of a coded picture with nuh_layer_id greater than 0 specified in subclause G.8.1.2 is invoked.
"G.2.1.1 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
In the modification of section G.2.1.1, the temporal identifier (TemporalId) 538 is considered when determining whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. If the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is less than or equal to the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1, the inter-layer picture 534 is added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 and used for inter-layer prediction. Also, if the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1 is 0, the inter-layer picture 534 is added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 if the inter-layer picture 534 is a RAP picture.
If the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is greater than the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1, the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 and is not used for inter-layer prediction. Also, if the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1 is 0, the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 if the inter-layer picture 534 is not a RAP picture. The value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is compared with the value of the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 only when the layer identifier (nuh_layer_id) 536 corresponds to a layer that is a direct reference layer of the current picture and the picture order count (POC) 553 of the inter-layer picture is equal to the picture order count (POC) 561 of the current picture (nuh_layer_id) 559.
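The derivation text itself is not reproduced here, so the following is only a sketch of how the list could be rebuilt from the conditions described above. The Picture class, the function name, and the choice to key max_sublayer_for_ilp_plus1 by nuh_layer_id are assumptions; the RAP special case is left out to keep the example focused on the TemporalId comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """Hypothetical stand-in for a decoded picture."""
    nuh_layer_id: int
    temporal_id: int
    poc: int


def derive_ref_pic_set_inter_layer(dpb, ref_layer_ids, curr_poc,
                                   max_sublayer_for_ilp_plus1):
    """Rebuild RefPicSetInterLayer for the current picture (sketch)."""
    ref_pic_set_inter_layer = []  # the list is first emptied
    for layer_id in ref_layer_ids:  # direct reference layers only
        for pic in dpb:
            # Only same-POC pictures of the reference layer are candidates.
            if pic.nuh_layer_id != layer_id or pic.poc != curr_poc:
                continue
            # TemporalId must not exceed max_sublayer_for_ilp_plus1 - 1.
            if pic.temporal_id <= max_sublayer_for_ilp_plus1[layer_id] - 1:
                ref_pic_set_inter_layer.append(pic)
    return ref_pic_set_inter_layer
```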
An alternative configuration for determining whether to add an inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120, as defined in JCTVC-L1008, is given in section G.2.1.2 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.2 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
Similar to the decoding process for the inter-layer Reference Picture Set (RPS) described in G.2.1.1, the decoding process described in G.2.1.2 uses the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 to determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
Another configuration for determining whether to add an inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120, as defined in JCTVC-L1008, is given in section G.2.1.3 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.3 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
Similar to the decoding processes for the inter-layer Reference Picture Set (RPS) described in G.2.1.1 and G.2.1.2, the decoding process described in G.2.1.3 uses the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 to determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
Yet another configuration for determining whether to add an inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120, as defined in JCTVC-L1008, is given in section G.2.1.4 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.4 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
Similar to the decoding processes for the inter-layer Reference Picture Set (RPS) described in G.2.1.1, G.2.1.2, and G.2.1.3, the decoding process described in G.2.1.4 uses the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 to determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
Another configuration for determining whether to add an inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120, as defined in JCTVC-L1008, is given in section G.2.1.5 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.5 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
Similar to the decoding processes for the inter-layer Reference Picture Set (RPS) described in G.2.1.1, G.2.1.2, G.2.1.3, and G.2.1.4, the decoding process described in G.2.1.5 uses the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 to determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
Another configuration for determining whether to add an inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120, as defined in JCTVC-L1008, is given in section G.2.1.6 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.6 Decoding process for inter-layer reference picture set"
The output of this process is an updated list of inter-layer pictures, RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
Similar to the decoding processes for the inter-layer Reference Picture Set (RPS) described in G.2.1.1, G.2.1.2, G.2.1.3, G.2.1.4, and G.2.1.5, the decoding process described in G.2.1.6 uses the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 to determine whether to add the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120.
The marking process for ending the decoding of a coded picture as defined in JCTVC-L1008 is given in section G.2.1.7 below (underlining indicates the changes added for the systems and methods of the present invention).
"G.2.1.7 Marking process for ending the decoding of a coded picture with nuh_layer_id greater than 0"
The output of this process is:
-a potentially updated marking as "used for short-term reference" for some decoded pictures.
The following applies:
for(i=0;i<NumInterLayerRPSPics[LayerIdInVps[nuh_layer_id]];i++)
    RefPicSetInterLayer[i] is marked as "used for short-term reference"
At the start of the decoding process for each P and B slice, reference picture list construction is performed. The decoding process for reference picture list construction as defined in JCTVC-L1008 is given in section G.2.1.8 below (underlining indicates the changes added for the systems and methods of the present invention). Similar processing is also specified in JCTVC-L0452 and JCTVC-L0453.
"G.2.1.8 Decoding process for reference picture list construction"
This process is invoked at the beginning of the decoding process for each P and B slice.
Reference pictures are addressed through reference indices as specified in subclause 8.5.3.2.1. A reference index is an index into a reference picture list. When decoding a P slice, there is a single reference picture list, RefPicList0. When decoding a B slice, there is a second independent reference picture list, RefPicList1, in addition to RefPicList0.
At the beginning of the decoding process for each slice, the reference picture list RefPicList0 and, for B slices, RefPicList1 are derived as follows:
The variable NumRpsCurrTempList0 is set equal to Max(num_ref_idx_l0_active_minus1 + 1, NumPocTotalCurr), and the list RefPicListTemp0 is constructed as follows:
The list RefPicList0 is constructed as follows:
When the slice is a B slice, the variable NumRpsCurrTempList1 is set equal to Max(num_ref_idx_l1_active_minus1 + 1, NumPocTotalCurr), and the list RefPicListTemp1 is constructed as follows:
When the slice is a B slice, the list RefPicList1 is constructed as follows:
for(rIdx=0;rIdx<=num_ref_idx_l1_active_minus1;rIdx++)    (8-11)
    RefPicList1[rIdx]=ref_pic_list_modification_flag_l1?RefPicListTemp1[list_entry_l1[rIdx]]:RefPicListTemp1[rIdx]
In section G.2.1.8, the number of inter-layer Reference Picture Set (RPS) pictures (NumInterLayerRPSPics) is used instead of the number of direct reference layers (NumDirectRefLayers).
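Most of the construction pseudocode is not reproduced in this text, so the following is a sketch under stated assumptions rather than the specified process. It assumes the temporary list cycles through the short-term, long-term, and inter-layer candidates in that order (with the two short-term subsets swapped for list 1), and that every supplied candidate counts toward NumPocTotalCurr. The function and parameter names are hypothetical; `num_ref_idx_active` stands for num_ref_idx_lX_active_minus1 + 1, and `list_entry` is given only when the list modification flag is set.

```python
def build_ref_pic_list(st_curr_before, st_curr_after, lt_curr, inter_layer,
                       num_ref_idx_active, list_entry=None, is_list1=False):
    """Build RefPicList0 (or RefPicList1 when is_list1 is True)."""
    # List 1 swaps the order of the two short-term subsets.
    first, second = ((st_curr_after, st_curr_before) if is_list1
                     else (st_curr_before, st_curr_after))
    # ASSUMPTION: inter-layer pictures follow the long-term entries.
    candidates = [*first, *second, *lt_curr, *inter_layer]
    num_poc_total_curr = len(candidates)
    if num_poc_total_curr == 0:
        raise ValueError("P and B slices need at least one reference picture")
    num_rps_curr_temp_list = max(num_ref_idx_active, num_poc_total_curr)
    # The temporary list repeats the candidates until it is full.
    temp = [candidates[r % num_poc_total_curr]
            for r in range(num_rps_curr_temp_list)]
    if list_entry is not None:
        # Modification: pick entries of the temporary list by explicit index.
        return [temp[list_entry[r]] for r in range(num_ref_idx_active)]
    return temp[:num_ref_idx_active]
```

The number of inter-layer entries passed in plays the role of NumInterLayerRPSPics, so pictures excluded from RefPicSetInterLayer never reach the final list.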
FIG. 6 is a flow chart of a method 600 for updating an inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. The method 600 may be performed by the electronic device 102. In one configuration, the method 600 may be performed by the video decoder 112 on the electronic device 102. The electronic device 102 may obtain 602 signaling of the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508. In one configuration, the electronic device 102 may obtain 602 this signaling from another electronic device 102 via the bitstream 110.
The electronic device 102 may begin 604 a decoding process for the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. The electronic device 102 may obtain 606 the value of the temporal identifier (TemporalId) 538 of an inter-layer picture 534 that has a layer identifier (nuh_layer_id) 536 corresponding to a layer that is a direct reference layer for the current layer and a picture order count (POC) 553 equal to the picture order count (POC) 561 of the current picture (nuh_layer_id) 559.
The electronic device 102 may determine 608 whether the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is less than or equal to the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1. If so, the electronic device 102 may add 610 the inter-layer picture 534 to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. Also, if the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1 is 0, the inter-layer picture 534 is added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 if the inter-layer picture 534 is a RAP picture. The electronic device 102 may then use 612 the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 for reference picture list construction (as described above in section G.2.1.8).
If the value of the temporal identifier (TemporalId) 538 of the inter-layer picture 534 is not less than or equal to the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1 (i.e., it is greater than that value), the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120. Also, if the maximum number of sub-layers for inter-layer prediction (max_sublayer_for_ilp_plus1) 508 minus 1 is 0, the inter-layer picture 534 is not added to the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 if the inter-layer picture 534 is not a RAP picture. The electronic device 102 may then use 612 the inter-layer Reference Picture Set (RPS) (RefPicSetInterLayer) 120 for reference picture list construction (as described above in section G.2.1.8).
Fig. 7 is a block diagram illustrating one configuration of a video encoder 782 on an electronic device 702. The video encoder 782 of fig. 7 may be one configuration of the video encoder 182 of fig. 1. The video encoder 782 may include an enhancement layer encoder 706, a base layer encoder 709, a resolution up-scaling module 770, and an output interface 780.
The enhancement layer encoder 706 may include a video input 781 that receives the input image 704. The output of video input 781 may be provided to adder/subtractor 783 which receives the output of prediction selection 750. The output of the adder/subtractor 783 may be provided to a transform and quantization module 752. The output of the transform and quantization module 752 may be provided to an entropy encoding module 748, as well as a scaling and inverse transform module 772. After performing entropy encoding, the output of entropy encoding module 748 may be provided to output interface 780. The output interface 780 may output the encoded base layer video bitstream 707 and the encoded enhancement layer video bitstream 710.
The output of the scaling and inverse transform module 772 may be provided to an adder 779. The adder 779 may also receive the output of the prediction selection 750. The output of the adder 779 may be provided to a deblocking module 751. The output of the deblocking module 751 may be provided to a reference buffer 794. The output of the reference buffer 794 may be provided to a motion compensation module 754. The output of the motion compensation module 754 may be provided to the prediction selection 750. The output of the reference buffer 794 may also be provided to an intra predictor 756. The output of the intra predictor 756 may be provided to the prediction selection 750. The prediction selection 750 may also receive the output of the resolution up-scaling module 770.
The base layer encoder 709 may include a video input 762 that receives a downsampled input image, an alternate view input image, or the same input image 703 (i.e., the same input image 704 as received by the enhancement layer encoder 706). The output of the video input 762 may be provided to a coding prediction loop 764. Entropy encoding 766 may be performed on the output of the coding prediction loop 764. The output of the coding prediction loop 764 may also be provided to a reference buffer 768. The reference buffer 768 may provide feedback to the coding prediction loop 764. The output of the reference buffer 768 may also be provided to the resolution up-scaling module 770. Once entropy encoding 766 is performed, the output may be provided to the output interface 780.
Fig. 8 is a block diagram illustrating one configuration of a video decoder 812 on an electronic device 802. The video decoder 812 of fig. 8 may be one configuration of the video decoder 112 of fig. 1. The video decoder 812 may include an enhancement layer decoder 815 and a base layer decoder 813. The video decoder 812 may also include an interface 889 and a resolution up-scaling 870.
The interface 889 may receive an encoded video stream 885. The encoded video stream 885 may include a base layer encoded video stream and an enhancement layer encoded video stream. The base layer encoded video stream and the enhancement layer encoded video stream may be transmitted separately or in combination. The interface 889 may provide a portion or all of the encoded video stream 885 to an entropy decoding module 886 in the base layer decoder 813. The output of the entropy decoding module 886 may be provided to a decoding prediction loop 887. The output of the decoding prediction loop 887 may be provided to a reference buffer 888. The reference buffer 888 may provide feedback to the decoding prediction loop 887. The reference buffer 888 may also output the decoded base layer video 884.
The interface 889 may also provide a portion or all of the encoded video stream 885 to an entropy decoding module 890 in the enhancement layer decoder 815. The output of the entropy decoding module 890 may be provided to an inverse quantization module 891. The output of inverse quantization module 891 may be provided to adder 892. The adder 892 may add the output of the inverse quantization module 891 and the output of the prediction selection module 895. The output of adder 892 may be provided to a deblocking module 893. The output of the deblocking module 893 may be provided to a reference buffer 894. The reference buffer 894 may output decoded enhancement layer video 882.
The output of the reference buffer 894 may also be provided to an intra predictor 897. The enhancement layer decoder 815 may include motion compensation 896. Motion compensation 896 may be performed after resolution up-scaling 870. The prediction selection module 895 may receive the output of the intra predictor 897 and the output of the motion compensation 896.
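The reconstruction path of the enhancement layer decoder 815 may be sketched as a simple data flow. The following Python sketch is illustrative only. The function name, the use of flat integer lists for samples, and the stage callables passed in as parameters are assumptions made for this example. Entropy decoding 890 and the prediction selection module 895 are taken as already performed, so the function receives their outputs directly.

```python
from typing import Callable, List, Sequence

Samples = List[int]


def reconstruct_enhancement_picture(
    coefficients: Sequence[int],
    prediction: Sequence[int],
    inverse_quantize: Callable[[Sequence[int]], Samples],
    deblock: Callable[[Samples], Samples],
    reference_buffer: List[Samples],
) -> Samples:
    """Follow the path 891 -> 892 -> 893 -> 894 for one picture."""
    # Inverse quantization module 891 recovers the residual.
    residual = inverse_quantize(coefficients)
    # Adder 892 sums the residual and the selected prediction.
    summed = [r + p for r, p in zip(residual, prediction)]
    # Deblocking module 893 filters the summed samples.
    filtered = deblock(summed)
    # Reference buffer 894 stores the picture for later prediction and output.
    reference_buffer.append(filtered)
    return filtered
```

Passing the stages in as callables keeps the sketch independent of any particular quantizer or deblocking filter; a real decoder would operate on blocks of a picture rather than on a flat list.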
Fig. 9 is a block diagram illustrating various components that may be used in a transmitting electronic device 902. One or more of the electronic devices 102 described herein may be implemented in accordance with the transmitting electronic device 902 shown in fig. 9.
The transmitting electronic device 902 includes a processor 939 that controls the operation of the transmitting electronic device 902. The processor 939 may also be referred to as a Central Processing Unit (CPU). Memory 933 may include Read Only Memory (ROM), Random Access Memory (RAM), or any type of device that can store information and provide instructions 935a (e.g., executable instructions) and data 937a to the processor 939. A portion of the memory 933 may also include non-volatile random access memory (NVRAM). The memory 933 may be in electronic communication with the processor 939.
Instructions 935b and data 937b may also reside in the processor 939. The instructions 935b and/or data 937b loaded into the processor 939 may also include instructions 935a and/or data 937a loaded from the memory 933 for execution or processing by the processor 939. The instructions 935b may be executed by the processor 939 to implement one or more of the methods disclosed herein.
The transmitting electronic device 902 may include one or more communication interfaces 941 for communicating with other electronic devices (e.g., receiving electronic devices). The communication interface 941 may be based on wired communication techniques, wireless communication techniques, or both. Examples of communication interface 941 include a serial port, a parallel port, a Universal Serial Bus (USB), an ethernet adapter, an IEEE1394 bus interface, a Small Computer System Interface (SCSI) bus interface, an Infrared (IR) communication port, a bluetooth wireless communication adapter, and a wireless transceiver according to the third generation partnership project (3GPP) specifications, among others.
The transmitting electronic device 902 may include one or more output devices 945 and one or more input devices 943. Examples of output devices 945 include speakers, printers, and so forth. One output device that may be included in the transmitting electronic device 902 is a display device 947. The display device 947 used in conjunction with the configurations disclosed herein may utilize any suitable image projection technology, such as Cathode Ray Tubes (CRTs), Liquid Crystal Displays (LCDs), Light Emitting Diodes (LEDs), gas plasma, electroluminescence, and the like. A display controller 949 may be provided for converting data stored in the memory 933 into text, graphics, and/or moving images for display on the display device 947 (as needed). Examples of input devices 943 include a keyboard, mouse, microphone, remote control device, buttons, joystick, trackball, touch pad, touch screen, light pen, and the like.
The various components of the transmitting electronic device 902 are coupled together by a bus system 951, which may include a power bus, a control signal bus, a status signal bus, and a data bus. However, for the sake of clarity, the various buses are illustrated in FIG. 9 as bus system 951. The transmitting electronic device 902 shown in fig. 9 is a functional block diagram rather than a listing of specific components.
Fig. 10 is a block diagram illustrating various components that may be used in a receiving electronic device 1002. One or more of the electronic devices 102 described herein may be implemented in accordance with the receiving electronic device 1002 shown in fig. 10.
The receiving electronic device 1002 includes a processor 1039 that controls the operation of the receiving electronic device 1002. Processor 1039 may also be referred to as a CPU. The memory 1033 may include ROM, RAM, or any type of device that can store information and provide instructions 1035a (e.g., executable instructions) and data 1037a to the processor 1039. A portion of the memory 1033 may also include NVRAM. Memory 1033 may be in electronic communication with processor 1039.
Instructions 1035b and data 1037b may also reside in the processor 1039. The instructions 1035b and/or data 1037b loaded into the processor 1039 may also include instructions 1035a and/or data 1037a loaded from the memory 1033 for execution or processing by the processor 1039. The instructions 1035b may be executed by the processor 1039 to implement one or more of the methods 200, 300, 400, 500 disclosed herein.
Receiving electronic device 1002 may include one or more communication interfaces 1041 for communicating with other electronic devices (e.g., a transmitting electronic device). Communication interface 1041 may be based on a wired communication technology, a wireless communication technology, or both. Examples of communication interface 1041 include a serial port, a parallel port, a USB, an ethernet adapter, an IEEE1394 bus interface, a SCSI bus interface, an IR communication port, a bluetooth wireless communication adapter, a wireless transceiver according to the 3GPP technical specification, and so forth.
The receiving electronic device 1002 may include one or more output devices 1045 and one or more input devices 1043. Examples of output devices 1045 include speakers, printers, and so forth. One output device that may be included in the receiving electronic device 1002 is a display device 1047. The display device 1047 used in conjunction with the configurations disclosed herein may utilize any suitable image projection technology, such as CRT, LCD, LED, gas plasma, electroluminescence, and the like. A display controller 1049 may be provided for converting data stored in the memory 1033 into text, graphics, and/or moving images for display on the display device 1047 (as needed). Examples of input devices 1043 include a keyboard, mouse, microphone, remote control device, buttons, joystick, trackball, touch pad, touch screen, light pen, and the like.
The various components of the receiving electronic device 1002 are coupled together by a bus system 1051, which bus system 1051 may include a power bus, a control signal bus, a status signal bus, and a data bus. However, for clarity, the various buses are illustrated in FIG. 10 as bus system 1051. The receiving electronic device 1002 shown in fig. 10 is a functional block diagram rather than a listing of specific components.
The term "computer-readable medium" refers to any available medium that can be accessed by a computer or processor. The term "computer-readable medium" as used herein may denote a non-transitory and tangible computer-readable and/or processor-readable medium. By way of example, and not limitation, a computer-readable or processor-readable medium may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer or processor. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray (registered trademark) disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
It should be noted that one or more of the methods described herein may be implemented and/or carried out using hardware. For example, one or more of the methods or aspects described herein may be implemented and/or carried out using a chipset, an ASIC, an LSI, or an integrated circuit, etc.
Each method disclosed herein comprises one or more method steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another and/or combined into one step without departing from the scope of the claims. In other words, unless a specific step or action is required for proper operation of the described method, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Modifications may be made in the arrangement, operation and details of the systems, methods and apparatuses described herein without departing from the scope of the claims.

Claims (21)

HK16103766.8A | 2013-04-05 | 2014-04-02 | Decoding of inter-layer reference picture set and reference picture list construction | HK1215835A1 (en)

Applications Claiming Priority (5)

Application Number | Priority Date | Filing Date | Title
JP13/857,990 | 2013-04-05
US13/857,990, US9532067B2 (en) | 2013-04-05 | 2013-04-05 | Decoding of inter-layer reference picture set and reference picture list construction
US201361818804P | 2013-05-02 | 2013-05-02
JP61/818,804 | 2013-05-02
PCT/JP2014/001923, WO2014162739A1 (en) | 2013-04-05 | 2014-04-02 | Decoding of inter-layer reference picture set and reference picture list construction

Publications (1)

Publication Number | Publication Date
HK1215835A1 (en) | 2016-09-15

Family

ID=51658047

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
HK16103766.8A, HK1215835A1 (en) | Decoding of inter-layer reference picture set and reference picture list construction | 2013-04-05 | 2014-04-02

Country Status (5)

Country | Link
EP (1) | EP2982123A4 (en)
JP (1) | JP2016519853A (en)
CN (1) | CN105122816A (en)
HK (1) | HK1215835A1 (en)
WO (1) | WO2014162739A1 (en)

Also Published As

Publication Number | Publication Date
JP2016519853A (en) | 2016-07-07
CN105122816A (en) | 2015-12-02
WO2014162739A1 (en) | 2014-10-09
EP2982123A4 (en) | 2016-09-07
EP2982123A1 (en) | 2016-02-10
