Stream:
Internet Engineering Task Force (IETF)
RFC:
0000
Category:
Informational
Published:
February 2020
ISSN:
2070-1721
Authors:
A. Filippov
Huawei Technologies
A. Norkin
Netflix
J.R. Alvarez
Huawei Technologies

RFC 0000

Video Codec Requirements and Evaluation Methodology

Abstract

This document provides requirements for a video codec designed mainly for use over the Internet. In addition, this document describes an evaluation methodology for measuring compression efficiency to verify whether the stated requirements have been fulfilled.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc0000.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.


1. Introduction

This document presents the requirements for a video codec designed mainly for use over the Internet. The requirements encompass a wide range of applications that use data transmission over the Internet, including Internet video streaming, IPTV, peer-to-peer video conferencing, video sharing, screencasting, game streaming, and video monitoring / surveillance. For each application, typical resolutions, frame rates, and picture-access modes are presented. Specific requirements related to data transmission over packet-loss networks are considered as well. In this document, when we discuss data-protection techniques, we only refer to methods designed and implemented to protect data inside the video codec, since there are many existing techniques that protect generic data transmitted over networks with packet losses. From the theoretical point of view, both packet-loss and bit-error robustness can be beneficial for video codecs. In practice, packet losses are a more significant problem than bit corruption in IP networks. It is worth noting that there is an evident interdependence between the tolerable amount of delay and the necessity of error-robust video streams:

o If the amount of delay is not crucial for an application, then reliable transport protocols such as TCP, which retransmit undelivered packets, can be used to guarantee correct decoding of transmitted data.

o If the amount of delay must be kept low, then either data transmission should be error free (e.g., by using managed networks), or the compressed video stream should be error resilient.

Thus, error resilience can be useful for delay-critical applications to provide low delay in a packet-loss environment.

2. Definitions and Abbreviations Used in This Document

High dynamic range imaging
A set of techniques that allow a greater dynamic range of exposures or values (i.e., a wider range of values between light and dark areas) than normal digital imaging techniques. The intention is to accurately represent the wide range of intensity levels found in examples such as exterior scenes that include light-colored items struck by direct sunlight and areas of deep shadow [HDR].
Random access period
The period of time between the two closest independently decodable frames (pictures).
RD-point
A point in a two-dimensional rate-distortion space where the values of bitrate and quality metric are used as x- and y-coordinates, respectively.
Visually lossless compression
A form or manner of lossy compression where the data that are lost after the file is compressed and decompressed are not detectable to the eye; the compressed data appear identical to the uncompressed data [COMPRESSION].
Wide color gamut
A certain complete color subset (e.g., considered in [PARAMETER]) that supports a wider range of colors (i.e., an extended range of colors that can be generated by a specific input or output device such as a video camera, monitor, or printer and can be interpreted by a color model) than conventional color gamuts (e.g., those considered in [STUDIO] or [HDTV]).
Table 1: Abbreviations used in the text of this document

Abbreviation  Meaning
AI            All-Intra (each picture is intra-coded)
BD-Rate       Bjontegaard Delta Rate
FIZD          just the First picture is Intra-coded, Zero structural Delay
GOP           Group of Pictures
HBR           High Bitrate Range
HDR           High Dynamic Range
HRD           Hypothetical Reference Decoder
IPTV          Internet Protocol Television
LBR           Low Bitrate Range
MBR           Medium Bitrate Range
MOS           Mean Opinion Score
MS-SSIM       Multi-Scale Structural Similarity quality index
PSNR          Peak Signal-to-Noise Ratio
QoS           Quality of Service
QP            Quantization Parameter
RA            Random Access
RAP           Random Access Period
RD            Rate-Distortion
SEI           Supplemental Enhancement Information
SNR           Signal-to-Noise Ratio
UGC           User-Generated Content
VDI           Virtual Desktop Infrastructure
VUI           Video Usability Information
WCG           Wide Color Gamut

3. Applications

This section presents an overview of video codec applications that are currently available on the Internet market. It is worth noting that there are different use cases for each application that define a target platform, and hence there are different types of communication channels involved (e.g., wired or wireless channels), which are characterized by different qualities of service as well as bandwidth; for instance, wired channels are considerably more free from error than wireless channels and therefore require different QoS approaches. The target platform, the channel bandwidth, and the channel quality determine the resolutions, frame rates, and quality (bitrates) for encoding or decoding video streams. By default, the YCbCr 4:2:0 color format is assumed for the application scenarios listed below.

3.1. Internet Video Streaming

Typical content for this application is movies, TV shows, and animation. Internet video streaming uses a variety of client devices and has to operate under changing network conditions. For this reason, an adaptive streaming model has been widely adopted. Video material is encoded at different quality levels and different resolutions, which are then chosen by a client depending on its capabilities and current network bandwidth. An example combination of resolutions and bitrates is shown in Table 2.

A video encoding pipeline in on-demand Internet video streaming typically operates as follows:

Table 2: Internet video streaming: typical values of resolutions, frame rates, and RAPs

Resolution*              Frame rate, fps                         PAM
4K, 3840x2160                                                    RA
2K (1080p), 1920x1080    24/1.001, 24, 25, 30/1.001, 30,         RA
1080i, 1920x1080*        50, 60/1.001, 60, 100, 120/1.001,       RA
720p, 1280x720           120 (this set of frame rates is         RA
576p (EDTV), 720x576     taken from Table 1 in [PARAMETER]       RA
576i (SDTV), 720x576*    and applies to all resolutions in       RA
480p (EDTV), 720x480     this table)                             RA
480i (SDTV), 720x480*                                            RA
512x384                                                          RA
QVGA, 320x240                                                    RA

*NB: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in the progressive format.

The characteristics and requirements of this application scenario are as follows:

3.2. Internet Protocol Television (IPTV)

This is a service for delivering television content over IP-based networks. IPTV may be classified into two main groups based on the type of delivery, as follows:

In the IPTV scenario, traffic is transmitted over managed (QoS-based) networks. Typical content used in this application is news, movies, cartoons, series, TV shows, etc. One important requirement for both groups is that random access to pictures -- i.e., the random access period (RAP) -- should be kept small enough (approximately 1-5 seconds). Optional requirements are as follows:

For this application, typical values of resolutions, frame rates, and RAPs are presented in Table 3.

Table 3: IPTV: typical values of resolutions, frame rates, and RAPs

Resolution*              Frame rate, fps                         PAM
2160p (4K), 3840x2160                                            RA
1080p, 1920x1080         24/1.001, 24, 25, 30/1.001, 30,         RA
1080i, 1920x1080*        50, 60/1.001, 60, 100, 120/1.001,       RA
720p, 1280x720           120 (this set of frame rates is         RA
576p (EDTV), 720x576     taken from Table 2 in [PARAMETER]       RA
576i (SDTV), 720x576*    and applies to all resolutions in       RA
480p (EDTV), 720x480     this table)                             RA
480i (SDTV), 720x480*                                            RA

*NB: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in a progressive format.

3.3. Video Conferencing

This is a form of video connection over the Internet. It allows users to establish connections between two or more people by two-way video and audio transmission for communication in real time. For this application, both stationary and mobile devices can be used. The main requirements are as follows:

Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and RAPs are presented in Table 4.

Table 4: Video conferencing: typical values of resolutions, frame rates, and RAPs

Resolution          Frame rate, fps   PAM
1080p, 1920x1080    15, 30            FIZD
720p, 1280x720      30, 60            FIZD
4CIF, 704x576       30, 60            FIZD
4SIF, 704x480       30, 60            FIZD
VGA, 640x480        30, 60            FIZD
360p, 640x360       30, 60            FIZD

3.4. Video Sharing

This is a service that allows people to upload and share video data (using live streaming or not) and watch those videos. It is also known as video hosting. A typical User-Generated Content (UGC) scenario for this application is to capture video using mobile cameras such as GoPros or cameras integrated into smartphones (amateur video). The main requirements are as follows:

Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and RAPs are presented in Table 5.

Table 5: Video sharing: typical values of resolutions, frame rates [YOUTUBE], and RAPs

Resolution               Frame rate, fps           PAM
2160p (4K), 3840x2160    24, 25, 30, 48, 50, 60    RA
1440p (2K), 2560x1440    24, 25, 30, 48, 50, 60    RA
1080p, 1920x1080         24, 25, 30, 48, 50, 60    RA
720p, 1280x720           24, 25, 30, 48, 50, 60    RA
480p, 854x480            24, 25, 30, 48, 50, 60    RA
360p, 640x360            24, 25, 30, 48, 50, 60    RA

3.5. Screencasting

This is a service that allows users to record and distribute computer-desktop screen output. This service requires efficient compression of computer-generated content with high visual quality, up to visually and mathematically (numerically) lossless [HEVC-EXT]. Currently, this application includes business presentations (PowerPoint, Word documents, email messages, etc.), animation (cartoons), gaming content, and data visualization -- i.e., the type of content that is characterized by fast motion, rotation, smooth shade, 3D effect, highly saturated colors with full resolution, clear textures, and sharp edges with distinct colors [HEVC-EXT] -- as well as virtual desktop infrastructure (VDI), screen/desktop sharing and collaboration, supervisory control and data acquisition (SCADA) display, automotive/navigation display, cloud gaming, factory automation display, wireless display, display wall, digital operating room (DiOR), etc. For this application, an important requirement is the support of low-delay configurations with zero structural delay for a wide range of video formats (e.g., RGB) in addition to YCbCr 4:2:0 and YCbCr 4:4:4 [HEVC-EXT]. For this application, typical values of resolutions, frame rates, and RAPs are presented in Table 6.

Table 6: Screencasting for RGB and YCbCr 4:4:4 formats: typical values of resolutions, frame rates, and RAPs

Resolution               Frame rate, fps   PAM
Input color format: RGB 4:4:4
5K, 5120x2880            15, 30, 60        AI, RA, FIZD
4K, 3840x2160            15, 30, 60        AI, RA, FIZD
WQXGA, 2560x1600         15, 30, 60        AI, RA, FIZD
WUXGA, 1920x1200         15, 30, 60        AI, RA, FIZD
WSXGA+, 1680x1050        15, 30, 60        AI, RA, FIZD
WXGA, 1280x800           15, 30, 60        AI, RA, FIZD
XGA, 1024x768            15, 30, 60        AI, RA, FIZD
SVGA, 800x600            15, 30, 60        AI, RA, FIZD
VGA, 640x480             15, 30, 60        AI, RA, FIZD
Input color format: YCbCr 4:4:4
5K, 5120x2880            15, 30, 60        AI, RA, FIZD
4K, 3840x2160            15, 30, 60        AI, RA, FIZD
1440p (2K), 2560x1440    15, 30, 60        AI, RA, FIZD
1080p, 1920x1080         15, 30, 60        AI, RA, FIZD
720p, 1280x720           15, 30, 60        AI, RA, FIZD

3.6. Game Streaming

This is a service that provides game content over the Internet to different local devices such as notebooks and gaming tablets. In this category of applications, 3D games are rendered on a cloud server, and the game is streamed to any device with a wired or wireless broadband connection [GAME]. There are low-latency requirements for transmitting user interactions and receiving game data, with a turnaround delay of less than 100 ms. This allows anyone to play (or resume) full-featured games from anywhere on the Internet [GAME]. An example of this application is Nvidia Grid [GAME]. Another application category is the broadcast of video games played by people over the Internet, in real time or for later viewing [GAME]. Many companies, such as Twitch and YY in China, enable game broadcasting [GAME]. Games typically contain a lot of sharp edges and large motion [GAME]. The main requirements are as follows:

Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and RAPs are similar to the ones presented in Table 4.

3.7. Video Monitoring / Surveillance

This is a type of live broadcasting over IP-based networks. Video streams are sent to many receivers at the same time. A new receiver may connect to the stream at an arbitrary moment, so the random access period should be kept small enough (approximately 1-5 seconds). Data are transmitted publicly in the case of video monitoring and privately in the case of video surveillance. For IP cameras that have to capture, process, and encode video data, complexity -- including computational and hardware complexity, as well as memory bandwidth -- should be kept low to allow real-time processing. In addition, support of a high dynamic range and a monochrome mode (e.g., for infrared cameras), as well as resolution and quality (SNR) scalability, is an essential requirement for video surveillance. In some use cases, high video-signal fidelity is required even after lossy compression. Typical values of resolutions, frame rates, and RAPs for video monitoring / surveillance applications are presented in Table 7.

Table 7: Video monitoring / surveillance: typical values of resolutions, frame rates, and RAPs

Resolution               Frame rate, fps   PAM
2160p (4K), 3840x2160    12, 25, 30        RA, FIZD
5 Mpixels, 2560x1920     12, 25, 30        RA, FIZD
1080p, 1920x1080         25, 30            RA, FIZD
1.3 Mpixels, 1280x960    25, 30            RA, FIZD
720p, 1280x720           25, 30            RA, FIZD
SVGA, 800x600            25, 30            RA, FIZD

4. Requirements

Taking into account the requirements discussed above for specific video applications, this section proposes requirements for an Internet video codec.

4.1. General Requirements

4.1.1. Efficiency

The most basic requirement is coding efficiency -- i.e., compression performance on both "easy" and "difficult" content for the applications and use cases in Section 3. The codec should provide coding efficiency higher than that of state-of-the-art video codecs such as HEVC/H.265 and VP9 by at least 25%, in accordance with the methodology described in Section 5 of this document. For higher resolutions, the coding efficiency improvements are expected to be higher than for lower resolutions.

4.1.2. Specification and Profiles

A good-quality specification and well-defined profiles and levels are required to enable device interoperability and facilitate decoder implementations. A profile consists of a subset of the entire bitstream syntax, and consequently it also defines the tools necessary for decoding a conforming bitstream of that profile. A level imposes a set of numeric limits on the values of some syntax elements. An example of codec levels to be supported is presented in Table 8. An actual level definition should include constraints on features that impact decoder complexity, for example: maximum bitrate, line buffer size, memory usage, etc.

Table 8: Codec levels

Level   Example picture resolution at highest frame rate
1       128x96 (12,288*) @ 30.0
        176x144 (25,344*) @ 15.0
2       352x288 (101,376*) @ 30.0
3       352x288 (101,376*) @ 60.0
        640x360 (230,400*) @ 30.0
4       640x360 (230,400*) @ 60.0
        960x540 (518,400*) @ 30.0
5       720x576 (414,720*) @ 75.0
        960x540 (518,400*) @ 60.0
        1280x720 (921,600*) @ 30.0
6       1,280x720 (921,600*) @ 68.0
        2,048x1,080 (2,211,840*) @ 30.0
7       1,280x720 (921,600*) @ 120.0
8       1,920x1,080 (2,073,600*) @ 120.0
        3,840x2,160 (8,294,400*) @ 30.0
        4,096x2,160 (8,847,360*) @ 30.0
9       1,920x1,080 (2,073,600*) @ 250.0
        4,096x2,160 (8,847,360*) @ 60.0
10      1,920x1,080 (2,073,600*) @ 300.0
        4,096x2,160 (8,847,360*) @ 120.0
11      3,840x2,160 (8,294,400*) @ 120.0
        8,192x4,320 (35,389,440*) @ 30.0
12      3,840x2,160 (8,294,400*) @ 250.0
        8,192x4,320 (35,389,440*) @ 60.0
13      3,840x2,160 (8,294,400*) @ 300.0
        8,192x4,320 (35,389,440*) @ 120.0

*NB: The quantities of pixels are presented for applications in which a picture can have an arbitrary size (e.g., screencasting).
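A level check of this kind can be expressed compactly. The sketch below is a hypothetical helper, not part of any codec specification: the names and the rule of constraining luma samples per second are our illustration, using the example operating points from Table 8 as the data.

```python
# Hypothetical level check derived from the example operating points in
# Table 8 (only the first few levels are shown). The assumed rule --
# bounding luma samples per second -- is our illustration; a real level
# definition would also bound bitrate, line buffer size, memory, etc.
LEVEL_EXAMPLES = {
    1: [(128, 96, 30.0), (176, 144, 15.0)],
    2: [(352, 288, 30.0)],
    3: [(352, 288, 60.0), (640, 360, 30.0)],
    4: [(640, 360, 60.0), (960, 540, 30.0)],
}

def max_luma_sample_rate(level: int) -> float:
    """Largest luma-samples-per-second value among a level's examples."""
    return max(w * h * fps for (w, h, fps) in LEVEL_EXAMPLES[level])

def fits_level(width: int, height: int, fps: float, level: int) -> bool:
    """True if the operating point stays within the level's sample rate."""
    return width * height * fps <= max_luma_sample_rate(level)
```

For example, under this assumed rule, 640x360 at 30 fps exceeds the sample rate of level 1 but fits level 3.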

4.1.3. Bitstream Syntax

Bitstream syntax should allow extensibility and backward compatibility. New features can be supported easily by using metadata (such as SEI messages, VUI, and headers) without affecting bitstream compatibility with legacy decoders. A newer version of the decoder shall be able to play bitstreams of an older version of the same or lower profile and level.

4.1.4. Bitstream Model

A bitstream should have a model that allows easy parsing and identification of the sample components (such as Annex B of [ADVANCED] or [NAL]). In particular, information needed for packet handling (e.g., frame type) should not require parsing anything below the header level.

4.1.5. Perceptual-Quality Tools

Perceptual-quality tools (such as adaptive QP and quantization matrices) should be supported by the codec bitstream.

4.1.6. Buffer Model

The codec specification shall define a buffer model, such as the hypothetical reference decoder (HRD).

4.1.7. Integration

Specifications providing integration with system and delivery layers should be developed.

4.2. Basic Requirements

4.2.1. Input Source Formats

  • Bit depth: 8 and 10 bits (up to 12 bits for a high profile) per color component;
  • Color sampling formats:

    • YCbCr 4:2:0;
    • YCbCr 4:4:4, YCbCr 4:2:2, and YCbCr 4:0:0 (preferably in different profile(s)).
  • For profiles with a bit depth of 10 bits per sample or higher, support of high dynamic range and wide color gamut.
  • Support of arbitrary resolutions, according to the level constraints, for applications in which a picture can have an arbitrary size (e.g., in screencasting).
  • Exemplary input source formats for codec profiles are shown in Table 9.
Table 9: Exemplary input source formats for codec profiles

Profile   Bit depths per color component   Color sampling formats
1         8 and 10                         4:0:0 and 4:2:0
2         8 and 10                         4:0:0, 4:2:0, and 4:4:4
3         8, 10, and 12                    4:0:0, 4:2:0, 4:2:2, and 4:4:4

4.2.2. Coding Delay

  • Support of configurations with zero structural delay, also referred to as "low-delay" configurations.

    • Note 1: End-to-end delay should be no more than 320 ms [QoE], but it is preferable for its value to be less than 100 ms [SG-16].
  • Support of efficient random-access-point encoding (such as intra coding and resetting of context variables), as well as efficient switching between multiple quality representations.
  • Support of configurations with nonzero structural delay (such as out-of-order or multipass encoding) for applications without low-delay requirements, if such configurations provide additional compression efficiency improvements.

4.2.3. Complexity

  • Feasible real-time implementation of both an encoder and a decoder supporting a chosen subset of tools for hardware and software implementation on a wide range of state-of-the-art platforms. The subset of tools for the real-time encoder should provide a meaningful improvement in compression efficiency at a reasonable complexity of hardware and software encoder implementations as compared to real-time implementations of state-of-the-art video compression technologies such as HEVC/H.265 and VP9.
  • High-complexity software encoder implementations used by offline encoding applications can have a complexity increase of 10x or more compared to state-of-the-art video compression technologies such as HEVC/H.265 and VP9.

4.2.4. Scalability

  • Temporal (frame-rate) scalability should be supported.

4.2.5. Error Resilience

  • Tools that are complementary to the error-protection mechanisms implemented on the transport level should be supported.
  • The codec should support mechanisms that facilitate packetization of a bitstream for common network protocols.
  • Packetization mechanisms should enable frame-level error recovery by means of retransmission or error concealment.
  • The codec should support effective mechanisms for allowing decoding and reconstruction of significant parts of pictures in the event that parts of the picture data are lost in transmission.
  • The bitstream specification shall support independently decodable subframe units similar to slices or independent tiles. It shall be possible for the encoder to restrict the bitstream so that it can still be parsed after a packet loss, and to communicate this restriction to the decoder.

4.3. Optional Requirements

4.3.1. Input Source Formats

  • Bit depth: up to 16 bits per color component.
  • Color sampling formats: RGB 4:4:4.
  • Auxiliary channel (e.g., alpha channel) support.

4.3.2. Scalability

  • Resolution and quality (SNR) scalability that incurs a low compression-efficiency penalty (an increase of up to 5% in BD-rate [PSNR] per layer, with a reasonable increase of both computational and hardware complexity) can be supported in the main profile of the codec being developed by the NETVC Working Group. Otherwise, a separate profile is needed to support these types of scalability.
  • Computational complexity scalability (i.e., computational complexity that decreases along with degrading picture quality) is desirable.

4.3.3. Complexity

Tools that enable parallel processing (e.g., slices, tiles, and wavefront propagation processing) at both the encoder and decoder sides are highly desirable for many applications.

  • High-level multicore parallelism: encoder and decoder operation, especially entropy encoding and decoding, should allow multiple frames or subframe regions (e.g., 1D slices, 2D tiles, or partitions) to be processed concurrently, either independently or with deterministic dependencies that can be efficiently pipelined.
  • Low-level instruction-set parallelism: algorithms that are SIMD/GPU friendly should be favored over inherently serial algorithms.

4.3.4. Coding Efficiency

High compression efficiency on noisy content, content with film grain, computer-generated content, and low-resolution material is desirable.

5. Evaluation Methodology

As shown in Figure 1, compression-performance testing is performed in three overlapping ranges that encompass 10 different bitrate values:

Initially, for the codec selected as the reference one (e.g., HEVC or VP9), a set of 10 QP (quantization parameter) values should be specified as in [TESTING], and the corresponding quality values should be calculated. In Figure 1, QP and quality values are denoted as QP0, QP1, QP2, ..., QP8, QP9 and Q0, Q1, Q2, ..., Q8, Q9, respectively. To guarantee the overlaps of quality levels between the bitrate ranges of the reference and tested codecs, a quality-alignment procedure should be performed for each range's outermost (left- and rightmost) quality levels Qk of the reference codec (i.e., for Q0, Q3, Q6, and Q9) and the quality levels Q'k (i.e., Q'0, Q'3, Q'6, and Q'9) of the tested codec. Thus, these quality levels Q'k and, hence, the corresponding QP values QP'k (i.e., QP'0, QP'3, QP'6, and QP'9) of the tested codec should be selected using the following formulas:

 Q'k =    min { abs(Q'i - Qk) },                 i in R
QP'k = argmin { abs(Q'i(QP'i) - Qk(QPk)) },      i in R

where R is the range of the QP indexes of the tested codec -- i.e., the candidate Internet video codec. The inner quality levels (i.e., Q'1, Q'2, Q'4, Q'5, Q'7, and Q'8), as well as their corresponding QP values in each range (i.e., QP'1, QP'2, QP'4, QP'5, QP'7, and QP'8), should be as equidistantly spaced as possible between the left- and rightmost quality levels, without explicitly mapping their values using the procedure described above.
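Read as pseudocode, the alignment rule picks, for each outer reference quality Qk, the tested-codec QP index whose measured quality is nearest to it. The sketch below is our illustrative reading of the formulas; the function names and the dict-based representation are assumptions, not from the document:

```python
def align_qp(tested_quality: dict, q_ref: float):
    """Apply the argmin rule: return (QP'k, Q'k), the tested-codec QP
    whose measured quality is closest to the reference quality Qk."""
    qp_best = min(tested_quality,
                  key=lambda qp: abs(tested_quality[qp] - q_ref))
    return qp_best, tested_quality[qp_best]

def inner_quality_targets(q_left: float, q_right: float, n_inner: int = 2):
    """Equidistant quality targets for the inner levels of one range
    (e.g., for Q'1 and Q'2 between the aligned Q'0 and Q'3)."""
    step = (q_right - q_left) / (n_inner + 1)
    return [q_left + step * (i + 1) for i in range(n_inner)]
```

For example, with a tested codec measured at {QP 30: 34.0 dB, QP 32: 33.1, QP 34: 32.2, QP 36: 31.0} and a reference level Qk = 33.0, the rule selects QP' = 32.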

QP'9 QP'8  QP'7 QP'6 QP'5 QP'4 QP'3 QP'2 QP'1 QP'0  <+-----
 ^     ^    ^    ^    ^    ^    ^    ^    ^    ^     | Tested
 |     |    |    |    |    |    |    |    |    |     | codec
Q'0   Q'1  Q'2  Q'3  Q'4  Q'5  Q'6  Q'7  Q'8  Q'9   <+-----
 ^               ^              ^              ^
 |               |              |              |
Q0    Q1    Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9   <+-----
 ^    ^     ^    ^    ^    ^    ^    ^    ^    ^     | Reference
 |    |     |    |    |    |    |    |    |    |     | codec
QP9  QP8   QP7  QP6  QP5  QP4  QP3  QP2  QP1  QP0   <+-----

+----------------+--------------+--------------+--------->
^                ^              ^              ^    Bitrate
|-------LBR------|              |-----HBR------|
                 ^              ^
                 |------MBR-----|

Figure 1: Quality/QP alignment for compression-performance evaluation

Since the QP mapping results may vary for different sequences, this quality-alignment procedure eventually needs to be performed separately for each quality-assessment index and each sequence used for codec-performance evaluation to fulfill the requirements described above.

To assess the quality of output (decoded) sequences, two indexes are separately computed: PSNR [IMAGE] and MS-SSIM [IMAGE] [MULTI-SCALE]. In the case of the YCbCr color format, PSNR should be calculated for each color plane, whereas MS-SSIM is calculated for the luma channel only. In the case of the RGB color format, both metrics are computed for the R, G, and B channels. Thus, for each sequence, 30 RD-points for PSNR (i.e., three RD-curves, one for each channel) and 10 RD-points for MS-SSIM (i.e., one RD-curve, for the luma channel only) should be calculated in the case of YCbCr. If content is encoded as RGB, 60 RD-points (30 for PSNR and 30 for MS-SSIM) should be calculated -- i.e., three RD-curves (one for each channel) are computed for PSNR as well as three RD-curves (one for each channel) for MS-SSIM.
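For concreteness, PSNR for one 8-bit color plane is conventionally derived from the mean squared error against the source, with 255 as the peak signal value; MS-SSIM is considerably more involved (see [MULTI-SCALE]). The sketch below assumes planes are given as flat sequences of 0-255 samples and is only an illustration of the per-plane computation, not a normative definition:

```python
import math

def psnr_8bit(ref_plane, dec_plane):
    """PSNR in dB for a single 8-bit color plane: 10*log10(255^2 / MSE)."""
    n = len(ref_plane)
    mse = sum((a - b) ** 2 for a, b in zip(ref_plane, dec_plane)) / n
    if mse == 0.0:
        return math.inf  # identical planes: lossless for this plane
    return 10.0 * math.log10(255.0 ** 2 / mse)
```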

Finally, to obtain an integral estimation, BD-rate savings [PSNR] should be computed for each range and each quality index. In addition, average values over all three ranges should be provided for both PSNR and MS-SSIM. A list of video sequences that should be used for testing is defined in [TESTING], as are the 10 QP values for the reference codec. Testing processes should use the information on the codec applications presented in this document. As the reference for evaluation, state-of-the-art video codecs such as HEVC/H.265 [HETEROGENEOUS] [CODING] or VP9 must be used. The reference source code of the HEVC/H.265 codec can be found at [HEVC]. The HEVC/H.265 codec must be configured according to [CONDITIONS] and Table 10.

Table 10: Intraperiods for different HEVC/H.265 encoding modes according to [CONDITIONS]

Intra-period, second    HEVC/H.265 encoding mode according to [CONDITIONS]
AI                      Intra Main or Intra Main10
RA                      Random access Main or Random access Main10
FIZD                    Low delay Main or Low delay Main10
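The BD-rate measure itself is specified in [PSNR]: each RD-curve is converted to log-bitrate as a function of quality, and the average difference between the two curves is taken over the overlapping quality interval. The sketch below is a simplified piecewise-linear variant of that idea (the method in [PSNR] fits a cubic polynomial through the RD-points); the names are ours, and the sign convention is that a negative result means the tested codec saves bitrate:

```python
import math

def _log_rate_at(points, q):
    """Piecewise-linear interpolation of log10(bitrate) at quality q.
    `points` is a list of (bitrate, quality) RD-points."""
    pts = sorted(points, key=lambda p: p[1])
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if q0 <= q <= q1:
            t = (q - q0) / (q1 - q0)
            return math.log10(r0) + t * (math.log10(r1) - math.log10(r0))
    raise ValueError("quality value outside the RD-curve")

def bd_rate_percent(anchor, tested, steps=100):
    """Average bitrate difference (%) of `tested` vs. `anchor` over the
    overlapping quality range; negative values are bitrate savings."""
    lo = max(min(q for _, q in anchor), min(q for _, q in tested))
    hi = min(max(q for _, q in anchor), max(q for _, q in tested))
    qs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    diff = sum(_log_rate_at(tested, q) - _log_rate_at(anchor, q) for q in qs)
    return (10.0 ** (diff / len(qs)) - 1.0) * 100.0
```

For instance, a tested curve that needs exactly half the anchor's bitrate at every quality level yields -50%.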

According to the coding efficiency requirement described in Section 4.1.1, BD-rate savings calculated for each color plane and averaged over all the video sequences used to test the NETVC codec should be, at least,

Since values of the two objective metrics (PSNR and MS-SSIM) are available for some color planes, each value should meet these coding efficiency requirements. That is, the final BD-rate saving, denoted as S, is calculated for a given color plane as follows:

S = min { S_psnr, S_ms-ssim },

where S_psnr and S_ms-ssim are the BD-rate savings calculated for the given color plane using the PSNR and MS-SSIM metrics, respectively.
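Because the rule takes the smaller of the two savings, a codec cannot satisfy the efficiency target on PSNR alone while regressing on MS-SSIM. A minimal sketch (the helper names and dict layout are ours; the 25% threshold is the general requirement from Section 4.1.1):

```python
def final_saving(s_psnr: float, s_ms_ssim: float) -> float:
    """Final BD-rate saving S for one color plane: min(S_psnr, S_ms-ssim)."""
    return min(s_psnr, s_ms_ssim)

def meets_requirement(savings_by_plane: dict, threshold: float = 25.0) -> bool:
    """True only if every color plane's final saving S reaches the threshold."""
    return all(final_saving(*s) >= threshold
               for s in savings_by_plane.values())
```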

In addition to the objective quality measures defined above, subjective evaluation must also be performed for the final NETVC codec adoption. For subjective tests, the MOS-based evaluation procedure must be used as described in Section 2.1 of [IMAGE]. For perception-oriented tools that primarily impact subjective quality, additional tests may also be individually assigned, even for intermediate evaluation, subject to a decision of the NETVC WG.

6. Security Considerations

This document itself does not address any security considerations. However, it is worth noting that a codec implementation (for both an encoder and a decoder) should take into consideration the worst-case computational complexity, memory bandwidth, and physical memory size needed to process the potentially untrusted input (e.g., the decoded pictures used as references).

7. IANA Considerations

This document has no IANA actions.

8. References

8.1. Normative References

[CODING]
ITU-T, "High efficiency video coding", ITU-T Recommendation H.265, <https://www.itu.int/rec/T-REC-H.265>.
[HETEROGENEOUS]
ISO, "Information technology -- High efficiency coding and media delivery in heterogeneous environments -- Part 2: High efficiency video coding", ISO/IEC 23008-2:2015, <https://www.iso.org/standard/67660.html>.
[HEVC]
Fraunhofer Institute for Telecommunications, "High Efficiency Video Coding (HEVC) reference software (HEVC Test Model also known as HM)", <https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/>.
[IMAGE]
ISO, "Information technology -- Advanced image coding and evaluation -- Part 1: Guidelines for image coding system evaluation", ISO/IEC TR 29170-1:2017, <https://www.iso.org/standard/63637.html>.
[PARAMETER]
ITU-R, "Parameter values for ultra-high definition television systems for production and international programme exchange", ITU-R Recommendation BT.2020-2, <https://www.itu.int/rec/R-REC-BT.2020-2-201510-I/en>.
[QoE]
ITU-T, "Quality of Experience requirements for telepresence services", ITU-T Recommendation G.1091, <https://www.itu.int/rec/T-REC-G.1091/en>.

8.2. Informative References

[ADVANCED]
ISO/IEC, "Information technology -- Coding of audio-visual objects -- Part 10: Advanced video coding", ISO/IEC DIS 14496-10, <https://www.iso.org/standard/75400.html>.
[COMPRESSION]
Federal Agencies Digital Guidelines Initiative, "Term: Compression, visually lossless", <http://www.digitizationguidelines.gov/term.php?term=compressionvisuallylossless>.
[CONDITIONS]
Bossen, F., "Common HM test conditions and software reference configurations", Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (ITU-T Q.6/SG 16) and ISO/IEC Moving Picture Experts Group (ISO/IEC JTC 1/SC 29/WG 11), Document JCTVC-L1100, <http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7281>.
[GAME]
Parhy, M., "Game streaming requirement for Future Video Coding", ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group N36771, Warsaw, Poland.
[HDR]
Federal Agencies Digital Guidelines Initiative, "Term: High dynamic range imaging", <http://www.digitizationguidelines.gov/term.php?term=highdynamicrangeimaging>.
[HDTV]
ITU-R, "Parameter values for the HDTV standards for production and international programme exchange", ITU-R Recommendation BT.709, <https://www.itu.int/rec/R-REC-BT.709>.
[HEVC-EXT]
Yu, H., Ed., McCann, K., Ed., Cohen, R., Ed., and P. Amon, Ed., "Requirements for an extension of HEVC for coding of screen content", ISO/IEC JTC 1/SC 29/WG 11 Moving Picture Experts Group MPEG2013/N14174, San Jose, USA, <https://mpeg.chiariglione.org/standards/mpeg-h/high-efficiency-video-coding/requirements-extension-hevc-coding-screen-content>.
[MULTI-SCALE]
Wang, Z., Simoncelli, E.P., and A.C. Bovik, "Multiscale structural similarity for image quality assessment", IEEE Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, DOI 10.1109/ACSSC.2003.1292216, <https://ieeexplore.ieee.org/document/1292216>.
[NAL]
ISO/IEC, "Information technology -- Coding of audio-visual objects -- Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format", ISO/IEC 14496-15, <https://www.iso.org/standard/74429.html>.
[PSNR]
Bjontegaard, G., "Calculation of average PSNR differences between RD-curves", SG 16 VCEG-M33, <https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/>.
[SG-16]
Wenger, S., "The case for scalability support in version 1 of Future Video Coding", SG 16 (Study Period 2013) Contribution 988, <https://www.itu.int/md/T13-SG16-C-0988/en>.
[STUDIO]
ITU-R, "Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratios", ITU-R Recommendation BT.601, <https://www.itu.int/rec/R-REC-BT.601/>.
[TESTING]
Daede, T., Norkin, A., and I. Brailovskiy, "Video Codec Testing and Quality Measurement", Work in Progress, Internet-Draft, draft-ietf-netvc-testing-09, <https://tools.ietf.org/html/draft-ietf-netvc-testing-09>.
[YOUTUBE]
YouTube, "Recommended upload encoding settings", <https://support.google.com/youtube/answer/1722171?hl=en>.

Acknowledgments

The authors would like to thank Mr. Paul Coverdale, Mr. Vasily Rufitskiy, and Dr. Jianle Chen for many useful discussions on this document and their help while preparing it, as well as Mr. Mo Zanaty, Dr. Minhua Zhou, Dr. Ali Begen, Mr. Thomas Daede, Mr. Adam Roach, Dr. Thomas Davies, Mr. Jonathan Lennox, Dr. Timothy Terriberry, Mr. Peter Thatcher, Dr. Jean-Marc Valin, Mr. Roman Danyliw, Mr. Jack Moffitt, Mr. Greg Coppa, and Mr. Andrew Krupiczka for their valuable comments on different revisions of this document.

Authors' Addresses

Alexey Filippov
Huawei Technologies
Email: alexey.filippov@huawei.com

Andrey Norkin
Netflix
Email: anorkin@netflix.com

Jose Roberto Alvarez
Huawei Technologies
Email: j.alvarez@ieee.org
