RFC 8761 | Video Codec Requirements and Evaluation | April 2020
Filippov, et al. | Informational
This document provides requirements for a video codec designed mainly for use over the Internet. In addition, this document describes an evaluation methodology for measuring the compression efficiency to determine whether or not the stated requirements have been fulfilled.¶
This document is not an Internet Standards Track specification; it is published for informational purposes.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are candidates for any level of Internet Standard; see Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8761.¶
Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
This document presents the requirements for a video codec designed mainly for use over the Internet. The requirements encompass a wide range of applications that use data transmission over the Internet, including Internet video streaming, IPTV, peer-to-peer video conferencing, video sharing, screencasting, game streaming, and video monitoring and surveillance. For each application, typical resolutions, frame rates, and picture-access modes are presented. Specific requirements related to data transmission over packet-loss networks are considered as well. In this document, when we discuss data-protection techniques, we only refer to methods designed and implemented to protect data inside the video codec, since there are many existing techniques that protect generic data transmitted over networks with packet losses. From the theoretical point of view, both packet-loss and bit-error robustness can be beneficial for video codecs. In practice, packet losses are a more significant problem than bit corruption in IP networks. It is worth noting that there is an evident interdependence between the possible amount of delay and the necessity of error-robust video streams:¶
Thus, error resilience can be useful for delay-critical applications to provide low delay in a packet-loss environment.¶
In this section, an overview of video codec applications that are currently available on the Internet market is presented. It is worth noting that there are different use cases for each application that define a target platform; hence, there are different types of communication channels involved (e.g., wired or wireless channels) that are characterized by different quality of service (QoS) as well as bandwidth; for instance, wired channels are considerably less error-prone than wireless channels and therefore require different QoS approaches. The target platform, the channel bandwidth, and the channel quality determine the resolutions, frame rates, and either quality levels or bitrates for the video streams to be encoded or decoded. By default, the YCbCr 4:2:0 color format is assumed for the application scenarios listed below.¶
Typical content for this application is movies, TV series and shows, and animation. Internet video streaming uses a variety of client devices and has to operate under changing network conditions. For this reason, an adaptive streaming model has been widely adopted. Video material is encoded at different quality levels and different resolutions, which are then chosen by a client depending on its capabilities and current network bandwidth. An example combination of resolutions and bitrates is shown in Table 1.¶
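As an informal, non-normative illustration of the adaptive streaming model described above, the sketch below shows how a client might pick an encoding rung from a bitrate ladder given its measured bandwidth. The ladder entries, safety margin, and helper name are assumptions and are not taken from this document.¶

```python
# Hypothetical sketch of adaptive rung selection for the streaming model
# described above; ladder values and the safety margin are illustrative.
LADDER = [  # (width, height, bitrate in kbit/s), highest rung first
    (3840, 2160, 16000),
    (1920, 1080, 6000),
    (1280, 720, 3000),
    (720, 576, 1500),
    (320, 240, 400),
]

def pick_rung(measured_bandwidth_kbps, margin=0.8):
    """Return the highest-bitrate rung that fits within a safety margin
    of the currently measured network bandwidth."""
    budget = measured_bandwidth_kbps * margin
    for width, height, bitrate in LADDER:  # ordered from highest to lowest
        if bitrate <= budget:
            return (width, height, bitrate)
    return LADDER[-1]  # fall back to the lowest rung

print(pick_rung(5000))  # e.g., selects the 1280x720 @ 3000 kbit/s rung
```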
A video encoding pipeline in on-demand Internet video streaming typically operates as follows:¶
Resolution* | PAM | Frame Rate, FPS**
---|---|---
4K, 3840x2160 | RA |
2K (1080p), 1920x1080 | RA |
1080i, 1920x1080* | RA |
720p, 1280x720 | RA |
576p (EDTV), 720x576 | RA |
576i (SDTV), 720x576* | RA |
480p (EDTV), 720x480 | RA |
480i (SDTV), 720x480* | RA |
512x384 | RA |
QVGA, 320x240 | RA |
*Note: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in the progressive format.¶
**Note: The set of frame rates presented in this table is taken from Table 2 in [1].¶
The characteristics and requirements of this application scenario are as follows:¶
Support and efficient encoding of a wide range of content types and formats is required:¶
This is a service for delivering television content over IP-based networks. IPTV may be classified into two main groups based on the type of delivery, as follows:¶
In the IPTV scenario, traffic is transmitted over managed (QoS-based) networks. Typical content used in this application is news, movies, cartoons, series, TV shows, etc. One important requirement for both groups is that random access to pictures (i.e., the random access period (RAP)) should be kept small enough (approximately 1-5 seconds). Optional requirements are as follows:¶
For this application, typical values of resolutions, frame rates, and PAMs are presented in Table 2.¶
Resolution* | PAM | Frame Rate, FPS**
---|---|---
2160p (4K), 3840x2160 | RA |
1080p, 1920x1080 | RA |
1080i, 1920x1080* | RA |
720p, 1280x720 | RA |
576p (EDTV), 720x576 | RA |
576i (SDTV), 720x576* | RA |
480p (EDTV), 720x480 | RA |
480i (SDTV), 720x480* | RA |
*Note: Interlaced content can be handled at the higher system level and not necessarily by using specialized video coding tools. It is included in this table only for the sake of completeness, as most video content today is in a progressive format.¶
**Note: The set of frame rates presented in this table is taken from Table 2 in [1].¶
This is a form of video connection over the Internet. This form allows users to establish connections between two or more people by two-way video and audio transmission for communication in real time. For this application, both stationary and mobile devices can be used. The main requirements are as follows:¶
Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are presented in Table 3.¶
Resolution | Frame Rate, FPS | PAM |
---|---|---|
1080p, 1920x1080 | 15, 30 | FIZD |
720p, 1280x720 | 30, 60 | FIZD |
4CIF, 704x576 | 30, 60 | FIZD |
4SIF, 704x480 | 30, 60 | FIZD |
VGA, 640x480 | 30, 60 | FIZD |
360p, 640x360 | 30, 60 | FIZD |
This is a service that allows people to upload and share video data (using live streaming or not) and watch those videos. It is also known as video hosting. A typical User-Generated Content (UGC) scenario for this application is to capture video using mobile cameras such as GoPros or cameras integrated into smartphones (amateur video). The main requirements are as follows:¶
Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are presented in Table 4.¶
Typical values of resolutions and frame rates in Table 4 are taken from [10].¶
Resolution | Frame Rate, FPS | PAM |
---|---|---|
2160p (4K), 3840x2160 | 24, 25, 30, 48, 50, 60 | RA |
1440p (2K), 2560x1440 | 24, 25, 30, 48, 50, 60 | RA |
1080p, 1920x1080 | 24, 25, 30, 48, 50, 60 | RA |
720p, 1280x720 | 24, 25, 30, 48, 50, 60 | RA |
480p, 854x480 | 24, 25, 30, 48, 50, 60 | RA |
360p, 640x360 | 24, 25, 30, 48, 50, 60 | RA |
This is a service that allows users to record and distribute video data from a computer screen. This service requires efficient compression of computer-generated content with high visual quality, up to visually and mathematically (numerically) lossless [11]. Currently, this application includes business presentations (PowerPoint, Word documents, email messages, etc.), animation (cartoons), gaming content, and data visualization. This type of content is characterized by fast motion, rotation, smooth shading, 3D effects, highly saturated colors with full resolution, and clear textures and sharp edges with distinct colors [11]. Related use cases include virtual desktop infrastructure (VDI), screen/desktop sharing and collaboration, supervisory control and data acquisition (SCADA) display, automotive/navigation display, cloud gaming, factory automation display, wireless display, display wall, digital operating room (DiOR), etc. For this application, an important requirement is the support of low-delay configurations with zero structural delay for a wide range of video formats (e.g., RGB) in addition to YCbCr 4:2:0 and YCbCr 4:4:4 [11]. For this application, typical values of resolutions, frame rates, and PAMs are presented in Table 5.¶
Resolution | Frame Rate, FPS | PAM |
---|---|---|
Input color format: RGB 4:4:4 | ||
5k, 5120x2880 | 15, 30, 60 | AI, RA, FIZD |
4k, 3840x2160 | 15, 30, 60 | AI, RA, FIZD |
WQXGA, 2560x1600 | 15, 30, 60 | AI, RA, FIZD |
WUXGA, 1920x1200 | 15, 30, 60 | AI, RA, FIZD |
WSXGA+, 1680x1050 | 15, 30, 60 | AI, RA, FIZD |
WXGA, 1280x800 | 15, 30, 60 | AI, RA, FIZD |
XGA, 1024x768 | 15, 30, 60 | AI, RA, FIZD |
SVGA, 800x600 | 15, 30, 60 | AI, RA, FIZD |
VGA, 640x480 | 15, 30, 60 | AI, RA, FIZD |
Input color format: YCbCr 4:4:4 | ||
5k, 5120x2880 | 15, 30, 60 | AI, RA, FIZD |
4k, 3840x2160 | 15, 30, 60 | AI, RA, FIZD |
1440p (2K), 2560x1440 | 15, 30, 60 | AI, RA, FIZD |
1080p, 1920x1080 | 15, 30, 60 | AI, RA, FIZD |
720p, 1280x720 | 15, 30, 60 | AI, RA, FIZD |
This is a service that provides game content over the Internet to different local devices such as notebooks and gaming tablets. In this category of applications, 3D games are rendered on a cloud server, and the game is streamed to any device with a wired or wireless broadband connection [12]. There are low-latency requirements for transmitting user interactions and receiving game data, with a turnaround delay of less than 100 ms. This allows anyone to play (or resume) full-featured games from anywhere on the Internet [12]. An example of this application is Nvidia Grid [12]. Another application scenario in this category is the broadcast of video games played by people over the Internet, in real time or for later viewing [12]. There are many companies, such as Twitch and YY in China, that enable game broadcasting [12]. Games typically contain a lot of sharp edges and large motion [12]. The main requirements are as follows:¶
Support of resolution and quality (SNR) scalability is highly desirable. For this application, typical values of resolutions, frame rates, and PAMs are similar to those presented in Table 3.¶
This is a type of live broadcasting over IP-based networks. Video streams are sent to many receivers at the same time. A new receiver may connect to the stream at an arbitrary moment, so the random access period should be kept small enough (approximately 1-5 seconds). Data are transmitted publicly in the case of video monitoring and privately in the case of video surveillance. For IP cameras that have to capture, process, and encode video data, complexity -- including computational and hardware complexity, as well as memory bandwidth -- should be kept low to allow real-time processing. In addition, support of a high dynamic range and a monochrome mode (e.g., for infrared cameras), as well as resolution and quality (SNR) scalability, is an essential requirement for video surveillance. In some use cases, high video signal fidelity is required even after lossy compression. Typical values of resolutions, frame rates, and PAMs for video monitoring and surveillance applications are presented in Table 6.¶
Resolution | Frame Rate, FPS | PAM |
---|---|---|
2160p (4K), 3840x2160 | 12, 25, 30 | RA, FIZD |
5Mpixels, 2560x1920 | 12, 25, 30 | RA, FIZD |
1080p, 1920x1080 | 25, 30 | RA, FIZD |
1.23Mpixels, 1280x960 | 25, 30 | RA, FIZD |
720p, 1280x720 | 25, 30 | RA, FIZD |
SVGA, 800x600 | 25, 30 | RA, FIZD |
Taking into account the requirements discussed above for specific video applications, this section proposes requirements for an Internet video codec.¶
The most fundamental requirement is coding efficiency, i.e., compression performance on both "easy" and "difficult" content for the applications and use cases in Section 3. The codec should provide a coding-efficiency gain of at least 25% over state-of-the-art video codecs such as HEVC/H.265 and VP9, in accordance with the methodology described in Section 5 of this document. For higher resolutions, the improvements in coding efficiency are expected to be higher than for lower resolutions.¶
Good-quality specification and well-defined profiles and levels are required to enable device interoperability and facilitate decoder implementations. A profile consists of a subset of the entire bitstream syntax; consequently, it also defines the tools necessary for decoding a conforming bitstream of that profile. A level imposes a set of numerical limits on the values of some syntax elements. An example of codec levels to be supported is presented in Table 7. An actual level definition should include constraints on features that impact the decoder complexity, for example: maximum bitrate, line buffer size, memory usage, etc.¶
Level | Example picture resolution at highest frame rate |
---|---|
1 | 128x96(12,288*)@30.0 |
2 | 352x288(101,376*)@30.0 |
3 | 352x288(101,376*)@60.0 |
4 | 640x360(230,400*)@60.0 |
5 | 720x576(414,720*)@75.0 |
6 | 1,280x720(921,600*)@68.0 |
7 | 1,280x720(921,600*)@120.0 |
8 | 1,920x1,080(2,073,600*)@120.0 |
9 | 1,920x1,080(2,073,600*)@250.0 |
10 | 1,920x1,080(2,073,600*)@300.0 |
11 | 3,840x2,160(8,294,400*)@120.0 |
12 | 3,840x2,160(8,294,400*)@250.0 |
13 | 3,840x2,160(8,294,400*)@300.0 |
*Note: The quantities of pixels are presented for applications in which a picture can have an arbitrary size (e.g., screencasting).¶
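As a non-normative illustration of how the example levels in Table 7 could be used, the sketch below maps a requested picture size and frame rate to the lowest level whose limits cover it. The helper name and data layout are assumptions; an actual level definition would also constrain bitrate, buffering, and other decoder-complexity features, as noted above.¶

```python
# Minimal sketch of a level lookup based on Table 7; the data mirrors the
# table's example limits (luma samples per picture and maximum frame rate),
# and the helper itself is an illustrative assumption.
LEVELS = [  # (level, max luma samples per picture, max frame rate in fps)
    (1, 12_288, 30.0),
    (2, 101_376, 30.0),
    (3, 101_376, 60.0),
    (4, 230_400, 60.0),
    (5, 414_720, 75.0),
    (6, 921_600, 68.0),
    (7, 921_600, 120.0),
    (8, 2_073_600, 120.0),
    (9, 2_073_600, 250.0),
    (10, 2_073_600, 300.0),
    (11, 8_294_400, 120.0),
    (12, 8_294_400, 250.0),
    (13, 8_294_400, 300.0),
]

def minimum_level(width, height, fps):
    """Return the lowest level whose picture-size and frame-rate limits
    cover the requested operating point, or None if nothing fits."""
    samples = width * height
    for level, max_samples, max_fps in LEVELS:
        if samples <= max_samples and fps <= max_fps:
            return level
    return None

print(minimum_level(1920, 1080, 60))  # -> 8
```

A real capability negotiation would, of course, follow the normative level definition rather than this table-driven lookup.¶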
Bitstream syntax should allow extensibility and backward compatibility. New features can be supported easily by using metadata (such as SEI messages, VUI, and headers) without affecting the bitstream compatibility with legacy decoders. A newer version of the decoder shall be able to play bitstreams of an older version of the same or lower profile and level.¶
A bitstream should have a model that allows easy parsing and identification of the sample components (such as Annex B of ISO/IEC 14496-10 [18] or ISO/IEC 14496-15 [19]). In particular, information needed for packet handling (e.g., frame type) should not require parsing anything below the header level.¶
Perceptual quality tools (such as adaptive QP and quantization matrices)should be supported by the codec bitstream.¶
The codec specification shall define a buffer model such as a hypothetical reference decoder (HRD).¶
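A buffer model of this kind can be illustrated with a simple leaky-bucket style sketch: coded bits arrive at the channel rate, whole pictures are removed at their decode times, and the stream must never starve the decoder buffer. The sketch below is a generic, non-normative illustration of that idea (it checks only underflow and caps arrival at the buffer size), not the HRD of any particular codec; the numbers and the initial-delay parameter are assumptions.¶

```python
# Generic leaky-bucket sketch of the idea behind a decoder buffer model:
# bits arrive at the channel rate and whole coded pictures are removed at
# their decode times. All numbers here are illustrative.
def check_buffer(picture_sizes_bits, bitrate_bps, buffer_bits, fps,
                 initial_delay_intervals=3):
    """Return True if, after an initial buffering delay, every coded picture
    has fully arrived by its decode time (no underflow). Arrival is capped at
    the buffer size, i.e., the channel pauses while the buffer is full."""
    bits_per_interval = bitrate_bps / fps
    fullness = min(initial_delay_intervals * bits_per_interval, buffer_bits)
    for size in picture_sizes_bits:
        if fullness < size:      # picture not completely received yet: underflow
            return False
        fullness -= size         # instantaneous removal at the picture's decode time
        fullness = min(fullness + bits_per_interval, buffer_bits)  # refill for one frame interval
    return True

# Example: 30 fps stream at 4 Mbit/s with a 2 Mbit buffer.
sizes = [400_000] + [100_000] * 29   # a large intra picture followed by smaller ones
print(check_buffer(sizes, 4_000_000, 2_000_000, 30))
```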
Specifications providing integration with system and delivery layers should be developed.¶
Input pictures coded by a video codec should have one of the following formats:¶
Color sampling formats:¶
Exemplary input source formats for codec profiles are shown in Table 8.¶
Profile | Bit depths per color component | Color sampling formats |
---|---|---|
1 | 8 and 10 | 4:0:0 and 4:2:0 |
2 | 8 and 10 | 4:0:0, 4:2:0, and 4:4:4 |
3 | 8, 10, and 12 | 4:0:0, 4:2:0, 4:2:2, and 4:4:4 |
In order to meet coding delay requirements, a video codec should support all of the following:¶
Support of configurations with zero structural delay, also referred to as "low-delay" configurations.¶
Encoding and decoding complexity considerations are as follows:¶
The mandatory scalability requirement is as follows:¶
In order to meet the error resilience requirement, a video codec should satisfy all of the following conditions:¶
It is a desired but not mandatory requirement for a video codec to support some of the following features:¶
Desirable scalability requirements are as follows:¶
Tools that enable parallel processing (e.g., slices, tiles, and wave-front propagation processing) at both encoder and decoder sides are highly desirable for many applications.¶
Compression efficiency on noisy content, content with film grain, computer-generated content, and low-resolution materials is desirable.¶
As shown in Figure 1, compression performance testing is performed in three overlapping bitrate ranges, low (LBR), medium (MBR), and high (HBR), that together encompass ten different bitrate values.¶
Initially, for the codec selected as the reference (e.g., HEVC or VP9), a set of ten QP (quantization parameter) values should be specified as in [14], and the corresponding quality values should be calculated. In Figure 1, the QP and quality values are denoted as "QP0"-"QP9" and "Q0"-"Q9", respectively. To guarantee the overlaps of quality levels between the bitrate ranges of the reference and tested codecs, a quality alignment procedure should be performed for each range's outermost (left- and rightmost) quality levels Qk of the reference codec (i.e., for Q0, Q3, Q6, and Q9) and the quality levels Q'k (i.e., Q'0, Q'3, Q'6, and Q'9) of the tested codec. Thus, these quality levels Q'k, and hence the corresponding QP values QP'k (i.e., QP'0, QP'3, QP'6, and QP'9), of the tested codec should be selected using the following formulas:¶
Q'k  = min    { abs(Q'i - Qk) },             i in R
QP'k = argmin { abs(Q'i(QP'i) - Qk(QPk)) },  i in R¶
where R is the range of the QP indexes of the tested codec, i.e., the candidate Internet video codec. The inner quality levels (i.e., Q'1, Q'2, Q'4, Q'5, Q'7, and Q'8), as well as their corresponding QP values of each range (i.e., QP'1, QP'2, QP'4, QP'5, QP'7, and QP'8), should be as equidistantly spaced as possible between the left- and rightmost quality levels without explicitly mapping their values using the procedure described above.¶
[Figure 1: Quality alignment between the tested and reference codecs. The tested codec's points QP'0-QP'9 (with quality levels Q'0-Q'9) are aligned against the reference codec's points QP0-QP9 (with quality levels Q0-Q9) along the bitrate axis, with the ten points grouped into three overlapping ranges: LBR, MBR, and HBR.]
Since the QP mapping results may vary for different sequences, this quality alignment procedure eventually needs to be performed separately for each quality assessment index and each sequence used for codec performance evaluation to fulfill the requirements described above.¶
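The alignment rule above can be read as a nearest-neighbor search over the tested codec's measured quality values. The sketch below illustrates that reading; the data structures and example values are assumptions for illustration only.¶

```python
# Sketch of the quality-alignment step described above: for each outermost
# reference quality level Qk, pick the tested codec's QP whose measured
# quality Q'(QP') is closest to Qk. Input dictionaries are illustrative.
def align_outer_points(reference_quality, tested_quality):
    """reference_quality: {k: Qk} for the outermost indices k (0, 3, 6, 9).
    tested_quality: {QP': Q'} measured over the tested codec's QP range R.
    Returns {k: (QP'k, Q'k)} chosen by the argmin rule."""
    aligned = {}
    for k, q_ref in reference_quality.items():
        qp_best = min(tested_quality, key=lambda qp: abs(tested_quality[qp] - q_ref))
        aligned[k] = (qp_best, tested_quality[qp_best])
    return aligned

# Example with made-up PSNR values (dB) and an assumed QP range R:
reference = {0: 30.1, 3: 34.2, 6: 38.0, 9: 41.5}
tested = {qp: 28.0 + 0.35 * (63 - qp) for qp in range(20, 64)}
print(align_outer_points(reference, tested))
```

As stated above, this selection would be repeated per sequence and per quality assessment index.¶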
To assess the quality of output (decoded) sequences, two indexes (PSNR [3] and MS-SSIM [3] [15]) are computed separately. In the case of the YCbCr color format, PSNR should be calculated for each color plane, whereas MS-SSIM is calculated for the luma channel only. In the case of the RGB color format, both metrics are computed for the R, G, and B channels. Thus, for each sequence, 30 RD-points for PSNR (i.e., three RD-curves, one for each channel) and 10 RD-points for MS-SSIM (i.e., one RD-curve, for the luma channel only) should be calculated in the case of YCbCr. If content is encoded as RGB, 60 RD-points should be calculated (30 for PSNR and 30 for MS-SSIM), i.e., three RD-curves (one for each channel) are computed for PSNR as well as three RD-curves (one for each channel) for MS-SSIM.¶
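As an illustration, per-plane PSNR for 8-bit content can be computed as in the sketch below (MS-SSIM is not reproduced here and would normally come from an existing implementation such as the one referenced in [15]). The frame arrays and helper name are assumptions for illustration.¶

```python
import numpy as np

def psnr(plane_ref, plane_dec, max_value=255.0):
    """PSNR in dB between one reference and one decoded color plane,
    both given as arrays of the same shape (8-bit content assumed here;
    use max_value=1023.0 for 10-bit)."""
    diff = plane_ref.astype(np.float64) - plane_dec.astype(np.float64)
    mse = np.mean(diff * diff)
    if mse == 0:
        return float("inf")          # mathematically lossless plane
    return 10.0 * np.log10((max_value ** 2) / mse)

# For YCbCr input, PSNR is reported per plane (Y, Cb, Cr); MS-SSIM is
# computed on the luma plane only, as stated above.
ref_y = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)   # stand-in frames
dec_y = np.clip(ref_y.astype(int) + np.random.randint(-2, 3, ref_y.shape), 0, 255)
print(round(psnr(ref_y, dec_y), 2))
```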
Finally, to obtain an integral estimation, BD-rate savings [13] should be computed for each range and each quality index. In addition, average values over all three ranges should be provided for both PSNR and MS-SSIM. A list of video sequences that should be used for testing, as well as the ten QP values for the reference codec, is defined in [14]. Testing processes should use the information on the codec applications presented in this document. As the reference for evaluation, state-of-the-art video codecs such as HEVC/H.265 [4] [5] or VP9 must be used. The reference source code of the HEVC/H.265 codec can be found at [6]. The HEVC/H.265 codec must be configured according to [16] and Table 9.¶
PAM | Intra-period, second | HEVC/H.265 encoding mode according to [16]
---|---|---
AI | | Intra Main or Intra Main10
RA | | Random access Main or Random access Main10
FIZD | | Low delay Main or Low delay Main10
According to the coding efficiency requirement described in Section 4.1.1, BD-rate savings calculated for each color plane and averaged for all the video sequences used to test the NETVC codec should be, at least,¶
Since values of the two objective metrics (PSNR and MS-SSIM) are available for some color planes, each value should meet these coding efficiency requirements. That is, the final BD-rate saving denoted as S is calculated for a given color plane as follows:¶
S = min { S_psnr, S_ms-ssim }¶
where S_psnr and S_ms-ssim are BD-rate savings calculated for the given color plane using PSNR and MS-SSIM metrics, respectively.¶
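As a non-normative illustration, the sketch below computes a BD-rate value using the commonly cited Bjøntegaard approach of fitting each RD-curve with a cubic polynomial of log-bitrate versus quality and integrating the difference over the overlapping quality interval, and then applies the S = min { S_psnr, S_ms-ssim } rule above. The exact procedure to be used is the one defined in [13]; the RD-points and function names here are made-up examples.¶

```python
import numpy as np

def bd_rate(rates_ref, qual_ref, rates_test, qual_test):
    """Average bitrate difference (%) of the tested codec against the
    reference over the overlapping quality range (cubic fit of log-rate as
    a function of quality). Negative values correspond to bitrate savings."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(qual_ref, lr_ref, 3)
    p_test = np.polyfit(qual_test, lr_test, 3)
    lo = max(min(qual_ref), min(qual_test))      # overlapping quality interval
    hi = min(max(qual_ref), max(qual_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)  # average log-rate difference
    return (np.exp(avg_diff) - 1.0) * 100.0      # percent bitrate change

def final_saving(s_psnr, s_ms_ssim):
    """Final per-plane saving S = min { S_psnr, S_ms-ssim }, as defined above."""
    return min(s_psnr, s_ms_ssim)

# Example with made-up RD-points (kbit/s, dB) for one color plane and range:
ref = ([1000, 1800, 3200, 6000], [33.0, 35.5, 38.0, 40.5])
test = ([800, 1500, 2700, 5200], [33.1, 35.6, 38.1, 40.6])
print(round(bd_rate(*ref, *test), 1))  # negative => bitrate saving vs. the reference
```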
In addition to the objective quality measures defined above, subjective evaluation must also be performed for the final NETVC codec adoption. For subjective tests, the MOS-based evaluation procedure must be used as described in Section 2.1 of [3]. For perception-oriented tools that primarily impact subjective quality, additional tests may also be individually assigned even for intermediate evaluation, subject to a decision of the NETVC WG.¶
This document itself does not address any security considerations. However, it is worth noting that a codec implementation (for both an encoder and a decoder) should take into consideration the worst-case computational complexity, memory bandwidth, and physical memory size needed to process the potentially untrusted input (e.g., the decoded pictures used as references).¶
This document has no IANA actions.¶
The authors would like to thank Mr. Paul Coverdale, Mr. Vasily Rufitskiy, and Dr. Jianle Chen for many useful discussions on this document and their help while preparing it, as well as Mr. Mo Zanaty, Dr. Minhua Zhou, Dr. Ali Begen, Mr. Thomas Daede, Mr. Adam Roach, Dr. Thomas Davies, Mr. Jonathan Lennox, Dr. Timothy Terriberry, Mr. Peter Thatcher, Dr. Jean-Marc Valin, Mr. Roman Danyliw, Mr. Jack Moffitt, Mr. Greg Coppa, and Mr. Andrew Krupiczka for their valuable comments on different revisions of this document.¶