A video coding format[a] (or sometimes video compression format) is an encoded format of digital video content, such as in a data file or bitstream. It typically uses a standardized video compression algorithm, most commonly based on discrete cosine transform (DCT) coding and motion compensation. A software or hardware component that compresses or decompresses a specific video coding format is a video codec.
Some video coding formats are documented by a detailed technical specification document known as a video coding specification. Some such specifications are written and approved by standardization organizations as technical standards, and are thus known as video coding standards. There are de facto standards and formal standards.
Video content encoded using a particular video coding format is normally bundled with an audio stream (encoded using an audio coding format) inside a multimedia container format such as AVI, MP4, FLV, RealMedia, or Matroska. As such, the user normally does not have an H.264 file, but instead has a video file: an MP4 container holding H.264-encoded video, normally alongside AAC-encoded audio. Multimedia container formats can contain any of several different video coding formats; for example, the MP4 container format can contain video coding formats such as MPEG-2 Part 2 or H.264. Another example is the initial specification for the file type WebM, which specifies the container format (Matroska) but also exactly which video (VP8) and audio (Vorbis) compression formats are inside the Matroska container, even though Matroska is capable of containing VP9 video, and Opus audio support was later added to the WebM specification.
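To make the container/format distinction concrete, the following minimal Python sketch walks the top-level boxes of an MP4 (ISO base media file format) file. It assumes a local file named movie.mp4 (a hypothetical path) and only reads the generic 8-byte box headers; the codec-specific H.264 or AAC payload sits inside boxes such as moov and mdat, which is why the same container can carry different coding formats.

```python
import struct

def list_top_level_boxes(path):
    """List the top-level boxes of an ISO base media file format (MP4) file.

    Each box begins with a 4-byte big-endian size and a 4-byte type code
    (e.g. 'ftyp', 'moov', 'mdat'). The container layout itself is
    codec-agnostic; the codec-specific data lives inside the boxes.
    """
    boxes = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, box_type = struct.unpack(">I4s", header)
            boxes.append(box_type.decode("ascii", "replace"))
            if size == 0:          # box extends to the end of the file
                break
            if size == 1:          # 64-bit "largesize" follows the type field
                size = struct.unpack(">Q", f.read(8))[0]
                f.seek(size - 16, 1)
            else:
                f.seek(size - 8, 1)
    return boxes

print(list_top_level_boxes("movie.mp4"))  # e.g. ['ftyp', 'moov', 'mdat'] (hypothetical file)
```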
A format is the layout plan for data produced or consumed by a codec.
Although video coding formats such as H.264 are sometimes referred to as codecs, there is a clear conceptual difference between a specification and its implementations. Video coding formats are described in specifications, and software, firmware, or hardware to encode/decode data in a given video coding format from/to uncompressed video are implementations of those specifications. As an analogy, the video coding format H.264 (specification) is to the codec OpenH264 (specific implementation) what the C programming language (specification) is to the compiler GCC (specific implementation). Note that for each specification (e.g., H.264), there can be many codecs implementing that specification (e.g., x264, OpenH264, and other H.264/MPEG-4 AVC products and implementations).
This distinction is not consistently reflected terminologically in the literature. The H.264 specification calls H.261, H.262, H.263, and H.264 video coding standards and does not contain the word codec.[2] The Alliance for Open Media clearly distinguishes between the AV1 video coding format and the accompanying codec they are developing, but calls the video coding format itself a video codec specification.[3] The VP9 specification calls the video coding format VP9 itself a codec.[4]
As an example of conflation, Chromium's[5] and Mozilla's[6] pages listing their supported video formats both call video coding formats such as H.264 codecs. As another example, in Cisco's announcement of a free-as-in-beer video codec, the press release refers to the H.264 video coding format as a codec ("choice of a common video codec"), but calls Cisco's implementation of an H.264 encoder/decoder a codec shortly thereafter ("open-source our H.264 codec").[7]
A video coding format does not dictate all algorithms used by a codec implementing the format. For example, a large part of how video compression typically works is by finding similarities between video frames (block matching) and then achieving compression by copying previously coded similar subimages (such as macroblocks) and adding small differences when necessary. Finding optimal combinations of such predictors and differences is an NP-hard problem,[8] meaning that it is practically impossible to find an optimal solution. The video coding format must support such compression across frames in the bitstream format, but by not needlessly mandating specific algorithms for finding block matches and performing other encoding steps, the codecs implementing the video coding specification have some freedom to optimize and innovate in their choice of algorithms. For example, section 0.5 of the H.264 specification says that encoding algorithms are not part of the specification.[2] Free choice of algorithm also allows different space–time complexity trade-offs for the same video coding format, so a live feed can use a fast but space-inefficient algorithm, while a one-time DVD encoding for later mass production can trade long encoding time for space-efficient encoding.
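As a rough illustration of this encoder-side freedom, the sketch below implements the simplest possible block-matching strategy: an exhaustive search minimizing the sum of absolute differences (SAD). It is only one of many possible strategies (real encoders use faster heuristics), and the function name and search range are assumptions for the example, not anything mandated by a coding format.

```python
import numpy as np

def best_match(reference, block, top, left, search_range=8):
    """Exhaustive block matching: find the motion vector (dy, dx) within
    +/- search_range whose block in the reference frame has the smallest
    sum of absolute differences (SAD) against `block`."""
    h, w = block.shape
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > reference.shape[0] or x + w > reference.shape[1]:
                continue  # candidate block would fall outside the frame
            candidate = reference[y:y + h, x:x + w]
            sad = np.abs(candidate.astype(int) - block.astype(int)).sum()
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

An encoder would then transmit the chosen motion vector plus the (small) residual difference instead of the raw block; how the vector is found is left entirely to the implementation.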
The concept of analog video compression dates back to 1929, when R.D. Kell in Britain proposed the concept of transmitting only the portions of the scene that changed from frame to frame. The concept of digital video compression dates back to 1952, when Bell Labs researchers B.M. Oliver and C.W. Harrison proposed the use of differential pulse-code modulation (DPCM) in video coding. In 1959, the concept of inter-frame motion compensation was proposed by NHK researchers Y. Taki, M. Hatori and S. Tanaka, who proposed predictive inter-frame video coding in the temporal dimension.[9] In 1967, University of London researchers A.H. Robinson and C. Cherry proposed run-length encoding (RLE), a lossless compression scheme, to reduce the transmission bandwidth of analog television signals.[10]
The earliest digital video coding algorithms were either for uncompressed video or used lossless compression, both of which were inefficient and impractical for digital video coding.[11][12] Digital video was introduced in the 1970s,[11] initially using uncompressed pulse-code modulation (PCM), requiring high bitrates around 45–200 Mbit/s for standard-definition (SD) video,[11][12] which was up to 2,000 times greater than the telecommunication bandwidth (up to 100 kbit/s) available until the 1990s.[12] Similarly, uncompressed high-definition (HD) 1080p video requires bitrates exceeding 1 Gbit/s, significantly greater than the bandwidth available in the 2000s.[13]
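The 1 Gbit/s figure follows from simple arithmetic; for example, assuming 24 bits per pixel and 30 frames per second (illustrative values, not the only possible ones):

```python
# Back-of-the-envelope bitrate for uncompressed 1080p video.
width, height = 1920, 1080
bits_per_pixel = 24        # e.g. 8-bit red, green and blue samples
frames_per_second = 30

bits_per_second = width * height * bits_per_pixel * frames_per_second
print(f"{bits_per_second / 1e9:.2f} Gbit/s")   # about 1.49 Gbit/s
```

Chroma subsampling roughly halves this, but the result is still several orders of magnitude beyond the kilobit-per-second telecommunication bandwidths mentioned above.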
Practical video compression emerged with the development of motion-compensated DCT (MC DCT) coding,[12][11] also called block motion compensation (BMC)[9] or DCT motion compensation. This is a hybrid coding algorithm,[9] which combines two key data compression techniques: discrete cosine transform (DCT) coding[12][11] in the spatial dimension, and predictive motion compensation in the temporal dimension.[9]
DCT coding is a lossy block compression transform coding technique that was first proposed by Nasir Ahmed, who initially intended it for image compression, while he was working at Kansas State University in 1972. It was then developed into a practical image compression algorithm by Ahmed with T. Natarajan and K. R. Rao at the University of Texas in 1973, and was published in 1974.[14][15][16]
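For illustration, the following sketch computes a textbook 2-D DCT-II of an image block directly from its definition. Production codecs use fast or integer approximations rather than this O(N^4) loop; the code is only meant to show the energy-compaction property that makes the DCT useful for compression.

```python
import numpy as np

def dct_2d(block):
    """Naive 2-D DCT-II of an N x N block, computed from the definition:
    C(u, v) = a(u) a(v) * sum_{x, y} f(x, y)
              * cos((2x+1) u pi / 2N) * cos((2y+1) v pi / 2N)."""
    n = block.shape[0]
    coeffs = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            a_u = np.sqrt(1 / n) if u == 0 else np.sqrt(2 / n)
            a_v = np.sqrt(1 / n) if v == 0 else np.sqrt(2 / n)
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x, y]
                              * np.cos((2 * x + 1) * u * np.pi / (2 * n))
                              * np.cos((2 * y + 1) * v * np.pi / (2 * n)))
            coeffs[u, v] = a_u * a_v * total
    return coeffs

# A flat 8x8 block concentrates all of its energy in the DC coefficient (0, 0);
# this energy compaction is what makes the DCT effective for compression.
flat = np.full((8, 8), 128.0)
print(dct_2d(flat)[0, 0])   # large DC term; all other coefficients are ~0
```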
The other key development was motion-compensated hybrid coding.[9] In 1974, Ali Habibi at the University of Southern California introduced hybrid coding,[17][18][19] which combines predictive coding with transform coding.[9][20] He examined several transform coding techniques, including the DCT, Hadamard transform, Fourier transform, slant transform, and Karhunen-Loeve transform.[17] However, his algorithm was initially limited to intra-frame coding in the spatial dimension. In 1975, John A. Roese and Guner S. Robinson extended Habibi's hybrid coding algorithm to the temporal dimension, using transform coding in the spatial dimension and predictive coding in the temporal dimension, developing inter-frame motion-compensated hybrid coding.[9][21] For the spatial transform coding, they experimented with different transforms, including the DCT and the fast Fourier transform (FFT), developing inter-frame hybrid coders for them, and found that the DCT is the most efficient due to its reduced complexity, capable of compressing image data down to 0.25 bits per pixel for a videotelephone scene with image quality comparable to a typical intra-frame coder requiring 2 bits per pixel.[22][21]
The DCT was applied to video encoding by Wen-Hsiung Chen,[23] who developed a fast DCT algorithm with C.H. Smith and S.C. Fralick in 1977,[24][25] and founded Compression Labs to commercialize DCT technology.[23] In 1979, Anil K. Jain and Jaswant R. Jain further developed motion-compensated DCT video compression.[26][9] This led to Chen developing a practical video compression algorithm, called motion-compensated DCT or adaptive scene coding, in 1981.[9] Motion-compensated DCT later became the standard coding technique for video compression from the late 1980s onwards.[11][27]
The first digital video coding standard was H.120, developed by the CCITT (now ITU-T) in 1984.[28] H.120 was not usable in practice, as its performance was too poor.[28] H.120 used motion-compensated DPCM coding,[9] a lossless compression algorithm that was inefficient for video coding.[11] During the late 1980s, a number of companies began experimenting with discrete cosine transform (DCT) coding, a much more efficient form of compression for video coding. The CCITT received 14 proposals for DCT-based video compression formats, in contrast to a single proposal based on vector quantization (VQ) compression. The H.261 standard was developed based on motion-compensated DCT compression.[11][27] H.261 was the first practical video coding standard,[28] and uses patents licensed from a number of companies, including Hitachi, PictureTel, NTT, BT, and Toshiba, among others.[29] Since H.261, motion-compensated DCT compression has been adopted by all the major video coding standards (including the H.26x and MPEG formats) that followed.[11][27]
MPEG-1, developed by the Moving Picture Experts Group (MPEG), followed in 1991, and it was designed to compress VHS-quality video.[28] It was succeeded in 1994 by MPEG-2/H.262,[28] which was developed with patents licensed from a number of companies, primarily Sony, Thomson and Mitsubishi Electric.[30] MPEG-2 became the standard video format for DVD and SD digital television.[28] Its motion-compensated DCT algorithm was able to achieve a compression ratio of up to 100:1, enabling the development of digital media technologies such as video on demand (VOD)[12] and high-definition television (HDTV).[31] In 1999, it was followed by MPEG-4/H.263, which was a major leap forward for video compression technology.[28] It uses patents licensed from a number of companies, primarily Mitsubishi, Hitachi and Panasonic.[32]
The most widely used video coding format as of 2019 is H.264/MPEG-4 AVC.[33] It was developed in 2003, and uses patents licensed from a number of organizations, primarily Panasonic, Godo Kaisha IP Bridge and LG Electronics.[34] In contrast to the standard DCT used by its predecessors, AVC uses the integer DCT.[23][35] H.264 is one of the video encoding standards for Blu-ray Discs; all Blu-ray Disc players must be able to decode H.264. It is also widely used by streaming internet sources, such as videos from YouTube, Netflix, Vimeo, and the iTunes Store, web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (ATSC standards, ISDB-T, DVB-T or DVB-T2), cable (DVB-C), and satellite (DVB-S2).[36]
A main problem for many video coding formats has been patents, making them expensive to use or potentially risking a patent lawsuit due to submarine patents. The motivation behind many recently designed video coding formats such as Theora, VP8, and VP9 has been to create a (libre) video coding standard covered only by royalty-free patents.[37] Patent status has also been a major point of contention for the choice of which video formats the mainstream web browsers will support inside the HTML video tag.
The current-generation video coding format is HEVC (H.265), introduced in 2013. AVC uses the integer DCT with 4x4 and 8x8 block sizes, while HEVC uses integer DCT and DST transforms with varied block sizes between 4x4 and 32x32.[38] HEVC is heavily patented, mostly by Samsung Electronics, GE, NTT, and JVCKenwood.[39] It is challenged by the AV1 format, which is intended to be royalty-free. As of 2019, AVC is by far the most commonly used format for the recording, compression, and distribution of video content, used by 91% of video developers, followed by HEVC, which is used by 43% of developers.[33]
Consumer video is generally compressed using lossy video codecs, since that results in significantly smaller files than lossless compression. Some video coding formats are designed explicitly for either lossy or lossless compression, and some video coding formats, such as Dirac and H.264, support both.[49]
Uncompressed video formats, such as Clean HDMI, are a form of lossless video used in some circumstances, such as when sending video to a display over an HDMI connection. Some high-end cameras can also capture video directly in this format.[examples needed]
Interframe compression complicates editing of an encoded video sequence.[50] One subclass of relatively simple video coding formats is the intra-frame video formats, such as DV, in which each frame of the video stream is compressed independently without referring to other frames in the stream, and no attempt is made to take advantage of correlations between successive pictures over time for better compression. One example is Motion JPEG, which is simply a sequence of individually JPEG-compressed images. This approach is quick and simple, at the expense of the encoded video being much larger than with a video coding format supporting inter-frame coding.
Because interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Making cuts in intraframe-compressed video while video editing is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame, simply copies bit-for-bit each frame that one wants to keep, and discards the frames one does not want. Another difference between intraframe and interframe compression is that, with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as I-frames in MPEG-2) are not allowed to copy data from other frames, so they require much more data than other frames nearby.[51]
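This dependency problem can be illustrated with a toy model in which every P-frame depends on the frame before it and only I-frames are self-contained. The reference structures of real formats are more elaborate (B-frames, multiple reference frames), but the consequence of a cut is the same: decoding can only restart cleanly at an I-frame. The frame pattern and function below are purely illustrative.

```python
def decodable_after_cut(frame_types, cut_index):
    """Toy model of inter-frame dependency: each P-frame depends on the
    previous frame, and only I-frames are self-contained. Returns, for each
    frame at or after `cut_index`, whether it can still be decoded once all
    earlier frames are discarded."""
    decodable = []
    have_reference = False
    for frame_type in frame_types[cut_index:]:
        if frame_type == "I":
            have_reference = True
        decodable.append(have_reference)
    return decodable

# Cutting in the middle of a group of pictures: the leading P-frames are
# undecodable until the next I-frame arrives.
gop = ["I", "P", "P", "P", "I", "P", "P", "P"]
print(decodable_after_cut(gop, 2))   # [False, False, True, True, True, True]
```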
It is possible to build a computer-based video editor that spots problems caused when I-frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe-compressed video with the same picture quality. Interframe compression of this kind is also not very effective for audio formats.[52]
A video coding format can define optional restrictions to encoded video, called profiles and levels. It is possible to have a decoder which only supports decoding a subset of profiles and levels of a given video format, for example to make the decoder program/hardware smaller, simpler, or faster.[citation needed]
A profile restricts which encoding techniques are allowed. For example, the H.264 format includes the profiles baseline, main and high (and others). While P-slices (which can be predicted based on preceding slices) are supported in all profiles, B-slices (which can be predicted based on both preceding and following slices) are supported in the main and high profiles but not in baseline.[53]
A level is a restriction on parameters such as maximum resolution and data rates.[53]
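A decoder's capability check against a stream's profile and level can be sketched as follows. The tool sets and numeric limits below are simplified and only loosely modelled on H.264's level tables (consult the specification for exact values); the function and names are assumptions for the example.

```python
# Toy model of profile/level negotiation; limits are illustrative only.
LEVEL_LIMITS = {
    "3.0": {"max_luma_samples": 414_720,   "max_luma_rate": 10_368_000},
    "4.0": {"max_luma_samples": 2_097_152, "max_luma_rate": 62_914_560},
    "5.1": {"max_luma_samples": 9_437_184, "max_luma_rate": 251_658_240},
}

PROFILE_TOOLS = {
    "baseline": {"P-slices"},
    "main":     {"P-slices", "B-slices"},
    "high":     {"P-slices", "B-slices", "8x8-transform"},
}

def decoder_can_handle(profile, level, stream_tools, width, height, fps):
    """Check whether a decoder advertising (profile, level) can decode a
    stream using `stream_tools` at the given resolution and frame rate."""
    if not stream_tools <= PROFILE_TOOLS[profile]:
        return False                      # stream uses a tool outside the profile
    limits = LEVEL_LIMITS[level]
    luma_samples = width * height
    return (luma_samples <= limits["max_luma_samples"]
            and luma_samples * fps <= limits["max_luma_rate"])

# A baseline decoder rejects a stream containing B-slices regardless of level:
print(decoder_can_handle("baseline", "4.0", {"P-slices", "B-slices"}, 1280, 720, 30))  # False
print(decoder_can_handle("high", "4.0", {"P-slices", "B-slices"}, 1920, 1080, 30))     # True
```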