This articlemay containoriginal research. Pleaseimprove it byverifying the claims made and addinginline citations. Statements consisting only of original research should be removed.(May 2025) (Learn how and when to remove this message) |
| Deflate | |
|---|---|
| Developed by | Phil Katz,PKWare |
| Initial release | 21 August 1990; 35 years ago (1990-08-21) |
| Compression | LZ77,Huffman coding |
| Standard | IETFRFC 1951 |
| Open format? | Yes |
| Free format? | Yes |
| Website | www |
Incomputing,Deflate (stylized asDEFLATE, and also calledFlate[1][2]) is alossless data compression algorithm[a] that uses a combination ofLZ77 andHuffman coding. It was designed byPhil Katz, for version 2 of hisPKZIP archiving tool. Deflate was later specified inRequest for Comments (RFC) 1951 (1996).[4]
Katz also designed the originalalgorithm used to construct Deflate streams. This algorithm receivedsoftware patentU.S. patent 5,051,745, assigned toPKWare, Inc.[5][6] As stated in the RFC document, an algorithm producing Deflate files was widely thought to be implementable in a manner not covered by patents.[4] This led to its widespread use. For example, in thezlib data format,gzip file format, Portable Network Graphics (PNG) image file,ZIP file format for which Katz originally designed it. The patent has since expired.
Deflate compression takes any sequence ofbytes and outputs asequence of blocks (commonly calledDeflate stream). Deflate decompression takes a sequence of blocks and outputs the original sequence of bytes.
The firstbit (bit 0), in a byte, is theleast-significant. Multi-byte values uselittle-endian.[7]
Each block has a 3-bitheader with two fields:[8]
1 if this is thelast block in the sequence, else0.00:No compression (sometimes calledstored). Any bits up to the next byte boundary are ignored. The rest of the block contains 16-bitLEN, 16-bitNLEN (one's complement of LEN) and LEN bytes ofuncompressed data, i.e. up to 65,535 (216 − 1) bytes. Useful for incompressible data (e.g. high-entropy, random, or already compressed), adding minimal overhead (i.e. ~5 bytes per block).01: Astatic Huffman compressed block, using a pre-agreedHuffman tree defined in the RFC.10: Adynamic Huffman compressed block, complete with the Huffman table supplied.11:Reserved (error).Most compressible data will end up being encoded using method10, thedynamic Huffman encoding, which produces an optimized Huffman tree customized for each block of data individually. Instructions to generate the necessary Huffman tree immediately follow the block header. The static Huffman option is used for short messages, where the fixed saving gained by omitting the tree outweighs the percentage compression loss due to using a non-optimal (thus, not technically Huffman) code.
Compression is achieved through two steps:
Within compressed blocks, if a duplicate series of bytes is spotted (a repeated string), then a back-reference is inserted, linking to the prior location of that identical string instead. An encoded match to an earlier string consists of an8-bit length (3–258 bytes) and a 15-bit distance (1–32,768 bytes) to the start of the duplicate. Relative back-references can be made across any number of blocks, as long as the distance appears within the last 32 KiB of uncompressed data decoded (termed thesliding window).
If the distance is less than the length, the duplicate overlaps itself, indicating repetition. For example, a run of 10 identical bytes can be encoded as one byte, followed by a duplicate of length 9, starting with the prior byte.
Searching the preceding text for duplicate substrings is the most computationally expensive part of the Deflate algorithm, and the operation which compression level settings affect.
The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. The method used isHuffman coding which creates an unprefixed tree of non-overlapping intervals, where the length of each sequence is inversely proportional to the logarithm of the probability of that symbol needing to be encoded. The more likely it is that a symbol has to be encoded, the shorter its bit-sequence will be.
A tree is created, containing space for 288 symbols:
A match length code will always be followed by a distance code. Based on the distance code read, further "extra" bits may be read in order to produce the final distance. The distance tree contains space for 32 symbols:
For the match distance symbols 2–29, the number of extra bits can be calculated as.
The two codes (the 288-symbol length/literal tree and the 32-symbol distance tree) are themselves encoded ascanonical Huffman codes by giving the bit length of the code for each symbol. The bit lengths are themselvesrun-length encoded to produce as compact a representation as possible. As an alternative to including the tree representation, the "static tree" option provides standard fixed Huffman trees. The compressed size using the static trees can be computed using the same statistics (the number of times each symbol appears) as are used to generate the dynamic trees, so it is easy for a compressor to choose whichever is smaller.
During the compression stage, it is theencoder that chooses the amount of time spent looking for matching strings. The zlib/gzipreference implementation allows the user to select from a sliding scale of likely resulting compression-level vs. speed of encoding. Options range from0 (do not attempt compression, just store uncompressed) to9 representing the maximum capability of the reference implementation in zlib/gzip.
Other Deflate encoders have been produced, all of which will also produce a compatiblebitstream capable of being decompressed by any existing Deflate decoder. Differing implementations will likely produce variations on the final encoded bit-stream produced. The focus with non-zlib versions of an encoder has normally been to produce a more efficiently compressed and smaller encoded stream.
Deflate64, specified by PKWARE, is a proprietary variant of Deflate. It's fundamentally the same algorithm. What has changed is the increase in dictionary size from 32 KB to 64 KB, an extension of the distance codes to 16 bits so that they may address a range of 64 KB, and the length code, which is extended to16-bit, so that it may define lengths of three to 65,538 bytes.[9] This leads to Deflate64 having a longer compression time, and potentially a slightly higher compression ratio, than Deflate.[10] Several free and/or open source projects support Deflate64, such as7-Zip,[11] while others, such aszlib, do not, because the procedure is proprietary,[12] and the performance increase over Deflate is small.[13]
Implementations of Deflate are freely available in many languages. Apps written inC typically use thezliblibrary (under the permissivezlib License). Apps inBorland Pascal (and compatible languages) can use paszlib. Apps inC++ can take advantage of the improved Deflate library in7-Zip. BothJava and.NET framework offer out-of-the-box support for Deflate in their libraries (respectively,java.util.zip andSystem.IO.Compression). Apps inAda can useZip-Ada (pure) orZLib-Ada.
AdvanceCOMP uses the higher compression ratio versions of Deflate in 7-Zip, libdeflate, and Zopfli to enable recompression ofgzip,PNG,multiple-image Network Graphics (MNG) andZIP files with the possibility of smaller file sizes than zlib is able to achieve at maximum settings.[17]
193f:0001) able to compress streams using Deflate at a rate of up to 3.0 Gbit/s (375 MB/s) for incoming uncompressed data. Accompanying theLinux kerneldevice driver for the AHA361-PCIX is an "ahagzip" utility and customized "mod_deflate_aha" able to use the hardware compression fromApache. The hardware is based on aXilinxVirtexfield-programmable gate array (FPGA) and four custom AHA3601application-specific integrated circuits (ASICs). The AHA361/AHA362 boards are limited to only handling static Huffman blocks and require software to be modified to add support. The cards could not support the full Deflate specification, meaning they could only reliably decode their own output (a stream that did not contain any dynamic Huffman type 2 blocks).17b4:0011) or PCI-X cards featuring between one and six compression engines with claimed processing speeds of up to 3.6 Gbit/s (450 MB/s). A version of the cards are available with the separate brandWebEnhance specifically designed for web-serving use rather thanstorage area network (SAN) or backup use; aPCI Express (PCIe) revision, theMX4E is also produced.PCI-ID: 193f:0363/193f:0364) with a new hardware AHA3610 encoder chip. The new chip was designed to be capable of a sustained 2.5 Gbit/s. Using two of these chips, the AHA363-PCIe board can process Deflate at a rate of up to 5.0 Gbit/s (625 MB/s) using the two channels (two compression and two decompression). The AHA364-PCIe variant is an encode-only version of the card designed for out-goingload balancers and instead has multiple register sets to allow 32 independentvirtual compression channels feeding two physical compression engines. Linux,Microsoft Windows, andOpenSolaris kernel device drivers are available for both of the new cards, along with a modified zlib system library so that dynamically linked applications can automatically use the hardware support without internal modification. The AHA367-PCIe board (PCI-ID: 193f:0367) is similar to the AHA363-PCIe but uses four AHA3610 chips for a sustained compression rate of 10 Gbit/s (1250 MB/s). Unlike the AHA362-PCIX, the decompression engines on the AHA363-PCIe and AHA367-PCIe boards are fully deflate compliant.Inflate is the decoding process that takes a Deflate bitstream for decompression and correctly produces the original full-size data or file.
The normal intent with an alternative Inflate implementation is highly optimized decoding speed, or extremely predictablerandom-access memory (RAM) use formicrocontrollerembedded systems.
PCDEZIP, Bob Flanders and Michael Holmes, published inPC Magazine 1994-01-11.Package flate implements the Deflate compressed data format, described in RFC issue 1951.
FlateDecode [...] Decompresses data encoded using the zlib/deflate compression method
{{cite web}}: CS1 maint: bot: original URL status unknown (link){{cite web}}: CS1 maint: bot: original URL status unknown (link)appnote.txt,.ZIP File Format SpecificationArchived 2014-12-05 at theWayback Machine; Section 10,X. Deflating – Method 8.