| XZ Utils | |
|---|---|
| Original author | Lasse Collin |
| Developer | The Tukaani Project |
| Stable release | 5.8.2 / 17 December 2025 |
| Written in | C |
| Operating system | Cross-platform |
| Type | Data compression |
| License | |
| Website | tukaani |
| Repository | |
XZ Utils (previously LZMA Utils) is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression and decompression, the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA-SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.
XZ Utils can compress and decompress the xz and lzma file formats. Since the LZMA format is considered legacy,[2] XZ Utils compresses to xz by default. In addition, decompression of the .lz format used by lzip has been supported since version 5.3.4.[3]
In most cases, xz achieves higher compression ratios than alternatives like zip,[4] gzip and bzip2. Decompression speed is higher than that of bzip2, but lower than that of gzip. Compression can be much slower than with gzip, and is slower than with bzip2 at high compression levels, so xz is most useful when a compressed file will be used many times.[5][6]
XZ Utils consists of two major components: xz, the command-line compressor and decompressor (analogous to gzip), and liblzma, the software library that implements the compression.
Various command shortcuts exist, such as lzma (for xz --format=lzma), unxz (for xz --decompress; analogous to gunzip) and xzcat (for unxz --stdout; analogous to zcat).
Both the behavior of the software and the properties of the file format have been designed to work similarly to those of the popular Unix compression tools gzip and bzip2.
Just like gzip and bzip2, xz and lzma can only compress single files (or data streams) as input. They cannot bundle multiple files into a single archive; to do this, an archiving program such as tar is used first.
Compressing an archive:

    xz my_archive.tar      # results in my_archive.tar.xz
    lzma my_archive.tar    # results in my_archive.tar.lzma

Decompressing the archive:

    unxz my_archive.tar.xz        # results in my_archive.tar
    unlzma my_archive.tar.lzma    # results in my_archive.tar

Version 1.22 or greater of the GNU implementation of tar has transparent support for tarballs compressed with lzma and xz, using the switches --xz or -J for xz compression, and --lzma for LZMA compression.
Creating an archive and compressing it:

    tar -c --xz -f my_archive.tar.xz /some_directory        # results in my_archive.tar.xz
    tar -c --lzma -f my_archive.tar.lzma /some_directory    # results in my_archive.tar.lzma

Decompressing the archive and extracting its contents:

    tar -x --xz -f my_archive.tar.xz        # results in /some_directory
    tar -x --lzma -f my_archive.tar.lzma    # results in /some_directory

The same operations using single-letter tar options and the short .txz suffix:

    tar cJf keep.txz keep    # archive then compress the directory ./keep/ into the file ./keep.txz
    tar xJf keep.txz         # decompress then extract the file ./keep.txz creating the directory ./keep/

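
The archive-then-compress workflow above can also be sketched with Python's standard tarfile module, which reads and writes xz-compressed tarballs through its "w:xz" and "r:xz" modes. The helper names, the temporary directory and the file a.txt are illustrative assumptions, not part of XZ Utils.

```python
import os
import tarfile
import tempfile


def make_txz(src_dir, archive_path):
    """Archive src_dir with tar and compress the result with xz in one step."""
    with tarfile.open(archive_path, "w:xz") as tar:
        # arcname stores only the directory's base name inside the archive
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def list_txz(archive_path):
    """Return the member names stored in an xz-compressed tarball."""
    with tarfile.open(archive_path, "r:xz") as tar:
        return tar.getnames()


# Demonstration in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    keep = os.path.join(tmp, "keep")
    os.mkdir(keep)
    with open(os.path.join(keep, "a.txt"), "w") as f:
        f.write("hello\n")
    archive = os.path.join(tmp, "keep.txz")
    make_txz(keep, archive)
    names = list_txz(archive)
    print(names)
```

As with the tar commands, archiving and compression remain separate layers: tarfile builds the archive and the xz layer only sees a single data stream.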
xz has supported multi-threaded compression (with the -T flag)[7] since version 5.2.0, released in 2014;[3] threaded decompression has been available since version 5.4.0. Threaded decompression requires multiple compressed blocks within a stream, which are created by the threaded compression interface. The number of threads used can be lower than requested if the file is not big enough for threading with the given settings or if using more threads would exceed the memory usage limit.[7]
| xz (file format) | |
|---|---|
| Filename extension | .xz |
| Internet media type | application/x-xz |
| Magic number | FD 37 7A 58 5A 00 |
| Developed by | Lasse Collin, Igor Pavlov |
| Initial release | 14 January 2009 |
| Latest release | 1.2.1 / 8 April 2024 |
| Type of format | Data compression |
| Open format? | Yes |
| Free format? | Yes |
| Website | tukaani |
An xz file is a sequence of one or more streams. There may be null bytes (padding) after each stream.
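The multi-stream property can be illustrated with Python's standard lzma module, whose decompress function is documented to accept a concatenation of several streams. This is a minimal sketch; the payloads are arbitrary.

```python
import lzma

# Two independent compress calls produce two complete xz streams.
first = lzma.compress(b"hello ")
second = lzma.compress(b"world")

# Every stream starts with the xz header magic number.
XZ_MAGIC = bytes.fromhex("FD377A585A00")
assert first.startswith(XZ_MAGIC) and second.startswith(XZ_MAGIC)

# Concatenating the streams yields a valid multi-stream xz file;
# decompress() returns the concatenation of their contents.
combined = first + second
result = lzma.decompress(combined)
print(result)
```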
The xz format improves on lzma by allowing for preprocessing filters (BCJ and delta). The exact filters used are similar to those used in 7z, as 7z's filters are available in the public domain via the LZMA SDK. xz's RISC-V BCJ filter is its own addition.
Meta research from 2016 shows that, compared with lz4, zstd and zlib, xz has the highest compression ratio and the slowest compression and decompression.[8]
The author of lzip claims that the xz format is inadequate for long-term archiving.[9]
Multi-byte values use little-endian byte order.[10]
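A filter chain of this kind can be sketched with Python's standard lzma module, which exposes the delta filter as lzma.FILTER_DELTA. The sample data and the delta distance of 4 bytes are illustrative assumptions chosen to match 32-bit samples; whether the delta filter helps depends on the input.

```python
import lzma
import struct

# Illustrative input: 1000 slowly rising 32-bit little-endian integers
samples = struct.pack("<1000I", *range(0, 4000, 4))

# Filter chain: a delta preprocessing step followed by LZMA2.
filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},    # subtract the byte 4 positions back
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
with_delta = lzma.compress(samples, format=lzma.FORMAT_XZ, filters=filters)
plain = lzma.compress(samples, format=lzma.FORMAT_XZ)

# The filter chain is recorded in the file, so decompression needs no options.
restored = lzma.decompress(with_delta)
print(len(samples), len(plain), len(with_delta))
```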
Stream structure:

| Offset (bytes) | Field | Size (bytes) | Description |
|---|---|---|---|
| 0 | Header magic number | 6 | Magic number. Must be FD 37 7A 58 5A 00. |
| 6 | Flags | 2 | Flags. The first byte and the four most significant bits of the second byte must be zero (reserved for future use). The type of check (last field in the block structure) is encoded in the four least significant bits of the second byte. |
| 8 | Header CRC32 | 4 | CRC-32 of the flags field. Used to distinguish between a corrupted file and unsupported flags (i.e. non-zero reserved bit). |
| 12 | Blocks | Varies | Sequence of zero or more blocks. |
| Varies | Index | Varies | See index below. |
| Varies | Footer CRC32 | 4 | CRC-32 of the flags and backward size. |
| Varies | Backward Size | 4 | Size of the index field. |
| Varies | Flags | 2 | Copy of the flags field above. |
| Varies | Footer magic number | 2 | Magic number. Must be 59 5A. |
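
The header and footer fields in the table can be checked against a real stream. The following is a minimal sketch, assuming a single stream without trailing padding (as produced by Python's lzma.compress) and assuming that the format's CRC-32 is the same variant that zlib.crc32 computes; parse_stream_flags is a hypothetical helper name.

```python
import lzma
import zlib

HEADER_MAGIC = bytes.fromhex("FD377A585A00")
FOOTER_MAGIC = bytes.fromhex("595A")


def parse_stream_flags(stream):
    """Validate the 12-byte header and footer; return the check type ID."""
    header, footer = stream[:12], stream[-12:]
    if header[:6] != HEADER_MAGIC or footer[10:] != FOOTER_MAGIC:
        raise ValueError("not an xz stream")
    flags = header[6:8]
    # Multi-byte values are little-endian.
    if zlib.crc32(flags) != int.from_bytes(header[8:12], "little"):
        raise ValueError("corrupt stream header")
    # Footer layout: CRC32 (4), backward size (4), flags (2), magic (2).
    if zlib.crc32(footer[4:10]) != int.from_bytes(footer[:4], "little"):
        raise ValueError("corrupt stream footer")
    if footer[8:10] != flags:
        raise ValueError("header and footer flags differ")
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("reserved flag bits are set")
    # The check type sits in the four least significant bits of byte two.
    return flags[1] & 0x0F


stream = lzma.compress(b"example", check=lzma.CHECK_CRC32)
check_id = parse_stream_flags(stream)
print(check_id == lzma.CHECK_CRC32)
```

Checking the CRC before inspecting the reserved bits mirrors the purpose stated in the table: a CRC mismatch indicates corruption, while a valid CRC with non-zero reserved bits indicates flags from an unsupported format version.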

Block structure:

| Offset (bytes) | Field | Size (bytes) | Description |
|---|---|---|---|
| 0 | Header size | 1 | Size of the header. Note: real_header_size = (encoded_header_size + 1) * 4. |
| 1 | Flags | 1 | Flags. |
| 2 | Compressed size | 0 or varies | Size of the compressed data. Present if bit 6 of the flags is set. Encoded as a variable-length integer. |
| Varies | Uncompressed size | 0 or varies | Size of the block after decompression. Present if bit 7 of the flags is set. Encoded as a variable-length integer. |
| Varies | Filter flags | Varies | Sequence of filter flags. The amount is encoded in bits 0–1 of the flags. |
| Varies | Header padding | Varies | As many null bytes as needed to make the header (i.e. fields before the compressed data) have the size specified in the header size field. |
| Varies | CRC32 | 4 | CRC-32 of all bytes in the block up to (not including) this field. |
| Varies | Compressed data | Varies | The compressed data. |
| Varies | Block padding | 0, 1, 2 or 3 | 0–3 null bytes to make the size of the block a multiple of 4. |
| Varies | Check | 0, 4, 8, or 32 | Error-detecting mechanism calculated from the data before compression. The type of check is encoded in the flags of the stream structure. |
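
The header size formula and the flag bits from the table can be applied to the first block of a stream, which starts at offset 12. This is a minimal sketch; parse_block_header is a hypothetical helper, and the CRC-32 is assumed to match zlib.crc32.

```python
import lzma
import zlib


def parse_block_header(stream, offset=12):
    """Read the block header that starts at offset (12 for the first block)."""
    encoded_size = stream[offset]
    if encoded_size == 0:
        raise ValueError("index indicator found instead of a block header")
    # real_header_size = (encoded_header_size + 1) * 4
    real_size = (encoded_size + 1) * 4
    header = stream[offset:offset + real_size]
    # The last four header bytes hold the CRC-32 of everything before them.
    if zlib.crc32(header[:-4]) != int.from_bytes(header[-4:], "little"):
        raise ValueError("corrupt block header")
    flags = header[1]
    return {
        "header_size": real_size,
        "has_compressed_size": bool(flags & 0x40),    # bit 6
        "has_uncompressed_size": bool(flags & 0x80),  # bit 7
        "filter_count_bits": flags & 0x03,            # raw value of bits 0-1
    }


stream = lzma.compress(b"example data" * 100)
info = parse_block_header(stream)
print(info)
```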

Index structure:

| Offset (bytes) | Field | Size (bytes) | Description |
|---|---|---|---|
| 0 | Index indicator | 1 | Must be zero to distinguish the index from a block, because this field overlaps with the first field in the block structure. |
| 1 | Number of records | Varies | Number of records in the next field. Must be the same as the number of blocks in the stream. Encoded as a variable-length integer. |
| Varies | Records | Varies | Sequence of records. Each record contains two variable-length integers. |
| Varies | Padding | 0, 1, 2 or 3 | 0–3 null bytes to make the size of the index a multiple of 4. |
| Varies | CRC32 | 4 | CRC-32 of all bytes in the index except this field. |

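Variable-length integers store seven bits of the value per byte, least significant bits first. Values from 0 to 127 are stored as is, in one byte. Values greater than 127 (and below 2^63) are stored in two or more bytes (up to 9). All bytes except the last one have the most significant bit set.[11]
The following Python code implements functions to encode and decode a variable-length integer.

    def encode(num):
        """Encode an integer below 2**63 as a variable-length integer."""
        if num >= 2**63:
            raise ValueError("num must not have more than 63 bits")
        buf = b""
        # Emit seven bits per byte, least significant group first; the high
        # bit marks that more bytes follow.
        while num >= 0x80:
            buf += bytes([0x80 | (num & 0x7F)])
            num >>= 7
        buf += bytes([num])
        return buf


    def decode(buf):
        """Decode the variable-length integer at the start of buf."""
        if len(buf) == 0:
            raise ValueError("buf must not be empty")
        num = 0
        for i, byte in enumerate(buf):
            if i > 8:
                raise ValueError("num must not have more than 63 bits")
            num |= (byte & 0x7F) << (i * 7)
            if byte & 0x80 == 0:
                return num
        # Every byte had the continuation bit set, so the input is truncated.
        raise ValueError("buf ends before the last byte of the integer")
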
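Variable-length integers are what the index is built from, so the same decoding logic can be used to read the index records of a real stream. The sketch below rests on two assumptions taken from the .xz specification rather than from the tables above: the footer's backward size field stores the index size as (size / 4) − 1, and each record holds the unpadded size of a block followed by its uncompressed size. The helper names are hypothetical.

```python
import lzma


def read_varint(buf, pos):
    """Decode one variable-length integer at buf[pos]; return (value, next_pos)."""
    num = 0
    for i in range(9):
        byte = buf[pos + i]
        num |= (byte & 0x7F) << (i * 7)
        if byte & 0x80 == 0:
            return num, pos + i + 1
    raise ValueError("variable-length integer is longer than 9 bytes")


def read_index_records(stream):
    """Return the pairs of integers stored in the index of a single stream."""
    # Assumption (from the specification): real size = (stored + 1) * 4.
    stored = int.from_bytes(stream[-8:-4], "little")
    index_size = (stored + 1) * 4
    # The index sits directly before the 12-byte stream footer.
    index = stream[-12 - index_size:-12]
    if index[0] != 0:
        raise ValueError("index indicator must be zero")
    count, pos = read_varint(index, 1)
    records = []
    for _ in range(count):
        first, pos = read_varint(index, pos)   # assumed: unpadded block size
        second, pos = read_varint(index, pos)  # assumed: uncompressed size
        records.append((first, second))
    return records


records = read_index_records(lzma.compress(b"x" * 1000))
print(records)
```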
Development of XZ Utils took place within the Tukaani Project, a small group of developers who once maintained a Linux distribution based on Slackware. The name "XZ" is not an abbreviation; it appears to be an arbitrary name for the data compressors, as the official specification does not mention any meaning of "XZ".[12] The .xz file format specification version 1.0.0 was officially released in January 2009.[13]
All of the source code for xz and liblzma has been released into the public domain. The XZ Utils source distribution additionally includes some optional scripts and an example program that are subject to various versions of the GNU General Public License (GPL).[1] The resulting xz and liblzma binaries are public domain, unless the optional LGPL getopt implementation is incorporated.[14]
Binaries are available for FreeBSD, NetBSD, Linux systems, Microsoft Windows, and FreeDOS. A number of Linux distributions, including Fedora, Slackware, Ubuntu, and Debian, use xz for compressing their software packages. Arch Linux previously used xz to compress packages,[15] but as of 27 December 2019, packages are compressed with Zstandard.[16] Fedora Linux also switched to compressing its RPM packages with Zstandard with Fedora Linux 31.[17] The GNU FTP archive also uses xz.
On 29 March 2024, Andres Freund, a PostgreSQL developer working at Microsoft, announced that he had found a backdoor in XZ Utils, impacting versions 5.6.0 and 5.6.1. Malicious code for setting up the backdoor had been hidden in compressed test files, and the configure script in the tar files was modified to trigger the hidden code. Freund started his investigation "After observing a few odd symptoms around liblzma (part of the xz package)"; specifically that ssh logins using sshd were "taking a lot of CPU" and producing valgrind errors.[18] The vulnerability received a Common Vulnerability Scoring System (CVSS) score of 10 (the highest).[19]
"For instance, I compressed a directory having 37M size using both xz and zip. The zip file size was 31M, while the xz file was 16M after compression."