Python Enhancement Proposals

PEP 784 – Adding Zstandard to the standard library

Author: Emma Harper Smith <emma at python.org>
Sponsor: Gregory P. Smith <greg at krypto.org>
Discussions-To: Discourse thread
Status: Final
Type: Standards Track
Created: 06-Apr-2025
Python-Version: 3.14
Post-History: 07-Apr-2025
Resolution: 25-Apr-2025

Important

This PEP is a historical document. The up-to-date, canonical documentation can now be found at compression.zstd.

See PEP 1 for how to propose changes.

Abstract

Zstandard is a widely adopted, mature, and highly efficient compression standard. This PEP proposes adding a new module to the Python standard library containing a Python wrapper around Meta’s zstd library, the default implementation. Additionally, to avoid name collisions with packages on PyPI and to present a unified interface to Python users, compression modules in the standard library will be moved under a compression.* package.

Motivation

CPython has modules for several different compression formats, such as zlib (DEFLATE), gzip, bzip2, and lzma, each widely used. Including popular compression algorithms matches Python’s “batteries included” philosophy of incorporating widely useful standards and utilities. lzma is the most recent such module, added in Python 3.3.

Since then, Zstandard has become the modern de facto preferred compression library for both high performance compression and decompression, attaining high compression ratios at reasonable CPU and memory cost. Zstandard achieves a much higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing significantly faster than LZMA.

Zstandard has seen widespread adoption in many different areas of computing. The numerous hardware implementations demonstrate long-term commitment to Zstandard and an expectation that Zstandard will stay the de facto choice for compression for years to come. This is further evidenced by Zstandard’s IETF standardization in RFC 8478. Zstandard compression is also implemented in both the ZFS and Btrfs filesystems.

Zstandard’s highly efficient compression has supplanted other modern compression formats, such as brotli, lzo, and ucl. While LZ4 is still used in very high throughput scenarios, Zstandard can also be used in some of these contexts. While inclusion of LZ4 is out of scope, it would be a compelling future addition to the compression namespace introduced by this PEP.

There are several bindings to Zstandard for Python available on PyPI, each with different APIs and choices of how to bind the zstd library. One goal with introducing an official module in the standard library is to reduce confusion for Python users who want simple compression/decompression APIs for Zstandard. The existing packages can continue providing extended APIs or integrate features from newer Zstandard versions.

Another reason to add Zstandard support to the standard library is to resolve a long standing open issue (python/cpython#81276) requesting Zstandard support in the tarfile module. This issue has the 5th most “thumbs up” of open issues on the CPython tracker, and has garnered a significant amount of discussion and interest. Additionally, the ZIP format standardizes a Zstandard compression format ID, and integration with the zipfile module would allow opening ZIP archives using Zstandard compression. The reference implementation for this PEP contains integration with the zipfile, tarfile, and shutil modules.

Zstandard compression could also be used to make Python wheel packages smaller and significantly faster to install. Anaconda found a sizeable speedup when adopting Zstandard for the conda package format:

Conda’s download sizes are reduced ~30-40%, and extraction is dramatically faster. […] We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format.

Anaconda blog on Zstandard adoption

Zstandard has a significantly higher compression ratio compared to wheel’s existing zlib-based compression, according to lzbench, a comprehensive benchmark of many different compression libraries and formats. While this PEP does not prescribe any changes to the wheel format or other packaging standards, having Zstandard bindings in the standard library would enable a future PEP to improve the user experience for Python wheel packages.

Rationale

Introduction of a compression package

Both the zstd and zstandard import names are claimed by projects on PyPI. To avoid breaking users of one of the existing bindings, this PEP proposes introducing a new namespace for compression libraries, compression. This name is already reserved on PyPI for use in the standard library. The new Zstandard module will be named compression.zstd. Other compression modules will be re-exported in the new compression package.

Providing a common namespace for compression modules has several advantages. First, it reduces user confusion about where to find compression modules. Second, the top level compression module could provide information on which compression formats are available, similar to hashlib’s algorithms_available. If PEP 775 is accepted, a compression.algorithms_guaranteed could be provided as well, listing zlib. Finally, a compression namespace prevents future issues with merging other compression formats into the standard library. New compression formats will likely be published to PyPI prior to integration into CPython. Therefore, any new compression format import name will likely already be claimed by the time a module would be considered for inclusion in CPython. Putting compression modules under a package prefix prevents issues with potential future name clashes.
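As a rough illustration of the "which formats are available" idea, the sketch below probes for importable compression modules. It is not part of this PEP or of the standard library: the function name available_compression_modules and the CANDIDATES table are hypothetical, and the probe simply attempts each import.

```python
import importlib

# Hypothetical probe table: short name -> import names to try, in order.
# "zstd" deliberately has no top-level fallback, because that import name
# is claimed by a third-party project on PyPI.
CANDIDATES = {
    "bz2": ("compression.bz2", "bz2"),
    "gzip": ("compression.gzip", "gzip"),
    "lzma": ("compression.lzma", "lzma"),
    "zlib": ("compression.zlib", "zlib"),
    "zstd": ("compression.zstd",),
}


def available_compression_modules(candidates=CANDIDATES):
    """Return the set of short names whose module can be imported."""
    found = set()
    for short_name, import_names in candidates.items():
        for import_name in import_names:
            try:
                importlib.import_module(import_name)
            except ImportError:
                continue  # try the next spelling, if any
            found.add(short_name)
            break
    return found


if __name__ == "__main__":
    print(sorted(available_compression_modules()))
```

The result depends on the interpreter version and on which optional libraries CPython was built against, so no particular output should be assumed.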

Code that would like to remain compatible across Python versions may use the following pattern to ensure compatibility:

    try:
        from compression.lzma import LZMAFile
    except ImportError:
        from lzma import LZMAFile

This will use the newer import name when available and fall back to the old name otherwise.

Implementation based on pyzstd

The implementation for this PEP is based on the pyzstd project. This project was chosen as the code was originally written to be upstreamed to CPython by Ma Lin, who also wrote the output buffer implementation used in the standard library today. The project has since been taken over by Rogdham and is published to PyPI. The APIs in pyzstd are similar to the APIs for other compression modules in the standard library such as bz2 and lzma.

Minimum supported Zstandard version

The minimum supported Zstandard version was chosen as v1.4.5, released in May of 2020. This version was chosen as a minimum based on reviewing the versions of Zstandard available in a number of Linux distribution package repositories, including LTS releases. This version choice is rather conservative to maximize compatibility with existing LTS Linux distributions, but a newer Zstandard version could likely be chosen given that newer Python releases are generally packaged as part of newer distribution releases.

Specification

The compression namespace

A new namespace for compression modules will be introduced named compression. The top-level module for this package will be empty to begin with, but a standard API for interacting with compression routines may be added to the top level in the future.

The compression.zstd module

A new module, compression.zstd, will be introduced with Zstandard compression APIs that match other compression modules in the standard library, namely:

  • compress() / decompress() - APIs for one-shot compression or decompression
  • ZstdFile / open() - APIs for interacting with streams and file-like objects
  • ZstdCompressor / ZstdDecompressor - APIs for incremental compression or decompression

It will also contain some Zstandard-specific functionality:

  • ZstdDict / train_dict() / finalize_dict() - APIs for interacting with Zstandard dictionaries, which are useful for compressing many small chunks of similar data
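The three API styles listed above can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes compression.zstd mirrors the call shapes of lzma (compress(), decompress(), open(), and compressor/decompressor objects with compress(), flush() and decompress() methods), and it falls back to lzma purely so the sketch runs where Zstandard support is absent. The helper names roundtrip_one_shot, roundtrip_incremental and roundtrip_file are hypothetical.

```python
import io

try:
    # Python 3.14+ built with libzstd.
    from compression import zstd as codec
    Compressor, Decompressor = codec.ZstdCompressor, codec.ZstdDecompressor
except ImportError:
    # Stand-in with the same API shape, used only so the sketch runs anywhere.
    import lzma as codec
    Compressor, Decompressor = codec.LZMACompressor, codec.LZMADecompressor


def roundtrip_one_shot(data: bytes) -> bytes:
    """One-shot API: compress() / decompress()."""
    return codec.decompress(codec.compress(data))


def roundtrip_incremental(chunks) -> bytes:
    """Incremental API: feed chunks, then flush to finish the stream."""
    compressor = Compressor()
    pieces = [compressor.compress(chunk) for chunk in chunks]
    pieces.append(compressor.flush())
    return Decompressor().decompress(b"".join(pieces))


def roundtrip_file(data: bytes) -> bytes:
    """File API: open() wrapping an in-memory binary file object."""
    buffer = io.BytesIO()
    with codec.open(buffer, "wb") as stream:
        stream.write(data)
    buffer.seek(0)
    with codec.open(buffer, "rb") as stream:
        return stream.read()


if __name__ == "__main__":
    payload = b"batteries included " * 64
    assert roundtrip_one_shot(payload) == payload
    assert roundtrip_incremental([payload[:100], payload[100:]]) == payload
    assert roundtrip_file(payload) == payload
```

Dictionary training (ZstdDict, train_dict(), finalize_dict()) has no counterpart in the fallback module and is therefore left out of the sketch.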

libzstd optional dependency

The libzstd library will become an optional dependency of CPython. If the library is not available, the compression.zstd module will be unavailable. This is handled automatically on Unix platforms as part of the normal build environment detection.

On Windows, libzstd will be added to the source dependencies used to build libraries CPython depends on for Windows.

Other compression modules

New import names compression.lzma, compression.bz2, compression.gzip and compression.zlib will be introduced in Python 3.14 re-exporting the contents of the existing lzma, bz2, gzip and zlib modules respectively. The compression sub-modules will become the canonical import names going forward. The use of the new compression names will be promoted over the original top level module names in the Python documentation when the minimum supported Python version requirements make that feasible.
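For code that needs several of the re-exported modules, the fallback import can be wrapped in a small helper. The sketch below is illustrative only: import_compression and LEGACY_NAMES are hypothetical names, and the helper is limited to the four modules that have both an old and a new import name.

```python
import importlib

# The modules that exist under both the new and the legacy import name.
LEGACY_NAMES = frozenset({"bz2", "gzip", "lzma", "zlib"})


def import_compression(name):
    """Import compression.<name>, falling back to the top-level module."""
    if name not in LEGACY_NAMES:
        raise ValueError(f"no legacy top-level module for {name!r}")
    try:
        return importlib.import_module(f"compression.{name}")
    except ImportError:
        # Older Python: only the original top-level name exists.
        return importlib.import_module(name)


if __name__ == "__main__":
    zlib = import_compression("zlib")
    assert zlib.decompress(zlib.compress(b"pep 784")) == b"pep 784"
```

Because the new names re-export the contents of the existing modules, either import path should yield the same functions and classes.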

The _compression module, given that it is marked private, will be immediately renamed to compression._common._streams. The new name was selected due to the current contents of the module being I/O related helpers for stream APIs (e.g. LZMAFile) in standard library compression modules.

Backwards Compatibility

This PEP introduces no backwards incompatible changes. There are currently no plans to deprecate or remove the existing compression modules. Any deprecation or removal of the existing modules is left to a future decision but will occur no sooner than 5 years from the acceptance of this PEP.

Security Implications

As with any new C code, especially code operating on potentially untrusted user input, there are risks of memory safety issues. The author plans on contributing integration with libfuzzer to enable fuzzing the _zstd code and ensure it is robust. Furthermore, there are a number of tests that exercise the compression and decompression routines. These tests pass without error when compiled with AddressSanitizer.

Taking on a new dependency also always has security risks, but the zstd library is mature, fuzzed on each commit, and participates in Meta’s bug bounty program.

How to Teach This

Documentation for the new module is in the reference implementation branch. The documentation for existing modules will be updated to reference the new names as well.

Reference Implementation

The reference implementation contains the _zstd C code, the compression.zstd code, modifications to tarfile, shutil, and zipfile, and tests for each new API and integration added. It also contains the re-exports of other compression modules.
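To give a sense of what the tarfile integration enables, the sketch below writes and reads a compressed tar archive in memory. The text above does not spell out the mode strings, so "w:zst" and "r:zst" are assumptions taken from the tarfile documentation for Python 3.14. The helpers pack and unpack are hypothetical, and the sketch falls back to xz so it still runs where Zstandard support is absent.

```python
import io
import tarfile

try:
    import compression.zstd  # noqa: F401  (imported only to probe availability)
    SUFFIX = "zst"  # assumed tarfile mode suffix for Zstandard
except ImportError:
    SUFFIX = "xz"  # fallback so the sketch runs without Zstandard support


def pack(files: dict) -> bytes:
    """Build a compressed tar archive in memory from {name: payload}."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=f"w:{SUFFIX}") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def unpack(blob: bytes) -> dict:
    """Read every regular file back out of an archive made by pack()."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode=f"r:{SUFFIX}") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive
            if member.isfile()
        }


if __name__ == "__main__":
    files = {"a.txt": b"hello", "b.bin": bytes(range(256))}
    assert unpack(pack(files)) == files
```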

Rejected Ideas

Name the module zstdlib and do not make a new compression namespace

One option instead of making a new compression namespace would be to find a different name, such as zstdlib, as the import name. Several other names, such as zst, libzstd, and zstdcomp were proposed as well. In discussion, the names were found to either be too easy to typo, or unintuitive. Furthermore, the issue of existing import names is likely to persist for future compression formats added to the standard library. LZ4, a common high speed compression format, has a package on PyPI, lz4, with the import name lz4. Instead of solving this issue for each compression format, it is better to solve it once and for all by using the already-claimed compression namespace.

Introduce an experimental _zstd package in Python 3.14

Since this PEP was published close to the beta cutoff for new features for Python 3.14, one proposal was to ship the package as a private module, _zstd, so that packaging tools could use it sooner without a final name having been decided. This would allow more time for discussion of the final module name during the 3.15 development window. However, introducing a private module was not popular. The expectations and contract for external usage of a private module in the standard library are unclear.

Introduce a standard library namespace instead of compression

One alternative to a compression namespace would be to introduce a std namespace for the entire standard library. However, this was seen as too significant a change for 3.14, with no agreed upon semantics, migration path, or name for the package. Furthermore, a future PEP introducing a std namespace could always define that the compression sub-modules be flattened into the std namespace.

Include zipfile and tarfile in compression

Compression is often used with archiving tools, so putting both zipfile and tarfile under the compression namespace is appealing. However, compression can be used beyond just archiving tools. For example, network requests can be gzip compressed. Furthermore, formats like tar do not include compression themselves, instead relying on external compression. Therefore, this PEP does not propose moving zipfile or tarfile under compression.

Do not include gzip under compression

The GZip format RFC defines a format which can include multiple blocks and metadata about its contents. In this way GZip is rather similar to archive formats like ZIP and tar. Despite that, in usage GZip is often treated as a compression format rather than an archive format. Looking at how different languages classify GZip, the prevailing trend is to classify it as a compression format and not an archiving format.

Language    Compression or Archive    Documentation Link
Golang      Compression               https://pkg.go.dev/compress/gzip
Ruby        Compression               https://docs.ruby-lang.org/en/master/Zlib/GzipFile.html
Rust        Compression               https://github.com/rust-lang/flate2-rs
Haskell     Compression               https://hackage.haskell.org/package/zlib
C#          Compression               https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.gzipstream
Java        Archive                   https://docs.oracle.com/javase/8/docs/api/java/util/zip/package-summary.html
NodeJS      Compression               https://nodejs.org/api/zlib.html
Web APIs    Compression               https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API
PHP         Compression               https://www.php.net/manual/en/function.gzcompress.php
Perl        Compression               https://perldoc.perl.org/IO::Compress::Gzip

In addition, the gzip module in Python mostly focuses on single block content and has an API similar to other compression modules, making it a good fit for the compression namespace.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0784.rst

Last modified: 2025-05-24 04:38:02 GMT

