Important
This PEP is a historical document. The up-to-date, canonical spec, https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-sboms-directory, is maintained on the PyPA specs page.
See the PyPA specification update process for how to propose changes.
Almost all Python packages today are accurately measurable by software composition analysis (SCA) tools. For projects that are not accurately measurable, there is no existing mechanism to annotate a Python package with composition data to improve measurability.
Software Bill-of-Materials (SBOM) is a technology-and-ecosystem-agnostic method for describing software composition, provenance, heritage, and more. SBOMs are used as inputs for SCA tools, such as scanners for vulnerabilities and licenses, and have been gaining traction in global software regulations and frameworks.
This PEP proposes using SBOM documents included in Python packages as a means to improve automated software measurability for Python packages.
Python packages are particularly affected by the “phantom dependency” problem, where software components that aren’t written in Python are included in Python packages for many reasons, such as ease of installation and compatibility with standards.
These software components can’t be described using Python package metadata and thus are likely to be missed by SCA tools, which can mean vulnerable software components aren’t reported accurately.
For example, the Python package Pillow includes 16 shared object libraries in the wheel that were bundled by auditwheel as a part of the build. None of those shared object libraries are detected when using common SCA tools like Syft and Grype. If an SBOM document is included annotating all the included shared libraries, then SCA tools can identify the included software reliably.
Going beyond the runtime dependencies of a package, SBOMs can also record the tools and environments used to build a package. Recording the exact tools and versions used to build a package is often required to establish build reproducibility. Build reproducibility is a property of software that can be used to detect incorrectly or maliciously modified software components when compared to their upstream sources. Without a recorded list of build tools and versions, it can become difficult or impossible for a third party to verify build reproducibility.
SBOMs are required by recent software security regulations, like the Secure Software Development Framework (SSDF) and the Cyber Resilience Act (CRA). Due to their inclusion in these regulations, the demand for SBOM documents of open source projects is expected to be high. One goal is to minimize the demands on open source project maintainers by enabling open source users that need SBOMs to self-serve using existing tooling.
Another goal is to enable contributions from users who need SBOMs to annotate projects they depend on with SBOM information. Today there is no mechanism to propagate the results of those contributions for a Python package, so there is no incentive for users to contribute this type of work.
Attempting to add every field offered by SBOM standards into Python package Core Metadata would result in an explosion of new Core Metadata fields, including the need to keep them up-to-date as SBOM standards continue to evolve to suit new needs in that space.
Instead, this proposal delegates SBOM-specific metadata to SBOM documents that are included in Python packages in a named directory under .dist-info.
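To make the layout concrete, here is a minimal sketch of how a consuming tool might locate included SBOM documents inside a wheel using only the standard library `zipfile` module. The function name `list_wheel_sboms` is hypothetical. Contents are returned as raw bytes rather than parsed, since this PEP treats SBOM data as opaque.

```python
import zipfile


def list_wheel_sboms(wheel_path: str) -> dict[str, bytes]:
    """Return {archive path: raw bytes} for every file under *.dist-info/sboms/.

    Hypothetical helper: the bytes are left unparsed because the SBOM
    format (CycloneDX, SPDX, or anything else) is opaque to packaging tools.
    """
    sboms: dict[str, bytes] = {}
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            parts = name.split("/")
            # Expect "<name>-<version>.dist-info/sboms/<file>"; skip directory entries.
            if (
                len(parts) >= 3
                and parts[0].endswith(".dist-info")
                and parts[1] == "sboms"
                and not name.endswith("/")
            ):
                sboms[name] = wheel.read(name)
    return sboms
```

Because only the top-level `.dist-info` directory is inspected, files with similar names elsewhere in the archive are ignored.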
This standard also doesn’t aim to replace Core Metadata with SBOMs, instead focusing on the SBOM information being supplemental to Core Metadata. Included SBOMs only contain information about dependencies included in the package archive or information about the top-level software in the package that can’t be encoded into Core Metadata but is relevant for the SBOM use-case (“software identifiers”, “purpose”, “support level”, etc.).
Rather than requiring at most one included SBOM document per Python package, this PEP proposes that one or more SBOM documents may be included in a Python package. This means that code attempting to annotate a Python package with SBOM data may do so without being concerned about corrupting data already contained within other SBOM documents.
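Because each SBOM is a separate file, a post-build tool can add its own document without parsing or rewriting any existing one. The following is a minimal sketch, assuming a wheel that already has a `RECORD` file. The helper `add_sbom_to_wheel` and its suffix-on-collision naming scheme are hypothetical choices, not part of this PEP. The `RECORD` row follows the wheel format's convention of an unpadded urlsafe-base64 SHA-256 digest plus the file size.

```python
import base64
import hashlib
import zipfile


def _record_line(path: str, data: bytes) -> str:
    """Format a wheel RECORD row: path, urlsafe-base64 SHA-256 (unpadded), size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{path},sha256={digest.decode('ascii')},{len(data)}"


def add_sbom_to_wheel(src: str, dst: str, filename: str, sbom: bytes) -> str:
    """Copy wheel `src` to `dst` with `sbom` added under .dist-info/sboms/.

    Hypothetical helper. Existing SBOM documents are copied byte-for-byte;
    if `filename` is taken, a numeric suffix is added so nothing is overwritten.
    Returns the archive path of the new SBOM file.
    """
    with zipfile.ZipFile(src) as zin:
        names = zin.namelist()
        dist_info = next(
            top for top in (n.split("/")[0] for n in names) if top.endswith(".dist-info")
        )
        record_path = f"{dist_info}/RECORD"

        # Choose a path that does not collide with an SBOM already in the wheel.
        stem, dot, ext = filename.partition(".")
        candidate, counter = f"{dist_info}/sboms/{filename}", 1
        while candidate in names:
            candidate = f"{dist_info}/sboms/{stem}-{counter}{dot}{ext}"
            counter += 1

        with zipfile.ZipFile(dst, "w", compression=zipfile.ZIP_DEFLATED) as zout:
            for info in zin.infolist():
                if info.filename != record_path:
                    zout.writestr(info, zin.read(info))  # copied unchanged
            zout.writestr(candidate, sbom)

            # Append the new file to RECORD and write RECORD last.
            record = zin.read(record_path).decode("utf-8")
            if record and not record.endswith("\n"):
                record += "\n"
            zout.writestr(record_path, record + _record_line(candidate, sbom) + "\n")
    return candidate
```

Writing to a separate `dst` rather than editing in place keeps the original archive intact if anything fails part-way.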
Additionally, this PEP treats SBOM document data opaquely, instead relying on final end-users of the SBOM data to process the contained SBOM data. This choice acknowledges that SBOM standards are an active area of development where there is not yet (and may never be) a single definitive SBOM standard, and that SBOM standards can continue to evolve independently of Python packaging standards. Tools that consume SBOM documents already support a multitude of SBOM standards to handle this reality.
These decisions mean this PEP is capable of supporting any SBOM standard and does not favor one over another, instead deferring the decision to producing projects and tools and consuming user tooling.
The rollout of a new metadata version and field requires many different projects and teams to adopt the metadata version in sequence to avoid widespread breakage. This effect usually means a substantial delay in how quickly users and tools can start using new packaging features.
For example, a single metadata version bump requires updates to PyPI, various pyproject.toml parsing and schema projects, and the packaging library, followed by a wait for releases. Then pip and other installers need to bundle the changes to packaging and release, and then build backends can begin emitting the new metadata version, again followed by a wait for releases. Only then can projects begin using the new features. Even with this careful approach, it’s not guaranteed that tools won’t break on new metadata versions and fields.
To avoid this delay, simplify how SBOMs are included overall, and give flexibility to build backends and tools, this PEP proposes using a subdirectory under .dist-info to safely add data to a Python package while avoiding the need for new metadata fields and versions. This mechanism allows build backends and tools to begin using the feature described in this PEP immediately after acceptance, without head-of-line blocking on other projects adopting the PEP.
.dist-info or .data directory
There are two top-level directories in binary distributions where files beyond the software itself can be stored: .dist-info and .data. This specification chose to use the .dist-info directory for storing subdirectories and files.
Firstly, the .data directory has no corresponding location in the installed package, whereas .dist-info preserves the link between the binary distribution and the installed package in an environment. The .data directory instead has all its contents merged between all installed packages in an environment, which can lead to collisions between similarly named files.
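Since .dist-info survives installation, SBOMs of installed packages can be found through standard metadata APIs. A minimal sketch using `importlib.metadata`, assuming the installer listed the copied `sboms/` files in `RECORD` (the installed-project `RECORD` enumerates installed files). The helper name `installed_sboms` is hypothetical.

```python
from importlib.metadata import Distribution


def installed_sboms(dist: Distribution) -> dict[str, str]:
    """Return {RECORD path: text} for the SBOM files of an installed distribution.

    Hypothetical helper; relies on RECORD listing the .dist-info/sboms/ files.
    """
    found: dict[str, str] = {}
    for path in dist.files or []:  # `files` is None when RECORD is missing
        parts = path.parts
        if len(parts) >= 3 and parts[0].endswith(".dist-info") and parts[1] == "sboms":
            found[str(path)] = path.read_text(encoding="utf-8")
    return found
```

For a package in the current environment, this could be called as `installed_sboms(importlib.metadata.distribution("some-project"))`, where `some-project` stands in for any installed distribution name.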
Secondly, subdirectories under the .data directory require new definitions in the Python sysconfig module. This means defining additional directories requires waiting for a change to Python, and using the directory requires waiting for adoption of the new Python version by users. Subdirectories under .dist-info don’t have these requirements; they can be used by any user, build backend, and installer immediately after a new subdirectory name is registered, regardless of Python or metadata version.
PEP 725 (“Specifying external dependencies in pyproject.toml”) is a different PEP with some similarities to PEP 770, such as attempting to describe non-Python software within Python packaging metadata. This section aims to show how these two PEPs are tracking different information and serving different use-cases:
virtual:compiler/c) or needing to link “the OpenSSL library” at build time (pkg:generic/openssl). PEP 770 describes concrete dependencies, more akin to dependencies in a “lock file”, such as an exact name, version, architecture, and hash of a software library distributed through the AlmaLinux distribution (pkg:rpm/almalinux/libssl3@3.2.0). For cases like build dependencies, this might result in a dependency being requested via PEP 725 and then recorded concretely in an SBOM post-build with PEP 770.
pyproject.toml by hand. The users of the information are build backends and users who want to build software from source.
PEP 770 is primarily for tools which are capable of generating SBOM documents to be included in a Python package archive, and for SBOM/SCA tools which want to use SBOM documents about installed software to do some other task, such as vulnerability scanning or software analysis.
The changes necessary to implement this PEP include:
.dist-info/sboms.
In addition to the above, an informational PEP will be created for tools consuming included SBOM documents and other Python package metadata to generate complete SBOM documents for Python packages.
.dist-info/sboms directory
This PEP introduces a new registry of reserved subdirectory names allowed in the .dist-info directory of distribution archives and installed projects. Future additions to this registry will be made through the PEP process. The initial values in this registry are:
| Subdirectory name | PEP / Standard |
|---|---|
| licenses | PEP 639 |
| license_files | PEP 639 (draft-only) |
| LICENSES | REUSE licensing framework |
| sboms | PEP 770 |
See Backwards Compatibility for a complete methodology for avoiding backwards incompatibilities when selecting this directory name.
A few additions will be made to the existing specifications.
.dist-info/sboms subdirectory is specified that the directory contains SBOM files.
.dist-info/sboms subdirectory is specified that the directory contains SBOM files and that any files in this directory MUST be copied from wheels by install tools.
This PEP treats data contained within SBOM documents as opaque, recognizing that SBOM standards are an active area of development. However, there are some considerations for SBOM data producers that, when followed, will improve the interoperability and usability of SBOM data made available in Python packages:
PyPI and other indices MAY validate the contents of SBOM documents specified by this PEP, but MUST NOT validate or reject data for unknown SBOM standards, versions, or fields.
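One way an index could honor this rule is to check a handful of required fields for the formats it recognizes and accept everything else unchanged. This is a minimal sketch. `check_sbom` is hypothetical, not PyPI's implementation, and the choice of formats and fields to check is an illustrative assumption. It relies only on the CycloneDX JSON `bomFormat`/`specVersion` fields and SPDX 2.x document-level fields.

```python
import json


def check_sbom(data: bytes) -> list[str]:
    """Return problems found in one SBOM file; an empty list means "accept".

    Hypothetical policy: only a few required fields of recognized formats are
    checked. Unknown standards, versions, serializations, and extra fields are
    never rejected.
    """
    try:
        doc = json.loads(data)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return []  # not JSON (e.g. tag-value or XML): unknown here, so accept
    if not isinstance(doc, dict):
        return []

    if doc.get("bomFormat") == "CycloneDX":
        if not isinstance(doc.get("specVersion"), str):
            return ["CycloneDX document lacks specVersion"]
        return []

    spdx_version = doc.get("spdxVersion")
    if isinstance(spdx_version, str) and spdx_version.startswith("SPDX-2."):
        required = ("SPDXID", "name", "dataLicense")
        return [f"SPDX 2 document lacks {key}" for key in required if key not in doc]

    return []  # unrecognized standard or version: must not be rejected
```

Every path that does not positively recognize a format falls through to "accept", so adding support for a new standard later only adds checks and never tightens behavior for unknown data.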
.dist-info/sboms subdirectory
The new reserved .dist-info/sboms subdirectory represents a new reservation that wasn’t previously documented, and thus has the potential to break assumptions made by existing tools.
To check what .dist-info subdirectory names are in use today, a query across all files in package archives on PyPI was executed:
```sql
SELECT
    regexp_extract(archive_path, '.*\.dist-info/([^/]+)/', 1) AS dirname,
    COUNT(DISTINCT project_name) AS projects
FROM '*.parquet'
WHERE archive_path LIKE '%.dist-info/%/%'
GROUP BY dirname
ORDER BY projects DESC;
```
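The same aggregation can be sketched in plain Python over `(project_name, archive_path)` pairs, which may help when checking the query's semantics. The input shape and the function name `count_subdirectories` are assumptions. Ties are broken alphabetically only to make the output deterministic, since the SQL leaves tie order unspecified. Paths that the `LIKE` filter matches but the regular expression does not (for example, an empty path segment) are simply skipped here.

```python
import re
from collections import defaultdict
from collections.abc import Iterable

# Same pattern as the SQL: the greedy ".*" selects the last ".dist-info/<dir>/" segment.
DIST_INFO_SUBDIR = re.compile(r".*\.dist-info/([^/]+)/")


def count_subdirectories(records: Iterable[tuple[str, str]]) -> list[tuple[str, int]]:
    """Return [(dirname, distinct project count)], most common first."""
    projects: dict[str, set[str]] = defaultdict(set)
    for project_name, archive_path in records:
        match = DIST_INFO_SUBDIR.match(archive_path)
        if match:  # only files nested inside a .dist-info subdirectory
            projects[match.group(1)].add(project_name)
    counts = ((dirname, len(names)) for dirname, names in projects.items())
    return sorted(counts, key=lambda item: (-item[1], item[0]))
```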
Note that this only includes records for files and thus won’t return results for empty directories. Empty directories being pervasively used and somehow load-bearing is unlikely, so this is an accepted risk of using this method. This query yielded the following results:
| Subdirectory | Unique Projects |
|---|---|
| licenses | 22,026 |
| license_files | 1,828 |
| LICENSES | 170 |
| .ipynb_checkpoints | 85 |
| license | 18 |
| .wex | 9 |
| dist | 8 |
| include | 6 |
| build | 5 |
| tmp | 4 |
| src | 3 |
| calmjs_artifacts | 3 |
| .idea | 2 |
Not shown above are around 50 other subdirectory names that are each used in a single project. From these results we can see:
.dist-info are to do with licensing, one of which (licenses) is specified by PEP 639 and others (license_files, LICENSES) are from draft implementations of PEP 639.
sboms subdirectory doesn’t collide with existing use.
.dist-info appear to be either not widespread or accidental.
As a result of this query, we can see there are already some projects placing directories under .dist-info, so we can’t require that build frontends raise errors for unregistered subdirectories. Instead, the recommendation is that build frontends MAY warn the user or raise an error in this scenario.
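A build frontend following that recommendation might check a freshly built wheel against the registry's initial values. This is a minimal sketch, assuming the wheel is available as a zip file on disk. `warn_unregistered_subdirs` is a hypothetical name, and warning rather than raising is simply one of the two permitted behaviors, chosen here for illustration.

```python
import warnings
import zipfile

# Initial values of the registry of reserved .dist-info subdirectory names.
REGISTERED_SUBDIRS = frozenset({"licenses", "license_files", "LICENSES", "sboms"})


def warn_unregistered_subdirs(wheel_path: str) -> set[str]:
    """Warn about, and return, unregistered subdirectories of the wheel's .dist-info."""
    unregistered: set[str] = set()
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            parts = name.split("/")
            # Only "<x>.dist-info/<subdir>/..." paths name a subdirectory;
            # plain files such as METADATA or RECORD are skipped.
            if len(parts) >= 3 and parts[0].endswith(".dist-info"):
                if parts[1] not in REGISTERED_SUBDIRS:
                    unregistered.add(parts[1])
    for subdir in sorted(unregistered):
        warnings.warn(f"unregistered .dist-info subdirectory: {subdir!r}", stacklevel=2)
    return unregistered
```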
SBOM documents are only as useful as the information encoded in them. If an SBOM document contains incorrect information, then this can result in incorrect downstream analysis by SCA tools. For this reason, it’s important for tools including SBOM data in Python packages to be confident in the information they are recording. SBOMs are capable of recording “known unknowns” in addition to known data. This practice is recommended when not certain about the data being recorded, to allow for further analysis by users.
SBOM documents can encode information about the original system where a Python package is built (for example, the operating system name and version, or less commonly the names of paths). This information has the potential to “leak” through the Python package to installers via SBOMs. If this information is sensitive, then that could represent a security risk.
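A producer concerned about such leaks could scan a JSON-serialized SBOM for strings it considers sensitive before publishing. This is a minimal sketch. `find_leaks` is a hypothetical helper, and the list of sensitive substrings (for example, the local build directory) is an assumption supplied by the caller, not something this PEP defines.

```python
import json


def find_leaks(sbom_json: str, sensitive: list[str]) -> list[str]:
    """Return JSON paths of string values containing any sensitive substring.

    `sensitive` is caller-supplied (an assumption), e.g. the local build directory.
    """
    hits: list[str] = []

    def walk(node, path: str) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{path}[{index}]")
        elif isinstance(node, str) and any(s in node for s in sensitive):
            hits.append(path)

    walk(json.loads(sbom_json), "$")
    return hits
```

Only string values are inspected; keys and non-JSON serializations would need separate handling.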
Most typical users of Python and Python packages won’t need to know the details of this standard. The details of this standard are most important to either maintainers of Python packages or developers of SCA tools, such as SBOM generation tools and vulnerability scanners.
Python package metadata can already describe the top-level software included in a package archive, but what if a package archive contains other software components beyond the top-level software? For example, the Python wheel for “Pillow” contains a handful of other software libraries bundled inside, like libjpeg, libpng, libwebp, and so on. This scenario is where this PEP is most useful, for adding metadata about bundled software to a Python package.
Some build tools may be able to automatically annotate bundled dependencies. Typically, tools can automatically annotate bundled dependencies when those dependencies come from a “packaging ecosystem” (such as PyPI, Linux distros, Crates.io, NPM, etc.).
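When a bundled dependency comes from a packaging ecosystem, its identity can be written as a Package URL (purl), the identifier format used in this PEP's own examples, such as pkg:rpm/almalinux/libssl3@3.2.0. This is a minimal sketch of a tool emitting a CycloneDX JSON document for such dependencies. The helper names, the input dictionary shape, and the choice of CycloneDX spec version 1.6 are assumptions. The purl builder is deliberately simplified and ignores qualifiers, subpaths, and multi-segment namespaces.

```python
import json
from urllib.parse import quote


def purl(ptype: str, name: str, version: str, namespace: str = "") -> str:
    """Build a simplified Package URL, e.g. pkg:rpm/almalinux/libssl3@3.2.0.

    Hypothetical helper: single-segment namespace only, no qualifiers/subpath.
    """
    ns = f"{quote(namespace, safe='')}/" if namespace else ""
    return f"pkg:{ptype}/{ns}{quote(name, safe='')}@{quote(version, safe='')}"


def bundled_components_sbom(bundled: list[dict]) -> str:
    """Return a minimal CycloneDX JSON document listing bundled libraries.

    Each input dict (an assumed shape) has "type", "name", "version", and
    optionally "namespace".
    """
    components = [
        {
            "type": "library",
            "name": dep["name"],
            "version": dep["version"],
            "purl": purl(dep["type"], dep["name"], dep["version"], dep.get("namespace", "")),
        }
        for dep in bundled
    ]
    doc = {"bomFormat": "CycloneDX", "specVersion": "1.6", "version": 1, "components": components}
    return json.dumps(doc, indent=2)
```

The resulting text could then be written to a file such as `.dist-info/sboms/bundled.cdx.json`; that file name is illustrative, as this PEP does not mandate SBOM file names.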
Developers of SBOM generation tooling will need to know about the existence of this PEP and that Python packages may begin publishing SBOM documents within package archives. This information needs to be included as a part of generating an SBOM document for a particular Python package or Python environment.
A follow-up informational PEP will be authored to describe how to transform Python packaging metadata, including the mechanism described in this PEP, into an SBOM document describing Python packages. Once the informational PEP is complete, tracking issues will be opened specifically linking to the informational PEP to spur the adoption of PEP 770 by SBOM tools.
A benchmark is being created to compare the outputs of different SBOM tools when run with various Python packaging inputs (package archive, installed package, environment, container image) and to track the progress of different SBOM generation tools. This benchmark will inform where tools have gaps in support of this PEP and Python packages.
Many users of this PEP won’t know of its existence; instead, their software composition analysis tools, SBOM tools, or vulnerability scanners will simply begin giving more comprehensive information after an upgrade. For users that are interested in the sources of this new information, the “tool” field of SBOM metadata already provides linkages to the projects generating their SBOMs.
For users who need SBOM documents describing their open source dependencies, the first step should always be “create them yourself”. Using the benchmarks above, a list of tools that are known to be accurate for Python packages can be documented and recommended to users. For projects which require additional manual SBOM annotation, tips for contributing this data and tools for maintaining the data can be recommended.
Note that SBOM documents can vary across different Python package archives due to variance in dependencies, Python version, platform, architecture, etc. For this reason, users SHOULD only use the SBOM documents contained within the actual downloaded and installed Python package archive and not assume that the SBOM documents are the same for all archives in a given package release.
An auditwheel fork generates CycloneDX SBOM documents to include in wheels, describing bundled shared library files. These SBOM documents worked as expected for the Syft and Grype SBOM and vulnerability scanners.
There is no universally accepted SBOM standard and this area is still rapidly evolving (for example, SPDX released a new major version of their standard in April 2024). Most discussion and development around SBOMs today focuses on two SBOM standards: CycloneDX and SPDX.
To avoid locking the Python ecosystem into a specific standard ahead of when a clear winner emerges, this PEP treats SBOM documents as opaque and only makes recommendations to promote compatibility with downstream consumers of SBOM document data.
None of the decisions in this PEP restrict a future PEP from selecting a single SBOM standard. Tools that use SBOM data today already need to support multiple formats to handle this situation, so a future standard that requires only one SBOM standard would have no effect on downstream SBOM tools.
A previous iteration of this specification used an Sbom-File metadata field to specify an SBOM file within a source or binary distribution archive. This would make the implementation similar to PEP 639, which uses the License-File field to enumerate license files in archives.
The primary issue with this approach is that SBOM files can originate from both static and dynamic sources, such as versioned source code, the build backend, or tools adding SBOM files after the build has completed (like auditwheel).
Metadata fields must either be static or dynamic, not both. This is in direct conflict with the best-case scenario for SBOM data: that SBOM files are added automatically by tools during the build of a Python package without user involvement or knowledge. Compare this situation to license files, which are almost always static.
The 639-style approach was ultimately dropped in favor of defining SBOMs simply by their presence in the .dist-info/sboms directory. This approach allows build backends and tools to add their own SBOM data without the static/dynamic conflict.
A future PEP will define the process for statically defining SBOM files to be added to the .dist-info/sboms directory.
.dist-info files.
Thanks to Karolina Surma for authoring and leading PEP 639 to acceptance. This PEP’s initial design was heavily inspired by PEP 639 and adopts a similar approach of using a subdirectory under .dist-info to store files.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0770.rst
Last modified: 2025-05-12 19:29:14 GMT