This PEP describes a way to record the provenance of installed Python distributions. The record is created by an installer and is available to users in the form of a JSON file provenance_url.json in the .dist-info directory. The mentioned JSON file captures additional metadata to allow recording a URL to a distribution package together with the installed distribution hash. This proposal is built on top of PEP 610 following its corresponding canonical PyPA spec and complements direct_url.json with provenance_url.json for when packages are identified by a name, and optionally a version.
Installing a Python Project involves downloading a Distribution Package from a Package Index and extracting its content to an appropriate place. After the installation process is done, information about the release artifact used as well as its source is generally lost. However, there are use cases for keeping records of distributions used for installing packages and their provenance.
Python wheels can be built with different compiler flags or supporting different wheel tags. In both cases, users might get into a situation in which multiple wheels might be considered by installers (possibly from different package indexes), and immediately finding out which wheel file was actually used during the installation might be helpful. This way, developers can use information about wheels to debug issues, making sure the desired wheel was actually installed. Another use case could be tools reporting installed software, such as tools producing an SBOM (Software Bill of Materials), which could then give more accurate reports. Yet another use case could be reconstruction of the Python environment by pinning each installed package to a specific distribution artifact consumed from a Python package index.
The motivation described in this PEP is an extension of Recording the Direct URL Origin of installed distributions specification. In addition to recording provenance information for packages installed using a direct URL, installers should also do so for packages installed by name (and optionally version) from Python package indexes.
The idea described in this PEP originated in a tool called micropipenv that is used to install distribution packages in containerized environments (see the reported issue thoth-station/micropipenv#206). Currently, the assembled containerized application does not implicitly carry information about the provenance of installed distribution packages (unless these are installed from full URLs and recorded via direct_url.json). This requires container image suppliers to link container images with the corresponding build process, its configuration and the application source code for checking requirements files in cases when software present in containerized environments needs to be audited.
The subsequent discussion in the Discourse thread also brought up pip’s new --report option that can generate a detailed JSON report about the installation process. This option could help with the provenance problem this PEP approaches. Nevertheless, this option needs to be explicitly passed to pip to obtain the provenance information, and includes additional metadata that might not be necessary for checking the provenance (such as Python version requirements of each distribution package). Also, this option is specific to pip as of the writing of this PEP.
Note the current spec for recording installed packages defines a RECORD file that records installed files, but not the distribution artifact from which these files were obtained. Auditing installed artifacts can be performed based on matching the entries listed in the RECORD file. However, this technique requires a pre-computed database of files each artifact provides or a comparison with the actual artifact content. Both approaches are relatively expensive and time consuming operations which could be eliminated with the proposed provenance_url.json file.
Recording provenance information for installed distribution packages, both those obtained from direct URLs and by name/version from an index, can simplify auditing Python environments in general, beyond just the specific use case for containerized applications mentioned earlier. A community project pip-audit raised their possible interest in pypa/pip-audit#170.
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.
The provenance_url.json file SHOULD be created in the .dist-info directory by installers when installing a Distribution Package specified by name (and optionally by Version Specifier).
This file MUST NOT be created when installing a distribution package from a requirement specifying a direct URL reference (including a VCS URL).
Only one of the files provenance_url.json and direct_url.json (from Recording the Direct URL Origin of installed distributions specification and the corresponding specification of the Direct URL Data Structure), may be present in a given .dist-info directory; installers MUST NOT add both.
The provenance_url.json JSON file MUST be a dictionary, compliant with RFC 8259 and UTF-8 encoded.
If present, it MUST contain exactly two keys. The first MUST be url, with type string. The second key MUST be archive_info with a value defined below.
The value of the url key MUST be the URL from which the distribution package was downloaded. If a wheel is built from a source distribution, the url value MUST be the URL from which the source distribution was downloaded. If a wheel is downloaded and installed directly, the url field MUST be the URL from which the wheel was downloaded. As in the Direct URL Data Structure specification, the url value MUST be stripped of any sensitive authentication information for security reasons.
The user:password section of the URL MAY however be composed of environment variables, matching the following regular expression:
    \$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?
Additionally, the user:password section of the URL MAY be a well-known, non-security sensitive string. A typical example is git in the case of a URL such as ssh://git@gitlab.com.
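As an illustration of the rule above, the following is a minimal sketch of how a tool might decide whether the user:password section of a URL is acceptable to record. The function name userinfo_is_acceptable and the WELL_KNOWN_USERINFO allow-list are hypothetical and not part of this PEP; only the regular expression is taken from the text.

```python
import re
from urllib.parse import urlsplit

# Regular expression quoted in the text; fullmatch() anchors it at both ends.
ENV_VAR_USERINFO = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

# Hypothetical allow-list of well-known, non-sensitive user names.
WELL_KNOWN_USERINFO = {"git"}


def userinfo_is_acceptable(url: str) -> bool:
    """Return True if the URL has no userinfo, or only a permitted one."""
    netloc = urlsplit(url).netloc
    if "@" not in netloc:
        return True  # nothing to strip
    userinfo = netloc.rsplit("@", 1)[0]
    if userinfo in WELL_KNOWN_USERINFO:
        return True
    # Only environment variable references such as ${USER}:${PASS} remain valid.
    return ENV_VAR_USERINFO.fullmatch(userinfo) is not None
```

An installer following this sketch would strip or reject the userinfo of any URL for which the function returns False before writing it to provenance_url.json.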
The value of archive_info MUST be a dictionary with a single key hashes. The value of hashes is a dictionary mapping hash function names to a hex-encoded digest of the file referenced by the url value. At least one hash MUST be recorded. Multiple hashes MAY be included, and it is up to the consumer to decide what to do with multiple hashes (it may validate all of them or a subset of them, or nothing at all).
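To make the data structure concrete, here is a minimal sketch of how the file content could be assembled from a downloaded artifact. The helper name build_provenance and its parameters are hypothetical; a real installer would more likely hash the artifact while streaming it rather than holding it in memory.

```python
import hashlib
import json


def build_provenance(url: str, artifact: bytes, algorithms=("sha256",)) -> dict:
    """Build the provenance_url.json content for a downloaded artifact."""
    # Map each hash function name to the hex-encoded digest of the artifact.
    hashes = {name: hashlib.new(name, artifact).hexdigest() for name in algorithms}
    return {"url": url, "archive_info": {"hashes": hashes}}


def serialize(provenance: dict) -> bytes:
    """Encode the dictionary as UTF-8 JSON, ready to be written to .dist-info."""
    return json.dumps(provenance, sort_keys=True).encode("utf-8")
```

Sorting the keys is a design choice of this sketch, not a requirement of the PEP; it only makes the output reproducible.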
Each hash MUST be one of the single argument hashes provided by hashlib.algorithms_guaranteed, excluding sha1 and md5 which MUST NOT be used. As of Python 3.11, with shake_128 and shake_256 excluded for being multi-argument, the allowed set of hashes is:
    >>> import hashlib
    >>> sorted(hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"})
    ['blake2b', 'blake2s', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512']
Each hash MUST be referenced by the canonical name of the hash, always lower case.
Hashes sha1 and md5 MUST NOT be present, due to the security limitations of these hash algorithms. Conversely, hash sha256 SHOULD be included.
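The structural rules above can be checked mechanically. The following is a minimal validation sketch, not a normative implementation: the function name validation_errors is hypothetical, and the digest length check is an assumption of this sketch that goes slightly beyond the wording of the specification.

```python
import hashlib
import json
import string

# Allowed hash names, computed the same way as in the text above.
ALLOWED_HASHES = frozenset(hashlib.algorithms_guaranteed) - {
    "shake_128", "shake_256", "sha1", "md5",
}


def validation_errors(raw: bytes) -> list:
    """Return a list of problems found in a provenance_url.json payload."""
    try:
        doc = json.loads(raw.decode("utf-8"))
    except ValueError as exc:  # covers UnicodeDecodeError and JSONDecodeError
        return [f"not UTF-8 encoded JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]

    errors = []
    if set(doc) != {"url", "archive_info"}:
        errors.append("keys must be exactly 'url' and 'archive_info'")
    if not isinstance(doc.get("url"), str):
        errors.append("'url' must be a string")

    info = doc.get("archive_info")
    if not isinstance(info, dict) or set(info) != {"hashes"}:
        errors.append("'archive_info' must be a dictionary with the single key 'hashes'")
        return errors
    hashes = info["hashes"]
    if not isinstance(hashes, dict) or not hashes:
        errors.append("'hashes' must be a dictionary with at least one entry")
        return errors

    for name, digest in hashes.items():
        if name not in ALLOWED_HASHES:
            errors.append(f"hash name {name!r} is not allowed")
            continue
        # Assumption: a well-formed digest has the algorithm's full hex length.
        expected_len = hashlib.new(name).digest_size * 2
        if (not isinstance(digest, str) or len(digest) != expected_len
                or set(digest) - set(string.hexdigits)):
            errors.append(f"digest for {name!r} is not a {expected_len}-character hex string")
    return errors
```

An empty list means no problem was detected by this sketch; it does not verify that the digest actually matches the artifact behind the URL.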
Installers that cache distribution packages from an index SHOULD keep information related to the cached distribution artifact, so that the provenance_url.json file can be created even when installing distribution packages from the installer’s cache.
Following the Recording installed projects specification, installers may keep additional installer-specific files in the .dist-info directory. To make sure this PEP does not cause any backwards compatibility issues, a comprehensive survey of installers and libraries was conducted; it found no current tools that use a similarly-named file, and no other major feasibility concerns.
The Wheel specification lists files that can be present in the .dist-info directory. None of these file names collide with the proposed provenance_url.json file from this PEP.
A comprehensive survey of the existing installers, libraries, and dependency managers in the Python ecosystem analyzed the implications of adding support for provenance_url.json to each tool. In summary, no major backwards compatibility issues, conflicts or feasibility blockers were found as of the time of writing of this PEP. More details about the survey can be found in the Appendix: Survey of installers and libraries section.
This proposal does not make any changes to the direct_url.json file described in PEP 610 and its corresponding canonical PyPA spec.
The content of the provenance_url.json file was designed to eventually allow installers to reuse some of the logic supporting direct_url.json when a direct URL refers to a source archive or a wheel.
The main difference between the provenance_url.json and direct_url.json files is the set of mandatory keys and their values in the provenance_url.json file. This helps make sure consumers of the provenance_url.json file can rely on its content, if the file is present in the .dist-info directory.
One of the main security features of the provenance_url.json file is the ability to audit installed artifacts in Python environments. Tools can check which Python package indexes were used to install Python distribution packages as well as the hash digests of their release artifacts.
As an example, we can take the recent compromised dependency chain in the PyTorch incident. The PyTorch index provided a package named torchtriton. An attacker published torchtriton on PyPI, which ran a malicious binary. By checking the URL of the installed Python distribution stated in the provenance_url.json file, tools can automatically check the source of the installed Python distribution. In case of the PyTorch incident, the URL of torchtriton should point to the PyTorch index, not PyPI. Tools can help identify such malicious Python distributions by checking the installed Python distribution URL. A more exact check can also include the hash of the installed Python distribution stated in the provenance_url.json file. Such checks on hashes can be helpful for mirrored Python package indexes where Python distributions are not distinguishable by their source URLs, making sure only desired Python package distributions are installed.
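The kind of check described above could be sketched as follows. This is an illustration under stated assumptions, not a prescribed audit procedure: the function name audit_provenance, the notion of a set of trusted hosts, and the pinned_hashes mapping are all hypothetical, and the host names in the usage note are placeholders.

```python
from typing import Dict, List, Optional, Set
from urllib.parse import urlsplit


def audit_provenance(doc: dict, trusted_hosts: Set[str],
                     pinned_hashes: Optional[Dict[str, str]] = None) -> List[str]:
    """Return findings for one parsed provenance_url.json document."""
    findings = []
    # Source check: was the artifact downloaded from an expected index host?
    host = urlsplit(doc["url"]).hostname
    if host not in trusted_hosts:
        findings.append(f"artifact was downloaded from untrusted host {host!r}")
    # Hash check: useful for mirrors, where the URL alone is not conclusive.
    if pinned_hashes is not None:
        recorded = doc["archive_info"]["hashes"]
        shared = sorted(recorded.keys() & pinned_hashes.keys())
        if not shared:
            findings.append("no recorded hash algorithm matches a pinned one")
        for name in shared:
            if recorded[name].lower() != pinned_hashes[name].lower():
                findings.append(f"{name} digest differs from the pinned value")
    return findings
```

For instance, if a project expects torchtriton to come only from a host such as index.example.org (a placeholder), a provenance_url.json pointing at files.pythonhosted.org would produce a finding.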
A malicious actor can intentionally adjust the content of provenance_url.json to possibly hide provenance information of the installed Python distribution. A security check which would uncover such malicious activity is beyond the scope of this PEP, as it would require monitoring actions on the filesystem and eventually reviewing user or file permissions.
The provenance_url.json metadata file is intended for tools and is not directly visible to end users.
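As a sketch of how a tool might consume the file, the standard library importlib.metadata module can read arbitrary files from a .dist-info directory. The helper name recorded_origin is hypothetical; it relies on the rule that at most one of provenance_url.json and direct_url.json is present.

```python
import json
from importlib.metadata import Distribution
from typing import Optional, Tuple


def recorded_origin(dist: Distribution) -> Optional[Tuple[str, dict]]:
    """Return (file name, parsed content) of the origin record, if any."""
    for filename in ("provenance_url.json", "direct_url.json"):
        # read_text() returns None when the file is not in .dist-info.
        text = dist.read_text(filename)
        if text is not None:
            return filename, json.loads(text)
    return None
```

A caller could obtain the Distribution object with importlib.metadata.distribution("pip"), for example; the result is None for distributions installed by a tool that records neither file.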
A valid provenance_url.json listing multiple hashes:
    {
      "archive_info": {
        "hashes": {
          "blake2s": "fffeaf3d0bd71dc960ca2113af890a2f2198f2466f8cd58ce4b77c1fc54601ff",
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
          "sha3_256": "c856930e0f707266d30e5b48c667a843d45e79bb30473c464e92dfa158285eab",
          "sha512": "6bad5536c30a0b2d5905318a1592948929fbac9baf3bcf2e7faeaf90f445f82bc2b656d0a89070d8a6a9395761f4793c83187bd640c64b2656a112b5be41f73d"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
A valid provenance_url.json listing a single hash entry:
    {
      "archive_info": {
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
A valid provenance_url.json listing a source distribution which was used to build and install a wheel:
    {
      "archive_info": {
        "hashes": {
          "sha256": "8bfe29f17c10e2f2e619de8033a07a224058d96b3bfe2ed61777596f7ffd7fa9"
        }
      },
      "url": "https://files.pythonhosted.org/packages/1d/43/ad8ae671de795ec2eafd86515ef9842ab68455009d864c058d0c3dcf680d/micropipenv-0.0.1.tar.gz"
    }
The following example includes a hash key in the archive_info dictionary as originally designed in the data structure documented in Recording the Direct URL Origin of installed distributions. The hash key MUST NOT be present, to prevent any possible confusion with hashes and to avoid the additional checks that would be required to keep hash values in sync.
    {
      "archive_info": {
        "hash": "sha256=236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
        "hashes": {
          "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
Another example demonstrates an invalid hash name. The referenced hash name does not correspond to the canonical hash names described in this PEP and in the Python docs under hashlib.hash.name.
    {
      "archive_info": {
        "hashes": {
          "SHA-256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
        }
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
The last example demonstrates a provenance_url.json file with no hashes available for the downloaded artifact, which is invalid because at least one hash MUST be recorded:
    {
      "archive_info": {
        "hashes": {}
      },
      "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
    }
These commands generate a direct_url.json file but do not generate a provenance_url.json file. These examples follow examples from the Direct URL Data Structure specification:
    pip install https://example.com/app-1.0.tgz
    pip install https://example.com/app-1.0.whl
    pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
    pip install ./app
    pip install file:///home/user/app
    pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup" (in which case, url will be the local directory where the git repository has been cloned to, and dir_info will be present with "editable": true and no vcs_info will be set)
    pip install -e ./app
Commands that generate a provenance_url.json file but do not generate a direct_url.json file:
    pip install app
    pip install app~=2.2.0
    pip install app --no-index --find-links "https://example.com/"
This behaviour can be tested using changes to pip implemented in the PR pypa/pip#11865.
A proof-of-concept for creating the provenance_url.json metadata file when installing a Python Distribution Package is available in the PR to pip pypa/pip#11865. It reuses the already available implementation for the direct URL data structure to provide the provenance_url.json metadata file for cases when direct_url.json is not created.
A reference implementation for supporting the provenance_url.json file in PDM is available in pdm-project/pdm#3013.
A prototype called pip-preserve was developed to demonstrate creation of requirements.txt files considering direct_url.json and provenance_url.json metadata files. This tool mimics the pip freeze functionality, but the listing of installed packages also includes the hashes of the Python distribution artifacts.
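To give a feel for this kind of output, here is a minimal sketch that turns one parsed provenance_url.json document into a pinned requirements line. It is not the pip-preserve implementation; the function name requirement_line is hypothetical, and the sketch assumes that emitting only the sha256 digest (which this PEP says SHOULD be recorded) is sufficient.

```python
def requirement_line(name: str, version: str, provenance: dict) -> str:
    """Format a pinned requirement, adding the recorded sha256 digest if present."""
    digest = provenance["archive_info"]["hashes"].get("sha256")
    if digest is None:
        # No sha256 recorded: fall back to a plain version pin.
        return f"{name}=={version}"
    return f"{name}=={version} --hash=sha256:{digest}"
```

Applied to the single-hash pip 23.0.1 example shown earlier, the sketch yields a line starting with pip==23.0.1 --hash=sha256:236bcb61.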
To further support this proposal, pip-sbom demonstrates creation of an SBOM in the SPDX format. The tool uses information stored in the provenance_url.json file.
To preserve backwards compatibility with the Recording the Direct URL Origin of installed distributions specification, the file cannot be named direct_url.json, as per the text of that specification:
This file MUST NOT be created when installing a distribution from an other type of requirement (i.e. name plus version specifier).
Such a change might introduce backwards compatibility issues for consumers of direct_url.json who rely on its presence only when distributions are installed using a direct URL reference.
The direct_url.json file is already well established by the Direct URL Data Structure specification and is already used by installers. For example, pip uses direct_url.json to report a direct URL reference on pip freeze. Deprecating direct_url.json would require additional changes to the pip freeze implementation in pip (see PR fridex/pip#2) and could introduce backwards compatibility issues for already existing direct_url.json consumers.
The Direct URL Data Structure specification discusses the possibility of including the hash key alongside the hashes key in the archive_info dictionary. This PEP explicitly does not include the hash key in the provenance_url.json file and allows only the hashes key to be present. By doing so we eliminate possible redundancy in the file, possible confusion, and any additional checks that would need to be done to make sure the hashes are in sync.
For cases when a wheel file is installed from pip’s cache and built using an older version of pip, pip does not record hashes of the downloaded source distributions. As we do not have hashes of these downloaded source distributions, the hashes key in the provenance_url.json file would not contain any entries. In such cases, pip does not create any provenance_url.json file as the provenance information is not complete. It is encouraged for consumers to rebuild wheels with a newer version of pip in these cases.
uv developers raised a concern about requiring at least one hash in the provenance_url.json file as uv does not calculate distribution hashes unless explicitly required. However, requiring at least one hash aids in integrity checks for distributions. This is important in scenarios involving lock files or when identifying distributions as part of SBOMs. The provenance_url.json file mandates the inclusion of at least one hash for the downloaded distribution. Installers that do not compute hashes of distributions as part of the installation process (e.g., due to performance reasons) can omit creating the provenance_url.json file.
PEP 610 and its corresponding canonical PyPA spec recommend including the hashes key of the archive_info in the direct_url.json file but it is not required (per the RFC 2119 language):
A hashes key SHOULD be present as a dictionary mapping a hash name to a hex encoded digest of the file.
This PEP requires the hashes key be included in archive_info in the provenance_url.json file if that file is created; per this PEP:
The value of archive_info MUST be a dictionary with a single key hashes.
By doing so, consumers of provenance_url.json can check artifact digests when the provenance_url.json file is created by installers.
A possibility was raised for storing the index URL as part of the file content. This index URL would represent the index configured in pip’s configuration or specified using the --index-url or --extra-index-url options. Storing this information was considered confusing, especially when using other installation options like --find-links. Since the actual index URL is not strictly bound to the location from which the wheel file was downloaded, we decided not to store the index URL in the provenance_url.json file.
We would like to get feedback on the provenance_url.json file from the Conda maintainers. It is not clear whether Conda would like to adopt the provenance_url.json file. Conda already stores provenance related information (similar to the provenance information proposed in this PEP) in JSON files located in the conda-meta directory following its actions during installation.
The proposed provenance_url.json file was meant to be adopted primarily by Python installers. Other installers, such as APT or DNF, might record the provenance of the installed downstream Python distributions in their own way specific to downstream package management. The proposed file is not expected to be created by these downstream package installers and thus they were intentionally left out of this PEP. However, any input by developers or maintainers of these installers is valuable to possibly enrich the provenance_url.json file with information that would help in some way.
The function from pip’s internal API responsible for installing wheels, named _install_wheel, does not store any provenance_url.json file in the .dist-info directory. Additionally, a prototype introducing the mentioned file to pip in pypa/pip#11865 demonstrates incorporating logic for handling the provenance_url.json file in pip’s source code.
As pip is used by some of the tools mentioned below to install Python package distributions, the findings for pip apply to these tools as well. In addition, pip does not allow parametrizing the creation of files in the .dist-info directory in its internal API. Most of the tools mentioned below that use pip invoke pip as a subprocess, which has no effect on the eventual presence of the provenance_url.json file in the .dist-info directory.
distlib implements low-level functionality to manipulate the dist-info directory. The database of installed distributions does not use any file named provenance_url.json, based on distlib’s source code.
Pipenv uses pip to install Python package distributions. There wasn’t any additional identified logic that would cause backwards compatibility issues when introducing the provenance_url.json file in the .dist-info directory.
installer does not create a provenance_url.json file explicitly. Nevertheless, as per the Recording Installed Projects specification, installer allows passing the additional_metadata argument to create a file in the .dist-info directory - see the source code. To avoid any backwards compatibility issues, any library or tool using installer must not request creating the provenance_url.json file using the mentioned additional_metadata argument.
The installation logic in Poetry depends on the installer.modern-installer configuration option (see docs).
For cases when the installer.modern-installer configuration option is set to false, Poetry uses pip for installing Python package distributions.
On the other hand, when the installer.modern-installer configuration option is set to true, Poetry uses installer to install Python package distributions. As can be seen from the linked sources, no additional metadata file named provenance_url.json is passed that would cause compatibility issues with this PEP.
Conda does not create any provenance_url.json file when Python package distributions are installed.
Hatch uses pip to install project dependencies.
As micropipenv is a wrapper on top of pip, it uses pip to install Python distributions, for both lock files as well as for requirements files.
Thamos uses micropipenv to install Python package distributions, hence any findings for micropipenv apply for Thamos.
PDM uses installer to install binary distributions. The only additional metadata file it eventually creates in the .dist-info directory is the REFER_TO file.
uv is written in Rust and uses its own installation logic when installing wheels. It does not create any additional files in the .dist-info directory that would collide with the provenance_url.json file naming.
Thanks to Dustin Ingram, Brett Cannon, and Paul Moore for the initial discussion in which this idea originated.
Thanks to Donald Stufft, Ofek Lev, and Trishank Kuppusamy for early feedback and support to work on this PEP.
Thanks to Gregory P. Smith, Stéphane Bidoul, C.A.M. Gerlach, and Adam Turner for reviewing this PEP and providing valuable suggestions.
Thanks to Seth Michael Larson for support, providing valuable suggestions and for the proposed pip-sbom prototype.
Thanks to Stéphane Bidoul and Chris Jerdonek for PEP 610, and related Recording the Direct URL Origin of installed distributions and Direct URL Data Structure specifications.
Thanks to Frost Ming for raising possible concern around storing index URL in the provenance_url.json file and initial PEP 710 support in PDM.
Thanks to Charlie Marsh and Zanie Blue for inputs related to the uv installer.
Last, but not least, thanks to Donald Stufft for sponsoring this PEP.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0710.rst
Last modified: 2025-07-06 09:23:40 GMT