This PEP proposes a backward-compatible two-phase transition process to speed up, simplify and robustify installing from the pypi.python.org (PyPI) package index. To ease the transition and minimize client-side friction, no changes to distutils or existing installation tools are required in order to benefit from the first transition phase, which will result in faster, more reliable installs for most existing packages.
The first transition phase implements easy and explicit means for a package maintainer to control which release file links are served to present-day installation tools. The first phase also includes the implementation of analysis tools for present-day packages, to support communication with package maintainers and the automated setting of default modes for controlling release file links. The first phase will also default newly-registered projects on PyPI to only serve links to release files which were uploaded to PyPI.
The second transition phase concerns end-user installation tools, which shall default to only install release files that are hosted on PyPI and tell the user if external release files exist, offering a choice to automatically use those external files. External release files shall in the future be registered together with a checksum hash so that installation tools can verify the integrity of the eventual download (PyPI-hosted release files always carry such a checksum).
Alternative PyPI server implementations should implement the new simple index serving behaviour of transition phase 1 to avoid installation tools treating their release links as external ones in phase 2.
When PyPI went online, it offered release registration but had no facility to host release files itself. When hosting was added, no automated downloading tool existed yet. When Phillip Eby implemented automated downloading (through setuptools), he made the choice to allow people to use download hosts of their choice. The finding of externally-hosted packages was implemented as follows:
- The simple/ index for a package contains all links found by scraping them from that package’s long_description metadata for any release. Links in the “Download-URL” and “Home-page” metadata fields are given rel=download and rel=homepage attributes, respectively.
- rel=homepage and rel=download links are crawled by installation tools and, if HTML, are themselves scraped for release-file links in the above formats.
See the easy_install documentation for a complete description of this behavior. [1]
Today, most packages indexed on PyPI host their release files on PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%) include any links to installable files that are available only off-PyPI. [2]
There are many reasons [3] why people have chosen external hosting. To cite just a few:
Irrespective of the present-day validity of these reasons, there is clearly a history behind why people chose to host files externally, and for some time it was the only way to do things. This PEP takes the position that there remain some valid reasons for external hosting even today.
Today, Python package installers (pip, easy_install, buildout, and others) often need to query many non-PyPI URLs even if there are no externally hosted files. Apart from querying pypi.python.org’s simple index pages, an installer also crawls all homepages and download pages ever specified with any release of a package. The need for installers to crawl external sites slows down installation and makes for a brittle and unreliable installation process. Those sites and packages also don’t take part in the PEP 381 mirroring infrastructure, further decreasing reliability and speed of automated installation processes around the world.
Most packages are hosted directly on pypi.python.org [2]. Even for these packages, installers still crawl their homepage and download-url, if specified. Many package uploaders are not aware that specifying the “homepage” or “download-url” in their package metadata will needlessly slow down the installation process for all users.
Relying on third party sites also opens up more attack vectors for injecting malicious packages into sites using automated installs. A simple attack might just involve getting hold of an old, now-unused homepage domain and placing malicious packages there. Moreover, performing a Man-in-The-Middle (MITM) attack between an installation site and any of the download sites can inject malicious packages on the installation site. As many homepages and download locations are using HTTP and not HTTPS, such attacks are not hard to launch. Such MITM attacks can easily happen even for packages which never intended to host files externally, as their homepages are contacted by installers anyway.
There is currently no way for package maintainers to avoid external-link crawling, other than removing all homepage/download url metadata for all historic releases. While a script [4] has been written to perform this action, it is not a good general solution because it removes useful metadata from PyPI releases.
Even if the sites referenced by “Homepage” and “Download-URL” links were not scraped for further links, there is no obvious way under the current system for a package owner to link to an installable file from a long_description metadata field (which is shown as package documentation on /pypi/PKG) without installation tools automatically considering that file a candidate for installation. Conversely, there is no way to explicitly register multiple external release files without putting them in metadata fields.
These are the goals to be achieved by implementation of this PEP:
The first transition phase introduces a “hosting-mode” field for each project on PyPI, allowing package owners explicit control of which release file links are served to present-day installation tools in the machine-readable simple/ index. The first transition will, after successful hosting-mode manipulations by individual early-adopters, set a default hosting mode for existing packages, based on automated analysis. Maintainers will be notified one month ahead of any such automated change. At completion of the first transition phase, all present-day existing release and installation processes and tools are expected to continue working. Any remaining errors or problems are expected to only relate to installation of individual packages and can be easily corrected by package maintainers or PyPI admins if maintainers are not reachable.
Also in the first phase, each link served in the simple/ index will be explicitly marked as rel="internal" if it is hosted by the index itself (even if on a separate domain, which may be the case if the index uses a CDN for file-serving). Any link not so marked will be considered an external link.
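As an illustration of how a client might read this marking, the following is a minimal sketch using only the Python standard library. The class and function names and the sample page are hypothetical and not part of this PEP; the sketch only assumes that index-hosted links carry rel="internal" as described above.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect (href, is_internal) pairs from a simple/ index page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens; a missing rel
        # means the link is treated as external.
        rel_tokens = (attr_map.get("rel") or "").split()
        self.links.append((href, "internal" in rel_tokens))


def classify_links(html_text):
    """Return a list of (href, is_internal) tuples for every <a> tag."""
    parser = SimpleIndexLinks()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical sample page: one index-hosted file, one external file.
SAMPLE_PAGE = """
<html><body>
<a href="../../packages/example-1.0.tar.gz" rel="internal">example-1.0.tar.gz</a>
<a href="http://example.com/dl/example-1.1.tar.gz">example-1.1.tar.gz</a>
</body></html>
"""

for href, is_internal in classify_links(SAMPLE_PAGE):
    print("internal" if is_internal else "external", href)
```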
In the second transition phase, PyPI client installation tools shall be updated to default to only install rel="internal" packages unless a user specifies option(s) to permit installing from external links. See second transition phase for details on how installers should behave.
Maintainers of packages which currently host release files on non-PyPI sites shall receive instructions and tools to ease “re-hosting” of their historic and future package release files. This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers.
The foundation of the first transition phase is the introduction of three “modes” of PyPI hosting for a package, affecting which links are generated for the simple/ index. These modes are implemented without requiring changes to installation tools via changes to the algorithm for generating the machine-readable simple/ index.
The modes are:
- pypi-scrape-crawl: no change from the current situation of generating machine-readable links for installation tools, as outlined in the history.
- pypi-scrape: for a package in this mode, links to be added to the simple/ index are still scraped from package metadata. However, the “Home-page” and “Download-url” links are given rel=ext-homepage and rel=ext-download attributes instead of rel=homepage and rel=download. The effect of this (with no change in installation tools necessary) is that these links will not be followed and scraped for further candidate links by present-day installation tools: only installable files directly hosted from PyPI or linked directly from PyPI metadata will be considered for installation. Installation tools MAY evolve to offer an option to use the new rel-attribution to crawl external pages but MUST NOT default to it.
- pypi-explicit: for a package in this mode, only links to release files uploaded to PyPI, and external links to release files explicitly nominated by the package owner, will be added to the simple/ index. PyPI will provide a new interface for package owners to supply external release-file URLs. These URLs MUST include a URL fragment in the form “#hashtype=hashvalue” specifying a hash of the externally-linked file which installer tools MUST use to validate that they have downloaded the intended file.
Thus the hope is that eventually all projects on PyPI can be migrated to the pypi-explicit mode, while preserving the ability to install release files hosted externally via installer tools. Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages because of unreachable maintainers for still popular packages.
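The effect of the three modes on the “Home-page” and “Download-URL” links can be summarized in a small sketch. The function name and the use of None to mean "link not served" are assumptions made for illustration; in particular, omitting these links entirely in pypi-explicit mode is one reading of the mode description above, not a statement about the actual PyPI implementation.

```python
# rel values used today for the two metadata fields that installers crawl.
CRAWL_REL = {"Home-page": "homepage", "Download-URL": "download"}


def metadata_link_rel(hosting_mode, field):
    """Return the rel value for a metadata link, or None if it is not served.

    hosting_mode is one of the three mode names from this PEP; field is
    "Home-page" or "Download-URL".
    """
    if field not in CRAWL_REL:
        raise ValueError("unknown metadata field: %r" % (field,))
    if hosting_mode == "pypi-scrape-crawl":
        # Unchanged behaviour: present-day installers follow these links.
        return CRAWL_REL[field]
    if hosting_mode == "pypi-scrape":
        # Renamed rel values are ignored by present-day installers.
        return "ext-" + CRAWL_REL[field]
    if hosting_mode == "pypi-explicit":
        # Assumption: only uploaded files and explicitly nominated
        # external release files are served, so no link is emitted.
        return None
    raise ValueError("unknown hosting mode: %r" % (hosting_mode,))


for mode in ("pypi-scrape-crawl", "pypi-scrape", "pypi-explicit"):
    print(mode, metadata_link_rel(mode, "Home-page"))
```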
The proposed solution consists of multiple implementation and communication steps:
- [...] simple/ index to index-hosted files with rel="internal", to make it easier for client tools to distinguish these links in the second phase.
- [...] <meta name="api-version" value="2"> to all simple/ index pages, to allow clients to distinguish between indexes providing the rel="internal" metadata and older ones that do not.
- [...] pypi-explicit mode (package owners can still switch to the other modes as desired).
- [...] pypi-explicit mode in one month, and similarly to maintainers of projects in group B that their project will be automatically configured to pypi-scrape mode. Inform them that this change is not expected to affect installability of their project at all, but will result in faster and safer installs for their users. Encourage them to set this mode themselves sooner to benefit their users.
- [...] pypi-scrape-crawl, list the URLs which currently are crawled, and suggest that they either re-host their packages directly on PyPI and switch to pypi-explicit, or at least provide direct links to release files in PyPI metadata and switch to pypi-scrape. Provide instructions and tools to help with these transitions.
For the second transition phase, maintainers of installation tools are asked to release two updates.
The first update shall provide clear warnings if externally-hosted release files (that is, files whose link does not include rel="internal") are selected for download, for which projects and URLs exactly this happens, and warn that in future versions externally-hosted downloads will be disabled by default.
The second update should change the default mode to allow only installation of rel="internal" package files, and allow installation of externally-hosted packages only when the user supplies an option.
The installer should distinguish between verifiable and non-verifiable external links. A verifiable external link is a direct link to an installable file from the PyPI simple/ index that includes a hash in the URL fragment (“#hashtype=hashvalue”) which can be used to verify the integrity of the downloaded file. A non-verifiable external link is any link (other than those explicitly supplied by the user of an installer tool) without a hash, scraped from external HTML, or injected into the search via some other non-PyPI source (e.g. setuptools’ dependency_links feature).
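To make the “#hashtype=hashvalue” check concrete, here is a minimal sketch built on hashlib and urllib.parse. The function names and the allow-list of hash types are assumptions for illustration; this PEP does not prescribe which hash algorithms an installer must accept.

```python
import hashlib
from urllib.parse import urldefrag

# Assumption: a conservative allow-list of hashlib algorithm names.
SUPPORTED_HASHES = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def split_hash_fragment(url):
    """Split a URL into (url_without_fragment, hashtype, hashvalue).

    hashtype and hashvalue are None when the fragment is absent or is
    not of the form "name=value".
    """
    base, fragment = urldefrag(url)
    hashtype, sep, hashvalue = fragment.partition("=")
    if not sep or not hashtype or not hashvalue:
        return base, None, None
    return base, hashtype.lower(), hashvalue.lower()


def is_verifiable(url):
    """True if the URL fragment names a supported hash type."""
    _, hashtype, _ = split_hash_fragment(url)
    return hashtype in SUPPORTED_HASHES


def verify_download(url, content):
    """True only if content (bytes) matches the hash given in the URL."""
    _, hashtype, expected = split_hash_fragment(url)
    if hashtype not in SUPPORTED_HASHES:
        return False
    return hashlib.new(hashtype, content).hexdigest() == expected
```

A fragment such as “#egg=name-1.0” is parsed as a name/value pair but is rejected by the allow-list, so the link is treated as non-verifiable.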
Installers should provide a blanket option to allow installing any verifiable external link. Non-verifiable external links should only be installed if the user-provided option specifies exactly which external domains can be used or for which specific package names external links can be used.
When download of an externally-hosted package is disallowed by the default configuration, the user should be notified, with instructions for how to make the install succeed and warnings about the implication (that a file will be downloaded from a site that is not part of the package index). The warning given for non-verifiable links should clearly state that the installer cannot verify the integrity of the downloaded file. The warning given for verifiable external links should simply note that the file will be downloaded from an external URL, but that the file integrity can be verified by checksum.
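The policy described in the preceding paragraphs could be expressed as a single decision function. The following is only a sketch: the Link and Options types, their field names, and the returned strings are hypothetical and do not correspond to the options of any existing installer.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Link:
    url: str
    project: str
    internal: bool   # link carried rel="internal" in the simple/ index
    has_hash: bool   # URL fragment carries a usable "#hashtype=hashvalue"


@dataclass(frozen=True)
class Options:
    # Blanket option: allow any verifiable external link.
    allow_external: bool = False
    # Non-verifiable links must be enabled per domain or per project.
    unverified_hosts: frozenset = frozenset()
    unverified_projects: frozenset = frozenset()


def decide(link, options):
    """Return "install", "blocked-verifiable" or "blocked-unverifiable"."""
    if link.internal:
        return "install"
    if link.has_hash:
        return "install" if options.allow_external else "blocked-verifiable"
    host = urlsplit(link.url).hostname
    if host in options.unverified_hosts:
        return "install"
    if link.project in options.unverified_projects:
        return "install"
    return "blocked-unverifiable"
```

The two "blocked" results are kept distinct so that a caller can emit the different warnings the text asks for: one stating that integrity cannot be verified, the other noting only that the file comes from an external URL.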
Alternative PyPI-compatible index implementations should upgrade to begin providing the rel="internal" metadata and the <meta name="api-version" value="2"> tag as soon as possible. For alternative indexes which do not yet provide the meta tag in their simple/ pages, installation tools should provide backwards-compatible fallback behavior (treat links as internal as in pre-PEP times and provide a warning).
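A client could detect the meta tag roughly as follows. This is a sketch with hypothetical names; treating any integer version of 2 or higher as "provides rel metadata" is an assumption, since the text above only defines the value 2.

```python
from html.parser import HTMLParser


class ApiVersionFinder(HTMLParser):
    """Record the value of the first <meta name="api-version"> tag."""

    def __init__(self):
        super().__init__()
        self.api_version = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.api_version is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == "api-version":
                self.api_version = attr_map.get("value")


def index_api_version(html_text):
    """Return the declared api-version as a string, or None if absent."""
    finder = ApiVersionFinder()
    finder.feed(html_text)
    finder.close()
    return finder.api_version


def provides_rel_metadata(html_text):
    """True if the page declares api-version 2 (or, by assumption, higher).

    When this returns False, the fallback described above applies:
    treat all links as internal and warn the user.
    """
    version = index_api_version(html_text)
    try:
        return version is not None and int(version) >= 2
    except ValueError:
        return False
```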
New distribution URLs may be submitted by performing an HTTP POST to the URL:
With the following form-encoded data:
| Name | Value |
| :action | The string “urls” |
| name | The package name as a string |
| version | The release version as a string |
| new-url | The new URL to store |
| submit_new_url | The string “yes” |
The POST must be accompanied by an HTTP Basic Auth header encoding the username and password of the user authorized to maintain the package on PyPI.
The HTTP response to this request will be one of:
| Code | Meaning | URL submission implications |
| 200 | OK | Everything worked just fine |
| 400 | Bad request | Data provided for submission was malformed |
| 401 | Unauthorised | The username or password supplied were incorrect |
| 403 | Forbidden | User does not have permission to update the package information (not Owner or Maintainer) |
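The request described above can be assembled with the standard library as in the sketch below. The function name is hypothetical, and the submission URL is deliberately left as a parameter (index_url) rather than assumed; the form fields and the status-code table are taken from the text above. The sketch only builds the request object and does not send it.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Status codes and meanings as listed in the table above.
RESPONSE_MEANINGS = {
    200: "OK",
    400: "Bad request",
    401: "Unauthorised",
    403: "Forbidden",
}


def build_url_submission(index_url, username, password, name, version, new_url):
    """Build (but do not send) the POST request registering an external URL."""
    form = {
        ":action": "urls",
        "name": name,
        "version": version,
        "new-url": new_url,
        "submit_new_url": "yes",
    }
    body = urlencode(form).encode("ascii")
    # HTTP Basic Auth: base64 of "username:password".
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        index_url,
        data=body,
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the resulting object to urllib.request.urlopen would perform the submission; a non-2xx status is raised as urllib.error.HTTPError, whose code attribute can be looked up in RESPONSE_MEANINGS.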
Phillip Eby for precise information and the basic ideas to implement the transition via server-side changes only.
Donald Stufft for pushing away from external hosting and offering to implement both a Pull Request for the necessary PyPI changes and the analysis tool to drive the transition phase 1.
Marc-Andre Lemburg, Alyssa Coghlan and catalog-sig in general for thinking through issues regarding getting rid of “external hosting”.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0438.rst
Last modified: 2025-02-01 08:59:27 GMT