
Python Enhancement Proposals

PEP 438 – Transitioning to release-file hosting on PyPI

Author: Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
BDFL-Delegate: Richard Jones <richard at python.org>
Discussions-To: Distutils-SIG list
Status: Superseded
Type: Process
Topic: Packaging
Created: 15-Mar-2013
Post-History: 19-May-2013
Superseded-By: 470
Resolution: Distutils-SIG message


Abstract

This PEP proposes a backward-compatible two-phase transition process to make installing from the pypi.python.org (PyPI) package index faster, simpler and more robust. To ease the transition and minimize client-side friction, no changes to distutils or existing installation tools are required in order to benefit from the first transition phase, which will result in faster, more reliable installs for most existing packages.

The first transition phase implements easy and explicit means for a package maintainer to control which release file links are served to present-day installation tools. The first phase also includes the implementation of analysis tools for present-day packages, to support communication with package maintainers and the automated setting of default modes for controlling release file links. The first phase also will default newly-registered projects on PyPI to only serve links to release files which were uploaded to PyPI.

The second transition phase concerns end-user installation tools, which shall default to only install release files that are hosted on PyPI and tell the user if external release files exist, offering a choice to automatically use those external files. External release files shall in the future be registered together with a checksum hash so that installation tools can verify the integrity of the eventual download (PyPI-hosted release files always carry such a checksum).

Alternative PyPI server implementations should implement the new simple index serving behaviour of transition phase 1 to avoid installation tools treating their release links as external ones in phase 2.

Rationale

History and motivations for external hosting

When PyPI went online, it offered release registration but had no facility to host release files itself. When hosting was added, no automated downloading tool existed yet. When Phillip Eby implemented automated downloading (through setuptools), he made the choice to allow people to use download hosts of their choice. The discovery of externally-hosted packages was implemented as follows:

  1. The PyPI simple/ index for a package contains all links found by scraping them from that package’s long_description metadata for any release. Links in the “Download-URL” and “Home-page” metadata fields are given rel=download and rel=homepage attributes, respectively.
  2. Any of these links whose target is a file whose name appears to be in the form of an installable source or binary distribution, with name in the form “packagename-version.ARCHIVEEXT”, is considered a potential installation candidate by installation tools.
  3. Similarly, any links suffixed with an “#egg=packagename-version” fragment are considered an installation candidate.
  4. Additionally, the rel=homepage and rel=download links are crawled by installation tools and, if HTML, are themselves scraped for release-file links in the above formats.
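
Rules 2 and 3 above can be illustrated with a minimal sketch. It is not the setuptools implementation: the function name `is_candidate` and the list of archive extensions are assumptions made for illustration (the text only says “ARCHIVEEXT”), and name normalization and case-insensitive matching are left out.

```python
from urllib.parse import urlsplit

# Assumed set of archive extensions; the text above only says "ARCHIVEEXT".
ARCHIVE_EXTS = (".tar.gz", ".tar.bz2", ".tgz", ".zip", ".egg")


def is_candidate(url, package):
    """Return True if `url` looks like an installable file for `package`.

    Hypothetical helper: a simplified reading of rules 2 and 3.
    """
    parts = urlsplit(url)
    # Rule 3: an "#egg=packagename-version" fragment marks a candidate.
    if parts.fragment.startswith("egg=" + package + "-"):
        return True
    # Rule 2: the file name has the form "packagename-version.ARCHIVEEXT".
    filename = parts.path.rsplit("/", 1)[-1]
    for ext in ARCHIVE_EXTS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            # Require a non-empty version after "packagename-".
            return stem.startswith(package + "-") and len(stem) > len(package) + 1
    return False
```

Under rule 4, an installer would additionally fetch every rel=homepage and rel=download page and apply the same check to each link found there, which is where the extra network traffic comes from.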

See the easy_install documentation for a complete description of this behavior.[1]

Today, most packages indexed on PyPI host their release files on PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%) include any links to installable files that are available only off-PyPI.[2]

There are many reasons[3] why people have chosen external hosting. To cite just a few:

  • release processes and scripts have been developed already and upload to external sites
  • it takes too long to upload large files from some places in the world
  • export restrictions, e.g. for crypto-related software
  • company policies which require offering open source packages through the company’s own sites
  • problems with integrating uploading to PyPI into one’s release process (because of release policies)
  • desiring download statistics different from those maintained by PyPI
  • perceived bad reliability of PyPI
  • not being aware that PyPI offers file-hosting

Irrespective of the present-day validity of these reasons, there clearly is a history behind why people choose to host files externally, and for some time it was even the only way to do things. This PEP takes the position that there remain some valid reasons for external hosting even today.

Problem

Today, Python package installers (pip, easy_install, buildout, and others) often need to query many non-PyPI URLs even if there are no externally hosted files. Apart from querying pypi.python.org’s simple index pages, an installer also crawls all homepages and download pages ever specified with any release of a package. The need for installers to crawl external sites slows down installation and makes for a brittle and unreliable installation process. Those sites and packages also don’t take part in the PEP 381 mirroring infrastructure, further decreasing reliability and speed of automated installation processes around the world.

Most packages are hosted directly on pypi.python.org[2]. Even for these packages, installers still crawl their homepage and download-url, if specified. Many package uploaders are not aware that specifying the “homepage” or “download-url” in their package metadata will needlessly slow down the installation process for all users.

Relying on third-party sites also opens up more attack vectors for injecting malicious packages into sites using automated installs. A simple attack might just involve getting hold of an old, now-unused homepage domain and placing malicious packages there. Moreover, performing a man-in-the-middle (MITM) attack between an installation site and any of the download sites can inject malicious packages on the installation site. As many homepages and download locations are using HTTP and not HTTPS, such attacks are not hard to launch. Such MITM attacks can easily happen even for packages which never intended to host files externally, as their homepages are contacted by installers anyway.

There is currently no way for package maintainers to avoid external-link crawling, other than removing all homepage/download url metadata for all historic releases. While a script[4] has been written to perform this action, it is not a good general solution because it removes useful metadata from PyPI releases.

Even if the sites referenced by “Homepage” and “Download-URL” links were not scraped for further links, there is no obvious way under the current system for a package owner to link to an installable file from a long_description metadata field (which is shown as package documentation on /pypi/PKG) without installation tools automatically considering that file a candidate for installation. Conversely, there is no way to explicitly register multiple external release files without putting them in metadata fields.

Goals

These are the goals to be achieved by implementation of this PEP:

  • Package owners should be able to explicitly control which files are presented by PyPI to installer tools as installation candidates. Installation should not be slowed and made less reliable by extensive and unnecessary crawling of links that package owners did not explicitly nominate as installation files.
  • It should remain possible for package owners to choose to host their release files on their own hosting, external to PyPI. It should be easy for a user to request the installation of such releases using automated installer tools, especially if the external release files were registered together with a checksum hash.
  • Automated installer tools should not install externally-hosted packages by default, but require explicit authorization to do so by the user. When tools refuse to install such a package by default, they should tell the user exactly which external link(s) the installer needs to follow, and what option(s) the user can provide to authorize the tool to follow those links. PyPI should provide all necessary metadata for installer tools to implement this easily and within a single request/reply interaction.
  • Migration from the status quo to the above points should be gradual and minimize breakage. This includes tooling that makes it easy for package owners with an existing release process that uploads to non-PyPI hosting to also upload those release files to PyPI.

Solution / two transition phases

The first transition phase introduces a “hosting-mode” field for each project on PyPI, allowing package owners explicit control of which release file links are served to present-day installation tools in the machine-readable simple/ index. The first transition will, after successful hosting-mode manipulations by individual early-adopters, set a default hosting mode for existing packages, based on automated analysis. Maintainers will be notified one month ahead of any such automated change. At completion of the first transition phase, all present-day existing release and installation processes and tools are expected to continue working. Any remaining errors or problems are expected to only relate to installation of individual packages and can be easily corrected by package maintainers or PyPI admins if maintainers are not reachable.

Also in the first phase, each link served in the simple/ index will be explicitly marked as rel="internal" if it is hosted by the index itself (even if on a separate domain, which may be the case if the index uses a CDN for file-serving). Any link not so marked will be considered an external link.

In the second transition phase, PyPI client installation tools shall be updated to default to only install rel="internal" packages unless a user specifies option(s) to permit installing from external links. See the “Second transition phase” section for details on how installers should behave.

Maintainers of packages which currently host release files on non-PyPI sites shall receive instructions and tools to ease “re-hosting” of their historic and future package release files. This re-hosting tool MUST be available before automated hosting-mode changes are announced to package maintainers.

Implementation

Hosting modes

The foundation of the first transition phase is the introduction of three “modes” of PyPI hosting for a package, affecting which links are generated for the simple/ index. These modes are implemented without requiring changes to installation tools via changes to the algorithm for generating the machine-readable simple/ index.

The modes are:

  • pypi-scrape-crawl: no change from the current situation of generating machine-readable links for installation tools, as outlined in the history section above.
  • pypi-scrape: for a package in this mode, links to be added to the simple/ index are still scraped from package metadata. However, the “Home-page” and “Download-url” links are given rel=ext-homepage and rel=ext-download attributes instead of rel=homepage and rel=download. The effect of this (with no change in installation tools necessary) is that these links will not be followed and scraped for further candidate links by present-day installation tools: only installable files directly hosted from PyPI or linked directly from PyPI metadata will be considered for installation. Installation tools MAY evolve to offer an option to use the new rel-attribution to crawl external pages but MUST NOT default to it.
  • pypi-explicit: for a package in this mode, only links to release files uploaded to PyPI, and external links to release files explicitly nominated by the package owner, will be added to the simple/ index. PyPI will provide a new interface for package owners to supply external release-file URLs. These URLs MUST include a URL fragment in the form “#hashtype=hashvalue” specifying a hash of the externally-linked file which installer tools MUST use to validate that they have downloaded the intended file.
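
As an illustration of the “#hashtype=hashvalue” validation required in pypi-explicit mode, the following sketch checks downloaded bytes against the fragment of the URL they came from. The function name `verify_download` and the allow-list of hash types are assumptions made for this example; the text does not say which hash types are accepted.

```python
import hashlib
from urllib.parse import urlsplit

# Assumed allow-list of hash types; the text does not enumerate them.
SUPPORTED_HASHES = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def verify_download(url, data):
    """Return True if `data` matches the "#hashtype=hashvalue" fragment of `url`.

    Hypothetical helper: raises ValueError when the URL carries no usable hash,
    so that a missing hash is never mistaken for a successful check.
    """
    fragment = urlsplit(url).fragment
    hashtype, sep, expected = fragment.partition("=")
    if not sep or hashtype not in SUPPORTED_HASHES:
        raise ValueError("no usable hash fragment in %r" % url)
    actual = hashlib.new(hashtype, data).hexdigest()
    return actual == expected.lower()
```

An installer following this scheme would discard the file and report an error whenever the function returns False, rather than fall back to installing the unverified download.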

Thus the hope is that eventually all projects on PyPI can be migrated to the pypi-explicit mode, while preserving the ability to install release files hosted externally via installer tools. Deprecation of hosting modes to eventually only allow the pypi-explicit mode is NOT REGULATED by this PEP but is expected to become feasible some time after successful implementation of the transition phases described in this PEP. It is expected that deprecation requires a new process to deal with abandoned packages, because some still-popular packages have unreachable maintainers.

First transition phase (PyPI)

The proposed solution consists of multiple implementation and communication steps:

  1. Implement in PyPI the three modes described above, with an interface for package owners to select the mode for each package and register explicit external file URLs.
  2. For packages in all modes, label links in the simple/ index to index-hosted files with rel="internal", to make it easier for client tools to distinguish these links in the second phase.
  3. Add an HTML tag <meta name="api-version" value="2"> to all simple/ index pages, to allow clients to distinguish between indexes providing the rel="internal" metadata and older ones that do not.
  4. Default all newly-registered packages to pypi-explicit mode (package owners can still switch to the other modes as desired).
  5. Determine (via automated analysis[2]) which packages have all installable files available on PyPI itself (group A), which have all installable files on PyPI or linked directly from PyPI metadata (group B), and which have installable versions available that are linked only from external homepage/download HTML pages (group C).
  6. Send mail to maintainers of projects in group A that their project will be automatically configured to pypi-explicit mode in one month, and similarly to maintainers of projects in group B that their project will be automatically configured to pypi-scrape mode. Inform them that this change is not expected to affect installability of their project at all, but will result in faster and safer installs for their users. Encourage them to set this mode themselves sooner to benefit their users.
  7. Send mail to maintainers of packages in group C that their package hosting mode is pypi-scrape-crawl, list the URLs which currently are crawled, and suggest that they either re-host their packages directly on PyPI and switch to pypi-explicit, or at least provide direct links to release files in PyPI metadata and switch to pypi-scrape. Provide instructions and tools to help with these transitions.
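
The grouping in step 5 and the default modes of steps 6 and 7 can be sketched as follows. This is not the analysis tool referenced in [2]: the function `classify`, its three set-valued parameters, and the choice of comparing installable versions rather than individual files are all assumptions made for illustration.

```python
# Default hosting mode per group, as described in steps 6 and 7.
DEFAULT_MODE = {
    "A": "pypi-explicit",
    "B": "pypi-scrape",
    "C": "pypi-scrape-crawl",
}


def classify(hosted, direct, crawled):
    """Assign a project to group A, B or C.

    Hypothetical inputs, each a set of installable version strings:
      hosted  - versions with a release file uploaded to PyPI
      direct  - versions linked directly from PyPI metadata
      crawled - versions found only by crawling homepage/download pages
    """
    reachable_without_crawl = hosted | direct
    if crawled - reachable_without_crawl:
        return "C"  # some version is reachable only through crawling
    if direct - hosted:
        return "B"  # some version needs a direct external link
    return "A"      # everything installable is hosted on PyPI
```

For example, a project with version 1.0 uploaded to PyPI and version 0.9 reachable only through its homepage falls into group C and stays in pypi-scrape-crawl mode until its maintainer acts.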

Second transition phase (installer tools)

For the second transition phase, maintainers of installation tools are asked to release two updates.

The first update shall warn clearly whenever externally-hosted release files (that is, files whose link does not include rel="internal") are selected for download, state exactly which projects and URLs are affected, and warn that in future versions externally-hosted downloads will be disabled by default.

The second update should change the default mode to allow only installation of rel="internal" package files, and allow installation of externally-hosted packages only when the user supplies an option.

The installer should distinguish between verifiable and non-verifiable external links. A verifiable external link is a direct link to an installable file from the PyPI simple/ index that includes a hash in the URL fragment (“#hashtype=hashvalue”) which can be used to verify the integrity of the downloaded file. A non-verifiable external link is any link (other than those explicitly supplied by the user of an installer tool) without a hash, scraped from external HTML, or injected into the search via some other non-PyPI source (e.g. setuptools’ dependency_links feature).

Installers should provide a blanket option to allow installing any verifiable external link. Non-verifiable external links should only be installed if the user-provided option specifies exactly which external domains can be used or for which specific package names external links can be used.

When download of an externally-hosted package is disallowed by the default configuration, the user should be notified, with instructions for how to make the install succeed and warnings about the implications (that a file will be downloaded from a site that is not part of the package index). The warning given for non-verifiable links should clearly state that the installer cannot verify the integrity of the downloaded file. The warning given for verifiable external links should simply note that the file will be downloaded from an external URL, but that the file integrity can be verified by checksum.
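
The distinction between internal, verifiable and non-verifiable links, and the options that unlock each, might be modelled as below. All names here (`ExternalPolicy`, its three fields, `classify_link`, `is_allowed`) are hypothetical and do not correspond to the options of any real installer. Letting a per-domain or per-package option also unlock verifiable links is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class ExternalPolicy:
    # Hypothetical user options, not taken from any real tool.
    allow_verifiable: bool = False                       # blanket option
    allowed_domains: set = field(default_factory=set)    # exact external hosts
    allowed_packages: set = field(default_factory=set)   # exact package names


def classify_link(url, internal, from_index):
    """Return "internal", "verifiable" or "non-verifiable" for a link.

    `internal` is True when the index marked the link rel="internal";
    `from_index` is True when the link came directly from the simple/ index.
    """
    if internal:
        return "internal"
    hashtype, sep, value = urlsplit(url).fragment.partition("=")
    has_hash = bool(sep and value) and hashtype != "egg"
    return "verifiable" if (from_index and has_hash) else "non-verifiable"


def is_allowed(url, package, kind, policy):
    """Decide whether the installer may download `url` under `policy`."""
    if kind == "internal":
        return True
    named = (urlsplit(url).hostname in policy.allowed_domains
             or package in policy.allowed_packages)
    if kind == "verifiable":
        return policy.allow_verifiable or named
    return named  # non-verifiable links need an exact domain or package
```

When `is_allowed` returns False, an installer following the text above would print the external URL, name the option that would permit it, and say whether the download can be verified by checksum.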

Alternative PyPI-compatible index implementations should upgrade to begin providing the rel="internal" metadata and the <meta name="api-version" value="2"> tag as soon as possible. For alternative indexes which do not yet provide the meta tag in their simple/ pages, installation tools should provide backwards-compatible fallback behavior (treat links as internal as in pre-PEP times and provide a warning).
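
A client could detect the api-version tag and apply the fallback roughly as follows. This is a minimal sketch using the standard library `html.parser` module; the class `SimpleIndexParser` and the function `internal_links` are hypothetical names, and a `rel` attribute holding several space-separated values is not handled.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect the api-version meta tag and all links of a simple/ page."""

    def __init__(self):
        super().__init__()
        self.api_version = None
        self.links = []  # list of (href, rel) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "api-version":
            self.api_version = attrs.get("value")
        elif tag == "a" and "href" in attrs:
            self.links.append((attrs["href"], attrs.get("rel")))


def internal_links(html):
    """Return (hrefs, legacy), where `legacy` means the fallback was used."""
    parser = SimpleIndexParser()
    parser.feed(html)
    if parser.api_version != "2":
        # Older index: treat every link as internal; the caller should warn.
        return [href for href, _ in parser.links], True
    return [href for href, rel in parser.links if rel == "internal"], False
```

Returning the `legacy` flag alongside the links leaves the wording of the warning to the calling tool.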

API For Submitting External Distribution URLs

New distribution URLs may be submitted by performing an HTTP POST to the URL:

With the following form-encoded data:

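==============  ===============================
Name            Value
==============  ===============================
:action         The string “urls”
name            The package name as a string
version         The release version as a string
new-url         The new URL to store
submit_new_url  The string “yes”
==============  ===============================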

The POST must be accompanied by an HTTP Basic Auth header encoding the username and password of the user authorized to maintain the package on PyPI.

The HTTP response to this request will be one of:

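====  ============  ===========================
Code  Meaning       URL submission implications
====  ============  ===========================
200   OK            Everything worked just fine
400   Bad request   Data provided for submission was malformed
401   Unauthorised  The username or password supplied were incorrect
403   Forbidden     User does not have permission to update the package information (not Owner or Maintainer)
====  ============  ===========================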
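
A request matching the form fields and authentication described above could be assembled as in this sketch. The function `build_url_submission` is a hypothetical helper, and `endpoint` is a placeholder parameter: the submission URL is not reproduced in this text, so the caller has to supply it. The request is only constructed here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_url_submission(endpoint, user, password, name, version, new_url):
    """Build (but do not send) the POST that registers an external URL.

    `endpoint` is a placeholder for the submission URL, which must be
    supplied by the caller.
    """
    form = {
        ":action": "urls",
        "name": name,
        "version": version,
        "new-url": new_url,
        "submit_new_url": "yes",
    }
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        endpoint,
        data=urlencode(form).encode("ascii"),   # form-encoded body
        headers={"Authorization": "Basic " + token},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the submission; a 400, 401 or 403 response then surfaces as `urllib.error.HTTPError`, whose `code` attribute can be compared against the table above.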

References

[1] Phillip Eby, easy_install ‘Package Index “API”’ documentation, http://peak.telecommunity.com/DevCenter/EasyInstall#package-index-api
[2] Donald Stufft, automated analysis of PyPI project links, https://github.com/dstufft/pypi.linkcheck
[3] Marc-Andre Lemburg, reasons for external hosting, https://mail.python.org/pipermail/catalog-sig/2013-March/005626.html
[4] Holger Krekel, script to remove homepage/download metadata for all releases, https://mail.python.org/pipermail/catalog-sig/2013-February/005423.html

Acknowledgments

Phillip Eby for precise information and the basic ideas to implement the transition via server-side changes only.

Donald Stufft for pushing away from external hosting and offering to implement both a Pull Request for the necessary PyPI changes and the analysis tool to drive the transition phase 1.

Marc-Andre Lemburg, Alyssa Coghlan and catalog-sig in general for thinking through issues regarding getting rid of “external hosting”.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0438.rst

Last modified: 2025-02-01 08:59:27 GMT

