
Python Enhancement Proposals

PEP 470 – Removing External Hosting Support on PyPI

Author:
Donald Stufft <donald at stufft.io>
BDFL-Delegate:
Paul Moore <p.f.moore at gmail.com>
Discussions-To:
Distutils-SIG list
Status:
Final
Type:
Process
Topic:
Packaging
Created:
12-May-2014
Post-History:
14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015
Replaces:
438
Resolution:
Distutils-SIG message


Abstract

This PEP proposes the deprecation and removal of support for hosting files externally to PyPI as well as the deprecation and removal of the functionality added by PEP 438, particularly rel information to classify different types of links and the meta-tag to indicate API version.

Rationale

Historically PyPI did not have any method of hosting files nor any method of automatically retrieving installables; it was instead focused on providing a central registry of names, to prevent naming collisions, and as a means of discovery for finding projects to use. In the course of time setuptools began to scrape these human facing pages, as well as pages linked from those pages, looking for things it could automatically download and install. Eventually this became the “Simple” API, which used a similar URL structure but eliminated the extraneous links and information to make the API more efficient. Additionally PyPI grew the ability for a project to upload release files directly to PyPI, enabling PyPI to act as a repository in addition to an index.

This gives PyPI two equally important roles in the Python ecosystem: that of index, to enable easy discovery of Python projects, and that of central repository, to enable easy hosting, download, and installation of Python projects. Due to the history behind PyPI and the very organic growth it has experienced, the lines between these two roles are blurry. This blurring has caused confusion for the end users of both roles, which has in turn caused ire between people attempting to use PyPI in different capacities, most often when end users want to use PyPI as a repository but the author wants to use PyPI solely as an index.

This confusion comes down to end users of projects not realizing whether a project is hosted on PyPI or relies on an external service. This often manifests itself when the external service is down but PyPI is not. People will see that PyPI works, and other projects work, but this one specific project does not. They oftentimes do not realize who they need to contact in order to get this fixed or what their remediation steps are.

PEP 438 attempted to solve this issue by allowing projects to explicitly declare if they were using the repository features or not, and if they were not, it had the installers classify the links they found as either “internal”, “verifiable external” or “unverifiable external”. PEP 438 was accepted and implemented in pip 1.4 (released on Jul 23, 2013) with the final transition implemented in pip 1.5 (released on Jan 2, 2014).

PEP 438 was successful in bringing more people to utilize PyPI’s repository features, an altogether good thing given that the global CDN powering PyPI provides speed ups for a lot of people; however, it did so by introducing a new point of confusion and pain for both the end users and the authors.

By moving to using explicit multiple repositories we can make the lines between these two roles much more explicit and remove the “hidden” surprises caused by the current implementation of handling people who do not want to use PyPI as a repository.

Key User Experience Expectations

  1. Easily allow external hosting to “just work” when appropriately configured at the system, user or virtual environment level.
  2. Eliminate any and all references to the confusing “verifiable external” and “unverifiable external” distinction from the user experience (both when installing and when releasing packages).
  3. The repository aspects of PyPI should become just the default package hosting location (i.e. the only one that is treated as opt-out rather than opt-in by most client tools in their default configuration). Aside from that aspect, hosting on PyPI should not otherwise provide an enhanced user experience over hosting your own package repository.
  4. Do all of the above while providing default behaviour that is secure against most attackers below the nation state adversary level.

Why Additional Repositories?

The two common installer tools, pip and easy_install/setuptools, both support the concept of additional locations to search for files to satisfy the installation requirements and have done so for many years. This means that there is no need to “phase” in a new flag or concept, and the solution to installing a project from a repository other than PyPI will function regardless of how old (within reason) the end user’s installer is. Not only has this concept existed in the Python tooling for some time, but it also exists across languages and even extends to the OS level, with OS package tools almost universally supporting multiple repositories, making it extremely likely that someone is already familiar with the concept.

Additionally, the multiple repository approach is a concept that is useful outside of the narrow scope of allowing projects that wish to be included on the index portion of PyPI but do not wish to utilize the repository portion of PyPI. This includes places where a company may wish to host a repository that contains their internal packages or where a project may wish to have multiple “channels” of releases, such as alpha, beta, release candidate, and final release. This could also be used for projects wishing to host files which cannot be uploaded to PyPI, such as multi-gigabyte data files or, currently at least, Linux Wheels.

Why Not PEP 438 or Similar?

While the additional search location support has existed in pip and setuptools for quite some time, support for PEP 438 has only existed in pip since the 1.4 version, and still has yet to be implemented in setuptools. The design of PEP 438 did mean that users still benefited for projects which did not require external files even with older installers; however, for projects which did require external files, users are still silently being given either potentially unreliable or, even worse, unsafe files to download. This system is also unique to Python as it arises out of the history of PyPI. This means that it is almost certain that this concept will be foreign to most, if not all, users until they encounter it while attempting to use the Python toolchain.

Additionally, the classification system proposed by PEP 438 has, in practice, turned out to be extremely confusing to end users, so much so that it is a position of this PEP that the situation as it stands is completely untenable. The common pattern for a user with this system is to attempt to install a project and possibly get an error message (or maybe not, if the project ever uploaded something to PyPI but later switched without removing old files), see that the error message suggests --allow-external, and reissue the command adding that flag, most likely getting another error message. They then see that this time the error message suggests also adding --allow-unverified, and issue the command a third time, finally getting the thing they wish to install.

This UX failure exists for several reasons.

  1. If pip can locate files at all for a project on the Simple API it will simply use those instead of attempting to locate more. This is generally the right thing to do, as attempting to locate more would erase a large part of the benefit of PEP 438. This means that if a project ever uploaded a file that matches what the user has requested for install, that file will be used regardless of how old it is.
  2. PEP 438 makes an implicit assumption that most projects would either upload themselves to PyPI or would update themselves to directly link to release files. While a large number of projects did ultimately decide to upload to PyPI, some of them did so only because the UX around PEP 438 was so bad that they felt forced to do so. More concerning, however, is the fact that very few projects have opted to directly and safely link to files; instead they still simply link to pages which must be scraped in order to find the actual files, thus rendering the safe variant (--allow-external) largely useless.
  3. Even if an author wishes to directly link to their files, doing so safely is non-obvious. It requires the inclusion of an MD5 hash (for historical reasons) in the fragment of the URL (the part after the #). If they do not include this then their files will be considered “unverified”.
  4. PEP 438 takes a security centric view and disallows any form of a global opt in for unverified projects. While this is generally a good thing, it creates extremely verbose and repetitive command invocations such as:
    $ pip install --allow-external myproject --allow-unverified myproject myproject
    $ pip install --allow-all-external --allow-unverified myproject myproject
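
To make item 3 more concrete, here is a minimal sketch of how an author might build a direct link that carries an MD5 fragment, and how a tool might check for one. The file contents and the example.com URL are hypothetical, and the `#md5=<hexdigest>` spelling is assumed to be the fragment form installers looked for at the time; treat it as an illustration, not a specification.

```python
import hashlib
from urllib.parse import urlsplit


def link_with_md5_fragment(file_url: str, file_bytes: bytes) -> str:
    """Return file_url with an '#md5=<hexdigest>' fragment appended."""
    digest = hashlib.md5(file_bytes).hexdigest()
    return f"{file_url}#md5={digest}"


def has_md5_fragment(link: str) -> bool:
    """Check whether a link carries an md5 fragment of 32 hex characters."""
    name, _, value = urlsplit(link).fragment.partition("=")
    is_hex = all(c in "0123456789abcdef" for c in value)
    return name == "md5" and len(value) == 32 and is_hex


# Hypothetical file contents and URL, for illustration only.
payload = b"example release archive contents"
plain_link = "https://example.com/files/myproject-1.0.tar.gz"
hashed_link = link_with_md5_fragment(plain_link, payload)

print(hashed_link)
print(has_md5_fragment(hashed_link))  # the link now carries a hash
print(has_md5_fragment(plain_link))   # a bare link would count as unverified
```

A link without such a fragment is exactly the case that forced users to add --allow-unverified.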

Multiple Repository/Index Support

Installers SHOULD implement, or continue to offer, the ability to point the installer at multiple URL locations. The exact mechanisms for a user to indicate they wish to use an additional location are left up to each individual implementation.

Additionally, the mechanism for discovering an installation candidate when multiple repositories are being used is also up to each individual implementation. However, once configured, an implementation should not discourage, warn, or otherwise cast a negative light upon the use of a repository simply because it is not the default repository.

Currently both pip and setuptools implement multiple repository support by using the best installation candidate they can find from either repository, essentially treating the repositories as if they were one large repository.
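
The "one large repository" behaviour can be sketched as pooling the candidates from every configured repository and picking the best one. This is only an illustration: the `Candidate` type, the repository URLs, and the naive numeric version ordering are assumptions made for the example and do not reflect how pip or setuptools are actually implemented.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    version: str
    url: str


def version_key(version: str) -> Tuple[int, ...]:
    """Naive ordering for purely numeric dotted versions such as '1.10.2'."""
    return tuple(int(part) for part in version.split("."))


def best_candidate(repositories: Dict[str, List[Candidate]]) -> Optional[Candidate]:
    """Pool candidates from every repository and return the highest version."""
    pooled = [c for candidates in repositories.values() for c in candidates]
    if not pooled:
        return None
    return max(pooled, key=lambda c: version_key(c.version))


# Hypothetical repositories and files, for illustration only.
repositories = {
    "https://pypi.example/simple/": [
        Candidate("1.0", "https://pypi.example/files/myproject-1.0.tar.gz"),
        Candidate("1.2", "https://pypi.example/files/myproject-1.2.tar.gz"),
    ],
    "https://packages.example.com/simple/": [
        Candidate("1.10", "https://packages.example.com/myproject-1.10.tar.gz"),
    ],
}

chosen = best_candidate(repositories)
print(chosen.version, chosen.url)
```

Note that the origin of a candidate plays no role in the selection, which matches the requirement that a non-default repository is not treated as second class.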

Installers SHOULD also implement some mechanism for removing or otherwise disabling use of the default repository. The exact specifics of how that is achieved are up to each individual implementation.

Installers SHOULD also implement some mechanism for whitelisting and blacklisting which projects a user wishes to install from a particular repository. The exact specifics of how that is achieved are up to each individual implementation.
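
Since the PEP leaves the mechanism to each installer, the following is just one possible shape for per-repository whitelisting and blacklisting. `RepositoryPolicy`, `repositories_for`, the project names, and the URLs are all hypothetical and are not part of pip or setuptools; name matching is simplified to a lowercase comparison.

```python
from typing import Dict, FrozenSet, List, Optional


class RepositoryPolicy:
    """Hypothetical per-repository filter of project names."""

    def __init__(
        self,
        allow: Optional[FrozenSet[str]] = None,
        deny: FrozenSet[str] = frozenset(),
    ) -> None:
        self.allow = allow  # None means every project is allowed
        self.deny = deny

    def permits(self, project: str) -> bool:
        name = project.lower()
        if name in self.deny:
            return False
        return self.allow is None or name in self.allow


def repositories_for(project: str, policies: Dict[str, RepositoryPolicy]) -> List[str]:
    """Return the repository URLs that may be searched for this project."""
    return [url for url, policy in policies.items() if policy.permits(project)]


# Hypothetical configuration: an internal tool must never come from the
# default repository, and the company repository serves nothing else.
policies = {
    "https://pypi.example/simple/": RepositoryPolicy(deny=frozenset({"internal-tool"})),
    "https://packages.example.com/simple/": RepositoryPolicy(allow=frozenset({"internal-tool"})),
}

print(repositories_for("internal-tool", policies))
print(repositories_for("otherproject", policies))
```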

The Python packaging guide MUST be updated with a section detailing the options for a project to set up its own repository so that any project that wishes to not host on PyPI in the future can reference that documentation. This should include the suggestion that projects relying on hosting their own repositories should document in their project description how to install their project.

Deprecation and Removal of Link Spidering

A new hosting mode will be added to PyPI. This hosting mode will be called pypi-only and will be in addition to the three that PEP 438 has already given us, which are pypi-explicit, pypi-scrape, and pypi-scrape-crawl. This new hosting mode will modify a project’s simple API page so that it only lists the files which are directly hosted on PyPI and will not link to anything else.
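
The effect of the pypi-only mode on a project page can be pictured as a filter over that page's links. The sketch below assumes absolute link URLs and decides "directly hosted" purely by hostname; the function name and hosts are hypothetical, and this is not how PyPI itself implements the mode.

```python
from typing import List
from urllib.parse import urlsplit


def pypi_only_links(links: List[str], repository_host: str) -> List[str]:
    """Keep only the links whose host is the repository itself."""
    return [link for link in links if urlsplit(link).hostname == repository_host]


# Hypothetical links as they might appear on a project's simple page.
links = [
    "https://files.repo.example/packages/myproject-1.0.tar.gz#md5=0123456789abcdef0123456789abcdef",
    "https://downloads.example.com/myproject-1.1.tar.gz",
    "https://example.org/myproject/download.html",
]

for link in pypi_only_links(links, "files.repo.example"):
    print(link)
```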

Upon acceptance of this PEP and the addition of the pypi-only mode, all new projects will be defaulted to the PyPI only mode and they will be locked to this mode and unable to change this particular setting.

An email will then be sent out to all of the projects which are hosted only on PyPI informing them that in one month their project will be automatically converted to the pypi-only mode. A month after these emails have been sent, any of the emailed projects which are still hosted only on PyPI will have their mode set permanently to pypi-only.

At the same time, an email will be sent to projects which rely on hosting external to PyPI. This email will warn these projects that externally hosted files have been deprecated on PyPI and that in 3 months from the time of that email all external links will be removed from the installer APIs. This email MUST include instructions for converting their projects to be hosted on PyPI and MUST include links to a script or package that will enable them to enter their PyPI credentials and package name and have it automatically download and re-host all of their files on PyPI. This email MUST also include instructions for setting up their own index page. This email must also contain a link to the Terms of Service for PyPI as many users may have signed up a long time ago and may not recall what those terms are. Finally this email must also contain a list of the links registered with PyPI where we were able to detect an installable file was located.

Two months after the initial email, another email must be sent to any projects still relying on external hosting. This email will include all of the same information that the first email contained, except that the removal date will be one month away instead of three.

Finally a month later all projects will be switched to the pypi-only mode and PyPI will be modified to remove the externally linked files functionality.
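
The timeline above can be laid out as a small schedule computed from the date of the first emails. The sketch below is illustrative only: the announcement date is hypothetical, and `add_months` is a helper written for this example that clamps to the last day of shorter months.

```python
import calendar
from datetime import date
from typing import List, Tuple


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to month length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def migration_schedule(announcement: date) -> List[Tuple[date, str]]:
    """Map the steps described in this section onto calendar dates."""
    return [
        (announcement, "email PyPI-only projects and externally hosted projects"),
        (add_months(announcement, 1), "lock emailed PyPI-only projects to pypi-only"),
        (add_months(announcement, 2), "remind projects still hosting externally"),
        (add_months(announcement, 3), "switch all projects to pypi-only, remove external links"),
    ]


# Hypothetical announcement date, for illustration only.
schedule = migration_schedule(date(2015, 9, 1))
for when, action in schedule:
    print(when.isoformat(), action)
```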

Summary of Changes

Repository side

  1. Deprecate and remove the hosting modes as defined by PEP 438.
  2. Restrict simple API to only list the files that are contained within the repository.

Client side

  1. Implement multiple repository support.
  2. Implement some mechanism for removing/disabling the default repository.
  3. Deprecate / Remove PEP 438

Impact

To determine impact, we’ve looked at all projects using a method of searching PyPI which is similar to what pip and setuptools use and searched for all files available on PyPI, safely linked from PyPI, unsafely linked from PyPI, and finally unsafely available outside of PyPI. When the same file was found in multiple locations it was deduplicated and counted in only one location based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI. This gives us the broadest possible definition of impact: it means that any single file for a project may no longer be visible by default, even though that file could be years old, or could be a binary file while there is an sdist available on PyPI. This means that the real impact will likely be much smaller, but in an attempt not to miscount we take the broadest possible definition.

At the time of this writing there are 65,232 projects hosted on PyPI and of those, 59 of them rely on external files that are safely hosted outside of PyPI and 931 of them rely on external files which are unsafely hosted outside of PyPI. This shows us that 1.5% of projects (990 of 65,232) will be affected in some way by this change while 98.5% will continue to function as they always have. In addition, only about 6% of the affected projects (59 of 990) are using the features provided by PEP 438 to safely host outside of PyPI, while about 94% of them are exposing their users to Remote Code Execution via a Man In The Middle attack.
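
The shares quoted above can be recomputed from the raw counts. The snippet below uses only the three figures given in this section; the variable names are chosen for the example. Computed exactly, the safely hosted share of affected projects comes out just under 6%.

```python
# Counts as stated in this section.
total_projects = 65232
safe_external = 59
unsafe_external = 931

affected = safe_external + unsafe_external

print(f"affected projects:          {affected}")
print(f"share of all projects:      {affected / total_projects:.1%}")
print(f"unaffected share:           {1 - affected / total_projects:.1%}")
print(f"safely hosted (of affected):   {safe_external / affected:.1%}")
print(f"unsafely hosted (of affected): {unsafe_external / affected:.1%}")
```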

Frequently Asked Questions

I can’t host my project on PyPI because of <X>, what should I do?

First you should decide if <X> is something inherent to PyPI, or if PyPI could grow a feature to solve <X> for you. If PyPI can add a feature to enable you to host your project on PyPI then you should propose that feature. However, if <X> is something inherent to PyPI, such as wanting to maintain control over your own files, then you should set up your own package repository and instruct your users in your project’s description to add it to the list of repositories their installer of choice will use.

My users have a worse experience with this PEP than before, how do I explain that?

Part of this answer is going to be specific to each individual project: you’ll need to explain to your users what caused you to decide to host in your own repository instead of utilizing one that they already have in their installer’s default list of repositories. However, part of this answer will also be explaining that the previous behavior of transparently including external links was both a security hazard (given that in most cases it allowed a MITM to execute arbitrary Python code on the end user’s machine) and a reliability concern, and that PEP 438 attempted to resolve this by making them explicitly opt in, but that PEP 438 brought along with it a number of serious usability issues. PEP 470 represents a simplification of the model to one that many users will be familiar with, which is common amongst Linux distributions.

Switching to a repository structure breaks my workflow or isn’t allowed by my host?

There are a number of cheap or free hosts that would gladly support what is required for a repository. In particular you don’t actually need to upload your files anywhere differently as long as you can set up a host with the correct structure that points to where your files are actually located. Many of these hosts provide free HTTPS using a shared domain name, and free HTTPS certificates can be obtained from StartSSL or, in the near future, LetsEncrypt, or bought cheaply from any number of providers.
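
As a rough illustration of "a host with the correct structure that points to where your files are actually located", the sketch below renders a single project page whose anchors point at files hosted elsewhere. It assumes the repository layout is one page of links per project; the function name, project name, and URLs are hypothetical.

```python
from html import escape
from typing import Dict


def render_project_page(project: str, files: Dict[str, str]) -> str:
    """Render a minimal HTML page with one anchor per release file."""
    anchors = "\n".join(
        f'    <a href="{escape(url, quote=True)}">{escape(filename)}</a><br/>'
        for filename, url in sorted(files.items())
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"  <head><title>Links for {escape(project)}</title></head>\n"
        "  <body>\n"
        f"{anchors}\n"
        "  </body>\n"
        "</html>\n"
    )


# Hypothetical files that stay where they are already hosted.
files = {
    "myproject-1.0.tar.gz": "https://downloads.example.com/myproject-1.0.tar.gz",
    "myproject-1.1.tar.gz": "https://downloads.example.com/myproject-1.1.tar.gz",
}

page = render_project_page("myproject", files)
print(page)
```

The generated page could then be published as a static file under a per-project path on any host that serves HTML over HTTPS.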

Why don’t you provide <X>?

The answer here will depend on what <X> is, however the answers typically are one of:

  • We hadn’t thought of it and nobody had suggested it before.
  • We don’t have sufficient experience with <X> to properly design a solution for it and would welcome a domain expert to help us provide it.
  • We’re an open source project and nobody has decided to volunteer to design and implement <X> yet.

Additional PEPs to propose additional features are always welcome, however they would need someone with the time and expertise to accurately design <X>. This particular PEP is intended to focus on getting us to a point where the capabilities of PyPI are straightforward with an easily understood baseline that is similar to existing models such as Linux distribution repositories.

Why should I register on PyPI if I’m running my own repository anyways?

PyPI serves two critical functions for the Python ecosystem. One of those is as a central repository for the actual files that get downloaded and installed by pip or another package manager; it is this function that this PEP is concerned with and that you’d be replacing if you’re running your own repository. However, it also provides a central registry of who owns what name in order to prevent naming collisions; think of it as a sort of DNS for Python packages. In addition to making sure that names are handed out in a first-come, first-served manner, it also provides a single place for users to go to search for and discover new projects. So the simple answer is, you should still register your project with PyPI to avoid naming collisions and to make it so people can still easily discover your project.

Rejected Proposals

Allow easier discovery of externally hosted indexes

A previous version of this PEP included a new feature added to both PyPI and installers that would allow project authors to enter into PyPI a list of URLs that would instruct installers to ignore any files uploaded to PyPI and instead return an error telling the end user about these extra URLs that they can add to their installer to make the installation work.

This feature has been removed from the scope of the PEP because it proved too difficult to develop a solution that avoided UX issues similar to those that caused so many problems with the PEP 438 solution. If needed, a future PEP could revisit this idea.

Keep the current classification system but adjust the options

This PEP rejects several related proposals which attempt to fix some of the usability problems with the current system while still keeping the general gist of PEP 438.

This includes:

  • Default to allowing safely externally hosted files, but disallow unsafely hosted.
  • Default to disallowing safely externally hosted files with only a global flag to enable them, but disallow unsafely hosted.
  • Continue on the suggested path of PEP 438 and remove the option to unsafely host externally but continue to allow the option to safely host externally.

These proposals are rejected because:

  • The classification system introduced in PEP 438 is an entirely unique concept to PyPI which is not generically applicable even in the context of Python packaging. Adding additional concepts comes at a cost.
  • The classification system itself is non-obvious to explain, and pre-determining what classification of link a project will require entails inspecting the project’s /simple/<project>/ page, and possibly any URLs linked from that page.
  • The ability to host externally while still being linked for automatic discovery is mostly a historic relic which causes a fair amount of pain and complexity for little reward.
  • The installer’s ability to optimize or clean up the user interface is limited due to the nature of the implicit link scraping which would need to be done. This extends to the --allow-* options as well as the inability to determine if a link is expected to fail or not.
  • The mechanism paints with a very broad brush when enabling an option. Although PEP 438 attempts to limit this with per package options, a project that has existed for an extended period of time may oftentimes have several different URLs listed in its simple index. It is not unusual for at least one of these to no longer be under control of the project. While an unregistered domain will sit there relatively harmless most of the time, pip will continue to attempt to install from it on every discovery phase. This means that an attacker simply needs to look at projects which rely on unsafe external URLs and register expired domains to attack users.

Implement this PEP, but Do Not Remove the Existing Links

This is essentially the backwards compatible version of this PEP. It attempts to allow people using older clients, or clients which do not implement this PEP, to continue on as if nothing had changed. This proposal is rejected because the vast bulk of those scenarios are unsafe uses of the deprecated features. It is the opinion of this PEP that silently allowing unsafe actions to take place on behalf of end users is simply not an acceptable solution.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0470.rst

Last modified: 2025-02-01 08:59:27 GMT

